Episode 17 · 40 mins · 1/23/2025

Mobile Matters: The Untapped Potential of Mobile Observability

host
Mirko Novakovic
guest
Eric Futoran
#17 - Mobile Matters: The Untapped Potential of Mobile Observability with Eric Futoran

About this Episode

Embrace CEO Eric Futoran chats with Dash0's Mirko Novakovic about why mobile is often overlooked in observability, how their platform goes beyond crash detection, and the challenges of processing dense real-user monitoring data.

Transcription

[00:00:00] Chapter 1: Introduction and Guest Background

Mirko Novakovic: Hello everybody. My name is Mirko Novakovic. I am co-founder and CEO of Dash0. And welcome to Code RED: "Code" because we are talking about code, and "RED" stands for Requests, Errors, and Duration, the core metrics of observability. On this podcast, you will hear from leaders around our industry about what they are building, what's next in observability, and what you can do today to avoid your next outage. Today my guest is Eric Futoran. Eric is the CEO and co-founder of Embrace, a mobile observability tool built on OpenTelemetry and hyper-focused on the client side of observability. Eric previously co-founded Scopely, the largest US mobile game publisher, which later sold for $5 billion. I'm glad to have you here, Eric. Welcome to Code RED.

Eric Futoran: No, thanks for having me. I really appreciate it.

[00:00:53] Chapter 2: Code RED Moments and Observability Challenges

Mirko Novakovic: Great. And we always start with our first question, which is: what was your biggest Code RED moment?

Eric Futoran: I was thinking about this one. There are quite a few, and founders don't necessarily like to talk about them; I feel like founders always like to talk about the rosy moments. So I love the question. Early in Embrace's lifespan we worked with a large fantasy sports app. And if you think about what every company wants, especially observability companies, but data companies in general, their ideal customers and their ideal growth path is kind of a nice line. And on a day-to-day basis, when you take in data, you don't want spikes. It's just not what you build for. But if you think about fantasy sports, and I give huge kudos to my co-founder Frederik, fantasy sports' biggest days are Sundays, and they spike heavily. The biggest day is actually Thanksgiving, which is the last day you'd ever want to spike. But think about the spikes: everybody logs in, like at halftime, right before the game, and then they log off. So with these huge spikes, we'd have like 30 to 100x traffic, and only of certain types, usually a lot of ads. So a lot of network API traffic, where as an observability company we're collecting everything: crashes, errors, logs, the traditional OpenTelemetry metrics, but also nontraditional enrichment-type data, metrics is the wrong word, like network data, which is usually in both a span and a metric form.

Eric Futoran: And then we put it all in this one big session payload. The core of Embrace is every session for every user, because in mobile everything is unique and very cardinal. So ultimately, if you really want to solve a problem, you want to look up the user and just see what happened, and we'll get to that later. But that's crazy for these crazy network traffic spikes. And then our back end, this was 2019-ish, we were using ClickHouse. We still do. But if you think about it, that's bleeding edge ClickHouse, even before the big ClickHouse company really existed. And so the Code RED was basically the first Thanksgiving. Everybody was on deck. Once we hit that network spike, we really weren't ready for it as much as we'd like to be. We were ready for just adding a new customer, even at volume, sharding your database, like, preparing for it. But when you spike and you get these payloads, like, they blitzed us. And so we had large, large session payloads coming in.

Eric Futoran: The latency was hitting everything. Our crash pages were slower, our user lookup was slower, because everything was loading from the database slower. So we could either do smaller chunks, which load faster but are really complex, or bigger chunks, which is what most companies do, but then you get latency; it's more efficient and easier to support, blah, blah, blah. It sucks. And so, to finish it off, the way we solved it is not necessarily more databases, which is where most people kind of land and which obviously is inefficient or costly, kind of against the OpenTelemetry stack. We just did a bunch of stuff: obviously we decoupled the session payloads, so sessions, every user has multiple sessions, breaking those up into their constituent parts. We decoupled a lot of the lures. We did a bunch more sharding, like a bunch of crap. It never really solved the problem, because you end up with, like, six out of seven days a week, right, you have normal loads, and then this one day you don't. So it's just not an ideal customer, even though they're still with us today. But it was fun. Air quotes. Fun.

[00:04:30] Chapter 3: The Importance of Mobile Observability

Mirko Novakovic: Yeah, absolutely. That's a great story. And yeah, let's talk about mobile observability overall. I was thinking about it: I first used it when there was a company called Crittercism, which then turned into Apteligent. I was actually one of their early resellers in Europe. They got acquired by VMware, and that's probably, I don't know, let me, let me...

Eric Futoran: That goes a long time...

Mirko Novakovic: ...ago, ten, twelve years ago. And that was kind of the first time I saw mobile observability. But when I was preparing for this call, I was thinking: everybody's talking about mobile first, right? Most of the apps these days, if I use my bank or whatever, I mostly use them on my mobile phone with a dedicated app. But for most observability players, including Dash0, but also most of the others, mobile observability is barely a core use case, even though it is so important. Do you know why that is? I don't really know, but it is really like that, right? Probably every vendor has something there, but it's more or less a side project. So you, Embrace, are really focused on that topic, right? To help developers of mobile apps figure out their problems. There are a couple...

Eric Futoran: ...questions in there. If you think about that Apteligent, Crittercism, Crashlytics time period, that was super early.

Mirko Novakovic: That was pretty early, right. And Crashlytics is a free Google tool now, right?

Eric Futoran: Crashlytics, or rather Firebase, acquired Crashlytics from Twitter, which had acquired Crashlytics, which is crazy. And it's doing okay as part of Firebase. You get what you get for free, but it's a good product for crashes. That was just too early. If you think about when most game companies and most big mobile companies, the original first wave so to say, it was like 2012-13, all you cared about was just: can I get an app out there? Because for every person that had a problem, like a crash, there were ten other people buying phones and downloading apps. So you didn't care about optimization. And I don't even think of them as observability companies. I mean, the word wasn't even coined until like 2015-16, I feel like. So okay, second question: why do the other players not focus on it or have a mobile play? There are legacy ones, New Relic, Dynatrace, and Datadog primarily. I'd say their RUM products are built for the back end. So if you look at them and you go in, their primary personas are DevOps and back end engineers. If you go in, you see the traditional geo map of where your problems are and individual clients.

Eric Futoran: That engineer doesn't do anything with where a problem exists geographically. They care about the problem, how acute it is, maybe which set of users it affects, and how to solve it. Then there's a bunch of error logging products, the Sentrys, the Bugsnags, the Rollbars, but it's error logging. So their crash is basically a form of error, but they're known issues. It's not really RUM. RUM in the observability world is collecting every session, all the data, not sampling it, not removing the cardinality to make aggregates, which is what back end tools do, so you can actually measure and optimize effectively and then prioritize based on impact. And I don't think a lot of tools do that. So why aren't they focused on it? To be honest, I think in today's world, and it's part of the reason Dash0 is there, there are kind of two opportunities in the stack. The back end is totally getting disrupted, and OpenTelemetry is definitely helping that, right? It's standardizing the tool sets in some ways so that you can have new players like yourselves and Grafana and Chronosphere that can go disrupt the older legacy players that say they're doing OpenTelemetry, but they're really not.

Eric Futoran: They just do it as marketing FUD because they have to, otherwise their brand diminishes. I'm happy to say that because it's the truth, and there are closed ecosystems, which we'll probably talk about. The other opportunity is the mobile client side. In OpenTelemetry, the goal is to standardize the data, right? So all the teams are talking the same language. And so for us, that play is: how do we standardize with these other players, us and Grafana, us and Chronosphere, us and whatever, Snowflake even, and provide that same data layer and standardization. So why aren't they focused on it? I think you can't focus on everything. And yeah, you need it, because you're right, the biggest companies today are mobile first. I mean, we have every hotel chain: Marriott, Hilton, Hyatt. While you don't think of them that way, how do you interact with them? It's on your phone, or you go to a kiosk when you're inside, which is an Android device, or you go to a terminal by talking to somebody at the front desk, often on an Android device. It's all mobile now. So you need a mobile story, and obviously that's our bet.

[00:09:27] Chapter 4: Mobile App Data Collection and Analysis

Mirko Novakovic: Let's talk about how it works. I'm not a professional here, but as far as I know, you have kind of an SDK, right? There's not really something like an agent, because you can't really deploy an agent on every phone. So the developer of the app uses your SDK during build or compile time, and it's integrated into the app. So when you put the app into the App Store, the Embrace SDK is basically inside, the data is collected as part of the deployed app, and then the data is sent to your back end. Correct?

Eric Futoran: Yeah, I love that you said agent. For the sophisticated folks in the audience: if you see somebody use an agent methodology for mobile, run. The idea of collecting the data is correct, but just because Android is Java doesn't mean you put a Java-based agent there. You'll blow up the user's experience and not collect the right data. Back end Java is very different from client side Java, and really Kotlin at this point. But yes. We are one of the primary approvers and maintainers on both the Swift and the Android side, but we forked both of them, because in mobile and client side in general it's not good enough just to do logs, metrics, traces. There's tons of other data and other areas where you have to enrich that data, which makes it nuanced. So you use our SDK: Android is Kotlin or Java, iOS is generally Swift, and there's React Native, there's Unity, there's Flutter, the primary, I'd call them native, languages for mobile. You integrate the SDK at compile time. I'd say 90% of the data is automatically collected, because it's really hard to instrument. Another reason to go with a mobile-first company: instrumentation in mobile really sucks. It sucks on web too. There's just not a lot of standardization of components and ways of collecting the data. Even thinking about startup, every app has a different idea of where startup ends. And then it compiles, and then you deploy this into production, which is where we're best.
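To make that concrete, here is a minimal sketch of what the compile-time integration Eric describes usually looks like on Android: one call when the app process starts, with everything else collected automatically. The SDK facade, parameters, and endpoint below are hypothetical placeholders, not the actual Embrace API.

```kotlin
import android.app.Application

// Hypothetical stand-in for an OTel-based mobile SDK; the real Embrace API differs.
object MobileObservabilitySdk {
    fun start(app: Application, appId: String, endpoint: String) {
        // A real SDK would install crash handlers, network interceptors,
        // lifecycle/startup listeners, and an OTLP exporter here.
        println("SDK started in ${app.packageName} for $appId, exporting to $endpoint")
    }
}

class MyApp : Application() {
    override fun onCreate() {
        super.onCreate()
        // One call at process start; the "90% automatic" collection happens behind it.
        MobileObservabilitySdk.start(
            app = this,
            appId = "YOUR_APP_ID",                    // issued by the vendor
            endpoint = "https://collect.example.com"  // illustrative endpoint
        )
    }
}
```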

Mirko Novakovic: And then you get different types of data from the app, right? We started the conversation with Crashlytics and Crittercism, which at the beginning were all focused on crash reports. I mean, we all know it: you download an app, you open it, it crashes, and you want to know why, and on which iOS or Android device, on which cell phone, and so on. That's kind of the basic thing a developer probably wants from the app, to know when it's crashing, and you provide that. That's the...

[00:12:01] Chapter 5: Overcoming Offline and Network Challenges in Mobile Apps

Eric Futoran: ...step one. I'd say in the evolution of mobile, crashes come first, because if it crashes, it's an obvious thing that you can measure and solve. To be honest, a good app's crash rate is like 99.99, or 98, or 95 percent crash free. So it's a problem you should track, and we obviously do it, and we do it a little differently. We do it at the native layer, but if you're using React Native, Flutter, or Unity, we do both, because they're layered and a crash can happen in any layer and then throw something in the native layer, and it's good to have all the information. The evolution then became: okay, I've solved the majority of my crashes, how do I solve the hard ones, as opposed to the ones where the line of code is kind of obvious? Then you need the information that led up to it, which is part of the origin of Embrace. Maybe it was a network call, or an error earlier in the session that was handled but still created a state that caused a crash later. It's hard to couple those things. In error logging tools, those tend to be different errors that aren't fully coupled, so you don't really get to see it, and in traditional RUM solutions they're totally separated. The next step is: okay, what are the other types of errors that are actually more impactful for my users, that I should be tracking or building DevOps SLOs around? For a high-scale scrolling app like a Pinterest, it could be stutters, where people can't scroll as quickly, because they know it impacts revenue. Or for games it could be ANRs, which are basically the equivalent of a freeze. Really hard to solve, and the Google Play Store only detects when they end, not when they start, and when they start is where the line of code is. So it's: how do you connect the dots to things beyond crashes? Long story short.

Mirko Novakovic: There are basically two parts, right? One is the app running on the phone, and then there's communication from the phone to a back end. I mean, there are offline apps, but there are also apps that do not work offline, so they have a connection and send data back and forth. Can you work with both setups? Can you support apps that are offline, and basically, I call it, profile them on the phone, or only the ones that do have a connection to the back end?

Eric Futoran: We do both. And the majority of apps now don't have to be connected. Almost all of them have some sort of caching or some way to at least work temporarily offline. It's a really hard problem. If you think about it, the assumptions of the back end are different in that case. And where it really shows up is when we send the data to a back end partner: the data can be really late. You could have a session, quote unquote, that's offline, or maybe the person leaves and you don't get a chance to send the data, even though the phone is online, for a day or several days. Most tools drop those on the floor, because their back end processes expect everything to be synchronous and timely. And we have to figure out how to reorder it so that we can actually send the data effectively, or even display it correctly in our dashboard. For the technical people out there, think about the problem: it's even how you store the data, how you think about it, how you shard it. It becomes really, really difficult. It's actually core IP for us in a lot of ways.
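A toy sketch of that "late data" problem, assuming nothing about Embrace's actual implementation: telemetry has to be persisted on the device and flushed whenever connectivity returns, with original timestamps preserved so the backend can reorder sessions that arrive days late.

```kotlin
import java.io.File
import java.util.concurrent.ConcurrentLinkedQueue

// Illustrative only: a minimal offline buffer for session payloads.
data class SessionPayload(val sessionId: String, val endedAtEpochMs: Long, val body: ByteArray)

class OfflineBuffer(private val dir: File) {
    private val pending = ConcurrentLinkedQueue<SessionPayload>()

    fun record(payload: SessionPayload) {
        // Persist first: the process may die long before the device is online again.
        File(dir, "${payload.sessionId}.bin").writeBytes(payload.body)
        pending.add(payload)
    }

    fun flush(isOnline: () -> Boolean, send: (SessionPayload) -> Boolean) {
        if (!isOnline()) return
        while (true) {
            val next = pending.peek() ?: return
            // The original end timestamp travels with the payload so the backend
            // can slot a days-late session into the right place.
            if (send(next)) {
                File(dir, "${next.sessionId}.bin").delete()
                pending.poll()
            } else {
                return // retry on the next connectivity change
            }
        }
    }
}
```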

[00:15:24] Chapter 6: Embrace's Use of OpenTelemetry and Industry Trends

Mirko Novakovic: Yeah, it's interesting. We have similar issues on the back end when we have long-running or asynchronous traces, right? You have a trace, but you actually don't know if it's finished, because a span could come very late, minutes late. I mean, in our terms, sometimes ten seconds is late, because everything is real time. But I can imagine that with mobile phones they could be offline on a long flight, or the battery is dead, and then they come back three days later and send the data, and then you still have to connect it, right?

Eric Futoran: Totally. And I think there's an assumption on the back end that your span or trace will finish, because it generally will. But in mobile it won't. I'm totally with you. What if somebody just leaves mid-startup, and you have a span around your startup, then what? Is that bad? Is that good? Is that a natural thing? Did it crash? Did a network call or a third party vendor freeze the startup? Who knows? But we can't make that assumption. It's actually one of the things we analyze, which is different: we actually let you filter by spans that haven't finished, as opposed to waiting for them to finish before sending them to our back end.
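One way to picture "spans that never finish", as a sketch only (the data model and attribute name below are made up for illustration and are not Embrace's or OpenTelemetry's): snapshot open spans into the session payload instead of waiting for an end timestamp.

```kotlin
// Illustrative model: a span snapshot whose end time may legitimately be null.
data class SpanSnapshot(
    val name: String,
    val startEpochMs: Long,
    val endEpochMs: Long?,                 // null = the span never finished
    val attributes: Map<String, String>
)

class SessionRecorder {
    private val open = mutableMapOf<String, Long>()
    private val finished = mutableListOf<SpanSnapshot>()

    fun startSpan(name: String) { open[name] = System.currentTimeMillis() }

    fun endSpan(name: String, attrs: Map<String, String> = emptyMap()) {
        val start = open.remove(name) ?: return
        finished += SpanSnapshot(name, start, System.currentTimeMillis(), attrs)
    }

    // Called when the session is packaged (app backgrounded, killed, crashed):
    // open spans are emitted as "unfinished" rather than dropped, so a backend
    // could filter on, say, sessions where the startup span never ended.
    fun snapshot(): List<SpanSnapshot> =
        finished + open.map { (name, start) ->
            SpanSnapshot(name, start, null, mapOf("unfinished" to "true"))
        }
}
```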

Mirko Novakovic: And the RUM part in OpenTelemetry is still very early, and it's more focused on web, right? But you still chose to use OpenTelemetry as your data format, so you're basically using the logs, events, and traces to package and send the data. Why did you choose OpenTelemetry if it was not really set up for the mobile world, I would say?

Eric Futoran: I wouldn't say even the web is quite there for RUM, right? Not to bash the folks there, but it's definitely nascent from a maturity perspective. And even Android doesn't necessarily fully support all three signal types. And then there are events, and the concept of sessions, and other types that OpenTelemetry is talking about. Why did we choose OpenTelemetry? I think the primary value for us is connecting the teams. To make it real: we have a large grocery chain in the UK that uses us, and the DevOps team is getting pressure based on user revenue, user impact, things that are client side measurements, and they're being asked to measure them, which generally turns into SLOs and KPIs. How do you measure them without a telemetry source that's correlated to your back end? And then how do you create that connectivity? Because it's one thing to measure; it's another thing to optimize and solve it. That means you have to talk to the client side team and have a workflow. And to be honest, most client side teams are pretty isolated from the DevOps teams. The DevOps team obviously helps with certain things like builds and regressions, so they really should be talking, like they do with other back end and edge teams. To be highly optimized, that's the flow you're looking for. I'm also super keen on open. I kind of mentioned it before, but most vendors in the space take your data and either give it back to you in a dashboard or in a very aggregated form.

Eric Futoran: That is not how it's going to work going forward. If you want to run your own analysis, if you want to run AI workflows, people are actually using us for risk management: they want user-indexed data just so they can run LLMs over it to see if there's a risk, on Snowflake or Databricks. You should have access to your own data. There are too many good use cases. OpenTelemetry gives us the baseline to be able to solve those for folks, and an easy way to at least get our SDKs adopted. You don't need to use us to use our SDKs; they're open source. Send the data to yourselves, send the data to wherever you want. What we provide is the extra value: the enrichment, the extra analysis to measure and optimize, the ability to send API-type data, metrics, logs, traces, to your back end in a format that helps you do SLOs. OpenTelemetry fits really nicely into that. And to be honest, it's also just nice to be part of that community. It's hard to be open source, but it's actually way more fun than not being open source, truly open source, not just dabbling. Putting yourself out there and letting people just rip into your SDKs is pretty insightful and kind of fun.

Mirko Novakovic: Yeah, I totally agree. And I like the idea of using the data in your data warehouse, doing other stuff with it, like in Snowflake. Do you allow exporting your data, or have features to export it and import it into a data warehouse or something?

Eric Futoran: 100%. You can do it in two ways. You can use our SDK, which is open source and free; it's just a fork of OpenTelemetry, and to be honest, we're trying to get the SIG to adopt it, but it's just a slower process. You can use it and send the data straight from our SDK to whatever destination you want. It's more raw, and it definitely has some complexity, because there'll be sessions and things like that which you're probably not built to handle, but it's raw, so fine, use it. Send it to Databricks or Snowflake or Grafana or your ELK stack or whatever. Or if you use us, we'll take the data and enrich it, because we'll collect a ton more types of stuff and we know what to do with it. And then we have a series of APIs that let you take whatever you want in whatever format. If you want all the raw data, totally fine. You're probably not sending it to Datadog, because your bill will explode, but we also give you aggregated APIs, like a metrics API, and you can decide what you want to send. For those teams that want to use something like Datadog RUM, but it's too expensive or too cardinal: use us, and you still get the tools you need in Datadog on the back end anyway. Your client side teams are just happier because they have a tool that's built for them.
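Since the SDK side is "just" OpenTelemetry, routing the data yourself looks like ordinary OTel SDK configuration. A minimal sketch using the standard OpenTelemetry Java/Kotlin SDK, not the Embrace SDK; the endpoint and service name are placeholders:

```kotlin
import io.opentelemetry.exporter.otlp.http.trace.OtlpHttpSpanExporter
import io.opentelemetry.sdk.OpenTelemetrySdk
import io.opentelemetry.sdk.resources.Resource
import io.opentelemetry.sdk.trace.SdkTracerProvider
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor

// Point client-side spans at any OTLP-compatible destination you control:
// your own collector, Grafana, an ELK pipeline, a warehouse ingestion job, etc.
fun buildOtel(endpoint: String): OpenTelemetrySdk {
    val exporter = OtlpHttpSpanExporter.builder()
        .setEndpoint("$endpoint/v1/traces")           // e.g. https://otel.example.com
        .build()

    val tracerProvider = SdkTracerProvider.builder()
        .setResource(
            Resource.getDefault().toBuilder()
                .put("service.name", "my-mobile-app") // illustrative value
                .build()
        )
        .addSpanProcessor(BatchSpanProcessor.builder(exporter).build())
        .build()

    return OpenTelemetrySdk.builder()
        .setTracerProvider(tracerProvider)
        .build()
}
```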

[00:21:11] Chapter 7: Business Metrics and Observability 3.0

Mirko Novakovic: The data in OpenTelemetry is tagged, right? And they use semantic conventions for things. But as far as I know, most of that is for the back end and back end technology. Did you have to basically build your own semantic convention for mobile, or can you reuse something? How do you tag all the data from a mobile phone? I can see there are a lot of things that are necessary to do right.

Eric Futoran: Tagging is a very open concept, so you can kind of tag whatever you want, for lack of a better way to put it. And we're starting to reach my limit, because I'm not in the code anymore. I'm not allowed to be; they'll yell at me. But from an index standpoint, we generally do it on user or on session, because that's how you ultimately want to use us. If you have an optimization, you may want to look up the user. I call it the CEO use case: everybody's CEO uses the app, they always come up with random stuff, and you just want to look them up and see what happened, in any session, not just the ones with errors, which is what everybody else does, even the good sessions, so you can compare. Or maybe a good session was actually bad. So you index by user or by session so that you can combine the data and present it more visually. We call it the timeline. A lot of companies do it that way now, where they let you visualize everything in a temporal sense. I call it horizontal versus vertical: most tools look at the vertical events, like a network call, which has a time span, but most companies assume it'll just finish, so they think of it as an event. But on the overall tagging, look, I'd have to get back to you.
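As a rough illustration of the "index by user and session" idea using plain OpenTelemetry, not Embrace's actual schema; the attribute keys here are illustrative, not an agreed mobile semantic convention:

```kotlin
import io.opentelemetry.api.OpenTelemetry
import io.opentelemetry.api.trace.Span

// Attach user and session identifiers to a span so individual sessions can be
// looked up later (the "CEO use case"), not just viewed as aggregates.
fun recordCheckout(otel: OpenTelemetry, userId: String, sessionId: String) {
    val tracer = otel.getTracer("demo.mobile")
    val span: Span = tracer.spanBuilder("checkout.submit")
        .setAttribute("user.id", userId)        // illustrative key: look up one user's sessions
        .setAttribute("session.id", sessionId)  // illustrative key: group everything in this session
        .startSpan()
    try {
        // ... the actual work being measured ...
    } finally {
        span.end()
    }
}
```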

Mirko Novakovic: No worries, no worries. By the way, I'm also not allowed to touch our code. It's a dangerous thing if you're the CEO. I used to be a coder, but...

Eric Futoran: Exactly.

Mirko Novakovic: I actually haven't coded since 2009.

Eric Futoran: I dabbled when I was at Scopely once in a while, like in Unity, and built some games by myself. But yeah, not really since 2012.

Mirko Novakovic: I know, with ChatGPT we can go back now, right?

Eric Futoran: I haven't done it in a while. I'll push a PR once in a while just to freak people out, and it'll be like a website change or something. But I haven't really done it. Well, maybe it's time.

Mirko Novakovic: Yeah, absolutely. But I was thinking about RUM, and the way I see RUM. I stumbled upon a blog post called Observability 3.0 by Hazel Weakly. She has written a really nice post about observability 1.0, 2.0, and now 3.0. And 3.0 was basically, it's not the only thing, but it was basically saying: okay, let's connect the business with the development teams in observability 3.0, right? I always wanted to do that at Instana. And the way I always thought about it is that you can't really do it without RUM, because, not for every business, but for most businesses, think about the store that you mentioned, or a game, or whatever, your business is around the user, right? And the user is on your client side and does most of the things there; a session is how many articles you viewed in the shop, or whatever. So this is really your business, and I think you can't really create observability 3.0 without the RUM part. So how do you see this business side? In observability, you can say that none of the business users will probably ever log into Grafana or Datadog and look at those dashboards. They're very technical, and that's fine; they solve the problems of an SRE, a platform engineer, and a developer. But I really like the idea of having business metrics in your observability tool, because otherwise you just copy similar data into a warehouse and then create business dashboards there. So I could imagine that for you this is a really nice use case, having more business users on your tool. Do you support that already, or do you have plans to do so?

Eric Futoran: We do have business users in our product, as opposed to a back end product, where maybe you'll see it if you use, I mean, traditional Grafana, not necessarily Grafana Cloud, like a widget where you build a KPI and send it off and have a business dashboard. But that's definitely different from observability. We do see product managers and UX folks using us. It's not necessarily a core use case; our love is engineers and DevOps, so we don't necessarily want to veer, but we've thought about it. Getting really academic, if you think about the core of metrics, there's definitely a Venn overlap with product analytics, the Amplitudes of the world. I mean, they're using metrics. It's just that the use cases for how you analyze the metrics are slightly different and not as timely. They're not necessarily looking at real time data, whereas a DevOps person or an engineer really wants it very, very timely, while over there days and weeks matter. But I'm with you, that's our core use case: how do we convert user impact into prioritization? On the back end it used to be SLAs. I think that's table stakes; that's probably not even observability 1.0. Uptime is just assumed, and I'd say the same thing for crashes, 99.9% is just assumed. It's about what's next. And honestly, I think the teams are sitting there getting pressured to show their value. If you're a large DevOps team and cost is being crunched in a downturn, you'd better show your value. And if you can't tie the user to the business impact, which, you're right, ultimately is the user, you're not really making a case for yourself.

Mirko Novakovic: It's kind of interesting, right? You named Amplitude, there's Mixpanel, we are using PostHog. If you look at those tools, what they essentially do is send events: you say, hey, somebody clicked here, and then you apply tags to the events. So the data is very similar to what we do. And the data is essentially created on the client side, either on the web or in your app. So I would say, just from 1,000 feet, if you look at it, it is very similar. Yeah.

Eric Futoran: And it contributes, right, to OpenTelemetry.

Mirko Novakovic: So I can definitely see that, especially with a RUM tool like yours. Your data is even more dense, right? You have more data, you have every session, you have data about performance, you know if it's crashing, if it's slow. So combining that data with the event data about what's happening could be massively valuable for product and business, but also for the engineers. So I was definitely asking myself why there is no trend of those categories merging together.
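As a sketch of how close the two data shapes already are, a product-analytics-style event can simply ride along as an OpenTelemetry span event with business attributes; the event and attribute names below are made up for illustration:

```kotlin
import io.opentelemetry.api.common.AttributeKey
import io.opentelemetry.api.common.Attributes
import io.opentelemetry.api.trace.Span

// A "product analytics" style event attached to the current span: the same
// client-side telemetry can serve engineers (performance, errors) and
// product/business views (what the user did) without a second pipeline.
fun recordArticleViewed(span: Span, articleId: String, priceCents: Long) {
    span.addEvent(
        "article_viewed",                                     // illustrative event name
        Attributes.of(
            AttributeKey.stringKey("article.id"), articleId,  // illustrative keys
            AttributeKey.longKey("article.price_cents"), priceCents
        )
    )
}
```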

[00:28:38] Chapter 8: Mobile Observability’s Unique Challenges

Eric Futoran: I think there will be. There's been talk, but it takes time. Especially with OpenTelemetry, we're pretty early in the overall adoption cycle, so I think it will happen; the question is not if, but when, and we may need a little more time. But the product managers are pushing on us, the business users are pushing on DevOps with SLOs, and that's the trend. I'm totally with you. I think the key, for the DevOps people and SREs listening, is that it's not unique user charts and geo charts. It's really understanding the real impact. Those charts are pretty, but they're not actionable. Knowing that one specific API call actually freezes the user, that's the gold mark, as opposed to just knowing an API is erroring for hundreds of calls, which doesn't actually mean it impacted a user on the client side, and vice versa. The client side could have network calls that never hit the back end. A lot of folks on the back end don't realize it, but, like you were mentioning earlier, the assumption that the network is up and working and the throughput exists for your mobile phone is not a safe one. A lot of calls don't make it to the back end, and you're just blind to it. So shit's happening, you just may not be aware of it. It's important to have both sides of the equation.

Mirko Novakovic: And I can also imagine that apps are making a lot of calls to back ends which are not your back end, right? You call different services, like PayPal and others, and probably also App Store services, whatever, that will never touch your back end. So you have no visibility...

Eric Futoran: Yeah, you have no visibility. Like the fantasy sports app: a lot of those network calls they were making, and that we were measuring, were ad tech calls, third party vendors, ad vendors. I came from the ad space once upon a time, so I say it lovingly: they're not policed, so they don't always have the best user experience in mind. A lot of the time they actually want their network calls to fail; it's cheaper for them, because if it doesn't hit their back end, they don't pay for the call. But that's not great for the app, and you have no measurement of it. And often they're in the startup, because they want to cache, and they don't care about everything else you're trying to do to get that app loaded. So they'll bloat the startup just to make sure that when the ad is ready to be called, they make money. So you've got to police it. I'm with you 100%: there are a lot of those edge cases that are kind of obvious when you talk about them, but in reality they're really hard.

Mirko Novakovic: Yeah. That's probably, to come back to my first question, why the vendors are not really focused on it: because it's a very unique space, right? You have to have the domain knowledge and the experience of how to do it. You can have it as a side project, but I think if you want to do it really well, you have to focus on it. It's its own category, basically.

Eric Futoran: It's its own persona, too. Client side engineers are very different.

[00:31:41] Chapter 9: The Impact of AI and LLMs on Mobile Observability

Mirko Novakovic: Yeah. And no podcast without my AI questions, because AI is everywhere. For observability, I always have two parts. Let's start with the first part. We see that almost every one of our customers is now using an LLM somewhere, so it becomes part of your normal application stack and you have to observe it, right? You have to monitor the LLMs; it's kind of becoming the standard, and OpenTelemetry already has some libraries doing it. How is that on the mobile phone? At least at Apple, I know that they run them on the phone, right? So do you have functionality to monitor LLMs, to figure out if the prompts were correct or the answers were good? Do you have something there, or do you see demand for LLM monitoring on mobile?

Eric Futoran: So, apps running LLMs on the phone, and yes, Apple has released technology to help with that, that's one piece. Whether we ourselves are using AI on the phone is the second. The first one is so early. There have been a lot of startups over time that have tried to help companies run AI on the device; from a cost and efficiency standpoint it's a pretty obvious use case, just really hard. It's an edge computing use case: if you can push all your AI to the users, where the users' expectations are for things to be instantaneous, it's going to happen, 100%. Can you use the resources on the phone effectively without blowing up the user experience or freezing the app? Not yet. Very simple AI, but most of it's still done on the back end and pushed to the phone in terms of results, like your recommendations, things like that. I do hear of a bunch of startups trying to start using Apple's capabilities, and I'm sure Google will come out with their own, but I haven't heard anything specific yet. So we're watching it. But for us to invest there, not knowing the overall market size, it's really hard to go after it yet.

Mirko Novakovic: But it's good to understand that it's not there yet. You obviously have a good view across apps, so it's not yet widely used on the phone. As far as I understood you, it's more that the phone is calling a service on the back end, which then sends the answer back.

Eric Futoran: Right, exactly. It will happen, though. It has to preserve the user experience, and it can even differentiate: if I'm a social network, they may be doing it right now, because their measurement is how quickly things appear and how well they engage users. That's a very small use case, but the early adopters, for all technologies including mobile back in the day, are always social and games; they're always the earliest adopters. So in social it's probably happening. There aren't a ton of social apps, and I don't have enough data; we're in a couple, but I haven't seen anything specific. We're in a bunch of games, but games have their own problems, so I don't think they're necessarily investing there quite yet. That's a whole other conversation for another day, but monetization in games for the last couple of years has definitely been tough for game companies, and I feel for them.

Mirko Novakovic: And then, in terms of AI changing observability, and in your case RUM: are you using it in the back end yourselves? What use cases do you see? Do you see it disrupting the space, or is it improving things? How do you see gen AI evolving in observability, especially for RUM?

Eric Futoran: Yeah, for RUM I think it'll enhance it, and that's where we've made investments. I mean, there's AI for coding and productivity, but that's not observability, though obviously we're invested there like every tech company. In observability land, if the ultimate measure of success is: can you measure, to overuse the term, can you optimize, and what's the user impact, kind of the three pieces, then where does AI help? It helps with user impact: looking for specific sessions, looking across them and trying to find patterns, knowing whether the user impact, like when they left an app because of a freeze, is a one-time thing or something across a lot of different users, and what all the reasons are that it could happen. That's a total AI use case, hard, but we have the data for it, which I think is the reason a lot of vendors haven't gone there: they either don't have the data sets, or they're not allowed to co-mingle data across their customers. That's where we're really starting to use it; machine learning and AI are basically the same point at this stage. For folks that aren't familiar, on Android there's a concept of an ANR, which is basically an app freeze: when you're using an Android app, within five to ten seconds, Google changes the timing once in a while, a prompt will pop up and ask you to terminate...

Eric Futoran: ...the app, meaning the app is frozen, or a thread got frozen. That's a long time to wait. And in the middle, we're collecting code samples from as soon as we can detect it all the way to that prompt. That's a total AI-ish case, because you're looking for the commonalities in the threads that may have caused that ANR, not just in that session, but across all the user sessions that exhibit the same pattern. It's a lot of data. Right now we have huge visual flame graphs, because it's impossible for us to know the exact cause, since we don't write the app's code, and then we use these technologies to try to bubble up maybe the root cause analysis. That's a good example of it.
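For a feel for "detecting the freeze when it starts, not when Google's dialog finally appears", here is a deliberately simplified main-thread watchdog. It only illustrates the idea Eric describes; a production SDK samples far more carefully to keep overhead down, and this is not Embrace's implementation.

```kotlin
import android.os.Handler
import android.os.Looper
import java.util.concurrent.atomic.AtomicLong

// Toy ANR-style detector: post a heartbeat to the main thread and, if it stops
// responding, capture the main thread's stack at the start of the freeze,
// long before the system's 5-10 second ANR dialog.
class AnrWatchdog(private val thresholdMs: Long = 1_000) : Thread("anr-watchdog") {
    private val mainHandler = Handler(Looper.getMainLooper())
    private val lastBeat = AtomicLong(System.currentTimeMillis())

    override fun run() {
        while (!isInterrupted) {
            mainHandler.post { lastBeat.set(System.currentTimeMillis()) }
            Thread.sleep(thresholdMs)
            val stalledFor = System.currentTimeMillis() - lastBeat.get()
            if (stalledFor > thresholdMs) {
                val mainStack = Looper.getMainLooper().thread.stackTrace
                    .joinToString("\n") { "  at $it" }
                // A real SDK would keep sampling until the freeze ends and attach
                // the samples to the session payload instead of printing them.
                println("Possible ANR, main thread stalled ${stalledFor}ms:\n$mainStack")
            }
        }
    }
}
```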

Mirko Novakovic: That makes total sense. That's exactly how we see it at the moment. So I don't see AI disrupting the space right now; you never know what happens when they hit AGI or whatever we call it, right? But we also use it in cases like the one you just explained: when you have a lot of data, like these thread samples, these snapshots you take during a freeze, you use the LLM to understand what's happening, figure out commonalities, and bubble up the problem. That makes sense, and it really makes the life of a developer easier. The one area where...

Eric Futoran: ...I'm seeing a lot of startups is anomaly detection. It's kind of an obvious use case when you think about it, right? You have tons of data sets and you don't know which one to watch. And it doesn't make sense for a user, which is kind of what happens today, to set the thresholds or some sort of velocity or acceleration equation on a set of data. You should be looking at all the data and trying to find the anomalies. And then the hard part, where AI is helpful, is false positives: you never want to get an alert for something that's not really a problem.

Mirko Novakovic: No, absolutely.

Eric Futoran: It's always been a problem. The learning algorithms around it, even at the large vendors, suck. So I think that's an obvious use case. I've seen a lot of startups there; to be controversial, are they independent companies? Probably not. But it's definitely a problem to be solved, and they'll just get gobbled up by companies like ours.
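To make the threshold-versus-anomaly distinction concrete, a toy rolling z-score detector is sketched below; real systems have to handle seasonality, sparse series, and the false-positive suppression Eric mentions, which is where the hard work lives.

```kotlin
import kotlin.math.abs
import kotlin.math.sqrt

// Flag points that sit more than `z` standard deviations away from the rolling
// window, instead of asking a user to hand-tune a fixed threshold per metric.
fun anomalies(series: List<Double>, window: Int = 30, z: Double = 3.0): List<Int> {
    val flagged = mutableListOf<Int>()
    for (i in window until series.size) {
        val recent = series.subList(i - window, i)
        val mean = recent.average()
        val sd = sqrt(recent.sumOf { (it - mean) * (it - mean) } / window)
        if (sd > 0 && abs(series[i] - mean) / sd > z) flagged += i
    }
    return flagged
}
```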

[00:38:52] Chapter 10: Future of Mobile Observability and Final Thoughts

Mirko Novakovic: It was fun talking to you. It is really a space I'm super interested in. I will be watching Embrace, and I'm happy that you are there with OpenTelemetry. And I see that this is a growing space; I mean, every application will probably be a mobile application, right? So that's cool.

Eric Futoran: Thanks for having me. I love your background, especially with Dash0 and Instana, and your questions were really awesome, so I appreciate it.

Mirko Novakovic: Thanks, Eric, and talk to you soon. Thanks for listening. I'm always sharing new insights and knowledge about observability on LinkedIn. You can follow me there for more. The podcast is produced by Dash0. We make observability easy for every developer.
