host

Mirko Novakovic

guest

Juraj Masar

Episode 40 | 36 mins | 3/19/2026

#40 - Breaking the Observability Model: Pricing, AI SRE, and a Developer-First Mindset with Juraj Masar

About this Episode

Better Stack co-founder and CEO Juraj Masar joins Dash0’s Mirko Novakovic to challenge the fundamentals of modern observability, from cloud lock-in and pricing models to how platforms should be built in the age of AI and how we market them. They discuss why observability costs are fundamentally broken, how Better Stack combines cloud and ‘bare metal,’ and why small teams can outperform large engineering orgs. The conversation also explores eBPF as the new default for instrumentation, the shift toward AI SRE, and how Better Stack has paired a developer-first product with unconventional marketing, from generous free tiers to SEO-driven status pages to a massive YouTube presence.

Transcription

[00:00:00] Chapter 1: Opening and Guest Introduction

Mirko Novakovic: Hello everybody. My name is Mirko Novakovic. I am co-founder and CEO of Dash0, and welcome to Code RED: Code because we are talking about code, and RED stands for Requests, Errors, and Duration, the core metrics of observability. On this podcast, you will hear from leaders around our industry about what they are building, what's next in observability, and what you can do today to avoid your next outage. Today my guest is Juraj Masar. Juraj is the co-founder and CEO of Better Stack, an observability platform used by more than 400,000 engineers and 7,000 customers, including Brave, NordVPN, and UNICEF. He's building the company into a profitable, developer-first alternative to legacy observability vendors while operating with a deliberately small team. We will talk about that. Welcome to Code RED.

Juraj Masar: Thanks, Mirko. Thanks for having me.

[00:00:56] Chapter 2: Juraj’s “Code RED” Incident

Mirko Novakovic: Yeah. And I always start the conversation with my first question. What was your biggest Code RED moment in your career?

Juraj Masar: So I thought about it over the last year, obviously. I went through many downtimes and many incidents, but one stands out. Back in 2011, I was VP of engineering at a company called Represent, where we sold t-shirts with celebrities over their social media. And we were working with this one middle-aged male celebrity selling t-shirts to other middle-aged men. And so you can imagine how surprised we were when, after selling thousands and thousands of t-shirts, we looked at the orders and it turned out that most of the t-shirts sold for this middle-aged men campaign were women's S t-shirts. And for a moment we thought, what an interesting social, you know, sort of phenomenon happens over here. And then we realized that we did this database migration where we overrode the connection. So we were this close to shipping thousands and thousands of t-shirts of incorrect sizes, which would probably have buried the company at the time. This was 2011. It was early in our careers. We had database backups once per day. We ended up parsing logs to reconstruct the e-commerce orders from logs. And we ended up saving the company, but there was a moment where I was like, oh, okay, maybe real-time database replication is a thing, and maybe you should always, you know, triple review your table migrations before you actually run them. That was a scary moment.

Mirko Novakovic: Absolutely. And that's why also data observability is an interesting space, right?

Juraj Masar: What was a bug? What a bridge right there. You should, you should start a business.

[00:02:40] Chapter 3: Founding Better Stack and Early Product Focus

Mirko Novakovic: Maybe, maybe. So let's talk. I mean, how did you start Better Stack? How did you come up with it?

Juraj Masar: Fixing my own problem. So I wanted to get a phone call when the website went down, and back in the day, you had to plug Pingdom into PagerDuty with these terrible APIs. And we asked ourselves the question, how hard can this be? And at that time, working with a few friends, we made a bet that we were going to launch this in a week: a simple service where you enter a credit card, enter a phone number, enter a URL, and you're going to get a phone call when your website goes down. And sort of ever since, whether it was eBPF-based telemetry tracing, whether it was log management, whether it was something like status pages or AI SRE, we simply followed the market. We followed what our customers asked us to do.

Mirko Novakovic: Yeah. But so you started with this combination of synthetic checks and on call, basically, right?

Juraj Masar: You need to start somewhere, right? Yes.

Mirko Novakovic: Yeah, absolutely. And I think it's smart, right? It's a good way of acquiring customers. You're in a, I would say simpler domain than logs, metrics, traces somehow. Right. And, also easy to onboard.

[00:03:55] Chapter 4: PLG, Free Tier, and Marketing Engine

Juraj Masar: You certainly don't need to build your data warehouse in order to check websites, I can tell you that. Yet it's still a complicated problem.

Mirko Novakovic: Yes. But also from an onboarding perspective, it's like, I think Pingdom, how many customers did they have? 100,000?

Juraj Masar: Look, everyone needs something like this. Do you know what I mean? Everything. It's the best lead gen out there.

Mirko Novakovic: Yeah, absolutely. So you started with that. And I remember, I think it's way before I started Dash0, we talked, right? Because you were, I think, fundraising, and I helped the VC. So I had a conversation. I was pretty impressed, right? Because you did almost everything through, I think, internet marketing, right? So you didn't have a sales team at the time. You acquired the customers through SEM, SEO and, today, with 100,000 YouTube subscribers, also through YouTube channels. So you got the people on the website, onboarding them through PLG, and then literally getting them to put the credit card in and getting them as customers.

Juraj Masar: That's it. That's it. The biggest marketing cost or sales cost today is our free tier, our very generous free tier that gets us those hundreds and hundreds of people using us.

Mirko Novakovic: That's really good. And how do you think about pricing? I know that you have a lot of opinions around it, right? What do you think about the pricing models out there, charging for gigabytes, and the insane costs, right?

[00:05:24] Chapter 5: Cloud Economics and Vendor Lock-In

Juraj Masar: So do you want to talk about pricing for, for, for the big clouds out there? Or do you want to start talking about pricing of observability tools?

Mirko Novakovic: I think both. I saw a picture today of you in the data center. So I can, I can imagine what you are doing there, right?

Juraj Masar: It's called vertical integration, where we're everywhere in between, from AWS all the way to bare metal. Correct.

Mirko Novakovic: Yeah, absolutely. So are you using a cloud vendor today or are you already on bare metal?

Juraj Masar: It's a combination. So different people ask us for different deployments in their cloud. In 2011, when AWS was really getting popular, the big promise back then was that your total cost of ownership was going to be lower, that you can auto scale your instances, and if you're not getting a huge volume, it just scales them down and you're going to pay less. I don't think this ever materialized, and sort of Amazon persuaded everyone that hosting is a terrible business, yet they generate north of $100 billion these days. And so I have strong opinions on how teams should build on big clouds. I think you should absolutely use them. And I think you should use them for the commoditized parts of their offering. I think, speaking about AWS, you should use their EC2. That's wonderful. Super powerful. I think you should use their object storage. I think you can send emails via AWS. But the moment you start going into these very cloud-specific services, whether it's Lambda, SQS, SNS, DynamoDB, those exist to lock you into the ecosystem, making it impossible to rip them out later and migrate away.

Juraj Masar: Essentially, if you think about it, the product offering of the modern cloud is architected around locking customers in. Computers are fundamentally a commodity. How many CPUs are there in data centers? How come so many people are on AWS, never leaving? Sending a one terabyte file from AWS to the internet costs close to $100. Yet the actual cost that AWS is paying is something like $0.15. So that right there means that many customers are reluctant to have hybrid solutions. And so essentially my point on cloud is: absolutely do use it. Use it where it is the right tool, where, for instance, you can't predict your future workloads, where you need to scale fast, where you're in a high growth startup like us. That's the moment to use cloud. But be very careful when someone comes with a very bright idea: let's plug in that Redshift, let's plug in that BigQuery, let's plug in the DynamoDB. Because those are the moments where you're making the choice to stick with the cloud vendor forever. So just as long as you're able to take your workload and migrate it to GCP tomorrow, to Azure tomorrow, you're in a good place.
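To put the two egress figures quoted above side by side, here is a minimal Python sketch. Both dollar amounts are the speaker's rough numbers from the conversation, not official AWS pricing, and the function and constant names are purely illustrative.

```python
def markup_factor(list_price_usd: float, provider_cost_usd: float) -> float:
    """Return how many times the list price exceeds the provider's own cost."""
    if provider_cost_usd <= 0:
        raise ValueError("provider cost must be positive")
    return list_price_usd / provider_cost_usd


# Approximate figures as quoted in the conversation, per 1 TB sent to the internet.
EGRESS_PRICE_PER_TB_USD = 100.0  # "close to $100"
EGRESS_COST_PER_TB_USD = 0.15    # "something like $0.15"

factor = markup_factor(EGRESS_PRICE_PER_TB_USD, EGRESS_COST_PER_TB_USD)
print(f"Implied markup on egress: roughly {factor:.0f}x")
```

On those quoted numbers the implied markup is in the hundreds, which is the gap the speaker points to when explaining why hybrid setups that move data out of the cloud look unattractive.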

[00:08:13] Chapter 6: Bare Metal vs Cloud Cost Multipliers

Mirko Novakovic: Yeah, absolutely. I mean, there was a lot of discussion, I mean, open discussion by DHH, who openly said that he was migrating away, and he also compared S3 cost, right? S3 costs to hosting it himself. Right. Which was like, I don't remember, but at least a ten X difference in cost.

Juraj Masar: Oh, more than that. More than that. When you think about bare metal versus these budget providers like Hetzner, I think roughly ten X difference in cost. And when you think Hetzner versus, say, AWS, like a proper provider, big tech providers, think another ten X, depending on what exact instance you use. But oftentimes when you use serverless compute, like Lambda, for instance, you might be overpaying by 100 X compared to bare metal. And sure, it scales up and down and it can be great for some workloads. But here's another alternative thing you can do. If you're actually running a sustained workload on something like Lambda, you can take the money and overprovision your architecture by 100 X for the same amount of money. And it turns out oftentimes when you overprovision by 100 X, suddenly it's much easier to develop for such a system, because you don't need to optimize for memory usage in your system all the time because you're running on ten instances. It's easier for engineers to ship new software. So here they are. The big clouds have hundreds of billions of dollars of motivation to persuade you that you are not a startup unless you're running on AWS. You're not a startup, you're not a real company, unless you're running on Kubernetes. You're not a real startup unless, fill in the blanks. And there's a lot of people living a very happy life like that. Their business is not selling compute, so they don't need to shout about alternative ways of doing this, but they're very happy running in these very different ways. Ten years ago, AWS actually simplified things. Ten years ago, 20 years ago, you would need to go and buy the servers out there. These days, the open source tooling has progressed so much that learning how to use these proprietary cloud tools is often equally, if not more, complex than learning to use open source tools that give you the freedom of using the cloud vendor of your choice.
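The overprovisioning argument above can be sketched as simple arithmetic. The multipliers below are the rough ones quoted in the conversation; the budget and unit cost are hypothetical values chosen only to make the numbers easy to follow, and the function name is illustrative.

```python
# Rough cost multipliers quoted in the conversation, relative to bare metal (1x).
BARE_METAL = 1.0
BUDGET_PROVIDER = BARE_METAL * 10  # "roughly ten X" over bare metal
BIG_CLOUD = BUDGET_PROVIDER * 10   # "another ten X" over a budget provider
SERVERLESS = BARE_METAL * 100      # "overpaying by 100 X" versus bare metal


def capacity_units(budget_usd: float, bare_metal_unit_cost_usd: float, multiplier: float) -> float:
    """Units of capacity a budget buys when each unit costs `multiplier` times the bare-metal price."""
    return budget_usd / (bare_metal_unit_cost_usd * multiplier)


# Hypothetical inputs (not from the conversation).
budget = 10_000.0
unit_cost = 10.0

on_serverless = capacity_units(budget, unit_cost, SERVERLESS)
on_bare_metal = capacity_units(budget, unit_cost, BARE_METAL)
headroom = on_bare_metal / on_serverless

print(f"Serverless: {on_serverless:.0f} units, bare metal: {on_bare_metal:.0f} units")
print(f"Same budget buys {headroom:.0f}x the capacity on bare metal")
```

The sketch ignores everything the multipliers do not capture, such as operations effort and idle time, so it only illustrates the shape of the trade-off for a sustained workload.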

[00:10:22] Chapter 7: Observability Pricing and Billing Structure

Mirko Novakovic: No, that makes absolute sense. And I remember talking to the CTO of Zynga once, and they said, like, gaming, right? They always start on AWS because they can't predict the load, right? They don't know how much load they will get. And then once they know the load, they migrate it to on-prem or their own data center because it's so much cheaper, right? So I think that's a really good point. And talking about this, I mean, for us observability vendors, that's a big part of the cost, right? We have to process and store heavy amounts of data. So S3 and EC2 are the majority of our cost, to be honest, right? Running it. So how do you think about pricing of logs, metrics, and traces by gigabyte? And you also made huge changes, right? I remember when you came out with metrics, it was super expensive first. And then you put it down like a thousand X or something, right?

Juraj Masar: Yeah. The reason is that, how do you price? Okay, I'll get back to the story. It's a funny story. Remind me if I forget. But you asked about pricing. So, there are two different topics to talk about. One of them is sort of billing: how to structure the billing. And the second thing is, okay, what is the price point? How much would this cost? So let's talk about the structure first. What are the properties of good billing? It's understandable: customers don't need a PhD to understand what it means. And it's predictable: you send me more data, I charge you more money. You know what to expect. Now, we actually changed our billing recently one more time, and I need to do PR about it, where we started charging for metrics in gigabytes. And this is fundamentally a good example. So Grafana charges for active series. Datadog charges for custom metrics. We were charging for active data points. SigNoz charges per million samples, and Dash0 charges for metric data points, something like that. The point is, everyone charges for something else. And then I'm looking at my Datadog invoice and I'm like, how much is this going to cost on Better Stack? And I have no clue. And that's just stupid.

[00:12:24] Chapter 8: Cost Basis vs Vendor Pricing Disparity

Juraj Masar: That's just stupid, right? So that's not understandable. That's why we simply simplified everything: hey, let's talk about gigabytes. Everyone can imagine one gigabyte of data. And then there's the price point. So let's talk about price. You mentioned S3. So storing one gigabyte on S3 costs roughly $0.02. And if you're smarter and go to a compatible object storage that doesn't charge for egress, like Cloudflare R2, that's 1.5 cents. Okay, let's talk about even how much it costs to buy a super fast NVMe SSD. So that costs roughly $0.10 to $0.20 per gigabyte, buying one drive. And now if I look at the Datadog pricing page right now, I see that if I store one gigabyte of logs for 30 days with on-demand billing, I'm paying almost $4: $3.85. Okay. So I can buy one super fast drive for $0.10 to $0.20 per gigabyte, and I own it. Or I can rent the space on cloud for about two cents per gigabyte, yet I'm paying three, four dollars for the same gigabyte for one month. I'm simplifying, okay? I'm not talking about redundancy and RAID and compression and high availability, but the point stands: there's a huge disconnect.

Juraj Masar: And so the point is that no one ever got fired for buying Datadog. And right now, there's an entire industry that was created around optimizing cloud cost and observability vendor cost. And I think this is kind of insane, that we're doing financial engineering on how to optimize our invoices where we should be doing real engineering on how to work with a lot of data efficiently. And so internally, what we've done is that we built our own data warehouse with this in mind, that we will always have to be as data efficient as possible while offering super fast queries at massive scale. And at the same time, we went really far with vertical integration, meaning combining the best of cloud with the best of bare metal. And so today, for that one gigabyte, we charge $0.15 to process and store it for one month. And this is sort of a strategic decision. And so, honestly, when you talk about pricing observability, it is kind of insane that it costs $0.02 to store a gigabyte on S3, yet it costs almost $4 to store it on Datadog with on-demand billing.
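The per-gigabyte figures quoted in this part of the conversation can be lined up in a short Python sketch. All prices are the speaker's approximate numbers as stated, not values taken from any vendor's price list, and the dictionary keys and helper name are illustrative. As the speaker notes, the comparison leaves out redundancy, RAID, compression, and high availability.

```python
# Approximate USD per gigabyte, as quoted in the conversation.
PRICES_USD_PER_GB = {
    "s3_storage_month": 0.02,
    "r2_storage_month": 0.015,
    "nvme_purchase_low": 0.10,    # one-off purchase, low end of the quoted range
    "nvme_purchase_high": 0.20,   # one-off purchase, high end of the quoted range
    "better_stack_month": 0.15,   # process and store for one month
    "datadog_logs_30d_on_demand": 3.85,
}


def ratio(prices: dict[str, float], numerator: str, denominator: str) -> float:
    """Return how many times more expensive `numerator` is than `denominator`."""
    return prices[numerator] / prices[denominator]


datadog_vs_s3 = ratio(PRICES_USD_PER_GB, "datadog_logs_30d_on_demand", "s3_storage_month")
datadog_vs_better_stack = ratio(PRICES_USD_PER_GB, "datadog_logs_30d_on_demand", "better_stack_month")
better_stack_vs_s3 = ratio(PRICES_USD_PER_GB, "better_stack_month", "s3_storage_month")

print(f"Datadog on-demand vs raw S3 storage: {datadog_vs_s3:.1f}x")
print(f"Datadog on-demand vs Better Stack:   {datadog_vs_better_stack:.1f}x")
print(f"Better Stack vs raw S3 storage:      {better_stack_vs_s3:.1f}x")
```

Expressing everything in one unit is what makes the comparison possible at all, which is the billing-structure point made earlier about every vendor charging for a different metric unit.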

[00:14:48] Chapter 9: Team Size, Leverage, and Profitability Philosophy

Mirko Novakovic: Absolutely. Yeah. But I always also say, I mean, you can look at it from two dimensions, right? One is the infrastructure cost against it, which is what you are doing, right? You compare what it costs to have the disk or to store it in S3. But then you also have a gross margin on it for all the costs to basically build and sell the product, right? And what I found, I mean, when you look at the P&L of Datadog, they spend $1 billion a year on R&D, which you kind of have to finance, right? Now, I think you have a different opinion on sizes of teams, right? So I think I read that you said 150 people max, right? Hardcore mode. And I think you have, what did you say, 30 engineers, 15 engineers? I don't remember. 30 overall?

Juraj Masar: We're between 30 and 40 people right now. And roughly, let's say 15 engineers, 15 superstars.

Mirko Novakovic: Yeah. So 15 superstars and basically building this. And there's also a large leverage, right? Because if you have $1 billion in R&D, you have to have that gross margin, right? Because you have to spend a lot of money on your engineering team.

Juraj Masar: So let's talk about that engineering team. Let's talk about how software should be built. Okay. So Paul Graham wrote the book Hackers and Painters when he sold Viaweb. And this was in the 2000s, or even around the dot-com boom. And he said that the best engineers create 50 times as much value as the, let's call them average engineers, the median engineer. Ever since, we got the internet. Then we got Stack Overflow. Right now we've got Claude Code and Cursor. I think right now the very, very best engineers, the 0.1% of engineers on the bell curve, create, say, 1000 times as much value as the average engineer. And so the trouble is, typically you get the opposite of that, which is that companies treat engineers as a commodity. They think, oh, I have a big project. I need eight engineers to work on this for a month. And that's, I think, the worst way you can think about something as high leverage as software engineering, where the very best people get insane leverage on their work and time. And so what we do, for instance, is that we pay our engineers based on the output of what they create. There isn't an artificial table: you're with the company for years and you're a senior engineer, this is your pay. No, we actually do the work and understand what's the output of their work, and we pay on that. Plus, it's just more fun to work with a tight group of people who are super senior, and you can learn a lot of things, and that attracts other super senior people to join you.

Juraj Masar: You mentioned in the intro that Better Stack is a profitable business. I think of Better Stack as a fast growth business. That's the point of what we are doing. And every single time when we mention profitability and the fact that all the VC money that we ever raised is in the bank, I say unintentionally profitable. This is a side effect of what happened. The point is to grow. The point is to serve many millions of engineers. And profitability is just a side effect of how we organize that. And I don't think we should be talking about cost. I don't think we should be talking about gross margin. We should be talking about how to ship software fast, how to ship better software fast. And I don't think that in the age of Claude Code, you know, you get, even Instagram, when Instagram was sold, they had what, ten people, 15 people? When WhatsApp was sold, they had 30 people. Cursor these days has 50 or 100 people. These are tiny teams. Let's call them AI native. And so the way to think about that is that, hey, we should rethink everything and anything when it comes to enterprise sales and doing SDRs. And we should rethink everything and anything about shipping software and how to best use these tools like Claude Code to completely reinvent engineering teams. And honestly, I don't think hiring a thousand engineers and telling them, hey, build a better Datadog, is the way to go.

[00:18:58] Chapter 10: Scale vs Complexity and Greenfield Advantage

Mirko Novakovic: No. Absolutely not. Not at the start. Right. I think at one point you will end up this. I mean, if you look at Datadog to be fair, they have like, I don't know, 100 different products right now. We can discuss if you really need that. But if you have 100 different products, you also need someone who takes care of it somehow, right? Otherwise it will just suck. And then naturally you will come up with, okay, I need like a few engineers for the logging topic, a few engineers for profiling, a few engineers for building the agents for profiling. So, and you can't, can't have a context switch every day if you continuously develop that product. I think there is a point. I'm not saying you need 1000, right. But there is a point when your product grows and you need to maintain it, then you also need some sort of engineering team owning it, right?

Juraj Masar: 100%. But I think you're still simplifying it quite a bit. For instance, if you grow by acquiring companies and your logging and profiling have different backends and you just smash the UIs together, you're maintaining a very different system to where you say, we are starting on a greenfield, this is our data warehouse, and we want to support these two use cases. So those details really, really matter. In large enterprises, you know, 80% of your work as an engineer is migrating, migrating from one project to another project. And so this is the huge advantage of startups, that you can actually do things properly when you start, and hence sort of remediate a huge part of that penalty that you get with maintaining existing legacy systems. So I do agree with you. I wouldn't oversimplify, and this is, again, where those engineers making those fundamental decisions get a lot of leverage on their time.

Mirko Novakovic: Yeah. But you also have today, you have almost everything you need in observability platform, right? You started with the, with the incident response part, right? And then you had a status page, you added logs, you added metrics and dashboards, then eBPF based tracing and now you have AI assistant, right? An SRE agent. So you are getting more and more complex, which either you can't work on all the products all the time, or at some point you need dedicated kind of engineers focusing on the topic because there's, there's always something to do on tracing or logging or something, right?

Juraj Masar: 100%. It's just you don't need 1000 people doing that.

[00:21:22] Chapter 11: eBPF + OpenTelemetry as the New Default

Mirko Novakovic: No. Yeah. That's true. Let's talk about tracing and eBPF. I think, we discussed this also at the last trade show. And I was asking you, how is it working? Right. Because I think eBPF is a little bit, I love it. It's easy to use if it works right. So how is your view on eBPF? How is it working for you and for your users? Is it simple? Does it work these days?

Juraj Masar: So yes, it does. And if you look at the OBI project in the Cloud Native Computing Foundation, it is extremely active, and I think eBPF has already won. I think we're already there. I think it's time to declare victory. And so look, the pitch for Better Stack today is: instrument all your distributed applications without any code changes thanks to eBPF and OpenTelemetry. Then get this unbeatable price-performance ratio that we talked about in just 80 times as much data as we did, and then get an AI on top of all your data that you can query very fast. It's your copilot to fix live issues. So eBPF is a very important part of the stack and it should be the new default. You are instrumenting your code. You can set up a remotely controlled eBPF data collector in one Helm install. You can try it out on your staging in 10 to 15 minutes. And it's going to instrument all your applications very fast. Now, is it going to work in all edge cases? Well, that depends on your application, but these days it works mostly fine for the vast majority of use cases. And so here's what I propose to our customers. This is a very cheap experiment. Very fast. Try it out. And you very, very quickly see it works on, you know, 95 of our 100 services. Let's use it there. Let's switch it off on a per-namespace basis, per-service basis for these five services, where we go with a more granular OpenTelemetry SDK instrumentation. But why would you go the tough route from day one and adjust the code in every application that you have when you get this option now? So I honestly think that that is the combination, eBPF with OpenTelemetry, for most people.

Mirko Novakovic: No, I agree, I like it too. I think if it works, it's really good and it makes total sense, right? Because as you said, you don't have to change anything. It does it on a kernel level essentially, right? Which also makes instrumenting easier. I mean, some things are easy to instrument, like Java, everything that's interpreted, but especially languages that are not interpreted, like Go or C or Rust, are harder to instrument without eBPF, right? But with eBPF, it doesn't really matter, right? You can instrument almost everything.

Juraj Masar: You can still instrument the network calls coming in and out. Oftentimes you have some sort of reverse proxy in front of it. And look, if you have a dedicated service deployed on a dedicated host, thanks to eBPF you at least see the network traffic coming in and out. Although you don't know what's happening within that application, that is still a lot more than not having any data, right? And so that's sort of what I recommend: start with eBPF. You can literally do that in 15 minutes. As long as you have a staging environment, just try it. Okay. And then maybe it's much easier to deal with the consequences if you have a Redis instance which is doing, you know, 1 million queries per hour. What? That's probably per second. Whatever. In that case, that eBPF CPU overhead is actually going to hit you quite a bit. Okay, cool. Turn it off for this service and figure it out with the granular instrumentation.

[00:24:59] Chapter 12: YouTube as Developer-Led Growth

Mirko Novakovic: No, absolutely. Absolutely makes sense. So what I also want to talk about is your YouTube channel, right? I mean, it's incredible. I saw on LinkedIn, you posted that you had hit 100,000 subscribers and you said you had never, ever targeted it. Right. But is that a great channel for you today? Is that a great marketing channel, or how are you using YouTube? Because it's not only about observability, right? It's about anything development, I would say. Right. It's targeting developers and it gives you news about almost everything and not so much observability specifically. Right?

Juraj Masar: Look, Mirko, we publish hundreds and hundreds of videos. There's like 1 or 2 videos coming out every day. So there's only so many videos you can make about observability. And we've done our fair share and we will do more. But yes, it targets a broader developer audience in addition to SREs. Look, in 2026, the question is how do you sell to engineers today? YouTube is sort of our version of that. Let's educate people instead of, you know, selling them on benefits and features. And oftentimes they come back and they're like, oh, actually I didn't know that Better Stack is an entire observability company behind this, you know, cool YouTube channel. So that actually generates a lot of goodwill for us.

Mirko Novakovic: Yeah, absolutely. I can see that. And, and you were from day one, right? Really good in the marketing and conversion kind of game, right? Getting people on the website, converting them to, to users, right.

Juraj Masar: I think you need to lean into your DNA. And my DNA was in B2C. My DNA was in online marketing. So these things are natural. I'll tell you a few things that helped Better Stack grow in the first few days, because it's not a secret anymore. We had one of the very first beautiful status pages that you could put on status.yourcompany.com, or status.huggingface.com, right? Every one of those had "Powered by Better Stack" in the footer. And A, it brought us direct referrals, but B, it helped us with search engine optimization, because we could simply point that to a landing page about the new product that we launched, and it exploded in Google. And now, it was so great that everyone copied it. So these days it's not a competitive advantage anymore, but it's one of these few tricks where you need to blend engineering with marketing a little bit, not have them siloed, and then you're able to come up with these very cheap solutions that are better than a huge marketing campaign.

[00:27:35] Chapter 13: Competitive Dynamics and Friendly Rivalry

Mirko Novakovic: Yeah, absolutely. No, congrats on that. I think you reinvented it over and over again, right? I also always see your, I don't know, if I Google now, if I Google for Dash0, I normally get a very nice Better Stack ad that says goodbye to Dash0. Yes.

Juraj Masar: I mean, look, that's where we are. Do you know what I mean?

Mirko Novakovic: Absolutely, I love it. I'm not offended. Right. I think it's a smart idea. You have to go after vendors, right? And you have to do it. So I like it.

Juraj Masar: I would tell you that I really love that you're doing this podcast, because I think everyone in the observability industry is sort of frenemies, in that we do compete with each other. But look, we are human beings and we should be friendly, and God knows, maybe in a few years' time, you know, we'll be colleagues. God knows what happens. It's a very small industry and we share the passion for making software better for people. So honestly, the only company right now out there that we're competing with is Datadog. We talked about it before. So yeah, different people have different ideas, and let the best idea win.

Mirko Novakovic: Absolutely. I'm not sure if you just made me an acquisition offer, because "we'll be colleagues," you said.

Juraj Masar: Wasn't it PayPal that was created by merging a company with X?

Mirko Novakovic: So absolutely.

Juraj Masar: You know, there are different ways of doing these things.

[00:29:02] Chapter 14: AI SRE as the Emerging Interface

Mirko Novakovic: No, absolutely. I mean, we just acquired Lumigo, right? An Israeli company that specialized in AWS Lambda observability. So what do we see next? How do you see observability evolving, especially with AI? I mean, I saw that recently, I think you also posted some of the feedback from your customers on the AI SRE agent. You also just said that this will essentially be the layer where users interact with your platform, right? You have the data lake with your OpenTelemetry data, and then you have a fast query engine, and the agents will communicate with that lake and essentially become the new user interface, right? So how do you see that evolving?

Juraj Masar: I think the observability category, very soon we'll be talking about it as the AI category, and it will be sort of the same thing. And I think you will start with the AI SRE if it's good. And then when you need to really laser point into problems and analyze the hypothesis and look into the raw data, then you go into trace view and service map and live tail and dashboards. But you will start with AI SRE. I think we can already see this with how consumers use ChatGPT. Like, oftentimes I do prospecting and research on ChatGPT. And when I make a decision to buy, then I go to Google. I think you can make this parallel with how observability tools are going to be used. And it's, you know, one of those rare features that our customers really rave about. Your customers are going to like some things and they're going to hate some other things. But it's very rare that you not only get a moment like, I get it, but you get a moment like, oh my God, this is so amazing, this changed the way I work. Right? And for instance, one of our customers recently shared this wonderful anecdote that thanks to our MCP server, their layer two technical support engineers are now able to write code, whereas previously they had to support customers, because right now the non-technical L1 support is able to use the MCP to investigate issues on their own. So you literally change the org design because of your new capabilities with LLMs. I think this is sort of obvious for everyone. And now we can discuss what the individual specifics of that implementation are. And again, let the best idea win. But it would be foolish not to lean into this, if AI is literally impacting everything and anything that we talk about.

[00:31:35] Chapter 15: Data Ownership, Integration, and Infra for AI

Mirko Novakovic: Absolutely, absolutely. What do you think about owning the data versus AI SRE agents that do not own the data?

Juraj Masar: I think you need both. Because, I'll tell you what the AI SRE does. I studied it quite a bit. I played with it quite a bit. It can figure out what's wrong. It really can. We have this wonderful demo where we give it a large cluster, and there's an intermittent issue with readings, and it's able to actually figure out what's wrong and suggest a hypothesis and a fix. It is not as efficient as a human, though. A human needs to run way fewer queries. The AI today needs to run a lot of queries very quickly, and a lot of them are inefficient. So the key to making AI work today is to have a wonderful infrastructure, very powerful, cheap infrastructure, powering it at scale. Right. And so yes, there are a lot of people trying to build this on top of, say, Datadog's data or Grafana data or self-hosted Loki. And if the underlying database crashes on you, as it often did for me with self-hosted Loki, for instance, then obviously the AI is going to be very limited. That's point number one. Point number two is that, similarly to how Salesforce restricted their Slack APIs for AI companies, if you were running Datadog or New Relic, limiting your APIs so that other startups cannot build on top of them is one of the options to consider. So in reality, I think you need both. You need to behave like Switzerland and integrate with all the other tools that people are using, because the more context you put into these systems, the better. But data sovereignty is a very real thing. I encourage our customers to store their data not in Better Stack but in their own buckets in an open format, with Better Stack managing the data, so that they're in charge of their data. And I think you really do need a powerful infrastructure to make an AI SRE really work. And you can only do that with your own.

Mirko Novakovic: That makes sense. Makes sense. I agree with that. By the way.

Juraj Masar: Did you play with this?

Mirko Novakovic: No, not really. With some, I think, right, but not with all of them. I mean, Resolve AI, etc., I have not tested myself. Have you tested them?

Juraj Masar: Briefly, briefly. But look, my first impression is very good.

Mirko Novakovic: Yes. I also see the value, especially if you have multiple tools, right. Think about having your logs in Splunk, Datadog for APM, and then Prometheus for your metrics. Today the developer needs to log into the logging tool, copy something, and paste it into Datadog, whereas the SRE agent gives you the full context, right? It can do the work for you and figure things out and combine things. I think that's also a big benefit of those tools, that they can plug into multiple tools.

Juraj Masar: Yes, 100%. As long as you don't get rate limited by Splunk or Datadog, in that case, you're kind of screwed. We'll see where this goes. This is a very, very, very exciting field.

[00:34:32] Chapter 16: Multi-Tool Context and The Road Ahead

Mirko Novakovic: So in your opinion, is that the next big thing in observability, the AI SRE agent, or what is it?

Juraj Masar: So again, the pitch is: instrument your apps with no code changes with eBPF and OpenTelemetry, then be able to store 80 times as much data for the same budget. I'm very excited about that. And then rebuild your workflow so that you are able to use an AI SRE and MCP server reliably in your day-to-day work. It is the combination of these three things that fundamentally changed how I observe our systems. And so I highly recommend, you know, every viewer to give it a shot on their own.

[00:35:10] Chapter 17: Closing

Mirko Novakovic: Absolutely. You're right. I wish you all the best and good luck with growing Better Stack as fast as you have in the past years.

Juraj Masar: Mirko, thank you so much for the invitation. Let's do this next time on the YouTube channel with the Better Stack podcast.

Mirko Novakovic: We will. Thank you.

Juraj Masar: Cheers. Take care. Bye.

Mirko Novakovic: Thanks for listening. I'm always sharing new insights and insider knowledge about observability on LinkedIn. You can follow me there for more. The podcast is produced by Dash0. We make observability easy for every developer.
