[00:00:00] Chapter 1: Introduction to Code RED Podcast
Mirko Novakovic: Hello everybody. My name is Mirko Novakovic. I am co-founder and CEO of Dash0. And welcome to Code RED: Code because we are talking about code, and RED stands for Request, Errors and Duration, the core metrics of observability. On this podcast, you will hear from leaders around our industry about what they are building, what's next in observability, and what you can do today to avoid your next outage. Today my guest is Ozan Unlu. Ozan is the CEO and founder of Edge Delta, a Seattle-based platform founded to help engineering and security teams analyze data with telemetry pipelines. He previously led software development at Microsoft and Sumo Logic. Ozan, welcome to Code RED.
Ozan Unlu: Yeah, thanks for having me, Mirko. I'm excited to chat. I know our teams have, have, have met and chatted at various different conferences over the years, but you and I have never actually been able to connect, so why not do it over this?
[00:01:01] Chapter 2: Code RED Moments and Cloud Dependency
Mirko Novakovic: Absolutely. I'm looking forward to it. But I have to start with, the first question I always ask is, what was your biggest Code RED moment in your career?
Ozan Unlu: Oh my gosh. Good question. At Microsoft, for instance, I worked on the server side. I worked as an engineer in the early days of Azure. I also worked on the security team that handled all zero-day vulnerabilities across all Microsoft products. So I have tons of those types of moments, some of which I can't talk about and some of which I can. But I will say, probably more relevant for this audience: really, it's anytime Amazon has an outage, right? Because a lot of the cloud products that we build have a dependency on Amazon, and then a lot of our customers have a dependency on Amazon. And the last thing you want is for your observability product to go down when your customers are also experiencing the same type of issues with Amazon. So I think those are probably the biggest Code RED moments over the years, and why we are now in various different regions and availability zones and have failover in place for anytime AWS is actually having an outage. And in the early days, by the way, they weren't very good at actually putting that on their website, right? They were like, oh yeah, we're kind of sort of experiencing degradation. They've gotten a lot better about it over the years. I think AWS is an awesome product at this point. But that's probably the biggest Code RED moment, when you feel like, oh crap. We all have a dependency on S3 buckets, and if S3 is having an issue, it's usually a bad day for everyone.
Mirko Novakovic: Yeah, I remember that. I don't know exactly when it was, I was still at Instana, I think, when US East was gone, right? One of the biggest regions. And you had e-commerce shops disappearing, and everyone who was on Amazon was in trouble, right? I can totally relate to that.
Ozan Unlu: I mean, I remember, again, this was almost 15 years ago, in the early days of Azure, AWS had like an eight-hour outage, and we were disappointed as well. It's not like we were jumping and whooping and hollering and celebrating because our competitor had an outage. It was the early days of the cloud, and it was like, this is bad for the cloud. We want people to have confidence in this whole concept of, I'm going to run my mission-critical systems in someone else's data center, on someone else's hardware. So yeah, over the years there have been a lot of those Code RED style situations. But the worst is usually when one of the big three has a big outage and it affects so many systems.
[00:03:34] Chapter 3: Global Cloud Concerns and Business Continuity
Mirko Novakovic: Yeah, absolutely. By the way, I was traveling in Denmark for the past two days and it was really interesting. It's still on my mind, so I want to share it with you, maybe you have an opinion on it. I heard for the first time that European companies are now thinking about contingency plans: what happens if the US turns the cloud off for Europe? And they are looking into going on-premise, building things in their own data centers. I had multiple clients asking me about it.
Ozan Unlu: Yeah, yeah.
Mirko Novakovic: Super, super interesting and super weird, right? Because I never thought like that, right?
Ozan Unlu: It was actually funny, I was on a flight last night, and I was reading that exact article on the plane. Yeah. I mean, in times of uncertainty and high volatility, you're always going to have people that say, hey, just in case, what if? Now, the reality is, for a lot of these cloud platforms, it's not like you can snap your fingers and get off of AWS or Azure or GCP overnight. So I think it makes sense from anyone's perspective to say, okay, what if, let's make sure that we have business continuity in the case of XYZ. Most recently we saw that during Covid, right? When Covid first hit, sure, it rebounded pretty quickly, and there were good things that happened to ensure that there was business continuity across all these various different enterprises and companies. Obviously Covid affected some businesses differently than others. But today, with all the uncertainty, all the geopolitical climate stuff that I don't necessarily love to talk about but that is a reality in today's situation, I think it's logical that someone would say, hey, if I'm a huge company based wherever in the world, I need to cover my bases. I need to ensure that if XYZ happens, I have a contingency plan. I think that's pretty natural.
[00:05:41] Chapter 4: Evolution and Growth of Observability Pipelines
Mirko Novakovic: Yeah, I agree. Let's talk about Edge Delta, and I can give you my view. I mean, you are basically a pipeline company, right? An observability pipeline or telemetry pipeline company. And I came across that category years ago, when I heard of Cribl for the first time, right? At the time it was a pretty young company, and I was thinking, oh, that's interesting, and it's probably some sort of a niche use case. And then recently I read they hit 300 million in ARR, right?
Ozan Unlu: So it's 200. Yes.
Mirko Novakovic: Yeah, 200 million or so. But still, it's far out of a niche now. It seems to be a fast-growing, big category. You are in the same category. And if I remember correctly, I think the founders of Cribl came from Splunk. Yes, yes. And you were at Sumo Logic. So it looks like you both saw the same kind of problem at a logging company. Tell me a little bit about how you started, what the product really does, and how you see that category developing in recent years.
Ozan Unlu: You know, it was funny. The first time I heard about Cribl was, I believe, when we were raising our seed round. I was giving a pitch to someone and they were like, oh, this actually looks and feels a little bit like Cribl. And I was like, Cribl, what is that? How do you spell that? So I wrote it down and tried to figure it out. At the time they were still, I think, in stealth, so there wasn't much information about them. But look, to give credit where it's due, I think they took a very simple concept and solved it in a very simple way. And that's not to put down the product and say it's a simplistic product. I think they had a good insight and said, hey, we're going to go attack this market with this very simple use case, this very simple way to reduce your Splunk costs. Again, they came from Splunk, so they had a good ecosystem, partners and everything else that they knew within that ecosystem. So yeah, I think they've done a good job. I think they have a good product. I always like to say, think about the DNA of these companies, right? Let's take a look at the early days, ten, 15 years ago.
Ozan Unlu: In logging, you had Splunk if you wanted to run it on-prem; you had Elastic if you wanted to basically roll your own and build it in-house. And then you had Sumo Logic, which was, I would say, the biggest of the cloud-based multi-tenant logging vendors over the years. You can't quite shed that DNA, right? So, for instance, in the APM space it was, let's say, AppDynamics and New Relic in the early days, and then in the metrics space there was SignalFx; there was obviously Grafana, which has always been around; Datadog. Exactly. And so if you look at Datadog's logging product today, for instance, I don't think anyone's going to say, oh, that logging product is on par with everything you could do with Splunk, and vice versa. Splunk acquired SignalFx and they have lots of metrics capabilities and dashboards, but you would never look at their metrics product and say it's on par with Datadog, for instance. So I think these organizations have their DNA, and it's very hard to shed that DNA. And for Cribl, I always looked at it, especially in the early days, as akin to, let's say, Kafka. So, okay, say I'm going to go filter something out.
Ozan Unlu: Instead of filtering it out on ingest, or even worse, once it's already indexed in Splunk and somehow filtering it out or deleting it from the index, that's not as good as, let's say, filtering it out at the Kafka layer, right? So I think it was a very nice half step in the right direction, and to their credit, I think they executed on that really well. But I think our vision has always been a lot more innovative, a lot bigger than that, which is: if the things that work at kilobyte and megabyte and gigabyte scale do not work at terabyte and petabyte scale, then what additional capabilities do you need? We've never been against cloud. It's always been, hey, you should do certain things locally on the edge where the data is being created, and you should also do the cloud stuff that you need to be doing, from correlation, a common information model, enrichment, all the other things, right? So we've always been pitching both. So from a cohesive standpoint, from an innovation standpoint, of course I'm very unbiased here, but I think that Edge Delta is on the right path, and I'm excited to see our customers happy and getting value out of it.
[00:10:07] Chapter 5: The Mechanics of Edge Delta's Pipelines
Mirko Novakovic: That makes sense. So that's also where your name is from, Edge Delta. So yes, you are working on the edge. Explain a little bit how it works. You have something basically where the data is generated, on the customer's side, and that's part of your story of the pipelines, and then also something on the cloud side. So how does it work?
Ozan Unlu: Yeah, exactly. So I think this is probably a good transition into OTel, right? Everyone in the industry is supportive of OTel, I think, some more than others, of course, but we are absolutely supportive of it. And everything that we do within the pipeline is OTel by default now. Of course, you can do other formats, you can do transformations and enrichments and everything that you might want. But by default we try to say, hey, look, OTel's taking off, it's definitely the direction everything's going, so that's kind of the default. Now, from a pipeline perspective, what we try to do is say, okay, OTel is a few different things. It is a format, a schema, a way to make sure that you're massaging your data and getting it into the right format so the downstream destinations can accept it in a much more efficient, optimized, cohesive, standardized manner. And then there are certain cases where customers or enterprises will say, hey, I actually want to use the OTel agent, right? So we have a flavor, I would say, of the OTel agent. Now, we created and started that before OTel had really taken off, so it's not necessarily an OTel distribution, but we have an agent. You can deploy that agent; Kubernetes is obviously a big use case. But then also, let's say for network data, it might be a syslog listener listening over UDP or TCP on a specific port.
Ozan Unlu: So we support various different edge use cases: log files, grabbing from Kubernetes APIs, or whatever else, right? That's certainly where the edge component comes in. But then from there you have different layers. For instance, you might say, hey, I want to run this as a DaemonSet, or I want to run it a little bit more centrally, or, hey, I want to run a gateway, where I can actually have lookup tables that I can distribute out to the gateways so it doesn't have to be central. So I would say, for us, there are a lot of different layers to this if you want a very nice, scalable architecture. A lot of our customers will be creating hundreds of terabytes, if not petabytes, of data every single day, right? At that level, you have to be very thoughtful about what architecture you want to use. And we're saying, hey, there are appropriate things that you should be doing literally within those nodes, within the EC2 instances, within EKS, even very close to the Lambda functions, where you want a listener and the Lambda function to talk directly, wherever that data is being created. There are times where you want to be right there, and there are times where you want to be in the cloud. And again, we offer both capabilities to our customers.
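As a toy illustration of the syslog-listener deployment mode Ozan mentions, an edge agent listening over UDP can be sketched like this. The port number and single-message helper are purely illustrative assumptions, not Edge Delta's actual implementation:

```python
import socket

def receive_one_syslog_message(port: int = 5514) -> str:
    """Block until one UDP datagram arrives on localhost and return it as text.

    A real agent would loop forever, parse the syslog priority header,
    and forward events into the pipeline; this only shows the listener idea.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("127.0.0.1", port))
        data, _addr = sock.recvfrom(65535)  # max UDP datagram size
        return data.decode(errors="replace")
```

A production listener would also handle TCP framing and RFC 3164/5424 parsing, which this sketch deliberately omits.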
Mirko Novakovic: Okay. So it's a little bit like, if I may ask, like the collector, exactly the way, like...
Ozan Unlu: Sumo Logic had collectors, Splunk had forwarders, and obviously from the Elastic standpoint there were, you know, Logstash, Filebeat; there's obviously Fluentd, Fluent Bit. There's always been this concept of an agent. For us, one of the exciting things about pipelines is that when we have our own agent, we can give a full, cohesive, end-to-end experience to the user, right? Like, we are absolutely a telemetry pipeline company. Now, we also have an observability platform, but that's mostly just to make our pipelines better. What we saw in the early days was, let's say you're a Splunk customer, and let's say you want to go modify your data and transform it within the pipeline. Right now, you have to wait for that data to be sent to Splunk, Splunk has to index it, then you have to make sure that you can query the data, and then that query actually puts the data on a dashboard. There are a lot of different steps there to validate that the transformation you've done is the transformation you actually want. And what we saw typically is that time frame was too long. Whether it was five minutes or 15 minutes or 30 minutes, or sometimes, based on your setup and how many hops it takes, it might take an hour, right? Nobody wants to make one small regex change and wait an hour to see if that data is validated, right? And so for us, we have a very simple observability solution that's scalable.
Ozan Unlu: But again, it's not our core product. Our core product is pipelines; that observability platform allows our pipelines to be better. Realistically, we are a pipeline product. We are telemetry pipelines. We love working with vendors such as Dash0, for instance, to be able to say, hey, look, for the customer, we want them to have full control and flexibility. So for instance, we mentioned Datadog, we mentioned Grafana; let's say they're on those solutions. We want them to have the flexibility to say, hey, Dash0 is doing some cool things over there, let's go try it. And instead of trying it taking a month, now trying it might take five or ten minutes: just adding an additional endpoint and saying, hey, I want to also fork my data and send it over to Dash0 and see what this solution can offer me. And so I think we are entering an era where that control and flexibility is becoming a requirement for a lot of customers.
Mirko Novakovic: And what is the primary use case? What do you do with the pipeline? Is it reducing data, taking out passwords? What would you say is the key use case that everyone has?
[00:15:19] Chapter 6: Data Tiering and Storage Strategies
Ozan Unlu: Yeah, I think you touched on two of the big ones. We talk a lot about this concept of data tiering. Data tiering means: we know you want all your data, you just shouldn't be storing all of it in an expensive premium solution like Splunk, right? So we say, take all your data and store the raw data in, let's say, S3 or GCS or Azure Blob Storage, whatever object storage you want, or it could be on-prem, whatever storage you want to use for all that raw data. In addition to that, subsets of that data should go to other platforms. You know, we have large Fortune 100, Fortune 500 companies that use our products. So of course you're going to have security teams using Splunk, and then you have this operations team over here using Datadog, and then you have this other team rolling their own Elastic or Grafana. And you basically have this concept of data tiering: all your data goes into the lowest-cost storage, and then subsets of that go elsewhere. You can almost think about pipelines as the comprehensive toolbox that gives you all the capabilities to implement this data tiering in the most effective way possible. Then, of course, we have this concept of rehydration. So let's say you're a security team saying, yes, okay, I need to have access to all my raw data; I agree that I shouldn't have all that raw data going into Splunk at all times, because that's obviously slow and expensive and isn't the future. But if I do have a forensics investigation or an audit or compliance use case, I do need to be able to select that data and pull it into Splunk. So that's absolutely a capability that we provide within pipelines as well.
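A minimal sketch of the data-tiering pattern described here: every event lands in cheap object storage, and only matching subsets are forwarded to premium destinations. The destination names, fields, and routing rules are hypothetical, chosen only to illustrate the idea:

```python
# Hypothetical data-tiering router: tier 1 is low-cost object storage for
# everything; tier 2 destinations receive filtered, high-value subsets.

def route_event(event: dict) -> list[str]:
    """Return the list of destinations for a single telemetry event."""
    destinations = ["s3://raw-archive"]           # tier 1: all raw data, lowest cost
    if event.get("severity") in ("ERROR", "FATAL"):
        destinations.append("splunk://security")  # tier 2: security-relevant subset
    if event.get("service") == "checkout":
        destinations.append("datadog://ops")      # tier 2: team-specific subset
    return destinations

event = {"severity": "ERROR", "service": "checkout", "msg": "payment timeout"}
print(route_event(event))  # high-severity checkout event hits all three tiers
```

Rehydration is then the inverse operation: selecting a slice of the tier-1 archive and replaying it into a premium destination on demand.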
Mirko Novakovic: I saw, I mean, you have this nice UI, right, where you can design and see the pipelines. It looks a little bit like a workflow engine, where you have the different steps. So if I design that and I decide to remove some data or route it somewhere, do you decide where to execute that, whether it's on the edge or in the cloud, or does the user decide?
Ozan Unlu: So I think it's a little bit of both, right? Let's say you have this concept of edge agents, and you have cloud agents; we support both. Cloud agents, for instance, let's take an example: Lambda. Within Lambda, if you wanted to put a tiny little extension to say, hey, I want to post messages to that endpoint, then yes, you could use an edge agent and send that somewhere, but realistically you probably want to use some sort of SaaS service, a cloud agent. Those are separate concepts, right? So you'll have edge pipelines and you'll have cloud pipelines. In the case where you update your cloud pipeline, like you said, in a nice visual, cohesive way, everyone can see the edits, they can see the changes, they can understand them. One of the biggest things we saw in, I think you mentioned the collectors, right, the collector or the forwarder era, is, let's say you and I were both working within a certain organization. I might not know or understand all the changes you're making to the collector or forwarder configuration. I might not understand all the changes you're making to what was effectively a very basic pipeline that didn't have a lot of the capabilities. But now with Edge Delta, we can see what each other is doing. It supports RBAC, it supports version changes. So let's say you updated a pipeline and added an additional source. Next time I log into Edge Delta, I can see it. I can be like, oh, Mirko just added that source. Okay, great. Oh, interesting, it actually 10x'd our data. Maybe there's a valid business justification for that.
[00:18:49] Chapter 7: Pipeline Visualization and Collaboration
Ozan Unlu: Or maybe I walk over to you and I say, hey, Mirko, what are we doing here? We're 10xing the data, right? So that visibility is great for both of us. Now we can collaborate, we can work together, and it's not this kind of opaque, unclear thing. You know, one of the last projects I worked on at Microsoft was actually a data access layer. In the early days of software development, you had your applications and you had your databases. And let's say you had ten applications accessing ten databases. It was nuts to say that every application should directly access all those databases, right? Because now, all of a sudden, you have one slight variation in this application and your database is getting mucked up, all the data is changing. So we took that crazy concept and said, oh, but for observability it's fine: you can have all these different styles of data and all these different formats and all these different schemas, and everything's just sending data into all the different Splunk indexes or Elastic or Datadog. And that's nuts. You look at it and it's nuts, right? So that's really the concept of pipelines, which is: hey, if you and I are working together, we can modify the data access layer, which still standardizes everything, and all the downstream systems, whether it's Datadog or Dash0, still have a cohesive understanding of that data. It's all standardized, it all works well. And so now if you want to add a data set to Dash0, or I want to add a data set to Dash0, it's very easy for us to do that and it's very easy for us to collaborate on it.
Mirko Novakovic: A question I always have: if you do that, you do it partly because of cost, right? Because, as you said, you don't want to store all the data inside a very expensive Splunk, for example. How do you see the idea of having a data lake for observability data, maybe Iceberg-based or whatever? Then you essentially would have one data lake for all of the security and observability data. Is that something you are looking into? Because it's kind of in your product space, right, where you could also provide something like a lake and say, hey, by the way, I'm not only providing the pipeline, but I can provide you with a data lake where you can have different requirements for cost or retention, etc., and I will also deal with it on the database side.
Ozan Unlu: Yeah, I think that is our worldview. Let's jump, you know, maybe five years into the future if we're being optimistic, but okay, let's say ten years. Ten years in the future, you have your pipelines that are standardizing data on the way in, whether that's OTel or OCSF. You have your storage format, let's just say Iceberg, as you mentioned, to store that data in a standardized way. And then you have your standardized query language; for instance, there's OTQL or potentially pipe SQL or whatever else ends up being the future standard. Those are the three components that we look at as the future data stack. Now, that being said, let's use Splunk as an example on one side and Snowflake as an example on the other. The concepts for data analytics and for observability and security are similar, right? You have ETL and ELT on one side, and now, of course, we have transformations within pipelines. So the concepts are always similar, but how those systems are optimized is drastically different. On the observability and security side you have an Edge Delta; if you look over here, there's, you know, dbt, there's Matillion. It's the exact same concept but a very different customer, and those two solutions do not do the same things. Similar to Splunk and Snowflake, for instance.
Ozan Unlu: Right. Snowflake is a lot more optimized for batched query use cases, BI, analytics, and Splunk is more optimized for real-time, operational use cases. And I think those two worlds are so different that, yes, they can use the same standardized data lake on the back end. That also gives you additional benefits, like being able to tie business analytics and metrics to operational metrics, which gives you additional, I would say, capabilities and potential. But realistically, those are such different platforms. Again, one is optimized for much more batched workloads; for instance, in Snowflake, you don't mind if your query takes two hours to run, right? If you're in the middle of a live-site outage, you want your Splunk query to come back in milliseconds to seconds; even minutes annoys you, right? So they're just optimized for very different use cases. But I do think, to your point, if we can standardize on, let's say, Iceberg, and you have a standardized data lake, it does make sense that those two products or platforms or different types of technologies can hit the same data on the back end. That's totally in our worldview. Again, we look at those three pieces: pipelines that standardize on the way in, data that's indexed in a standardized way, and a query language that's standardized to be able to hit that data. Five or ten years from now, that is absolutely our worldview of the future.
[00:24:00] Chapter 8: Contemplating a Unified Observability Data Lake
Mirko Novakovic: Yeah, absolutely. Especially on the query languages, it's a little bit sad that OTel hasn't been able to standardize one yet, right? Because that's a big pain point for customers: every dashboard, every alert, everything you do is based on queries, and that really locks you into a proprietary query language. But I agree, I think it will be standardized.
Ozan Unlu: Yeah. And look, there's, you know, the OTQL project that a lot of great engineers are working on. And we're trying to contribute to as well. So I think, you know, it will take off. I think Otel needed to take off first before OTQL could take off. So I think it's appropriate kind of this timeline that we're going down. So I'm excited. I'm excited for what's coming next.
Mirko Novakovic: And talking about excitement and what's coming next, I have to talk about AI at the moment, right? How do you see AI affecting observability? What do you see in terms of the usage of LLMs, agentic AI? There are a lot of companies building SRE agents, troubleshooting agents; we are also working on something similar, which is sometimes really scary if you see how these agents use your MCP server. It's crazy, right? But what do you see? Have you already implemented or released something? What's your view on that?
Ozan Unlu: So look, I'm very fortunate that when ChatGPT hit in, whatever, November, December of '22 and everyone started using it, I started getting invited to these AI dinners and events, with a lot of extremely smart machine learning and AI engineers in a room, all discussing it. And in the early days, it's funny that I say early days, it was like two years ago, but in the early days, I was like the Mad Hatter in the room. I was like, hey, we are not even thinking about how much data is being created; I think we're going to have a huge margin problem with AI. Now, of course, you and I come from the observability and security background, so we understand that some of our customers are creating petabytes of raw data a day. Nobody's putting petabytes into an LLM; nobody's even putting terabytes or gigabytes into it, unless you have billions of dollars you just want to blow on OpenAI costs, or Amazon Bedrock costs, right? And so in the early days, I felt like I was trying to get people to understand: hey, LLMs on large data sets are going to be their own entire challenge. Us humans with our inefficient fingers can only type so fast, right? So if you're typing into a ChatGPT prompt, sure, you're only going to be able to create a little bit of data.
Ozan Unlu: Machines are creating millions of events per second. You have to be very thoughtful about how you're using AI in observability. And that's always been our approach, and I would say we almost got lucky, we almost stumbled into it. We said, wait, hold on a second: these pipelines that we've been working on for five to seven years are now actually the only way you can use AI in observability, because you have to identify those tiny little nuggets of high-value information out of all the terabytes and petabytes you're creating a day. What are those kilobytes and megabytes that I'm actually going to feed into AI and try to get a response from? And so we have this concept at Edge Delta called continuous curation and inference. We are continuously curating your massive data streams and finding those small anomalies and nuggets to feed into AI. And we have this concept of OnCall AI. What OnCall AI will do is say, hey, we found an anomaly, there's a statistical deviation from baselines. And if you're going to wake up at, let's say, two in the morning because you're the on-call engineer, you wake up and all of a sudden you effectively have this mixture of experts, all these different models telling you, hey, I think the issue is this, and I think this is what you should do to fix it.
Ozan Unlu: And then your team can actually vote on those models and say, for instance, this model is really good at it, this model is really bad at it. Now, we haven't quite gotten to that stage yet, because, again, we're very careful with our customer data, so we don't do any type of training or fine-tuning or anything like that yet, but that's something that we want to offer in the future. In the future, we want to offer the ability for customers to say, hey, I want my own dedicated model that I can go fine-tune, or, hey, I want to use a community model and I'm okay contributing to that, or, you know what, I just want to use the base models and I'm okay with that as well. So right now we basically just have the base models and we don't do any training or fine-tuning. But of course, if we have a large customer that says, hey, we want to own the models and we want to use your product to go fine-tune them, that's obviously something that we would love to explore in the future as well.
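The "statistical deviation from baselines" trigger Ozan describes can be illustrated with a toy z-score check. The metric, window, and threshold are made up for the example; a production system would use far more robust baselining:

```python
from statistics import mean, stdev

def is_anomalous(history: list[float], current: float, z_threshold: float = 3.0) -> bool:
    """Flag a value deviating more than z_threshold sigmas from the baseline window."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu  # flat baseline: any change is a deviation
    return abs(current - mu) / sigma > z_threshold

baseline = [100, 102, 98, 101, 99, 100, 103, 97]  # e.g. errors/minute under normal load
print(is_anomalous(baseline, 101))  # False: within normal variation
print(is_anomalous(baseline, 250))  # True: worth waking the on-call engineer
```

Only the events around such a flagged deviation, the kilobytes rather than the petabytes, would then be handed to an LLM for diagnosis.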
Mirko Novakovic: Yeah, absolutely. I mean, we released a feature we call Log AI. We had essentially the same problem that you described. The idea was: you send in an unstructured log and we want to structure it for you. That's the basic idea. We wanted to use an LLM, but as you said, we can't call an LLM every time. You can copy an unstructured log into ChatGPT and it will actually be pretty good at analyzing it, right? But you can't do that for billions of logs; you just can't call ChatGPT all the time. So we had to figure out how to use that mechanism and basically fingerprint the logs, make sure that we don't call the LLM constantly, and then extract, for example, a regex.
Ozan Unlu: Yep.
Mirko Novakovic: Out of the log, and then use that regex on the data. So it's a very interesting problem. And we also learned, I don't know how your experience is, that you need very different types of developers. The classical machine learning and data science people use Python, but they don't think at petabyte scale, working with millions and trillions of logs and metrics and figuring out how to execute the algorithms there. It's a very different problem.
Ozan Unlu: Yes, yes, exactly. And I think these different types of customers have sort of led to, I would say, less progress being made with AI in the large enterprise, because that data scientist has very different needs than, you know, the DevOps, SRE, or platform engineer, and a very different way of thinking about it. Over here you're going to use Python to figure things out; over there you're going to be using regular expressions and Splunk queries and other tools. So the thought processes are very different, and I think that's why a lot of customers know they need to invest in it, they want to go in that direction, but there isn't a cohesive solution that just works across a large enterprise for every single team. So I think it's both frustrating and exciting, because there's a lot of potential and things are moving so fast. I don't think anyone has a good idea of where we're even going to be in a year, much less in three or five years. So that part is super exciting for me.
[00:31:14] Chapter 9: The Role of AI in Observability
Mirko Novakovic: No, absolutely. I talk to many CEOs on the podcast, and I ask them what things will look like in a year or two. And essentially everyone says the same: I have no idea.
Ozan Unlu: Yeah, yeah.
Mirko Novakovic: Because things are moving so fast, and you really have to learn every day what's new. There's a new model here, a new model there, new capabilities. And we are almost only now starting to figure out what we can do with those models. It's kind of a black box that you need to figure out: what it can do, what it can't, and what it's good at. It's a very fascinating, interesting time.
Ozan Unlu: And I think for people like you and me, who are putting AI into the product but are not core AI products on their own, it can be exciting, because we can kind of sit on the sidelines a little bit and watch. I have a lot of respect for people who are just jumping in and saying, no, I am an AI company, and I'm going to go compete with all the rest of the AI companies. We're a little protected because we have a product that AI can enrich, versus going out and being an AI company outright. So I think that's where it's exciting for us, and maybe a little more daunting or scary for some other founders.
Mirko Novakovic: Yeah. You talked about the DNA of a company, and I actually always say exactly what you said: every startup creates a DNA. By the way, I think that DNA is created from the beginning, in the first year. Somehow you create a core DNA for your product, and none of the products, especially in the observability space, ever changes that DNA. Dynatrace started with the PurePath, and now, I don't know, twenty years later, that PurePath trace is still in its core DNA. You see the same with Splunk around logs: they can make acquisitions and build everything else, but in the end the core DNA is around logs. So I totally agree with you. And I'm a little bit scared for myself, though, because AI was not our core DNA when we started Dash0, right? So the question is: if AI is really changing everything, do we have to change our core DNA toward being AI-first, or whatever you want to call it? Or can we just use it as a tool to enhance the product, or do we have to rethink the full product? I wouldn't say it keeps me up at night, but I spend a lot of time trying to understand whether I have to turn everything upside down and rethink the whole observability space. I don't see it yet, to be honest. But the question is: will it happen? And then, how fast do you have to react to adapt to this AI-first observability?
Ozan Unlu: Yeah. It's not a concept I invented, but one I've read and heard about quite a bit: this concept of a bridge to familiarity. Frankly, in the early days at Edge Delta, we made a mistake by being too futuristic. What I mean by that is we had a lot of customers early on who said: undeniably this is impressive, undeniably what you guys have done is incredible, but I don't understand how I get from what I'm doing today to this. The incremental gap was too big. And so I think for us, AI is going to be incremental, because there are so many requirements from an observability and security standpoint that someone is going to say, hey, I love what you guys are doing with AI, but I still need X, Y, and Z. And if you can't show me how I can do that, then unfortunately this will be no more than a small, fun side project for us. If we want to do this seriously, it needs to support all the rest of our use cases, especially in the enterprise.
Ozan Unlu: Right. Our customer is the large enterprise; we work with the Fortune 100, 500, 1000, and so on. For us it will be very iterative, so maybe less scary, in that I don't have to think about changing our DNA. Our concept has always been intelligent telemetry pipelines. From day one, you can go back five years and look at our press releases talking about intelligent observability telemetry pipelines. Having that machine learning and intelligence built in has always been in our DNA. Obviously, with the LLMs that have most recently become very popular and a lot more useful, we again have to look at iterating and at how we add those into the product. But it's not going to be a core shift for us because, again, there are so many very legitimate X, Y, Z requirements for observability and security.
Mirko Novakovic: I agree with you. As I said, at the moment I can't see it either, because, as you said, we have different problems with the massive amount of data. But I'm starting to accept that maybe there is something I can't see, where it comes out of left field and hits me.
Ozan Unlu: Yeah, yeah. To your point, you mentioned the Dash0 functionality of, hey, let's have this thing start writing some regular expressions for us. It's not necessarily something you do a hundred times a second, but it might be something you do a hundred times a month. And so that's a great use case for layering in AI to take some of the manual work away and reduce the human cost. But again, it's not going to fundamentally change your back end and how you're doing logs, metrics, and traces, and all the things you need from a requirements standpoint for observability as well as security use cases. I do think it's important to be able to layer it in, and that's the exciting part: you can add it into various different parts of the product. But it's not going to be one massive model that just says, send me anything you want, and all of a sudden I'm magically going to do everything with it. Maybe in ten years, but it's going to take many hundreds of iterations to get there.
[00:37:41] Chapter 10: Reflections on Company DNA and AI Integration
Mirko Novakovic: And on the other side, and that's part of where you are with your product, we are creating more and more data right now. I don't see us creating less data. With microservices, with clouds growing to ever higher scale, with more tools, even with LLMs, we're creating more data. So at the moment I really can't see how AI alone solves the problem of this massive amount of data that you send into those pipelines, where you then have to decide where to send it, what to keep, and how to optimize your cost structure.
Ozan Unlu: I think, luckily, token costs are coming down, but realistically we're still orders of magnitude away from just saying: okay, it's now cheap enough that I can send a petabyte of data into an LLM and say, hey, if you see something wrong, let me know. We might get there, maybe; who knows, with quantum computing and solar and everything, who knows where we're going to go? But not this year, not next year, probably not the year after that. At some point in the future, though, I'm sure we'll get there.
Mirko Novakovic: Perfect. It was super fun talking to you. I think this whole pipeline category is very interesting, and I like it a lot.
[00:39:01] Chapter 11: Conclusion and Future Prospects
Ozan Unlu: I'm excited about Edge Delta plus Dash0. Obviously, we're a pipeline product and you guys are a phenomenal observability product; I think we can do a lot of great things together. So thank you so much for having me on, Mirko. It's been a pleasure. It's great to finally meet you, and I'm excited for what we can do together.
Mirko Novakovic: Thanks for listening. I'm always sharing new insights and knowledge about observability on LinkedIn; you can follow me there for more. The podcast is produced by Dash0. We make observability easy for every developer.