Episode 3 | 45 mins | 7/11/2024

Monitoring in Milliseconds: Real-Time Observability and Preventing Outage Pain

Host: Mirko Novakovic
Guest: Geeta Schmidt
#3 - Monitoring in Milliseconds: Real-Time Observability and Preventing Outage Pain with Geeta Schmidt

About this Episode

Pioneer of index-free storage and co-founder of Humio Geeta Schmidt clues us in on Humio’s product DNA, explains how observability overlaps with cybersecurity and reveals what she’s looking for as a new industry investor.

Transcription

[00:00:00] Chapter 1: Introduction and Defining Code RED Moments

Mirko Novakovic: Hello everybody. I'm Mirko, and welcome to Code RED: "Code" because we are talking about code, and "RED" stands for requests, errors and duration, the core metrics of observability. On this podcast, we will hear from leaders around our industry about what they are building, what's next in observability, and what you can do today to avoid your next outage. Today, my guest is Geeta Schmidt. Geeta was co-founder and CEO of Humio, which CrowdStrike acquired in 2021. Humio pioneered index-free storage. Geeta is a well-respected leader in the observability space. I'm excited to have her today. Geeta, welcome to Code RED.

Geeta Schmidt: Thank you, and thanks for hosting this, Mirko. It's been a while, so it's good to see you in entrepreneurial mode again and starting something new.

Mirko Novakovic: Absolutely. We will also talk about your new career, which I've read about, but we always start with the first question. What was your biggest Code RED moment in your career?

Geeta Schmidt: Oh man. You know, that's a tough one because, although we've had a lot of big code reds in Humio days, I would say the big code red is really what many of our customers have faced. Very early on in Humio days, we got an early customer who ran into an outage, essentially. We happened to be at a conference where they were, and they really didn't know. They had just started using an observability, logging and metrics tool, they were new to this area, and they were really sort of lost. And this is where the reality of context really could help, where we could help them guide the questions and get the context quickly. Also realizing that Code RED is red when you know nothing. As time goes on, you get more and more red. The team gets more and more worried, with the lack of context and the lack of signals. So one of the things we were able to do was pull them out of red, right? And I think that when companies understand when they're maybe getting close to red, that's where observability sort of kicks in. It's that habit of understanding what is normal, which is, you know, green. It could be yellow, it could be "okay, this is going to be red for a bit." So it's really this idea of what is comfortable, what is the norm, what is outside of the norm.

Geeta Schmidt: And many times this is where I see that observability, and understanding observability in your team, matters. And so that's where the reds have come up. And this is one incident. There have been tons of code reds. I mean, you know them, I know them. I think anyone who's listening to this podcast probably knows a Code RED feeling. And I think it's a physical feeling as well as an emotional feeling. So for teams that are trying to solve a Code RED, it's not only the tooling and the product, it's the "oh my gosh, I need to figure out what is wrong," and almost a pain in the gut of the team that's either trying to solve it or the product that's maybe caused it, or all of the above. And so red, which is a really great name for this podcast, is something that I think everyone in the industry has felt at some point in time. And when you feel it, it's actually a physical feeling.

Mirko Novakovic: It is really interesting.

[00:03:46] Chapter 2: The Emergence of Observability

Mirko Novakovic: Because I was just remembering a big outage I had, and it really gets physical immediately. Right? Because you feel that pain when the system is down. Everybody is kind of looking at you and you try to solve it. And you have absolutely no idea, right? And then you start looking around and you need tools to support you, but also the knowledge. And yeah, it's stressful. Right. Absolutely stressful.

Geeta Schmidt: Yeah. And I think one of the things that adds to the stress is the lack of visibility, the lack of understanding, being unable to communicate effectively with your team and share the information properly. So all of these elements are happening all the time. And if you're not able to do some of these things, the stress level increases, which also does not improve your ability to, you know, get your service back online or stop an incident or get back to normal. And I think about that a lot, because oftentimes we've talked about observability as being sort of a stack or three pillars or something simplistic. And it really isn't that. It's almost like you've got someone coming into the ER and you're trying to use all of the context that you have as they come in to address the situation, stop the pain as quickly as possible, and then realise what your long-term plan is to solve the problem. And if doctors got, you know, totally crazy and chaotic and couldn't really read the signals, they'd have a very hard time saving your life. Same thing.

Mirko Novakovic: Absolutely.

Mirko Novakovic: Yeah. Let's talk about Humio. I remember you basically started almost at the same time. I think you started a little bit later than us at Instana. We started in 2015. I think you started in 2016 or somewhere around there, a year after?

Geeta Schmidt: Yeah, 16, a year after.

Mirko Novakovic: And at that time, there was no observability. Right? We were in separate categories. Instana was an APM vendor and Humio was a logging vendor, a log management vendor. And I remember when my CTO, Pavlo, came to me because he knew Kresten, your CTO, and said, "Oh, Humio is building this cool new index-free storage." And I was like, oh.

Geeta Schmidt: What's that got to do with the rest of what we're doing, etc. Yeah, yeah. So I think, you know, it's funny, it doesn't actually seem that long ago. But yeah, you're right. Observability was not really a word or a title as it is nowadays, or a culture. It was very new. There were some early, let's say, Silicon Valley companies talking about it in terms of managing quite complex microservices and cloud native infrastructures. And so you really saw the first signs of thinking about logging, metrics and tracing, and understanding the health of your own service, because many times these teams were running these products themselves. So if you built an application or service, you were responsible for the production and the runtime of these products. And so that's, I think, where we started hearing "observability." It was really not this "I'm a developer, I write something, throw it over the wall and hope it works." No, it was more like you were responsible for that product. And so that's where we started hearing it most from our side. In our own experience, we built Humio out of running some pretty large public infrastructure products that we built at Trifork as a consulting company. So we built the national health care system that tracks all medicines for all citizens in Denmark. And by doing that, we built it, but we also ran it. And it needed to run 24/7. It couldn't go down. That meant, you know, if you went in and needed your prescription, you wouldn't get it, or you wouldn't get your surgery because we couldn't see what you had, etc.

Geeta Schmidt: So we realized this was the first time where we really worked on an incredibly mission-critical product. And 24/7 really meant something to us. And that was where this whole idea of having access to data, access to context, came true, at least for our team, from an observability point of view, from the logs and from the data that we were using when we were using, you know, other products. But we found that some of those products were bad at doing some things, like bad at letting you put lots of data in them. They get difficult to query when your data volumes increase, and oftentimes your data volumes increase not because you chose it, but just because activity grew or, you know, for positive reasons. So you want to be able to capture that. So we started thinking about how we make it more efficient. How do we actually think about querying on the fly? Right. Just as the data is coming in. And so those are the early thoughts around index-free logging, and also really storing data efficiently for long-term storage. Because most of the data you're not interacting with all the time. Usually with logging systems it's a few folks running and querying them. So most of the querying is happening, you know, in a time window. But then you do want to store this data so you can potentially go back in time.

[00:09:29] Chapter 3: Humio's Index-Free Storage and Data Handling Innovations

Geeta Schmidt: And so the queries that really needed speed are the ones around, like maybe, you know, the last half hour and the next half hour. This sort of time frame is what you're looking at. But long-term storage ended up being really interesting too, because you find patterns, you find things you want to go back to. There are also compliance reasons you want to go back for. So, long story short, we started to think about the problem as we saw the future, which was astronomical growth in the amount of data that people want to store. And the only way to do it efficiently is to not put indexes in the front and to let stuff come in as fast as possible, because that was where the backup was, really. With any sort of standard approach, a lot of structure in the front end is going to stop data coming in. So we tried to make it as clean as possible and make it very nice to be able to query it out with speed. So index-free, yes, it's no longer novel, let's say, because that's where a lot of the world has gone. But at the time, looking back, no one was thinking about this astronomical growth that was coming. And now you can see that Kubernetes and microservices and cloud native and even Covid have exacerbated the amount of data we're looking at. So now we're in a completely new chapter as we fast forward to here, time-wise.

Mirko Novakovic: No, absolutely. At first, when I heard "index-free," for somebody who's in computer science and has worked with databases for a long time, it sounded pretty crazy, right? How should that work? Index-free. That must be super slow. But then I heard, oh, it's actually super fast and it can scale. And everybody knows that an index is actually a problem, right? It's a problem when you scale. It's a problem when you add nodes to your system. And it's also a problem because if a system relies on indices, then you can only query what you have indexed, right? Because everything else is super slow.

Geeta Schmidt: And go back and reindex. And the problem is, most of the time when you're doing a lot of deep discovery in logs... in metrics, you're getting all of that, and the structure is what you actually need when you're looking at metrics. But in logging, you're going there because you're really trying to search for something you didn't know you needed to search for. And ultimately, that's the problem. Trying to sit there and guess at all of the various parameters you would want to be searching on, that's usually where it breaks, right? Because the unknown hits, the really painful outages, the really painful problems are the ones you couldn't predict, the ones that you need to search and find quickly. And that's that index-free feel that is completely like, "Okay, I can just free-text search if I want to." Right. And that was a very different feel than, let's say, some other products like Elastic or others that are actually very heavily indexed. The other thing is the speed, where people were really needing a system in an outage, but they had to wait for the data to be indexed.
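The trade-off described here can be sketched in a few lines. This is a minimal illustration only, not Humio's actual storage engine: the `LogEvent` type, the `key=value` log format and the function names are all assumptions made for the example. It contrasts an index built for one pre-chosen field with a brute-force free-text scan over a time window.

```python
from dataclasses import dataclass


@dataclass
class LogEvent:
    timestamp: int  # seconds since an arbitrary epoch (illustrative)
    raw: str        # the unparsed log line


def build_field_index(events, field):
    """Index-first approach: map each value of ONE pre-chosen field to its events."""
    index = {}
    for event in events:
        for token in event.raw.split():
            key, _, value = token.partition("=")
            if key == field and value:
                index.setdefault(value, []).append(event)
    return index


def scan(events, start, end, needle):
    """Index-free approach: free-text search over every event in a time window."""
    return [e for e in events if start <= e.timestamp < end and needle in e.raw]


events = [
    LogEvent(100, "level=info service=checkout msg=started"),
    LogEvent(160, "level=error service=checkout msg=timeout upstream=payments"),
    LogEvent(220, "level=error service=search msg=oom"),
]

# The index answers only the question that was anticipated at ingest time.
by_service = build_field_index(events, "service")
checkout_events = by_service["checkout"]

# Nobody indexed "upstream", so the index cannot help; a scan still can.
hits = scan(events, start=120, end=300, needle="upstream=payments")
```

The scan does more work per query, which is why restricting it to a time window matters; in exchange, nothing has to be decided or built before the data can be searched.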

Geeta Schmidt: And just imagine, back to your Code RED feeling, you know something is wrong. You know your system is out, you can't transact customers, but you need to wait ten minutes to get the data in. And that's where that feeling of discomfort, or your gut, is really kind of like, "Okay, there's got to be a better way to do this." So this was ultimately the problem we were trying to solve from a data perspective. And again, to be quite honest with you, I think we were looking at one angle of observability. You all were looking at another angle of observability. And I think observability had really not been defined in any real way. There were some conversations in the Valley. But I don't think many companies were really thinking about it as they do today. There was, as you said, APM, which was where you guys were playing. Right?

Mirko Novakovic: Yeah. And what we did, when we saw that people were asking for logs, right? We didn't have that in Instana. So we did an integration with you, right? Yeah. I remember we had it very, very well integrated into the product so that we could solve that. But it also comes with some problems. Right? Because it's hard to query both then, because they are basically different systems. It's more integration. But it was a step in the transition of classical tools like yours or ours, which were really good in one category, to platforms. Right. Platforms in observability. And that gets me to my next question here, about CrowdStrike, because it's not only observability: we see platforms like Datadog or also Dynatrace moving more and more into the security space. And I remember pretty well, I had sold Instana, and I was really exhausted, and I was on an island, and you called me and you were telling me that you were in discussions about being acquired and asked me what I thought about it. And I was surprised at that time that a security company, and probably the leader in cybersecurity, was looking at an observability logging tool.

[00:14:58] Chapter 4: Transition to Security and the Acquisition by CrowdStrike

Mirko Novakovic: So what was the rationale for CrowdStrike to acquire Humio? I think today it's kind of more obvious. But at that time it was also a bold move somehow for CrowdStrike to acquire you and integrate it into the platform.

Geeta Schmidt: Prior to the conversation that you and I had, we had started really pivoting into security use cases, very large security use cases with data volumes that were quite large and growing, in banking or the public sector. And many of those were also customers of CrowdStrike. And many of these large customers had quite a bit of endpoint data that they were sending to, let's say, a logging system or a SIEM solution. So what was happening was, since we were very efficient to run, we were turning into a very interesting platform for security users, SecOps teams, where they could get millisecond results on their queries. Using some of our competitors, they would make a query on terabytes of data a day and they would just go to lunch, come back, and hope the query had run. Because, unlike maybe the developer or DevOps use cases, security is running lots of queries all the time. In terms of this idea of what is normal and what's outside of normal, that's what security teams are always caring about. They're always looking for the anomalies or the potential exceptions or those incidents that are about to happen. So there are a lot of queries happening daily on large volumes of data. And they can't wait. If they find an exception, they can't wait to drill down because, you know, seconds or minutes of an intruder in an environment is lethal. I mean, it's basically lethal from a system perspective and a security health perspective. So speed meant so much to security teams. So this millisecond query speed on, you know, terabytes of data was incredibly addictive for security teams.

Geeta Schmidt: So we started really moving from observability customers, which is where our background came from, into many security use cases, and that's where we met CrowdStrike out in the wild, I would say. And the conversation started about, you know, the compatibility being there. CrowdStrike creates lots of data, lots of endpoint data that is interesting for their customers to look at. But they were unable to store most of it for longer periods of time, so it was really getting pushed out to other SIEM systems. So the ideal would be to take a Humio, or now it's called LogScale, but essentially an integrated platform where you can store the data, their own endpoint data and data from other products. As you know, the platform has expanded to cloud, applications, etc. And really we're forming this next-generation SIEM. The legacy SIEMs, as you call them, the early SIEMs, have really been around putting all this data together, event management. But there have also been a lot of struggles with the ability to correlate and search the data quickly. And that is one of the things where, working with CrowdStrike as well as partners of CrowdStrike, we're able to put a lot of that data together. So this vision or this idea is something we actually saw some of our leading customers doing, I would say. And we realized that if we make this easier for a larger number of companies, they will be interested. And so this has obviously been a progression. But when you actually see CrowdStrike talking about next-generation SIEM, it is Humio in the back end running that for our customers.

Mirko Novakovic: Yeah. It's interesting. So that basically means that you started in the observability space, but then the customers used your tool for security use cases because of its capabilities: being fast, being able to store a lot of data and querying that data fast.

Geeta Schmidt: Yeah. I mean, interestingly enough, CrowdStrike has really built a phenomenal, let's say, security observability platform. There are a lot of similarities in the problems that we're trying to deal with. Whether we're a SecOps team or a DevOps team, you know, we are both in Code RED situations. Right. And the context and visualization, the interaction with the information, the ability to go to the next steps or really shut something down or fix it or do some type of automated change, etc. Those are all things that we're actually doing in both workflows. Although our tooling is somewhat different, right. It's kind of funny that way when we think about it. But our tooling is different because oftentimes developers are pretty sharp about building their own stack. Right. You can just take some things that are open source and maybe pull them together and write some parsers and do lots of things that we're comfortable with doing. So oftentimes, maybe fortunately or unfortunately, developers have gotten used to pulling together their own observability stack. If you look at security, they don't have time for that. They don't have the skill set for that oftentimes.

Geeta Schmidt: So it's rare that you will see a security engineer be very good at wanting to do all of this stuff. They actually want to do their work, which is really to keep the company secure and make sure that they've got the right processes in place. They take that very seriously. So they expect a lot from the stack, and the stack should work well together. The platform should work well together, et cetera. So that's one of the reasons why you see more of this, let's say, melting of functionality. As we say in observability land, you know, the logs, metrics, traces: there is essentially that in security as well, and also incident response, etc., all built in, although it's oftentimes much better integrated because they need to resolve things very quickly. So I think there's a lot that we could probably learn on the developer side, on our own DevOps side and observability side, from the security side or the security use cases, because they have actually cared a lot more about the entire experience of the workflow working very quickly.

[00:21:51] Chapter 5: Integration Challenges and the Future of Observability

Mirko Novakovic: I mean, you are spot on. I have interviewed around 100 potential customers in the last few weeks, and I actually did an analysis of how many of them use out-of-the-box tooling and how many build their own. And it's still around 60%.

Geeta Schmidt: Yeah.

Mirko Novakovic: In small businesses and large businesses, who are building their own observability stack, as you said. Right. Developers are just used to it. And to be honest, I was a developer. It's also a lot of fun building it, somehow.

Geeta Schmidt: So it's something that you can play with until it's not fun anymore. There's a point where it's not fun, when everybody wants you to support this thing and suddenly it's, "Oh, that was kind of fun for a bit. But maybe it's not, you know, my career for the rest of my life." And that happens quite a bit. And sometimes it just happens that the need for observability and for standardization across the company means more. You know, sometimes it's like, "Oh, let's just standardize on one stack," for example. So they'll be like, "Oh, well, Mirko built something. Let's see if Mirko wants to roll this out through the company." And then you get a new job, and that's hard.

Mirko Novakovic: Yeah, absolutely. But in the end I see a lot that, at some point, scaling observability data is a pretty tough thing to do. So at that point, a lot of people just switch to a solution that can just scale, because we see even small customers sending us hundreds of gigabytes a day of logging. And that's small, right? Bigger customers send terabytes. Maybe you get to petabytes of data, which is enormous. Right. That really exploded in recent years. And that's not something you just do as a side project. You either have a super dedicated team doing that for you, or you have to rely on a vendor who does that. Right.

Geeta Schmidt: Yeah. And I think this goes back to, you know, this functionality. So one of the things that, for example, CrowdStrike and other vendors are also good at is that there's this whole integrated interface, right? In terms of really coming in and getting a full view, in a graphical interface, of what your systems look like and the security health of those. Being able to jump into metrics when you're seeing things, you know, diving into the data or the logs, or the traces. So I'm trying to equate that, in some senses, with what I've really run into many times with customers, which has been like, "Well, we've standardized on EKS for tracing, so we can't move off of that." And so you end up having a lot of people who feel like, "Oh, I'm never going to jump over to our tracing product because that's a pain." So this whole integration between three separate products can be very ugly. I mean, to be quite honest with you, Instana and Humio did a really nice integration, but we put work into it. Right. And I think for the user it is really painful. Like, "Oh wait, I've got to do this," and then you've got three different consoles and it's messy.

Mirko Novakovic: You also cannot bring the data together. Right. That's also pretty messy. Or you set up a data warehouse that does that for you, so you have another thing to maintain. But it is super painful. Yeah. We call that Franken monitoring. Right.

Geeta Schmidt: I just talked to somebody this morning about how he left his Franken monitoring. And now there are six people trying to keep the Franken monitoring together.

Geeta Schmidt: So as you mentioned, sometimes you leave this project that maybe one person magically knew how to run and make work. And now it needs to be monitored and needs to have several teams, etc. So yes, I think the answer is there's got to be a bit more of the merging, given the volumes we're looking at. You know, what I'd love to be able to do for the developers and operations and SRE teams is to get them context lightning fast. Right. And it's both this whole integration of these platforms, but also just getting the data in, in a nice way. And I know this sounds again like an area where developers, fortunately or unfortunately, have potentially enjoyed it, but not that much. That's almost the most painful part. It's figuring out how to pull the right data into your system and make it work. If we could skip all that and just get them the data in context so they can start working on day one with a wonderful onboarding experience, what a wonderful way to get started, right? There could be other things that need to be pulled in over time, but getting the customer or a user to be able to just work with their data and their metrics from day one would be just phenomenal. And I think that's one of the things that security does much better, right now at least.

Geeta Schmidt: And there are new things coming that could be better around OpenTelemetry, obviously an interesting way to do that. I think the future is going to be around caring much more about the company's data pipelines and understanding how they're using them.

Mirko Novakovic: Yeah. And you just mentioned OpenTelemetry. I mean, obviously we are based on OpenTelemetry, and OpenTelemetry has a really good concept for context, right? They provide at the moment three signals: logs, metrics and traces. But they have this concept of resources, and having semantic conventions for that, that brings all that together. And just today we were looking at competitors and looking at metrics, at what's called the Metric Explorer. And it was interesting to see that most tools still have a very flat idea of metrics. So you basically have to know what the name of a metric is to find it. They don't have a notion that there is a Kubernetes cluster, and there's a pod in there, and you want to take that pod and you want to take a metric of that pod, because the pod is already the context for a set of metrics. It's just not there. Right. And now think about the logs and the traces. Everything has context. And I totally agree that context is one of the really big missing pieces in observability. And I'm not so familiar with the security space, but I can imagine that in security this is really important, right, to figure out things and bring different data sources together to find threats.
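The difference between a flat metric list and metrics grouped under a resource can be sketched with plain Python data. This is an illustration under stated assumptions, not any vendor's Metric Explorer: the attribute keys `k8s.cluster.name` and `k8s.pod.name` follow OpenTelemetry's Kubernetes semantic conventions, while the metric names, values and the two helper functions are made up for the example.

```python
from collections import defaultdict

# Each data point carries the resource attributes of whatever emitted it.
# Metric names and values below are hypothetical.
points = [
    {"name": "cpu.usage", "value": 0.42,
     "resource": {"k8s.cluster.name": "prod", "k8s.pod.name": "checkout-7d9f"}},
    {"name": "memory.usage", "value": 512,
     "resource": {"k8s.cluster.name": "prod", "k8s.pod.name": "checkout-7d9f"}},
    {"name": "cpu.usage", "value": 0.10,
     "resource": {"k8s.cluster.name": "prod", "k8s.pod.name": "search-5c2a"}},
]


def flat_lookup(points, metric_name):
    """Flat model: you must already know the metric's exact name."""
    return [p for p in points if p["name"] == metric_name]


def metrics_by_resource(points, attribute_key):
    """Context model: group metric names under the resource that emitted them."""
    grouped = defaultdict(set)
    for point in points:
        resource_id = point["resource"].get(attribute_key)
        if resource_id is not None:
            grouped[resource_id].add(point["name"])
    return grouped


# Start from the pod and discover which metrics exist for it.
pods = metrics_by_resource(points, "k8s.pod.name")
checkout_metrics = pods["checkout-7d9f"]
```

With the grouped view, a user can start from the pod and discover its metrics instead of having to remember metric names first.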

Geeta Schmidt: Yeah. I mean, one of the things is also just data mapping, and those are really important things because you've got so many different types of data coming into a security system. In developer land, you could also have that. But this whole common data model problem is actually a big problem. So there's this whole thing around how we get this data clean, essentially, so that it's able to be manipulated in the right way. And I don't know why, but I think we've been less good at doing this until OpenTelemetry really came onto the scene, and I think that this has really come out of pain, to be quite honest with you, because you don't want to go in and adopt some vendor's model or go for a third-party pipeline from a company that may or may not be around and support another product, etc. So here I do think this is a massive value to companies, the work that's been put into it so far and the adoption. It's one of the fastest growing and most adopted projects in the cloud native space. So I think this is just something that's really been needed. And then I think what we're seeing now is vendors kind of backtracking and saying, "Oh, wait, okay, we've got to worry about OpenTelemetry, or we've got to support it somehow." Whereas luckily for you, the new product is a little bit more around, "Okay, well, we can start fresh, right?" We can think about these things as a first-class citizen. And how does one take OpenTelemetry and build an observability solution based on the nice new superpowers it could provide, you know, as building blocks? So I think you have a chance to build fresh, if that makes sense. Right?

Mirko Novakovic: Absolutely. And I know that from a vendor perspective, you build your own model and then a new standard comes up. So the first thing you think is that it's a threat, right? Because, oh yeah, it's different than yours. And you have built all these agents, all that IP. And now there is something open source, right?

Geeta Schmidt: And free. Like, you know it yourself, right? You built lots of agents.

Mirko Novakovic: Absolutely. Absolutely. Yeah. Yeah. And we loved it.

Geeta Schmidt: There's a lot of work in it. And it's also, you know, not the most exciting work sometimes. Right. But you need it, because it's sort of success or failure for companies who need that type of data source.

[00:31:28] Chapter 6: Addressing Cost and Efficiency in Observability

Mirko Novakovic: It's also something that's kind of magic, right? Instrumentation automation. And so we enjoyed it. But then you realize that the customers put a lot of pressure on the vendors. Right. Because they actually like that open standard. They like these conventions. And so for most of the vendors today, I think all of them, the next step is that you integrate it, right? You say, "Okay, I can't fight it. So I do an integration layer and map OpenTelemetry to my model so I can integrate that." That's step two. But I think what's happening now, and what we are doing, is really building an OpenTelemetry-native solution, which is built from scratch for the model, for the context, for the resources, for the semantic conventions of OpenTelemetry. And that gives a lot of power to the users, I think, especially in terms of context.
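The "step two" integration layer can be sketched as a simple translation table. Everything vendor-specific here is hypothetical: the `VENDOR_TAG_FOR_OTEL_KEY` mapping and the `to_vendor_tags` function are invented for illustration and describe no real product. Only the attribute keys (`service.name`, `host.name`, `k8s.pod.name`, `k8s.namespace.name`) come from OpenTelemetry's semantic conventions.

```python
# Hypothetical vendor model: a fixed set of tags designed before OpenTelemetry.
VENDOR_TAG_FOR_OTEL_KEY = {
    "service.name": "service",
    "host.name": "host",
}


def to_vendor_tags(resource_attributes):
    """Translate OTel resource attributes into the (hypothetical) vendor model.

    Returns the mapped tags plus the attribute keys that had nowhere to go.
    """
    tags, dropped = {}, []
    for key, value in resource_attributes.items():
        if key in VENDOR_TAG_FOR_OTEL_KEY:
            tags[VENDOR_TAG_FOR_OTEL_KEY[key]] = value
        else:
            dropped.append(key)
    return tags, sorted(dropped)


resource = {
    "service.name": "checkout",
    "host.name": "node-1",
    "k8s.pod.name": "checkout-7d9f",
    "k8s.namespace.name": "shop",
}

tags, dropped = to_vendor_tags(resource)
```

In this sketch the Kubernetes attributes end up in `dropped` because the older model has no slot for them, which is one way context can get lost in a mapping layer; a model built natively on the OpenTelemetry resource would keep the attributes as they arrive.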

Geeta Schmidt: Yeah, I think so. And then, going back to talking about why many of these development or engineering teams have built, let's say, their own Frankenstein shops of observability: a lot has been done because of cost, to bring up that ugly word, as it may be. But I think when you look at the value that vendors provide versus how much it actually costs to do something, that's where vendors could be more innovative, right? I think that's where observability companies can build more innovative products, whether that's better integration, more efficiency, better speed, better scalability. There are a lot of different ways to play. So I don't think, you know, if you're dropping this OpenTelemetry side, there's so much more that still hasn't been solved in observability. And I think if we see the vendor view going more towards that, we'll get some much better products for developers and SREs and infrastructure folks. I think it's been Frankenstein because there's been a lack of choices. So people have just been stacking stuff together. And so that's why it's exciting. Now there is a market, right? Like we said, it didn't exist then. Now there's a pretty real market, and there are some real chances to find new innovative products coming out.

Mirko Novakovic: And just to reflect on that: when I talk to customers, cost is a big issue in observability. I have a lot of customers I talk to who say observability is actually the second biggest thing after the cloud cost. It's like a tax on cloud, like 20%, 30% of your cloud costs as additional spend for observability, which is, as you said, from a value perspective, probably too expensive, right? Especially if you look at how storage has evolved. Right? If you store things in S3 or similar, it's actually pretty cheap. So, yeah, I think that's a reason why a lot of people build it themselves, because they say, oh, the difference between me building it myself and paying for a vendor is just too big.

Geeta Schmidt: Yeah. But see, there again is where innovation can be interesting, in terms of how you can actually build your strategy around cost, you know, reduction or optimization. So, you know, like, how are you actually storing your data long term? Do you actually need long-term storage? That's like one question as well, right. Like, so how long do you need to? And then, you know, what are you storing where? And that's also where, like, this telemetry pipeline, the data pipelines, become quite interesting. And it's funny that you say it is like the number two cost. Like everyone talks about how much time people spend optimizing their cloud costs. But there's not really a formula on how to optimize your observability costs yet. And I think that's going to be sort of this next wave. You're going to be seeing this from, you know, other companies, really seeing like, you know, how did I save money with this observability solution. Right. And oftentimes it's like, all right, can you pull together all the things I had with three vendors, right, like into one platform. Okay. Ta da. That's one thing. You know, have all the data together. That's another thing. I can manage that data, you know, centrally, better. Okay. So you start to see the levers of ownership and control of the observability cost. Oftentimes, if you ask your customers, and you probably run into this as well, it's like, how good are you at managing them? You know, like, well, developers need them. Or suddenly we just had an outage, so we needed to pull in a bunch of logs, or we need to do this or that. And so it feels very out of control. But I'd be curious as to, like, seeing, you know, more of this ability to, you know, think about how you control that. And you can do that better, you know, in a centralized platform that's giving, you know, at least the customer more, you know, FinOps or operations information about how much this costs.

Mirko Novakovic: One of the things we are building is giving you observability for your observability data.

Geeta Schmidt: Yeah, exactly. Yeah. Exactly that.

Mirko Novakovic: Yeah. Because one of the things you do if you want to reduce your cost, you need to know which data is actually accessed, which data is critical. Maybe you are storing hundreds of metrics or logs that nobody is looking at, right? Never. How do you know that? So that's something, I think, where the whole observability space can innovate a lot on how to give the user the power to really decide which data to keep, for how long, and if they want to keep it at all.

Geeta Schmidt: Is that AI driven, Mirko?

Mirko Novakovic: It could, it could at some point. Right. And I think we are at the stage where even very basic things that have nothing to do with AI can help. Right. Are these things accessed? How much are they on dashboards? Did somebody ever query for that data? Right. Those are very simple things. And then with that information, you can make decisions. But yes, at one point, maybe also the context of the data, the content of the data can drive an AI to decide which data is actually critical or not.
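
[Editor's note: The basic, non-AI checks mentioned above (is the data accessed, is it on a dashboard, has anyone queried it) could look something like the sketch below. It is an illustration under stated assumptions, not a description of any product. The `SignalUsage` record, its field names, the 90-day window and the sample inventory are all hypothetical.]

```python
from dataclasses import dataclass


@dataclass
class SignalUsage:
    """Hypothetical usage record for one metric or log stream."""
    name: str
    stored_bytes: int
    queries_last_90d: int   # how often anyone queried this signal
    dashboard_refs: int     # how many dashboards reference it


def unused_signals(usage):
    """Signals nobody queried and no dashboard references."""
    return [
        u for u in usage
        if u.queries_last_90d == 0 and u.dashboard_refs == 0
    ]


def reclaimable_bytes(usage):
    """Storage that could be freed by dropping the unused signals."""
    return sum(u.stored_bytes for u in unused_signals(usage))


# Made-up inventory purely for illustration.
inventory = [
    SignalUsage("http.server.duration", 5_000, queries_last_90d=120, dashboard_refs=3),
    SignalUsage("debug.cache.trace", 40_000, queries_last_90d=0, dashboard_refs=0),
    SignalUsage("jvm.gc.pause", 2_000, queries_last_90d=0, dashboard_refs=1),
]

candidates = [u.name for u in unused_signals(inventory)]
savings = reclaimable_bytes(inventory)
```

[A list like `candidates` is only a starting point for a retention decision; data kept for compliance reasons may be rarely queried and still required.]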

[00:38:02] Chapter 7: The Role of OpenTelemetry and Future Opportunities

Geeta Schmidt: I think there's a lot of situations where people feel comfortable storing everything, so they don't have to think about it, if that makes sense. But it's also because it's hard to think about it. Like it's hard to think like, do I need this or not? Or like, do I want to be in a situation where I didn't have the data? That's always a bit of a horrible situation sometimes. But in, let's say, more on the DevOps versus the security side, you know, in security and more compliance driven situations, you need that data to, let's say, prove something or disprove something that potentially happened. Whereas I think there's a lot more creativity in the developer and observability space around being smart about how you handle the data volumes. But there's all of the things. I mean, it's also just about having, you know, like I said, three different licenses and making the glue work, as you have some kind of integration group that's trying to plan out how all the data is, like, seamed together and the products are seamed together. Potentially another cost, you know, a Frankenstein setup, another cost. So like, you see all of this stuff, and I think this is where it would be really lovely to see more and more innovation on this side, and less so around things like OpenTelemetry, right, or agents, etc. If that becomes more of a solved problem, we can solve some of the harder problems.

Mirko Novakovic: Yeah, and coming to the end here, I want to talk a little bit more about innovation and your new job. We haven't talked about it. I just read about it. But you are switching the sides, right? From an operational startup CEO to going on the side of being an investor. Right. You just became a partner at a venture capital firm. Congrats on that.

Geeta Schmidt: Thank you.

Mirko Novakovic: And I guess you will probably invest in that space too. And so my question would be what are you looking at at the moment. What are the things you think are the next big topics in observability, DevOps, infrastructure, security that you would invest in?

Geeta Schmidt: Well, I think many of the things we talked about today are relevant. I would say cost is incredibly relevant. It was one of the drivers for what we were building. And then the second bit is the experience. I really do think that the experience is, you know, the developer experience or the experience of using the tool. I mean, I like to say, like, if we're successful with the tool, then, you know, teams are using this all day long, right? Like, they're in it. They're avidly using it for, you know, understanding what their applications are doing or their infrastructure is doing, and have that sense of normal. Like, we know when we're going to get to Code RED, we know before we're going to get to Code RED. And I'd like to see more of this, like products that are helping us be more proactive versus reactive as, you know, developers and engineers, etc. Because, you know, the data's not going to reduce, the complexity isn't going to reduce, the new problems that come out with, you know, AI applications, etc. I mean, there's a lot more. The spheres are changing, but the problems are still the same. Like, certain applications are just going to not work for some reasons. Infrastructure is going to stop working. So I would say, like, you know, cost and developer experience are kind of my two things that I'd love to see. Like, how could we find innovative ways to fix that? I also think that we have, you know, had a bunch of products that came out also like Austin, but there were a number of other products that were looking at cloud and cloud based observability and Kubernetes and cloud native.

[00:42:05] Chapter 8: Geeta’s New Venture and the Future of Innovation

Geeta Schmidt: Now that we've gotten past that wave and, you know, there's been a massive, actual, you know, real adoption of cloud native infrastructures, now we have a new set of problems that we're looking at. And like, you know, the question is really like, how do vendors solve this next new kind of, you know, what are the new enterprises looking at? What are new companies looking at? So like thinking Kubernetes first and cloud native, you know, you're thinking of a completely different sort of environment than we were looking at maybe just even in, you know, '15 and '16. And I like companies that are caring about, like, not just today but where the world is going. And I think that's exciting. So, you know, moving from being in the space to seeing things that are exciting, you know, oftentimes because if you've been in the space, you're probably pretty critical of solutions, you know, because you've seen how hard it is to really win. And it's a very busy market. You know, there's a lot of solutions that are, you know, trying to solve things. And I really am looking for things that kind of, you know, have an opinion and flip sort of a thinking that's much more focused towards the next stage of observability that we're going into.

Mirko Novakovic: I like that a lot, especially looking forward. Right. And thinking differently, out of the box. Right.

Geeta Schmidt: Yeah. I don't think like Humio and Instana would have gotten very far if we didn't think kind of crazy thoughts. Right. And those were kind of a bit crazy then. And so like, I'd like to see what's crazy now. Right. Like that would be kind of fun to look at, because we have a new chapter that we're going into.

Mirko Novakovic: That was really a pleasure talking to you. I could talk forever. I really also like the aspect. I was thinking about it all the time, the cultural aspect. And this coming back to the intro right where you said it's a physical feeling. And I think that's why we are building tools, right? We want to help developers.

Geeta Schmidt: Pain.

Mirko Novakovic: People to stop the pain. Really? Really. It's really about stopping the pain. And it's not easy to do that, but it's good that you mentioned that.

Geeta Schmidt: Well, I think one of the things, Mirko, is you probably feel the same pain when your customers feel the pain. So you, you know, you know exactly what that feels like. And that's why you probably built the right things. And I think that's the kind of mentality the next generation of observability teams or products should have. It's like, yeah, that pain bit is very real. So let's try to get rid of that.

[00:44:51] Chapter 9: Conclusion and Closing Remarks

Mirko Novakovic: Absolutely. So thank you, Geeta. Thanks for listening. I'm always sharing new insight and insider knowledge about observability on LinkedIn. You can follow me there for more. The podcast is produced by Dash0. We make observability easy for every developer.

Mirko Novakovic: Thank you.
