host

Mirko Novakovic

guest

Kaspar von Grünberg

Episode 10 · 36 mins · 10/10/2024

#10 - Engineering Developer Success: Platform Architecture and the Rise of the IDP with Kaspar von Grünberg

About this Episode

Humanitec CEO Kaspar von Grünberg joins Dash0’s Mirko Novakovic to break down the origins and importance of the Internal Developer Platform, when your engineering team should consider building an IDP and why platform engineer is a key job for the future.

Transcription

[00:00:00] Chapter 1: Introduction and Setting the Stage

Mirko Novakovic: Hello everybody. My name is Mirko Novakovic. I am the co-founder and CEO of Dash0. And welcome to Code RED: Code because we are talking about code, and RED stands for requests, errors and duration, the core metrics of observability. On this podcast, you will hear from leaders around our industry about what they are building, what's next in observability, and what you can do today to avoid your next outage. Today my guest is Kaspar von Grünberg. Kaspar is an early pioneer in platform engineering. He has been building IDPs at scale over the last decade, and he actually coined the term IDP, which we will talk about. As the founder and CEO of Humanitec, he's part of the team behind Score and the Platform Orchestrator. I'm really excited to talk to Kaspar today. And Kaspar, welcome to Code RED.

Kaspar von Grünberg: Thank you so much, Mirko. Thank you for having me.

Mirko Novakovic: Yeah. And you're the first one who's not a CEO of an observability company, but a platform engineering company. So I still have to ask you the first question, which is what was your Code RED moment?

[00:01:07] Chapter 2: Kaspar's Code RED Moment

Kaspar von Grünberg: Right. My Code RED moment. I mean, I had, you know, a couple of fun ones. Like, we once didn't properly protect our production Postgres database for the last software I built, and then, you know, some fun guys somewhere in Nigeria held our data captive. And we had to backwards deduce this, we didn't realize it at first. It was a huge disaster and very, very embarrassing because we missed the basic 101. That was not the Code RED moment that led us to build Humanitec. That moment was when one of our DevOps engineers came to us in 2015 and said, like, hey, I would like to have triple the salary, no, double the salary I have right now. He was at 60K, which for Berlin in 2015 was already a lot of money. And then he asked for 120. And I turned to my CTO and said, like, Greg, why the hell should I do this? Like, that's usually the moment where I say, hey, goodbye, you're probably not meeting our cultural values. And Greg said, like, yeah, unfortunately we don't know how to deploy if that guy leaves. So I'm afraid we have to double the salary.

Kaspar von Grünberg: And, you know, we ended up doubling the salary. And as always, you know, three months later, the guy leaves anyway. And we weren't able to deploy for a while. So, you know, I've been following best practices in software engineering for a long time now, but that was really the moment where I thought, like, you can't have that much dependency on one person. At some point you need to react to this. You need to know what's going on, right, the theme of observability. And you need to make sure that you can act on the stuff that you are learning. And if that depends on one human being who is not in the time zone where you need them to be at the moment, or that person is holding you hostage, then the best observability doesn't really help you, because you somehow need to react to that. And that was the moment where we said, okay, that's probably a more fundamental problem in the market.

[00:03:10] Chapter 3: Understanding IDPs and Platform Engineering

Mirko Novakovic: That's already a good problem to start a company, right? Yeah, yeah. And I think there are two terms I really want to discuss here, which I found really interesting. One is IDP, which stands for Internal Developer Platform. And the other is the more general term, and I would say a pretty popular term at the moment, platform engineering and platform engineers. It's a whole topic. I'm really excited about talking to you because you are the expert here, and I have a lot of questions. But let's start with IDP, because I think that's where you started also, right? As a disclaimer, I was an angel investor in Humanitec, and at the time we talked, you basically explained it to me. If I remember correctly, you said, hey, all the big companies like Google, the big tech companies, they have a developer platform, which is essentially, as far as I understood it, a self-service platform for developers to quickly set up their environment and deploy their application, basically without having to maintain all the infrastructure pieces, but also the process around it, including deployments, etc. You said, hey, this will actually become the standard for all companies, right? Because especially small companies don't want to build their own environment. And so you founded Humanitec to build IDPs, correct?

Kaspar von Grünberg: So we founded Humanitec, and Humanitec has a really, really interesting story because, I always say, it's a little bit like we are developing engines for trains, but tracks weren't invented and trains weren't invented either. So there was a lot to do. And there's still a lot to do. Like, it's a pretty heavy lift. We are developing tools that help platform engineers build these platforms. So the key observation that I shared back then was: when cloud native came up, we had that huge shift left. Right. In 2008 you have Werner Vogels, who was the CTO at Amazon at that point, proclaim that developers should build it and run it. And that was sort of his big statement. And the entire industry followed over the next year. So when everybody made the move into cloud native, they said, okay, the developers should actually own the Amazon account and all of these things. Now, unfortunately, what most people don't know is that Amazon was a really, really tiny engineering organization. You know, at least from the perspective of the engineering organizations that you and I probably work with, more in the dozens and hundreds of developers. And Werner Vogels had never led a large engineering organization like that. He is, like, an amazing guy, great CTO. But at that point he was a scientist, right? And he had just started to manage a couple dozen engineers. And so that idea of you build it, you run it is a really cute idea if you have a couple of engineers, but it's a pretty dangerous idea, actually.

Kaspar von Grünberg: If you have hundreds of engineers, because people are leaving all the time and you have security concerns. And so when you looked at the progression in the early 2010s, right, you could see that five years after the big tech companies jumped on that train wagon, they realized that's probably not working. We have to structure this more. We have to treat the supply chain, if you want, the digital factory that developers work in when they're building their application, more as a product. That idea of treating that as a product. And you can see this in America, you know, with Google, obviously, you can see this with Netflix. In Europe, you can see this with Zalando building a pretty large platform. And I was going back and forth with Jason Warner. Jason was the VP of Engineering at Heroku. And Heroku is a platform as a service, so that's sort of a fully managed platform. And he then became the CTO at GitHub in 2017, I think. And he looked at the setup in GitHub, so the actual developers building GitHub, and said, you know, we can probably speed this up and structure this a little better if we had Heroku. Unfortunately GitHub can't use Heroku because of, you know, scale, and it needs more flexibility in configuration. And so they started to build something like an internal Heroku. And so this came up. My colleague and co-founder Chris had built one of the platforms at Google.

Kaspar von Grünberg: And we said, okay, if the trend is that many, many organizations are going to build their own internal Heroku, their own internal platform, then they probably need the tools and the back-end systems and the front ends and the portals to build these platforms. Also, at some point we said, well, we probably should give the child a name. So we literally sat down and said, if an organization is building a platform for their developers, what do we call that? And, I mean, this is so funny to look back on because so many people are using this now. But we literally debated, you know, should we call it a developer platform? And we decided against it because back then the API platforms of Facebook and Twilio and stuff were also called developer platforms. So we said, okay, it's for internal developers mostly, let's call it internal developer platform. And, you know, that continued. There was this relatively small circle of people, you know, including you, debating the topic of, you know, what are the practitioners building these platforms called? And a guy next to me said, like, well, let's call them platform engineers. So now this year, you know, it's on nine hype cycles with Gartner and mentioned everywhere. And we have hundreds of people signing up to get certified as platform engineers. So that train is running. But it's actually, you know, we're in the very early days and we're also figuring stuff out as we go.

[00:08:47] Chapter 4: The Evolution of DevOps to Platform Engineering

Mirko Novakovic: Okay. But there are many words. And I have to be honest, I'm always struggling a little bit with the words, right? So I remember, I mean, I've been in this industry a long time, there was this big trend of DevOps, right? Before that you had the developers and then you had ops. And at that time, that's when I programmed, right? I programmed stuff on my laptop, then I threw it over a wall and I didn't care anymore. Right. It was not my problem to run the application. And so essentially with DevOps there was already a shift left, right? You try to bring the teams together: you code it, you run it. So you give the responsibility to a team which was a mixture of development and operations. And then I remember this famous Site Reliability Engineering book from Google came out, where essentially they said, hey, you need engineers running or operating because you need a lot of automation. You need a lot of code. You need them to understand observability built in. And so stuff like Terraform came up, right, where you had a lot of automation, infrastructure as code. So that's kind of what I understand. And what you are saying is that then, as far as I understood, a standardization happens. So we figured out that you need the internal developer platform. This is kind of the standardization, so that not everybody has to do all the manual work. And those engineers building those platforms are now platform engineers. Can you help me get this together somehow? It's so many words and so many different words.

Kaspar von Grünberg: Yeah, absolutely. Yeah. That's the progression. You have the throw-over-the-wall situation. Right. Then you have that idea of: this isn't working, let's deconstruct this. Like anarchy. Everybody does everything. Like, literally, and that still happens, right? You just give your team an Amazon account and say, do whatever you want. Now, do whatever you want, that's okay for the beginning. But somebody has to, you know, pick up the pieces, make sure that the configs are, you know, streamlined. I mean, my litmus test, if you want, is I ask the organization: can you tell me how many Postgres databases you have running for staging environments? They return, like, we have 200. Okay. How many materially different ways do you have to configure those 200 Postgres databases? They say, like, nah, we don't know, 200. Like, every developer is doing their own thing. So then I always ask them, okay, as engineers, let's take a step back. How many ways are there to configure Postgres databases in staging? Like, RDS has like 195 different factors you can tune, but there are really like three, four, five different ways of configuring RDS in staging. So then you say, okay, as engineers we agree that there are only five ways of configuring it in staging, but you have 200. Like, what's the reason for the other 195? Because they're creating overhead. Somebody has to maintain them. Somebody has to update them pretty frequently. Somebody has to respond to outages, observe them. All of these things. There's risk. Every configuration you have is risk. So then you have to ask yourselves, well, if you let everybody do everything, nobody will be able to agree on a config standard.
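The litmus test described here can be made concrete with a small sketch. It is illustrative only: the setting names (`instance_class`, `storage_gb`, `backup_days`) and the inventory are hypothetical and not taken from any real estate. The idea is to group staging databases by their effective configuration and count the materially different variants.

```python
from collections import Counter


def config_fingerprint(config: dict) -> tuple:
    """Order-independent, hashable view of one database's settings."""
    return tuple(sorted(config.items()))


def count_distinct_configs(databases: list) -> Counter:
    """Count how many databases share each materially identical configuration."""
    return Counter(config_fingerprint(db["config"]) for db in databases)


# Hypothetical inventory of staging Postgres databases.
inventory = [
    {"name": "orders-db", "config": {"instance_class": "small", "storage_gb": 20, "backup_days": 1}},
    {"name": "users-db", "config": {"backup_days": 1, "instance_class": "small", "storage_gb": 20}},
    {"name": "search-db", "config": {"instance_class": "medium", "storage_gb": 50, "backup_days": 1}},
]

variants = count_distinct_configs(inventory)
print(f"{len(inventory)} databases, {len(variants)} materially different configurations")
```

If the number of variants is close to the number of databases, every team is effectively maintaining its own standard, which is the situation the litmus test is meant to expose.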

[00:12:15] Chapter 5: The Role of IDPs in Standardization and Efficiency

Kaspar von Grünberg: So at some point we need to say, let's take a step back and let's structure this a little more. And let's say there are at least defaults and golden paths, so that if the developer needs a Postgres database for staging, they can pick the default. That default is lifecycle managed. You know, the updates are done for the person. We make sure that observability, you know, let's take for example that the workload has a sidecar container to make sure that everything is picked up, all of these things are enforced. And then if the developer says, well, my application needs something else, I need a sixth way of configuring Postgres, that's okay. Right? They can leave what Netflix calls the golden path, and they can go on their own. But that probably also means that they are then responsible for the SLA and all of these things. So the art of crafting these golden paths, setting these standards, and providing a consistent, well-thought-through system that developers can then use and that drives standardization and developer self-service by design, that's what we call an internal developer platform. And people are sometimes confused because they want to see something tangible, like they have a tangible idea of what that platform looks like. And if I'm looking at the platforms built with Humanitec, then you do see repeatable patterns. They all look more or less the same, but there is not the one absolute IDP, because otherwise you could just use Heroku. So how the platform looks really also depends on the cultural context of the organization.
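A minimal sketch of the golden path idea, under stated assumptions: the defaults table, the setting names, and the `provision` helper are all hypothetical and are not part of Humanitec's products. It only shows the trade described above: take the default and the platform team carries lifecycle management and the SLA, or deviate and own the result yourself.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical platform defaults ("golden paths") keyed by (resource type, environment).
GOLDEN_PATHS = {
    ("postgres", "staging"): {"instance_class": "small", "backup_days": 1},
    ("postgres", "production"): {"instance_class": "large", "backup_days": 30},
}


@dataclass
class ProvisionedResource:
    config: dict
    on_golden_path: bool
    owner: str  # who carries lifecycle management and the SLA


def provision(resource_type: str, environment: str, team: str,
              overrides: Optional[dict] = None) -> ProvisionedResource:
    default = GOLDEN_PATHS[(resource_type, environment)]
    if not overrides:
        # Default path: the platform team manages updates and the SLA.
        return ProvisionedResource(dict(default), True, "platform-team")
    # Off the golden path: allowed, but the requesting team owns the result.
    return ProvisionedResource({**default, **overrides}, False, team)


standard = provision("postgres", "staging", team="checkout")
custom = provision("postgres", "staging", team="checkout",
                   overrides={"instance_class": "medium"})
print(standard.owner, custom.owner)
```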

Mirko Novakovic: And is it something you would say you need a certain size to get started with? So I don't know a minimum number of developers or when do you think you should start with your IDP and platform?

[00:14:00] Chapter 6: When to Implement an IDP

Kaspar von Grünberg: I don't think you should do that if you have fewer than 50, maybe even 100 developers, because I just think if you're a small team, you know, everybody should do everything and you should have short communication paths. You're a startup. You shouldn't restrict yourself with too much standardization. You should move fast and bash things out. You know, I always think you should consider using Heroku as long as possible, right? There's really nothing wrong with that. You don't lose your honor if you're using Heroku. It's actually a pretty good tool. And then as you're growing larger, or if you tell me, hey, we're going to double our engineering team in size, we're 50 developers now, that's probably a good time to start considering this. But the majority of the organizations that work with us are in the range of 250 developers plus, into the, you know, hundreds of developers.

Mirko Novakovic: Yeah, that makes sense. I also think that maybe it also depends for startups. I remember at Instana, when we were growing very fast, one of the problems we had was onboarding, right? So when new developers join and you have, I don't know, ten developers per month joining your organization, then you have a lot of friction and issues and time spent setting up the environments for the developers and telling them how to use them. Is that something an IDP can also help with?

[00:15:23] Chapter 7: Product Details: Platform Orchestrator and Score

Kaspar von Grünberg: Yeah, absolutely. So that is one of the use cases. For instance, I have a structured way of configuring something. I have a structured user interface that will help me get into things. I have a common RBAC profile that I can just be onboarded to, and I have access to all of these different things. I have scaffolding functionality where I say, like, I need A, B, C, and I just need to describe this and it's created for me. So the onboarding case for developers is absolutely a big driver, also from a pain point perspective, because that's expensive, right? It counts into retention. It counts every time. Like, I just looked at the example: if you're an enterprise in America, you assume that every senior developer costs you 1,500 a day, right? So you can do the math on that.

Mirko Novakovic: Let's talk about the product, what it really is. Right. If I look at your website, there is basically a product called Platform Orchestrator. And then you also have an open source project called Score. Can you elaborate a little bit on what Score is and how you use it? I saw you have some traction on the project on GitHub. What is it actually, how does it work together with the orchestrator, and what is the orchestrator? So as a customer, what do I get?

Kaspar von Grünberg: Absolutely. So I think that platform architecture is a lot like software architecture, right? You have a front end, you have a back end. You need to have interfaces to engage with that platform, and you need to have a back end that actually does the logic handling. So our core product is the orchestrator, which is a back end for the logic handling of platforms. If you really want to simplify it: there's a request coming in from the developer, I need an S3 bucket or any resource for my workload. That request will make its way to the orchestrator. The orchestrator will say, okay, well, that's the request. What's the global standard for S3 in this context, let's say in staging? It grabs that standard, generates the app and infrastructure configurations, weaves this together, runs sign-offs and executes this. It intersects with your CI/CD pipeline, gets wired into this, and then builds a graph in the back end. This is a graph-based backend of all the different dependencies that it can update and lifecycle manage. Make sure that the S3 bucket is always on the latest version. Make sure that there are no vulnerabilities. Make sure that the workloads are all observed by Dash0. All of these things are something that the orchestrator can take care of. So really, a logic handling backend for platforms. Now you need interfaces for developers to interface with that backend. You could just call the API, but that's not super attractive. So we've come up with the idea of having a code-based interface. Developers want to stay where they are, right, in their inner loop.

Kaspar von Grünberg: In the version control system. They don't like swapping into a UI or, you know, any of that. So we want to make sure that we do not change their workflow too much. And with Score they can just describe what their workload needs. It's a YAML format, if you want, like a spec, a convention that we've handed over to the CNCF. And Score allows the developers to describe the relationship of the workload and its dependent resources in what I would call an environment-agnostic way. So in a Score file, you say, I have a workload called product service. It needs a Postgres, S3 and DNS. And if you take this and then you do git push and you say, give me this for development, then this will be translated and the orchestrator will make sure that the development environment is up and all the resources are configured following the global standard and it's observed and all of these different things. And if I say, hey, I want this for production, I just take the same Score file. I push this to production. The same thing will happen in prod. Even locally, I can run score-compose and I'll have everything running on my local machine. And next to this code-based interface, we also have a user interface for a platform, which is called a portal in platform engineering terms. And that is really just like the code-based Score file, but in clicks. ClickOps. Some people do that. Most people keep that as code.
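The environment-agnostic idea can be sketched roughly as follows. This is not the Score file format and not the orchestrator's API: the dictionary only loosely mirrors a workload with named resources, and every name and convention in it is hypothetical. The point is that the workload description never changes; only the per-environment conventions it is resolved against do.

```python
# One workload description, with no environment-specific detail in it.
WORKLOAD = {
    "name": "product-service",
    "resources": {"db": "postgres", "storage": "s3", "dns": "dns"},
}

# Hypothetical per-environment conventions owned by the platform team.
CONVENTIONS = {
    "development": {"postgres": "container", "s3": "local-emulator", "dns": "localhost"},
    "production": {"postgres": "managed-ha-cluster", "s3": "encrypted-bucket", "dns": "public-zone"},
}


def resolve(workload: dict, environment: str) -> dict:
    """Translate the abstract resource types into what this environment uses."""
    conventions = CONVENTIONS[environment]
    return {name: conventions[rtype] for name, rtype in workload["resources"].items()}


for env in ("development", "production"):
    print(env, resolve(WORKLOAD, env))
```

The developer only ever edits `WORKLOAD`; changing how Postgres is provided in production is a change to the conventions table, made once by the platform team.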

[00:19:36] Chapter 8: Possibilities with Open Standards

Mirko Novakovic: And one of the things I love about OpenTelemetry is what they call the semantic conventions. Do you basically have something similar in Score? So if I define an S3 bucket, is there a standard for how I specify that, or can I just define anything? Is it just a templating specification, or does it define the semantics behind it?

Kaspar von Grünberg: It can define the semantics, exactly. And so we are actually trying to push a standard there, because S3 is a great example. It's called one thing in this cloud. In that cloud it's called...

Mirko Novakovic: Exactly.

Kaspar von Grünberg: This way, right. Which is useless. Like, this is really just one of these things the hyperscalers do. And I completely respect that, I would do the same thing. But that drives a little bit of lock-in, all of these things. Many of our customers are multi-cloud, or at least they have the legal obligation to, you know, be able to swap between clouds. So you want to have a semantic convention that actually spans all the different providers. And that's what we're trying to achieve. We're actually also investing pretty heavily in making these default golden paths available: how should an S3 bucket be configured in staging? And we are working with the cloud providers. We are working with suppliers like MongoDB for Atlas to say, hey, what do you think should be the, you know, general way of configuring MongoDB Atlas in staging for, you know, a medium-load application, and then making those conventions also available. And then you have sort of the Score naming convention, you have the off-the-shelf standards. And you can then bring that together and orchestrate that, and you can build that layer above the cloud that I've been dreaming of for a long time. It's a pretty tough thing to pull off because there are so many things in motion. But if that all clicks, then it's a pretty exciting, you know, idea for the next years here in software development.

Mirko Novakovic: You should actually check out OpenTelemetry, it's a CNCF project, because it would be pretty cool if those semantic conventions were standardized overall. Right. Because then, if you have already defined your infrastructure in Score, as far as I understood, you could automatically apply those tags to your instrumentation, right, to the metrics, to the logs. I mean, I was thinking of observability, right? So I could actually take your Score configuration to understand what your system looks like. And then I could use that information and tell you, hey, here are the logs for that service, or here's a problem. Right. Because we speak the same language. It's always a problem in observability that we don't speak the same language. Right. That would be actually...

Kaspar von Grünberg: Maybe you could even at the create time of the service forward the intent of how the observability is supposed to happen by, by aligning this with the telemetry standard. That's an interesting thought.
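One way to picture this thought: derive telemetry attributes from the same workload description at creation time. In the sketch below the workload dictionary and the `platform.resource.*` keys are invented for illustration. `service.name` and `deployment.environment.name` are intended to match attribute names in the OpenTelemetry semantic conventions and should be checked against the current spec before being relied on.

```python
def telemetry_attributes(workload: dict, environment: str) -> dict:
    """Build attributes that both the platform and the observability tool can agree on."""
    attrs = {
        "service.name": workload["name"],
        "deployment.environment.name": environment,
    }
    # Hypothetical keys: record which resource types the workload declared.
    for name, rtype in sorted(workload["resources"].items()):
        attrs[f"platform.resource.{name}"] = rtype
    return attrs


# Hypothetical workload description, loosely shaped like a workload with resources.
workload = {"name": "product-service", "resources": {"db": "postgres", "cache": "redis"}}
attributes = telemetry_attributes(workload, "staging")
for key, value in attributes.items():
    print(f"{key}={value}")
```

With shared attributes like these, logs, metrics and alerts for a service could be matched back to the declared workload without per-team naming rules.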

[00:22:24] Chapter 9: Developer Experience and Automation

Mirko Novakovic: Yeah, yeah. And we also see the same. What we try to do right now, or what's already happening, is standardizing a few things on top of it. Right. How do I create an alert for S3, for example? And there are projects like Awesome Prometheus Alerts, GitHub projects where you have hundreds of predefined alerts. And we want to do the same for dashboards, right? There's a new CNCF project called Perses, and that's a standard JSON format for dashboards. And we want to provide, say, what does an RDS dashboard look like? And now we could map that and speak the same language. You not only define that you want your RDS service, but your observability tool already knows which alerts to set and which dashboards to provide you based on that golden standard. Right. That would be... yeah.

Kaspar von Grünberg: That actually sounds really interesting.

Mirko Novakovic: I really like that idea. Right. Bringing the two things together.

Kaspar von Grünberg: Yeah. And I'm seeing this all the time with the customers. Right. First we thought, like, no customer would seriously consider our standard for how we think S3 should be configured in Amazon staging. But actually they come and say, like, look, you know, we're not of the opinion that we have the gold standard. If there's a joint standard that's been rolled out pretty heavily, right, we're happy to rely on that. And so they're actually just taking the Terraform of their provider and adding their security posture. And I guess the same would hold true for observability, right? People want experts to have opinions and then also, you know, share best practices. That's what we're seeing continuously across everything we're doing.

Mirko Novakovic: As a developer, I always like tools like Stack Overflow or something because, yeah, it's essentially copy, paste and then optimize, right? You want to have some sort of: how do I start? Or you mentioned scaffolding, right? When Ruby on Rails came out, you had these scaffolding commands where you essentially build an application, and then you just put in the stuff that you needed or adapted it, and it just gives you speed, right? Not only does it provide you with the best practice for how to do it, and you learn from it, but it also gives you speed and it doesn't require you to do all the boilerplate stuff that you don't want to do anyway. So I like it a lot. And so now that you have the Score specification and I define my infrastructure, then you put that into your tool. The orchestrator then uses that Score file to actually provision the things that are defined in there? Or how does it work?

Kaspar von Grünberg: Let's say I have an existing Score file and now I want to add a Redis. You add two lines, give me a cache of type Redis, and then you just push this, so it makes its way through your CI pipeline and hits the orchestrator. The orchestrator analyzes that change and gets the context. So let's say we're deploying to staging. It's then simply doing a lookup and says, okay, what's the convention for how the organization wants to configure Redis in staging? And this is defined as code again. We call this a resource definition. So the organization might say, you know, for Redis in staging we're using this Terraform file, by the way, it's parameterized. And we want this cloud account. And this is the naming convention. We take all of those resource definitions, we queue the execution, we generate the Redis, maybe even the necessary roles, networking. You know, we could even create a new account, whatever you want. Really, there's no limit. And then we will generate the app configurations, or regenerate the app configurations, because now you need configurations that also, you know, include the Redis settings. So that means we have the new configurations, we have the container image, we have the different resources. And then we inject the secrets into all of this, and we serve this. This whole workflow is sort of the create workflow. The more common workflow is the update workflow. So let's say I want to update all the Redises across my estate. Because everything's a graph, I can say, okay, where's Redis running? Ba ba ba ba ba. Update my global convention and then push to the entire system. And then all Redises will progressively update, you know, in a careful way, there's sort of a fleet rollout capability. And that means I have way fewer configurations that I have to maintain. Right? I don't have hundreds of repositories. I have everything in a graph, and usually most Redises in a given context look the same.
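The update workflow described here can be sketched in a few lines. This is a simplified illustration, not the orchestrator's data model: the resource definitions, the dependency graph, the workload names and the version strings are all hypothetical. It shows the three steps mentioned: look up where Redis is running, change the convention once, and roll the change out in batches.

```python
# Hypothetical resource definitions: the organization's convention per (type, environment).
RESOURCE_DEFINITIONS = {
    ("redis", "staging"): {"module": "redis-staging", "version": "7.0"},
}

# Hypothetical dependency graph: workload -> resources it depends on.
GRAPH = {
    "cart": [("redis", "staging"), ("postgres", "staging")],
    "search": [("redis", "staging")],
    "billing": [("postgres", "staging")],
    "sessions": [("redis", "staging")],
}


def workloads_using(resource_type: str, environment: str) -> list:
    """Answer 'where is Redis running?' by querying the graph."""
    return sorted(w for w, deps in GRAPH.items() if (resource_type, environment) in deps)


def rollout_batches(workloads: list, batch_size: int) -> list:
    """Split the affected workloads into batches for a progressive rollout."""
    return [workloads[i:i + batch_size] for i in range(0, len(workloads), batch_size)]


# Update workflow: change the convention once, then roll it out batch by batch.
RESOURCE_DEFINITIONS[("redis", "staging")]["version"] = "7.2"
affected = workloads_using("redis", "staging")
batches = rollout_batches(affected, batch_size=2)
for batch in batches:
    print("updating", batch, "to", RESOURCE_DEFINITIONS[("redis", "staging")]["version"])
```

A real fleet rollout would also check health between batches and stop on failure; that is left out here to keep the sketch short.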

Mirko Novakovic: And you said the interface is basically GitHub. So as a developer I can have my Score file versioned with my code. And when I push the code into the repository, it will automatically detect it and trigger that orchestration engine?

Kaspar von Grünberg: Exactly. Yeah. That's it. Nothing else.

Mirko Novakovic: Now that's pretty cool. So it's similar to... I'm not programming anymore, but I recently did a change. A little change. My only commit at Dash0, I have to say. But we are using Vercel as the platform for our UI. And I found it so amazing that I just pushed the code and it automatically creates the environment. For my branch, it created me a test environment where only my little fix was in there. I could test it. It was amazing, right? And so it's something similar, right?

Kaspar von Grünberg: This orchestration? Exactly. Exactly the same experience for the developer that you have with Vercel. You say, like, give me a new branch. Boom, PR environment up, new infrastructure up, everything ready for testing. We'll just give you the URL and you can start testing. But essentially, if you are a large enterprise, you just cannot use Vercel because you have certain scaling criteria and data privacy requirements. And so if you're in a situation where you want to build your own Vercel, then you're using us to do it.

[00:28:15] Chapter 10: The Role of Platform Engineers

Mirko Novakovic: I was so, so fascinated, right? Because when I developed, you did all this manually. It was super complicated. And I like this approach of integrating this into GitHub. Yeah. Amazing. I really like the idea. And so in this situation, if I use Humanitec, do I still need a platform engineer?

Kaspar von Grünberg: You do. Because in the end, at large scale there is so much to think through. Right. What's our security posture? In Vercel you just say, here, global standard, whatever, I don't really care. But if you're working for Convair or Western Union, which has hundreds of application developers working with a platform that they've built with our systems, for them it really matters how a Postgres is configured in any given context, right? Because of security posture, compliance, all of these things. So you need to have these people that think about what's our standard. And all of this at enterprise scale: you don't have one CI provider, you have four or five. There will be GitHub Actions and GitLab and CircleCI. And that's normal for the enterprise, right? Because you're constantly buying new stuff. And all of that has to be integrated with the platform because you want to have a consistent experience. So we will need platform engineers for the next ten years. I think it is a good job to, you know, get into.

Mirko Novakovic: And how do they work? Are they working inside of your tool in the UI or do they program something, or are they building new interfaces that do not exist or build these golden standards? So how do they work?

Kaspar von Grünberg: Platform engineers think about the glue, if you want. Like, the stuff that's important for them is what happens between my GitHub and my CI pipeline and my registry and my orchestration and my portal and my cloud and all of that. I'm a big believer in everything as code. So you could probably configure stuff through the user interface, but people that work with us, they work with, you know, the Terraform provider for the orchestrator. They will work with our CLI. They will really build a platform as code. We talk a lot about that. The platform should be seen as a product. And I think if you're building a software product, that's code first, because you can test it, you have disaster recoverability, you can version it. So the platform engineers are, you know, developers that are often coming from the app dev field or from infrastructure and operations, and that are sticking their heads together and really thinking about how can we build a consistent experience. And there will be literally a repository called the platform where they're building the platform.

[00:31:05] Chapter 11: Observability and Platform Management

Mirko Novakovic: I totally agree. I mean, we had really, really bad accident yesterday when we were walking to the restaurant here on our offsite in Lisbon. We were discussing because we do also all the dashboards, all the alerts inside of Dash0 you can just deploy with your code. Right. And somebody asked why. Right. And it's not only that you are in your environment where you naturally work. I also think you have the power of that tool stack, right? Rolling back things, going back to a version, comparing things in your tool set. All this is actually pretty complicated if you build it from scratch. So by doing this as code, you also get all the benefits of CI/CD platforms, of version control systems, of rollbacks, branches. All the things are there for you without doing anything. It's a very powerful platform that is well understood by developers, and you don't want to replicate it in a UI, right? It's just not what the user wants, probably because they want to use the tools they have. But it's also super complicated to build.

Kaspar von Grünberg: And it's also, frankly, just not secure. Right. You cannot apply policy checking. You cannot screen this. Like, there are so many things that, you know, are scary.
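
[Editor's sketch: the point about policy checking can be made concrete. When configuration lives in code, a check like the one below can run in CI before anything is deployed. This is a minimal illustration only. The `PostgresSpec` type, its fields, and the two rules are hypothetical and are not taken from Humanitec's or Dash0's products.]

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostgresSpec:
    """Hypothetical declarative description of a database resource."""
    name: str
    environment: str
    publicly_accessible: bool
    encrypted_at_rest: bool


def check_policy(spec: PostgresSpec) -> list[str]:
    """Return a list of human-readable policy violations (empty if compliant)."""
    violations = []
    # Assumed rule 1: no database may be reachable from the public internet.
    if spec.publicly_accessible:
        violations.append(f"{spec.name}: must not be publicly accessible")
    # Assumed rule 2: production data must be encrypted at rest.
    if spec.environment == "production" and not spec.encrypted_at_rest:
        violations.append(f"{spec.name}: production databases must be encrypted at rest")
    return violations


if __name__ == "__main__":
    specs = [
        PostgresSpec("orders-db", "production", False, True),
        PostgresSpec("scratch-db", "production", True, False),
    ]
    for spec in specs:
        for violation in check_policy(spec):
            print(violation)
```

[A pipeline step could fail the build whenever `check_policy` returns a non-empty list, which is the kind of screening that is hard to apply to changes clicked together in a UI.]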

Mirko Novakovic: Yeah, let's talk about observability in platforms. I know you have these blueprints in your community, right? And I did a blueprint a little bit to show how observability is integrated in platforms. And now that we talked, actually, I have more ideas around Score and bringing the things together. But I guess that every platform needs to be observed, right? I don't think you have customers who deploy platforms and have no information like logs and traces and metrics out of those platforms.

Kaspar von Grünberg: Exactly. And that's actually twofold. You need to observe the platform itself, because at this point, right, these platforms are the epicenter of your development. It's like you're producing cars and the factory is down. That's really, really expensive. So you need to observe the platforms themselves. And having that level of standardization always gives you the option to say, hey, now I can actually also make it really easy to include observability by design, right? We have sidecar containers next to the workloads that are running to make sure we can always pick up stuff. We make sure that we collect all the traces. Really ingraining observability by design into the flows and saying, if I'm on that path, I have optimal traceability, I have optimal observability, both for the infrastructure as well as the workload running on any compute. That's so great, because how many organizations are there that are paying incredible amounts of money to certain providers in the observability stack? But then, you know, half of the services aren't configured the right way, because the central teams have no way of enforcing this and, you know, constantly need to call the developer and say, like, can you please make sure that this and this agent is there? So I think this is a huge chance to make sure we have consistent observability across the entire estate.
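
[Editor's sketch: one way to picture "observability by design" is a platform step that adds a collector sidecar to every workload, so no team has to remember to configure it. The workload layout, the `with_observability` function, and the sidecar image name below are assumptions made for illustration. They do not describe how Humanitec's orchestrator or any specific collector is actually wired up.]

```python
import copy

# Hypothetical sidecar definition; the name and image are placeholders.
COLLECTOR_SIDECAR = {
    "name": "telemetry-collector",
    "image": "registry.example.com/telemetry-collector:stable",
}


def with_observability(workload: dict) -> dict:
    """Return a copy of the workload that is guaranteed to include the sidecar.

    The input is left untouched, and applying the function twice does not
    add a second sidecar.
    """
    result = copy.deepcopy(workload)
    containers = result.setdefault("containers", [])
    already_present = any(
        c.get("name") == COLLECTOR_SIDECAR["name"] for c in containers
    )
    if not already_present:
        containers.append(dict(COLLECTOR_SIDECAR))
    return result


if __name__ == "__main__":
    workload = {
        "name": "checkout",
        "containers": [{"name": "app", "image": "registry.example.com/checkout:1.4"}],
    }
    deployed = with_observability(workload)
    print([c["name"] for c in deployed["containers"]])
```

[Because the injection happens in the platform's deployment path rather than in each service's own configuration, the central team gets the enforcement point described above.]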

Mirko Novakovic: Absolutely. There's one thing that I also think is very important, and I wouldn't say that we found the final answer yet. When we talk about observability for developers, we always say "the developer". But I do think there is a huge difference between an application engineer and a platform engineer, right? Because the platform engineer does not really care about the individual service and application. They care more about how the platform is running. And then you have the individual service developer, or the developer that builds an application, and they care more about the application. So I think we need some different views and different functionality for platform engineers in our observability tools to make sure that this different way of thinking, this kind of platform layer, is covered. Very exciting discussion. And I think platform engineering is one of the hottest topics at the moment. And getting this Vercel-like experience for your own platforms really seems like a wow moment if you see that for yourself. At least for me, it was. And I'm looking forward to following Humanitec and your journey, Kaspar, even more. And I also know that we both will be at KubeCon, and we are sponsoring the House of Kube party. So we're inviting people to come visit us and join the discussion.

[00:35:55] Chapter 12: Conclusion and Future Insights

Kaspar von Grünberg: Absolutely. Thank you so much for organizing this. Thank you so much for having me.

Mirko Novakovic: Thanks for listening. I'm always sharing new insights and insider knowledge about observability on LinkedIn. You can follow me there for more. The podcast is produced by Dash0. We make observability easy for every developer.
