ARN

Why Mercedes-Benz runs on 900 Kubernetes clusters

The German automaker runs a massive fleet of Kubernetes clusters to support a wide range of project teams around the world.

The technology team behind the German automaker Mercedes-Benz has spent the last seven years building up a homegrown fleet of 900 Kubernetes clusters to support hundreds of standalone developer teams, giving the company a modern infrastructure platform it says is scalable and easy to manage.

The automaker started dabbling with Kubernetes for application deployment in 2015, after Google open-sourced the container orchestration system in 2014.

Since then, Mercedes-Benz Tech Innovation, the wholly owned technology subsidiary of the storied automaker, has developed the internal expertise to support hundreds of business unit-aligned application teams with their own unique technology needs.

“We knew a single shared [Kubernetes] cluster wouldn’t fit our needs, no vendor distribution fit our requirements, and we had the engineers with expertise,” Jens Erat, a devops engineer at Mercedes-Benz Tech Innovation, said during KubeCon Europe last month.

“We built a 100 per cent FOSS [free open source software] platform built and developed by the same devops team, with no licensing issues or support requests.”

Today, Mercedes-Benz operates 900 on-premises Kubernetes clusters on OpenStack across four global data centres, running Kubernetes version 1.23, which was released at the end of 2021.

While that may not be the biggest Kubernetes estate when compared to the cloud vendors, only 10 per cent of organisations use more than 50 clusters, according to the Cloud Native Computing Foundation’s 2019 survey. It is also more than four times larger than the Kubernetes environment of fellow KubeCon Europe keynote speaker CERN, which was running 210 clusters at the time of writing.

How much Kubernetes could Mercedes-Benz run?

“We put a lot of effort into doing things in a way where we are able to manage it,” Peter Müller, lead expert at Mercedes-Benz Tech Innovation, told InfoWorld. “For us, the surrounding systems are working well if we are managing 500 clusters, or 1,000, because everything is automated … If we were to add 500 more clusters, we would have to add just one more engineer.”

A key part of that management puzzle is Cluster API on OpenStack. Cluster API is a Kubernetes project that allows for declarative cluster creation, configuration, and management, and the company recently adopted it in place of Terraform and some custom tools. However, as with anything in technology, it’s not a perfect solution.
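
To make “declarative” concrete: with Cluster API, a cluster is described as a Kubernetes object, and controllers then work to make the real infrastructure match that description. The Python sketch below builds such a description as plain data. The `cluster_manifest` helper, the team and namespace names, and the pod network range are hypothetical, and the API versions are assumptions that vary with the Cluster API and OpenStack provider releases in use. None of it reflects Mercedes-Benz’s actual configuration.

```python
import json

# Assumed API versions; check the releases you actually run.
CORE_API = "cluster.x-k8s.io/v1beta1"
INFRA_API = "infrastructure.cluster.x-k8s.io/v1beta1"
CONTROL_PLANE_API = "controlplane.cluster.x-k8s.io/v1beta1"


def cluster_manifest(name: str, namespace: str) -> dict:
    """Return a Cluster API `Cluster` object as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": CORE_API,
        "kind": "Cluster",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Illustrative pod network range, not taken from the article.
            "clusterNetwork": {"pods": {"cidrBlocks": ["192.168.0.0/16"]}},
            # Points at the OpenStack-specific infrastructure object.
            "infrastructureRef": {
                "apiVersion": INFRA_API,
                "kind": "OpenStackCluster",
                "name": name,
            },
            # Points at the object that describes the control plane.
            "controlPlaneRef": {
                "apiVersion": CONTROL_PLANE_API,
                "kind": "KubeadmControlPlane",
                "name": f"{name}-control-plane",
            },
        },
    }


if __name__ == "__main__":
    # Kubernetes tooling accepts JSON manifests as well as YAML.
    print(json.dumps(cluster_manifest("team-a-prod", "team-a"), indent=2))
```

The point of the design is that the desired state is just data: it can be stored, reviewed, and generated by automation, while the controllers handle the work of reaching that state.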

“The number of clusters is not a problem. The problem we have are some of the surrounding systems and sometimes OpenStack,” Müller said. “But Kubernetes runs pretty well, it scales.”

Changing the culture

Each of several hundred application teams across Mercedes-Benz now has the option of requesting its own Kubernetes cluster via an automated process using a set of homegrown tools, built and managed by Müller’s team at Mercedes-Benz Tech Innovation. 

The result is typically a pre-provisioned production cluster, as well as smaller staging and dev clusters within hours, or even minutes, of making a request.

“From an organisational perspective, five to six years ago, devops was the new kid on the block, everyone was talking about ‘you build it, you run it.’ As a provider of a shared platform, that means each application team within Mercedes-Benz gets their own Kubernetes cluster,” Jörg Schüler, team lead at Mercedes-Benz Tech Innovation, told InfoWorld.

“Our goal is to provide an ecosystem and get empowered application teams,” he added. “That ecosystem is underpinned by principles of self-service and being API-driven.”

That estate is managed by not one, but five separate platform teams. Two of these make up a combined team of around a dozen engineers who focus on the core Kubernetes-as-a-service platform. Then there are platform teams responsible for database as a service, logging and monitoring as a service, and container security, including runtime, registry, and image scanning.

Adding to those teams is still proving difficult for the business, however. “Looking for good Kubernetes expertise is hard,” Schüler said. “Providing education, training, and other offerings around this platform is really helpful. You need a community approach for developer teams to help each other with boot camps, training portals, and sandbox environments.”

Golden paths to the cloud

Having built up all this muscle for managing Kubernetes at scale, Mercedes-Benz Tech Innovation is preparing to start moving more and more workloads to the public cloud, where it could use more managed services such as Microsoft’s Azure Kubernetes Service (AKS) and Amazon’s Elastic Kubernetes Service (EKS), to help lighten the cognitive load on the platform and devops teams.

“We are still in the phase of evaluating if we go for EKS, but at the moment we are preferring to do it on our own, because then we have the same architecture on-premises and off-premises,” Müller said.

While those managed versions of Kubernetes may help lighten the load on the Mercedes-Benz Tech Innovation platform teams, the application teams still need help to move to containers and Kubernetes.

One route to speeding up progress here is the idea of golden paths, which are essentially Helm charts that can be used as templates for certain functionality, such as identity and access management, saving on repeated work across different teams.
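
The template idea can be sketched in a few lines of Python: a golden path ships sensible defaults, and each team supplies only the values that differ. The sketch loosely mirrors how a chart’s default values are combined with user-supplied ones, but it is not Helm’s implementation. The `merge_values` helper, the key names, and the example identity provider URL are all hypothetical.

```python
from copy import deepcopy


def merge_values(defaults: dict, overrides: dict) -> dict:
    """Recursively overlay team overrides on golden-path defaults (hypothetical helper)."""
    merged = deepcopy(defaults)  # never mutate the shared defaults
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_values(merged[key], value)
        else:
            # Scalars, lists, and new keys: the override wins outright.
            merged[key] = deepcopy(value)
    return merged


if __name__ == "__main__":
    # Assumed defaults for an identity and access management golden path.
    golden_path_defaults = {
        "replicas": 2,
        "oidc": {"issuer": "https://idp.example.invalid", "clientId": ""},
    }
    # A team only states what is specific to it.
    team_overrides = {"oidc": {"clientId": "team-a"}}
    print(merge_values(golden_path_defaults, team_overrides))
```

Each team maintains only its small set of overrides, while the shared defaults are maintained once, which is where the saving on repeated work comes from.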

“We have to provide golden paths and some things as a service to reduce that cognitive load and allow them to deliver what they do best: business value,” Müller said.

Of course, the maturity levels will vary across all of those application teams, so Müller sees his role as giving them a safe environment in which to learn. Once they become mature enough, they can move to the cloud, he said.

Using some inner source techniques, Mercedes-Benz Tech Innovation then manages some of these golden paths, while others are in what Müller calls “a community state,” where they might be considered for full ownership and management if they get a good response.

Ideally these golden paths will eventually be codified into a “Spotify Backstage-style catalog.” Müller says they are currently working on “proof of concepts for a central developer portal for the integration of all of the services, but we are not yet there.”

‘For us, managing Kubernetes is not hard’

“Kubernetes remains hard, don’t leave devops and developer teams on their own,” Sabine Wolz, a product owner at Mercedes-Benz Tech Innovation, said on stage during KubeCon Europe.

However, Müller firmly believes that the learning curve now awaits the application teams and not the platform teams.

“Managing Kubernetes is hard if you are not deep into it. But in our opinion, if we are managing it, we want to be deep into it, so for us, managing Kubernetes is not hard,” he said. “Kubernetes for application projects is still hard. To consume Kubernetes as a devops team is sometimes hard.”

Helping application teams understand the underlying infrastructure without necessarily building deep expertise is where Müller hopes his platform team can shine. 

“Some teams are still on virtual machines and moving to a Kubernetes cluster, and they have to split up their monolith, understand how transactions are handled, think about asynchronous communication, and understand how Kubernetes works,” he said. “That is hard, so don’t leave them alone, help them.”