ARN

How 5 companies got their developers to care about cloud costs

Software developers don’t typically have to worry about the costs of running their services, but as cloud costs continue to rise, more and more will have to learn to embrace cloud cost optimisation. That means adopting finops.

Previously the purview of dedicated centres of excellence, or even exclusively of the procurement and finance teams, cloud cost management is rapidly becoming a required skill for anyone who consumes cloud resources on a day-to-day basis, and that includes software developers.

The emerging approach for cloud-first organisations is to have a central team that can manage broad consumption issues, like using the cheapest possible infrastructure for the job and negotiating committed-use discounts with vendors, while responsibility for the cost of individual services is pushed out to engineering teams that are incentivised to run as cost-effectively as possible, without sacrificing business value.

“You need that central expertise but also engineers to understand what they are spending in the cloud. ... You want them to feel empowered to do something about their spending and how it stacks up to the value they are driving,” said Eugene Khvostov, vice president of product and engineering at cost optimisation specialist Apptio.

“Every organisation is different and has different maturity levels and styles, but some of the more successful cases we have seen push that information to the edge and get engineers involved in that challenge, rather than issuing a mandate from on high.”

This can be a difficult shift to make, however, especially for organisations accustomed to lengthy procurement cycles and those that look to insulate their software developers from worrying about the total cost of their own services in a push for greater digital momentum. But now, as cloud costs continue to rise in the wake of the COVID-19 pandemic, the tide might just be turning.

Optimising costs, not just code: Introducing finops

In their 2020 O’Reilly book, Cloud FinOps, J.R. Storment and Mike Fuller explain that in the old world of procuring enterprise hardware, engineers and operations teams would have to think about the cost of infrastructure well in advance.

“Now, in the cloud, they can throw company dollars at the problem whenever extra capacity is required,” they wrote.

Although this has allowed for faster, more effective development cycles, it has also introduced a new set of considerations around the cost and business impact of those infrastructure choices. “At first, this feels foreign and at odds with the primary focus of shipping features. Then they quickly realise that cost is just another efficiency metric they can tune to positively impact the business,” they wrote.

A senior product manager for cost engineering at streaming giant Spotify, Janisa Anandamohan, wrote in a recent blog post, “we know engineers are natural optimisers when it comes to reliability, security, performance, etc. And now we’re telling them, ‘Hey, add costs into the mix.’”

While that optimisation piece is one part of the puzzle, the more significant change is how to bring together previously disconnected groups in engineering, finance, and beyond. This organisation-wide approach to proactively managing cloud costs is commonly known as finops.

As defined by Storment and Fuller, “finops brings financial accountability to the variable spend model of cloud. But that description merely hints at the outcome. The cultural change of running in cloud moves ownership of technology and financial decision-making out to the edges of the organisation.”

A cultural shift of this magnitude naturally brings enterprise-scale challenges. Getting engineers to take action was the most commonly cited finops challenge among respondents to the 2021 State of FinOps report from the Linux Foundation-led FinOps Foundation, with 39 per cent admitting to struggling to gain broad buy-in from their engineers.

“One known finops challenge is to not only start the practice up, but to encourage and incentivise cloud users (like devs and engineers) to participate in cloud cost management,” the report said.

Here’s how five companies have gone about realigning their teams and incentivising engineers to take better care of their cloud costs.

Airbnb reins in spiralling cloud hosting costs

A few years ago, popular travel accommodation booking website Airbnb realised it had a big problem: Its monthly Amazon Web Services (AWS) cloud bills were growing faster than company revenue.

“We had a problem, but we lacked an in-depth understanding of how teams use AWS resources, and how planned architectural and infrastructure changes would impact our future AWS costs,” Airbnb engineers Jen Rice and Anna Matlin wrote in a company blog post.

However, given Airbnb’s “you build it, you run it” engineering philosophy, Rice and Matlin quickly realised that “adding significant friction for our engineers would be met with heavy resistance.” So they set out to build the cost-attribution data needed to show the company’s data-driven developer community just how big the problem was, and to win buy-in for finops.

At Airbnb, the approach to consumption attribution “was to give teams the necessary information to make appropriate tradeoffs between cost and other business drivers to maintain their spend within a certain growth threshold. With visibility into cost drivers, we incentivise engineers to identify architectural design changes to reduce costs, and also identify potential cost headwinds,” Rice and Matlin wrote.

This shift brought with it a centralised cost-efficiency team, armed with “a bird’s-eye view of the entire Airbnb ecosystem,” they wrote, and tasked with finding significant cost-savings opportunities.

For example, Airbnb now leans heavily on AWS Savings Plan options, complete with “a set of prepared responses that move certain workloads on and off Savings Plan to keep utilisation healthy,” they wrote. This team is now supported by a set of AWS cost champions, who sit in all product development organisations to support at the local level.
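A Savings Plan is a commitment to spend a fixed dollar amount per hour. Keeping “utilisation healthy” therefore means watching how much of that hourly commitment eligible workloads actually consume. The Python sketch below illustrates the arithmetic under simplifying assumptions: eligible usage is already expressed at Savings Plan rates, and the 95 per cent utilisation target is hypothetical. It is not Airbnb’s tooling, and the figure does not come from Airbnb or AWS.

```python
def savings_plan_stats(hourly_eligible_spend: list[float],
                       hourly_commitment: float) -> tuple[float, float]:
    """Return (utilisation, coverage) as fractions over the given hours.

    utilisation: share of the committed dollars actually consumed
    coverage:    share of eligible spend that the commitment paid for
    """
    if hourly_commitment <= 0 or not hourly_eligible_spend:
        raise ValueError("need a positive commitment and at least one hour of data")
    # In each hour the plan absorbs eligible spend up to the commitment;
    # unused commitment is lost, and spend above it is billed on demand.
    covered = sum(min(spend, hourly_commitment) for spend in hourly_eligible_spend)
    committed = hourly_commitment * len(hourly_eligible_spend)
    total_spend = sum(hourly_eligible_spend)
    utilisation = covered / committed
    coverage = covered / total_spend if total_spend else 0.0
    return utilisation, coverage

UTILISATION_TARGET = 0.95  # hypothetical target, not an AWS or Airbnb figure

def should_shift_workloads_onto_plan(hourly_eligible_spend: list[float],
                                     hourly_commitment: float) -> bool:
    """True when too much of the commitment is going unused."""
    utilisation, _ = savings_plan_stats(hourly_eligible_spend, hourly_commitment)
    return utilisation < UTILISATION_TARGET

# Four hypothetical hours of eligible spend against a $10/hour commitment
print(savings_plan_stats([10, 8, 6, 12], 10))
```

In this example the plan is 85 per cent utilised, below the assumed target. That is the kind of signal that would prompt moving more eligible workloads onto the plan.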

The result of all of this effort has been a major organisation-wide shift. As Rice and Matlin wrote:

In addition to the various technical and organisational efforts to manage AWS costs, we saw a profound cultural change toward cost awareness and management. This shift was both top-down and grassroots. Leaders mentioned the company-wide cost goal during all-hands meetings. The finance team created a company-wide award for financial discipline, presented by the CFO, which recognised employees who had driven important cost-savings initiatives. In scrappy Airbnb style, the infrastructure organisation held a cost-savings hackathon that spawned a number of impactful efficiency projects. Engineers learn best practices from one another and discuss new savings opportunities in a Slack channel. Upon launch, the AWS Attribution Dashboard became the most viewed dashboard at Airbnb and has since remained in the top list. Seeing this cultural change, we are optimistic that the recent cost reductions Airbnb achieved are not a one-off, but rather a new muscle that we will only strengthen with time.

As a result, Airbnb saw a $63.5 million year-over-year decrease in hosting costs, which contributed to a 26 per cent decline in Airbnb’s cost of revenue in the nine months that ended in September 2020.

Sainsbury’s realigns engineering around cost accountability

Like many enterprises today, cloud investment at British retailer Sainsbury’s has been focused on building new features and digital capabilities for customers, which led to a rapid escalation in cloud service consumption. “Somewhere down the line, the operations team was trying to keep a lid on spend,” group CIO Phil Jordan told InfoWorld.

Now, following an intensive four-month change and training program throughout the COVID-19 pandemic, developers, operations, and product people are all part of what the retailer calls “engineering families,” which have full life-cycle accountability to the business.

This new operating model pushes end-to-end accountability for a product or service out to the engineering teams. That includes cost management, vulnerability management, risk management, and partner management, all without oversight from the now-disbanded Service Operations team.

Those teams are now directly incentivised in line with the DevOps Research and Assessment (DORA) metrics of deployment frequency, mean lead time for changes, mean time to recover, and change failure rate, as well as service performance, total cost of ownership, and development cadence.

Cost-management tools from vendor Apptio have been brought in to give engineering a more transparent view of their specific cost base, a tool Jordan said the company is placing “a lot of faith in to give those new teams full transparency of cost.”

Sainsbury’s piloted this new mode of working with the data engineering team throughout 2020, and “it was unequivocal that we demonstrated it drove efficiency, speed of delivery and colleague sentiment improved,” Jordan said.

Naturally, not everyone was on board with the change. “Some heads of engineering didn’t make the journey with us; they [just] wanted to develop,” Jordan admitted. However, bringing together dev, ops, and product “has helped us pull together expertise to make engineering think more holistically,” he said.

Pushing responsibility out to engineering teams was a significant shift for Sainsbury’s, but Jordan said it could deliver IT cost savings of up to 20 per cent in the long term.

Spotify taps cost insights to align infrastructure costs to customer growth

Similar to Airbnb, the music streaming company Spotify has worked hard over recent years to build cost optimisation into the engineering process across the company after its infrastructure costs started to outpace user acquisition.

As an engineering-led company, Spotify decided to build its own cost-management tool called Cost Insights, which is built into its internal developer platform called Backstage and has since been open-sourced. Because Spotify mostly runs on Google Cloud, Cost Insights is currently geared to Google Cloud resources.

As RedMonk analyst James Governor detailed in a blog post, the idea behind the tool is “that engineers and engineering teams are incentivised to take more responsibility for the costs associated with the products they’re building. Modelling cost becomes part of the engineering process, rather than being a separate process for finance teams to manage.”

Spotify encouraged a culture of sharing cost savings, both through the Cost Insights portal itself and through an internal wiki called Our Cookbook. This spurred competition among teams to drive down their costs and share major wins with the rest of the organisation.

Cost optimisation isn’t completely decentralised at Spotify, however. A cost-management organisation is tasked with intervening if it sees a team or service quickly ramping up costs. It then engages with that group to find out why and what can be done to bring things back under control.
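Spotting a service that is quickly ramping up costs is, at its simplest, an anomaly-detection problem. One simple approach is sketched below in Python as an illustration, not as Spotify’s design. It compares each day’s cost with the mean and standard deviation of a trailing window and flags days that sit well above it. The seven-day window and the three-standard-deviation threshold are assumptions chosen for the example.

```python
from statistics import mean, stdev

def cost_anomalies(daily_costs: list[float], window: int = 7,
                   z_threshold: float = 3.0) -> list[int]:
    """Return indices of days whose cost is unusually high versus the trailing window."""
    if window < 2:
        raise ValueError("window must be at least 2 to estimate a standard deviation")
    anomalies = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Perfectly flat history: any increase at all counts as unusual.
            if daily_costs[i] > mu:
                anomalies.append(i)
        elif (daily_costs[i] - mu) / sigma > z_threshold:
            anomalies.append(i)
    return anomalies

# Hypothetical daily spend for one service, with a spike on day 7
costs = [100, 102, 98, 101, 99, 100, 103, 250, 101]
print(cost_anomalies(costs))  # [7]
```

The check is deliberately one-sided, because a cost team cares about spend that jumps up. Sudden drops can also signal a broken service, but that is an operational alert rather than a cost one.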

“Spotify found that the best way to get involvement was to encourage engineers to use Cost Insights in a cadence alongside existing quarterly planning. If there were issues that the cost team felt needed attention, they’d alert a team before those meetings.

“That said, if costs are rapidly escalating out of control for a particular service, that’s something that should generate an alert, so anomaly detection is on the Cost Insights product roadmap,” Governor wrote about Spotify’s efforts.

At Spotify, these costs are benchmarked against engineering resources, so if a team wants to optimise a service it must account for the value of that work in terms of full-time employees that could be hired using the savings. “Early experiences with Cost Insights allowed Spotify to fund the equivalent of 25 teams across the company,” Governor wrote.
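The arithmetic behind that framing is simple: annualise a recurring saving and divide it by the fully loaded cost of an engineer. The sketch below uses a hypothetical annual cost of $200,000 per engineer purely for illustration, since Spotify’s own figures are not given.

```python
def savings_as_engineers(monthly_savings: float,
                         annual_cost_per_engineer: float) -> float:
    """Express a recurring monthly cloud saving as full-time-engineer equivalents."""
    if annual_cost_per_engineer <= 0:
        raise ValueError("annual_cost_per_engineer must be positive")
    return monthly_savings * 12 / annual_cost_per_engineer

# Hypothetical: $50,000 a month saved, $200,000 fully loaded cost per engineer
print(savings_as_engineers(50_000, 200_000))  # 3.0
```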

Nationwide banks on finops as part of its cloud transformation

Financial services company Nationwide presciently decided to implement cost considerations in tandem with its broader cloud transformation program. The program is now in the third year of a four-year plan, so finops principles were baked in from day one.

However, that early start didn’t mean there was no engineering pushback.

“The main value driver of cloud is speed of development, so you go from a traditional centralised procurement model to a world where every app developer is in procurement as well, so you turn into the Wild West without someone, or a team, looking at the financial implications of that,” Joseph Daly, director for cloud optimisation services at Nationwide, told InfoWorld. “Everywhere I have gone there is initial resistance to this, as it is seen as additional bureaucracy which slows them down.”

Given that Daly holds a degree in accounting from Miami University, it’s not surprising that he boiled down the company’s approach to cloud cost optimisation into a formula.

“Your cloud bill equals usage multiplied by the rate,” he said. “We centralised rate management for things like savings plans and reserved instances at a high level for the enterprise.

“Then for usage we decentralised for application teams to be responsible themselves. Being informed needs a tagging strategy and structure, so when developers provision they tag something in a meaningful way.”

Overcoming this barrier comes down to education for Daly and his team. “Anyone without finops principles won’t be able to manage their environment. But a good tagging and enforcement strategy like the one we have here at Nationwide will get the engineers on board with optimisation, as they can see the direct financial impact.”
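Daly’s formula, and the tagging that makes it actionable, can be sketched in a few lines of Python. The example below uses hypothetical field names, tag keys and prices, not any cloud provider’s billing schema or Nationwide’s actual policy. In it:

- a centrally negotiated discount lowers the rate;
- application teams own the usage;
- every line item rolls up to a team through its tags;
- resources that break the assumed tagging policy are flagged rather than silently dropped.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical tagging policy; real policies vary by organisation.
REQUIRED_TAGS = frozenset({"team", "environment", "cost-centre"})

@dataclass
class LineItem:
    """One metered resource on the bill (illustrative fields, not a provider schema)."""
    resource_id: str
    usage_hours: float
    on_demand_rate: float  # dollars per hour
    tags: dict[str, str] = field(default_factory=dict)

def missing_tags(item: LineItem) -> set[str]:
    """Required tag keys that are absent or blank on this resource."""
    return {key for key in REQUIRED_TAGS if not item.tags.get(key, "").strip()}

def item_cost(item: LineItem, rate_discount: float = 0.0) -> float:
    # Bill = usage x rate. The central team lowers the rate through commitments
    # (rate_discount); application teams control usage_hours.
    return item.usage_hours * item.on_demand_rate * (1.0 - rate_discount)

def cost_by_team(items: list[LineItem], rate_discount: float = 0.0) -> dict[str, float]:
    """Roll costs up by the "team" tag; untagged spend stays visible as its own bucket."""
    totals: dict[str, float] = defaultdict(float)
    for item in items:
        team = item.tags.get("team", "").strip() or "untagged"
        totals[team] += item_cost(item, rate_discount)
    return dict(totals)

items = [
    LineItem("vm-1", 720, 0.10, {"team": "payments", "environment": "prod", "cost-centre": "cc-42"}),
    LineItem("vm-2", 360, 0.20, {"team": "search", "environment": " "}),
    LineItem("vm-3", 100, 0.50),
]
print(cost_by_team(items, rate_discount=0.2))
for item in items:
    if missing := missing_tags(item):
        print(item.resource_id, "is missing", sorted(missing))
```

Raising `rate_discount` lowers every team’s total without any change in behaviour. A team that halves its `usage_hours`, by contrast, halves its own line whatever the rate. These are the two levers Daly splits between the centre and the application teams.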

Taking things one step further, Nationwide also has a chargeback model, where each application team is responsible for its exact usage via a monthly bill. “That creates accountability,” Daly said. “If they can see charges, they can see if they are oversized or shut things down overnight or use cloud-managed services where we can.”
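The overnight shutdown Daly mentions is easy to quantify. The sketch below compares an always-on resource with one scheduled to run only during working hours. The $0.40 hourly rate and the 10-hour, 22-day schedule are hypothetical. The figure of 730 is the average number of hours in a month (8,760 divided by 12).

```python
HOURS_PER_MONTH = 730  # average hours in a month: 8,760 hours a year / 12

def monthly_costs(hourly_rate: float, hours_on_per_day: float,
                  days_on_per_month: int) -> tuple[float, float]:
    """Return (always_on_cost, scheduled_cost) for one month of a single resource."""
    always_on = hourly_rate * HOURS_PER_MONTH
    scheduled = hourly_rate * hours_on_per_day * days_on_per_month
    return always_on, scheduled

# Hypothetical dev instance at $0.40/hour, run 10 hours a day on 22 working days
always_on, scheduled = monthly_costs(0.40, 10, 22)
print(f"always on ${always_on:.2f}, scheduled ${scheduled:.2f}, "
      f"saved ${always_on - scheduled:.2f}")
# always on $292.00, scheduled $88.00, saved $204.00
```

Under these assumptions the scheduled instance costs about 70 per cent less. A monthly chargeback bill makes that kind of saving visible to the team that owns the resource.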

Daly does warn against gamifying these incentives too much, however. “In my experience, if you turn it into a game, it gets played like a game,” he said.

Just Eat Takeaway.com feeds data-hungry developers cost insights

European food delivery company Just Eat Takeaway.com, which is the result of a merger between British and Dutch companies in 2020, predominantly runs on AWS, with applications broken down into microservices. For those teams based in the UK, Ireland, Italy, Spain, Australia, and Canada, a central finops team is tasked with giving engineers better visibility into cloud costs, with the aim of eventually reducing them.

“Our [finops] team is engineering-focused, and we build tools to either help improve visibility of cloud costs or to reduce them,” David Andrews, head of engineering for platforms at Just Eat Takeaway.com, told InfoWorld.

This manifests in a hub-and-spoke approach. A central team is tasked with making efficient buying decisions for the whole organisation, and engineers are armed with tools like AWS Cost Explorer, the open source Cloud Custodian, and Apptio’s proprietary Cloudability to track their cloud spending.

Gaining buy-in was a challenge in the early days, when the process of tracking cloud spend was largely manual. Andrews said that automating these tasks can help remove that friction for engineers, including setting alerts for teams that start to go over budget.
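A budget alert of the kind Andrews describes can start as a straight-line forecast: divide spend so far by the days elapsed and multiply by the days in the month. The Python sketch below is a hypothetical illustration, not Just Eat Takeaway.com’s tooling. The team name, budget and threshold are invented, and a linear projection will misjudge spiky or seasonal workloads.

```python
import calendar
from datetime import date
from typing import Optional

def forecast_month_end(spend_to_date: float, today: date) -> float:
    """Project month-end spend linearly from the average daily spend so far."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month

def budget_alert(team: str, spend_to_date: float, budget: float,
                 today: date, threshold: float = 1.0) -> Optional[str]:
    """Return an alert message if projected spend exceeds threshold x budget, else None."""
    projected = forecast_month_end(spend_to_date, today)
    if projected > budget * threshold:
        return f"{team}: projected ${projected:,.0f} against a ${budget:,.0f} budget"
    return None

# Hypothetical team that has spent $4,000 by 10 June against a $10,000 monthly budget
print(budget_alert("payments", 4_000, 10_000, date(2021, 6, 10)))
# payments: projected $12,000 against a $10,000 budget
```

Setting `threshold` to 0.8, for example, would warn a team once it is on course to use more than 80 per cent of its budget.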

“One of our previous challenges was that we used to conduct a portion of our activity manually, such as reporting and reviewing costs. As we continued to grow and scale, this became less sustainable,” Andrews said. Simplifying reporting and investing in training workshops helped gain developer buy-in, as did openly talking about the topic of cloud costs in technical all-hands meetings.

Like many businesses focused on growth, Just Eat Takeaway.com doesn’t want to reduce cloud costs to the detriment of growth. “While we do of course track our cloud costs effectively and regularly, we do that alongside tracking our business growth so that we don’t view cloud costs in a vacuum,” Andrews said.

Spreading the gospel of finops

Now, as cloud computing becomes more popular with enterprises that weren’t born in the cloud, a common set of easily implemented finops principles and tools will become integral to many companies’ bottom lines.

This can start with a single person or a small team tasked with establishing an enterprise-wide account, label, and tagging hierarchy. Once everyone is working from the same data, the hard work of education and culture change can begin.

It’s important not to get downhearted early on, as there is no golden path to better cloud cost management. The key is to start early, learn, and iterate as you go to capture those important marginal gains. Strong top-down support will be critical.

The fact is, as cloud makes up a bigger and bigger chunk of an organisation’s technology bills, it’s only a matter of time before these finops practices become mainstream and every engineer needs to know their way around a cloud bill.