TL;DR
Neglecting Infrastructure as Code (IaC) maintenance may seem like a cost-saving move, but it often leads to technical debt, increased security risks, and delayed innovation. This blog explores why maintaining foundational tools like AWS CDK, Terraform, and Ansible is critical to long-term productivity, agility, and cost optimisation. Waiting too long to update can result in compounding issues, lost time, and missed opportunities. Incremental improvements and proactive maintenance are key to staying resilient and competitive.
Introduction
With ever-tightening budgets and resource constraints, deciding where to spend valuable dollars is becoming increasingly difficult. The general view within organisations is that to optimise costs, they need to reduce spending through cost avoidance. Spending less is seen as saving money and therefore as increasing profits. Cost avoidance, however, is not the same as cost optimisation.
Maintaining Infrastructure as Code (IaC) tools is often neglected and seen as a cost to productivity – that is, until the lack of maintenance grinds productivity to a halt. Drawing on real-world experiences, this blog challenges the notion of sitting still on maintenance of critical tooling in the name of productivity, and examines the true costs this has on organisations. We will explore the core concepts of cost optimisation and challenge the status quo view that making no changes equals a reduction, or a net zero cost, to the organisation.
What is cost optimisation?
Cost optimisation is about spending where it matters most. Initiatives involving an increase in spending that can be directly tied back to revenue-producing streams are seen as an investment to increase profits. Conversely, cutting costs that waste resources and money is also cost optimisation. Cutting unnecessary costs and spending where it makes sense form the foundations of cost optimisation: it is a sum of its parts, not just saving money. Cost is a larger metric than its financial implications, which is why the concept of total cost of ownership includes the effort exerted to maintain something. Being able to justify spending effort and attributing it to an outcome is important.

Maintenance versus progress
In small organisations that are highly constrained by resources, progress or rapidly shipping products often keeps teams too busy to make improvements. This is especially true for startups and smaller organisations with minimal staff. Shortcuts are often taken to deliver products and features faster. Workarounds are often introduced with a view to fixing them later. Teams are under intense pressure to deliver on time due to budgetary or even regulatory constraints.
They may work extraordinary hours for a production release deadline and require considerable time off to recover. Maintenance, it seems, is furthest from the product owner’s mind at the time of delivery. Once the product goes live, scheduling downtime to resolve temporary workarounds becomes difficult without taking customers offline.
In an exceptionally large organisation I consulted to, there was a mantra held within the DevOps squads and endorsed by the company’s senior management:
“Done is better than perfect”
The intent was to endorse the idea that something may not be perfect at the time of release and that this is okay. When taken literally, however, it became clear that squads often re-used legacy code that was not regularly maintained, and consequently relied on legacy versions of packages, builds and releases because it was more convenient than troubleshooting potential issues.
“We have always done it this way” – the most dangerous statement any company or employee can make.
The aftermath of ignoring Infrastructure-as-Code maintenance
Once products and services are delivered, there is seldom time for teams to work on previous solutions to address all those temporary fixes. Products may become too valuable to bring down for maintenance, or the worst case may be that multiple products and services are built one after the other – sharing the same foundations that have now become outdated or obsolete.
This is especially true for Infrastructure-as-Code (IaC) solutions including AWS Cloud Development Kit (CDK) versions, Terraform Providers, AWS CLI commands, JavaScript packages, Node versions, Python versions and imports.
Just like operating system patching, there is a need to keep the IaC foundations updated too. As the cloud continuously evolves, legacy versions are steadily deprecated, and IaC foundations that depend on them must keep pace.
For example, see below for some critical deprecation and support dates for Lambda runtimes.

Lambda Runtimes – Deprecation Dates

| Name | Identifier | Operating system | Deprecation date | Block function create | Block function update |
|---|---|---|---|---|---|
| Python 3.9 | python3.9 | Amazon Linux 2 | Dec 15, 2025 | Jan 15, 2026 | Feb 15, 2026 |
| Node.js 18 | nodejs18.x | Amazon Linux 2 | Sep 1, 2025 | Oct 1, 2025 | Nov 1, 2025 |
NOTE: AWS customers running these versions will be alerted through the AWS Health Dashboard.
Reasons to maintain IaC
As new services are released, developers often want to make use of them to simplify, or add features to, the existing infrastructure supporting their services. New services and features often come with new APIs and feature flags that are only available in the latest provider versions.
If foundational IaC such as AWS CLI, CDK, Terraform Providers or Ansible versions are not maintained, this may hamper efforts to roll out new products or services to customers. The longer this is left, the harder it will eventually be to catch up. Here are some real-world war stories.
Thankfully, Lambda runtime upgrades are relatively easy, but refactoring is at times needed to support new versions, and this is true of most codebases when upgrading. It is not always straightforward or easy to carry out those changes at scale.
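As a minimal sketch (assuming CDK v2 with aws-cdk-lib and TypeScript, with an illustrative handler path), bumping a Lambda runtime is often a one-line change in the construct definition – the harder work is refactoring and re-testing the handler code behind it:

```typescript
import * as path from 'path';
import { Stack, StackProps } from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import { Construct } from 'constructs';

export class ApiHandlerStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    new lambda.Function(this, 'ApiHandler', {
      // Moving off a deprecated runtime is often this single line,
      // e.g. PYTHON_3_9 -> PYTHON_3_12, but the handler code itself
      // may still need refactoring and re-testing for the new version.
      runtime: lambda.Runtime.PYTHON_3_12,
      handler: 'app.handler',
      code: lambda.Code.fromAsset(path.join(__dirname, '../src')),
    });
  }
}
```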

Scenario 1 – CDK versioning
At one customer I consulted to, the AWS CDK version had not been maintained, and it was still running on version 1 of the CDK. All the solutions built using CDKv1 had constructs and codebases compatible with it.
Using a new AWS service and its features required moving to CDKv2. The upgrade from CDKv1 to CDKv2 should be straightforward; however, because it had been deferred for so long, many applications had since been built and moved into customer-facing production environments. Modifying all of the constructs and codebases was a challenge, and there were also package dependencies and CI/CD pipelines to update.
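To give a sense of the mechanical churn involved, here is a hedged sketch of the import changes between CDKv1 and CDKv2 – the stack and bucket names are illustrative, not from the customer’s codebase:

```typescript
// CDK v1: each service lives in its own package, all pinned to the same version
// import * as core from '@aws-cdk/core';
// import * as s3 from '@aws-cdk/aws-s3';

// CDK v2: a single consolidated library plus the separate 'constructs' package
import { App, Stack, aws_s3 as s3 } from 'aws-cdk-lib';
import { Construct } from 'constructs';

class StorageStack extends Stack {
  constructor(scope: Construct, id: string) {
    super(scope, id);
    // The construct API is largely unchanged; the churn is in imports,
    // package.json dependencies and the CI/CD pipelines that install them.
    new s3.Bucket(this, 'ArtefactBucket', { versioned: true });
  }
}

new StorageStack(new App(), 'StorageStack');
```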
To make matters worse, the build agents had dependencies on other software packages and ran a deprecated operating system. This added significant risk of unpatched vulnerabilities and uncovered a failure in the build agent maintenance process – there was none.
Result:
Several weeks of setbacks to developing the new services and their dependencies, spanning the initial discovery, thorough testing and Change Management processes, before the upgrade to CDKv2 was complete. The remediation timeline was squeezed by impending change freezes, which in turn impacted several teams waiting on the new services. Teams were further disrupted by having to create and support a build agent maintenance process to prevent future problems.
For readers interested in securing and streamlining CDK deployments, check out Cevo’s insights on integrating AWS CDK with Bitbucket and OIDC.
Scenario 2 – Ansible Tower
At another customer I consulted to, the Ansible Tower installation was running incredibly old versions of Ansible and its supporting packages. As a consultant, I was added to the BAU team as an augmentation resource. A process existed for the team to validate Ansible playbooks locally before committing them to the code repositories.
I was unable to build a local environment compatible with the Ansible Tower and local tool versions, as they were no longer supported and had been deprecated. Teams purposely did not upgrade their machines so they could keep using them.
Result:
Apart from carrying critical and high-severity vulnerabilities, the solution also exposed users to potential threats. Unable to develop and test locally, I had to test all playbooks through Ansible Tower itself. This tied up valuable resources as teams queued and fought for availability of the Ansible infrastructure.
It slowed down development considerably and impacted other teams. Critical product releases were delayed while teams had to troubleshoot failures and compete for runs that failed intermittently due to resource starvation.

Solutions for proactive Infrastructure-as-Code maintenance
Tackling these issues early reduces the risk of major changes and disruptions later, and lessens the overall impact when change does come. This is not to say organisations should immediately jump onto the next version as soon as it becomes available, but do not wait until you need it to update either.
Solutions such as AWS CDK, Terraform, and Ansible represent the foundation of your deployed solutions. Maintaining them is critical and prevents much larger bodies of effort further down the track.
Fail forward
There is no magic formula for when an organisation should upgrade; however, it can be driven through policy and enforcement, for example using the “N-2” method (ensuring you are always at most two releases behind the latest release). When a challenge is encountered, resist rolling back major versions.
There is a good chance your existing solutions are simply fine and the impact of the changes is isolated to newer services. Analyse the impact and consider whether the challenge you hit is resolved in the next version. Let the data tell you what to do, not the emotion of something failing. Rolling back is not always the answer.
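As one way to make an “N-2” policy enforceable rather than aspirational, a pipeline gate could compare major versions. The helper below is a hypothetical sketch using the semver package, and the version numbers are purely illustrative:

```typescript
import * as semver from 'semver';

/**
 * Hypothetical policy check: flag when the version in use falls more than
 * two major releases behind the latest available release.
 */
export function withinNMinusTwo(currentVersion: string, latestVersion: string): boolean {
  return semver.major(latestVersion) - semver.major(currentVersion) <= 2;
}

// Example: a team still on a 3.x provider while 6.x is current would fail
// this gate and be prompted to schedule an incremental upgrade.
console.log(withinNMinusTwo('3.76.1', '6.0.0')); // false -> upgrade required
```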
Learn more about how partnering with AWS experts can help reduce risk and improve your cloud infrastructure strategy.
Incremental changes
Constantly moving forward might seem counter-intuitive if seen through a lens of “everything works now – do not fix what is not broken.” Consider the impact that notable change has on your organisation. If your teams are used to change happening in regular, small doses, there is far less chance of catastrophic failure or burnout.
Problems are tackled while they are small and before they have a chance to make an enormous impact, which reduces anxiety. Think of your IaC build tools as just another Git repository with regular and incremental changes pushed up to the master or main branches. Be Agile and move with the times.
Have a test environment
Evaluate changes and the potential impacts of version upgrades in a safe, non-disruptive environment. Being able to evaluate the impact of a change helps quantify the risk associated with it. This is a notable example of spending where it makes sense: there is a return on investment in reducing operational risk.
It reduces anxiety for teams making changes and contributes to a mature Change Management culture that uses evidence-based decision making for high-risk activities.
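One way to achieve this (a sketch only, with illustrative account IDs and re-using the hypothetical stack from earlier) is to instantiate the same CDK stack into a dedicated test account, so upgrades are exercised there before the same code is promoted to production:

```typescript
import { App } from 'aws-cdk-lib';
import { ApiHandlerStack } from './api-handler-stack'; // illustrative stack from the earlier sketch

const app = new App();

// Exercise version upgrades in an isolated test account first, then promote
// the identical code to production once the impact has been evaluated.
new ApiHandlerStack(app, 'ApiHandler-Test', {
  env: { account: '111111111111', region: 'ap-southeast-2' },
});

new ApiHandlerStack(app, 'ApiHandler-Prod', {
  env: { account: '222222222222', region: 'ap-southeast-2' },
});
```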

Tooling choice and design
Use tooling that makes sense for the use case. The same universal tooling for everything, including platforms, controls and infrastructure, sounds great in isolation. Keeping everything consistent means the same processes, skills and procedures are applied across teams. It also means the same problems exist for all of your teams – it becomes a shared risk.
The tools you use for managing applications and platforms do not have to be the same tools. Developers have a vastly different role to platform engineers.
Something interesting about CloudFormation is that it maintains compatibility and is always up to date. Custom resources and API inconsistencies with the AWS CLI aside, not having to manage versioning or maintain backwards compatibility yourself makes it an especially useful IaC tool.
The ability to derive ChangeSets and dry runs also helps smooth out changes to services and foundational infrastructure.
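As a rough sketch of that workflow (assuming the AWS SDK for JavaScript v3, with illustrative stack and template names), a change set can be created and reviewed before anyone executes it:

```typescript
import {
  CloudFormationClient,
  CreateChangeSetCommand,
  DescribeChangeSetCommand,
  waitUntilChangeSetCreateComplete,
} from '@aws-sdk/client-cloudformation';
import { readFileSync } from 'fs';

const cfn = new CloudFormationClient({ region: 'ap-southeast-2' });

async function previewChanges(): Promise<void> {
  // Create a change set instead of updating the stack directly.
  await cfn.send(
    new CreateChangeSetCommand({
      StackName: 'network-foundation', // illustrative stack name
      ChangeSetName: 'upgrade-preview',
      ChangeSetType: 'UPDATE',
      TemplateBody: readFileSync('template.yaml', 'utf8'),
    }),
  );

  // Wait for the change set to finish being calculated before reviewing it.
  await waitUntilChangeSetCreateComplete(
    { client: cfn, maxWaitTime: 300 },
    { StackName: 'network-foundation', ChangeSetName: 'upgrade-preview' },
  );

  // Review the proposed changes; executing them remains a separate, deliberate step.
  const result = await cfn.send(
    new DescribeChangeSetCommand({
      StackName: 'network-foundation',
      ChangeSetName: 'upgrade-preview',
    }),
  );
  for (const change of result.Changes ?? []) {
    console.log(change.ResourceChange?.Action, change.ResourceChange?.LogicalResourceId);
  }
}

previewChanges().catch(console.error);
```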
Tools like the AWS Landing Zone Accelerator are built purposefully with integrated update paths. The solution is managed by AWS and remains self-maintainable in the future through the CDK that runs under the hood.
Again, this does not mean that Developers need to use CDK.
To dive deeper into CloudFormation and building secure infrastructure-as-code solutions, see Cevo’s guide on creating secure code analysis applications with AWS Bedrock and CloudFormation.
What is the actual cost of doing nothing?
Here are some of the side-effects of doing nothing:
- Decreased morale (no change means no challenge)
- Increased risk from vulnerabilities
- Lack of innovation and engagement increases staff turnover
- Higher risk to change management
- Harder to pivot to new services
- Increased blast radius for potential impact
- Increased anxiety when changes occur
- Higher risk for reputation impact due to unforeseen outages
- Loss of competitive edge
Conclusion
It is important to understand that change is inevitable. Just because something works today, it does not guarantee that it will work tomorrow. Tooling choices have tradeoffs like any decision. When choosing tooling, organisations need to consider the maintenance and cultural impact these have on the organisation. Maintaining solutions needs to be prioritised, planned for, and incorporated into ways of working.
So, is your organisation sitting still?
Does this blog resonate with you or your team? If so, reach out to the team here at Cevo – we specialise in simplifying the complex and tackling hard problems.

Greg has been in the IT industry for 17+ years across a range of roles and specialties starting his career in IT within the Australian Army in the Royal Australian Signals Corps. Since then, he has worked across several industry verticals including Local and State Government, Network Integrators, Gaming, Mining, Private Cloud Providers and AWS Consultancies covering Insurance, Energy, Banking and Airline industries in addition to Public Sector (Health, Transport, Justice, Communities and Child Safety). Greg is an AWS Ambassador and AWS Community Builder for Cloud Operations. Greg specialises in network engineering, cloud infrastructure, governance and compliance, security, cloud operations, AWS Well-Architected and DevOps.