In part 7 of my latest series showcasing the six pillars of the AWS Well-Architected Framework, we continue our look at the Cost Optimisation pillar. The goal of the Cost Optimisation pillar is to give due consideration to total cost of ownership, which extends beyond monetary cost to reductions in effort and resources. If you’d like to learn more about the other pillars of the Well-Architected Framework, check out the other blogs in this series via the links below. Otherwise, let’s get stuck in!
What we will be covering today
- What is Cost Optimisation?
- Common considerations with cost optimisation
- Improving cost optimisation practices
Why we are learning this
- To help others better understand the concepts of cost optimisation
- Using the AWS Well-Architected: Cost Optimisation Pillar to drive efficiency in total cost of ownership
How this will help me
You will:
- Understand good practices for optimising cost in a cloud operating model
- Be able to help champion cost optimisation across your organisation
- Implement cost optimisation strategies to reduce total cost of ownership
What is Cost Optimisation?
Cost is commonly viewed through a single lens, as a purely financial term measured in dollars and cents. While this is an important element of cost, what is often missed is efficiency in operations, reduction in human effort, and the impact on management, improvements and experimentation. Combined with the financial impact, we refer to this as Total Cost of Ownership (TCO), and cost optimisation is about reducing TCO.
Common Considerations in Cost Optimisation
Reduce the Bill
Many organisations (especially those that weren’t born in the cloud) are often heard saying that their cloud bills are massive and often higher than they were under traditional hosting models. The reason, and the remedy, lies in the very definition of “cloud computing”:
“Cloud computing is the on-demand delivery of IT resources over the Internet with pay-as-you-go pricing. Instead of buying, owning, and maintaining physical data centers and servers, you can access technology services, such as computing power, storage, and databases, on an as-needed basis from a cloud provider like Amazon Web Services (AWS).”
Compute and Storage
The operating model when running on the cloud is very different from traditional hosting in that many resources don’t need to be running 24/7. As resources are charged by the second, minute or hour, reducing the running time of those resources to only when they are needed is a significant cost reduction measure. Similarly, traditional hosting models encourage overprovisioning of resources to cater for demand spikes and future storage growth. In the cloud, auto-scaling of compute capacity and storage lets you provision only what you need now and scale as demand grows.
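To put the running-time point into numbers, here is a minimal sketch that compares an always-on instance with one that only runs on a schedule. The hourly rate and the schedule are hypothetical placeholders, not real AWS prices, and the function names are made up for illustration.

```python
HOURS_PER_WEEK = 24 * 7


def weekly_cost(hourly_rate: float, hours_running: float) -> float:
    """Cost of one instance for a week, given the hours it actually runs."""
    return hourly_rate * hours_running


def schedule_saving(hourly_rate: float, hours_per_day: float, days_per_week: int) -> float:
    """Fraction of the always-on cost saved by running only on a schedule."""
    always_on = weekly_cost(hourly_rate, HOURS_PER_WEEK)
    scheduled = weekly_cost(hourly_rate, hours_per_day * days_per_week)
    return 1 - scheduled / always_on


# Hypothetical rate of $0.10/hour; a dev instance used 10 hours a day, 5 days a week.
saving = schedule_saving(hourly_rate=0.10, hours_per_day=10, days_per_week=5)
print(f"Saved versus 24/7: {saving:.0%}")  # prints: Saved versus 24/7: 70%
```

The saving depends only on the ratio of scheduled hours to total hours, so the same figure applies whatever the actual hourly rate is.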
Storage solutions like EBS for block storage allow you to grow the volume as needed (even dynamically), as explained here: https://aws.amazon.com/blogs/storage/automating-amazon-ebs-volume-resizing-with-aws-step-functions-and-aws-systems-manager/
Other storage solutions like EFS and S3 scale as you grow – so you only pay for the storage you are actually using without paying for pre-provisioned storage.
EC2 is Expensive and Serverless Can Be Too
Both EC2 and serverless solutions have their place, and each has its own pros and cons. Choosing the right solution for your workload can make a big difference to your overall costs. Serverless solutions are built on managed services, so you can focus on your code. This can reduce your operational overhead and save money over the long run. However, serverless solutions are designed for on-demand use and suit infrequent access patterns.
If your workload is running and being accessed constantly, serverless costs can quickly skyrocket. This is where EC2 has its place from an operational standpoint. Automation tools such as Systems Manager can automate many of the onerous administrative tasks associated with managing the operating system, including maintenance windows, patching, inventory management and vulnerability reporting. The caveat is that you must run additional supporting services to provide an auto-scaling function, so consider the TCO when making your hosting decision.
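One way to reason about this trade-off is to estimate the request volume at which a pay-per-use function costs the same as an always-on instance. The sketch below assumes a serverless bill made up of a per-request charge plus a charge per GB-second of execution. All prices are hypothetical placeholders, and the comparison ignores the operational effort that also forms part of TCO.

```python
def serverless_monthly_cost(requests, avg_duration_s, memory_gb,
                            price_per_gb_second, price_per_request):
    """Monthly cost of a pay-per-use function for a given request volume."""
    compute = requests * avg_duration_s * memory_gb * price_per_gb_second
    return compute + requests * price_per_request


def break_even_requests(instance_monthly_cost, avg_duration_s, memory_gb,
                        price_per_gb_second, price_per_request):
    """Monthly requests at which the function costs the same as the instance."""
    cost_per_request = (avg_duration_s * memory_gb * price_per_gb_second
                        + price_per_request)
    return instance_monthly_cost / cost_per_request


# Hypothetical figures: a $60/month instance versus a 0.5 GB function
# that runs for 200 ms per request.
PRICING = dict(avg_duration_s=0.2, memory_gb=0.5,
               price_per_gb_second=0.00002, price_per_request=0.0000002)

threshold = break_even_requests(instance_monthly_cost=60.0, **PRICING)
print(f"Break-even at roughly {threshold:,.0f} requests per month")
```

Below the threshold the function is cheaper on the bill; above it, the always-on instance wins, provided the instance can actually serve that load.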
S3 Data Access Patterns
Understanding your S3 data access patterns and choosing the correct storage strategy can save substantially on overall costs. The cost of storing data is relatively low, but operations on stored objects (requests, retrievals and transitions between tiers) can be really expensive if not handled properly. The S3 Intelligent-Tiering storage class works well with unknown access patterns, but be advised that the fee for having this “managed” tiering run on your behalf can also add up, so use it pragmatically and where it makes sense; otherwise handle data tiering yourself. As data moves to colder tiers, the storage rate decreases significantly, but the cost of operations on that data increases substantially. If you know the access pattern, try to reduce double-handling across several S3 tiers.
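The trade-off between storage rate and operation cost can be sketched with a small model. The tier names and rates below are hypothetical and are not actual S3 prices; the model also leaves out request and transition charges for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    storage_per_gb: float    # $ per GB-month stored (hypothetical)
    retrieval_per_gb: float  # $ per GB read back (hypothetical)


def monthly_cost(tier: Tier, stored_gb: float, retrieved_gb: float) -> float:
    """Storage charge plus retrieval charge for one month."""
    return stored_gb * tier.storage_per_gb + retrieved_gb * tier.retrieval_per_gb


def cheapest(tiers, stored_gb, retrieved_gb) -> Tier:
    """Tier with the lowest monthly cost for this access pattern."""
    return min(tiers, key=lambda t: monthly_cost(t, stored_gb, retrieved_gb))


TIERS = [
    Tier("hot", storage_per_gb=0.020, retrieval_per_gb=0.00),
    Tier("warm", storage_per_gb=0.010, retrieval_per_gb=0.01),
    Tier("cold", storage_per_gb=0.004, retrieval_per_gb=0.03),
]

# 1 TB stored: rarely read data favours the cold tier, heavily read data does not.
print(cheapest(TIERS, stored_gb=1000, retrieved_gb=10).name)
print(cheapest(TIERS, stored_gb=1000, retrieved_gb=2000).name)
```

With the same amount of data stored, the cheapest tier flips once the data is read often enough, which is why the access pattern matters more than the headline storage rate.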
Consider The Instance Type
Instances built on the x86-64 architecture (Intel and AMD) generally cost more than ARM64 (Graviton family) instances, especially when considering the price-to-performance ratio. There has been a lot of development in ARM packages and applications, especially since Apple moved the Mac to ARM architecture in 2020. Most popular languages and tools such as Python, Ruby, Java, Node, NPM and others have full support on ARM architecture. If you stay on x86-64 without a specific need for it, you could be giving up better performance and throwing dollars away for no benefit.
Where Fargate is being used, there are usually only a couple of lines of code difference between using an x86-64 based resource and using an ARM-based resource. Due to the performance uptick of ARM-based resources, it may also be possible to reduce the number of instances you are deploying, further reducing costs.
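To illustrate how small the difference can be, the sketch below builds the parameters for an ECS task definition on Fargate, where the architecture is selected through `runtimePlatform.cpuArchitecture`. The family name, image URI and task sizes are hypothetical, and the container image itself must also be built for ARM64 (or be multi-architecture) for the task to start.

```python
def fargate_task_definition(family: str, image: str, arm: bool = True) -> dict:
    """Build ECS task definition parameters for Fargate.

    Only the cpuArchitecture value changes between x86-64 and ARM64.
    """
    return {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",
        "cpu": "256",      # hypothetical task size
        "memory": "512",   # hypothetical task size
        "runtimePlatform": {
            "operatingSystemFamily": "LINUX",
            "cpuArchitecture": "ARM64" if arm else "X86_64",
        },
        "containerDefinitions": [
            {"name": "app", "image": image, "essential": True},
        ],
    }


# Hypothetical family and image names.
x86 = fargate_task_definition("web", "example.com/web:latest", arm=False)
arm = fargate_task_definition("web", "example.com/web:latest", arm=True)

changed = [key for key in x86 if x86[key] != arm[key]]
print(changed)  # prints: ['runtimePlatform']
```

A dictionary like this could be passed to boto3's `register_task_definition(**definition)` on an ECS client; that call is left out so the sketch runs without AWS credentials.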
Know What You Have
Inventory management is critical and is one of the easiest and least disruptive ways of reducing your monthly bill. Having a good tagging strategy helps here. Also consider your account and environment structure, and put some focus on any dev and test environments. These environments typically have a lot of dead wood that can be decommissioned, and automation can help reduce the waste. Old storage snapshots are also a huge waste of resources and cost. EBS snapshots, for example, are incremental, and deleting an older snapshot does not remove data that later snapshots still depend on, so there is no need to keep the original snapshots to preserve the baseline data.
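As a minimal sketch of the kind of automation that can find dead wood, the function below flags snapshots that are older than a cut-off and do not carry a "keep" tag. The records mimic the shape of EC2 snapshot descriptions (`SnapshotId`, `StartTime`, `Tags`), but the sample data, the `Retain` tag name and the 90-day threshold are all assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def stale_snapshots(snapshots, now, max_age_days=90, keep_tag="Retain"):
    """Return IDs of snapshots older than max_age_days without the keep tag."""
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for snap in snapshots:
        tag_keys = {tag["Key"] for tag in snap.get("Tags", [])}
        if snap["StartTime"] < cutoff and keep_tag not in tag_keys:
            stale.append(snap["SnapshotId"])
    return stale


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
sample = [
    # Old and untagged: a clean-up candidate.
    {"SnapshotId": "snap-old", "StartTime": now - timedelta(days=400)},
    # Old but explicitly retained by tag.
    {"SnapshotId": "snap-kept", "StartTime": now - timedelta(days=400),
     "Tags": [{"Key": "Retain", "Value": "true"}]},
    # Recent: left alone.
    {"SnapshotId": "snap-new", "StartTime": now - timedelta(days=5)},
]

print(stale_snapshots(sample, now))  # prints: ['snap-old']
```

The function only reports candidates. Deleting them is deliberately left as a separate, reviewed step.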
Know How to Use Resources
Part of the role of Trusted Advisor is to recommend instance and storage sizes. The recommendations come from polling the CloudWatch API for metrics and analysing trends over time. Implement recommendations with caution, as not all of them are good to implement: don’t trust them blindly, and do your research before following a recommendation.
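One reason to be cautious is that an average can hide peaks. The sketch below is a simple sanity check of your own, not the logic Trusted Advisor uses: it only treats an instance as a downsizing candidate when both the average and the 95th percentile of CPU utilisation are low. The thresholds and sample data are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 1 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def safe_to_downsize(cpu_samples, avg_limit=20.0, p95_limit=40.0):
    """True only if average and 95th percentile CPU are both under their limits."""
    average = sum(cpu_samples) / len(cpu_samples)
    return average < avg_limit and percentile(cpu_samples, 95) < p95_limit


steady = [10.0] * 100               # quiet all the time
spiky = [5.0] * 90 + [95.0] * 10    # low average (14%) but regular heavy peaks

print(safe_to_downsize(steady))  # prints: True
print(safe_to_downsize(spiky))   # prints: False
```

The spiky workload looks idle on average, yet a smaller instance would struggle during its peaks, which is exactly the kind of case worth researching before acting on a recommendation.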
Improving Cost Optimisation Practices
Knowing what can be optimised is half the battle when it comes to reducing total cost of ownership. Cost optimisation is a way of working, and integrating good cost practices at a fundamental level reaps many rewards.
FinOps Observability
Something that can help is making informed decisions when new resources are being created, by seeing the cost difference a change will introduce. Taking a FinOps approach to deploying resources increases visibility into the potential cost implications of infrastructure being deployed. Cevo has baked this approach into our very own DevOps Maturity Assessment (DOMA) product. To learn more about this FinOps approach: https://cevo.com.au/post/empowering-your-team-with-infracost-a-finops-approach-to-cloud-cost-optimisation/
Using the Right Service
Strongly consider serverless services where workloads are unpredictable and highly bursty in nature. Where load is consistent and has very well-defined access patterns, consider using traditional services like EC2. Serverless has its place and can also reduce management overhead, but that management comes at a cost and can give a false sense of TCO if it isn’t monitored and adjusted accordingly. Where scale is a consideration, note that many serverless offerings have limits on how fast they can scale in response to load. Consider all aspects of your workload characteristics and availability requirements, and make an informed choice that meets your requirements.
Manage and Review Config Rules
When implementing AWS Config Rules as part of governance and compliance, consider the frequency of your rule evaluations and any rules which may conflict or double up. This can be a costly affair if left to defaults or where a significant number of Config rules are being run frequently. It is important to have both governance and compliance with policies; however, some of these Config rules may be better enforced at another layer such as a release pipeline, tackling non-compliance closer to the source.
Check Availability Zone Alignment Between Resources
Many cloud operators are not fully familiar with cross-zone data charges between Availability Zones (AZs). When building resilient workloads there is inherently a need to replicate data across AZs, and data charges for this are unavoidable. That said, managed storage services such as EFS and S3 include the data charge as part of their service. Where application tiers read heavily from databases, consider placing those resources in the same AZ to prevent unnecessary cross-AZ data charges.
Use VPC Endpoints
Using VPC Endpoints can in and of itself be quite costly. There is a charge per hour for the endpoint as well as for the data processed by the endpoint. Where there is large data transfer between workloads residing in a VPC and services accessed via VPC Endpoints, these costs are often much less than accessing those services’ public endpoints via an Internet Gateway. Data charges via the Internet are the most expensive data charges in a VPC. In addition, security requirements may dictate the use of VPC Endpoints, so this has to be considered holistically.
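Because an endpoint has a fixed hourly charge plus a per-GB charge, whether it pays off depends on data volume. The sketch below estimates the monthly volume at which an endpoint becomes cheaper than an alternative path billed purely per GB. All rates are hypothetical placeholders, and the alternative is reduced to a single per-GB rate as a simplifying assumption.

```python
HOURS_PER_MONTH = 730  # average hours in a month, a common billing approximation


def endpoint_monthly_cost(gb, hourly_rate, per_gb_rate, az_count=1):
    """Fixed hourly charge per AZ plus a charge for each GB processed."""
    return hourly_rate * HOURS_PER_MONTH * az_count + per_gb_rate * gb


def break_even_gb(hourly_rate, per_gb_rate, alt_per_gb_rate, az_count=1):
    """Monthly GB above which the endpoint is cheaper than the alternative path.

    Returns None when the alternative's per-GB rate is not higher, because the
    endpoint's fixed charge can then never be recovered.
    """
    if alt_per_gb_rate <= per_gb_rate:
        return None
    fixed = hourly_rate * HOURS_PER_MONTH * az_count
    return fixed / (alt_per_gb_rate - per_gb_rate)


# Hypothetical rates: $0.01/hour per AZ and $0.01/GB for the endpoint,
# $0.05/GB for the alternative path, with the endpoint deployed in two AZs.
volume = break_even_gb(hourly_rate=0.01, per_gb_rate=0.01,
                       alt_per_gb_rate=0.05, az_count=2)
print(f"Endpoint pays off above about {volume:.0f} GB per month")
```

For low-volume workloads the fixed charge dominates, so the endpoint may only be justified by the security requirements rather than by cost.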
Take Advantage of Discounts
Where workloads are long-lived and consistent, there are very good savings to be had through Reserved Instances and Savings Plans. These solutions work on the concept of committing to a certain level of usage with AWS in exchange for a heavy discount on that usage. To do this effectively, you need to forecast how much you will be using, which requires trend analysis and forward projection to choose the most appropriate level of commitment.
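The forecasting point can be made concrete with a simple model. It assumes a commitment is paid for every hour of the term whether or not it is used, while on-demand is paid only for hours actually used. The discount and rate are hypothetical and do not reflect any specific Reserved Instance or Savings Plan offer.

```python
HOURS_PER_YEAR = 8760


def commitment_break_even(discount):
    """Minimum utilisation (0 to 1) at which committing beats on-demand."""
    return 1 - discount


def cheaper_option(on_demand_rate, discount, utilisation, hours=HOURS_PER_YEAR):
    """Compare yearly cost of committing versus paying on-demand."""
    on_demand = on_demand_rate * hours * utilisation
    committed = on_demand_rate * (1 - discount) * hours
    if committed < on_demand:
        return "commit", committed
    return "on-demand", on_demand


# Hypothetical: $0.10/hour on-demand with a 30% discount for committing.
print(f"{commitment_break_even(0.30):.0%}")                               # prints: 70%
print(cheaper_option(0.10, discount=0.30, utilisation=0.9)[0])  # prints: commit
print(cheaper_option(0.10, discount=0.30, utilisation=0.5)[0])  # prints: on-demand
```

Under this model, a 30% discount only pays off if the resource is in use more than 70% of the time, which is why the forecast matters as much as the discount.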
Leverage AWS Partners for Programs and Projects
This might sound counterintuitive from a cost standpoint; however, consider that AWS Partners like Cevo are able to obtain discounts on your behalf, such as Migration Acceleration Program (MAP) funding and Proof of Concept (PoC) funding. By engaging a certified partner like Cevo to conduct a Well-Architected Review (WAR), you can take advantage of credits towards fixing high-value items discovered in the review. Partners can also help you realise value faster, drawing on a wealth of experience across many organisations.
Look At Your Bills
Use your previous bills and the Cost Explorer to determine where the most spend is occurring and how best you can take advantage of pre-purchased capacity.
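As a starting point for that analysis, the sketch below totals billing line items by service and ranks them by share of spend. The `(service, amount)` input shape and the sample figures are assumptions for illustration. This is a simplified stand-in, not the format returned by Cost Explorer.

```python
from collections import defaultdict


def spend_by_service(line_items):
    """Total spend per service, highest first, with each service's share."""
    totals = defaultdict(float)
    for service, amount in line_items:
        totals[service] += amount
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [
        (service, amount, amount / grand_total if grand_total else 0.0)
        for service, amount in ranked
    ]


# Hypothetical monthly line items.
items = [("EC2", 1200.0), ("S3", 300.0), ("EC2", 300.0), ("RDS", 200.0)]

for service, amount, share in spend_by_service(items):
    print(f"{service:<4} ${amount:>8.2f} {share:.0%}")
```

Services at the top of the list with steady month-to-month spend are the natural candidates to look at first for pre-purchased capacity.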
Leverage Automation to Reduce Effort
Use automation to reduce the effort required to manage solutions. This will give you more time to focus on the things of most value to your organisation. With tools such as Systems Manager, the management overhead of running more traditional services like EC2 can be significantly reduced. Where there is a need to iterate versions of the same environment or to provide canary testing, services such as Elastic Beanstalk offer simplicity in switching blue/green environments and implementing canary testing.