The 7 Laws of Building for the Frugal Architect

Introduction

At re:Invent 2023 there were many announcements and, while the overall theme this year was Generative Artificial Intelligence (AI), a few announcements may have quietly slipped under the radar. One of these came from Amazon Web Services (AWS) Chief Technology Officer (CTO) Werner Vogels, who announced 7 foundational laws for architects building solutions with frugality in mind. This was referred to as The Frugal Architect: https://thefrugalarchitect.com/

This blog discusses each of the 7 laws, with practical considerations to help you apply them when building solutions with frugality. The AWS Well-Architected Framework has six pillars, and the laws draw the fundamentals of each pillar together into a cohesive lens of frugality. While each pillar comes with its own guidance, it is equally important to understand that the pillars overlap. Practical application of frugality means taking many of the points discussed in the Cost Optimisation and Sustainability pillars and entwining them into Operational Excellence, Security, Performance Efficiency and Reliability.

What Are the 7 Laws for the Frugal Architect?

The 7 laws for the frugal architect are broken down into 3 main phases of building a solution: Design, Measuring and Optimising. The Design phase laws apply before the build, the Measuring phase laws apply to the build itself, and the Optimising phase laws apply to the ongoing operation of the build. The 7 laws are distributed across these 3 phases as follows:

Design

  • Law I – Make Cost a Non-functional Requirement
  • Law II – Systems that Last Align Cost to Business
  • Law III – Architecting is a Series of Trade-offs

Measuring

  • Law IV – Unobserved Systems Lead to Unknown Costs
  • Law V – Cost Aware Architectures Implement Cost Controls

Optimising

  • Law VI – Cost Optimisation is Incremental
  • Law VII – Unchallenged Success Leads to Assumptions

Law I - Make Cost a Non-functional Requirement

When we first design a solution to solve a problem, the technical requirements are often the easiest to discuss, understand and articulate. A solution often has well-defined metrics to determine its success: "my problem is x, and the solution meets or exceeds x." These requirements are steeped in tangible things we can measure and are binary in nature.

Non-functional requirements are harder to measure and more difficult to discuss. They are more open-ended by comparison and require statements to describe. How do we operate and maintain the solution? How will it scale? What are the security threats of the solution? How is it accessed by both customers and operational teams? How do we ensure compliance?

Applying the principles of Law I means ensuring that we can articulate the costs involved. How much does it cost to implement this? How much will it cost to operate and maintain the solution? What are its largest cost components? The goal of Law I is to know your costs and prevent bill shock. Costs hidden in code can be just as damaging: relying on default instance sizes when none are specified can lead to developers not sizing solutions appropriately, and inefficient code that takes longer to process drives up cost in serverless solutions.
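
The serverless sizing point can be made concrete with a rough cost model. A minimal sketch, where the prices and workload figures are illustrative placeholders rather than current AWS pricing:

```python
# Rough monthly cost model for a serverless function, illustrating why
# runtime and memory size belong in the cost conversation early.
# Prices below are illustrative placeholders, not current AWS pricing.

PRICE_PER_GB_SECOND = 0.0000166667  # assumed compute price
PRICE_PER_MILLION_REQUESTS = 0.20   # assumed request price

def monthly_lambda_cost(invocations: int, avg_duration_ms: int, memory_mb: int) -> float:
    """Estimate the monthly compute + request cost of one function."""
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    compute = gb_seconds * PRICE_PER_GB_SECOND
    requests = (invocations / 1_000_000) * PRICE_PER_MILLION_REQUESTS
    return round(compute + requests, 2)

# Inefficient code that runs twice as long doubles the compute bill:
slow = monthly_lambda_cost(10_000_000, avg_duration_ms=400, memory_mb=512)
fast = monthly_lambda_cost(10_000_000, avg_duration_ms=200, memory_mb=512)
```

Running the same comparison against your own invocation counts makes the cost of inefficient code a tangible number rather than a surprise on the bill.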

Law II - Systems that Last Align Cost to Business

Most businesses rely on the concepts of profit and loss. Not all solutions are front-line or customer-facing systems that can be directly or indirectly tied to the revenue of the business. However, every solution an organisation deploys has a direct cost to the business. The principle of Law II is that the systems that are most successful and last over time are those aligned to a business function that ties to revenue. Solutions that require more time and effort to design typically have more on the line. It is important to understand during the design phase how the business model operates and how the solution relates to the revenue of the business. A solution that relates to revenue may cost more to implement, run or maintain, but this is generally acceptable as long as the cost does not overtake the revenue it generates.

Law III – Architecting is a Series of Trade-Offs

Being frugal is not just about minimising spending. Practising frugality is about understanding how your spending ties directly to value. Cost, performance, reliability, and security are often in competition with each other. To resolve some of these competing priorities, we need to consider not just the cost of doing something, but also the cost of not doing it, and balance the trade-offs. If the cost is justifiable, it is less of a concern.

Take Amazon GuardDuty for example. GuardDuty is a threat detection service that continuously monitors your AWS accounts and workloads for malicious activity and delivers detailed security findings for visibility and remediation. Some see it as a large cost to implement. Now consider the cost of completing a root cause analysis of an anomalous event, or worse, a breach from a security event. Even where GuardDuty does not prevent a breach, it may warn and give insight well before the event occurs. A breach or other anomalous event could cost thousands or even millions of dollars, which makes the spend on GuardDuty justifiable.
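
This trade-off can be framed as a back-of-the-envelope expected-value calculation. All figures here are illustrative assumptions, not real GuardDuty pricing or breach statistics:

```python
# Compare the annual cost of a detection service against the expected cost
# of an unmitigated security event. Every figure is an assumption.

def expected_annual_loss(breach_cost: float, annual_probability: float) -> float:
    """Expected yearly loss = cost of the event x likelihood it occurs."""
    return breach_cost * annual_probability

guardduty_annual_cost = 6_000        # assumed spend across all accounts
breach_cost = 2_000_000              # assumed cost of a single breach
p_without_detection = 0.05           # assumed annual likelihood, no detection
p_with_detection = 0.01              # assumed likelihood with early warning

saving = (expected_annual_loss(breach_cost, p_without_detection)
          - expected_annual_loss(breach_cost, p_with_detection))
justified = saving > guardduty_annual_cost
```

Even with conservative probabilities, the reduction in expected loss can dwarf the spend on the control, which is the essence of weighing the cost of not doing something.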

For reliability, a solution will rely on some form of replication or multi-AZ architecture. If the solution is customer-facing and directly tied to revenue, reliability is likely a high priority. To practise frugality is to understand the needs of the business and the impact a poor user experience might have on customers. This may lead you to deploy a fault-tolerant solution as opposed to a high-availability or disaster recovery solution. An e-commerce platform with millions of active users is not a great candidate for tolerating the large recovery window of a disaster recovery architecture. It may also motivate the use of serverless architecture, which can take longer to design and implement if an existing solution is being re-architected.

Backups are often seen as a large cost burden over time. However, during a disaster event, having backups available may be the difference between losing customer data and keeping it. Many industries have compliance frameworks enforcing the use of backups and record keeping. Placing data on the correct storage tier makes an enormous difference to the cost. Defined Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO) typically influence the storage strategy, while compliance retention generally calls for archival storage. Higher cost storage tiers have a lower per-object operation cost, while lower cost storage tiers have higher per-object operation costs. If storing in archival tiers, it is important to keep the number of operations you perform on that data to a minimum, as these can add up very quickly.
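
The storage-tier trade-off is easy to see with a small sketch. The prices here are illustrative placeholders, not actual S3 pricing:

```python
# Compare the monthly cost of a "standard" tier versus an "archive" tier,
# including per-request charges. Prices are illustrative assumptions only.

def monthly_storage_cost(gb: float, requests: int,
                         price_per_gb: float, price_per_1k_requests: float) -> float:
    """Storage cost plus per-request (per-object operation) cost."""
    return round(gb * price_per_gb + (requests / 1000) * price_per_1k_requests, 2)

data_gb = 10_000

# Rarely touched compliance data: archive wins despite pricier requests.
standard_cold = monthly_storage_cost(data_gb, requests=1_000,
                                     price_per_gb=0.023, price_per_1k_requests=0.005)
archive_cold = monthly_storage_cost(data_gb, requests=1_000,
                                    price_per_gb=0.004, price_per_1k_requests=0.05)

# Heavily accessed data: the cheap tier's request costs catch up fast.
standard_hot = monthly_storage_cost(data_gb, requests=5_000_000,
                                    price_per_gb=0.023, price_per_1k_requests=0.005)
archive_hot = monthly_storage_cost(data_gb, requests=5_000_000,
                                   price_per_gb=0.004, price_per_1k_requests=0.05)
```

The crossover point depends entirely on access patterns, which is why the storage strategy has to be designed against the RPO, RTO and compliance requirements rather than the headline per-GB price.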

Law IV - Unobserved Systems Lead to Unknown Costs

As humans, we cannot measure what we do not know. The AWS Billing console has been updated with new features, and many of the previously scattered tools are consolidated in the Billing and Cost Management console. This allows greater granularity of observed costs and trend analysis, and now also includes cost anomaly detection. The data is only as good as what it can see, so a good tagging strategy is important. To aid in enforcing a well-defined tagging strategy, you can use tag policies applied through AWS Organizations. This covers cost, but we also need to correlate costs and observed patterns over time with the performance of our applications.
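
A tag policy is just a JSON document attached through AWS Organizations. A minimal sketch, where the tag key, allowed values and enforced resource type are illustrative assumptions:

```python
import json

# Sketch of an AWS Organizations tag policy that standardises a "CostCenter"
# tag and enforces it on EC2 instances. Keys and values are assumptions.

tag_policy = {
    "tags": {
        "costcenter": {
            "tag_key": {"@@assign": "CostCenter"},
            "tag_value": {"@@assign": ["Platform", "DataScience", "Sandbox"]},
            "enforced_for": {"@@assign": ["ec2:instance"]},
        }
    }
}

# This document would be passed to organizations:CreatePolicy with
# Type=TAG_POLICY and then attached to the target OU or account.
policy_document = json.dumps(tag_policy, indent=2)
```

With the policy attached, non-compliant tags surface in the Organizations console, which keeps the cost data clean enough to correlate with application performance.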

Observing an application can be done through CloudWatch dashboards, metrics, and logs. An enhancement that helps find and fix faults faster is the new CloudWatch Logs pattern analysis, which uses machine learning algorithms to find recurring log patterns. This can lead to better insight into potential issues, and into code that may be causing errors or affecting the performance of an application, so they can be remediated. Third-party observability platforms may provide a more cohesive single-pane view, and can help manage access to information at scale, alleviating challenges with role-based access control. Enforcing logging to be enabled by default is another strategy that supports observability success. It is also worth enabling AWS Health to track the overall health of deployed solutions.

Setting budgets and alerts, and sending the notifications to key distribution groups for visibility, is a terrific way of building a culture of cost awareness and a sense of responsibility. Personally, I like to ask: if I were paying for this myself, would I deploy it the same way? It creates a shift in thinking about responsible spending.
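
A budget with a forecast-based alert to a shared mailbox is a small amount of configuration. A sketch of the AWS Budgets request payload, where the account ID, limit and address are placeholder assumptions:

```python
# Sketch of the payload for budgets:CreateBudget that alerts a shared
# distribution list when forecasted spend crosses 80% of the monthly limit.
# The budget name, limit and email address are placeholder assumptions.

budget = {
    "BudgetName": "team-platform-monthly",
    "BudgetLimit": {"Amount": "5000", "Unit": "USD"},
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST",
}

notification = {
    "Notification": {
        "NotificationType": "FORECASTED",
        "ComparisonOperator": "GREATER_THAN",
        "Threshold": 80.0,
        "ThresholdType": "PERCENTAGE",
    },
    "Subscribers": [
        {"SubscriptionType": "EMAIL", "Address": "cloud-costs@example.com"},
    ],
}

# With boto3 this would be submitted roughly as:
# boto3.client("budgets").create_budget(
#     AccountId="123456789012", Budget=budget,
#     NotificationsWithSubscribers=[notification])
```

Alerting on the forecast rather than the actual spend gives the team time to react before the limit is breached, which reinforces the ownership mindset described above.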

Law V - Cost Aware Architectures Implement Cost Controls

Through identification and classification of critical, supplementary, and non-critical workloads, organisations can determine where costs are acceptable and where they can be optimised. The intersection with the Well-Architected Security and Cost Optimisation pillars is paramount to applying this law successfully. Applying the principle of least privilege, account separation / segmentation and AWS Organizations OUs sets the foundation for enforceable controls around cost optimisation. Strategic Service Control Policies (SCPs) can help prevent users from consuming expensive services, in addition to grouping role-based access controls through IAM Identity Center (formerly AWS Single Sign-On). Enforcing tagging policies and automatically stopping non-critical workloads after hours can also be valuable in reducing costs. Implementing these controls early is key to maximising their value. In addition, Cost and Usage Reports, Compute Optimizer and Reserved Instance (RI) utilisation reports provide further insight into whether services are being used to their fullest extent.
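
As an example of such a guardrail, an SCP can deny launches of expensive instance families outside the accounts that need them. A minimal sketch, where the Sid and the blocked instance-type patterns are assumptions for illustration:

```python
import json

# Illustrative Service Control Policy denying launches of large, expensive
# EC2 instance families in non-production accounts. The Sid and the
# instance-type patterns are assumptions chosen for the sketch.

scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyExpensiveInstanceTypes",
            "Effect": "Deny",
            "Action": "ec2:RunInstances",
            "Resource": "arn:aws:ec2:*:*:instance/*",
            "Condition": {
                "StringLike": {"ec2:InstanceType": ["p4d.*", "x2iedn.*"]}
            },
        }
    ],
}

# The serialised document would be created via organizations:CreatePolicy
# with Type=SERVICE_CONTROL_POLICY and attached to the relevant OU.
scp_document = json.dumps(scp)
```

Because SCPs apply at the OU level, one policy like this protects every sandbox account at once, rather than relying on each team to police its own launches.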

With a combination of DevOps Guru, CodeWhisperer and Amazon Q, we now have many ways to find improvements in code and service usage by leveraging the power of Machine Learning (ML) and Generative AI. These tools will not fill all the gaps on their own, and humans still play a role in accepting or rejecting suggestions. By leveraging them, however, we have an opportunity to remove human bias and find greater efficiency, potentially reducing costs further.

Law VI - Cost Optimisation is Incremental

Implementing effective cost optimisation needs two things to really work: time and data. The data comes in the form of trend analysis and reports, and you need time and several reports to determine where cost optimisation can occur most effectively. This allows improvements to be made over time, such as right-sizing workloads to reduce infrastructure costs or optimising the use of serverless architecture. Cost optimisation is a continual learning and development path. It is important not to look at cost data in isolation: it needs to be compared with revenue generation, so that additional costs align to workloads whose revenue offsets them.
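
One simple way to avoid looking at cost in isolation is to trend cost as a fraction of the revenue a workload supports. A sketch with made-up monthly figures:

```python
# Track cost as a fraction of workload revenue over time, so rising spend
# is judged against the value it supports. All figures are made up.

monthly = [
    {"month": "Jan", "cost": 12_000, "revenue": 150_000},
    {"month": "Feb", "cost": 14_500, "revenue": 210_000},
    {"month": "Mar", "cost": 16_000, "revenue": 300_000},
]

ratios = [round(m["cost"] / m["revenue"], 3) for m in monthly]

# Cost rose every month, but the cost-to-revenue ratio fell: spending is
# scaling more slowly than the value it generates.
improving = all(earlier > later for earlier, later in zip(ratios, ratios[1:]))
```

Viewed this way, a growing absolute bill can still represent an improving position, which is exactly the incremental, data-driven judgement Law VI calls for.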

Law VII - Unchallenged Success Leads to Assumptions

The phrases "we have always done it this way" and "we are a {insert vendor / programming language} shop" are incredibly damaging to fostering a culture of innovation. Some huge cost reductions could come, for example, from optimising the programming language solutions are written in. This is especially important in serverless architecture, where you are paying for execution time. This approach, however, requires balance to ensure supportability, efficiency, and skills across the organisation.

To help tackle both phrases, bring the conversation back to innovation by asking: "if you could improve the way you do things now, or the programming language you use, what would that look like?" To delve into more responsible spending: "if it were your money, what would you do differently?" Both questions cause individuals or groups to reflect on improvements and shift the mindset towards innovation. I have had this technique applied to me in my career, and it helped steer the conversation towards problem solving. Problems are much easier to define than the solutions that resolve them. Become part of the solution, not the problem.

Conclusion

Embracing the 7 Laws of Building for the Frugal Architect is extremely useful for architects navigating the AWS landscape, emphasising cost-efficient and sustainable solutions. These laws, spanning the Design, Measuring, and Optimising phases, instil fundamental principles such as considering cost as a non-functional requirement, aligning systems with business needs, and understanding trade-offs in architecture.

Enjoyed this blog?

Share it with your network!
