Now more than ever, as the pace of application and solution development increases, today’s best practice can be obsolete, or even a vulnerability, tomorrow. Through continuous compliance, continuous integration and continuous delivery we’ve reduced the cycle time for deploying changes – but how do you know these are the right changes?
To address this problem, AWS created the Well-Architected Framework – a set of criteria guiding reviews of a given workload. The framework gives direction without prescribing how to solve a given problem, and should be used to ensure that all key aspects of a workload’s lifecycle, security, resilience and operability are considered.
The Well-Architected Framework
In AWS’ own words:
The AWS Well-Architected Framework enables you to review and improve your cloud-based architectures and better understand the business impact of your design decisions. We address general design principles as well as specific best practices and guidance in five conceptual areas that we define as the pillars of the Well-Architected Framework.
(https://d1.awsstatic.com/whitepapers/architecture/AWS_Well-Architected_Framework.pdf)
The framework defines five key pillars against which to review your workload:
- Operational Excellence – The ability to run and monitor systems to deliver business value and continually improve supporting processes and procedures.
- Security – The ability to protect information, systems, and assets while delivering business value through risk assessments and mitigation strategies.
- Reliability – The ability of a system to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues.
- Performance Efficiency – The ability to use computing resources efficiently to meet system requirements, and to maintain that efficiency as demand changes and technologies evolve.
- Cost Optimisation – The ability to run systems to deliver business value at the lowest price point.
Operational Excellence
The Operational Excellence pillar is focused on the ability to run and monitor systems to deliver business value, while still retaining a focus on improving processes and procedures.
The Operational Excellence pillar defines six design principles for cloud:
- Perform operations as code – You should apply mature software engineering practices to your infrastructure and operations systems. By performing operations as code, you limit human error and enable consistent responses to events.
- Annotate documentation – Pre-cloud documentation was hand-crafted and rarely in sync with the systems it described. In environments such as cloud, our focus should be on automating documentation as part of the build and deploy process.
- Make frequent, small, reversible changes – The pace of change is rapid, and so our changes need to be smaller, faster, simpler to describe and safer to apply. By enabling safe reversals we can limit (or remove) impact to end customers.
- Refine operations procedures frequently – As systems are put into use, we learn how they behave in the real world. These lessons need to be captured and shared to ensure we are learning from all opportunities.
- Anticipate failure – Recognise that failure is not only always present, but often more recoverable than in traditional infrastructure. Ensure that you invest time looking at what might fail, then validate these scenarios to understand if your procedures and processes are resilient to failure.
- Learn from all operational failures – Following on from anticipating failure, make sure that there is a shared approach to learning from the failures that do occur.
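To make "frequent, small, reversible changes" a little more concrete, here is a minimal sketch in Python. It assumes every change comes paired with a revert step and a health check. The function `apply_reversibly` and the in-memory `config` are hypothetical stand-ins for illustration, not part of any AWS service or SDK.

```python
from typing import Callable


def apply_reversibly(apply: Callable[[], None],
                     revert: Callable[[], None],
                     is_healthy: Callable[[], bool]) -> bool:
    """Apply a change, then revert it if the health check fails.

    Returns True if the change was kept, False if it was rolled back.
    """
    apply()
    if is_healthy():
        return True
    revert()
    return False


# Illustrative usage: an in-memory "configuration" stands in for a real system.
config = {"max_connections": 100}


def raise_limit() -> None:
    config["max_connections"] = 500


def restore_limit() -> None:
    config["max_connections"] = 100


def healthy() -> bool:
    # Hypothetical rule: this pretend system only copes with up to 200 connections.
    return config["max_connections"] <= 200


kept = apply_reversibly(raise_limit, restore_limit, healthy)
# The health check fails here, so the change is rolled back and the limit returns to 100.
```

The sketch deliberately ignores the case where the change itself raises an error part-way through. A real pipeline would need to handle that as well.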
Security
The Security pillar is focused on the ability to protect information, systems and assets while still enabling the regular delivery of value.
The cloud is a game changer for security. Through the introduction of automation and Infrastructure as Code, there is now an increasing number of inspection points and controls that can protect a system from malicious or inadvertent compromise and exploitation.
There are seven design principles in the Security pillar:
- Implement a strong identity foundation – Use the principle of least privilege to ensure users and systems have access only to what they need. By connecting this with a central identity management service you can reduce the risk of leaked or exfiltrated credentials.
- Enable traceability – Monitoring, alerting on and auditing system changes in real time is a key architectural design pattern for resilient and secure systems.
- Defend in depth – You no longer need to focus on security at a single point, such as an edge network firewall, but can introduce security controls and defence points to all layers of a workload.
- Automate security best practices – With an increasing number of changes defined “as code”, your systems become more auditable. Continual compliance strategies evolve through automating security, and are more achievable in a cloud environment.
- Protect data in transit and at rest – Nearly all services AWS provides allow strong encryption at rest, and for most situations configuration and management is straightforward.
- Keep people away from data – Review your architecture to ensure that you reduce or eliminate the need for direct access to data. Automating data activities makes changes to data, and the impact of data operations, easier to audit.
- Prepare for security events – Being prepared requires consideration of what can go wrong and having an incident management process that suits your organisational needs.
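As one illustration of how least privilege and automated security checks can meet, the sketch below scans an IAM-style policy document for Allow statements that use the bare wildcard `*`. The policy follows the standard IAM JSON layout (`Version`, `Statement`, `Effect`, `Action`, `Resource`). The helper `overly_broad_statements`, the bucket name and the `Sid` values are all hypothetical. It is a deliberately narrow check and will not catch partial wildcards such as `s3:*`.

```python
from typing import Any, Dict, List


def _as_list(value: Any) -> list:
    """IAM accepts either a single value or a list; normalise to a list."""
    return value if isinstance(value, list) else [value]


def overly_broad_statements(policy: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return Allow statements whose Action or Resource is the bare wildcard "*"."""
    flagged = []
    for statement in _as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action", []))
        resources = _as_list(statement.get("Resource", []))
        if "*" in actions or "*" in resources:
            flagged.append(statement)
    return flagged


# Hypothetical policy: one narrowly scoped statement and one that grants everything.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "ReadReports", "Effect": "Allow", "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::example-reports-bucket/*"},
        {"Sid": "TooBroad", "Effect": "Allow", "Action": "*", "Resource": "*"},
    ],
}

flagged_sids = [s["Sid"] for s in overly_broad_statements(example_policy)]
# Only the "TooBroad" statement is flagged.
```

A check like this could run in a build pipeline so that an overly permissive policy is caught before it is deployed, rather than discovered in an audit afterwards.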
Reliability
The Reliability pillar looks at how a system responds to failure, be that infrastructure or service disruption. Having a resilient system is also key to ensuring your service can scale up to meet increasing capacity demands, and scale down without impacting customer experience.
The Reliability pillar defines five key design principles:
- Test recovery procedures – Testing failure modes, and testing at scale, was difficult pre-cloud and was regularly put in the too-hard basket, because the time required to set up a valid test environment could be prohibitive. Ensuring your processes focus on understood and practised recovery procedures builds confidence in the design.
- Automate recovery from failure – Many failures can be detected and resolved through automation available from the monitoring platform. More advanced detection approaches can even preempt failure and begin early mitigation.
- Scale horizontally to increase aggregate system availability – By replacing one large asset with multiple smaller ones, you limit the impact of any single failure to a smaller percentage of the entire system.
- Stop guessing capacity – Resource starvation and contention are regular causes of failure, but measuring actual usage puts you in a position to identify and remediate key bottlenecks sooner.
- Manage change in automation – Deliver all changes through automation, so you can regularly assess and validate that recovery or remediation activity based on the same automation works.
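The arithmetic behind horizontal scaling can be sketched in a few lines of Python. The example assumes equally sized nodes and, for the second function, that nodes fail independently of one another, which real systems only approximate. Both function names are illustrative.

```python
def capacity_lost(node_count: int, failed_nodes: int = 1) -> float:
    """Fraction of total capacity lost when `failed_nodes` of `node_count` equal nodes fail."""
    if node_count <= 0 or not 0 <= failed_nodes <= node_count:
        raise ValueError("need node_count > 0 and 0 <= failed_nodes <= node_count")
    return failed_nodes / node_count


def total_outage_probability(node_count: int, node_failure_probability: float) -> float:
    """Probability that every node is down at once, assuming independent failures."""
    return node_failure_probability ** node_count


single = capacity_lost(1)   # one large instance: a single failure removes all capacity (1.0)
fleet = capacity_lost(10)   # ten smaller instances: a single failure removes a tenth (0.1)
```

Shared dependencies, such as a common network path or a bad deployment, break the independence assumption. That is why spreading nodes across failure domains matters as much as the node count itself.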
Performance Efficiency
The Performance Efficiency pillar focuses on the ability to use the available resources efficiently and to maintain that efficiency as demands change.
The Performance Efficiency pillar looks at five key design principles:
- Democratise advanced technologies – Leverage AWS’ platform experience by consuming advanced technologies as services rather than building and running them yourself. By pushing complexity into AWS’ responsibility domain, you can access cutting-edge complex solutions as a service.
- Go global in minutes – Ensure your design can take advantage of AWS’ global footprint to deliver lower latency and a better experience for your customers all around the globe.
- Use serverless architectures – By reducing the level of management you have to apply in order to deliver a service, you can repurpose effort into higher value activities, and lower the overall transaction cost.
- Experiment more often – Things constantly and rapidly change; ensuring you have a process to review and evaluate the changes will allow you to exploit the efficiencies of a changing cloud environment while identifying and managing risks.
- Develop mechanical sympathy – Align your use of technology to the needs of what you are trying to achieve. It doesn’t have to be a one size fits all approach.
Cost Optimisation
The Cost Optimisation pillar addresses a review of design and best practices to enable a workload to deliver business value for the lowest appropriate cost.
The Cost Optimisation pillar focuses on five design principles:
- Adopt a consumption model – If you are not using it, turn it off or delete it. Over 70% of the hours in a week are outside standard business hours.
- Measure overall efficiency – Ensure you are measuring the business value of a workload to review against the cost of the system to determine overall efficiency.
- Stop spending money on data centre operations – Outsource the lowest common denominators, those business expenses which don’t differentiate you from your competition.
- Analyse and attribute expenditure – If you can’t measure it, you can’t manage it. The same goes for cost: if you can’t attribute the source of expenditure, you can’t assess whether it is appropriate or excessive.
- Use managed and application-level services to reduce cost of ownership – Total cost of ownership of a service is more than just the hardware run cost. By leveraging the appropriate managed services you can drastically reduce the operational costs and risks.
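The "over 70%" figure under the consumption model is easy to check. The sketch below assumes a standard business week of eight hours a day, five days a week. That definition is an assumption, so adjust the parameters to match your own organisation.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours in a week


def fraction_outside_business_hours(hours_per_day: float = 8,
                                    days_per_week: int = 5) -> float:
    """Share of the week that falls outside the given business hours."""
    business_hours = hours_per_day * days_per_week
    return (HOURS_PER_WEEK - business_hours) / HOURS_PER_WEEK


off_hours_share = fraction_outside_business_hours()
# 128 of 168 hours, roughly 0.76, fall outside a 40-hour business week.
```

For a development or test environment that is only needed during business hours, this is the proportion of run time that could be switched off.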
A Well-Architected Review
Just like you take your car in for regular servicing, the AWS Well-Architected Review (WAR) is an opportunity to stop and assess your cloud environments and specific workloads to ensure they’re running well and efficiently.
A WAR uses the structure of the Well-Architected Framework to analyse workloads and provide architectural guidance and best-practice approaches, helping you ensure your workloads are secure, high performing, resilient and efficient.
A WAR covers all five pillars, and consists of over 70 specific questions that dive deep into a solution and call out risks and areas in need of improvement. Cevo recommends that you choose two initial pillars to assess your workload, with a follow-up activity scheduled to complete the full framework assessment.
For example, if you have recently deployed your application to production, it would be a great time to assess the Operational Excellence and Reliability pillars to ensure you can operate and scale your solution in line with changing customer demand.
If you operate in a regulated environment, it would be prudent to review the Security and Cost Optimisation pillars prior to delivering your changes to production.
Even the most talented development and operations team members are likely to miss specific details as they get caught in the weeds, chasing down problems and solving issues to meet deadlines. Ensure you schedule regular Well-Architected Reviews to assess alignment with best practice, and to gauge and improve the quality of your solution.
How can Cevo help
Cevo has delivered numerous complex projects for some of Australia’s largest and best-known organisations. We have a highly skilled team who deliver with experience and credibility across all aspects of modern solution design.
A Cevo consultant-led Well-Architected Review is a great way to leverage our expertise via a structured day-long workshop, assessing your workload against two of the five WAR pillars.
During this workshop, both functional and non-functional topics will be explored, with the aim of establishing the current state of your workload’s architecture and implementation.
The answers to these questions are recorded and assessed against AWS-aligned best practices, with a final report generated including a roadmap for improvement.
If you would like to know more about the Well-Architected Framework, or schedule a review of one of your workloads, please contact us to arrange a discussion.