Gone are the days in which we operate AWS within a single account. The workloads we are developing and deploying are becoming more complex and therefore security is an even higher focus.
One tool we have to break down complexity while greatly increasing security is security domain isolation. This is just a fancy way of saying: keep things that don’t need to be together apart.
AWS has been constantly developing and extending the tooling available to create and isolate workloads, while still allowing them to connect (where explicitly required) and be operable en masse.
In a future post we will look at how to operate multiple workloads at scale, but in this initial post we will look at how to establish a multi-account structure and set it up in a way that will provide secure and flexible design moving forward.
First of all, we need to recognise that an organisation will have multiple workloads. How coupled those workloads are is beyond the scope of this post, but for this example we will assume we have six (6) different workloads, each with their own network, compute and storage.
We want to increase our security and reduce the risk of splash damage by introducing as much isolation between the workloads as possible.
We could simply run each workload on different servers, or even within different networks in the same account – and for a long time this is how we operated AWS. But the complexity of knowing who needs to access what, and of building complex IAM and tagging policies to enforce it, just turned into a lot of busy work.
Why did we spend so much time trying to keep things isolated when we could just simply isolate them – in their own accounts?
Let’s separate accounts
It is at this level where things start to drastically improve performance and reduce operational overheads. Once we start to design our solution to have fully separate accounts for each workload we can truly start to have a consistent and strong view of operability and security.
If we go one step further and isolate each workload’s environment from each other via accounts we have a strongly consistent way to promote change across accounts and into production, that allows us to test all aspects of our infrastructure, application and configuration before impacting customer traffic.
Separating workloads by account means that:
- Operators of Workload A don’t need to access Workload B accounts at all, therefore no risk of accidentally changing the wrong thing.
- CI/CD and deployment tooling can also be scoped specifically to the given account, ensuring our deployment processes cannot have any side effects beyond the workload and environment undergoing change.
- If something does go wrong, then we are limiting the blast radius to only the services and systems within this account.
- The ongoing costs can easily be seen, as the aggregate bills for the Workload accounts do not need to be sub-divided or broken down – and there is a full non-repudiation view on the bill, with no risk of accidentally tagging a resource with the wrong value.
- We can implement workload based security standards, ensuring that we have the right level of compliance for each workload.
To achieve all of this by any other method, we are building and enforcing complex Tagging requirements and namespace isolations – things that an AWS account provides by default.
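To give a feel for the tagging approach that separate accounts replace, here is a minimal sketch of the kind of IAM policy a shared account would need so that Workload A operators can only touch Workload A instances. The tag key `Workload`, the value `workload-a`, the function name and the chosen EC2 actions are all assumptions for illustration. The `aws:ResourceTag/<key>` and `aws:TagKeys` condition keys are standard IAM condition keys.

```python
import json


def build_tag_scoped_policy(tag_key, tag_value):
    """Return an IAM policy (as a dict) that scopes EC2 operations to one workload's tag."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Operators may only act on instances carrying their workload tag.
                "Sid": "AllowOperationsOnOwnWorkload",
                "Effect": "Allow",
                "Action": [
                    "ec2:StartInstances",
                    "ec2:StopInstances",
                    "ec2:RebootInstances",
                ],
                "Resource": "*",
                "Condition": {
                    "StringEquals": {f"aws:ResourceTag/{tag_key}": tag_value}
                },
            },
            {
                # Without this, an operator could re-tag another workload's
                # instance and bypass the statement above.
                "Sid": "DenyChangingWorkloadTag",
                "Effect": "Deny",
                "Action": ["ec2:CreateTags", "ec2:DeleteTags"],
                "Resource": "*",
                "Condition": {
                    "ForAnyValue:StringEquals": {"aws:TagKeys": [tag_key]}
                },
            },
        ],
    }


# Hypothetical tag key and value, for illustration only.
policy = build_tag_scoped_policy("Workload", "workload-a")
print(json.dumps(policy, indent=2))
```

Every service, action and tagging path needs similar treatment, and it only works if every resource is tagged correctly. With one account per workload, the account boundary does this job without any of these statements.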
So you want to get started – where do you start?
The one key aspect you’ll want to plan out before you get started is the organisation structure. To ensure that you don’t create a future operational nightmare you’ll want to think about a few things that are going to be common across your accounts.
- What constraints and controls do you need to put in place across all accounts?
  - For example, we might have data residency requirements that mean we need to restrict which AWS regions can be used.
  - Or our workloads require some level of 3rd party compliance, ensuring we have preventative and detective controls consistently applied across all workloads.
- How do you want those constraints to be applied?
  - While compliance is important for accounts holding Customer data, some of these may create a level of restriction that limits innovation.
  - Or do we want more strict cost controls on experimental accounts to limit the risk of bill shock?
- How do you need to control access to shared services?
  - There are always some common services; having these isolated and controlled by the appropriate cost, security and reliability controls will be key to reducing the impact on workload teams.
These things can be configured and controlled through combinations of Service Control Policies (SCPs) and IAM policies at the Organisation level. To reduce the operational complexity of applying these controls to every individual account, we can use AWS Organizations to group accounts that require the same controls and oversight into Organisational Units (OUs).
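As a minimal sketch of the data residency example above, the following Python builds an SCP document that denies requests outside an approved set of regions. The region list, the exempted global services and the function name are illustrative assumptions, not a recommended baseline. The `aws:RequestedRegion` condition key and the `NotAction` element are standard AWS policy features.

```python
import json


def build_region_restriction_scp(allowed_regions, exempt_actions):
    """Return an SCP document (as a dict) that denies requests outside allowed_regions.

    exempt_actions lists global services (IAM, Organizations, ...) that are not
    tied to a single region and would break if they were denied.
    """
    if not allowed_regions:
        raise ValueError("at least one allowed region is required")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideApprovedRegions",
                "Effect": "Deny",
                # Everything except the exempt actions is subject to the deny.
                "NotAction": sorted(exempt_actions),
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {
                        "aws:RequestedRegion": sorted(allowed_regions)
                    }
                },
            }
        ],
    }


# Assumed values for illustration only; adjust to your own residency requirements.
ALLOWED_REGIONS = ["ap-southeast-2", "ap-southeast-4"]
EXEMPT_ACTIONS = [
    "iam:*",
    "organizations:*",
    "route53:*",
    "cloudfront:*",
    "support:*",
    "sts:*",
]

scp = build_region_restriction_scp(ALLOWED_REGIONS, EXEMPT_ACTIONS)
print(json.dumps(scp, indent=2))
```

Attaching a policy like this to an OU rather than to each account (for example through the AWS Organizations `CreatePolicy` and `AttachPolicy` operations) means every account placed in that OU inherits the restriction.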
Initial OU structure
At Cevo we recommend a single-tier OU structure. Not only was this a hard requirement for AWS Control Tower when this structure was designed (later releases added support for nested OUs), but it also strikes a balance between complexity and control.
Our recommended design is to start with the following seven OUs.
- Sandbox – this OU should have the tightest reins on cost. These accounts are used for innovation, learning or specific experimentation activities.
  - Accounts in this OU are likely isolated from the corporate VPN and considered temporary and untrusted.
  - Consider a pattern where each individual, team or department has its own isolated AWS account in this OU for experimentation and innovation.
- Development – this OU provides the active development ground for workload teams. In these accounts, teams have much fuller and broader access to AWS services and can change and develop their workloads as required.
  - This OU should have the same service restrictions as the Testing and Production OUs.
  - These accounts are likely to have less focus on resilience and performance, and more on security and cost alignment.
- Testing – this OU contains accounts that look and behave much more like Production.
  - They are designed to provide a stable and consistent environment for repeatable testing, with greatly limited ability for staff to modify resources manually.
  - Accounts in this OU should be more closely aligned with Production standards.
  - The ability for teams to “just make changes” should be more tightly restricted, to ensure that automated CI/CD manages assets in the same way as Production.
- Production – this OU contains the accounts that host the production services. These accounts and the data within them are configured with the greatest level of security.
  - Each workload may have multiple production accounts provisioned to cover DR and data resilience requirements.
  - The Production OU should have the strictest controls on access and data protection, with incident-level alerting if infrastructure or configuration changes are detected outside managed release activities.
- Maintenance – this OU temporarily houses accounts so that alternate restrictions can be applied while abnormal or irregular activities are carried out on the account.
  - Moving an account into this OU lets you make changes under alternate controls without modifying the SCPs deployed on the Testing or Production OUs.
  - Accounts should NOT live in this OU permanently; they are only relocated here for a short time to enable specific activities.
- Core – this OU contains the central Audit and Log Archive accounts. These accounts are the central location for log and audit event replication, used for forensics or aggregated security inspection.
- Shared Services – this OU contains any shared platform, networking or security accounts.
  - As these accounts often span Development, Testing and Production domains, the controls and access to these accounts need to have a different scope to the Workload accounts.
Not all organisations will have the same scale or structure, but starting with these seven OUs is a great way to ensure the right level of account isolation without creating too much operational overhead.
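The layout above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the class, the helper functions and the account name `workload-a-prod` are hypothetical, and nothing here calls AWS. The `move_account` helper mirrors the idea behind the AWS Organizations `MoveAccount` operation, and the context manager shows the Maintenance pattern of relocating an account briefly and always returning it to its home OU.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class OrganisationalUnit:
    name: str
    purpose: str
    accounts: set = field(default_factory=set)


# The seven OUs, all direct children of the organisation root (single tier).
OU_LAYOUT = [
    OrganisationalUnit("Sandbox", "experimentation with tight cost controls"),
    OrganisationalUnit("Development", "active development by workload teams"),
    OrganisationalUnit("Testing", "stable, production-like test environments"),
    OrganisationalUnit("Production", "customer-facing services, strictest controls"),
    OrganisationalUnit("Maintenance", "temporary home for irregular activities"),
    OrganisationalUnit("Core", "central Audit and Log Archive accounts"),
    OrganisationalUnit("Shared Services", "shared platform, network and security"),
]


def find_ou(layout, name):
    """Return the OU with the given name, or raise KeyError if it does not exist."""
    for ou in layout:
        if ou.name == name:
            return ou
    raise KeyError(f"unknown OU: {name}")


def move_account(layout, account, source, destination):
    """Move an account from one OU to another within the in-memory model."""
    src = find_ou(layout, source)
    dst = find_ou(layout, destination)
    if account not in src.accounts:
        raise ValueError(f"{account} is not in {source}")
    src.accounts.remove(account)
    dst.accounts.add(account)


@contextmanager
def maintenance_window(layout, account, home_ou):
    """Temporarily place an account in Maintenance, then always move it back."""
    move_account(layout, account, home_ou, "Maintenance")
    try:
        yield
    finally:
        move_account(layout, account, "Maintenance", home_ou)


# Hypothetical account name, for illustration only.
find_ou(OU_LAYOUT, "Production").accounts.add("workload-a-prod")

with maintenance_window(OU_LAYOUT, "workload-a-prod", "Production"):
    # Irregular work would happen here, under the Maintenance OU's controls.
    in_maintenance = "workload-a-prod" in find_ou(OU_LAYOUT, "Maintenance").accounts

back_home = "workload-a-prod" in find_ou(OU_LAYOUT, "Production").accounts
print(in_maintenance, back_home)
```

Wrapping the move in a `try`/`finally` means the account returns to its home OU even if the maintenance activity fails, which matches the rule that accounts should never stay in Maintenance permanently.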
Further OU design principles
AWS has published best practice guidance on designing your OU structure, which we recommend you review as part of your OU design.
These principles combine best practices to reduce operational complexity with technical recommendations that keep your design compatible with AWS Organizations and Control Tower features.
- Organize based on security and operational needs
- Apply security guardrails to OUs rather than accounts
- Avoid deep OU hierarchies
- Start small and expand as needed
- Avoid deploying workloads to the organization’s management account
- Separate production from non-production workloads
- Assign a single or small set of related workloads to each production account
- Use federated access to help simplify managing human access to accounts
- Use automation to support agility and scale
In a follow-up post, we will combine the design recommendations here with SCP features to show how different cost and security controls can be overlaid to provide a clear and consistent way to operate a multi-account AWS organisation.
I hope this post has given you a solid starting point for designing your AWS Organisation structure to provide the framework to establish cost, security, performance and availability controls.
As always, if you’d like assistance in migrating, developing or operating secure cloud workloads, please reach out to anyone from Cevo for support.