AWS Well-Architected: Security (Part 2)

In part 3 of my latest series showcasing the six pillars of the AWS Well-Architected Framework, we continue to take a look at the Security pillar. As security is such a large topic, we will be covering the security stack across a series of posts. The intent of this blog series is to cover security and security practices at a high level. This is not an in-depth or comprehensive security guide. For comprehensive security guidance, Cevo’s Security Practice has your back.

If you’d like to learn more about the other pillars of the Well-Architected Framework, check out the other blogs in this series via the links below.

What we will be covering

  1. Defining Security Permissions Evaluation Behaviour
  2. Defining Security Boundaries
  3. Evaluating Security Requirements

Why we are learning this

  1. To help others navigate and understand how to design and implement secure architecture that aligns to security requirements
  2. Using the AWS Well-Architected: Security Pillar for guidance to build secure infrastructure

How this will help me

You will:

  1. Be able to successfully define security requirements
  2. Have a better understanding of the components used in securing infrastructure
  3. Have a better understanding of security terms and the services at a high level
  4. Be able to build effective solutions aligned to the AWS Well-Architected: Security Pillar

How Are Security Policy Decisions Evaluated?

To know whether the security policies you apply will behave as expected, it is important to recognise how policies are evaluated and how this maps to the logic you want to achieve. Visually, here is how policies are evaluated in the AWS cloud as described in greater detail here:

The most important lesson when working with permissions on AWS resources is that evaluation starts with a “deny”. Everything is denied by default unless it is permitted. This is called an “implicit deny” because it is not expressly declared in any policy, as opposed to an “explicit deny”, which is a deny statement written into a policy. An implicit deny is the catch-all outcome when no statement allows access to a resource, whereas an explicit deny overrides any allow. Policies are split between accounts, individual resources, identities, permissions boundaries and sessions. As each is evaluated in order, policies are searched for an “allow” statement, and the request is then evaluated further against the remaining policy types. Combined, these policy evaluations result in effective permissions.
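To make the deny-by-default behaviour concrete, here is a minimal Python sketch of the decision order described above: an explicit deny wins, then any matching allow, and otherwise the request falls through to an implicit deny. This is a deliberately simplified model and not the full AWS evaluation logic. It treats all statements as one flat list and ignores how the different policy types interact. The `Statement` class, the `evaluate` function and the bucket ARNs are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Statement:
    effect: str    # "Allow" or "Deny"
    action: str    # e.g. "s3:GetObject"; "*" wildcards are supported
    resource: str  # e.g. "arn:aws:s3:::reports/*" (hypothetical ARN)


def matches(stmt, action, resource):
    """True when the statement's patterns cover this action and resource."""
    return fnmatchcase(action, stmt.action) and fnmatchcase(resource, stmt.resource)


def evaluate(statements, action, resource):
    """Simplified decision: explicit deny > allow > implicit deny."""
    applicable = [s for s in statements if matches(s, action, resource)]
    if any(s.effect == "Deny" for s in applicable):
        return "explicit deny"
    if any(s.effect == "Allow" for s in applicable):
        return "allow"
    return "implicit deny"  # nothing matched, so the default applies


policies = [
    Statement("Allow", "s3:Get*", "arn:aws:s3:::reports/*"),
    Statement("Deny", "s3:*", "arn:aws:s3:::reports/secret/*"),
]

print(evaluate(policies, "s3:GetObject", "arn:aws:s3:::reports/q1.csv"))
# allow
print(evaluate(policies, "s3:GetObject", "arn:aws:s3:::reports/secret/q1.csv"))
# explicit deny
print(evaluate(policies, "s3:PutObject", "arn:aws:s3:::reports/q1.csv"))
# implicit deny
```

Note that the second request matches both statements, and the deny still wins regardless of the order the statements appear in.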

Defining Security Boundaries

Outside of IAM policy evaluation, let’s take a look at two network-related security boundaries: security-groups and network ACLs.

If we think about security as gates and consider the OSI layers for a moment, each layer represents a “gate” of permission to the next layer. In traditional networking, ACLs could be applied as low as the Data-Link layer (Layer 2) of the OSI model, for example by filtering on MAC addresses. In modern networking, they extend to the Network and Transport layers of the OSI model as well.

In AWS, a Network ACL (or NACL) is applied to a whole subnet. However, it uses IP addresses (Network layer) together with protocols such as TCP, UDP and ICMP and their port ranges (Transport layer) to match inbound or outbound traffic and to decide whether that traffic is explicitly denied or permitted. This is an important distinction from another construct: security-groups.

An ACL is a set of rules, but each rule is applied in a single direction: inbound to or outbound from the subnet it is applied to. Traffic that is allowed out does not expressly allow any return traffic. This is called being “stateless” as there is no logic applied to recognise return traffic as part of the same flow (or “state”). This is especially difficult to limit when using TCP, as it relies on two-way communication between a source and destination host to acknowledge packets, and the return traffic typically arrives on an ephemeral port, so a whole port range has to be opened for it. Additionally, you can deny traffic, but the deny rule must come before any rule that allows the same traffic, as rules are evaluated in numbered order and the first match wins. As an exception to the rule of starting with “deny”, the default NACL permits everything inbound and outbound (a custom NACL, by contrast, denies everything until you add rules). This is for good reason too, as you need to understand NACLs before modifying the default behaviour.
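The ordered, stateless behaviour described above can be sketched in a few lines of Python. This is an illustrative model only, not an AWS API: `NaclRule` and `nacl_decision` are hypothetical names, only a single port range is modelled (no protocol field), and the IP addresses come from the reserved documentation ranges.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class NaclRule:
    number: int     # lower numbers are evaluated first
    action: str     # "allow" or "deny"
    cidr: str       # remote address range the rule applies to
    port_from: int
    port_to: int


def nacl_decision(rules, remote_ip, port):
    """Return the action of the first matching rule, lowest number first."""
    for rule in sorted(rules, key=lambda r: r.number):
        in_cidr = ip_address(remote_ip) in ip_network(rule.cidr)
        if in_cidr and rule.port_from <= port <= rule.port_to:
            return rule.action
    return "deny"  # models the final catch-all rule when nothing matches


# Inbound: block one range on 443 before the broader allow.
inbound = [
    NaclRule(100, "deny", "203.0.113.0/24", 443, 443),
    NaclRule(200, "allow", "0.0.0.0/0", 443, 443),
]
# Outbound: return traffic is NOT implied, so ephemeral ports are opened.
outbound = [NaclRule(100, "allow", "0.0.0.0/0", 1024, 65535)]

print(nacl_decision(inbound, "203.0.113.7", 443))      # deny
print(nacl_decision(inbound, "198.51.100.7", 443))     # allow
print(nacl_decision(outbound, "198.51.100.7", 50000))  # allow
print(nacl_decision([], "198.51.100.7", 50000))        # deny
```

The last line shows the stateless part: without an outbound rule, the reply to a permitted inbound request is dropped.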

A security-group starts with a default behaviour of “deny”: no inbound traffic is permitted until you add a rule (a newly created security-group does allow all outbound traffic by default). This is called an implicit “deny” as it is implied that in the absence of a permit statement, everything is denied. Unlike a NACL, security-groups are what we refer to as “stateful”, as any return traffic that is part of an existing permitted flow, inbound or outbound, is implicitly allowed by the corresponding security-group rule. Additionally, a security-group is applied to a specific resource network interface, not the whole subnet. The table below compares NACLs and security-groups:

| Security group | Network ACL |
| --- | --- |
| Operates at the instance level | Operates at the subnet level |
| Applies to an instance only if it is associated with the instance | Applies to all instances deployed in the associated subnet (providing an additional layer of defense if security group rules are too permissive) |
| Supports allow rules only | Supports allow rules and deny rules |
| Evaluates all rules before deciding whether to allow traffic | Evaluates rules in order, starting with the lowest numbered rule, when deciding whether to allow traffic |
| Stateful: Return traffic is allowed, regardless of the rules | Stateless: Return traffic must be explicitly allowed by the rules |
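For contrast with the stateless behaviour of a NACL, here is a rough Python sketch of the stateful behaviour in the security-group column. It is a toy model under stated assumptions, not how AWS implements connection tracking: `SgRule`, `SecurityGroup` and both method names are hypothetical, rules carry only a CIDR and a port range, and a flow is identified by remote IP, remote port and local port.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class SgRule:
    cidr: str       # remote address range
    port_from: int
    port_to: int


def rule_matches(rule, remote_ip, port):
    in_cidr = ip_address(remote_ip) in ip_network(rule.cidr)
    return in_cidr and rule.port_from <= port <= rule.port_to


@dataclass
class SecurityGroup:
    inbound: list                  # allow rules only; there are no deny rules
    outbound: list
    tracked: set = field(default_factory=set)  # flows already permitted

    def allow_inbound(self, remote_ip, remote_port, local_port):
        # All rules are considered; any single match permits the traffic.
        ok = any(rule_matches(r, remote_ip, local_port) for r in self.inbound)
        if ok:
            self.tracked.add((remote_ip, remote_port, local_port))
        return ok

    def allow_outbound_reply(self, remote_ip, remote_port, local_port):
        # Replies on a tracked flow pass regardless of the outbound rules.
        if (remote_ip, remote_port, local_port) in self.tracked:
            return True
        return any(rule_matches(r, remote_ip, remote_port) for r in self.outbound)


web = SecurityGroup(inbound=[SgRule("0.0.0.0/0", 443, 443)], outbound=[])
print(web.allow_inbound("198.51.100.7", 50000, 443))         # True
print(web.allow_outbound_reply("198.51.100.7", 50000, 443))  # True
print(web.allow_outbound_reply("198.51.100.8", 50000, 443))  # False
```

Even with an empty outbound rule list, the reply to the permitted HTTPS request is allowed, while unrelated outbound traffic is not.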

Visually, here is how the gates are applied when evaluating the network boundaries:

This requires a mindset shift when comparing against traditional and/or on-premises models, where resources are typically segregated by “networks” such as VLANs and physical connectivity, focussing only on North-South traffic (between networks) and not East-West traffic (within the same network). With modern networks using Software-Defined Networking (SDN), we start to see a model that looks more like networking in the cloud, combining North-South and East-West filtering and isolation. More information about this can be found here.

In any case, permissions applied by NACLs and security-groups are applied at the Network and Transport layers. Remember that there is a stack: just because traffic is permitted at this level does not mean it will be permitted further up the stack. As security-groups must be defined against resource network interfaces, it is important to understand their role in protecting resources. For more comprehensive security covering your AWS infrastructure as a whole, there is the AWS Network Firewall service.

Effective permissions require all policies to be evaluated before access to the resource is allowed. This may be in the form of identity permissions (username/password or security keys) or resource-based permissions (roles, policies and file permissions). So every gate, all the way up the OSI stack, must permit the request before we are given effective permissions to any resource.

Evaluating Security Requirements

When talking about security requirements, there is a lot of overlap between key security constructs (encryption, key management, firewalling, logging, identity and resource policies) and operational excellence practices (automation, backups, patching, secure remote access, alerting and monitoring). A key part of understanding security requirements is being aware of your functional requirements, such as system-to-system communication, and your operational requirements, such as availability. These drive your architecture, and therefore what you need to secure. Defining what you need to secure enables you to understand what is involved in securing your workload.

A common term in security is the Principle of Least Privilege. What this means is to only provide the access that is needed, no more and no less. The principle can be achieved but is seldom implemented to completion, as it is inherently difficult to filter down to specific IP addresses and users, especially in enterprise applications. In practice, least privilege can be implemented using controls such as resource and/or identity-based policies, by grouping identities into user groups, and by using security-group chaining (using other security-groups as sources/destinations). Where possible, integrate with trusted identity sources such as Active Directory (as Managed AD or even Azure AD) for SAML via Single Sign-On, or use Cognito/SimpleAD, to generate temporary credentials.
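As a small illustration of security-group chaining, the sketch below builds an ingress rule whose source is another security-group rather than an IP range. The dictionary follows the `IpPermissions` shape accepted by boto3's `authorize_security_group_ingress`. The group IDs, the port and the `chained_ingress_rule` helper are hypothetical and only there for illustration, and the actual API call is left as a comment so the snippet runs without AWS credentials.

```python
def chained_ingress_rule(port, source_group_id):
    """Build a TCP ingress rule that trusts members of another security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        # Referencing a group instead of IpRanges keeps the rule valid
        # as instances in the source group scale in and out.
        "UserIdGroupPairs": [{"GroupId": source_group_id}],
    }


APP_TIER_SG = "sg-0123456789abcdef0"  # hypothetical application-tier group
DB_TIER_SG = "sg-0fedcba9876543210"   # hypothetical database-tier group

rule = chained_ingress_rule(5432, APP_TIER_SG)
print(rule)

# With boto3 installed and credentials configured, the rule could be applied as:
#   import boto3
#   boto3.client("ec2").authorize_security_group_ingress(
#       GroupId=DB_TIER_SG, IpPermissions=[rule]
#   )
```

The database tier then accepts connections only from resources attached to the application-tier group, without any IP addresses being hard-coded.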

In summary:

  1. Know what your application requirements are
  2. Understand any system to system communication and protocols
  3. Look at what is needed to meet your functional requirements and operational controls
  4. Think about how to automate the features you need to reduce the need to log in to your workloads
  5. Use temporary credentials where possible
  6. Use chained security-groups instead of specific IP addresses to maintain elasticity
  7. Where possible – integrate applications with trusted identity sources
  8. Understand your threats and where they are coming from to take appropriate action
