Background
I was recently working with a customer who was running many AWS Marketplace Amazon Machine Images (AMIs). Many of these instances were running out-of-date software packages, and it was difficult to gain insight into which packages were installed. The operating systems were varied, ranging from older Windows instances to several Linux distributions. In addition, there was a risk of exposed remote administration ports such as TCP 22 (SSH) and 3389 (RDP) through misconfigured security-groups.
The Goal
There were several major milestones to unpack and overcome, with alignment targeted toward the AWS Foundational Security Best Practices standard and the CIS AWS Foundations Benchmark v3.0.0. We were seeking the following improvements to the platform and workloads:
| Item | Control | Risk | Mitigation |
|------|---------|------|------------|
| 1 | Security-Groups should not allow unrestricted access to remote administration ports (SSH and RDP) | CRITICAL | Systems Manager – Session Manager and Fleet Manager |
| 2 | EC2 Instances should be managed under Systems Manager | MEDIUM | Systems Manager – Inventory and Patch Manager |
| 3 | Vulnerability Management | MEDIUM | Enable Inspector and run SSM-Agent |
| 4 | Malware Protection | HIGH | Enable GuardDuty Malware Protection for EC2 |
| 5 | Detective Controls for Organization | HIGH | GuardDuty, Systems Manager and Config |
| 6 | Proactive Alerting | HIGH | SNS topic subscriptions |
| 7 | Preventative Controls | MEDIUM | Service Control Policies (SCPs), IAM Roles and Config remediation rules |
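As a rough way to spot-check control 1 from the command line, the sketch below lists security groups that have an inbound TCP rule open to 0.0.0.0/0 on exactly port 22 or 3389. The function name `find_open_admin_port_groups` is my own, and the check is deliberately narrow: it assumes the AWS CLI is configured for the target account and Region, and it will miss rules that cover these ports through a wider port range or an IPv6 CIDR.

```shell
# Sketch only: find_open_admin_port_groups is a hypothetical helper name.
# Lists security groups with an inbound TCP rule open to the world on
# exactly port 22 (SSH) or 3389 (RDP).
find_open_admin_port_groups() {
  local port
  for port in 22 3389; do
    echo "== Security groups allowing 0.0.0.0/0 on TCP ${port} =="
    aws ec2 describe-security-groups \
      --filters "Name=ip-permission.cidr,Values=0.0.0.0/0" \
                "Name=ip-permission.protocol,Values=tcp" \
                "Name=ip-permission.from-port,Values=${port}" \
                "Name=ip-permission.to-port,Values=${port}" \
      --query 'SecurityGroups[].[GroupId,GroupName]' \
      --output text
  done
}
```

Calling `find_open_admin_port_groups` prints one heading per port followed by the matching group IDs and names. Security Hub and Config give a more complete answer; this is only a quick first look.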
Key Visibility
The customer had not previously enabled these services or integrations, and this lack of visibility meant that they were unaware of their exposure and risks. Just like a murky river in Northern Australia, you cannot see the crocodiles. The closest you can get is a signpost warning you of the danger. The challenge is that instances launched from AMIs without the SSM Agent still need to have it installed, and doing this at scale after an EC2 instance has been launched is far from simple, even though it may seem so on the surface. Most of these AMIs are closed to SSH or provide no username and password to log in with. SSH keys are also a challenge, as is the lack of an installed AWS CLI.
So, how do we deploy packages like the AWS CLI and SSM Agent without logging into the instance?
For this, we rely on the magic of EC2 Userdata and the Instance Profile!
What Is Userdata?
When we first deploy an EC2 instance, we usually configure things like the name, networking, security-groups, storage, whether to attach a key pair, the EC2 instance profile and, of course, the image used for the initial build. Where we need to customise the configuration (for example root certificates, proxy settings, or installing and configuring packages), we use the advanced Userdata feature. Here, we define a script to carry out those tasks: a shell script such as bash on Linux, or PowerShell on Windows. It can be pasted in directly or attached as a file. On Linux, the instance runs the script as part of the cloud-init process, which logs the output. This can be seen in the system log in the EC2 console under Actions > Monitor and troubleshoot > Get system log.
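The same system log can also be pulled with the AWS CLI, which is handy when checking many instances. This is a minimal sketch; `show_system_log` is a name I made up, and it assumes the CLI is configured and the caller is allowed to call `ec2:GetConsoleOutput`.

```shell
# Sketch only: show_system_log is a hypothetical helper name.
# Prints the console (system) log of one instance, where the output of
# the Userdata run normally appears.
show_system_log() {
  local instance_id="$1"
  aws ec2 get-console-output \
    --instance-id "$instance_id" \
    --query 'Output' \
    --output text
}
```

For example, `show_system_log i-0123456789abcdef0 | grep -i ssm` narrows the log down to the agent installation lines (the instance ID here is a placeholder).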
This is a straightforward process when building an EC2 instance and works well. The script runs during the first boot and does what it is meant to do. Here is an example of Userdata installing the prerequisites for the SSM Agent and deploying the agent on a Debian 12 Linux distribution:
#!/bin/bash -v
# Refresh the package index and install prerequisites
apt update -y -q
apt install -y -q apt-transport-https ca-certificates curl gnupg-agent software-properties-common
apt install -y -q python3 python3-pip python3-setuptools python3-testtools python3-toolz python3-wheel
apt install -y -q amazon-ec2-utils
apt install -y -q awscli
# Download and install the SSM Agent package
curl -sS "https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/debian_amd64/amazon-ssm-agent.deb" -o "/tmp/amazon-ssm-agent.deb"
dpkg -i "/tmp/amazon-ssm-agent.deb"
apt show amazon-ssm-agent
# Restart the agent and show its status
systemctl daemon-reload
systemctl restart amazon-ssm-agent
systemctl status -l amazon-ssm-agent
# Enable the SSM Agent service at boot if it is currently disabled
if [ "$(systemctl is-enabled amazon-ssm-agent)" = "disabled" ]; then
    systemctl enable amazon-ssm-agent
    systemctl is-enabled amazon-ssm-agent
fi
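For completeness, here is a sketch of passing a script like the one above at launch time with the AWS CLI rather than the console. `launch_with_userdata` and its arguments are placeholders of my own; the AMI ID, instance type, instance profile name and script path all come from you, and networking options are omitted so the account defaults apply.

```shell
# Sketch only: launch_with_userdata is a hypothetical helper name.
# Launches one instance with a Userdata script and an instance profile,
# then prints the new instance ID.
launch_with_userdata() {
  local ami_id="$1" instance_type="$2" profile_name="$3" script_path="$4"
  aws ec2 run-instances \
    --image-id "$ami_id" \
    --instance-type "$instance_type" \
    --iam-instance-profile "Name=${profile_name}" \
    --user-data "file://${script_path}" \
    --query 'Instances[0].InstanceId' \
    --output text
}
```

With `file://`, the CLI reads the script and handles the encoding for `run-instances`, so the file stays as plain text.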
EC2 Instance Profile
For the instance profile, you need an IAM role for EC2 with the AWS managed policy called “AmazonSSMManagedInstanceCore” attached to allow management under Systems Manager and for Session Manager / Fleet Manager. For patch management, the “AmazonSSMPatchAssociation” managed policy is also required.
Finally, the role's trust relationship must name the service principal “ec2.amazonaws.com” with the “sts:AssumeRole” action:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
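If you prefer to create these pieces by hand, the steps can be sketched with the AWS CLI as below. The function names and the idea of reusing one name for both the role and the instance profile are my own choices, and the trust policy above is assumed to be saved to a local file. The patch policy can be attached the same way with a second `attach-role-policy` call.

```shell
# Sketch only: both function names are hypothetical helpers.
# Creates a role from a trust policy file, attaches the core SSM managed
# policy, and wraps the role in an instance profile of the same name.
create_ssm_instance_profile() {
  local name="$1" trust_policy_file="$2"
  aws iam create-role \
    --role-name "$name" \
    --assume-role-policy-document "file://${trust_policy_file}" > /dev/null || return 1
  aws iam attach-role-policy \
    --role-name "$name" \
    --policy-arn "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore" || return 1
  aws iam create-instance-profile \
    --instance-profile-name "$name" > /dev/null || return 1
  aws iam add-role-to-instance-profile \
    --instance-profile-name "$name" \
    --role-name "$name"
}

# Associates an existing instance profile with a running or stopped instance.
attach_instance_profile() {
  local instance_id="$1" profile_name="$2"
  aws ec2 associate-iam-instance-profile \
    --instance-id "$instance_id" \
    --iam-instance-profile "Name=${profile_name}"
}
```

A typical call would be `create_ssm_instance_profile MySsmRole trust.json` followed by `attach_instance_profile i-0123456789abcdef0 MySsmRole`, where the role name, file name and instance ID are placeholders.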
NOTE: To automate the process of creating the role, attaching policies and applying the EC2 instance profile and/or managed policies to all EC2 instances, you can use the Systems Manager Quick Setup for Default Host Management Configuration, followed by the Quick Setup for Host Management from the Delegated Administrator account of your AWS Organization. For a standalone account, the new consolidated console experience can be accessed via the Explore Nodes feature and will automate the creation of these resources on your behalf.
This worked seamlessly and the instance immediately showed up in Systems Manager as a managed node. I was able to see the full inventory and open a remote shell session using Session Manager.
You can find out more about using AWS Session Manager and Fleet Manager for RDP to Windows here:
So, what about those scenarios when there was never such a script added to the Userdata prior to launching?
Adding SSM Agent Post-Launch Using EC2 Userdata
The AWS Managed Services (AMS) guidance describes the process as follows:
AMS uses EC2 user data to run the installation script on your instances. To add the user data script and run it on your instances, AMS must stop and start each instance.
If your instance already has an existing user data script, then AMS completes the following steps during the auto installation process (I manually added step 4 – always verify your work):
1. Creates a backup of the existing user data script.
2. Replaces the existing user data script with the SSM Agent installation script.
3. Restarts the instance to install SSM Agent.
4. Verify the instance is now managed under Systems Manager.
5. Stops the instance and restores the original script.
6. Restarts the instance with the original script.
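The steps above can be sketched with the AWS CLI roughly as follows. This is my own approximation, not the AMS implementation: `swap_userdata`, `ssm_ping_status` and the backup file name are hypothetical, and the sketch assumes the instance profile is already attached. Note that `modify-instance-attribute --attribute userData --value` expects base64 text, so the new script is encoded first.

```shell
# Sketch only: swap_userdata and ssm_ping_status are hypothetical helpers.
# Backs up the current Userdata, then stops the instance, swaps in a new
# Userdata file and starts the instance again.
swap_userdata() {
  local instance_id="$1" new_userdata="$2"
  local backup="userdata-backup-${instance_id}.b64"
  local encoded
  encoded="$(mktemp)" || return 1

  # Backup: the existing Userdata is returned base64-encoded
  # (the text "None" means the instance had no Userdata).
  aws ec2 describe-instance-attribute --instance-id "$instance_id" \
    --attribute userData --query 'UserData.Value' --output text > "$backup" || return 1

  # Userdata can only be changed while the instance is stopped.
  aws ec2 stop-instances --instance-ids "$instance_id" > /dev/null || return 1
  aws ec2 wait instance-stopped --instance-ids "$instance_id" || return 1

  # Encode the new Userdata as a single base64 line and apply it.
  base64 < "$new_userdata" | tr -d '\n' > "$encoded"
  aws ec2 modify-instance-attribute --instance-id "$instance_id" \
    --attribute userData --value "file://${encoded}" || return 1

  aws ec2 start-instances --instance-ids "$instance_id" > /dev/null || return 1
  aws ec2 wait instance-running --instance-ids "$instance_id"
  rm -f "$encoded"
}

# Verification: reports the SSM Agent ping status once the node registers.
ssm_ping_status() {
  aws ssm describe-instance-information \
    --filters "Key=InstanceIds,Values=$1" \
    --query 'InstanceInformationList[0].PingStatus' --output text
}
```

Because the backup file is already base64 text, restoring the original script is a matter of stopping the instance again and passing that file straight to `--value file://...` without re-encoding it.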
The detail is lacking here, and simply pasting your script into the edited Userdata results in nothing happening. This is because, by default, cloud-init runs Userdata scripts only during the first boot after launch, so edited Userdata is ignored on later boots. To get the new script to run, we need a way to tell cloud-init to run the Userdata again on the next boot. The key to this is found here:
Ref: https://repost.aws/knowledge-center/execute-user-data-ec2
While it is not overly clear in the explanation, the cloud-config section that precedes the bash script is the secret sauce to getting this to work: it sets the scripts-user module to run always rather than once. Using the example and replacing the last line of its bash script with my own resulted in the SSM Agent and its dependencies being successfully installed. The EC2 instance showed up in the Systems Manager console as a new managed instance.
NOTE: I had already attached the EC2 instance profile to the instance prior to testing the script – this is key to allowing the EC2 instance to talk to Systems Manager after the SSM Agent is deployed.
Code: Installing SSM Agent on Unmanaged Instances
The fully working Userdata script for an unmanaged Debian 12 instance is below:
Content-Type: multipart/mixed; boundary="//"
MIME-Version: 1.0

--//
Content-Type: text/cloud-config; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="cloud-config.txt"

#cloud-config
cloud_final_modules:
- [scripts-user, always]

--//
Content-Type: text/x-shellscript; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="userdata.txt"

#!/bin/bash -v
# Refresh the package index and install prerequisites
apt update -y -q
apt install -y -q apt-transport-https ca-certificates curl gnupg-agent software-properties-common
apt install -y -q python3 python3-pip python3-setuptools python3-testtools python3-toolz python3-wheel
apt install -y -q amazon-ec2-utils
apt install -y -q awscli
# Download and install the SSM Agent package
curl -sS "https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/debian_amd64/amazon-ssm-agent.deb" -o "/tmp/amazon-ssm-agent.deb"
dpkg -i "/tmp/amazon-ssm-agent.deb"
apt show amazon-ssm-agent
# Restart the agent and show its status
systemctl daemon-reload
systemctl restart amazon-ssm-agent
systemctl status -l amazon-ssm-agent
# Enable the SSM Agent service at boot if it is currently disabled
if [ "$(systemctl is-enabled amazon-ssm-agent)" = "disabled" ]; then
    systemctl enable amazon-ssm-agent
    systemctl is-enabled amazon-ssm-agent
fi
--//--
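Since the envelope is the same for every distribution and only the shell part changes, it can be generated rather than hand-edited. The sketch below wraps any plain shell script in the multipart structure shown above; `build_rerun_userdata` is a helper name I invented, and the blank lines it emits between each header group and its body are required for the MIME parts to be read correctly.

```shell
# Sketch only: build_rerun_userdata is a hypothetical helper name.
# Wraps a plain shell script in the multipart MIME envelope that tells
# cloud-init to run user scripts on every boot, and prints the result.
build_rerun_userdata() {
  local script_path="$1"

  # Envelope and cloud-config part; empty strings produce the blank
  # lines that separate MIME headers from each body.
  printf '%s\n' \
    'Content-Type: multipart/mixed; boundary="//"' \
    'MIME-Version: 1.0' \
    '' \
    '--//' \
    'Content-Type: text/cloud-config; charset="us-ascii"' \
    'MIME-Version: 1.0' \
    'Content-Transfer-Encoding: 7bit' \
    'Content-Disposition: attachment; filename="cloud-config.txt"' \
    '' \
    '#cloud-config' \
    'cloud_final_modules:' \
    '- [scripts-user, always]' \
    '' \
    '--//' \
    'Content-Type: text/x-shellscript; charset="us-ascii"' \
    'MIME-Version: 1.0' \
    'Content-Transfer-Encoding: 7bit' \
    'Content-Disposition: attachment; filename="userdata.txt"' \
    ''

  # Script body, then a line break and the closing boundary.
  cat "$script_path" || return 1
  printf '\n%s\n' '--//--'
}
```

For example, `build_rerun_userdata install-ssm.sh > userdata.mime` produces a file ready to paste into the Userdata editor, where `install-ssm.sh` is whatever plain script you would have used at launch.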
Conclusion
There is no subtle way to automate this process. To edit the Userdata, you must stop the EC2 instance, which causes a temporary interruption to traffic. It is also not a script you want to run on every boot, as it can delay services, so it needs to be removed after it has run once, which means yet another stop / start cycle. This process is cumbersome and highly manual, but it serves a purpose where you have a handful of instances launched from AMIs without the SSM Agent. The pay-off is well worth the effort: once the instance is managed under Systems Manager, a world of possibilities opens.
As a managed instance:
- Run Command becomes available, allowing you to manage installations of other packages in bulk using SSM Command documents such as “AWS-ConfigureAWSPackage”
- Patch Management becomes easier using Patch Manager
- Simplification of remote access and security-groups – no need for inbound rules for SSH and RDP
- Managing the state of applications – super important for legacy apps and regulated apps alike
- Ability to centrally manage approved packages for self-service
- Change Manager can help with managing patching and scheduled maintenance windows
- Diagnose and Remediate issues with applications (a NEW feature)
- Explore and gain insight into your organization-wide instances using Explore Nodes (a NEW feature)
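As an illustration of the first item, a package can be pushed to a managed instance with the “AWS-ConfigureAWSPackage” document from the CLI. This is a minimal sketch: `install_package_via_run_command` is a hypothetical helper, and the package name you pass must be one that the document supports, for example AmazonCloudWatchAgent.

```shell
# Sketch only: install_package_via_run_command is a hypothetical helper.
# Sends the AWS-ConfigureAWSPackage document to one managed instance and
# prints the command ID for later status checks.
install_package_via_run_command() {
  local instance_id="$1" package="$2"
  local params
  # The document takes its parameters as a JSON map of string lists.
  params="$(printf '{"action":["Install"],"name":["%s"]}' "$package")"
  aws ssm send-command \
    --document-name "AWS-ConfigureAWSPackage" \
    --instance-ids "$instance_id" \
    --parameters "$params" \
    --query 'Command.CommandId' \
    --output text
}
```

The returned command ID can then be checked in the Run Command console or with `aws ssm list-command-invocations --command-id <id>`.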
Useful Resources
Systems Manager is a highly under-rated service, largely due to a lack of awareness of its broad capabilities. The AWS DevOps Engineer – Professional certification covers Systems Manager in depth, and it is also a core component of the AWS Security – Specialty certification. There are many great resources out there for learning more about Systems Manager, including these:
Getting Started with Systems Manager
Deep Dive into Systems Manager
Use Cases and Best Practices
https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-best-practices.html
Infrastructure Security Learning Plan
https://explore.skillbuilder.aws/learn/learning-plans/1812/plan
DevOps Engineer – Professional – Basic Learning Plan
https://explore.skillbuilder.aws/learn/public/learning_plan/view/2184/plan