ECS AUTOSCALING
It is hard to come up with efficient scaling policies for Amazon Elastic Container Service (ECS). The more distributed your architecture, the more you will struggle with cascading load and growing latency.
But fear not, the promised salvation in the form of autoscaling for your services is here to save the day and distribute your computing load evenly across your microservices. So let's examine what we have to work with to achieve that.
SCALING SERVICES
Autoscaling of ECS services is implemented as an automated action executed upon an event: scale in or scale out. The source of such an event can be an alarm attached to a policy of either the StepScaling or the TargetTrackingScaling type. Target tracking works much like its DynamoDB counterpart, with the ECSServiceAverageCPUUtilization and ECSServiceAverageMemoryUtilization metrics available for tracking.
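As a rough mental model of how target tracking reacts, capacity grows (or shrinks) roughly in proportion to how far the average metric is from the target. The function below is a simplified sketch of that proportional rule, not the exact internal AWS algorithm:

```python
import math

def target_tracking_capacity(current_tasks: int, current_metric: float,
                             target: float) -> int:
    """Approximate new desired count under a target tracking policy.

    If average CPU sits above the target, capacity grows proportionally
    so the average lands back on the target; scale-in works the same
    way in reverse (unless DisableScaleIn is set).
    """
    return max(1, math.ceil(current_tasks * current_metric / target))

# 4 tasks averaging 80% CPU against a 50% target -> 7 tasks
print(target_tracking_capacity(4, 80.0, 50.0))
```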
Notice that ECS can track only average metrics of the service, so you need to make sure the load balancer distributes load evenly across tasks. A significant gap between maximum and average consumption can get a task terminated for running out of memory or CPU credits, which surfaces as 502 errors.
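A toy illustration (with made-up numbers) of how a healthy-looking average can hide a task that is about to be killed:

```python
# Hypothetical per-task memory utilization, in percent
tasks_memory_pct = [92, 35, 28, 45]

average = sum(tasks_memory_pct) / len(tasks_memory_pct)
peak = max(tasks_memory_pct)

# The service-wide average (50%) looks healthy, yet one task sits at 92%
# and may be OOM-killed long before any average-based alarm fires.
print(f"average={average:.0f}% peak={peak}%")
```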

COMBINING SCALING POLICIES
ECS scaling policies can be combined to produce even greater efficiency in load distribution. StepScaling policies can handle scale-out events driven by Application Load Balancer (ALB) or SQS metrics, estimating the load at the input source. ALB target group metrics such as AWS/ApplicationELB/RequestCountPerTarget are a good baseline to start from. The size of an SQS queue is another deterministic metric for estimating the incoming load on a service.
A combination of StepScaling and TargetTrackingScaling on ECSServiceAverageCPUUtilization or ECSServiceAverageMemoryUtilization allows greater flexibility in how your service reacts to load. If you can determine whether the service in question is mostly CPU or memory bound, then selecting a threshold for one of these average metrics should be pretty easy: observe the service under generated test load.

CLOUDFORMATION SUPPORT FOR ECS SCALING
To define an ECS service with scaling policies in CloudFormation you need a cluster, an instance role for the EC2 hosts, and other essentials omitted from this example.
First we need a service role that performs scaling actions on our behalf:
ScalingRole:
  Type: AWS::IAM::Role
  Properties:
    RoleName: ScalingRole
    AssumeRolePolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: Allow
          Principal:
            Service:
              - application-autoscaling.amazonaws.com
          Action:
            - sts:AssumeRole
ScalingRolePolicy:
  Type: AWS::IAM::Policy
  Properties:
    Roles:
      - !Ref ScalingRole
    PolicyName: ScalingRolePolicy
    PolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: Allow
          Action:
            - application-autoscaling:*
            - ecs:RunTask
            - ecs:UpdateService
            - ecs:DescribeServices
            - cloudwatch:PutMetricAlarm
            - cloudwatch:DescribeAlarms
            - cloudwatch:GetMetricStatistics
            - cloudwatch:SetAlarmState
            - cloudwatch:DeleteAlarms
          Resource: "*"
Now we're going to have a look at the service definition, its ALB target group, the scalable target, the scaling policies, and a CloudWatch alarm. For this example we define ExampleCPUAutoScalingPolicy to grow capacity until the average ECSServiceAverageCPUUtilization sits at 50%, and ExampleRequestsAutoScalingPolicy to kick in when we see more than 1000 requests per target within a minute:
ExampleTargetGroup:
  Type: AWS::ElasticLoadBalancingV2::TargetGroup
  Properties:
    Port: 80
    Protocol: HTTP
    VpcId: !Ref VpcId
    HealthCheckIntervalSeconds: 30
    HealthCheckPath: /status
    HealthCheckTimeoutSeconds: 15
    HealthyThresholdCount: 2
    UnhealthyThresholdCount: 6
    Matcher:
      HttpCode: 200
    TargetGroupAttributes:
      - Key: deregistration_delay.timeout_seconds
        Value: 30
ExampleService:
  Type: AWS::ECS::Service
  Properties:
    TaskDefinition: !Ref ExampleTask # omitted
    PlacementStrategies:
      - Field: attribute:ecs.availability-zone
        Type: spread
    DesiredCount: 1
    Cluster: ExampleCluster # omitted
    LoadBalancers:
      - TargetGroupArn: !Ref ExampleTargetGroup
        ContainerPort: 8080
        ContainerName: example-service
ExampleAutoScalingTarget:
  Type: AWS::ApplicationAutoScaling::ScalableTarget
  Properties:
    MaxCapacity: !Ref MaxServicesCount # parameters
    MinCapacity: !Ref MinServicesCount
    ResourceId:
      Fn::Sub:
        - service/ExampleCluster/${ServiceName}
        - ServiceName: !GetAtt ExampleService.Name
    RoleARN: !GetAtt ScalingRole.Arn
    ScalableDimension: ecs:service:DesiredCount
    ServiceNamespace: ecs
ExampleCPUAutoScalingPolicy:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyName: ExampleCPUAutoScalingPolicy
    PolicyType: TargetTrackingScaling
    ScalingTargetId: !Ref ExampleAutoScalingTarget
    TargetTrackingScalingPolicyConfiguration:
      DisableScaleIn: true
      TargetValue: 50
      ScaleInCooldown: 60
      ScaleOutCooldown: 60
      PredefinedMetricSpecification:
        PredefinedMetricType: ECSServiceAverageCPUUtilization
ExampleRequestsAutoScalingPolicy:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyName: ExampleRequestsAutoScalingPolicy
    PolicyType: StepScaling
    # ScalingTargetId already implies the dimension and namespace;
    # specifying ScalableDimension/ServiceNamespace alongside it is invalid
    ScalingTargetId: !Ref ExampleAutoScalingTarget
    StepScalingPolicyConfiguration:
      AdjustmentType: ChangeInCapacity
      Cooldown: 60
      MetricAggregationType: Average
      StepAdjustments:
        - MetricIntervalLowerBound: 0
          ScalingAdjustment: 1
        - MetricIntervalUpperBound: 0
          ScalingAdjustment: -1
ExampleRequestsAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    MetricName: RequestCountPerTarget
    Namespace: AWS/ApplicationELB
    Statistic: Sum
    Period: 60
    EvaluationPeriods: 1
    Threshold: 1000
    AlarmActions:
      - !Ref ExampleRequestsAutoScalingPolicy
    OKActions:
      - !Ref ExampleRequestsAutoScalingPolicy
    Dimensions:
      - Name: TargetGroup
        Value: !GetAtt ExampleTargetGroup.TargetGroupFullName
    ComparisonOperator: GreaterThanOrEqualToThreshold
Notice that ExampleCPUAutoScalingPolicy sets DisableScaleIn: true for a specific reason: to guarantee that the request-based step scaling events take priority over target tracking, the scale-in logic of the tracking policy is disabled completely.
STABILITY, STABILITY IS THE KEY
OK, so now we have the service scaling up and down based on the number of requests per target on the load balancer. However, you will notice that in StepAdjustments the scale-out range starts right where the scale-in range ends. This means your service's desired count will oscillate around some value, constantly spinning new tasks up and tearing them down.
To allow for a window of stability, you need a range with ScalingAdjustment: 0 between the scale-in and scale-out boundaries. That way the alarm can trigger at the scale-in boundary, and StepAdjustments interprets which range the metric actually falls into.
Let's see an example where we want to scale out above RequestsScaleOutThreshold requests per target, and scale in below RequestsScaleInThreshold:
ExampleRequestsAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    MetricName: RequestCountPerTarget
    Namespace: AWS/ApplicationELB
    Statistic: Sum
    Period: 60
    EvaluationPeriods: 1
    Threshold: 500 # scale-in boundary that triggers the alarm
    AlarmActions:
      - !Ref ExampleRequestsAutoScalingPolicy
    Dimensions:
      - Name: TargetGroup
        Value: !GetAtt ExampleTargetGroup.TargetGroupFullName
    ComparisonOperator: GreaterThanOrEqualToThreshold
ExampleRequestsAutoScalingPolicy:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyName: ExampleRequestsAutoScalingPolicy
    PolicyType: StepScaling
    ScalingTargetId: !Ref ExampleAutoScalingTarget
    StepScalingPolicyConfiguration:
      AdjustmentType: ChangeInCapacity
      Cooldown: 60
      MetricAggregationType: Average
      # NB: interval bounds are offsets relative to the alarm threshold,
      # so these parameters must be expressed as differences from it
      StepAdjustments:
        - MetricIntervalLowerBound: !Ref RequestsScaleOutThreshold
          ScalingAdjustment: 1
        - MetricIntervalLowerBound: !Ref RequestsScaleInThreshold
          MetricIntervalUpperBound: !Ref RequestsScaleOutThreshold
          ScalingAdjustment: 0
        - MetricIntervalUpperBound: !Ref RequestsScaleInThreshold
          ScalingAdjustment: -1
Between MetricIntervalLowerBound=RequestsScaleInThreshold and MetricIntervalUpperBound=RequestsScaleOutThreshold, ScalingAdjustment=0 applies and no changes are made to the desired count. This ensures that oscillation of the desired count does not happen to you.
ONLY ONE? TAKE TWO!
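The step adjustment logic above can be sketched in a few lines (a hypothetical helper, not an AWS API; interval bounds are offsets from the alarm threshold):

```python
def step_adjustment(metric: float, alarm_threshold: float,
                    scale_in_offset: float, scale_out_offset: float) -> int:
    """Resolve a step scaling adjustment with a stability band.

    The band between the scale-in and scale-out offsets is where
    ScalingAdjustment is 0 and the desired count is left untouched.
    """
    delta = metric - alarm_threshold
    if delta >= scale_out_offset:
        return 1   # add a task
    if delta >= scale_in_offset:
        return 0   # stability window: do nothing
    return -1      # remove a task

# Alarm threshold 500; scale out at 1000 (offset 500), scale in below 500 (offset 0)
print(step_adjustment(1200, 500, 0, 500))  # -> 1
print(step_adjustment(700, 500, 0, 500))   # -> 0
print(step_adjustment(300, 500, 0, 500))   # -> -1
```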
Another approach would be to define two alarms, one to scale out and one to scale in, each with its own range and its own policy. This approach is in fact used quite a lot, but the problem is that CloudWatch alarms are not free; at scale they get pretty expensive.
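A back-of-the-envelope estimate of the two-alarm approach (the per-alarm price below is an assumption for illustration; check the current CloudWatch pricing page for your region):

```python
# Assumed monthly price per standard-resolution CloudWatch alarm (USD)
ALARM_PRICE_USD = 0.10

def monthly_alarm_cost(services: int, alarms_per_service: int,
                       price: float = ALARM_PRICE_USD) -> float:
    """Monthly CloudWatch alarm bill for N autoscaled services."""
    return services * alarms_per_service * price

# 200 microservices with a scale-out and a scale-in alarm each
print(f"${monthly_alarm_cost(200, 2):.2f}/month")
```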
Further reading and additional details can be found in the AWS documentation for AWS::ApplicationAutoScaling::ScalableTarget and AWS::CloudWatch::Alarm.