Well Architected Serverless – AWS Lambda PowerTools for Python

BLOG ARTICLE

Overview

The AWS Lambda PowerTools for Python is an open-source library that assists in building well-architected serverless applications. Amongst other great features, it implements a number of observability best practices across the three core observability pillars.

Logging

The Logger class provides a custom Python logger that outputs structured, JSON-based log entries. It supports both message strings and complex objects, taking care of serializing the log output. In addition, common use cases such as logging the Lambda event payload and capturing cold start information are handled for you, including appending custom keys to the logger at runtime.
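To illustrate what structured, JSON-based logging means in practice, here is a minimal stdlib-only formatter sketch. This is purely illustrative, not the Powertools implementation; the real Logger adds many more fields (service, cold_start, the Lambda context keys, and so on):

```python
import io
import json
import logging

class JsonFormatter(logging.Formatter):
    """Minimal structured-JSON formatter, illustrating the style of output
    the Powertools Logger produces (which adds many more fields)."""

    def format(self, record):
        # Pass dict messages through unserialized so they nest in the JSON
        message = record.msg if isinstance(record.msg, dict) else record.getMessage()
        payload = {
            "level": record.levelname,
            "location": f"{record.funcName}:{record.lineno}",
            "message": message,
            "timestamp": self.formatTime(record),
        }
        return json.dumps(payload)

# Wire the formatter to an in-memory stream so we can inspect the output
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
demo_logger = logging.getLogger("demo")
demo_logger.addHandler(handler)
demo_logger.setLevel(logging.INFO)

demo_logger.info({"environment": "dev", "region": "us-east-1"})
print(stream.getvalue().strip())
```

Both string and dict messages come out as one parseable JSON object per line, which is what makes downstream querying (e.g. CloudWatch Logs Insights) straightforward.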

Metrics

The Metrics class makes collecting custom metrics from your application simple, without the need to make synchronous requests to external systems. This functionality is powered by Amazon CloudWatch Embedded Metric Format (EMF), which allows for capturing metrics asynchronously.

Tracing

The Tracer class provides a simple way to send traces from functions to AWS X-Ray, giving visibility into function calls, interactions with other AWS services, and external HTTP requests. Annotations can easily be added to traces, allowing you to filter traces based on key information.

In this post, we’ll take a sample serverless application and implement a number of observability best practices using the AWS Lambda PowerTools.

Getting Started

Clone the following GitHub repository containing an AWS SAM based sample serverless application.

$ git clone https://github.com/ScottScovell/well-architected-serverless.git
In the README you will find a walk-through of the sample serverless application and details on how to run it locally before deploying to AWS.

You can either implement the changes below as you walk through this post, or check out the completed feature branch part-2/aws-lambda-powertools and review the code before deploying it into your AWS account.

Setting up AWS Lambda PowerTools for Python

First, we need to add the AWS Lambda PowerTools dependency to the requirements.txt file for each of our functions.

boto3
aws-lambda-powertools

Next we add the imports for the Tracer, Logger, and Metrics components to our Python Lambda function.

# Import AWS Lambda Powertools
from aws_lambda_powertools import Tracer, Logger, Metrics

tracer = Tracer()
logger = Logger()
metrics = Metrics()

We can add decorators to our handler function to enrich the capabilities of these components. For example, to inject the Lambda context into our log statements, we can use the following logger decorator:

@logger.inject_lambda_context
def lambda_handler(message, context):

This configures AWS Lambda PowerTools to capture key fields from the Lambda context, as well as Lambda runtime information such as cold starts, memory size, etc.

Should we wish to log the incoming event as well, we can add the following parameter to the inject_lambda_context decorator:

@logger.inject_lambda_context(log_event=True)
def lambda_handler(message, context):

We can also do this via an environment variable, POWERTOOLS_LOGGER_LOG_EVENT, as shown in the next step.

Let's add the decorator for metrics next:

@logger.inject_lambda_context
@metrics.log_metrics
def lambda_handler(message, context):

This decorator validates, serializes, and flushes any metrics logged in the app to stdout. When creating metrics we need to define a namespace; we can set this as a parameter to the decorator or via an environment variable, as shown in the step below.

Lastly, let’s add the decorator for tracing.

@logger.inject_lambda_context
@metrics.log_metrics
@tracer.capture_lambda_handler
def lambda_handler(message, context):

This decorator automatically captures any response or exception raised by the Lambda handler and writes it into the tracing metadata. It also sets up the ColdStart annotation, making it easier to filter traces when troubleshooting.

We also need to grant Lambda permission to log traces to AWS X-Ray. We do this in SAM by setting Tracing to Active in each of our functions in our SAM template.yaml file. We can use the Globals section to apply the setting to every function defined in our application.

Globals:
  Function:
    Timeout: 30
    Tracing: Active

While we are here, let's enable API tracing as well so we can capture end-to-end visibility of our service.

Globals:
  Api:
    TracingEnabled: true
  Function:
    Timeout: 30
    Tracing: Active

We can also capture traces in other methods of our Lambda function using the following method decorator:

@tracer.capture_method
def helper_method(params):
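Conceptually, this decorator wraps the method and records a timed subsegment around the call (X-Ray additionally captures the trace context, response, and any exception). A toy stdlib stand-in, purely to show the mechanics rather than the real Tracer implementation:

```python
import functools
import time

def timed_subsegment(func):
    """Toy stand-in for tracer.capture_method: time the wrapped call and emit
    a subsegment-like record. The real Tracer also propagates trace context
    and records responses/exceptions into X-Ray."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            duration_ms = (time.perf_counter() - start) * 1000
            print(f"subsegment {func.__name__}: {duration_ms:.2f} ms")
    return wrapper

@timed_subsegment
def helper(params):
    return {"ok": params}

helper("example")
```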

We can turn tracing on or off using the POWERTOOLS_TRACE_DISABLED environment variable. This is useful when running unit tests locally or as part of the automated testing stages of a CI pipeline.
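For example, a test suite can flip that variable around the code under test. A small stdlib helper (hypothetical, not part of PowerTools) that restores the previous value afterwards:

```python
import os
from contextlib import contextmanager

@contextmanager
def tracing_disabled():
    """Temporarily set POWERTOOLS_TRACE_DISABLED, e.g. around local unit tests."""
    previous = os.environ.get("POWERTOOLS_TRACE_DISABLED")
    os.environ["POWERTOOLS_TRACE_DISABLED"] = "1"
    try:
        yield
    finally:
        # Restore whatever was there before the test ran
        if previous is None:
            os.environ.pop("POWERTOOLS_TRACE_DISABLED", None)
        else:
            os.environ["POWERTOOLS_TRACE_DISABLED"] = previous
```

With pytest you could achieve the same effect more simply using the monkeypatch fixture's setenv.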

To complete the setup, we need to add a couple of environment variables to each of our Lambda functions. To do this consistently, we'll add them to the Globals section of our SAM template.

Globals:
  Api:
    TracingEnabled: true
  Function:
    Timeout: 30
    Tracing: Active
    Environment:
      Variables:
        LOG_LEVEL: INFO
        POWERTOOLS_SERVICE_NAME: well-architected-serverless
        POWERTOOLS_LOGGER_LOG_EVENT: True
        POWERTOOLS_METRICS_NAMESPACE: well-architected-serverless
Variable                     | Description                                                         | Sample value
LOG_LEVEL                    | Sets how verbose Logger should be (INFO, by default)                | INFO
POWERTOOLS_SERVICE_NAME      | Sets the service key that will be present across all log statements | well-architected-serverless
POWERTOOLS_LOGGER_LOG_EVENT  | Log the incoming event                                              | True
POWERTOOLS_METRICS_NAMESPACE | Logical container where all metrics will be placed                  | well-architected-serverless
POWERTOOLS_TRACE_DISABLED    | Disable trace events globally                                       | 0
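PowerTools reads these settings from the environment when the components are initialised; the lookups are roughly equivalent to the following (the fallback values here are illustrative assumptions, apart from INFO, which is the documented Logger default):

```python
import os

# Illustrative equivalents of the environment lookups PowerTools performs.
# Defaults shown are assumptions for the sketch, except LOG_LEVEL's INFO.
log_level = os.environ.get("LOG_LEVEL", "INFO")
service_name = os.environ.get("POWERTOOLS_SERVICE_NAME", "service_undefined")
metrics_namespace = os.environ.get("POWERTOOLS_METRICS_NAMESPACE")  # no default: required for metrics
trace_disabled = os.environ.get("POWERTOOLS_TRACE_DISABLED", "0") == "1"
```

Because these are plain environment variables, they can be set per function, per stage, or globally in the SAM template as we do above.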

With our setup complete, let's enhance our code to write out logs and metrics in a standardised, well-architected way.

Logging

Start by converting those print statements into logger calls using the built-in log-level convenience methods.

For example, simple log statements would look like this:

# Simple log statement
logger.info('## List Entities')

The resulting structure of the log entry would look like this:

{
  "level": "INFO",
  "location": "lambda_handler:18",
  "message": "## List Entities",
  "timestamp": "2021-03-14 09:14:00,181",
  "service": "well-architected-serverless",
  "sampling_rate": 0,
  "cold_start": true,
  "function_name": "well-architected-serverless-ListEntitiesFunction-1EGOGZHDCJ8W9",
  "function_memory_size": "128",
  "function_arn": "arn:aws:lambda:us-east-1:XXXXXXXXXXXX:function:well-architected-serverless-ListEntitiesFunction-1EGOGZHDCJ8W9",
  "function_request_id": "d4809e0b-26f7-4389-b14c-055f28340868",
  "xray_trace_id": "1-604dd3d7-64dd892e0334dfe63a8693b7"
}

Example: Logging with a complex object

# Log using object structure
logger.info({
    "environment": environment,
    "region": region
})

…and the structure of the log entry with a complex object would look something like this:

{
  "level": "INFO",
  "location": "lambda_handler:37",
  "message": {
    "environment": "false",
    "region": "us-east-1"
  },
  "timestamp": "2021-03-14 09:14:00,182",
  "service": "well-architected-serverless",
  "sampling_rate": 0,
  "cold_start": true,
  "function_name": "well-architected-serverless-ListEntitiesFunction-1EGOGZHDCJ8W9",
  "function_memory_size": "128",
  "function_arn": "arn:aws:lambda:us-east-1:XXXXXXXXXXXX:function:well-architected-serverless-ListEntitiesFunction-1EGOGZHDCJ8W9",
  "function_request_id": "d4809e0b-26f7-4389-b14c-055f28340868",
  "xray_trace_id": "1-604dd3d7-64dd892e0334dfe63a8693b7"
}

Example: Logging exceptions

except Exception:
    logger.exception('Caught exception')
    raise

The structure of the log entry would look something like this:

{
  "level": "ERROR",
  "location": "lambda_handler:70",
  "message": "Caught exception",
  "timestamp": "2021-03-14 09:14:02,203",
  "service": "well-architected-serverless",
  "sampling_rate": 0,
  "cold_start": true,
  "function_name": "well-architected-serverless-ListEntitiesFunction-1EGOGZHDCJ8W9",
  "function_memory_size": "128",
  "function_arn": "arn:aws:lambda:us-east-1:XXXXXXXXXXXX:function:well-architected-serverless-ListEntitiesFunction-1EGOGZHDCJ8W9",
  "function_request_id": "d4809e0b-26f7-4389-b14c-055f28340868",
  "exception": "Traceback (most recent call last):\n File \"/var/task/app.py\", line 56, in lambda_handler\n results = table.scan()\n File \"/var/task/boto3/resources/factory.py\", line 520, in do_action\n response = action(self, *args, **kwargs)\n File \"/var/task/boto3/resources/action.py\", line 83, in __call__\n response = getattr(parent.meta.client, operation_name)(*args, **params)\n File \"/var/task/botocore/client.py\", line 357, in _api_call\n return self._make_api_call(operation_name, kwargs)\n File \"/var/task/aws_xray_sdk/ext/botocore/patch.py\", line 38, in _xray_traced_botocore\n return xray_recorder.record_subsegment(\n File \"/var/task/aws_xray_sdk/core/recorder.py\", line 435, in record_subsegment\n return_value = wrapped(*args, **kwargs)\n File \"/var/task/botocore/client.py\", line 676, in _make_api_call\n raise error_class(parsed_response, operation_name)\nbotocore.errorfactory.ResourceNotFoundException: An error occurred (ResourceNotFoundException) when calling the Scan operation: Requested resource not found",
  "xray_trace_id": "1-604dd3d7-64dd892e0334dfe63a8693b7"
}

Metrics

To demonstrate metrics, let's add some to our create_entity and delete_entity functions. Metrics are added using the add_metric method, and custom metric dimensions can be added using the add_dimension method for metric aggregation.

For example:

from aws_lambda_powertools.metrics import MetricUnit

# Create metric for successful entity being added
metrics.add_dimension(name="Region", value=region)
metrics.add_metric(name="EntityCreated", unit=MetricUnit.Count, value=1)

return {
   'statusCode': 201,
   'headers': {},
   'body': json.dumps({'Message': 'Entity created'})
}

The structure of the log entry will look something like this:

{
  "_aws": {
    "Timestamp": 1615711712586,
    "CloudWatchMetrics": [
      {
        "Namespace": "well-architected-serverless",
        "Dimensions": [
          [
            "Region",
            "service"
          ]
        ],
        "Metrics": [
          {
            "Name": "EntityCreated",
            "Unit": "Count"
          }
        ]
      }
    ]
  },
  "Region": "us-east-1",
  "service": "well-architected-serverless",
  "EntityCreated": [
    1
  ]
}
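There is no hidden API call behind that entry: an EMF metric is just specially structured JSON written to the log stream, which CloudWatch parses asynchronously. A stdlib sketch that assembles the same shape by hand (purely illustrative; the Metrics class does all of this for you):

```python
import json
import time

def emf_entry(namespace, metric_name, value, unit="Count", dimensions=None):
    """Assemble a CloudWatch Embedded Metric Format entry by hand."""
    dimensions = dimensions or {}
    entry = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [list(dimensions)],  # one dimension set: all keys
                "Metrics": [{"Name": metric_name, "Unit": unit}],
            }],
        },
        metric_name: value,
    }
    entry.update(dimensions)  # dimension values live at the top level
    return json.dumps(entry)

print(emf_entry("well-architected-serverless", "EntityCreated", 1,
                dimensions={"Region": "us-east-1",
                            "service": "well-architected-serverless"}))
```

Writing metrics this way avoids synchronous calls to the CloudWatch API on the hot path, which is exactly the benefit described in the overview.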

Tracing

In addition to the automatic capture of exceptions and responses, we can manually add trace metadata and annotations.

To manually add trace metadata:

# Trace DynamoDB results as custom metadata
tracer.put_metadata(key="dynamodb_results", value=results)

As mentioned above, exceptions and responses are captured in the trace output by default. Should we wish to disable this, we can pass additional parameters to the handler decorator:

@tracer.capture_lambda_handler(capture_error=False)

or

@tracer.capture_lambda_handler(capture_response=False)

We can also disable tracing altogether (e.g. in production environments) using the POWERTOOLS_TRACE_DISABLED environment variable.

Build and deploy the sample application

To build and deploy your application for the first time, run the following:

sam build --use-container
sam deploy --guided

Use the following responses during the guided deployment:

Stack Name [basic-lambda-service]: well-architected-serverless
AWS Region [us-east-1]:
Parameter REGION [us-east-1]:
#Shows you resources changes to be deployed and require a 'Y' to initiate deploy
Confirm changes before deploy [y/N]: y
#SAM needs permission to be able to create roles to connect to the resources in your template
Allow SAM CLI IAM role creation [Y/n]: Y
ListEntitiesFunction may not have authorization defined, Is this okay? [y/N]: y
GetEntityFunction may not have authorization defined, Is this okay? [y/N]: y
CreateEntityFunction may not have authorization defined, Is this okay? [y/N]: y
DeleteEntityFunction may not have authorization defined, Is this okay? [y/N]: y
Save arguments to configuration file [Y/n]: Y
SAM configuration file [samconfig.toml]:
SAM configuration environment [default]:

Generate some API Requests

Locate and copy the API Gateway endpoint URL from the output values displayed after deployment. Replace the URL in the following commands with your endpoint URL.

Run the following to generate some entities via the API:

# Create entities
for i in {1..20}; do
  curl -X POST https://<your endpoint url>/Prod/entities -H "Content-Type: application/json" -d '{"id": "'$i'", "SomeKey":"SomeValue"}';
done

Check the entities were created by running the following to list them:

# Get list of entities
curl https://<your endpoint url>/Prod/entities;

Now generate some GET requests for each of the entities we created:

# Get each entity
for i in {1..20}; do
  curl https://<your endpoint url>/Prod/entities/$i;
done

Lastly, generate some DELETE requests:

# Clean up entities
for i in {1..20}; do
  curl -X DELETE https://<your endpoint url>/Prod/entities/$i;
done

View Logs in CloudWatch

In the CloudWatch Console, navigate to Log Groups and search for our Lambda log groups using /aws/lambda/well-architected-serverless as the log group filter.

Select one and view a recent log stream to verify logs are being written, and explore the JSON output structure being generated. If you set the POWERTOOLS_LOGGER_LOG_EVENT environment variable to True, you should see the event payload passed by API Gateway that triggered our Lambda function.

By default, we’ll also be capturing the following Lambda context info:

  • cold_start
  • function_name
  • function_memory_size
  • function_arn
  • function_request_id


Because we also enabled tracing in API Gateway, we'll also see the trace id from the X-Ray trace context in each log entry.

In our create_entity and delete_entity functions we are emitting metric events for each entity we create or delete. In the CloudWatch logs for those functions we’ll also see structured metric log events for these.

To view metrics in the CloudWatch Console, navigate to Metrics and select the metrics we defined under the well-architected-serverless custom namespace.

View Traces in AWS X-Ray

In the Console, navigate to AWS X-Ray and view the Service Map using the menu item on the left hand side.

In the Service Map, we’ll see a graphical representation of requests made to our application arranged visually by Health or Traffic depending on the dimension selected.

Tip: Be sure to check the relative time window at the top right-hand side. Currently the maximum time window is the last 6 hours.

Here we can see the request hitting our API Gateway endpoint, where it is assigned a trace id that is passed downstream as the unique identifier of the trace context. We can see our Lambda functions (nodes) processing those requests, with node size indicating traffic volume. The Health dimension resizes the nodes by their health status. You'll also notice a node representing our DynamoDB table and the corresponding API traffic from our Lambda functions.

Upon selecting a node, we can view aggregate statistics for it, along with handy links to view trace logs and CloudWatch logs. Selecting View Traces and opening one of the captured traces shows the end-to-end path of that request through our application.

From this view, we can see the type, status, and total duration of the end-to-end request. We can drill down into our Lambda function and gain insight into where the time is being spent in our function. Is there a big overhead in start-up and invocation of our Lambda, or is it latency of downstream services like DynamoDB? Was there a cold start or not? X-Ray provides this level of granularity with very little effort, as outlined above.

Selecting the Raw data tab at the top will show the raw data collected in the trace context as it passes through our application stack. In our scenario above we were tracing the DynamoDB responses in our Lambda function which we can inspect in the trace context.

AWS X-Ray also allows us to view the trace context at each of the nodes in our stack (referred to as subsegments) by selecting that node or row in the table below.

CONCLUSION

In this post we covered installing and setting up the AWS Lambda PowerTools for Python, an easy-to-use framework that implements observability best practices. We applied the framework to a sample serverless application and deployed it to AWS using AWS SAM. Following patterns like these will not only improve the observability of your serverless applications, it will also provide a consistent approach across multiple projects, making maintenance and operability of your serverless applications that much easier. Future you might even thank you for it 🙂