In this lab, we will create a Kafka producer application using AWS Lambda, which sends fake taxi ride data into a Kafka topic on Amazon MSK. A configurable number of instances of the producer Lambda function are invoked by an Amazon EventBridge schedule rule. In this way, we are able to generate test data concurrently based on the desired volume of messages.
Lab 1 Produce data to Kafka using Lambda (this post)
Lab 2 Write data to Kafka from S3 using Flink
Lab 3 Transform and write data to S3 from Kafka using Flink
Lab 4 Clean, Aggregate, and Enrich Events with Flink
Lab 5 Write data to DynamoDB using Kafka Connect
Lab 6 Consume data from Kafka using Lambda
Architecture
Fake taxi ride data is generated by multiple Kafka Lambda producer functions that are invoked by an EventBridge schedule rule. The schedule is set to run every minute, and the associated rule has a configurable number (e.g. 5) of targets. Each target points to the same Lambda function. In this way, we are able to generate test data using multiple Lambda functions based on the desired volume of messages.
Infrastructure
The infrastructure is created using Terraform and the source can be found in the GitHub repository of this post.
VPC and VPN
A VPC with three public and three private subnets is created using the AWS VPC Terraform module (infra/vpc.tf). Also, a SoftEther VPN server is deployed in order to access the resources in the private subnets from the developer machine (infra/vpn.tf). The details of how to configure the VPN server can be found in this post.
MSK Cluster
An MSK cluster with 2 brokers is created. The broker nodes are deployed with the kafka.m5.large instance type in private subnets, and IAM authentication is used for the client authentication method. Finally, additional server configurations are added, such as enabling topic auto-creation/deletion, the default number of partitions, and the default replication factor.
# infra/variables.tf |
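As the code above is collapsed, here is a minimal sketch of how the cluster attributes and server configuration might look. Only the instance type, broker count, topic auto-creation/deletion, the default number of partitions (5) and the replication factor follow from the text; the remaining names and values (e.g. the Kafka version and volume size) are assumptions, not copied from the source.

# infra/variables.tf - illustrative local values for the cluster
locals {
  msk = {
    version                = "2.8.1"          # assumed Kafka version
    instance_size          = "kafka.m5.large"
    number_of_broker_nodes = 2
    ebs_volume_size        = 20               # GiB, assumed
  }
}

# infra/msk.tf - additional server configurations
resource "aws_msk_configuration" "msk_config" {
  name           = "msk-config"
  kafka_versions = [local.msk.version]

  server_properties = <<-PROPERTIES
    auto.create.topics.enable = true
    delete.topic.enable = true
    num.partitions = 5
    default.replication.factor = 2
  PROPERTIES
}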
Security Group
The security group of the MSK cluster allows all inbound traffic from itself and all outbound traffic to all IP addresses. These rules are necessary for the Kafka connectors on MSK Connect that we will develop in later posts. Note that both rules are overly permissive; we can limit the protocols and port ranges in production. The security group also has additional inbound rules that allow access on port 9098 from the security groups of the VPN server and the Lambda producer function.
# infra/msk.tf |
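A sketch of the security group rules just described is shown below. The resource names (e.g. aws_security_group.vpn and aws_security_group.lambda_producer) are illustrative and may differ from those in the source.

resource "aws_security_group" "msk" {
  name   = "msk-sg"
  vpc_id = module.vpc.vpc_id
}

# all inbound traffic from the security group itself
resource "aws_security_group_rule" "msk_self_inbound" {
  type                     = "ingress"
  from_port                = 0
  to_port                  = 0
  protocol                 = "-1"
  security_group_id        = aws_security_group.msk.id
  source_security_group_id = aws_security_group.msk.id
}

# all outbound traffic to all IP addresses - required by MSK Connect connectors
resource "aws_security_group_rule" "msk_outbound_all" {
  type              = "egress"
  from_port         = 0
  to_port           = 0
  protocol          = "-1"
  security_group_id = aws_security_group.msk.id
  cidr_blocks       = ["0.0.0.0/0"]
}

# IAM-authenticated client access (port 9098) from the VPN server
resource "aws_security_group_rule" "msk_vpn_inbound" {
  type                     = "ingress"
  from_port                = 9098
  to_port                  = 9098
  protocol                 = "tcp"
  security_group_id        = aws_security_group.msk.id
  source_security_group_id = aws_security_group.vpn.id
}

# IAM-authenticated client access (port 9098) from the producer Lambda function
resource "aws_security_group_rule" "msk_lambda_inbound" {
  type                     = "ingress"
  from_port                = 9098
  to_port                  = 9098
  protocol                 = "tcp"
  security_group_id        = aws_security_group.msk.id
  source_security_group_id = aws_security_group.lambda_producer.id
}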
Lambda Function
The Kafka producer Lambda function is deployed conditionally via a flag variable called producer_to_create. When it is set to true, the function is created by the AWS Lambda Terraform module, referring to the associated configuration variables (local.producer.*).
# infra/variables.tf |
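A sketch of the flag variable, the configuration values and the conditional module call is shown below. The topic name, the MAX_RUN_SEC value and the flag's default follow from the text; the function name, handler, runtime and similar attributes are assumptions.

# infra/variables.tf - the flag and the associated configuration values
variable "producer_to_create" {
  description = "Whether to create the Kafka producer Lambda function"
  type        = bool
  default     = false
}

locals {
  producer = {
    to_create     = var.producer_to_create
    function_name = "kafka-producer"      # illustrative
    handler       = "app.lambda_handler"  # illustrative
    runtime       = "python3.8"           # illustrative
    timeout       = 90                    # illustrative
    environment = {
      topic_name  = "taxi-rides"
      max_run_sec = 60
    }
  }
}

# infra/producer.tf - conditional creation via the module's create flag
module "kafka_producer_lambda" {
  source = "terraform-aws-modules/lambda/aws"

  create        = local.producer.to_create
  function_name = local.producer.function_name
  handler       = local.producer.handler
  runtime       = local.producer.runtime
  timeout       = local.producer.timeout
  source_path   = "../producer"

  vpc_subnet_ids         = module.vpc.private_subnets
  vpc_security_group_ids = [aws_security_group.lambda_producer.id]
  attach_network_policy  = true

  environment_variables = {
    # illustrative resource name for the cluster
    BOOTSTRAP_SERVERS = aws_msk_cluster.msk_data_cluster.bootstrap_brokers_sasl_iam
    TOPIC_NAME        = local.producer.environment.topic_name
    MAX_RUN_SEC       = local.producer.environment.max_run_sec
  }
}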
IAM Permission
The producer Lambda function needs permission to send messages to the Kafka topic. The following IAM policy is added to the function's execution role, as illustrated in this AWS documentation.
# infra/producer.tf |
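The sketch below follows the policy structure shown in the AWS documentation for IAM access control on MSK. The wildcard resource ARNs are illustrative; in practice they would be scoped to the cluster, topic and group ARNs.

resource "aws_iam_role_policy" "msk_producer_permission" {
  name = "msk-producer-permission"
  role = module.kafka_producer_lambda.lambda_role_name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "kafka-cluster:Connect",
          "kafka-cluster:AlterCluster",
          "kafka-cluster:DescribeCluster",
        ]
        Resource = "arn:aws:kafka:*:*:cluster/*/*"
      },
      {
        Effect = "Allow"
        Action = [
          "kafka-cluster:*Topic*",
          "kafka-cluster:WriteData",
          "kafka-cluster:ReadData",
        ]
        Resource = "arn:aws:kafka:*:*:topic/*/*"
      },
      {
        Effect = "Allow"
        Action = [
          "kafka-cluster:AlterGroup",
          "kafka-cluster:DescribeGroup",
        ]
        Resource = "arn:aws:kafka:*:*:group/*/*"
      },
    ]
  })
}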
Lambda Security Group
We also need to add an outbound rule to the Lambda function’s security group so that it can access the MSK cluster.
# infra/producer.tf |
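A minimal sketch of the rule is shown below; the resource names are illustrative, and in the actual source the security group is created conditionally alongside the function.

resource "aws_security_group" "lambda_producer" {
  name   = "lambda-producer-sg"
  vpc_id = module.vpc.vpc_id
}

# outbound access to the MSK cluster on the IAM-authenticated port
resource "aws_security_group_rule" "lambda_msk_egress" {
  type                     = "egress"
  from_port                = 9098
  to_port                  = 9098
  protocol                 = "tcp"
  security_group_id        = aws_security_group.lambda_producer.id
  source_security_group_id = aws_security_group.msk.id
}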
Function Source
The TaxiRide class generates one or more taxi ride records via its create method, where records are populated with random values using the random module. The Lambda function sends 100 records at a time, followed by sleeping for 1 second, and repeats until the elapsed time exceeds the value of the MAX_RUN_SEC environment variable (e.g. 60). A Kafka message is made up of an ID as the key and a taxi ride record as the value, and both the key and value are serialised as JSON. Note that the stable version of the kafka-python package does not support the IAM authentication method, so we need to install the package from a forked repository, as discussed in this GitHub issue.
# producer/app.py |
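As the file is collapsed above, here is a condensed sketch of the producer logic under the assumptions described. The environment variable names (BOOTSTRAP_SERVERS, TOPIC_NAME) and the record fields other than the ID are illustrative; the location pool is a short excerpt of the coordinate list seen in the source file, and the AWS_MSK_IAM SASL mechanism is assumed to be the one exposed by the forked kafka-python package.

import json
import os
import random
import string
import time

from kafka import KafkaProducer  # installed from the forked repository

# a short excerpt of the pickup/dropoff location pool seen in the source file
LOCATIONS = [
    "-73.97333527,40.76407242",
    "-73.99310303,40.75263214",
    "-73.97010803,40.75979996",
]


class TaxiRide:
    @staticmethod
    def create(num: int):
        # populate records with random values; the real source uses more fields
        return [
            {
                "id": "".join(random.choices(string.ascii_lowercase + string.digits, k=8)),
                "pickup_location": random.choice(LOCATIONS),
                "dropoff_location": random.choice(LOCATIONS),
                "passenger_count": random.randint(1, 4),
                "trip_duration": random.randint(60, 3600),
            }
            for _ in range(num)
        ]


# both the key and value are serialised as JSON
producer = KafkaProducer(
    bootstrap_servers=os.environ["BOOTSTRAP_SERVERS"].split(","),
    security_protocol="SASL_SSL",
    sasl_mechanism="AWS_MSK_IAM",  # supported by the fork, not the stable release
    key_serializer=lambda v: json.dumps(v).encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)


def lambda_handler(event, context):
    start = time.time()
    # send 100 records at a time, then sleep for 1 second,
    # until the elapsed time exceeds MAX_RUN_SEC
    while time.time() - start < int(os.environ.get("MAX_RUN_SEC", "60")):
        for record in TaxiRide.create(100):
            producer.send(
                os.environ.get("TOPIC_NAME", "taxi-rides"),
                key={"id": record["id"]},
                value=record,
            )
        producer.flush()
        time.sleep(1)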
A sample taxi ride record is shown below.
{ |
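The record itself is collapsed above. Purely for illustration, a record generated by the sketch in the previous section would look like the following; the field names are those of the sketch, not necessarily those of the original source.

{
  "id": "w1kfz9t3",
  "pickup_location": "-73.99310303,40.75263214",
  "dropoff_location": "-73.97010803,40.75979996",
  "passenger_count": 2,
  "trip_duration": 1147
}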
EventBridge Rule
The AWS EventBridge Terraform module is used to create the EventBridge schedule rule and targets. Note that the rule named crons has a configurable number of targets (e.g. 5) and each target points to the same Lambda producer function. Therefore, we are able to generate test data using multiple Lambda functions based on the desired volume of messages.
# infra/producer.tf |
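A sketch of the module call is shown below, assuming the terraform-aws-modules/eventbridge module and the Lambda module output used earlier; the target names are illustrative.

module "eventbridge" {
  source = "terraform-aws-modules/eventbridge/aws"

  create_bus = false # use the default event bus

  rules = {
    crons = {
      description         = "invoke the Kafka producer Lambda function"
      schedule_expression = "rate(1 minute)"
    }
  }

  # a configurable number of targets (e.g. 5), all pointing to the same function
  targets = {
    crons = [
      for i in range(5) : {
        name = "lambda-target-${i}"
        arn  = module.kafka_producer_lambda.lambda_function_arn
      }
    ]
  }
}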
Deployment
The application can be deployed (as well as destroyed) using the Terraform CLI as shown below. As the default value of producer_to_create is false, we need to set it to true in order to create the Lambda producer function.
# initialize |
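A typical sequence consistent with the description above is shown below, assuming the commands are run from the infra directory.

# initialize the working directory
terraform init

# preview and apply the changes, enabling the producer function
terraform plan -var 'producer_to_create=true'
terraform apply -auto-approve -var 'producer_to_create=true'

# destroy all resources
terraform destroy -auto-approve -var 'producer_to_create=true'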
Once deployed, we can see that, among other resources, the schedule rule has 5 targets that all point to the same Lambda function.
Monitor Topic
A Kafka management app can be a good companion for development, as it helps monitor and manage resources through an easy-to-use user interface. We'll use Kpow Community Edition in this post, which allows you to link a single Kafka cluster, Kafka Connect server and schema registry. Note that the community edition is valid for 12 months and the licence can be requested on this page. Once requested, the licence details will be emailed, and they can be added as an environment file (env_file).
The app needs additional configurations in environment variables because the Kafka cluster on Amazon MSK is authenticated by IAM – see this page for details. The bootstrap server address can be found in the AWS Console or by executing the following Terraform command.
$ terraform output -json | jq -r '.msk_bootstrap_brokers_sasl_iam.value'
Note that we need to specify the compose file name when starting it because the file name (compose-ui.yml) is different from the default file name (docker-compose.yml). We can run it with docker-compose -f compose-ui.yml up -d and access it in a browser via localhost:3000.
# compose-ui.yml |
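A sketch of the compose file is shown below. The Kpow image tag and the env file name are assumptions; the IAM-related client properties follow the Kpow documentation for connecting to an IAM-authenticated MSK cluster.

version: "3.5"

services:
  kpow:
    image: factorhouse/kpow-ce:91.5.1  # illustrative tag
    container_name: kpow
    ports:
      - "3000:3000"
    env_file: ./kpow.env  # licence details received by email
    environment:
      # AWS credentials used for IAM authentication
      AWS_ACCESS_KEY_ID: $AWS_ACCESS_KEY_ID
      AWS_SECRET_ACCESS_KEY: $AWS_SECRET_ACCESS_KEY
      AWS_SESSION_TOKEN: $AWS_SESSION_TOKEN
      # Kafka cluster connection (IAM auth on port 9098)
      BOOTSTRAP: $BOOTSTRAP_SERVERS
      SECURITY_PROTOCOL: SASL_SSL
      SASL_MECHANISM: AWS_MSK_IAM
      SASL_CLIENT_CALLBACK_HANDLER_CLASS: software.amazon.msk.auth.iam.IAMClientCallbackHandler
      SASL_JAAS_CONFIG: software.amazon.msk.auth.iam.IAMLoginModule required;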
We can see that the topic (taxi-rides) is created and has 5 partitions, which is the default number of partitions.
Also, we can inspect topic messages in the Data tab as shown below.
Summary
In this lab, we created a Kafka producer application using AWS Lambda, which sends fake taxi ride data into a Kafka topic on Amazon MSK. It was developed so that a configurable number of instances of the producer Lambda function can be invoked by an Amazon EventBridge schedule rule. In this way, we are able to generate test data concurrently based on the desired volume of messages.