A beginner’s guide to AWS Aurora Data API

Introduction

The AWS Aurora RDS Data API allows seamless interaction with Aurora databases without requiring a persistent database connection. Instead, it utilises a secure HTTP endpoint, fully integrated with the AWS SDK, to execute SQL queries. To enhance security and simplify credential management, your code should securely reference database credentials stored in AWS Secrets Manager. 
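
To make this concrete, here is a minimal sketch of what a Data API request looks like. The field names follow the ExecuteStatement operation, but the helper buildStatementRequest and both ARNs are made up for illustration and are not taken from the repository. Note that no password appears anywhere: the request only points at the secret.

```typescript
// Shape of an ExecuteStatement request. Only the fields used in this
// sketch are modelled; the real operation accepts more options.
interface ExecuteStatementRequest {
  resourceArn: string; // ARN of the Aurora cluster
  secretArn: string; // ARN of the Secrets Manager secret with the DB credentials
  sql: string; // SQL statement to run
  database?: string; // optional database name
}

// Hypothetical helper: assembles the request from its parts.
function buildStatementRequest(
  clusterArn: string,
  secretArn: string,
  sql: string,
  database?: string,
): ExecuteStatementRequest {
  const request: ExecuteStatementRequest = { resourceArn: clusterArn, secretArn, sql };
  if (database !== undefined) {
    request.database = database;
  }
  return request;
}

// Placeholder ARNs, for illustration only.
const showDatabasesRequest = buildStatementRequest(
  "arn:aws:rds:us-east-1:123456789012:cluster:example-cluster",
  "arn:aws:secretsmanager:us-east-1:123456789012:secret:example-secret",
  "SHOW DATABASES;",
);
console.log(JSON.stringify(showDatabasesRequest, null, 2));

// With the AWS SDK for JavaScript v3 installed, the same object could be sent as:
//   import { RDSDataClient, ExecuteStatementCommand } from "@aws-sdk/client-rds-data";
//   const response = await new RDSDataClient({}).send(
//     new ExecuteStatementCommand(showDatabasesRequest),
//   );
```

Keeping the request a plain object makes it easy to log and unit test without touching AWS.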

The Data API is an optional feature that must be explicitly enabled before use. Refer to this documentation for the supported AWS Regions and Aurora versions. Once enabled, you can leverage the Aurora Query Editor in the AWS Console to execute SQL statements directly on your database – eliminating the need for complex network setups, such as secure tunnels through an EC2 instance in a private network. 

However, the Data API comes with certain limitations. Be sure to review them in this documentation to determine if it meets your use case. 

Why use Aurora Data API?

When working with serverless compute, a traditional database connection pool can be quickly exhausted due to the ephemeral nature of serverless environments like AWS Lambda. While this issue can be mitigated using Amazon RDS Proxy, AWS Aurora Data API remains a great alternative, offering a connectionless, scalable, and IAM-secured approach to interacting with Aurora databases. 

Designed for serverless and API-driven applications, the Data API eliminates the need to manage persistent database connections, reducing overhead and simplifying architecture. Calls are authorised with IAM, while the database credentials stay in AWS Secrets Manager rather than in application code. This enhances security and integrates well with AWS services such as Lambda, EC2, ECS/EKS, API Gateway, and Step Functions. This makes the Data API an efficient and secure choice for handling database interactions in a modern, cloud-native ecosystem. 

Overview

Our goal is to set up an Aurora RDS cluster with the Data API enabled, along with two Lambda functions that can interact with the Data API endpoint. The purpose of having two Lambda functions is to demonstrate the two different access patterns available for the RDS Data API.  

  1. Accessing the RDS Data API over the internet. This suits cases where you need quick, global access that can still be secured with strong IAM policies. 
    1. This Lambda function is deployed in a private subnet within a VPC.  
    2. To reach the RDS Data API endpoint, it routes traffic through a NAT Gateway to access the internet. 
  2. Accessing the RDS Data API from within the AWS network. This suits cases where you prioritise security and performance and want to avoid internet exposure. 
    1. This Lambda function is deployed in an isolated subnet, which has no direct internet access.  
    2. Instead of routing traffic through a NAT Gateway, it uses AWS PrivateLink via a VPC endpoint to connect securely to the RDS Data API. This eliminates the need for public internet exposure while maintaining a high-performance, low-latency connection within your VPC. 
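
The two patterns differ only in how traffic reaches the Data API, not in the code that calls it. As a rough sketch, the helper below (a made-up name, not part of the repository) returns the public endpoint used by the NAT Gateway pattern and the service name you would use when creating the interface VPC endpoint for the PrivateLink pattern. It assumes the standard naming used in commercial AWS Regions.

```typescript
// Names involved in the two access patterns for a given Region.
interface DataApiNetworkNames {
  publicEndpoint: string; // reached via the NAT Gateway
  vpcEndpointService: string; // used when creating the interface VPC endpoint
}

// Hypothetical helper, assuming the standard naming of commercial Regions.
function dataApiNetworkNames(region: string): DataApiNetworkNames {
  return {
    publicEndpoint: `https://rds-data.${region}.amazonaws.com`,
    vpcEndpointService: `com.amazonaws.${region}.rds-data`,
  };
}

console.log(dataApiNetworkNames("us-east-1"));
```

When private DNS is enabled on the VPC endpoint, the public hostname resolves to the endpoint's private addresses inside the VPC, so the same Lambda code can typically serve both patterns.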

Building a Serverless REST API with AWS Aurora Data API

In this exercise, we will use the AWS CDK to provision all the required services. The AWS CDK source code for this blog is available on GitHub at https://github.com/cevoaustralia/serverless-rest-api-with-rds-data-api/. Using AWS CDK, we can fully automate the deployment of a serverless REST API by provisioning and configuring all required AWS services in just a few steps. 

This architecture consists of the following core services: VPC, Aurora Serverless (MySQL/PostgreSQL), AWS Lambda (Node.js), Amazon API Gateway, and IAM roles for secure database access. By leveraging infrastructure as code, we eliminate manual setup, ensuring a scalable, repeatable, and reliable deployment process. If you are new to AWS CDK and Infrastructure as Code (IaC), please refer to this official guide from AWS.

Below is the full architecture diagram of what we’ll be working with. 

Launching the resources using AWS CDK

Once you have downloaded the CDK source code and installed the required packages, you’re now ready to deploy the infrastructure. 

Run the following commands to kick off the CDK deployment:

				
$ cdk bootstrap
 ⏳  Bootstrapping environment aws://1234567890/us-east-1... 
Trusted accounts for deployment: (none) 
Trusted accounts for lookup: (none) 
Using default execution policy of 'arn:aws:iam::aws:policy/AdministratorAccess'. Pass '--cloudformation-execution-policies' to customize. 
CDKToolkit: creating CloudFormation changeset... 
 ✅  Environment aws://1234567890/us-east-1 bootstrapped. 
 
$ cdk deploy --all --require-approval never
✨  Synthesis time: 11.52s 
RdsDataApiStack: start: Building 0da69a4c4bda72a9ff9689b56c8a05acb30f38ed0802bd2d0d719731505fd2c9:current_account-current_region 
RdsDataApiStack: success: Built 0da69a4c4bda72a9ff9689b56c8a05acb30f38ed0802bd2d0d719731505fd2c9:current_account-current_region 
RdsDataApiStack: start: Publishing 0da69a4c4bda72a9ff9689b56c8a05acb30f38ed0802bd2d0d719731505fd2c9:current_account-current_region 
RdsDataApiStack: success: Published 0da69a4c4bda72a9ff9689b56c8a05acb30f38ed0802bd2d0d719731505fd2c9:current_account-current_region 
RdsDataApiStack: deploying... [1/1] 
RdsDataApiStack: creating CloudFormation changeset... 
 ✅  RdsDataApiStack 
✨  Deployment time: 46.82s 
				
			

First, the stack sets up Aurora Serverless with the Data API. AWS CDK automates the creation of an Aurora Serverless cluster with the Data API enabled, allowing database access without persistent connections. To enhance security, database credentials are stored in AWS Secrets Manager, eliminating the need for hardcoded credentials. 

Next, it deploys the Lambda functions that query Aurora. Each Lambda function is assigned an IAM role that grants it the permissions needed to call the RDS Data API and read the database secret. Because access is authorised through IAM, the functions never handle database usernames and passwords directly, making the architecture more secure and scalable. 
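
As an illustration of what such a role needs, the sketch below builds a least-privilege policy document: one statement allowing rds-data:ExecuteStatement on the cluster, and one allowing secretsmanager:GetSecretValue on the secret that holds the database credentials. The helper name dataApiPolicy and the ARNs are assumptions for this example; the repository's CDK code may express the same permissions differently.

```typescript
// Minimal IAM policy types, covering only what this sketch needs.
interface PolicyStatement {
  Effect: "Allow";
  Action: string[];
  Resource: string;
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Hypothetical helper: permissions for a function that only runs single statements.
function dataApiPolicy(clusterArn: string, secretArn: string): PolicyDocument {
  return {
    Version: "2012-10-17",
    Statement: [
      // Allows calling the Data API against this cluster only.
      { Effect: "Allow", Action: ["rds-data:ExecuteStatement"], Resource: clusterArn },
      // Allows reading the database credentials from this secret only.
      { Effect: "Allow", Action: ["secretsmanager:GetSecretValue"], Resource: secretArn },
    ],
  };
}

// Placeholder ARNs, for illustration only.
const examplePolicy = dataApiPolicy(
  "arn:aws:rds:us-east-1:123456789012:cluster:example-cluster",
  "arn:aws:secretsmanager:us-east-1:123456789012:secret:example-secret",
);
console.log(JSON.stringify(examplePolicy, null, 2));
```

A function that uses transactions or batch statements would need the corresponding rds-data actions added to the first statement.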

Finally, AWS CDK provisions Amazon API Gateway to expose the Lambda functions as REST endpoints (/nat and /private-link). Each endpoint is linked to its Lambda function, and the API is deployed automatically. With this setup, serverless applications can securely interact with Aurora using HTTP-based requests, all while benefiting from fully managed, automated infrastructure provisioning. 

Reviewing the core services

RDS Aurora with RDS Data API Enabled

RDS Data API Settings

The Lambda functions: DataApiNatFunction, which connects to the RDS Data API via a NAT Gateway, and DataApiPrivateLinkFunction, which connects to the RDS Data API via AWS PrivateLink. 

And lastly, the API Gateway with endpoints pointing to the Lambdas. 

Now we can start preparing for the API endpoint testing. 

Testing

Before we can send a request to our API Gateway, we first need to retrieve its Invoke URL. To do this, open the API Gateway service in the AWS Console, navigate to Stages from the left-hand menu, and select the “prod” stage. Copy the Invoke URL (https://pcy01u1338.execute-api.us-east-1.amazonaws.com/prod), as it will be required to make API requests. 

Next, we need to obtain the API key to authenticate our request. In the API Gateway page, go to API Keys from the left-hand menu. You'll see a list of available API keys. Find "DataAPIKey" and click the copy button to retrieve its value (iL4vMWUK4R8hdutx1V97U3aFJce07Q8u1wS8eNOO). This key will be included in our request headers for authentication. 

With both the Invoke URL and API key ready, we can now execute the API request. Below is the complete curl command to run from your command line terminal. Our Lambda function executes a simple SQL statement, SHOW DATABASES;, which will return a list of available databases in the response. 

				
$ curl -X GET "https://pcy01u1338.execute-api.us-east-1.amazonaws.com/prod/nat" \
     -H "x-api-key: iL4vMWUK4R8hdutx1V97U3aFJce07Q8u1wS8eNOO"
{
  "message": "List Databases",
  "response": {
    "$metadata": {
      "httpStatusCode": 200,
      "requestId": "c0595447-4dcb-4ad5-96f5-c02cc69ea57e",
      "attempts": 1,
      "totalRetryDelay": 0
    },
    "numberOfRecordsUpdated": 0,
    "records": [
      [
        {
          "stringValue": "information_schema"
        }
      ],
      [
        {
          "stringValue": "mysql"
        }
      ],
      [
        {
          "stringValue": "performance_schema"
        }
      ],
      [
        {
          "stringValue": "sys"
        }
      ]
    ]
  }
}

$ curl -X GET "https://pcy01u1338.execute-api.us-east-1.amazonaws.com/prod/private-link" \
     -H "x-api-key: iL4vMWUK4R8hdutx1V97U3aFJce07Q8u1wS8eNOO"
{
  "message": "List Databases",
  "response": {
    "$metadata": {
      "httpStatusCode": 200,
      "requestId": "c0595447-4dcb-4ad5-96f5-c02cc69ea57e",
      "attempts": 1,
      "totalRetryDelay": 0
    },
    "numberOfRecordsUpdated": 0,
    "records": [
      [
        {
          "stringValue": "information_schema"
        }
      ],
      [
        {
          "stringValue": "mysql"
        }
      ],
      [
        {
          "stringValue": "performance_schema"
        }
      ],
      [
        {
          "stringValue": "sys"
        }
      ]
    ]
  }
}
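
The records array in these responses wraps every column value in a typed object such as { "stringValue": "mysql" }. A small helper can unwrap it into plain rows. The sketch below is one way to do that; the names Field, unwrap and toRows are made up for this example, and only the most common value types are handled.

```typescript
// One column value as returned by the Data API. Only a subset of the
// possible keys is modelled in this sketch.
type Field = {
  stringValue?: string;
  longValue?: number;
  doubleValue?: number;
  booleanValue?: boolean;
  isNull?: boolean;
};

type FieldValue = string | number | boolean | null;

// Unwraps a single typed field into a plain JavaScript value.
function unwrap(field: Field): FieldValue {
  if (field.isNull) return null;
  if (field.stringValue !== undefined) return field.stringValue;
  if (field.longValue !== undefined) return field.longValue;
  if (field.doubleValue !== undefined) return field.doubleValue;
  if (field.booleanValue !== undefined) return field.booleanValue;
  return null; // value types not modelled here (for example blobs and arrays)
}

// Converts the nested records array into rows of plain values.
function toRows(records: Field[][]): FieldValue[][] {
  return records.map((row) => row.map(unwrap));
}

// The records from the responses shown above.
const sampleRecords: Field[][] = [
  [{ stringValue: "information_schema" }],
  [{ stringValue: "mysql" }],
  [{ stringValue: "performance_schema" }],
  [{ stringValue: "sys" }],
];

// Each row has a single column, so take the first value of every row.
const databaseNames = toRows(sampleRecords).map((row) => row[0]);
console.log(databaseNames);
```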

				
			

📝 Note that you might encounter a DatabaseResumingException error the first time you hit the endpoints, or after the cluster has been idle for a while. That's because, in our CDK code for RDS Aurora Serverless, the serverlessV2MinCapacity property is set to zero, which means the cluster pauses when there is no activity. For production workloads, the right value depends on the workload you are running, but it is typically set above zero. Learn more about the database capacity here. 
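
If you keep the minimum capacity at zero, one way to handle this error is to retry while the cluster resumes. The sketch below is not from the repository: withResumeRetry is a made-up helper that retries an operation with exponential backoff, assuming the error thrown by the SDK carries the name DatabaseResumingException.

```typescript
// Hypothetical helper: retries an async operation while the cluster is resuming.
async function withResumeRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // Assumption: the SDK error exposes the exception name on `error.name`.
      const resuming =
        error instanceof Error && error.name === "DatabaseResumingException";
      // Any other error, or running out of attempts, is passed to the caller.
      if (!resuming || attempt >= maxAttempts) throw error;
      // Exponential backoff: 1s, 2s, 4s, ... with the default base delay.
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage inside a Lambda handler might look like this, where `client` and
// `command` are the RDS Data API client and ExecuteStatement command:
//   const response = await withResumeRetry(() => client.send(command));
```

The total wait should stay well under the Lambda and API Gateway timeouts, so the defaults above are only a starting point.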

Cleanup

To avoid unnecessary costs, ensure that you delete any unused resources once you're done testing. You can do this easily by running cdk destroy in your terminal. 

Conclusion

AWS Aurora Data API simplifies database access for serverless applications by eliminating the need for persistent connections. By integrating it with AWS Lambda and API Gateway, you can build highly scalable and efficient APIs with minimal operational overhead. 

What's next?

In my example, I used the npm package @aws-sdk/client-rds-data to interact with the RDS Data API since our use case is quite basic. However, for real-world applications, you may need a more sophisticated database abstraction layer, such as an ORM or a query builder, to manage complex queries efficiently. Since the RDS Data API is relatively new, only a few libraries support it, such as Drizzle ORM and Kysely, which simplify database interactions. 

At Cevo, our team of experts can help modernise and uplift your business through cutting-edge solutions in Migration, Modernisation, Data and AI/ML, ensuring your applications are built for scalability and long-term success. 
