In the previous article, we talked about the reasons for using a vector database and why we should deploy it in a private and controlled environment, showed a high-level diagram of the solution, and gave a brief introduction to the AWS CDK toolkit. If you missed it, or the "Exploring The Power of Vector Databases" articles, please follow the links and check them out.
Now it's time to go deep into the technical part of the solution and build the code that will create the complete solution, so follow me on this short journey into the world of Infrastructure as Code using AWS CDK.
Starting with the basic resources: the VPC
In the AWS libraries, Virtual Private Cloud resources are part of the EC2 service, and the CDK is no different. We will use the L2 Vpc construct, which is part of the aws_cdk.aws_ec2 module, so we add that module to the import section of the stack code:
from aws_cdk import ( |
For every new resource we use, we will need to add the corresponding module to the import statement. If you notice that VSCode doesn't "colour" the code properly while you are editing, check whether a module is missing from this section.
Next, we have the VPC resources for a basic but fully functional cloud network environment. All resources are created inside the VectorDbInfraStack(Stack) class, within the __init__ method:
def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None: |
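For reference, a minimal VPC definition along these lines produces that kind of environment; the construct ID, variable name and subnet layout below are my assumptions for illustration, not necessarily the exact values of the original stack:

        # Inside VectorDbInfraStack.__init__, after super().__init__(...)
        # VPC with one public and one private (with egress) subnet per AZ
        vectorDbVpc = ec2.Vpc(
            self, "VectorDbVpc",
            max_azs=2,
            subnet_configuration=[
                ec2.SubnetConfiguration(
                    name="Public",
                    subnet_type=ec2.SubnetType.PUBLIC,
                    cidr_mask=24),
                ec2.SubnetConfiguration(
                    name="Private",
                    subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS,
                    cidr_mask=24),
            ])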
Looking at the resources created during the stack's deployment, this is what we will have:
Note that we didn't have to define several of the resources created by the CDK construct: it "fills the gaps" in our code with sensible default values. Even the network's CIDR is assigned during the deployment process if not defined by the developer, but that doesn't prevent you from defining your desired IP address range if using the default 10.0.0.0/16 is not acceptable.
We will need a Security Group to associate with most of the resources and allow access by adding the required rules. In a complete solution, we would want to segregate the services by security group and add rules to provide access between them. In this example, however, we will create a single security group to associate with the resources running on the private subnet, then another one to allow clients running on the public subnet to reach the API Gateway. As long as we don't mix up the SGs, everything should work as expected:
# Security Group for all services |
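A sketch of the two security groups could look like the following. The variable names match the ones used later in the stack, while the ingress rules (ChromaDB on 8000 and NFS on 2049 inside the private group, HTTPS towards the API Gateway endpoint) are my assumptions about the minimum access required:

        # Security Group for the services running on the private subnets
        vectorDbNlbSecurityGroup = ec2.SecurityGroup(
            self, "VectorDbNlbSecurityGroup",
            vpc=vectorDbVpc,
            allow_all_outbound=True)

        # Security Group for clients on the public subnet reaching the API Gateway endpoint
        vectorDbApiGatewaySecurityGroup = ec2.SecurityGroup(
            self, "VectorDbApiGatewaySecurityGroup",
            vpc=vectorDbVpc,
            allow_all_outbound=True)

        # ChromaDB (8000) and NFS (2049) between resources sharing the private SG
        vectorDbNlbSecurityGroup.add_ingress_rule(
            vectorDbNlbSecurityGroup, ec2.Port.tcp(8000), "ChromaDB service")
        vectorDbNlbSecurityGroup.add_ingress_rule(
            vectorDbNlbSecurityGroup, ec2.Port.tcp(2049), "EFS mount")

        # HTTPS between clients and the API Gateway VPC endpoint
        vectorDbApiGatewaySecurityGroup.add_ingress_rule(
            vectorDbApiGatewaySecurityGroup, ec2.Port.tcp(443),
            "Clients to the API Gateway VPC endpoint")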
Let’s finish the VPC resources section by adding the VPC endpoint for the API Gateway, which will allow the communication between the service and the resources running within the VPC:
        # VPC Endpoint for the API Gateway
        vectorDbVpcEndpoint = ec2.InterfaceVpcEndpoint(
            self, "VectorDbVpcEndpoint",
            vpc=vectorDbVpc,
            service=ec2.InterfaceVpcEndpointAwsService.APIGATEWAY,
            subnets=ec2.SubnetSelection(
                subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS),
            security_groups=[
                vectorDbNlbSecurityGroup,
                vectorDbApiGatewaySecurityGroup
            ])
The diagram with the security groups added is the following:
The Vector Database container setup
For this solution, we will use the ChromaDB vector database service, which will be serving embedding collections behind our API Gateway as suggested in the service’s deployment documentation. The use case for a Vector Database is covered in another article, so our focus is to deliver the infrastructure that will support this service.
ChromaDB can be executed as a server application and has a Docker image available that makes the whole process easy to run. On AWS, there are several options to execute a container in stand-alone mode, ranging from launching an EC2 instance with the Docker service installed, through Lightsail containers and an App Runner application, to a more complex Elastic Container Service infrastructure and, for those with k8s knowledge, a complete Elastic Kubernetes Service cluster. Each service provides a level of freedom and flexibility on par with its price and management effort, but in the end all of them will allow our "dockerised" application image to run in a container and be accessible through the network.
We will use the first option for now, so the next step in our stack code is to define a bootstrap process that brings up an EC2 instance running Amazon Linux 2023, installs the Docker service, and starts the container using the "docker-compose" command.
But let's stop for a second to discuss data persistence. We don't want our database to be lost if something happens to the Docker container or even to the EC2 instance, so we will use an Elastic File System to store the SQLite database generated by the ChromaDB service.
First, let’s add the efs and iam resources from the cdk library in the code’s import section:
from aws_cdk import ( |
Then we can add the EFS resource and the IAM policy to allow the clients to mount the filesystem:
# Adds an EFS filesystem to persist the Vector database data |
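A sketch of that section might look like the one below. The filesystem name matches the one we will query during cleanup; the instance role and the mount/write permissions are my assumptions about how the EC2 instance is granted access to the filesystem:

        # Adds an EFS filesystem to persist the Vector database data
        vectorDbFileSystem = efs.FileSystem(
            self, "VectorDbFileSystem",
            vpc=vectorDbVpc,
            file_system_name="vectordb_filesystem",
            vpc_subnets=ec2.SubnetSelection(
                subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS),
            security_group=vectorDbNlbSecurityGroup)

        # Role for the EC2 instance, allowed to mount and write to the filesystem
        vectorDbInstanceRole = iam.Role(
            self, "VectorDbInstanceRole",
            assumed_by=iam.ServicePrincipal("ec2.amazonaws.com"))
        vectorDbInstanceRole.add_to_policy(iam.PolicyStatement(
            actions=["elasticfilesystem:ClientMount",
                     "elasticfilesystem:ClientWrite"],
            resources=[vectorDbFileSystem.file_system_arn]))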
Now, if something happens to the container, we can just start a new one using the same database files from the last execution, and we can even schedule backups of the files.
Now we can proceed with the docker-compose definition. Since it uses YAML to define the compose file structure, I opted to have the whole definition in a Dictionary and then use the PyYAML library to generate the string that we will add to the EC2 user data, avoiding any indentation issues.
First, install the PyYAML library in our virtual environment using the terminal session:
(.venv) ~/Projects/apigatewayapp$ pip install pyyaml |
It’s better to add the new library to our requirements.txt file, so we don’t need to remember to install it in case we push the code to a git repository and use it again in the future:
$ pip freeze | grep -i pyyaml | tee -a requirements.txt |
Going back to the stack code, we will add an import statement at the top of the file to have the library available in our template:
from aws_cdk import ( |
We will then reference the EFS filesystem ID in the docker-compose volume definition, and for that, we need to have the EFS resource created first, then define the dictionary with the compose commands after the EFS code:
# Docker compose YAML |
During the instance bootstrap, docker-compose will download the image ghcr.io/chroma-core/chroma:0.4.13 from its container registry, expose port 8000 of the container on the same port of the host, and map the EFS filesystem using the NFS protocol with all the options provided in the "o:" parameter of the chromadb volume. This was really tricky to get right but works quite well.
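Based on that description, the composeFile dictionary could be shaped roughly as follows; the container data path and the exact NFS mount options are assumptions taken from the standard ChromaDB image and the usual EFS mount recommendations, so adjust them to your needs:

        # Docker compose definition as a Python dictionary
        composeFile = {
            "version": "3.9",
            "services": {
                "chromadb": {
                    "image": "ghcr.io/chroma-core/chroma:0.4.13",
                    "ports": ["8000:8000"],
                    "volumes": ["chromadb:/chroma/chroma"],
                    "restart": "unless-stopped"
                }
            },
            "volumes": {
                "chromadb": {
                    "driver": "local",
                    "driver_opts": {
                        "type": "nfs",
                        "o": f"addr={vectorDbFileSystem.file_system_id}.efs."
                             f"{self.region}.amazonaws.com,nfsvers=4.1,rsize=1048576,"
                             f"wsize=1048576,hard,timeo=600,retrans=2,noresvport",
                        "device": ":/"
                    }
                }
            }
        }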
Now that we have the container creation figured out, let’s add the EC2 user data that will install docker, download the docker-compose program, enable the service and bring the container up:
# EC2 Instance User Data |
Just ensure that the line with the code "cat >> /home/ec2-user/docker-compose.yml << EOF\n"+yaml.safe_dump(composeFile)+"EOF" has no line break, since this is a single command and will create the docker-compose.yml file with the content from the composeFile dictionary using the yaml.safe_dump function.
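A possible version of that user data is sketched below, assuming the docker-compose binary is fetched from the project's GitHub releases:

        # EC2 Instance User Data: install docker, fetch docker-compose,
        # write the compose file and start the container
        vectorDbUserData = ec2.UserData.for_linux()
        vectorDbUserData.add_commands(
            "yum install -y docker",
            "systemctl enable --now docker",
            "curl -L https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m) -o /usr/local/bin/docker-compose",
            "chmod +x /usr/local/bin/docker-compose",
            "cat >> /home/ec2-user/docker-compose.yml << EOF\n"+yaml.safe_dump(composeFile)+"EOF",
            "/usr/local/bin/docker-compose -f /home/ec2-user/docker-compose.yml up -d")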
With the user data done, we can finally have an EC2 instance:
# EC2 instance with docker services |
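Something like the following should produce that instance; the construct ID and the root device name are assumptions, and the AMI helper assumes a recent CDK version:

        # EC2 instance with docker services
        vectorDbInstance = ec2.Instance(
            self, "VectorDbInstance",
            vpc=vectorDbVpc,
            vpc_subnets=ec2.SubnetSelection(
                subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS),
            instance_type=ec2.InstanceType("t3a.small"),
            machine_image=ec2.MachineImage.latest_amazon_linux2023(),
            security_group=vectorDbNlbSecurityGroup,
            role=vectorDbInstanceRole,
            user_data=vectorDbUserData,
            block_devices=[ec2.BlockDevice(
                device_name="/dev/xvda",
                volume=ec2.BlockDeviceVolume.ebs(15))])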
Deploying the stack at this point will launch a t3a.small instance with 15GB of EBS volume in the private subnet, with a docker container listening at the TCP port 8000 and mounting an EFS filesystem using NFS:
Enabling service access with a Network Load Balancer
One of the requirements of a private API Gateway is to have a network load balancer as the target for the VPC link that will be used in the private integration. In the next steps, we will create the load balancer, do a little trick to change the associated security group, add a target group, and finally bring a listener up to accept connections. Let’s start by expanding the required imports with all remaining resources:
from aws_cdk import ( |
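For reference, the import section with everything this stack needs ends up looking roughly like this (the module aliases are my choice):

    from aws_cdk import (
        CfnOutput,
        Duration,
        Stack,
        aws_apigateway as apigateway,
        aws_ec2 as ec2,
        aws_efs as efs,
        aws_elasticloadbalancingv2 as elbv2,
        aws_elasticloadbalancingv2_targets as elbv2_targets,
        aws_iam as iam,
    )
    from constructs import Construct
    import yaml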
Defining the private NLB is quite straightforward, requiring only that we set internet_facing to False and the subnets to PRIVATE_WITH_EGRESS:
# Private Network Load Balancer |
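A sketch of the NLB definition under those constraints:

        # Private Network Load Balancer
        vectorDbNlb = elbv2.NetworkLoadBalancer(
            self, "VectorDbNlb",
            vpc=vectorDbVpc,
            internet_facing=False,
            vpc_subnets=ec2.SubnetSelection(
                subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS))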
Notice that there is no option to choose a Security Group in the NLB definition, but we definitely need it to use the same one as the EC2 instance running the service and the API Gateway VPC link. There is a workaround for that, shown below:
# Workaround to replace the default NLB security group: |
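One way to implement that workaround is to drop down to the underlying CloudFormation (L1) resource and override its SecurityGroups property; this is a sketch of the approach, and the exact mechanism may differ from the original code:

        # Workaround: at the time of writing the L2 NetworkLoadBalancer construct
        # does not expose a security_groups parameter, so override the L1 property
        cfnNlb = vectorDbNlb.node.default_child
        cfnNlb.add_property_override(
            "SecurityGroups", [vectorDbNlbSecurityGroup.security_group_id])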
The NLB target is our EC2 instance, hence we have to create a target definition:
# Private NLB EC2 target definition |
Next, we have the target group definition, where we define the TCP port on which the target service is listening for connections, the type and destination of the target, and the health check interval and timeout:
# Private NLB Target Group |
The last step is to create the listener and define to which target group it will forward the requests:
# Private NLB Listener |
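Putting the last three steps together, a sketch of the target, target group and listener could look like this; the health check interval and timeout values are placeholders:

        # Private NLB EC2 target, target group and listener
        vectorDbTarget = elbv2_targets.InstanceTarget(vectorDbInstance, port=8000)

        vectorDbTargetGroup = elbv2.NetworkTargetGroup(
            self, "VectorDbTargetGroup",
            vpc=vectorDbVpc,
            port=8000,
            protocol=elbv2.Protocol.TCP,
            target_type=elbv2.TargetType.INSTANCE,
            targets=[vectorDbTarget],
            health_check=elbv2.HealthCheck(
                interval=Duration.seconds(30),
                timeout=Duration.seconds(10)))

        vectorDbListener = vectorDbNlb.add_listener(
            "VectorDbListener",
            port=8000,
            default_target_groups=[vectorDbTargetGroup])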
At this point, it should be already possible to connect to the ChromaDB through the NLB address if we deploy the stack:
Creating the gateway for your API requests
Adding an API Gateway to the cloud infrastructure is not a simple task and requires several steps. Using the private option adds even more steps to the process, so let's tackle them one at a time.
The first step is to define the VPC Link and associate it with the NLB created previously:
# API Gateway VPC Link |
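In CDK this is essentially a one-liner pointing at the NLB:

        # API Gateway VPC Link
        vectorDbVpcLink = apigateway.VpcLink(
            self, "VectorDbVpcLink",
            targets=[vectorDbNlb])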
The API Gateway definition is not complex, but we need to ensure that the resource policy is properly defined, since it is what prevents requests from anywhere outside the VPC from reaching the service, keeping it private:
# API Gateway definition |
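Below is a sketch of a private REST API with such a policy; the allow/deny pair keyed on aws:SourceVpce is the usual pattern for private APIs, and deploy=False is an assumption made here because we create the deployment and stage ourselves later:

        # API Gateway definition: private endpoint, reachable only through our VPC endpoint
        vectorDbApiGw = apigateway.RestApi(
            self, "VectorDbApiGw",
            rest_api_name="VectorDbApiGw",
            deploy=False,
            endpoint_configuration=apigateway.EndpointConfiguration(
                types=[apigateway.EndpointType.PRIVATE],
                vpc_endpoints=[vectorDbVpcEndpoint]),
            policy=iam.PolicyDocument(statements=[
                iam.PolicyStatement(
                    effect=iam.Effect.ALLOW,
                    principals=[iam.AnyPrincipal()],
                    actions=["execute-api:Invoke"],
                    resources=["execute-api:/*"]),
                iam.PolicyStatement(
                    effect=iam.Effect.DENY,
                    principals=[iam.AnyPrincipal()],
                    actions=["execute-api:Invoke"],
                    resources=["execute-api:/*"],
                    conditions={"StringNotEquals": {
                        "aws:SourceVpce": vectorDbVpcEndpoint.vpc_endpoint_id}})]))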
Next, we can add the resource to the API Gateway. We will use the "greedy" path {proxy+}, meaning that any path used in the REST request will be forwarded to the integration. We will explain this further when we run a test request against the API Gateway:
# Adds resource to the API Gateway |
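In code, that is a single call on the API's root resource:

        # Adds the greedy {proxy+} resource to the API Gateway
        vectorDbProxyResource = vectorDbApiGw.root.add_resource("{proxy+}")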
Then there is the integration, which creates the connection to the application server running on the VPC. We use 'ANY' as the method, since the destination is the same for all REST methods; we add a request parameter to pass the {proxy} variable to the integration destination; and the target of the integration is the DNS name of our newly created NLB, concatenated with the TCP port, the '/api/' string and whatever is sent as the path in the request. Since ChromaDB expects incoming requests to use the path '/api/v1/' plus a suffix like 'collections', we deploy our stage as 'api' to fulfil the first part of the path, then use '{proxy}' in the integration URL to comply with what the server expects:
# Adds an integration associated with the VPC link, pointing to the internal NLB at port 8000 and /api/{proxy+} path |
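Here is a sketch of that integration; the URI concatenates the NLB DNS name, the port and the /api/{proxy} path exactly as described above:

        # HTTP_PROXY integration through the VPC link, pointing to the internal NLB
        vectorDbIntegration = apigateway.Integration(
            type=apigateway.IntegrationType.HTTP_PROXY,
            integration_http_method="ANY",
            uri=f"http://{vectorDbNlb.load_balancer_dns_name}:8000/api/{{proxy}}",
            options=apigateway.IntegrationOptions(
                connection_type=apigateway.ConnectionType.VPC_LINK,
                vpc_link=vectorDbVpcLink,
                request_parameters={
                    "integration.request.path.proxy": "method.request.path.proxy"}))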
We have the integration ready so now we can add the method in our API Gateway to forward the requests to the integration:
# Adds a method to the API Gateway resource, pointing to the integration |
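The method declares the proxy path parameter and points at the integration, roughly like this:

        # ANY method on the {proxy+} resource, forwarding to the integration
        vectorDbProxyResource.add_method(
            "ANY",
            vectorDbIntegration,
            request_parameters={"method.request.path.proxy": True})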
And the last step is to create the stage deployment, which we will call ‘api’:
# Creates a stage for the API gateway |
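With deploy=False on the RestApi (as assumed earlier), the deployment and the stage are created explicitly:

        # Creates a deployment and the 'api' stage for the API Gateway
        vectorDbDeployment = apigateway.Deployment(
            self, "VectorDbDeployment",
            api=vectorDbApiGw)
        vectorDbApiStage = apigateway.Stage(
            self, "VectorDbApiStage",
            deployment=vectorDbDeployment,
            stage_name="api")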
To complete our stack, we will output the URL of our API Gateway so other stacks can reference the value if required:
# Output the api stage URL |
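The output is a plain CfnOutput; its logical ID matches the VectorDbApiGwStageUrl name referenced later during the tests:

        # Output the api stage URL
        CfnOutput(
            self, "VectorDbApiGwStageUrl",
            value=vectorDbApiStage.url_for_path("/"))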
And now we have the whole solution available in our diagram:
Deploying and testing the solution
The CDK stack deployment is simply a matter of running the "cdk deploy" command, but before doing that let's check some requirements. The first thing you need to do is check the AWS credentials that will be used to deploy the solution; running the command below will show what is currently configured:
$ aws configure list |
In my environment, I use SSO credentials, which means I will have to pass the profile I’ve created to access my AWS account as a parameter when running the CDK command.
If this is the first time a CDK stack is being deployed in your AWS account, you will need to run the bootstrapping process before you can go forward. This will launch the "CDKToolkit" stack, which creates the roles and resources required by the tool, so if you aren't sure whether your environment was bootstrapped or not, just look for this stack using the aws cloudformation command:
$ aws cloudformation list-stacks --query 'StackSummaries[?StackName==`CDKToolkit`]' --profile myAwsProfile
If the command returns [], it means that the account needs to be bootstrapped:
$ cdk bootstrap --profile myAwsProfile
With the bootstrapping part sorted out, we can go ahead and finally deploy the solution:
(.venv)$ cdk deploy --profile myAwsProfile
The tool will display all the sensitive changes being made in the AWS account during the deployment:
To proceed with the deployment just press ‘y’, then monitor the stack execution:
Do you wish to deploy these changes (y/n)? y |
After the deployment, you can test the API Gateway functionality by using the following steps:
1 – Fetch the REST API ID:
$ aws apigateway get-rest-apis --query 'items[?name==`VectorDbApiGw`].id' --profile myAwsProfile
2 – Use the API Id to fetch the resource ID:
$ aws apigateway get-resources --rest-api-id 1234567890 --query 'items[?path==`"/{proxy+}"`].id' --profile myAwsProfile
3 – Using both the API and Resource ID, invoke the API method test:
$ aws apigateway test-invoke-method --rest-api-id 1234567890 --resource-id abcdef --http-method GET --path-with-query-string v1/collections --profile myAwsProfile
Success! We have the status code 200 and the response’s body [], meaning that the application service is reachable and is responding correctly.
We can also do a more comprehensive test by simulating the client application. For that, just follow the steps described below:
1 – Launch an EC2 instance of any size (t2.small is enough) in the Public Subnet, associated with the public security group (the one with “VectorDbApiGatewaySecurityGroup” string in the name).
2 – Add a new rule to the public security group to allow SSH connections from your local IP, as described in this AWS documentation. We don't recommend opening SSH access to the world (0.0.0.0/0) in the public subnet's security group for security reasons, but if you know what you are doing, it's also an option.
3 – Connect to the instance using SSH with your key pair, or using the aws ec2-instance-connect command as in the example below:
$ aws ec2-instance-connect ssh --instance-id i-0a1b2c3d4e5f6g --profile cevo-dev
4 – Create the virtual environment and install the required Pip libraries:
[ec2-user@ip-xx-xx-xx-xx ~]$ python3 -m venv .venv […] |
5 – Open the Python3 REPL session:
(.venv) [ec2-user@ip-xx-xx-xx-xx ~]$ python3 |
6 – Import the chromadb library and create the client session. We will need the API Gateway URL provided in the stack output as VectorDbInfraStack.VectorDbApiGwStageUrl, without the "https://" and "/api/" strings (a consolidated sketch of the whole session is shown after step 10):
>>> import chromadb |
7 – In a new deployment, we don’t expect to see any collections created, but let’s run the command to list them:
>>> chroma_client.list_collections() |
8 – We can now follow the instructions from the ChromaDB documentation and create a collection, then add some data and test the results:
>>> collection = chroma_client.create_collection(name="my_collection")
9 – Adding the first data to the collection will download the additional ONNX embedding model required by Chroma:
>>> collection.add( |
10 – And finally, let’s query the database for a document:
>>> collection.query( |
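For reference, here is the whole REPL session sketched end to end, with a hypothetical API Gateway hostname in place of the real stack output; the sample documents follow the ChromaDB getting-started documentation:

>>> import chromadb
>>> # hostname taken from the VectorDbApiGwStageUrl output, without "https://" and "/api/"
>>> chroma_client = chromadb.HttpClient(
...     host="a1b2c3d4e5.execute-api.ap-southeast-2.amazonaws.com",  # hypothetical value
...     port=443,
...     ssl=True)
>>> chroma_client.list_collections()
[]
>>> collection = chroma_client.create_collection(name="my_collection")
>>> collection.add(
...     documents=["This is a document", "This is another document"],
...     metadatas=[{"source": "my_source"}, {"source": "my_source"}],
...     ids=["id1", "id2"])
>>> collection.query(
...     query_texts=["This is a query document"],
...     n_results=2)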
And this is it! Our solution works and we can use the Vector Database to create collections, add data and execute queries. If you are curious about how the API Gateway is handling the requests, just open the AWS console, go to the VectorDbApiGw REST API, and check the Dashboard for all the metrics:
Removing the solution and cleaning up the account
To remove the deployed solution, you just need to run the cdk destroy command from the stack directory, but don’t forget to terminate the client instance launched in the public subnet before deleting the stack:
(.venv)$ cdk destroy --profile cevo-dev
Notice that the EFS filesystem won’t be removed by the cdk destroy command, so you will need to delete it manually. First, we need to get the filesystem ID from the environment:
$ aws efs describe-file-systems --query 'FileSystems[?Name==`vectordb_filesystem`].FileSystemId' --profile myAwsProfile
Then we can delete the filesystem:
$ aws efs delete-file-system --file-system-id fs-061120167361d4076 --profile myAwsProfile
You may also want to delete the CloudWatch log groups related to the VectorDbInfraStack. You can do this easily through the AWS Console:
Conclusion
That’s it! I hope you enjoyed the article and learned a little bit about CDK, private API Gateways and Vector Databases. Cheers!