Deploying an Open-Source Vector Database on AWS – Part 2

In the previous article, we talked about the reasons for using a Vector database and why we should deploy it in a private and controlled environment, showed a high-level diagram for the solution, and gave a brief introduction to the AWS CDK toolkit. If you missed it, or the “Exploring The Power of Vector Databases” articles, please follow the links and check them out.

Now it’s time to dive into the technical part and build the code that will create the complete solution, so follow me on this short journey into the world of Infrastructure-as-Code using AWS CDK.

Starting with the basic resources: the VPC

In the AWS SDKs, the Virtual Private Cloud resources are part of the EC2 service, and the CDK is no different. We will use the L2 construct Vpc, which is part of the aws_cdk.aws_ec2 module, so we start by adding it to the import section of the stack code:

 

from aws_cdk import (
    Stack,
    aws_ec2 as ec2
)

For every new resource we use, we will need to add the corresponding module to the import statement. If you notice that VSCode doesn’t “colour” the code properly during the editing process, check if there is any missing module in this section.

Next, we have the VPC resources for a basic but fully functional cloud network environment. All resources are created inside the VectorDbInfraStack(Stack) class, within the __init__ method, after the call to super().__init__():

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Basic VPC with 1 private subnet and 1 public subnet
        vectorDbVpc = ec2.Vpc(self, "VectorDbVpc",
            nat_gateways=1,
            max_azs=1,
            create_internet_gateway=True,
            subnet_configuration=[
                ec2.SubnetConfiguration(
                    name="VectorDbPublicSubnet",
                    subnet_type=ec2.SubnetType.PUBLIC,
                ),
                ec2.SubnetConfiguration(
                    name="VectorDbPrivateSubnet",
                    subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS,
                )
            ],
        )

Once the code is deployed, these are the resources we will have:

Note that we didn’t have to define several of the resources created by the CDK construct: it “fills the gaps” in our code with sensible default values. Even the network’s CIDR is generated during the deployment process if not defined by the developer, but that doesn’t prevent you from defining your own IP address range if the generic 10.0.0.0/16 is not acceptable.
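For example, here is a minimal sketch of how the CIDR could be set explicitly, assuming aws-cdk-lib v2’s ec2.IpAddresses helper and using 10.20.0.0/16 purely as an illustrative range:

        # Same VPC definition as above, but pinned to an explicit CIDR instead of
        # the CDK-generated default of 10.0.0.0/16 (10.20.0.0/16 is just an example value)
        vectorDbVpc = ec2.Vpc(self, "VectorDbVpc",
            ip_addresses=ec2.IpAddresses.cidr("10.20.0.0/16"),
            nat_gateways=1,
            max_azs=1,
            # subnet_configuration omitted here for brevity; keep the same list as above
        )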

We will need a Security Group to associate with most of the resources and allow access by adding the required rules. In a complete solution, we would want to segregate the services into separate security groups and add rules to provide access between them, but in this example we will create a single security group for the resources running on the private subnet, and another one to allow clients running on the public subnet to reach the API Gateway. As long as we don’t mix up the SGs, everything should work as expected:

 

        # Security Group for all services
        vectorDbNlbSecurityGroup = ec2.SecurityGroup(
                self, "VectorDbNatLbSecurityGroup",
                vpc=vectorDbVpc,
                allow_all_outbound=True,
                disable_inline_rules=True,
                description="VectorDbNLbSecurityGroup",
        )
        # Adds inbound rules to the security group
        vectorDbNlbSecurityGroup.add_ingress_rule(
                ec2.Peer.any_ipv4(),
                ec2.Port.tcp(8000),
                "Allow TCP access to the NLB and Container")
        vectorDbNlbSecurityGroup.add_ingress_rule(
                ec2.Peer.any_ipv4(),
                ec2.Port.tcp(443),
                "Allow HTTPS access to the API Gateway")
        vectorDbNlbSecurityGroup.add_ingress_rule(
                ec2.Peer.any_ipv4(),
                ec2.Port.tcp(22),
                "Allow SSH access to the EC2 Instance")
        vectorDbNlbSecurityGroup.add_ingress_rule(
                ec2.Peer.ipv4(vectorDbVpc.vpc_cidr_block),
                ec2.Port.tcp(2049),
                "Allow EFS access")
        # Security Group for API Gateway clients
        vectorDbApiGatewaySecurityGroup = ec2.SecurityGroup(self, "VectorDbApiGatewaySecurityGroup",
            vpc=vectorDbVpc,
            allow_all_outbound=True,
            disable_inline_rules=True,
            description="API Gateway Security Group for client access",
        )
        # Adds inbound rules to the security group
        vectorDbApiGatewaySecurityGroup.add_ingress_rule(
                ec2.Peer.ipv4(vectorDbVpc.vpc_cidr_block),
                ec2.Port.tcp(22),
                "Allow SSH access to the resources on the subnet")

Let’s finish the VPC resources section by adding the VPC endpoint for the API Gateway, which will allow communication between the service and the resources running within the VPC:

 

        # VPC Endpoint for the API Gateway
        vectorDbApiGwVpcEndpoint = ec2.InterfaceVpcEndpoint(
            self,
            "VectorDbVpcEndpoint",
            service=ec2.InterfaceVpcEndpointAwsService.APIGATEWAY,
            vpc=vectorDbVpc,
            private_dns_enabled=True,
            subnets=ec2.SubnetSelection(
                subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS),
            security_groups=[
                vectorDbNlbSecurityGroup,
                vectorDbApiGatewaySecurityGroup
            ])

The diagram with the security groups added is the following:

The Vector Database container setup

For this solution, we will use the ChromaDB vector database service, which will be serving embedding collections behind our API Gateway as suggested in the service’s deployment documentation. The use case for a Vector Database is covered in another article, so our focus is to deliver the infrastructure that will support this service.

ChromaDB can be executed as a server application and has a Docker image available that makes the whole process easy to run. On AWS, there are several options to execute a container in stand-alone mode, ranging from launching an EC2 instance with the Docker service installed, using Lightsail containers, or deploying an App Runner application, to building a more complex infrastructure with the Elastic Container Service or, for those with k8s knowledge, a complete Elastic Kubernetes Service cluster. Each service provides a level of freedom and flexibility on par with its price and management effort, but in the end all of them will allow our “dockerised” application image to run in a container and be accessible through the network.

We will use the first option for now, so the next step in our stack code is to define a bootstrap process to bring up the EC2 instance running Amazon Linux 2023, install the Docker service, and start the container using the “docker-compose” command.

But let’s stop for a second to discuss data persistence. We don’t want our database to be lost if something happens to the docker container or even to the EC2 instance, so we will use an Elastic File System to store the SQLite database generated by the ChromaDB service.

First, let’s add the efs and iam modules from the CDK library to the code’s import section:

from aws_cdk import (
    Stack,
    aws_ec2 as ec2,
    aws_iam as iam,
    aws_efs as efs,
)

Then we can add the EFS resource and the IAM policy to allow the clients to mount the filesystem:

        # Adds an EFS filesystem to persist the Vector database data
        vectorDbEfsFilesystem = efs.FileSystem(self, 'VectorDbEfsFilesystem',
            vpc=vectorDbVpc,
            file_system_name='vectordb_filesystem',
            vpc_subnets=ec2.SubnetSelection(
                subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS
            ),
            security_group=vectorDbNlbSecurityGroup
        )
        # Adds EFS policy
        vectorDbEfsFilesystem.add_to_resource_policy(iam.PolicyStatement(
            effect=iam.Effect.ALLOW,
            actions=["elasticfilesystem:ClientMount"],
            principals=[iam.AnyPrincipal()],
            resources=["*"]
        ))

Now, if something happens to the container, we can just start a new one using the same database files from the last execution, and we can even schedule backups of the files.
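On the backup side, the CDK construct can enrol the filesystem in AWS Backup for us. Below is a minimal sketch, a variant of the EFS definition above that assumes the enable_automatic_backups flag of efs.FileSystem:

        # Same EFS filesystem as above, with automatic backups enabled
        # (enable_automatic_backups wires the filesystem into AWS Backup)
        vectorDbEfsFilesystem = efs.FileSystem(self, 'VectorDbEfsFilesystem',
            vpc=vectorDbVpc,
            file_system_name='vectordb_filesystem',
            enable_automatic_backups=True,
            vpc_subnets=ec2.SubnetSelection(
                subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS
            ),
            security_group=vectorDbNlbSecurityGroup
        )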

Now we can proceed with the docker-compose definition. Since it uses YAML to define the compose file structure, I opted to have the whole definition in a Dictionary and then use the PyYAML library to generate the string that we will add to the EC2 user data, avoiding any indentation issues.

First, install the PyYAML library in our virtual environment using the terminal session:

(.venv) ~/Projects/apigatewayapp$ pip install pyyaml
Collecting pyyaml
  Using cached PyYAML-6.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (705 kB)
Installing collected packages: pyyaml
Successfully installed pyyaml-6.0.1

It’s better to add the new library to our requirements.txt file, so we don’t need to remember to install it in case we push the code to a git repository and use it again in the future:

$ pip freeze | grep -i pyyaml | tee -a requirements.txt

Going back to the stack code, we will add an import statement at the top of the file to make the library available in our stack:

from aws_cdk import (
    Stack,
    aws_ec2 as ec2,
    aws_iam as iam,
    aws_efs as efs,
)
import yaml

We will then reference the EFS filesystem ID in the docker-compose volume definition, and for that, we need to have the EFS resource created first, then define the dictionary with the compose commands after the EFS code:

        # Docker compose YAML
        composeFile={
            "version": "3.9",
            "networks":
                {
                    "net": {
                        "driver": "bridge"
                    }
                },
            "services":{
                "server":{
                    "image": "ghcr.io/chroma-core/chroma:0.4.13",
                    "volumes": [
                        "index_data:/index_data",
                        "chromadb:/chroma/chroma"
                        ],
                    "ports": [
                            "8000:8000"
                    ],
                    "networks": [
                        "net"
                    ],
                }
            },
            "volumes": {
                "index_data": {
                    "driver": "local"
                },
                "backups": {
                    "driver": "local"
                },
                "chromadb": {
                    "driver":"local",
                    "driver_opts":{
                        "type": "nfs",
                        "o": "addr="+vectorDbEfsFilesystem.file_system_id+".efs."+Stack.of(self).region+".amazonaws.com,rw,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport",
                        "device": ":/",
                    }
                }
            }
        }

During the instance bootstrap, docker-compose will pull the image ghcr.io/chroma-core/chroma:0.4.13 from the registry, expose port 8000 of the container on the same port of the host, and mount the EFS filesystem using the NFS protocol with the options provided in the “o” parameter of the chromadb volume. This was really tricky to get right, but it works quite well.
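For reference, the user data ends up writing roughly the following /home/ec2-user/docker-compose.yml (yaml.safe_dump sorts keys alphabetically and decides quoting on its own, and the filesystem ID and region below are placeholders resolved at deploy time):

networks:
  net:
    driver: bridge
services:
  server:
    image: ghcr.io/chroma-core/chroma:0.4.13
    networks:
    - net
    ports:
    - 8000:8000
    volumes:
    - index_data:/index_data
    - chromadb:/chroma/chroma
version: '3.9'
volumes:
  backups:
    driver: local
  chromadb:
    driver: local
    driver_opts:
      device: ':/'
      o: addr=fs-xxxxxxxxxxxxxxxxx.efs.ap-southeast-2.amazonaws.com,rw,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport
      type: nfs
  index_data:
    driver: local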

Now that we have the container creation figured out, let’s add the EC2 user data that will install docker, download the docker-compose program, enable the service and bring the container up:

        # EC2 Instance User Data
        vectorDbEc2InstanceUserData = ec2.UserData.for_linux()
        vectorDbEc2InstanceUserData.add_commands(
        #    "amazon-linux-extras install docker",
            "yum install -y docker",
            "usermod -a -G docker ec2-user",
            "curl -L https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m) -o /usr/local/bin/docker-compose",
            "chmod +x /usr/local/bin/docker-compose",
            "ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose",
            "systemctl enable docker",
            "systemctl start docker",
            "cat >> /home/ec2-user/docker-compose.yml << EOF\n"+yaml.safe_dump(composeFile)+"EOF",
            "mkdir /home/ec2-user/config",
            "docker-compose -f /home/ec2-user/docker-compose.yml up -d",
            )

Just ensure that the line with the code "cat >> /home/ec2-user/docker-compose.yml << EOF\n"+yaml.safe_dump(composeFile)+"EOF" has no line break, since this is a single command and will create the docker-compose.yml file with the content from the composeFile dictionary using the yaml.safe_dump function.

With the user data done, we can finally have an EC2 instance:

 

        # EC2 instance with docker services
        vectorDbEc2Instance = ec2.Instance(self,
            "VectorDbEc2Instance",
            instance_type=ec2.InstanceType.of(
                    ec2.InstanceClass.T3A, ec2.InstanceSize.SMALL
            ),
            instance_name="VectorDbEc2Instance",
            vpc=vectorDbVpc,
            machine_image=ec2.MachineImage.latest_amazon_linux2023(),
            security_group=vectorDbNlbSecurityGroup,
            vpc_subnets=ec2.SubnetSelection(
                    subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS
            ),
            block_devices=[
                ec2.BlockDevice(
                    volume=ec2.BlockDeviceVolume.ebs(15),
                    device_name="/dev/xvda",
                )
            ],
            user_data=vectorDbEc2InstanceUserData,
        )

Deploying the stack at this point will launch a t3a.small instance with a 15 GB EBS volume in the private subnet, with a docker container listening on TCP port 8000 and mounting an EFS filesystem using NFS:

Enabling service access with a Network Load Balancer

One of the requirements of a private API Gateway is to have a network load balancer as the target for the VPC link that will be used in the private integration. In the next steps, we will create the load balancer, do a little trick to change the associated security group, add a target group, and finally bring a listener up to accept connections. Let’s start by expanding the required imports with all remaining resources:

 

from aws_cdk import (
    Stack,
    aws_ec2 as ec2,
    aws_elasticloadbalancingv2 as elbv2,
    aws_elasticloadbalancingv2_targets as targets,
    aws_apigateway as apigw,
    aws_iam as iam,
    aws_efs as efs,
    Duration,
    CfnOutput)

Defining the private NLB is quite straightforward, requiring only that we set internet_facing to False and the subnets to PRIVATE_WITH_EGRESS:

 

        # Private Network Load Balancer
        vectorDbNlb = elbv2.NetworkLoadBalancer(self, "VectorDbNlb",
            vpc=vectorDbVpc,
            internet_facing=False,
            load_balancer_name="VectorDbNlb",
            vpc_subnets=ec2.SubnetSelection(
                subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS
            )
        )

Notice that there is no option to choose a Security Group in the NLB definition, but we definitely need to use the same one as the EC2 instance running the service and the API Gateway VPC link. There is a workaround for that, shown below:

 

        # Workaround to replace the default NLB security group:
        # https://stackoverflow.com/questions/76997098/aws-cdk-supporting-nlb-security-group
        vectorDbNlbDefaultSecurityGroup = vectorDbNlb.node.find_child('Resource')
        vectorDbNlbDefaultSecurityGroup.add_property_override(
            'SecurityGroups', [
                vectorDbNlbSecurityGroup.security_group_id
            ]
        )

The NLB target is our EC2 instance, hence we have to create a target definition:

        # Private NLB EC2 target definition
        vectorDbNlbEc2Target = targets.InstanceTarget(vectorDbEc2Instance)

Next, we have the target group definition, where we define the TCP port at the target where the service is listening to connections, the type and destination of the target, and the health check duration and timeout:

        # Private NLB Target Group
        vectorDbNlbTargetGroup = elbv2.NetworkTargetGroup(self, "VectorDbNlbTargetGroup",
            vpc=vectorDbVpc,
            port=8000,
            target_type=elbv2.TargetType.INSTANCE,
            targets=[vectorDbNlbEc2Target],
            health_check=elbv2.HealthCheck(
                interval=Duration.seconds(10),
                timeout=Duration.seconds(5),
            )
        )

The last step is to create the listener and define to which target group it will forward the requests:

        # Private NLB Listener
        vectorDbNlbListener = vectorDbNlb.add_listener("VectorDbNlbListener",
            port=8000,
            protocol=elbv2.Protocol.TCP,
            default_target_groups=[vectorDbNlbTargetGroup]
        )

At this point, if we deploy the stack, it should already be possible to connect to ChromaDB through the NLB address:
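For a quick sanity check before we add the API Gateway, you can hop onto a host inside the VPC and call Chroma’s heartbeat endpoint through the NLB. The DNS name below is a placeholder, and /api/v1/heartbeat is the health endpoint exposed by ChromaDB 0.4.x, so the response should look roughly like this:

$ curl http://VectorDbNlb-xxxxxxxxxxxx.elb.ap-southeast-2.amazonaws.com:8000/api/v1/heartbeat
{"nanosecond heartbeat": 1696470000000000000}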

Creating the gateway for your API requests

Adding an API Gateway to the cloud infrastructure is not a simple task, requiring several steps, and using the private option adds even more to the process, so let’s tackle them one at a time.

The first step is to define the VPC Link and associate it with the NLB created previously:

 

        # API Gateway VPC Link
        vectorDbApiGwVpcLink = apigw.VpcLink(self, "VectorDbApiGwVpcLink",
            description="VectorDbApiGw",
            targets=[vectorDbNlb]
            )

The API Gateway definition is not complex, but we need to ensure that the resource policy is properly defined, since it’s what prevents requests from anywhere outside the VPC from reaching the service, keeping it private:

 

        # API Gateway definition
        vectorDbApiGw = apigw.RestApi(self, "VectorDbApiGw",
            rest_api_name="VectorDbApiGw",
            endpoint_configuration=apigw.EndpointConfiguration(
                types=[apigw.EndpointType.PRIVATE],
                vpc_endpoints=[vectorDbApiGwVpcEndpoint]
            ),
            deploy=True,
            deploy_options=apigw.StageOptions(
                tracing_enabled=True
            ),
            policy=iam.PolicyDocument(
                statements=[
                    iam.PolicyStatement.from_json({
                        "Effect": "Deny",
                        "Principal": "*",
                        "Action": "execute-api:Invoke",
                        "Resource": "execute-api:/*",
                        "Condition": {
                            "StringNotEquals": {
                                "aws:sourceVpc": vectorDbVpc.vpc_id
                            }
                        }
                    }),
                    iam.PolicyStatement.from_json({
                        "Effect": "Allow",
                        "Principal": "*",
                        "Action": "execute-api:Invoke",
                        "Resource": "execute-api:/*"})
                ]),
        )

Next, we can add the resource to the API Gateway. We will use the “greedy” path {proxy+}, meaning that any path used in the REST request will be forwarded to the integration. We will explain this further when we execute a connection request test to the API Gateway:

 

        # Adds resource to the API Gateway
        vectorDbApiGwResource = vectorDbApiGw.root.add_resource("{proxy+}")

Then there is the integration, which creates the connection to the application server running on the VPC. We use ‘ANY’ as the method, since the destination will always be the same for all REST methods; the request parameters pass the {proxy} variable to the integration destination; and the target of the integration is the DNS name of our newly created NLB concatenated with the TCP port, the ‘/api/’ string and whatever is sent as the request path. Since ChromaDB expects incoming requests to use the path ‘/api/v1/’ plus a resource such as ‘collections’, we deploy our stage as ‘api’ to fulfil the first part of the path, then use ‘{proxy}’ in the integration URL to comply with what the server expects:

        # Adds an integration associated with the VPC link, pointing to the internal NLB at port 8000 and /api/{proxy+} path
        vectorDbApiGwIntegration = apigw.Integration(
            type=apigw.IntegrationType.HTTP_PROXY,
            integration_http_method='ANY',
            options=apigw.IntegrationOptions(
                vpc_link=vectorDbApiGwVpcLink,
                connection_type=apigw.ConnectionType.VPC_LINK,
                request_parameters={
                    'integration.request.path.proxy': 'method.request.path.proxy'
                },
            ),
            uri="http://"+vectorDbNlb.load_balancer_dns_name+":8000/api/{proxy}",
        )

We have the integration ready so now we can add the method in our API Gateway to forward the requests to the integration:

        # Adds a method to the API Gateway resource, pointing to the integration
        vectorDbApiGwResource.add_method("ANY", vectorDbApiGwIntegration,
            request_parameters={
                'method.request.path.proxy': True
            })

And the last step is to create the stage deployment, which we will call ‘api’:

 

        # Creates a stage for the API gateway
        vectorDbApiGwStage = apigw.Stage(self, "VectorDbApiGwStage",
            stage_name="api",
            deployment=apigw.Deployment(self, "VectorDbApiGwDeployment",
                api=vectorDbApiGw,
                description="VectorDbApiGwDeployment"))

To complete our stack, we will output the URL of our API Gateway so other stacks can reference the value if required:

 

        # Output the api stage URL
        CfnOutput(self, "VectorDbApiGwStageUrl",
            value=vectorDbApiGwStage.url_for_path(),
        )

And now we have the whole solution available in our diagram:

Deploying and testing the solution

The CDK stack deployment is simply a matter of running the “cdk deploy” command, but before doing that let’s check some requirements. The first thing you need to do is check the AWS credentials that will be used to deploy the solution; running the command below will show what is currently configured:

$ aws configure list
      Name                    Value             Type    Location
      ----                    -----             ----    --------
  profile                <not set>             None    None
access_key                <not set>             None    None
secret_key                <not set>             None    None
    region           ap-southeast-2      config-file    ~/.aws/config

In my environment, I use SSO credentials, which means I will have to pass the profile I’ve created to access my AWS account as a parameter when running the CDK command.
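If you are in the same situation, make sure the SSO session is active before deploying, and remember to append the profile to every aws and cdk command (the profile name below is just a placeholder):

$ aws sso login --profile myAwsProfile
$ cdk deploy --profile myAwsProfile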

If this is the first time a CDK stack is being deployed on your AWS account, you will need to run the bootstrapping process before you can go forward. This will launch the “CDKToolkit” stack, which creates the roles and resources required by the tool, so if you aren’t sure whether your environment was bootstrapped or not, just look for this stack using the aws cloudformation command:

$ aws cloudformation list-stacks --query 'StackSummaries[?StackName==`CDKToolkit`]' --profile myAwsProfile
[
    {
        "StackId": "arn:aws:cloudformation:ap-southeast-2:12345678912:stack/CDKToolkit/1a2b3c4d-5e6f7g-8h9i1j-1a2b-0642fba66870",
        "StackName": "CDKToolkit",
        "TemplateDescription": "This stack includes resources needed to deploy AWS CDK apps into this environment",
        "CreationTime": "2022-12-14T05:06:17.187000+00:00",
        "LastUpdatedTime": "2023-04-03T08:06:24.885000+00:00",
        "StackStatus": "UPDATE_COMPLETE",
        "DriftInformation": {
            "StackDriftStatus": "NOT_CHECKED"
        }
    }
]

If the command returns [], it means that the account needs to be bootstrapped:

$ cdk bootstrap --profile myAwsProfile

With the bootstrapping part sorted out, we can go ahead and finally deploy the solution:

(.venv)$ cdk deploy --profile myAwsProfile

The tool will display all the sensitive changes being made in the AWS account during the deployment:

To proceed with the deployment just press ‘y’, then monitor the stack execution:

 

Do you wish to deploy these changes (y/n)? y
VectorDbInfraStack: deploying… [1/1]
VectorDbInfraStack: creating CloudFormation changeset…

✅  VectorDbInfraStack

✨  Deployment time: 588.1s

Outputs:
VectorDbInfraStack.VectorDbApiGwEndpoint097D11A6 = https://1234567890.execute-api.ap-southeast-2.amazonaws.com/prod/
VectorDbInfraStack.VectorDbApiGwStageUrl = https://1234567890.execute-api.ap-southeast-2.amazonaws.com/api/
Stack ARN:
arn:aws:cloudformation:ap-southeast-2:1234567890abc:stack/VectorDbInfraStack/[redacted]
✨  Total time: 596.75s

After the deployment, you can test the API Gateway functionality by using the following steps:

1 – Fetch the REST API ID:

$ aws apigateway get-rest-apis --query 'items[?name==`VectorDbApiGw`].id' --profile myAwsProfile
[
    "1234567890"
]

2 – Use the API Id to fetch the resource ID:

 

$ aws apigateway get-resources --rest-api-id 1234567890 --query 'items[?path==`"/{proxy+}"`].id' --profile myAwsProfile
[
    "abcdef"
]

3 – Using both the API and Resource ID, invoke the API method test:

 

$ aws apigateway test-invoke-method --rest-api-id 1234567890 --resource-id abcdef --http-method GET --path-with-query-string v1/collections --profile myAwsProfile
{
    "status": 200,
    "body": "[]",
    "headers": {
        "content-length": "2",
        "content-type": "application/json",
        "date": "Thu, 05 Oct 2023 01:53:06 GMT",
        "server": "uvicorn"
    },
    "multiValueHeaders": {
        "content-length": [
            "2"
        ],
        "content-type": [
            "application/json"
        ],
        "date": [
            "Thu, 05 Oct 2023 01:53:06 GMT"
        ],
        "server": [
            "uvicorn"
        ]
    },
    "log": "Execution log for request 5afbddb2-f825-493b-918f-9dd98cde5d11\nThu Oct 05 01:53:07 UTC 2023 : Starting execution for request: 5afbddb2-f825-493b-918f-9dd98cde5d11\nThu Oct 05 01:53:07 UTC 2023 : HTTP Method: GET, Resource Path: v1/collections\nThu Oct 05 01:53:07 UTC 2023 : Method request path: {proxy=v1/collections}\nThu Oct 05 01:53:07 UTC 2023 : Method request query string: {}\nThu Oct 05 01:53:07 UTC 2023 : Method request headers: {}\nThu Oct 05 01:53:07 UTC 2023 : Method request body before transformations: \nThu Oct 05 01:53:07 UTC 2023 : Endpoint request URI: http://VectorDbNlb-1a2b3c4d5e.elb.ap-southeast-2.amazonaws.com:8000/api/v1/collections\nThu Oct 05 01:53:07 UTC 2023 : Endpoint request headers: {x-amzn-apigateway-api-id=tgi85xlm4c, User-Agent=AmazonAPIGateway_tgi85xlm4c, Host=VectorDbNlb-a94df5ae00718d3c.elb.ap-southeast-2.amazonaws.com}\nThu Oct 05 01:53:07 UTC 2023 : Endpoint request body after transformations: \nThu Oct 05 01:53:07 UTC 2023 : Sending request to http://VectorDbNlb-1a2b3c4d5e.elb.ap-southeast-2.amazonaws.com:8000/api/v1/collections\nThu Oct 05 01:53:07 UTC 2023 : Received response. Status: 200, Integration latency: 29 ms\nThu Oct 05 01:53:07 UTC 2023 : Endpoint response headers: {date=Thu, 05 Oct 2023 01:53:06 GMT, server=uvicorn, content-length=2, content-type=application/json}\nThu Oct 05 01:53:07 UTC 2023 : Endpoint response body before transformations: []\nThu Oct 05 01:53:07 UTC 2023 : Method response body after transformations: []\nThu Oct 05 01:53:07 UTC 2023 : Method response headers: {date=Thu, 05 Oct 2023 01:53:06 GMT, server=uvicorn, content-length=2, content-type=application/json}\nThu Oct 05 01:53:07 UTC 2023 : Successfully completed execution\nThu Oct 05 01:53:07 UTC 2023 : Method completed with status: 200\n",
    "latency": 44
}

Success! We have the status code 200 and the response’s body [], meaning that the application service is reachable and is responding correctly.

We can also do a more comprehensive test by simulating the client application. For that, just follow the steps described below:

1 – Launch an EC2 instance of any size (t2.small is enough) in the Public Subnet, associated with the public security group (the one with “VectorDbApiGatewaySecurityGroup” string in the name).

2 – Add a new rule to the public security group to allow SSH connections from your local IP, as described in this AWS documentation. We don’t recommend opening SSH access in the public subnet’s security group to the world (0.0.0.0/0) for security reasons, but if you know what you are doing it’s also an option.

3 – Connect to the instance using SSH with your key pair, or use the aws ec2-instance-connect command as in the example below:

 

$ aws ec2-instance-connect ssh --instance-id i-0a1b2c3d4e5f6g --profile cevo-dev
The authenticity of host 'xx.xx.xx.xx (xx.xx.xx.xx)' can't be established.
ED25519 key fingerprint is SHA256:[redacted].
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'xx.xx.xx.xx' (ED25519) to the list of known hosts.
  ,     #_
  ~\_  ####_        Amazon Linux 2023
  ~~  \_#####\
  ~~     \###|
  ~~       \#/ ___   https://aws.amazon.com/linux/amazon-linux-2023
  ~~       V~' '->
    ~~~         /
      ~~._.   _/
        _/ _/
      _/m/'
[ec2-user@ip-xx-xx-xx-xx ~]$

4 – Create the virtual environment and install the required Pip libraries:

[ec2-user@ip-xx-xx-xx-xx ~]$ python3 -m venv .venv
[ec2-user@ip-xx-xx-xx-xx ~]$ source .venv/bin/activate
(.venv) [ec2-user@ip-xx-xx-xx-xx ~]$ pip install chromadb
Collecting chromadb
  Downloading chromadb-0.4.13-py3-none-any.whl (437 kB)
    |████████████████████████████████| 437 kB 5.6 MB/s   

[…]
Building wheels for collected packages: pypika
  Building wheel for pypika (pyproject.toml) … done
  Created wheel for pypika: filename=PyPika-0.48.9-py2.py3-none-any.whl size=53723 sha256=0b912daa8a28518d8f1111c99f7aad48029093bf80e337a2935607ac0a7a60e0
  Stored in directory: /home/ec2-user/.cache/pip/wheels/f7/02/64/d541eac67ec459309d1fb19e727f58ecf7ffb4a8bf42d4cfe5
Successfully built pypika
Installing collected packages: urllib3, typing-extensions, sniffio, idna, exceptiongroup, charset-normalizer, certifi, tqdm, six, requests, pyyaml, pydantic-core, packaging, mpmath, humanfriendly, h11, fsspec, filelock, click, anyio, annotated-types, zipp, websockets, watchfiles, uvloop, uvicorn, sympy, starlette, python-dotenv, python-dateutil, pydantic, protobuf, numpy, monotonic, huggingface-hub, httptools, flatbuffers, coloredlogs, backoff, typer, tokenizers, pypika, pulsar-client, posthog, overrides, onnxruntime, importlib-resources, fastapi, chroma-hnswlib, bcrypt, chromadb
Successfully installed annotated-types-0.5.0 anyio-3.7.1 backoff-2.2.1 bcrypt-4.0.1 certifi-2023.7.22 charset-normalizer-3.3.0 chroma-hnswlib-0.7.3 chromadb-0.4.13 click-8.1.7 coloredlogs-15.0.1 exceptiongroup-1.1.3 fastapi-0.103.2 filelock-3.12.4 flatbuffers-23.5.26 fsspec-2023.9.2 h11-0.14.0 httptools-0.6.0 huggingface-hub-0.16.4 humanfriendly-10.0 idna-3.4 importlib-resources-6.1.0 monotonic-1.6 mpmath-1.3.0 numpy-1.26.0 onnxruntime-1.16.0 overrides-7.4.0 packaging-23.2 posthog-3.0.2 protobuf-4.24.4 pulsar-client-3.3.0 pydantic-2.4.2 pydantic-core-2.10.1 pypika-0.48.9 python-dateutil-2.8.2 python-dotenv-1.0.0 pyyaml-6.0.1 requests-2.31.0 six-1.16.0 sniffio-1.3.0 starlette-0.27.0 sympy-1.12 tokenizers-0.14.0 tqdm-4.66.1 typer-0.9.0 typing-extensions-4.8.0 urllib3-2.0.6 uvicorn-0.23.2 uvloop-0.17.0 watchfiles-0.20.0 websockets-11.0.3 zipp-3.17.0

5 – Open the Python3 REPL session:

 

(.venv) [ec2-user@ip-xx-xx-xx-xx ~]$ python3
Python 3.9.16 (main, Sep  8 2023, 00:00:00)
[GCC 11.4.1 20230605 (Red Hat 11.4.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

6 – Import the chromadb library and create the client session. We will need the API Gateway URL provided in the stack output as VectorDbInfraStack.VectorDbApiGwStageUrl, without the "https://" and "/api/" strings:

>>> import chromadb
>>> chroma_client=chromadb.HttpClient(
host='1234567890.execute-api.ap-southeast-2.amazonaws.com',
port=443,
ssl=True)

7 – In a new deployment, we don’t expect to see any collections created, but let’s run the command to list them:

>>> chroma_client.list_collections()
[]

8 – We can now follow the instructions from the ChromaDB documentation and create a collection, then add some data and test the results:

>>> collection = chroma_client.create_collection(name="my_collection")
>>> chroma_client.list_collections()
[Collection(name=my_collection)]
>>> 

9 – Adding the first data to the collection will download additional models from ONNX required by Chroma:

>>> collection.add(
    documents=["This is a document", "This is another document"],
    metadatas=[{"source": "my_source"}, {"source": "my_source"}],
    ids=["id1", "id2"]
)
/home/ec2-user/.cache/chroma/onnx_models/all-MiniLM-L6-v2/onnx.tar.gz: 100%|██████████████████████████████████████████████████████████████| 79.3M/79.3M [00:12<00:00, 6.74MiB/s]
>>> 

10 – And finally, let’s query the database for a document:

>>> collection.query(
    query_texts=["This is a query document"],
    n_results=2
)
{'ids': [['id1', 'id2']], 'distances': [[0.711121446165086, 1.010977382542355]], 'embeddings': None, 'metadatas': [[{'source': 'my_source'}, {'source': 'my_source'}]], 'documents': [['This is a document', 'This is another document']]}
>>> 

And this is it! Our solution works and we can use the Vector Database to create collections, add data and execute queries. If you are curious about how the API Gateway is handling the requests, just open the AWS console, go to the VectorDbApiGw REST API, and check the Dashboard for all the metrics:
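If you prefer the command line, the same request metrics can also be pulled from CloudWatch. A hedged sketch using the AWS/ApiGateway namespace, where the time window and profile name are placeholders:

$ aws cloudwatch get-metric-statistics \
    --namespace AWS/ApiGateway \
    --metric-name Count \
    --dimensions Name=ApiName,Value=VectorDbApiGw \
    --start-time 2023-10-05T00:00:00Z --end-time 2023-10-05T02:00:00Z \
    --period 300 --statistics Sum \
    --profile myAwsProfile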

Removing the solution and cleaning up the account

To remove the deployed solution, you just need to run the cdk destroy command from the stack directory, but don’t forget to terminate the client instance launched in the public subnet before deleting the stack:

(.venv)$ cdk destroy --profile cevo-dev
Are you sure you want to delete: VectorDbInfraStack (y/n)? y
VectorDbInfraStack: destroying… [1/1]

✅  VectorDbInfraStack: destroyed

Notice that the EFS filesystem won’t be removed by the cdk destroy command, so you will need to delete it manually. First, we need to get the filesystem ID from the environment:

$ aws efs describe-file-systems --query 'FileSystems[?Name==`vectordb_filesystem`].FileSystemId' --profile myAwsProfile
[
    "fs-1234567890abcdef1"
]

Then we can delete the filesystem:

$ aws efs delete-file-system --file-system-id fs-1234567890abcdef1 --profile myAwsProfile

You may also have to delete the CloudWatch Logs related to the VectorDbInfraStack. You can do this easily through the AWS Console:
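Alternatively, you can locate and delete the log groups from the terminal. A sketch assuming the group names contain the stack or API name, with <log-group-name> as a placeholder:

$ aws logs describe-log-groups --query 'logGroups[?contains(logGroupName, `VectorDb`)].logGroupName' --profile myAwsProfile
$ aws logs delete-log-group --log-group-name <log-group-name> --profile myAwsProfile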


Conclusion

That’s it! I hope you enjoyed the article and learned a little bit about CDK, private API Gateways and Vector Databases. Cheers!
