Using the new Resource Tagging API in anger

Finding those hard to find AWS resources by tag names

Our team wanted to validate the Application Load Balancers (ALBs) in one of our environments after a deployment, but they were lost amongst all the other ALBs in the account. They were tagged, but we couldn’t work out how to find them by tag, until we saw the Resource Tagging API.

Steve Mactaggart

Building quality into an automated delivery pipeline can sometimes be a head scratcher when you have to figure out how to validate the work you’re currently doing. This happened recently as we started extending our validation of Amazon Web Services (AWS) resources to a set of Application Load Balancers (ALBs).

In our AWS account we have many ALBs, across a number of different environments. When validating a deployment into one of the development environments we only want to validate those ALBs, and have the build fail if there are any unhealthy nodes.

The script to walk through the complexity of the ALBs will be for another post, but the biggest challenge we ran into was the ability to query the list of ALBs for ones with certain tags.

Python is the tool of choice, and boto3 is a great library for connecting to the AWS API.

From the boto3 API docs you’ll see there is a describe_load_balancers call which is suspiciously simple in its implementation.

response = client.describe_load_balancers(
    LoadBalancerArns=[
        'string',
    ],
    Names=[
        'string',
    ],
    Marker='string',
    PageSize=123
)

I can query by ARN or Name, but there is nothing like the tag filters you can apply to an EC2 describe_instances call.
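For contrast, this is roughly what tag filtering looks like on the EC2 side, where AWS does the matching for you. It is a minimal sketch that assumes a client created with boto3.client('ec2') is passed in; the helper names ec2_tag_filters and find_instance_ids are hypothetical.

```python
def ec2_tag_filters(key, value):
    """Build an EC2 Filters list that matches resources tagged key=value."""
    # EC2 filters address a tag through the filter name "tag:<key>"
    return [{"Name": f"tag:{key}", "Values": [value]}]


def find_instance_ids(ec2_client, key, value):
    """Return IDs of instances tagged key=value; AWS does the filtering.

    ec2_client is assumed to be boto3.client('ec2').
    """
    ids = []
    paginator = ec2_client.get_paginator("describe_instances")
    for page in paginator.paginate(Filters=ec2_tag_filters(key, value)):
        for reservation in page["Reservations"]:
            ids.extend(i["InstanceId"] for i in reservation["Instances"])
    return ids
```

describe_load_balancers has no equivalent Filters parameter, which is the gap the rest of this post works around.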

A limitation emerges

Looking deeper into the AWS API, this is not a limitation of boto3: it seems that AWS doesn’t give you an API to query load balancers by tag. Well, that sucks. Of course, one option is to load them all and filter them in our script, but that is neither efficient nor performant, because every load balancer has to be listed and its tags then fetched with separate calls.
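To make the "load them all and filter" option concrete, here is a minimal sketch using the elbv2 client (the boto3 client ALBs live under). It assumes the describe_load_balancers paginator and the describe_tags call, which accepts at most 20 ARNs per request; the helper names are hypothetical.

```python
# DescribeTags accepts at most 20 resource ARNs per request
MAX_ARNS_PER_DESCRIBE_TAGS = 20


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def matching_arns(tag_descriptions, key, value):
    """Return ARNs from a DescribeTags TagDescriptions list tagged key=value."""
    matches = []
    for description in tag_descriptions:
        tags = {tag["Key"]: tag["Value"] for tag in description["Tags"]}
        if tags.get(key) == value:
            matches.append(description["ResourceArn"])
    return matches


def find_load_balancers_by_tag(elbv2_client, key, value):
    """List every load balancer, then check each one's tags client-side.

    elbv2_client is assumed to be boto3.client('elbv2').
    """
    arns = []
    paginator = elbv2_client.get_paginator("describe_load_balancers")
    for page in paginator.paginate():
        arns.extend(lb["LoadBalancerArn"] for lb in page["LoadBalancers"])

    found = []
    # One extra DescribeTags request for every 20 load balancers
    for batch in chunked(arns, MAX_ARNS_PER_DESCRIBE_TAGS):
        response = elbv2_client.describe_tags(ResourceArns=batch)
        found.extend(matching_arns(response["TagDescriptions"], key, value))
    return found
```

The cost grows with every load balancer in the region, not just the ones you care about, which is why we went looking for something better.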

There must be an easier way. The AWS console lets you filter load balancers by tag, but we couldn’t find anything equivalent in the boto3 or AWS API documentation. Possibly the console uses some unpublished API to perform this function.

Luckily we have a great crew at Cevo, and one of them pointed us to a little-used API: the Resource Tagging API. We hadn’t used it before, and after having a look we realised why: it had only recently been released.

Here comes the AWS Resource Tagging API

With this API you can query for tags across a whole raft of different AWS services in the one call, and because boto3 is awesome it supports this out of the box through the resourcegroupstaggingapi client.

response = client.get_resources(
    PaginationToken='string',
    TagFilters=[
        {
            'Key': 'string',
            'Values': [
                'string',
            ]
        },
    ],
    ResourcesPerPage=123,
    TagsPerPage=123,
    ResourceTypeFilters=[
        'string',
    ]
)

This looks much better for our use case - we can filter by the tags we know, and limit that to the resource type elasticloadbalancing:loadbalancer.
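One caveat we assume applies here: the elasticloadbalancing:loadbalancer type may also match Classic and Network Load Balancers carrying the same tag. If so, a small post-filter on the ARN keeps only ALBs, whose resource part has the form loadbalancer/app/<name>/<id>. A minimal sketch, with hypothetical helper names:

```python
def is_application_load_balancer(arn):
    """Return True if an Elastic Load Balancing ARN belongs to an ALB.

    ALBs use loadbalancer/app/..., Network Load Balancers use
    loadbalancer/net/..., and Classic Load Balancers use loadbalancer/<name>.
    """
    # An ARN is arn:partition:service:region:account:resource,
    # so the resource part is everything after the fifth colon
    resource = arn.split(":", 5)[5]
    return resource.startswith("loadbalancer/app/")


def only_albs(resource_tag_mappings):
    """Keep ALB ARNs from a get_resources ResourceTagMappingList."""
    return [m["ResourceARN"] for m in resource_tag_mappings
            if is_application_load_balancer(m["ResourceARN"])]
```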

Bringing it together

Here is a script that wraps this up in a simple CLI tool; obviously you can adapt it as you see fit for your needs.

import argparse
import sys

import boto3

# The Resource Groups Tagging API has its own boto3 client
client = boto3.client('resourcegroupstaggingapi')


def lookup(key, value):
    """Print the ARN of every load balancer tagged key=value."""

    def lookup_for_tags(token):
        # Fetch one page of load balancers that match the tag filter
        return client.get_resources(
            PaginationToken=token,
            TagFilters=[
                {
                    'Key': key,
                    'Values': [value]
                }
            ],
            ResourcesPerPage=50,
            ResourceTypeFilters=[
                'elasticloadbalancing:loadbalancer',
            ]
        )

    total_results = []
    page_token = ""  # an empty token requests the first page

    # Keep paging until the API returns an empty token; some pages
    # come back empty even though later pages contain matches
    while True:
        response = lookup_for_tags(page_token)
        total_results += response["ResourceTagMappingList"]
        page_token = response.get("PaginationToken", "")

        if not page_token:
            break

    for r in total_results:
        print(r["ResourceARN"])


def parse_args(args):
    parser = argparse.ArgumentParser(
        prog="alb_lookup",
        description="Search for ALBs based on tags.",
    )
    parser.add_argument('--tag_name', default="environment", help="The name of the tag to filter on")
    parser.add_argument('tag_value', help="The value of the tag to filter on")

    return parser.parse_args(args)


def main():
    cli_args = sys.argv[1:]
    args = parse_args(cli_args)
    lookup(args.tag_name, args.tag_value)


if __name__ == '__main__':
    main()

One last gotcha

The keen-eyed reader will notice that this script iterates through calls to the API using the PaginationToken. While this is good practice to ensure scripts handle more than the first page of results, in this case it is actually required.

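There is an interesting implementation detail (or limitation) of the Resource Groups Tagging API: it seems to filter each page on the server side, so the “miss” results still use up the page even though they are dropped from the response. A page can therefore come back empty while later pages still hold matches.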

In our case, with lots of resources, the first 2-3 calls to this API returned no results. We had to keep calling it (passing the new PaginationToken each time) until no token was returned, to be sure that all resources had been reviewed.

This might get fixed in upcoming versions of the API, but be aware that your script (and filter) might actually be working correctly; you just need to keep going through the pages.
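If you would rather not manage the token yourself, boto3 also provides a paginator for get_resources that follows the PaginationToken until it comes back empty, so the empty leading pages are handled for you. A minimal sketch, assuming a client created with boto3.client('resourcegroupstaggingapi'); the helper names are hypothetical.

```python
def arns_from_pages(pages):
    """Flatten get_resources pages into a list of ARNs.

    Pages can be empty even when later pages hold matches, so every
    page is read rather than stopping at the first empty one.
    """
    arns = []
    for page in pages:
        arns.extend(m["ResourceARN"] for m in page["ResourceTagMappingList"])
    return arns


def lookup_arns(tagging_client, key, value):
    """Return ARNs of load balancers tagged key=value.

    tagging_client is assumed to be boto3.client('resourcegroupstaggingapi');
    its get_resources paginator keeps requesting pages until the token is empty.
    """
    paginator = tagging_client.get_paginator("get_resources")
    pages = paginator.paginate(
        TagFilters=[{"Key": key, "Values": [value]}],
        ResourceTypeFilters=["elasticloadbalancing:loadbalancer"],
    )
    return arns_from_pages(pages)
```

Splitting out arns_from_pages keeps the page-walking logic testable without AWS credentials.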
