Fast Feedback 101 [DevOps Series]

Overview

Today, I’m going to talk about fast feedback: what it is, why it’s important, and how we made it happen in one of our own projects. As I write this, I’m waiting for a CloudFormation update to deploy a Lambda version of a product to a development environment. This is an implementation of Infrastructure as Code, where code stored in version control defines how we build, deploy, and run our product. There’s just one problem. The pipeline to build and deploy is slow. Terrifyingly slow. It takes a full ten minutes to find out if there was a problem deploying, another fifteen minutes on top of that for the deployed code to run, and then another ten minutes or so of looking over the output to make sure it meets expectations. That’s thirty-five minutes before I get any feedback on whether my change was successful, even if it’s a one-line change. That’s thirty-five minutes where I can’t move forward on this project. Let’s call that thirty-five minutes our cycle time.

Waiting on our Cycle time

Superficially, a cycle time of thirty-five minutes looks OK. However, if I have to wait thirty-five minutes between each change, I’m going to spend a large portion of my work day either twiddling my thumbs or constantly swapping back and forth between two or three different tasks. It’s well established that multi-tasking is a good way to kill productivity, so the latter isn’t really a solution. There are several problems with this setup:
  • Firstly, I can’t test any of the code without a deployment, even if it’s a single function that doesn’t touch any hosted services.
  • The deployment pipeline is slow. I’m deploying a tiny change to a small Lambda, and it shouldn’t take ten minutes to do that.
  • I’m running the entire dataset through the deployed change, only to throw away 99.99% of the data. Clearly, that’s inefficient.
  • Finally, I’m testing things manually. It might be “only” ten minutes to look at the data, but there are already 60-odd commits in this repository, which means I’ve probably spent a good ten hours testing manually, not to mention waiting around for fifteen minutes every time a deployment went out. A good set of automated tests with a much smaller dataset could have cut this to a few minutes, or even seconds.

Unit Testing the important parts

Let’s start with the first item. Earlier, I mentioned cycle time, encompassing the deploy-run-test loop. But what if I could run and test just the parts I care about? In the middle of my Lambda is a set of functions that query a DynamoDB table based on the inputs and return a set of data. These functions make up the majority of the Lambda code. With a bit of dependency injection and some test doubles, I can start running and testing parts of my code locally. With this change, the time it takes to get feedback on simple code changes drops from thirty-five minutes to a few seconds. I haven’t changed the cycle time, but at least one part of the feedback loop is radically faster, and it has drastically cut the number of times I have to go through the complete cycle.
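
To make the idea concrete, here is a minimal sketch of dependency injection with a test double. It is not the project's real code: the `ItemStore` interface, the `open_orders` function, and the row fields are all invented for illustration. The point is only that the query logic takes its data source as an argument, so a test can hand it an in-memory fake instead of a real DynamoDB table.

```python
from typing import Protocol


class ItemStore(Protocol):
    """Hypothetical interface: the one thing the Lambda needs from the table."""

    def query_by_customer(self, customer_id: str) -> list[dict]: ...


def open_orders(store: ItemStore, customer_id: str) -> list[dict]:
    """Logic under test: keep the open orders, newest first."""
    rows = store.query_by_customer(customer_id)
    still_open = [row for row in rows if row["status"] == "open"]
    return sorted(still_open, key=lambda row: row["created"], reverse=True)


class FakeStore:
    """Test double: serves canned rows from memory instead of DynamoDB."""

    def __init__(self, rows: list[dict]) -> None:
        self._rows = rows

    def query_by_customer(self, customer_id: str) -> list[dict]:
        return [row for row in self._rows if row["customer_id"] == customer_id]


def test_open_orders_filters_and_sorts() -> None:
    store = FakeStore([
        {"customer_id": "c1", "status": "open", "created": "2020-01-01"},
        {"customer_id": "c1", "status": "closed", "created": "2020-01-02"},
        {"customer_id": "c1", "status": "open", "created": "2020-01-03"},
        {"customer_id": "c2", "status": "open", "created": "2020-01-04"},
    ])
    result = open_orders(store, "c1")
    assert [row["created"] for row in result] == ["2020-01-03", "2020-01-01"]
```

In production, a small class wrapping the real table would satisfy the same interface, and a test like this one runs locally in well under a second.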

Cutting Deployment times

The deployment pipeline is a bit trickier, as there’s a tremendous variety of reasons for slow pipelines. In this case, it turned out that a couple of Docker images were being built in the same repository. They were built from scratch on every run, even though 95% of the custom pieces are shared with every other Docker image deployed by our organisation. We split the common Docker base image off into its own repository, so we can now build just the Docker layer we need. This change cuts the deployment from ten minutes to three.

This kind of problem is common in CI/CD pipelines. If you’re not sure where to start, then Cevo is your go-to cloud partner for solving DevOps issues.

Removing Manual Testing

Finally, there’s the manual testing. We can cut down the fifteen-minute wait by setting up each branch to deploy its own CloudFormation stack and running tests against a much smaller dataset. This reduces the run time to less than a minute.
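
One small piece of a per-branch setup is deriving a stack name from the branch name. The sketch below is an assumption about how that could be done, not the pipeline we actually used, and the `myproduct` prefix is a placeholder. It relies on the CloudFormation rule that stack names start with a letter, contain only letters, digits, and hyphens, and are at most 128 characters long.

```python
import re

MAX_STACK_NAME_LENGTH = 128  # CloudFormation's limit on stack name length


def stack_name_for_branch(branch: str, prefix: str = "myproduct") -> str:
    """Turn a git branch name into a valid, per-branch stack name.

    The default prefix is a placeholder; it must start with a letter so
    that the resulting stack name does too.
    """
    # Collapse every run of disallowed characters into a single hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", branch).strip("-")
    name = f"{prefix}-{slug}" if slug else prefix
    # Truncate to the limit and avoid ending on a dangling hyphen.
    return name[:MAX_STACK_NAME_LENGTH].rstrip("-")
```

For example, the branch `feature/fast_feedback` maps to the stack name `myproduct-feature-fast-feedback`, so two branches never deploy over each other.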

The unit tests from earlier mean that we no longer need to test every possible case, just the entry and exit handling of the Lambda. That leaves the happy data-processing path and the error path: just two manual test cases, which take about two minutes all up.

Review

My thirty-five-minute cycle of…

  • Ten minutes to build and deploy.
  • Fifteen minutes to run.
  • Ten minutes to test (manually).

… is now a total of six minutes:

  • Three minutes to build and deploy.
  • One minute to run.
  • Two minutes to test (manually).
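
As a quick check on those numbers, the snippet below adds up the two cycles. The eight-hour day with nothing but back-to-back cycles is an illustrative assumption, not a measurement.

```python
# Minutes per stage, taken from the two lists above.
before = {"build and deploy": 10, "run": 15, "manual test": 10}
after = {"build and deploy": 3, "run": 1, "manual test": 2}

total_before = sum(before.values())
total_after = sum(after.values())
speedup = total_before / total_after

# Assumption for illustration: an eight-hour day of back-to-back cycles.
WORK_DAY_MINUTES = 8 * 60
cycles_before = WORK_DAY_MINUTES // total_before
cycles_after = WORK_DAY_MINUTES // total_after

print(f"{total_before} min -> {total_after} min ({speedup:.1f}x faster)")
print(f"Full cycles per day: {cycles_before} -> {cycles_after}")
```

That works out to a cycle a little under six times faster, or 80 possible cycles in a day instead of 13.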

Automate everything

There’s one last trick. If you’ve been following my DevOps 101 series, you already know what it is. We’re going to automate the last step. We only want to test the input and output of the Lambda, so we’re looking for a tool we can use for integration testing. The code is in Python, so I reach for Behave, a Python BDD tool that uses the Gherkin syntax. This syntax makes it simpler for other staff to understand the use cases and to add more tests in the future if needed. With the reduced dataset, the testing time is down to a handful of seconds. That’s barely four minutes to deploy and test a change in a real environment, down from thirty-five. Close to a 9x improvement in cycle time!
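
The two scenarios can be sketched in plain Python, written so that it runs without Behave installed. Everything here is hypothetical: the `handler` is a local stand-in for the deployed Lambda, and the event shape and status codes are invented. In a real Behave suite the step functions would live under `features/steps/`, carry `@given`, `@when`, and `@then` decorators, and be matched against a Gherkin feature file.

```python
from types import SimpleNamespace


def handler(event, context=None):
    """Stand-in for the deployed Lambda; the event shape is invented."""
    if "customer_id" not in event:
        return {"statusCode": 400, "body": "customer_id is required"}
    return {"statusCode": 200, "body": f"processed {event['customer_id']}"}


# Step functions. With Behave these would be decorated,
# e.g. @given("a valid event"), rather than called directly.
def given_a_valid_event(context):
    context.event = {"customer_id": "c1"}


def given_an_event_without_a_customer(context):
    context.event = {}


def when_the_lambda_is_invoked(context):
    context.response = handler(context.event)


def then_the_status_is(context, expected):
    assert context.response["statusCode"] == expected


def run_scenario(given, expected_status):
    """Run Given, When, Then against a fresh context object."""
    context = SimpleNamespace()
    given(context)
    when_the_lambda_is_invoked(context)
    then_the_status_is(context, expected_status)
    return context.response


happy = run_scenario(given_a_valid_event, 200)
error = run_scenario(given_an_event_without_a_customer, 400)
```

Keeping each step tiny is what lets the Gherkin text read like the use case itself: a valid event gives a success response, and a malformed one gives a clear error.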

Success!

In this process, I’ve radically reduced the time to get feedback on my changes in two different ways. One was by making the process more efficient, and the other was by removing the need to go through the full cycle for every change. These two strategies are the heart of fast feedback.

If your team or organisation is suffering under the weight of a slow process and needs fast feedback, give Cevo a call. We’re your DevOps and Cloud Partners, and fast feedback is our jam.
