Docker multistage builds: a minimal pipeline as code

Using newer features of Docker to implement pipeline-as-code

Have you wanted to get started with pipeline as code, but don’t have a CI/CD server? Want to do more with Docker than just putting code into containers? This post is for you.

Colin Panisset

Realising pipeline-as-code

One of the golden promises of pipeline as code is that your pipeline definitions (how to build, test, and perhaps deploy your application) travel with the source code, test data and even deployment scripts through the commit history of your version control repository. You can then take a snapshot of a particular point in time and know with a high degree of certainty that everything you need to recreate a particular build is kept together. This aids new starters on a project to understand how things are put together, and builds repeatability and confidence in the build/test/deploy cycle.

There are many, many fine CI/CD toolchains which support pipeline as code natively; but if you’re stuck with one that doesn’t support this, or if you want to try it out without committing yourself to a particular one, you can still create a repeatable build/test pipeline and deliver a deployable artifact using a consistent process if you’re using Docker already.

To get started, you need Docker 17.05 or later; that’s the only requirement. We’ll be using the multistage build features available from that version onwards. If you’ve never used Docker before, you can find an excellent introduction at the Docker site.

Setting the scene

Let’s create a build and test pipeline for a simple service, written in Go. The actual language doesn’t matter, because all the dependencies for build and test will be encapsulated inside the Docker images as the pipeline runs; I just like Go. Because this is a demonstration, we’ll build a simple service which takes a date in an unknown format, and returns an ISO8601 date in UTC.

Our service will use some external packages that we’d like to vendor so that we know which versions we’ll be using. The build will have to take into account dependency management and unit tests, plus building a final minimal deployable artifact. We’re assuming that the method of image publication and deployment won’t be part of this pipeline.
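The post doesn’t show the service source itself, but the core conversion might look something like this minimal sketch. The `toUTC` helper and its list of candidate layouts are illustrative assumptions, not the actual utcservice code:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Candidate input layouts, tried in order. A real service handling
// "unknown" formats would support many more than these.
var layouts = []string{
	time.RFC3339,
	time.RFC1123Z,
	"2006-01-02 15:04:05",
	"02/01/2006",
}

// toUTC attempts to parse input against each known layout, and
// returns the result as an ISO8601 (RFC3339) timestamp in UTC.
func toUTC(input string) (string, error) {
	for _, layout := range layouts {
		if t, err := time.Parse(layout, input); err == nil {
			return t.UTC().Format(time.RFC3339), nil
		}
	}
	return "", errors.New("unrecognised date format: " + input)
}

func main() {
	out, err := toUTC("2018-05-01T10:00:00+10:00")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // 2018-05-01T00:00:00Z
}
```

Note that trying layouts in order means the first match wins, so more specific layouts should come before more ambiguous ones.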

Building the basic pipeline

Since we’re using the multistage features of Docker, our pipeline definition will actually be a Dockerfile. If you’ve used Docker before, but haven’t ever used the multistage capability, it boils down to:

  • Build an image using the normal Dockerfile commands
  • In the same Dockerfile, start building a second image, copying bits as needed from the first image
  • Keep doing that, if you need to, copying bits from earlier images as required
  • Build your final, minimal, deployable image

The advantages here are enormous:

  1. You don’t end up shipping all your build-time dependencies or intermediate layers (of adding and removing files) to production
  2. You can use different images for different stages of the build pipeline, as needs demand (eg separate build, unit-test, security-scan, fuzz-test stages all from different base images)
  3. Your final, minimal, deployable image is as small as can be, while including everything required to run your code
  4. All the layers that can be cached will be, resulting in reduced build times (after the first build)
  5. The final deployable image can be built from a single source code repository

Our initial Dockerfile

Let’s start with a Dockerfile that assumes we’ve already built and tested our application code. This will look very familiar to people who already use this method for generating production Docker images.

In this example, our runtime is a statically-compiled Go binary, called utcservice. We’ll assume that it’s been compiled and tested, and is sitting in the same directory as the Dockerfile:

FROM scratch
ADD utcservice /
EXPOSE 8080
ENTRYPOINT ["/utcservice"]

We build the container with

localhost$ docker build -t utcservice:final .

which gives us a nice, tiny image to deploy:

localhost$ docker images utcservice
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
utcservice          final               ecbd26fc5c1a        3 minutes ago       10.1MB

10MB for a production deployable is pretty nice, but this is only the end state: we’ve skipped over everything that produces the binary. Let’s fill in the blanks.

Adding tests

Our basic Dockerfile starts from the end – a tested, compiled binary. If we want unit tests (and every codebase should have unit tests, as per the test pyramid), we can make use of Go’s lovely inbuilt support for tests. Of course, this assumes that you have Go installed, and that the version is right, and so on. Instead, let’s run the build and the tests inside a Docker container.
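As a concrete example, a table-driven Go test for the date conversion could look like the following. The `toUTC` function under test is a hypothetical name, inlined here so the example stands alone; in the real repository it would live in the service source next to this test file:

```go
package main

import (
	"testing"
	"time"
)

// toUTC is the function under test, inlined for a self-contained
// example; in the service it would live alongside the HTTP handler.
func toUTC(input string) (string, error) {
	t, err := time.Parse(time.RFC3339, input)
	if err != nil {
		return "", err
	}
	return t.UTC().Format(time.RFC3339), nil
}

// TestToUTC is a typical table-driven Go test; `go test` (as run in
// the Dockerfile's RUN step) discovers and runs it automatically.
func TestToUTC(t *testing.T) {
	cases := []struct{ in, want string }{
		{"2018-05-01T10:00:00+10:00", "2018-05-01T00:00:00Z"},
		{"2018-05-01T00:00:00Z", "2018-05-01T00:00:00Z"},
	}
	for _, c := range cases {
		got, err := toUTC(c.in)
		if err != nil {
			t.Fatalf("toUTC(%q): %v", c.in, err)
		}
		if got != c.want {
			t.Errorf("toUTC(%q) = %q, want %q", c.in, got, c.want)
		}
	}
}
```

Because the tests live in the repository with the code, they travel through the commit history along with the pipeline definition – exactly the property we’re after.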

While Docker builds its images, it creates an ephemeral container for each layer. We can use those ephemeral containers to run our tests. If we start off with a base image which includes all the required Go tools, we can just add our application code, run the tests, build the binary, and away we go (pun intended).

A Dockerfile which does the test/build step looks like:

# start from an upstream, maintained build image for Go
FROM golang:1.10

# Install dep (for managing versioned go package dependencies)
RUN curl https://raw.githubusercontent.com/golang/dep/master/install.sh | sh

# create the location for our code, and go there
RUN mkdir -p /go/src/github.com/cevoaustralia/utcservice
WORKDIR /go/src/github.com/cevoaustralia/utcservice

# Copy our code into the work directory
ADD *.go Gopkg* /go/src/github.com/cevoaustralia/utcservice/

# Make sure we have the correct versioned dependencies installed
RUN dep ensure

# Actually run the tests, and then build the binary if the tests passed
RUN go test && go build

We build the image with the command:

localhost$ docker build -t utcservice:test -f Dockerfile.test-build .

Great! We have a single Dockerfile which installs all the versioned build-time dependencies, runs the tests and creates our run-time binary. In the days before multistage builds, we’d have a couple of choices at this point. We could:

  • copy the binary out of the resulting image into the host filesystem, then add it to a new clean base image (meaning we now have to have wrapper scripts, and a host filesystem that we can access, and so on); or
  • add more RUN steps to remove the build-time dependencies, resulting in the most minimal runtime image possible, at the cost of shipping all the intermediate layers of added-and-removed changes every time.

For the sake of comparison, the build-time image before any cleanup clocks in at:

localhost$ docker images utcservice:test
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
utcservice          test                e47540c4077c        13 minutes ago      787MB

787MB! That’s quite a difference, and it’s only going to get bigger if we remove build-time components, as each additional layer just adds to the deltas. In addition, if we leave it with all the build-time dependencies in it, we’re shipping all sorts of tools and things to production that probably shouldn’t be there.

Combining Dockerfiles

Instead, let’s combine the two stages we have above: our test-and-build stage, and our final deployable stage, into a single Dockerfile. It’s easy:

# start from an upstream, maintained build image for Go
# This time, we're giving it a friendly name for use later in the pipeline
FROM golang:1.10 AS build

# Install dep (for managing versioned go package dependencies)
RUN curl https://raw.githubusercontent.com/golang/dep/master/install.sh | sh

# create the location for our code, and go there
RUN mkdir -p /go/src/github.com/cevoaustralia/utcservice
WORKDIR /go/src/github.com/cevoaustralia/utcservice

# Copy our code into the work directory
ADD *.go Gopkg* /go/src/github.com/cevoaustralia/utcservice/

# Make sure we have the correct versioned dependencies installed
RUN dep ensure

# Actually run the tests, and then build a statically-linked binary if the tests passed
RUN go test && \
    CGO_ENABLED=0 GOOS=linux go build -a -ldflags '-extldflags "-static"'

# --------------------------
# This is the second stage -- starting from the most minimal possible image
FROM scratch

# We copy in the built binary from the previous stage
COPY --from=build /go/src/github.com/cevoaustralia/utcservice/utcservice /

# and away we go
EXPOSE 8080
ENTRYPOINT ["/utcservice"]

We build the final image with the command

localhost$ docker build -t utcservice:final -f Dockerfile.test-build-runtime .

This is a simple two-stage build pipeline! We’ve encapsulated all our build and test dependencies, separated them from the deliverable runtime, and ended up with the same 10MB image to deploy to production.
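For completeness, the binary that ends up in the scratch image is just a small static Go HTTP server. A minimal sketch of what its entry point might look like follows; the handler name, the `date` query parameter, and the single supported layout are assumptions, not the actual utcservice source, and to keep the sketch runnable on its own, `main` here exercises the handler once rather than blocking forever in `ListenAndServe`:

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"time"
)

// normalise parses an RFC3339 timestamp and returns it in UTC.
// The real service would try multiple "unknown" input formats.
func normalise(in string) (string, error) {
	t, err := time.Parse(time.RFC3339, in)
	if err != nil {
		return "", err
	}
	return t.UTC().Format(time.RFC3339), nil
}

func handler(w http.ResponseWriter, r *http.Request) {
	out, err := normalise(r.URL.Query().Get("date"))
	if err != nil {
		http.Error(w, "unrecognised date format", http.StatusBadRequest)
		return
	}
	fmt.Fprintln(w, out)
}

func main() {
	http.HandleFunc("/", handler)
	// Listen on 8080 to match the EXPOSE line in the Dockerfile.
	// A real entrypoint would call ListenAndServe directly and block.
	go http.ListenAndServe(":8080", nil)
	time.Sleep(100 * time.Millisecond) // give the listener a moment

	resp, err := http.Get("http://localhost:8080/?date=2018-05-01T10:00:00%2B10:00")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Print(string(body))
}
```

Compiled with `CGO_ENABLED=0` as in the Dockerfile above, a server like this has no dynamic library dependencies, which is what makes the `FROM scratch` final stage possible.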

Going Further: Multiple Sources

Imagine we now want to combine some kind of static content with our webservice. Someone else in our organisation has already created a Docker image with the content installed, so we just have to add it. A tiny addition to the existing Dockerfile is all it takes:

# … existing Dockerfile up there ^^

# Add the clock logo from the pre-built static assets image
COPY --from=dockerrepo.example.com/common/static-assets:latest /clock.png /

You can reference Docker images by name, and Docker will pull them down if need be. Cool, huh?

The payoff

Hopefully you’ve learned a bit about Docker’s multistage feature, and one way of delivering robust, minimal production-ready artifacts in a repeatable way. The potential value is significant: increased confidence to build and release, reducing the time it takes for you to realise return on your investment.

If you’d like to know more, get in touch!

The code

If you’d like to see the complete set of Dockerfiles, Go source code and so forth, you can get it at https://github.com/cevoaustralia/blog-docker-pipeline-as-code