You may have heard of Docker, the engine which provides a way to build, distribute, and run Linux containers — but did you know it also works on Windows? Windows Containers provide a great answer to certain application deployment, resilience, and migration challenges but they aren’t a silver bullet. In this post, I’ll explore some of the ins and outs of running a containerised Windows workload on the AWS Elastic Container Service (ECS) platform.
This is part 1 of 2 — in the second part, I’ll implement a Windows container workload for real.
A brief introduction to containers
What are containers?
There is a spectrum between physical computers (sometimes called “tin”) and serverless (aka “functions as a service”, implemented by AWS Lambda and similar):
tin ----- virtual machines ----- containers ----- serverless
Physical systems and virtual machines each run their own operating system, with its own kernel that must be booted up and start managing devices before they can do work.
Containers are more analogous to individual processes running in that operating system: they use an already-running kernel, but use features of that kernel to provide some level of isolation from other containers. In addition, a container has its own internal filesystem, so you can bundle up application dependencies, shared libraries, static assets, and so on and deliver them as a single image to be run on the container host.
Why are containers important?
Containers allow you to build a single deployable artifact which can be tested, then deployed to different environments without making any changes. Since they encapsulate all the runtime dependencies for an application, they reduce the risk of incomplete deployments, mismatched versions, and deploying untested code.
Because containers provide isolated runtime environments, you can use them to increase the utilisation of a single server, whether it’s physical or a virtual machine. This becomes especially important when you’re paying a per-server license fee (as is the case with Windows). Reducing the number of Windows servers you need to run without compromising the security or performance of the deployed workloads sounds like a pipe dream, but it’s perfectly achievable: with containers you can reduce your annual license cost as well as the amount of manual intervention usually required to run Windows workloads.
Windows Containers fundamentals
At first glance, Windows containers look identical to Linux containers. After all, you can build and operate them the same way (a Dockerfile fed into docker build, pushed to a repository with docker push, run with docker run, and so on).
In addition, support for the Windows registry is provided (a good thing, since it’s so fundamental to most Windows applications) and each running container has its own separate registry.
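To make that concrete, here’s a minimal sketch of what a Windows container image might look like. The base image tag is a real Microsoft Container Registry tag, but the application paths, executable name, and registry name are placeholders for illustration.

```dockerfile
# Minimal sketch of a Windows container image; the paths and executable name
# are placeholders, the base image tag is a real Microsoft-provided tag.
FROM mcr.microsoft.com/windows/servercore:ltsc2019

# Bundle the application and its dependencies into the image filesystem.
COPY ./publish/ C:/app/
WORKDIR C:/app

# Run the service in the foreground when the container starts.
ENTRYPOINT ["C:\\app\\MyService.exe"]
```

The workflow from there is exactly the same as it would be for Linux (the registry name below is a placeholder):

```powershell
docker build -t myregistry.example.com/my-windows-service:1.0 .
docker push myregistry.example.com/my-windows-service:1.0
docker run -d myregistry.example.com/my-windows-service:1.0
```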
Isolation Mode
In Linux, there is only a single isolation mode: default. This makes use of kernel namespaces and cgroups to separate one running Docker container from another.
In Windows, there are two modes: process and hyperv.
Process isolation is very similar to the Linux namespaces/cgroups mode: your container runs in the host operating system kernel.
Hyperv isolation can be thought of as running a mini virtual machine inside the host, so there’s full kernel isolation.
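If you’re running Docker on a Windows host that supports both modes, you can pick the mode per container with the standard --isolation flag on docker run; the image tag below is just an example.

```powershell
# Run the same command under each isolation mode and compare the behaviour.
docker run --rm --isolation=process mcr.microsoft.com/windows/servercore:ltsc2019 cmd /c ver
docker run --rm --isolation=hyperv  mcr.microsoft.com/windows/servercore:ltsc2019 cmd /c ver
```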
There are implications here.
First, security. With a completely separate kernel, hyperv isolation is obviously more secure; but if you currently run applications multi-tenanted on Windows systems (e.g. more than one copy of a program sharing a server), then process isolation already provides greater segregation, and therefore better security, than your existing setup.
Second, performance. Running a hyperv mini VM isn’t free; there’s a startup cost and an ongoing performance overhead. Stratoscale’s testing showed a 9-12% performance overhead for hyperv on bare metal, and it’s worse on virtual machines such as EC2 instances.
Third, kernel versions. On Linux, it mostly doesn’t matter whether a container was built on a 4.x kernel; you can run it on a 5.x system with no problems. On Windows, however, process isolation means that you must build your container on a system that’s the same release version as your target, using a base image that’s the same version as well. For example, Windows Server 2019 uses the 1809 kernel while Semi-Annual Channel releases such as 1909 use newer ones, and you have to match that version up all along the pipeline. Hyperv avoids this, so you can run containers built against different kernel versions; you can even run Linux containers in a hyperv VM (but that’s not what we’re here for).
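On the version-matching point, a rough sketch looks like this: check the Windows release of the build (or runtime) host, then pull a base image whose tag matches it. The registry key and image tag shown are real; how you wire this into your pipeline is up to you.

```powershell
# Check the Windows release of this host (e.g. 1809).
Get-ItemPropertyValue 'HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion' -Name ReleaseId

# Pull the Server Core base image whose tag matches the release reported above.
docker pull mcr.microsoft.com/windows/servercore:1809
```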
Fourth, code pages. If you’ve been using Windows for long enough, these two words may send a shudder down your spine. There’s a problem with hyperv in that it doesn’t pass through the host code page, which can result in some … weird … string manipulation problems.
Container Images
All container images are composed of layers — there’s a base layer, which often has a filesystem with just enough in it to be useful. Each step in a Dockerfile adds a layer, which is cryptographically labelled so when pulling an image you only have to transfer layers you don’t already have.
When container images are built, the instructions in the Dockerfile are cryptographically hashed so that previous layers that have already been built (and haven’t changed) don’t have to be rebuilt, or re-downloaded. This can greatly speed up the process of creating image updates.
You can create a container image based on other images that exist, either locally or in a container registry; this means that it’s simple to create “base” images that are common to multiple applications, and the common layers only need to be transferred to the runtime environment once for all the applications.
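As a sketch of how that plays out, an application image might build on a shared “base” image your organisation has already pushed to its own registry; only the application’s layers then need to be built and transferred. The image names here are hypothetical.

```dockerfile
# Hypothetical application image built on a shared organisation base image
# (itself built FROM mcr.microsoft.com/windows/servercore and pushed earlier).
# Hosts that already have the base layers cached only need to pull the layers below.
FROM myregistry.example.com/org-windows-base:1.0

COPY ./publish/ C:/app/
ENTRYPOINT ["C:\\app\\MyService.exe"]
```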
Windows in Containers Caveats
It’s not all perfect; there are some challenges with running a Windows-based workload inside containers, which can mean that certain existing workloads aren’t well suited for containerisation.
No GUI interaction
You don’t get to see, or interact with, any GUI that your application creates. If the application can run in a headless mode (and you can control it fully through the command-line, an API, or through configuration files) you’re fine, but if the application requires pointy-clicky in order to function, then containers are not for you. From the application’s perspective there is a GUI — it can create what it thinks are windows, and buttons, and whatnot, and all of that will function perfectly normally — but you just can’t attach an actual GUI to it, via RDP or any other means. It appears that this is a specific technical decision from Microsoft, and that (as of writing) there are not only no plans to support it, but that container base images from Microsoft have been specifically patched to prevent it.
This rules out the question of running a full desktop session in a container. Not only is it technically impossible, Microsoft’s license terms explicitly prohibit it.
Windows container images are large
At time of writing, a Windows Server Core base image (1 layer) version 1809 weighed in at 3.7GB, BEFORE adding any application code, whereas a fully-functional Alpine Linux base image is only 4.4MB!
If you’re considering containerising an existing application, you may not be able to get away with any of the nanoserver base images as too much is missing. This has implications for how long it can take to build, the storage requirements, and the time to pull a container image to a runtime host.
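If you want to see the difference for yourself, you can pull the Server Core and Nano Server base images and compare their on-disk sizes; the tags are real Microsoft Container Registry tags, though the exact numbers will vary by version and patch level.

```powershell
docker pull mcr.microsoft.com/windows/servercore:1809
docker pull mcr.microsoft.com/windows/nanoserver:1809
docker images --filter "reference=mcr.microsoft.com/windows/*"
```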
Potential for future licensing changes
Microsoft doesn’t allow pushing their base layers to a repository that they don’t own, and you have to start from their base layers. You can cache the base layers on a compute worker node, but not anywhere else. This means that every time you want to build an image (say, in a fresh agent in a build pipeline), you have to pull those base layers down from Microsoft’s repository.
Because Microsoft keeps their base layers close to their chest, there’s a possibility that they could, at some point in the future, start charging for it and denying downloads for unlicensed use. This would hugely impact the value proposition of Windows containers, which is currently a great way to multi-tenant on a single license, and would give them a new (if temporary) revenue boost.
Technical differences
Windows containers don’t support memory reservation, only the memory attribute of a container. In Docker, memory reservation sets the lower bound for how much memory (RAM) a container will be allocated, which is useful for orchestration frameworks to understand where a container could be scheduled. The memory attribute sets the maximum amount of memory that the container can use. Any attempt by a process inside the container to use more than that maximum results in a memory allocation error (as if it had tried to use more memory than a physical machine actually had, without swapfiles being set up).
There is a poorly-documented workaround for this though; you can set the environment variable
ECS_ENABLE_MEMORY_UNBOUNDED_WINDOWS_WORKAROUND
to true before starting the ECS Agent, and then just use the memoryReservation setting on your container definitions; Windows and ECS will then use memoryReservation to schedule your containers, but won’t enforce any memory limits or reserve any memory at all.
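As a rough sketch, on an EC2 container instance built from the ECS-optimised Windows AMI (which ships the Initialize-ECSAgent helper), the instance user data might look something like this; the cluster name is a placeholder.

```powershell
<powershell>
# Set the workaround flag machine-wide so the ECS Agent picks it up at startup.
[Environment]::SetEnvironmentVariable("ECS_ENABLE_MEMORY_UNBOUNDED_WINDOWS_WORKAROUND", "true", "Machine")

# Start the agent and join this instance to a cluster (cluster name is a placeholder).
Initialize-ECSAgent -Cluster "my-windows-cluster" -EnableTaskIAMRole
</powershell>
```

Your container definitions would then specify memoryReservation rather than a hard memory limit.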
Operations and Observability
In most cases, Windows applications have been developed with the expectation that at some point somebody will log in to the server and click about to manage and operate the service. This could include troubleshooting, restarting processes, looking at log files, and so forth.
These capabilities all have to be supported in a container world, but the way they are achieved must change.
First, logging in to the server. As we saw above, you cannot RDP to a container, and the direction from Microsoft appears to be that you never will be able to. You can execute a command-prompt shell inside a running container though, whether you’re more comfortable with PowerShell or vanilla CMD; but this flies somewhat in the face of container “thinking”, where you try to treat a container as a disposable, stateless machine. That isn’t always possible with existing applications, many of which retain state in memory while they’re running, so there may be additional development work required to make the application truly container-native.
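Getting that shell is a one-liner with docker exec; the container name below is a placeholder.

```powershell
# Open an interactive PowerShell session inside a running container.
docker exec -it my-windows-service powershell
```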
Second, restarting processes. In a container-native mindset, each container runs a single process which is tied to the lifecycle of the container itself. If the process crashes, something “outside” starts a new container from the same image, with the same parameters and environment. This is nice, but can require careful configuration and even some additional devops-type work to avoid data loss for applications which use local disk (or the registry) for storing their state.
Log management is almost simple by comparison. In a container-native world, logs should be emitted by the application on the terminal, and the container environment takes care of getting those logs to wherever they need to go (usually a centralised logging facility). If your application writes log files to disk, a cunning wrapper script that dumps them out to the terminal can work, or external volumes can be used to capture the log files for posterity. The usual caveats around log size, log file rotation, and so forth still apply.
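One hedged example of such a wrapper, assuming the application writes to a known log path (the paths and executable name are placeholders): start the application, wait for its log file to appear, then stream it to the terminal.

```powershell
# Hypothetical entrypoint wrapper for a file-logging Windows application.
$log = "C:\app\logs\service.log"                 # placeholder log path

Start-Process -FilePath "C:\app\MyService.exe"   # start the real workload

# Wait for the log file to exist, then stream it to stdout so the container
# platform's log driver can collect it.
while (-not (Test-Path $log)) { Start-Sleep -Seconds 1 }
Get-Content -Path $log -Wait
```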
Metrics and monitoring can be easier in a containerised world, as most container environments capture and report metrics for things like CPU and memory use out of the box, plus you get metrics around the rate of restarts, the number of copies of the application running, and so on. Instrumenting your applications for deeper inspection, using metrics-gathering frameworks such as Prometheus, is a task that requires additional development activity.
Amazon ECS
Amazon’s Elastic Container Service (ECS) is a container orchestration service which takes care of much of the “grunt work” of running containerised workloads.
With a fully-managed control plane, you choose how you bring the compute capacity to the party, and ECS looks after container scheduling (which worker node to run the container on), restarts (keeping containers running), scaling (adding more containers if load goes up, and removing them if it goes down), and attaching containers to load balancers to direct traffic to the back-end.
There are three models for bringing the compute to ECS:
- EC2, which are the AWS Virtual Machine equivalent, and which are referred to as “instances”;
- Fargate, which is a fully-managed compute platform for container workloads; and
- On-premises worker nodes, which install and run the ECS Agent (and must be configured with AWS IAM credentials in order to operate as part of the cluster). More recently, AWS Outposts has become available, which lets you combine the ease of use of EC2 with the “I can see the cables” nature of on-premises compute.
Put simply, ECS organises compute nodes into a cluster, which runs Services made up of Tasks; each Task can include multiple containers (up to 10 per task definition).
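In practice, that hierarchy maps onto a couple of AWS CLI calls: register a task definition (the containers), then create a service that keeps a number of copies of that task running on the cluster. This is only a sketch; the names and the JSON file are placeholders.

```powershell
# Register a task definition described in a local JSON file (placeholder name).
aws ecs register-task-definition --cli-input-json file://windows-taskdef.json

# Create a service that keeps two copies of the task running on the cluster.
aws ecs create-service --cluster my-windows-cluster `
    --service-name my-windows-service `
    --task-definition my-windows-task `
    --desired-count 2
```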
ECS and Windows
Windows is a fully-supported compute worker platform for ECS, although there are some features which are available for Linux-based workloads that haven’t made it to Windows yet. The EC2 and on-premises models are supported, but Fargate is not available (likely for licensing reasons).
Task-based networking also isn’t available yet; that’s where each task gets its own separate network interface, making network security easier to set up with least-privilege access.
With that caveat out of the way, Windows on ECS is perfectly functional and features are being added all the time.
If you’d like to know some of the things on the roadmap for Windows on ECS, you can check out the AWS Container Roadmap at GitHub.
Conclusion
Windows and Containers are a great way to increase utilisation of the most expensive resource — the actual Windows-licensed VM — for certain workloads. ECS is a fast, straightforward way to orchestrate containers, whether Windows or Linux, on the AWS platform. If you’d like to explore containerised Windows workloads further, please get in contact with us!