The Big Picture

The aim of this blog post isn’t to get into the nitty gritty of how to build and run a containerized app on your favorite cloud provider with a specific clustering technology. Rather, the aim is to discuss containers – where they came from, how they differ from VMs, scalability considerations, container repositories, running containers locally, what a container image consists of, and finally the clustering and orchestration tools available to manage and maintain your container workloads.
If you are wondering whether containers are right for you, and what considerations you may need to think about before embarking on a journey to containerize all the things, this blog post is for you.

A Brief History of Containers

Few technologies have consumed the interest of the tech community and conferences in the DevOps space lately like containers. Containerization as a concept has been around for many years if you count jailing, resurfacing in 2008 with the more complete LXC Linux container management system, and most recently with Docker. Docker the company was founded in 2010 as dotCloud and was renamed to Docker in 2013, and it is at this point that the container revolution really started to build momentum. Docker partnered with Red Hat in 2013 and with AWS in 2014 to create the Elastic Container Service (ECS).
What Are Containers and Why Are They Useful

To understand why containers are useful, looking at the paradigm shift from bare metal servers to virtual machines helps put things in context. In the traditional datacenter model, servers were bare metal pieces of hardware with an operating system and a designated function, but they had many drawbacks. Servers had to be purchased with peak loads in mind to handle spikes in capacity (and so often sat idle), consumed large amounts of power, and were difficult and expensive to back up and protect with disaster recovery.
Server virtualization revolutionized the world: instead of buying hardware that sat idle, one could purchase a large, powerful server and partition it into software representations of RAM, CPU, disk, network I/O, etc. The underlying operating system responsible for this was the hypervisor; common household names in this space are VMware, Hyper-V, Citrix XenServer, and KVM. While virtual machines brought economies of scale (i.e. multiple virtual servers per physical server), eased the process of backups and disaster recovery, and gave rise to public cloud providers like AWS and Azure, there was still the underlying inefficiency of multiple copies of the same OS running, consuming unnecessary disk and compute capacity. Enter containers. Containers share the kernel of the underlying container host, so they don’t need to run a full operating system.
Rather, the software and its dependencies are wrapped in a container that runs on top of the container host. This allows a single container host (be it bare metal, a virtual machine, or a cloud offering) to host a substantially larger number of services than the traditional server virtualization model.
From the cost perspective, this allows further benefits that achieve greater economies of scale than what was previously possible. From the development and operations perspective, the improvements are even more vast. Among the chief benefits of containers is the ability to develop locally on containers and then ship the container confidently from local dev, through lower environments, into production, with exactly the same set of dependencies safely wrapped inside. Given the nature of containers and Docker’s multilayered filesystem, it’s easy to rapidly build new images, pull down the most recent image, and push new images to the container repo. This makes changing libraries without tainting a local or production environment substantially easier, and it provides rapid rollbacks in the event of a failed deployment or a critical bug discovered after deployment.
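As a rough sketch of that workflow (the image name and tags here are hypothetical), building, pushing, and rolling back might look something like this:

```
# Build a new image from the Dockerfile in the current directory and tag it
docker build -t myorg/myapp:1.4.0 .

# Push the new tag to the container repository
docker push myorg/myapp:1.4.0

# Hosts pull the most recent image before running it
docker pull myorg/myapp:1.4.0
docker run -d --name myapp myorg/myapp:1.4.0

# Rolling back is just removing the failed container and running the previous tag
docker rm -f myapp
docker run -d --name myapp myorg/myapp:1.3.2
```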
Scalability Considerations

Much in the same respect that applications can be scaled by adding more virtual machines to a load balancer, containers can also scale this way. When approaching the containerization of an application, it’s best practice to segment out the application as much as possible so that each component can be scaled and deployed independently of the consummate application or service it is a part of. What this looks like will vary depending on your application: it might be an Nginx container coupled with one or more PHP-FPM containers, a Solr container, and a MariaDB container. In this respect, if a single part of the application needs to be scaled, it can be scaled independently of the rest.
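To make that segmentation concrete, here is a minimal sketch using plain docker commands on a user-defined network; the image tags, credentials, and port mappings are illustrative placeholders:

```
# Shared network so the containers can reach each other by name
docker network create app-net

# Data tier: a single MariaDB container (password is a placeholder)
docker run -d --name mariadb --network app-net \
  -e MYSQL_ROOT_PASSWORD=changeme mariadb:10.6

# App tier: PHP-FPM containers can be added or removed independently
docker run -d --name php-1 --network app-net php:8.2-fpm
docker run -d --name php-2 --network app-net php:8.2-fpm

# Web tier: Nginx published to the host, in front of the PHP-FPM containers
docker run -d --name nginx --network app-net -p 80:80 nginx:1.25
```

In practice the Nginx container would need a configuration mounted in to route requests to the PHP-FPM containers, and a clustering tool would typically manage the scaling rather than manual docker run commands.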
So far the picture painted of containers seems pretty rosy, however there are some serious gotchas to be aware of when dealing with containers. Containers are fantastic for handling stateless workloads: if you have a microservice or an application that performs a specific function and doesn’t need to retain data locally, it’s a perfect candidate for containerization. It wasn’t more than a few years ago that use of containers in production environments was cautioned against for this very reason, but at the time of writing, containers are production ready and have been for at least the past year, provided that stateful applications are carefully planned.
For these stateful applications there are several options in a containerized environment. One is mapping volumes from the host through to the container, ensuring that the data stored in the volume persists if the container dies or is replaced with a new container. If the containers are running on a cluster of hosts, this can also be accomplished by mapping network attached storage directly to the containers, or to the container host and passing the volume through. Depending on the clustering technology in use, there are also options at the cluster level for providing persistent volumes to specific sets of containers.
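As a minimal sketch of the first option, a named volume (or a host path) can be mapped into a database container so its data outlives any individual container; the volume and image names below are just examples:

```
# Create a named volume managed by Docker
docker volume create dbdata

# Mount it where MariaDB stores its data files
docker run -d --name mariadb \
  -e MYSQL_ROOT_PASSWORD=changeme \
  -v dbdata:/var/lib/mysql mariadb:10.6

# If this container is removed, a replacement picks up the same data
docker rm -f mariadb
docker run -d --name mariadb \
  -e MYSQL_ROOT_PASSWORD=changeme \
  -v dbdata:/var/lib/mysql mariadb:10.6
```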
If your application is running in a cloud environment such as AWS or Azure, you may also leverage services such as Application Load Balancers, RDS (or Azure SQL), and anything else you wish to offload (CDN, DB caching, search, shared storage, etc.). While the container itself becomes the atomic unit of scaling, it is also important to closely monitor your container hosts to ensure they remain adequately provisioned. Some clustering services we will discuss later (such as KOPS and ECS) have mechanisms for setting up alerting metrics that can trigger actions to scale both the container workloads and the underlying hosts running them.

Container Repositories

Much like we have repositories to check our code bases in and out of, containers also have repositories where images are stored and revisions are checked in.
The most common container registry is the freely available Docker Hub. From the Docker Hub you can find official containers for Debian, Ubuntu, CentOS, Nginx, MariaDB, Gunicorn, etc.
These can be used as the initial building block for your customized container images. Once you customize your images though, you will need to store them somewhere. Docker Hub offers 1 free private container repo to members, and several paid options for multiple private container repos.
This is akin to GitHub’s paid service to host private git repos. Another option is the Amazon Elastic Container Registry, or ECR. ECR allows you to store your container images in a private repository within AWS.
Much like their other services, there are no costs for ingress bandwidth, but there are charges for egress bandwidth outside of AWS. If your container workload is hosted within AWS, then ECR may be an excellent choice – the latency is low and the speed is high, allowing larger container images to be downloaded and deployed to your container hosts more rapidly, and making it easy to integrate your images with services such as ECS. Similarly, Azure provides a container repository offering that utilizes the same Docker CLI options, and is built for easy use with open source tools.
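As a rough sketch, pushing a locally built image to ECR usually looks like the following; the account ID, region, and repository name are placeholders you would swap for your own:

```
# Authenticate the Docker CLI against your ECR registry
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

# Tag the local image with the full ECR repository URI, then push it
docker tag myapp:1.4.0 123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:1.4.0
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:1.4.0
```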
There are a plethora of different Docker repositories available, both public and private, each with different costs, features, and limitations.

Running Containers Locally

One of the key benefits of containerizing applications is developing on the same platform as production – in this case, a container that has all of its dependencies wrapped in itself and can be shipped through the various stages of deployment, and across various host operating systems, with consistent results. Running containers locally allows developers to work off the same consistent image, reducing the likelihood of incompatibilities from mismatched prod and dev environments. It also allows developers to quickly onboard and offboard on projects with their local environments, and affords opportunities to test new libraries or library upgrades without polluting the local development environment. Additionally, QA engineers have an easier time spinning up a container and all of its dependencies locally, allowing for more rapid QA work.
To run containers locally, the easiest option is to install Docker on your local machine. Docker makes this install process relatively simple for multiple platforms; you can download the latest version of Docker for your platform from Docker’s website. Locally on a Windows machine, Docker will run in Hyper-V, with a special VM created during the Docker install process. You can alternatively choose to install the Docker Toolbox and use VirtualBox instead. The process of installing Docker for Mac is just as simple – download it from the same place. In the past, Docker for Mac utilized VirtualBox to run a VM that provided the Linux kernel to containers.
This option is still available if you install the Docker Toolbox and have specific needs around it. Otherwise, the newer versions of Docker run on HyperKit, a native virtualization environment. In any case, once Docker is installed locally you will be able to use the Docker CLI to pull images, as well as create and run Docker containers. If your production environment runs Kubernetes (discussed later in this piece), you may also choose to install minikube on your local environment to simulate the behaviour of a production Kubernetes environment. Even if this is the case, you will need a container engine installed locally to satisfy the needs of minikube, whether that’s Docker or rkt.
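Once Docker is installed, a quick sanity check is to pull an official image and run it locally; here is a minimal sketch (the Nginx image and host port are arbitrary choices):

```
# Pull an official image from Docker Hub
docker pull nginx:1.25

# Run it detached, publishing container port 80 on host port 8080
docker run -d --name local-nginx -p 8080:80 nginx:1.25

# Confirm it is running, then browse to http://localhost:8080
docker ps

# Clean up when finished
docker rm -f local-nginx
```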
Anatomy of a Container

I would be remiss to have a discussion about containers at the big picture level without talking about the anatomy of a container. In an ideal world, a container would be as tiny as possible, including only the binaries and code needed to run the microservice or application in the container. Essentially, the container itself is a small Linux operating system with a standard filesystem layout (/etc, /opt, /var, /usr, /home, etc.). One of the most interesting aspects of containers is the unique layered filesystem, AuFS. The design of AuFS is such that the base image of a container exists as a layer of the filesystem.
Each subsequent step in a build creates a new layer of the filesystem. Each layer from the bottom up is read-only, except the top layer, which reduces duplication of data. If a change is needed to a file that exists in a lower layer of the container’s filesystem, it is copied up to the top layer and modified there. This essentially creates a system of diffs for the filesystem. Beyond the filesystem unique to Docker, the other main point of interest is the process by which containers are built. To build a container image there are really two possible approaches. The first is to spin up a container, customize it, and run commands to tag it as a new container image.
This is a very easy approach, but it is not considered best practice. The other way to build container images is to create a Dockerfile with a series of directives that are run in order. Once the image is built, it can be tagged for use and pushed to the container repo. This method is the preferred, best practice approach.
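As a small illustration of the Dockerfile approach (the base image, files, and tag are placeholders), the directives and build commands might look like this:

```
# Write a minimal Dockerfile (shown here via a heredoc for brevity)
cat > Dockerfile <<'EOF'
FROM nginx:1.25
# Each directive below adds a read-only layer on top of the base image
COPY site/ /usr/share/nginx/html/
EXPOSE 80
EOF

# Build the image from the directives, tag it, and push it to the repo
docker build -t myorg/mysite:1.0.0 .
docker push myorg/mysite:1.0.0
```

Because each directive produces a layer, unchanged layers are served from the build cache on subsequent builds, which is part of what makes image builds and deployments so fast.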