k8s for babies

This page is a living document under construction! Information may be incomplete or inaccurate.

Many coworkers and friends have come to me asking about Kubernetes (k8s) and "Docker" (i.e. containers). Usually, they have a cursory understanding (containers is put stuff in box, k8s is orchestrate boxes), but feel helpless with anything practical. Worse, they lack a starting point to even search the internet.

A lot of resources I see either dive straight into the command line, or wax poetic about how containers work. I love a good history lesson about BSD Jails, but I'm a huge dork. This aims to be something in between: enough theory to understand what's happening, but also a path to practical understanding.

First Light

In the beginning, there was FreeBSD jail... or something.

What the hell is a "Docker"?

In very broad strokes, containerization is an operating system level virtualization technology. Its purpose is to run software in isolated environments. Containers are lighter weight than a full virtual machine (which virtualizes at the hardware level), but more isolated than something like the Java runtime (which lives entirely in user space).

Docker is a specific implementation of this operating system level virtualization mechanism. A handful of other popular implementations exist, like Podman. Usually, when we say "Docker", we mean anything following the specifications from the Open Container Initiative (OCI). If you're curious about the details and history, check out these Wikipedia articles!

For the remainder of this, we'll refer to this nebula of technologies in an implementation-agnostic way ("containers", "images", etc.). There are significant differences between the implementations, but for most practical purposes they behave identically.

Images and Containers

There are two distinct, but very intertwined pieces of containerization:

Image: Defines all of the pieces necessary to run a particular piece of software. Includes the base OS components, libraries, filesystem, and executable binaries. The rubber stamp.
Container: A running instance of an image. Takes all of the pieces of the image and hooks them into the host OS. Things like the compute hardware, the network, and files on physical disk get connected. The actual stamp on the paper.

These two terms often get used interchangeably, but the distinction is important. This is especially true in k8s land, where Pods make things more complicated.

Ephemerality

Containers are ephemeral. This is the single most important thing about containers, and the one that trips up the most newcomers. Everything that happens in a container lives and dies with the container. Creating or modifying files in a container, for example, does not persist once the container is destroyed. This is so fundamental to the concept that the canonical way to restart something in k8s is to destroy its containers and recreate them. To persist changes, data has to leave the sandbox, either through a mounted volume or over the network. TODO: talk about these later
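To make that concrete, here's a toy model in plain Python. It is not how any real container runtime works: the `Image` and `Container` classes, the `scratch` dict, and the `volume` dict are all made up for illustration, and real volumes are mounted at specific paths instead of being checked first for every file. The point is only to show which writes survive a destroy-and-recreate.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Image:
    """The rubber stamp: a read-only template of files (hypothetical model)."""
    name: str
    files: tuple  # (path, content) pairs, fixed once the image is built


@dataclass
class Container:
    """The stamp on the paper: one running instance of an image."""
    image: Image
    volume: dict                                 # lives outside the container
    scratch: dict = field(default_factory=dict)  # dies with the container

    def read(self, path):
        # Toy lookup order: volume, then in-container changes, then the image.
        for layer in (self.volume, self.scratch, dict(self.image.files)):
            if path in layer:
                return layer[path]
        return None

    def write(self, path, content, to_volume=False):
        target = self.volume if to_volume else self.scratch
        target[path] = content


image = Image("toy-app:1.0", (("/app/config", "default"),))
volume = {}  # stands in for a mounted volume

first = Container(image, volume)
first.write("/app/config", "edited")                   # change inside the container
first.write("/data/notes", "keep me", to_volume=True)  # change that leaves the sandbox
print(first.read("/app/config"))   # edited

# "Restart" the k8s way: throw the container away and create a fresh one.
del first
second = Container(image, volume)
print(second.read("/app/config"))  # default
print(second.read("/data/notes"))  # keep me
```

The edit to `/app/config` is gone in the second container because it only ever lived in the first container's scratch space. The note survives because it was written somewhere that outlives the container.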

Ok, but what the hell is a "Docker"?

Let's look at how you build images:

Dockerfile

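# Start from Rancher's cowsay image.
FROM rancher/cowsay:latest

# Copy the cow file from the build context into the image.
COPY whale.cow .

# Default command when a container starts from this image.
CMD ["cowsay", "-f", "whale.cow", "One cannot step into the same river twice."]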

This is a Dockerfile. The name stuck around from Docker, but it's universal across the ecosystem. It's a set of instructions for how to build an image. This one says to base off Rancher's cowsay image, copy in a cow file, then run cowsay as the command. Images are self-contained, so everything they need must be copied in. Remember ephemerality: these files are baked into the image, so you can't just edit them in a running container and expect the changes to persist.

See the Dockerfile reference for all the different things you can do.

Images

Images are identified with either a <name>:<tag> syntax or a SHA256 hash. The former is most common, but a tag can be re-pointed at a different image later. The latter is always a unique identifier (as unique as SHA256 can get anyway).
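If the two forms feel abstract, here's a rough sketch of pulling an image reference apart in Python. The function name `parse_image_ref` is made up for this page, the `name@sha256:...` form is the digest syntax Docker-style tools accept, and the parsing is deliberately simplified: real tools apply more rules (default registries, allowed characters) than this does.

```python
def parse_image_ref(ref):
    """Split an image reference into (name, tag, digest).

    Simplified sketch: ignores default registries and character rules.
    """
    digest = None
    if "@" in ref:
        # Digest form, e.g. name@sha256:<hex>
        ref, digest = ref.split("@", 1)

    # A ':' only marks a tag if it comes after the last '/'.
    # Otherwise it is a registry port, as in localhost:5000/app.
    last_segment = ref.rsplit("/", 1)[-1]
    if ":" in last_segment:
        name, tag = ref.rsplit(":", 1)
    else:
        name, tag = ref, None

    # With neither a tag nor a digest, tools conventionally assume "latest".
    if tag is None and digest is None:
        tag = "latest"
    return name, tag, digest


for ref in ("rancher/cowsay:latest", "rancher/cowsay", "localhost:5000/app:1.2"):
    print(ref, "->", parse_image_ref(ref))
```

Note that `rancher/cowsay` and `rancher/cowsay:latest` come out the same here, which is exactly why leaning on `latest` can surprise you: the name stays put while the image behind it changes.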

Containers in the Kennel

Stop teasing me about k8s and just tell me how it works!!!

Doggy Day Care

So now you want to "deploy" the containers you have. What the hell is a "deploy"? We're going to skip over a few technologies here.

The typical pipeline goes like this:

With just Docker, you're stuck with writing out this giant command with like, 15 arguments, to run one container.
For your app, you have a frontend, backend, database, and ingress, so that's 4 containers and 60 command line arguments.
You write a shell script: deploy-app.sh. Before you know it, this script is 200 lines long and explodes if anyone else but you tries to run it.
You get clever and learn docker-compose. You write a nice compose.yml and commit it to git, brimming with confidence.
Then, your boss tells you to run the app in the on-prem k8s cluster. "The hell is a cube cuddle?" you ask. ChatGPT sighs, feeding you some hallucination about shiba inus.

What the hell is a cube cuddle?

Like Docker, Kubernetes is a particular implementation of a broader technology. In this case, that technology is container orchestration. In the simplest terms, it runs a bunch of containers together.