Container images are the foundational artifacts of container technology. They capture everything needed to run an application—code, dependencies, configuration, and filesystem—in an immutable, portable format. An image built once runs identically everywhere, solving the fundamental challenge of software distribution.
But images are more than just "zip files of applications." They feature sophisticated layer mechanics that enable efficient storage and distribution, content-addressable storage that guarantees integrity, and standardized formats that ensure interoperability across the ecosystem.
Understanding how images work—from Dockerfile instructions to registry distribution—transforms you from an image consumer to an image architect. You'll build smaller, faster, more secure images and troubleshoot issues that mystify developers who treat images as black boxes.
By the end of this page, you will understand container image structure and the OCI image specification, how Dockerfiles create layers, best practices for building optimized and secure images, how content-addressable storage works, and how images are distributed through registries.
A container image consists of two main components: layers (the filesystem content) and metadata (configuration and layer references). Understanding this structure is essential for building and troubleshooting images.
Layers:
Each layer represents a set of filesystem changes—files added, modified, or deleted. Layers are stacked in a defined order, immutable once created, identified by the digest of their content, and shared between images that have them in common.
```
CONTAINER IMAGE STRUCTURE
=========================

IMAGE MANIFEST (JSON document that describes the image)

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:abc123...",                      <-- points to config blob
    "size": 7023
  },
  "layers": [                                          <-- ordered list of layers
    { "digest": "sha256:111...", "size": 32543210 },   <-- Layer 1
    { "digest": "sha256:222...", "size": 1543210 },    <-- Layer 2
    { "digest": "sha256:333...", "size": 432100 }      <-- Layer 3
  ]
}

CONFIG BLOB (JSON: runtime configuration)

{
  "architecture": "amd64",
  "os": "linux",
  "config": {
    "Env": ["PATH=/usr/local/..."],
    "Cmd": ["/bin/sh"],
    "WorkingDir": "/app",
    "ExposedPorts": {"80/tcp": {}},
    "Labels": {...}
  },
  "rootfs": {
    "type": "layers",
    "diff_ids": ["sha256:aaa...", "sha256:bbb...", "sha256:ccc..."]
  },
  "history": [...]
}

LAYER BLOBS (tar.gz: filesystem changes)

Layer 1: Base OS        /bin/sh, /lib/*, /etc/*  (Debian/Alpine/Ubuntu base)
Layer 2: Runtime        /usr/local/bin/python, /usr/local/lib/python3.11/
Layer 3: Application    /app/main.py, /app/requirements.txt

UNIFIED VIEW (what the container sees)

/
├── bin/                      (from Layer 1)
├── lib/                      (from Layer 1)
├── etc/                      (from Layer 1, maybe modified by Layer 2)
├── usr/
│   └── local/
│       ├── bin/python        (from Layer 2)
│       └── lib/python3.11/   (from Layer 2)
└── app/
    ├── main.py               (from Layer 3)
    └── requirements.txt      (from Layer 3)
```

Every blob (layer or config) is identified by its SHA256 digest. This means identical content always has the same identifier, enabling deduplication across images. If two images share a base layer, it's stored only once. This also provides integrity verification—if content is corrupted, the digest won't match.
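The deduplication and integrity guarantees fall out of the hashing alone; here is a minimal Python sketch of the idea, using only the standard library's `hashlib` (not any real registry API):

```python
import hashlib

def digest(blob: bytes) -> str:
    # Content-addressable ID: a blob's name is the hash of its bytes
    return "sha256:" + hashlib.sha256(blob).hexdigest()

base_layer = b"(tar.gz bytes of a shared base layer)"

# Two images built on the same base produce the same layer digest,
# so a registry stores the blob once and both manifests reference it.
image_a_layers = [digest(base_layer), digest(b"(python runtime layer)")]
image_b_layers = [digest(base_layer), digest(b"(node runtime layer)")]
assert image_a_layers[0] == image_b_layers[0]

# Integrity check: any corruption changes the digest
corrupted = base_layer + b"\x00"
assert digest(corrupted) != digest(base_layer)
```

This is why pulling an image by digest is immutable: the identifier and the content are mathematically bound.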
A Dockerfile is a script that defines how to build an image. Each instruction in a Dockerfile creates a new layer (with some exceptions). Understanding which instructions create layers is crucial for optimizing image size and build performance.
Layer-Creating Instructions:
| Instruction | Creates Layer? | Purpose | Example |
|---|---|---|---|
| FROM | Yes (uses existing) | Base image | FROM python:3.11-slim |
| RUN | Yes | Execute commands | RUN apt-get update && apt-get install -y curl |
| COPY | Yes | Copy local files | COPY ./app /app |
| ADD | Yes | Copy + extract archives | ADD archive.tar.gz /app |
| ENV | No (metadata) | Set environment variable | ENV NODE_ENV=production |
| EXPOSE | No (metadata) | Document port | EXPOSE 8080 |
| CMD | No (metadata) | Default command | CMD ["python", "app.py"] |
| ENTRYPOINT | No (metadata) | Main executable | ENTRYPOINT ["docker-entrypoint.sh"] |
| WORKDIR | No (metadata) | Working directory | WORKDIR /app |
| USER | No (metadata) | Runtime user | USER appuser |
| LABEL | No (metadata) | Image metadata | LABEL version="1.0" |
| ARG | No (build-time) | Build argument | ARG VERSION=latest |
```dockerfile
# Each instruction is shown with its layer impact

# Layer 1: Pull base image (reused from registry)
FROM python:3.11-slim

# No layer: just metadata
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

# No layer: just metadata
WORKDIR /app

# Layer 2: Install system dependencies
# Combine commands with && to minimize layers
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        gcc \
        libpq-dev \
    && rm -rf /var/lib/apt/lists/*  # Clean up in same layer!

# Layer 3: Install Python dependencies
# Copy requirements first for better caching
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Layer 4: Copy application code
# This changes frequently, so it's a separate layer
COPY . .

# No layer: just metadata
EXPOSE 8000
USER appuser

# No layer: just metadata
CMD ["gunicorn", "app:app", "-b", "0.0.0.0:8000"]
```

Docker caches layers and reuses them if nothing has changed. But cache invalidation cascades—if Layer 2 changes, Layers 3 and 4 must be rebuilt even if their instructions are identical. That's why we copy requirements.txt before copying all the code: if only code changes, the pip install layer is reused from cache.
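The cascade is easy to model: each layer's cache key chains the parent layer's key with the instruction (and, for COPY, the content being copied), so any change invalidates everything after it. Here is a toy Python sketch of that chaining—an illustration of the idea, not Docker's actual cache implementation:

```python
import hashlib

def cache_key(parent_key: str, instruction: str, content: str = "") -> str:
    # A layer is reusable only if parent, instruction, AND copied content match
    h = hashlib.sha256((parent_key + instruction + content).encode())
    return h.hexdigest()[:12]

def build_keys(steps):
    keys, parent = [], ""
    for instruction, content in steps:
        parent = cache_key(parent, instruction, content)
        keys.append(parent)
    return keys

before = build_keys([
    ("FROM python:3.11-slim", ""),
    ("COPY requirements.txt .", "flask==3.0"),   # dependency manifest
    ("RUN pip install -r requirements.txt", ""),
    ("COPY . .", "app code v1"),                 # application code
])
after = build_keys([
    ("FROM python:3.11-slim", ""),
    ("COPY requirements.txt .", "flask==3.0"),
    ("RUN pip install -r requirements.txt", ""),
    ("COPY . .", "app code v2"),                 # only the code changed
])

# The first three layers hit the cache; only the final COPY rebuilds
assert before[:3] == after[:3] and before[3] != after[3]
```

Reorder the steps so `COPY . .` comes before the pip install and every code edit would invalidate the dependency layer too—that is exactly the mistake good Dockerfile ordering avoids.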
Multi-stage builds are a powerful technique for creating small, production-optimized images. They allow you to use different base images for building and running your application, copying only the necessary artifacts to the final image.
The problem multi-stage solves:
Build-time dependencies often dwarf runtime requirements. A Go application might need the full Go toolchain, git, and downloaded module caches to compile—hundreds of megabytes—while at runtime it needs nothing but the compiled binary.
Without multi-stage builds, your image includes the entire Go toolchain. With multi-stage, you build in one image and copy just the binary to a minimal runtime image.
```dockerfile
# Multi-stage build for a Go application
# Each FROM starts a new stage; only the final stage goes into the image

#############################################
# STAGE 1: Build
#############################################
FROM golang:1.21-alpine AS builder

# Build dependencies
RUN apk add --no-cache git ca-certificates

WORKDIR /build

# Copy go module files first (cache dependency downloads)
COPY go.mod go.sum ./
RUN go mod download

# Copy source code
COPY . .

# Build binary
# CGO_ENABLED=0 for static binary
# -ldflags for smaller binary
RUN CGO_ENABLED=0 GOOS=linux go build \
    -ldflags='-w -s -extldflags "-static"' \
    -o /app/server ./cmd/server

#############################################
# STAGE 2: Runtime
#############################################
FROM scratch AS runtime
# 'scratch' is an empty image - minimal possible size

# Copy CA certificates for HTTPS
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/

# Copy only the compiled binary
COPY --from=builder /app/server /server

# Non-root user (numeric since no passwd in scratch)
USER 65534

EXPOSE 8080
ENTRYPOINT ["/server"]

# Result:
# - Builder stage: ~900MB (Go toolchain + dependencies)
# - Runtime stage: ~10MB (just the binary + certs)
# - Savings: 99% size reduction!
```

Common multi-stage patterns:
| Pattern | Build Stage | Runtime Stage | Use Case |
|---|---|---|---|
| Compiled Language | golang, rust | scratch, alpine | Go, Rust, C binaries |
| Node.js | node (npm install) | node:slim (run only) | React, Next.js apps |
| Python | python + build tools | python-slim | Apps with C extensions |
| Java | maven/gradle | openjdk:jre | Spring Boot apps |
| Testing | Full + test tools | Slim runtime | Run tests, deploy tested |
| CI/CD Assets | Full toolchain | nginx | Build static sites |
Use AS to name stages (AS builder, AS runtime). You can then use --target to build specific stages: 'docker build --target builder' builds only the builder stage, useful for debugging build issues or running tests in CI.
Optimized images are smaller, faster to build, faster to distribute, and more secure. Here are essential best practices organized by impact:
1. Choose Minimal Base Images:
| Base Image | Size | Use Case | Trade-offs |
|---|---|---|---|
| scratch | 0 MB | Static binaries (Go, Rust) | No shell, no utilities |
| alpine | ~5 MB | General purpose, small | musl libc (some compat issues) |
| distroless | ~20 MB | Security-focused, minimal | No shell, harder to debug |
| python:3.11-slim | ~150 MB | Python apps | Debian-based, glibc |
| python:3.11 | ~1 GB | Full development | Includes compilers, tools |
| ubuntu:22.04 | ~75 MB | General purpose | Familiar tools, larger |
```dockerfile
# ❌ BAD: Large, inefficient Dockerfile
FROM python:3.11

WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get clean  # Cleaning in separate layer doesn't help!
CMD ["python", "app.py"]

# Problems:
# - Full python image (~1GB)
# - Each RUN is a new layer
# - apt-get clean in separate layer = files still in previous layer
# - Copies everything including .git, __pycache__, etc.
# - Poor cache utilization

# ────────────────────────────────────────────────────────

# ✅ GOOD: Optimized Dockerfile
FROM python:3.11-slim AS base

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1

FROM base AS builder

WORKDIR /app

# Install build dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends gcc libpq-dev && \
    rm -rf /var/lib/apt/lists/*

# Install Python dependencies into a virtual env
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

COPY requirements.txt .
RUN pip install -r requirements.txt

FROM base AS runtime

# Copy virtual environment from builder
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Create non-root user
RUN useradd -m -r appuser && \
    mkdir -p /app && chown appuser:appuser /app
USER appuser
WORKDIR /app

# Copy only application code
COPY --chown=appuser:appuser ./src .

EXPOSE 8000
CMD ["gunicorn", "app:app", "-b", "0.0.0.0:8000"]

# Improvements:
# - Slim base image (~150MB vs ~1GB)
# - Multi-stage eliminates build dependencies
# - Combined RUN commands with cleanup
# - Virtual env for clean dependency copy
# - Non-root user for security
# - Proper cache ordering
```

Image security is critical—a vulnerability in your base image or dependencies becomes a vulnerability in every container running that image. Adopting security practices during image creation prevents costly remediation later.
Security Principles for Container Images:

- Run as a non-root user (a numeric UID when the base image has no passwd file)
- Use minimal bases (slim, distroless) to shrink the attack surface
- Pin base image and dependency versions for reproducibility
- Prefer exec-form ENTRYPOINT/CMD over shell form
- Scan images regularly and rebuild when base images are patched
```dockerfile
# Security-focused Dockerfile example

FROM python:3.11-slim-bookworm AS builder

# Don't run as root during build either
RUN groupadd -g 1001 appgroup && \
    useradd -m -u 1001 -g appgroup appuser

WORKDIR /app

# Pin versions for reproducibility
COPY requirements.txt .

# Switch users so `pip install --user` lands in /home/appuser/.local
USER appuser
RUN pip install --no-cache-dir --user -r requirements.txt

FROM gcr.io/distroless/python3-debian12 AS runtime
# Distroless: no shell, no package manager, minimal CVEs

# Copy from builder with correct ownership
COPY --from=builder /home/appuser/.local /home/appuser/.local
COPY --chown=1001:1001 ./src /app

# Use numeric UID (no passwd file in distroless)
USER 1001

ENV PATH="/home/appuser/.local/bin:$PATH"
WORKDIR /app

# Expose (documentation only)
EXPOSE 8000

# Fixed entrypoint (not shell form)
ENTRYPOINT ["python", "app.py"]

# No CMD default arguments (explicit > implicit)
```

Scanning Images for Vulnerabilities:
```
# Scan image with Trivy (popular open-source scanner)
$ trivy image myapp:latest

myapp:latest (debian 12.0)
===========================
Total: 23 (UNKNOWN: 0, LOW: 15, MEDIUM: 6, HIGH: 2, CRITICAL: 0)

┌───────────┬────────────────┬──────────┬───────────────────┐
│ Library   │ Vulnerability  │ Severity │ Installed Version │
├───────────┼────────────────┼──────────┼───────────────────┤
│ curl      │ CVE-2023-38545 │ HIGH     │ 7.88.1-10         │
│ openssl   │ CVE-2023-3446  │ MEDIUM   │ 3.0.9-1           │
└───────────┴────────────────┴──────────┴───────────────────┘

# Scan during CI/CD build to prevent vulnerable images from deploying
# Fail build if HIGH or CRITICAL vulnerabilities found:
$ trivy image --exit-code 1 --severity HIGH,CRITICAL myapp:latest

# Scan Dockerfile for misconfigurations
$ trivy config ./Dockerfile

Dockerfile (dockerfile)
========================
Tests: 23 (SUCCESSES: 20, FAILURES: 3)
Failures: 3

MEDIUM: Specify version for base image
─────────────────────────────────────
Use specific image version instead of 'latest'
```

Integrate vulnerability scanning into your CI/CD pipeline. Block deployments if critical vulnerabilities are found. Use registry-side scanning (Docker Hub, Harbor, ECR) to catch issues in stored images. Set up alerts for new CVEs affecting your images.
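The --exit-code gate is just a severity threshold over the scanner's findings. A toy Python sketch of the same policy (an illustration of the gating logic, not Trivy's actual report format):

```python
# Severity levels in ascending order, as used in scanner reports
SEVERITY_ORDER = ["UNKNOWN", "LOW", "MEDIUM", "HIGH", "CRITICAL"]

def should_block(findings, threshold="HIGH"):
    """Return True if any finding is at or above the threshold severity."""
    cutoff = SEVERITY_ORDER.index(threshold)
    return any(SEVERITY_ORDER.index(f["severity"]) >= cutoff for f in findings)

findings = [
    {"id": "CVE-2023-38545", "severity": "HIGH"},
    {"id": "CVE-2023-3446", "severity": "MEDIUM"},
]

assert should_block(findings) is True              # HIGH present -> block deploy
assert should_block(findings, "CRITICAL") is False  # nothing CRITICAL yet
```

In CI, this boolean becomes the process exit code, which is all the pipeline needs to stop a vulnerable image from reaching the registry.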
Container registries store and distribute images. Understanding registry operations—authentication, pushing, pulling, and tagging—is essential for working with containers in any environment.
Registry Concepts:
| Concept | Description | Example |
|---|---|---|
| Registry | Server that stores images | docker.io, gcr.io, ghcr.io |
| Repository | Collection of related images | library/nginx, mycompany/myapp |
| Tag | Version identifier within a repository | v1.0.0, latest, dev |
| Digest | Immutable content identifier | @sha256:abc123... |
| Manifest | JSON describing image layers | Retrieved on pull |
```
# Image naming: [REGISTRY/][NAMESPACE/]NAME[:TAG|@DIGEST]

# Examples:
docker.io/library/nginx:latest        # Docker Hub official
docker.io/myuser/myapp:v1.0.0         # Docker Hub user
gcr.io/my-project/backend:abcdef      # Google Container Registry
ghcr.io/myorg/frontend:main           # GitHub Container Registry
registry.example.com/api:2.1.0        # Private registry

# ───────────────────────────────────────────────────────

# Authentication
$ docker login                        # Docker Hub
$ docker login ghcr.io                # GitHub
$ docker login gcr.io                 # GCR (use gcloud helper)
$ cat ~/.docker/config.json           # Credentials stored here

# ───────────────────────────────────────────────────────

# Push workflow
# 1. Build image
$ docker build -t myapp:v1.0.0 .

# 2. Tag for target registry
$ docker tag myapp:v1.0.0 ghcr.io/myorg/myapp:v1.0.0
$ docker tag myapp:v1.0.0 ghcr.io/myorg/myapp:latest

# 3. Push to registry
$ docker push ghcr.io/myorg/myapp:v1.0.0
The push refers to repository [ghcr.io/myorg/myapp]
5f70bf18a086: Pushed                  # Only uploads layers registry doesn't have
2a15ad3e3c6b: Layer already exists
v1.0.0: digest: sha256:abc123... size: 1156

# ───────────────────────────────────────────────────────

# Pull workflow
$ docker pull nginx:1.25.3
1.25.3: Pulling from library/nginx
8a1e25ce7c4f: Already exists          # Layer already in local cache
a9c9c5e96c3c: Downloading [==>      ] 1.2MB/15MB
...
Digest: sha256:def456...
Status: Downloaded newer image for nginx:1.25.3

# Pull by digest (immutable, guaranteed exact image)
$ docker pull nginx@sha256:def456...

# ───────────────────────────────────────────────────────

# Inspect remote image without pulling
$ docker manifest inspect nginx:latest
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
  "manifests": [
    {
      "mediaType": "...",
      "size": 1156,
      "digest": "sha256:abc...",
      "platform": {"architecture": "amd64", "os": "linux"}
    },
    { "platform": {"architecture": "arm64", "os": "linux"} }
  ]
}
```

:latest is just a convention, not a guarantee of the newest version. It's mutable—the image it points to changes. For production, always use specific version tags or digests. docker pull myapp:latest today might give a completely different image than tomorrow.
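The [REGISTRY/][NAMESPACE/]NAME[:TAG|@DIGEST] grammar can be parsed with plain string handling. Here is a simplified Python sketch; it deliberately ignores some of Docker's real normalization rules (such as the implicit docker.io/library/ prefix for official images):

```python
def parse_image_ref(ref: str):
    # Split off @digest first (digests contain ':', so handle before the tag)
    digest = None
    if "@" in ref:
        ref, digest = ref.split("@", 1)

    # A tag is a ':' after the last '/'; otherwise the ':' is a registry port
    tag, name = None, ref
    last_slash, colon = ref.rfind("/"), ref.rfind(":")
    if colon > last_slash:
        name, tag = ref[:colon], ref[colon + 1:]

    # A registry is a first path component containing '.' or ':' (ghcr.io, host:5000)
    registry = None
    first, _, rest = name.partition("/")
    if rest and ("." in first or ":" in first):
        registry, name = first, rest

    return {"registry": registry, "name": name, "tag": tag, "digest": digest}

r = parse_image_ref("ghcr.io/myorg/myapp:v1.0.0")
assert r == {"registry": "ghcr.io", "name": "myorg/myapp",
             "tag": "v1.0.0", "digest": None}
```

The "colon after the last slash" rule is why localhost:5000/app is a registry plus a name, not an image called localhost with tag 5000/app.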
Multi-Architecture Images:
Modern registries support multi-architecture images (manifest lists). A single tag can reference different images for amd64, arm64, etc. Docker automatically pulls the right architecture for the host it runs on.
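Resolution is a simple lookup over the platform fields shown in the manifest inspect output earlier; a minimal Python sketch of what the client does:

```python
def select_manifest(manifest_list, architecture, os="linux"):
    # The client walks the list and picks the entry matching its own platform
    for m in manifest_list["manifests"]:
        p = m["platform"]
        if p["architecture"] == architecture and p["os"] == os:
            return m["digest"]
    raise LookupError(f"no image for {os}/{architecture}")

manifest_list = {
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {"digest": "sha256:aaa...",
         "platform": {"architecture": "amd64", "os": "linux"}},
        {"digest": "sha256:bbb...",
         "platform": {"architecture": "arm64", "os": "linux"}},
    ],
}

# An Apple Silicon or Graviton host resolves the same tag to the arm64 image
assert select_manifest(manifest_list, "arm64") == "sha256:bbb..."
```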
```
# Build and push multi-architecture image with buildx
$ docker buildx create --use

$ docker buildx build \
    --platform linux/amd64,linux/arm64 \
    --tag ghcr.io/myorg/myapp:v1.0.0 \
    --push .

# This creates:
# - Image for amd64 (Intel/AMD servers)
# - Image for arm64 (AWS Graviton, Apple Silicon)
# - Manifest list pointing to both

# When users pull, they automatically get the right architecture:
$ docker pull ghcr.io/myorg/myapp:v1.0.0   # Gets correct arch
```

Understanding how to inspect images helps with debugging, size optimization, and security audits. Docker provides several tools for exploring image internals.
Essential Inspection Commands:
```
# View image history (shows Dockerfile commands and layer sizes)
$ docker history nginx:latest
IMAGE          CREATED       CREATED BY                                     SIZE
a6bd71f48f68   2 weeks ago   /bin/sh -c #(nop) CMD ["nginx" "-g" "daemon…   0B
<missing>      2 weeks ago   /bin/sh -c #(nop) STOPSIGNAL SIGQUIT           0B
<missing>      2 weeks ago   /bin/sh -c #(nop) EXPOSE 80                    0B
<missing>      2 weeks ago   /bin/sh -c #(nop) ENTRYPOINT ["/docker-entr…   0B
<missing>      2 weeks ago   /bin/sh -c set -x && apt-get update && apt…    88.5MB
<missing>      2 weeks ago   /bin/sh -c #(nop) ENV NGINX_VERSION=1.25.3     0B
...

# Detailed image metadata (JSON format)
$ docker inspect nginx:latest | jq '.[0].Config'
{
  "Hostname": "",
  "Env": ["PATH=...", "NGINX_VERSION=1.25.3"],
  "Cmd": ["nginx", "-g", "daemon off;"],
  "ExposedPorts": {"80/tcp": {}},
  "Labels": {...}
}

# View image layers and sizes
$ docker inspect nginx:latest | jq '.[0].RootFS.Layers'
[
  "sha256:aad...",   # Layer 1
  "sha256:bbd...",   # Layer 2
  ...
]

# ───────────────────────────────────────────────────────

# Dive: Interactive layer exploration (third-party tool)
# Install: https://github.com/wagoodman/dive
$ dive nginx:latest

# Shows:
# - Each layer's contents
# - What was added/modified/deleted
# - Wasted space (files deleted in later layers)
# - Image efficiency score

# Example output:
# Layer 1: Base Debian (+78 MB)
# Layer 2: apt-get install nginx (+88 MB)
#   Added: /usr/sbin/nginx, /etc/nginx/*, ...
# Layer 3: Configuration (+1.2 KB)
#   Modified: /etc/nginx/nginx.conf

# ───────────────────────────────────────────────────────

# Export image filesystem for analysis
$ docker save nginx:latest | tar -xf - -C /tmp/nginx-image
$ ls /tmp/nginx-image
blobs/  index.json  manifest.json  oci-layout

# Or export a container's filesystem
$ docker export $(docker create nginx:latest) | tar -tf - | head -20
bin/
boot/
dev/
...
```

Use dive to identify layer bloat. Common issues: deleted files still present in previous layers, package caches not cleaned, unnecessary build dependencies included. Fix these by combining RUN commands and cleaning in the same layer, or by using multi-stage builds.
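The "deleted files still in previous layers" problem comes straight from the layer model: a later layer can only mask a file, never reclaim its bytes. A toy Python model of the unified view versus what actually ships (simplified; real OverlayFS marks deletions with whiteout files):

```python
# Each layer maps path -> size in bytes; None means "deleted in this layer"
layers = [
    {"/var/cache/apt/pkgs.bin": 90_000_000, "/usr/bin/app": 5_000_000},  # Layer 1
    {"/var/cache/apt/pkgs.bin": None},  # Layer 2: rm -rf the cache
]

def unified_view(layers):
    merged = {}
    for layer in layers:                   # later layers win
        for path, size in layer.items():
            if size is None:
                merged.pop(path, None)     # deletion masks the file from view
            else:
                merged[path] = size
    return merged

visible = sum(unified_view(layers).values())
on_disk = sum(s for layer in layers for s in layer.values() if s)

assert visible == 5_000_000    # what the container sees
assert on_disk == 95_000_000   # what the image actually distributes
# Deleting the cache in a later layer hides it but still ships its 90 MB
```

Running the rm in the same RUN instruction that created the cache keeps it out of the layer entirely, which is why the optimized Dockerfiles above clean up inside the install step.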
We've explored container images in depth—from their structure and creation to optimization and distribution. Key concepts to retain:

- An image is a manifest, a config blob, and an ordered set of content-addressed layers; identical layers are stored once and verified by digest.
- FROM, RUN, COPY, and ADD create layers; order instructions from least to most frequently changed so the build cache stays warm.
- Multi-stage builds separate build-time toolchains from runtime images, often cutting size by an order of magnitude.
- Minimal bases, non-root users, pinned versions, and CI vulnerability scanning keep images secure.
- Tags are mutable pointers; digests are immutable. Pin versions or digests in production.
- docker history, docker inspect, and dive reveal where layer bloat hides.
What's next:
Now that we understand container images and how to build them effectively, the final page explores container orchestration—how systems like Kubernetes manage containers at scale, handling scheduling, scaling, networking, and self-healing across clusters of machines.
You now understand the complete lifecycle of container images—from Dockerfile to registry to running container. This knowledge is foundational for building production-grade containerized applications with optimal size, performance, and security characteristics.