A container image is to a container what a class is to an object in object-oriented programming: it's the template from which running instances are created. But unlike simple templates, container images are sophisticated artifacts that carry an entire filesystem, metadata, configuration, and the complete execution environment for an application.
Container images are the most important artifact in modern software delivery. They represent the single source of truth for what gets deployed—the immutable, versioned package that travels unchanged from a developer's laptop through CI/CD pipelines to production clusters across the globe.
Understanding images deeply—their structure, layering mechanism, content addressing, and lifecycle—is essential for anyone architecting containerized systems.
By the end of this page, you will understand the internal structure of container images, how content-addressable storage enables efficient distribution, image layering mechanics, tagging and versioning strategies, base image selection criteria, and best practices for creating secure, minimal, and reproducible images.
A container image is not a single file—it's a collection of filesystem layers plus metadata. Understanding this structure explains why images are efficient to store, transfer, and modify.
Image Components:
| Component | Description | Example |
|---|---|---|
| Filesystem Layers | Tar archives containing files/directories added at each build step | Base OS files, Python packages, application code |
| Image Configuration | JSON metadata describing how to run the container | Environment variables, entrypoint, exposed ports |
| Image Manifest | JSON listing all layers with their digests | Links layers to form the complete image |
| Layer Digests | SHA256 hashes uniquely identifying each layer | sha256:a3ed95cae... |
```
Container Image: myapp:1.0
├── manifest.json            # Lists all layers and config
│   {
│     "config": { "digest": "sha256:..." },
│     "layers": [
│       { "digest": "sha256:abc...", "size": 28558848 },
│       { "digest": "sha256:def...", "size": 16458752 },
│       { "digest": "sha256:ghi...", "size": 4194304 }
│     ]
│   }
│
├── config.json              # Runtime configuration
│   {
│     "architecture": "amd64",
│     "os": "linux",
│     "config": {
│       "Env": ["PATH=/usr/local/bin:/usr/bin"],
│       "Cmd": ["python", "app.py"],
│       "WorkingDir": "/app",
│       "ExposedPorts": { "8080/tcp": {} }
│     },
│     "history": [...]       # Build history
│   }
│
└── layers/                  # Filesystem layers (tar.gz)
    ├── sha256:abc.../layer.tar.gz  # Base OS (~28 MB)
    ├── sha256:def.../layer.tar.gz  # Python packages (~16 MB)
    └── sha256:ghi.../layer.tar.gz  # Application code (~4 MB)
```

Content-Addressable Storage:
Every layer and configuration in a container image is identified by a cryptographic hash (SHA256 digest) of its contents. This is content-addressable storage, and it has profound implications:
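The core property can be shown in a few lines of Python: when the identifier of a blob is simply the hash of its bytes, identical content always gets the identical name, so a store keyed by digest deduplicates for free. This is an illustrative sketch, not registry code — the layer contents are made up.

```python
import hashlib

def digest(content: bytes) -> str:
    """Content-addressable ID: the name *is* the SHA256 hash of the bytes."""
    return "sha256:" + hashlib.sha256(content).hexdigest()

layer_a = b"base OS files"
layer_b = b"application code"
layer_a_again = b"base OS files"  # identical content, e.g. a shared base layer

# Identical content always yields the identical digest...
assert digest(layer_a) == digest(layer_a_again)

# ...so a store keyed by digest deduplicates automatically, and any
# corruption or tampering changes the digest and is immediately detectable.
store = {digest(layer): layer for layer in [layer_a, layer_b, layer_a_again]}
print(len(store))  # 2 — the duplicate layer is stored only once
```

This is the same mechanism that lets registries skip uploading layers they already have: the client sends the digest first, and the registry answers "already got it."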
Tags like 'nginx:1.25' are mutable pointers—they can be moved to point to different image digests. Digests like 'nginx@sha256:abc123...' are immutable references. For production deployments where reproducibility is critical, always reference images by digest, not just tag.
Container image layering is more nuanced than simple stacking. Understanding the mechanics helps you optimize images and debug unexpected behaviors.
The Union Filesystem:
Container runtimes use a union filesystem (OverlayFS, AUFS, or similar) to merge multiple layers into a single coherent view. This is copy-on-write (CoW) at the filesystem level:
```
Union Filesystem (OverlayFS on Linux)
=====================================

┌─────────────────────────────────────────────────────────────┐
│ Merged View (what container sees)                           │
│ /app/main.py  /app/config.py  /etc/passwd  /bin/python      │
└─────────────────────────────────────────────────────────────┘
                         ▲
                         │ Merge
┌─────────────────────────────────────────────────────────────┐
│ Upper Layer (container writable layer)                      │
│ /app/config.py (modified)                                   │
│ /app/.logs/ (new directory)                                 │
│ /etc/hostname (whiteout - marks deletion)                   │
└─────────────────────────────────────────────────────────────┘
                         │
┌─────────────────────────────────────────────────────────────┐
│ Lower Layer 3: Application code                             │
│ /app/main.py  /app/config.py (original)                     │
└─────────────────────────────────────────────────────────────┘
                         │
┌─────────────────────────────────────────────────────────────┐
│ Lower Layer 2: Python runtime                               │
│ /usr/local/bin/python  /usr/local/lib/python3.11/           │
└─────────────────────────────────────────────────────────────┘
                         │
┌─────────────────────────────────────────────────────────────┐
│ Base Layer: Alpine Linux                                    │
│ /bin/sh  /etc/passwd  /lib/apk/                             │
└─────────────────────────────────────────────────────────────┘

File Resolution:
Read /app/config.py → Found in Upper Layer → Return modified version
Read /app/main.py   → Not in Upper → Check Layer 3 → Found → Return
Read /bin/python    → Not in Upper/3 → Check Layer 2 → Found → Return
```

Whiteout Files and Deletions:
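The top-down lookup that a union filesystem performs can be sketched in Python. This is a toy model, not OverlayFS itself: each layer is a dict mapping path to content, and `None` stands in for a whiteout marker that hides a file present in a lower layer.

```python
# Layers ordered top (writable) to bottom (base), mirroring the diagram above.
layers = [
    {"/app/config.py": "modified", "/etc/hostname": None},       # upper (writable)
    {"/app/main.py": "original", "/app/config.py": "original"},  # application code
    {"/bin/python": "interpreter"},                              # Python runtime
    {"/etc/passwd": "root:x:0:0", "/etc/hostname": "base"},      # base layer
]

def resolve(path):
    """Walk layers top-down; the first layer containing the path wins."""
    for layer in layers:
        if path in layer:
            return layer[path]  # None means a whiteout hid the file
    raise FileNotFoundError(path)

print(resolve("/app/config.py"))  # 'modified' — upper layer shadows lower
print(resolve("/app/main.py"))    # 'original' — found in a lower layer
print(resolve("/etc/hostname"))   # None — whiteout hides the base-layer file
```

Note that `resolve("/etc/hostname")` stops at the whiteout in the upper layer — the original file in the base layer is still physically there, it's just never reached. That is exactly why deletions don't shrink images, as the next section explains.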
A critical concept often misunderstood: deleting a file doesn't reduce image size. When you delete a file in a later layer, the union filesystem creates a "whiteout" marker that hides the file, but the original still exists in the earlier layer.
This has important implications:
```dockerfile
# BAD: File still exists in Layer 1, total size = 200 MB
FROM alpine
RUN wget http://example.com/large-100mb-file.tar.gz  # Layer 1: +100 MB
RUN tar -xzf large-100mb-file.tar.gz                 # Layer 2: +100 MB (extracted)
RUN rm large-100mb-file.tar.gz                       # Layer 3: +4 KB (whiteout only)
# Total: 200 MB (original archive still in Layer 1!)

# GOOD: Download, extract, and clean up in a single layer
FROM alpine
RUN wget http://example.com/large-100mb-file.tar.gz && \
    tar -xzf large-100mb-file.tar.gz && \
    rm large-100mb-file.tar.gz
# Total: ~100 MB (only extracted files remain)
```

Never use a separate RUN command to remove files created in a previous layer. Always combine download/install/cleanup in a single RUN command. This is the most common cause of unexpectedly large images. Use 'docker history' to see the size contribution of each layer.
Layer sharing across images:
When multiple images share base layers, Docker stores them only once:
```
myapp-api:latest ─┬─ [Application layer: 10 MB]
                  │
                  └─ [Node.js runtime: 200 MB] ← Shared
                  │
myapp-web:latest ─┬─ [Application layer: 15 MB]
                  │
                  └─ [Node.js runtime: 200 MB] ← Same layer!

Total disk usage: 200 MB + 10 MB + 15 MB = 225 MB
(Not 200 + 10 + 200 + 15 = 425 MB)
```
This is why standardizing on base images across your organization dramatically reduces storage and network costs.
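Because layers are keyed by digest, the deduplication falls out of simple set semantics. A short Python sketch (with made-up digests and sizes matching the figures above) makes the arithmetic concrete:

```python
# Each image is a list of (layer_digest, size_in_MB); shared digests
# are hypothetical placeholders, not real SHA256 values.
images = {
    "myapp-api:latest": [("sha256:node-runtime", 200), ("sha256:api-code", 10)],
    "myapp-web:latest": [("sha256:node-runtime", 200), ("sha256:web-code", 15)],
}

# Naive total: count every layer of every image.
naive = sum(size for layers in images.values() for _, size in layers)

# Actual total: dict keys collapse layers with the same digest.
unique_layers = {dig: size for layers in images.values() for dig, size in layers}
actual = sum(unique_layers.values())

print(naive, actual)  # 425 225 — the shared runtime layer is stored once
```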
Image tags are critical for deployment automation, rollback procedures, and system auditability. A well-designed tagging strategy prevents deployment confusion and enables reliable operations.
Common Tagging Patterns:
| Pattern | Example | Use Case | Pros/Cons |
|---|---|---|---|
| Semantic Version | myapp:1.2.3 | Releases | Clear versioning; can be mutable |
| Git SHA | myapp:a3f8d2c | CI/CD pipelines | Immutable; traceable to commit; hard to read |
| Git SHA + Version | myapp:1.2.3-a3f8d2c | Production | Best of both; clearly versioned and traceable |
| Date/Timestamp | myapp:2024-01-15 | Nightly builds | Time-ordered; hard to correlate to code |
| Branch | myapp:main | Development | Always latest; unsuitable for production |
| Environment | myapp:prod | Deployment shortcuts | Dangerous; tags get overwritten |
| latest | myapp:latest | Quick development | Ambiguous; never use in production |
Recommended Production Strategy:
The most robust approach combines semantic versioning with git commit SHA, plus maintaining 'rolling' tags for convenience:
```bash
#!/bin/bash
# In CI/CD pipeline after successful build and tests

# Get version from package.json, Cargo.toml, etc.
VERSION=$(npm pkg get version | tr -d '"')   # e.g., "2.3.1"
GIT_SHA=$(git rev-parse --short HEAD)        # e.g., "a3f8d2c"
BRANCH=$(git branch --show-current)          # e.g., "main"

# Full image reference
IMAGE="myregistry.com/myapp"

# Tag with multiple tags for flexibility
docker tag myapp:build "$IMAGE:$VERSION"           # myapp:2.3.1
docker tag myapp:build "$IMAGE:$VERSION-$GIT_SHA"  # myapp:2.3.1-a3f8d2c
docker tag myapp:build "$IMAGE:$GIT_SHA"           # myapp:a3f8d2c

# Optionally, for the main branch, update 'latest' and major/minor version tags
if [ "$BRANCH" = "main" ]; then
  docker tag myapp:build "$IMAGE:latest"
  docker tag myapp:build "$IMAGE:2"    # Major version
  docker tag myapp:build "$IMAGE:2.3"  # Minor version
fi

# Push all tags
docker push --all-tags "$IMAGE"
```

Tags are mutable by default—pushing 'myapp:1.0' twice overwrites the first image. This can cause production to run different code than expected if tags are reused. Either enforce immutable tags in your registry or always deploy with digests.
Your base image choice affects image size, security posture, compatibility, and operational characteristics. This decision ripples through your entire container infrastructure.
Common Base Image Options:
| Base Image | Size | Package Manager | Best For |
|---|---|---|---|
| scratch | 0 MB | None | Static binaries (Go, Rust) |
| alpine:3.18 | ~5 MB | apk | Minimal containers, most cases |
| distroless | ~20 MB | None | Security-focused deployments |
| debian:bookworm-slim | ~75 MB | apt | Compatibility with glibc apps |
| ubuntu:22.04 | ~78 MB | apt | Familiarity, broad package support |
| python:3.11-slim | ~120 MB | apt + pip | Python applications |
| node:20-slim | ~180 MB | apt + npm | Node.js applications |
Deep Dive: Alpine Linux
Alpine is the most popular minimal base image, but it has important trade-offs: it uses musl libc instead of glibc, which can break prebuilt binaries and Python wheels compiled against glibc, and its DNS resolution and package behavior sometimes differ subtly from Debian-based images. Test your application thoroughly on Alpine before standardizing on it.
Deep Dive: Distroless Images
Google's distroless images contain only your application and its runtime dependencies—no shell, no package manager, no unnecessary utilities. This is the gold standard for production security:
```dockerfile
# Multi-stage build with distroless runtime
FROM golang:1.21 AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o /app/main

# Distroless has NO shell, NO package manager
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=builder /app/main /
USER nonroot:nonroot
ENTRYPOINT ["/main"]

# For debugging (has shell but defeats some security benefits)
# FROM gcr.io/distroless/static-debian12:debug
```

Start with the smallest image that works for your application. For Go/Rust: use 'scratch' or 'distroless'. For Python/Node: use slim variants and test carefully with Alpine. If you encounter compatibility issues with Alpine, debian-slim is the reliable fallback. Document why you chose a particular base image.
Container image security is a critical concern—images often contain vulnerabilities, exposed secrets, or unnecessary attack surface. Building secure images is a skill that separates amateur containerization from professional practice.
The Security Mindset:
```dockerfile
# SECURITY-FOCUSED DOCKERFILE

# Pin exact base image version
FROM python:3.11.7-slim-bookworm@sha256:abc123...

# Set labels for image provenance
LABEL org.opencontainers.image.source="https://github.com/org/repo" \
      org.opencontainers.image.version="1.2.3" \
      org.opencontainers.image.vendor="MyCompany"

# Don't run as root
ARG UID=1000
ARG GID=1000
RUN groupadd --gid $GID appgroup && \
    useradd --uid $UID --gid $GID --shell /sbin/nologin appuser

WORKDIR /app

# Pin dependency versions in requirements.txt (or use poetry.lock)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt && \
    # Remove pip cache and unnecessary files
    rm -rf /root/.cache /var/cache/apt /var/lib/apt/lists/*

# Copy only necessary files
COPY --chown=appuser:appgroup src/ ./src/

# Switch to non-root user
USER appuser

# Minimal environment
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

# No secrets here! Inject at runtime
# ENV DATABASE_URL=  # WRONG!

EXPOSE 8080

# Health check without external dependencies
HEALTHCHECK --interval=30s --timeout=5s \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8080/health')"

CMD ["python", "-m", "gunicorn", "-b", "0.0.0.0:8080", "app:app"]
```

Vulnerability Scanning:
Integrate image scanning into your CI/CD pipeline. Popular tools include:
| Tool | Type | Integration |
|---|---|---|
| Trivy | Open source, comprehensive | CLI, GitHub Actions, GitLab CI |
| Grype | Open source by Anchore | CLI, GitHub Actions |
| Snyk | Commercial with free tier | CLI, IDE, CI/CD, registry |
| Clair | Open source by CoreOS | API-based, registry integration |
| AWS ECR Scanning | Built into ECR | Automatic on push |
| Docker Scout | Docker's scanning solution | Docker Desktop, Docker Hub |
A scan that finds vulnerabilities but triggers no action is security theater. Establish policies: HIGH vulnerabilities block deployment, MEDIUM must be fixed within 30 days, LOW must be tracked. Rebuild images when base images are patched.
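A policy like this is easy to automate as a CI gate. The sketch below is a hypothetical example (the severity names and `findings` structure are assumptions, loosely modeled on typical scanner JSON output, not any specific tool's schema):

```python
# Policy mirroring the text above: HIGH blocks, MEDIUM gets a deadline, LOW is tracked.
POLICY = {"HIGH": "block", "MEDIUM": "fix within 30 days", "LOW": "track"}

def gate(findings):
    """Return (deploy_allowed, per-finding actions) for scan results."""
    allowed = True
    actions = []
    for finding in findings:
        action = POLICY.get(finding["severity"], "ignore")
        if action == "block":
            allowed = False
        actions.append((finding["id"], action))
    return allowed, actions

# Example scan output (fabricated CVE IDs for illustration)
findings = [
    {"id": "CVE-2024-0001", "severity": "HIGH"},
    {"id": "CVE-2024-0002", "severity": "LOW"},
]
allowed, actions = gate(findings)
print(allowed)  # False — the HIGH finding blocks deployment
```

In a real pipeline the scanner's JSON report would be parsed into `findings`, and a `False` result would fail the CI job before the image is ever pushed.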
Reproducible builds ensure that building the same source code produces bit-for-bit identical images. This is harder than it sounds and crucial for security, debugging, and compliance.
Why Reproducibility Matters:
Achieving Reproducibility:
```dockerfile
# Pin base image by DIGEST (not just tag)
FROM python:3.11.7-slim-bookworm@sha256:a1b2c3d4e5f6...

# Pin package versions explicitly
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        libpq5=15.4-0+deb12u1 \
        ca-certificates=20230311 && \
    rm -rf /var/lib/apt/lists/*

# Use lock files for application dependencies
COPY requirements.lock .
RUN pip install --no-cache-dir -r requirements.lock

# If using npm, copy package-lock.json
# COPY package.json package-lock.json ./
# RUN npm ci  # Uses locked versions

# Set SOURCE_DATE_EPOCH for reproducible timestamps
ARG SOURCE_DATE_EPOCH
ENV SOURCE_DATE_EPOCH=${SOURCE_DATE_EPOCH:-0}

COPY . .
```

Software Bill of Materials (SBOM):
An SBOM documents all components in your image—packages, libraries, and their versions. SBOMs are increasingly required for compliance and enable automated vulnerability tracking.
```bash
# Generate SBOM with Syft
syft myapp:latest -o spdx-json > sbom.spdx.json

# Generate SBOM with Trivy
trivy image --format spdx-json myapp:latest > sbom.spdx.json

# Attach SBOM to image (using cosign)
cosign attach sbom --sbom sbom.spdx.json myregistry.com/myapp:1.0

# Scan SBOM for vulnerabilities
grype sbom:sbom.spdx.json
```

For high-security environments, sign your images cryptographically using tools like Cosign (part of Sigstore). Image signatures prove that the image came from a trusted source and hasn't been tampered with. Kubernetes can be configured to only run signed images.
Smaller images mean faster pulls, faster deployments, lower storage costs, and better security. Here are advanced techniques for minimizing image size beyond basic multi-stage builds.
Size Optimization Strategies:
```dockerfile
# Build stage
FROM golang:1.21-alpine AS builder

RUN apk add --no-cache upx

WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download

COPY . .

# Build with optimizations
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
    go build -ldflags="-s -w" -o main . && \
    upx --best main

# Runtime stage - FROM SCRATCH (nothing!)
FROM scratch

# Copy CA certificates for HTTPS (if needed)
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/

# Copy the compressed binary
COPY --from=builder /app/main /

ENTRYPOINT ["/main"]

# Final size: often < 10 MB for a full web application
```

Analyzing Image Size:
Use tools to understand what's contributing to image size:
```bash
# View layer sizes
docker history myapp:latest

# Detailed layer analysis
docker history --no-trunc --format "{{.Size}}\t{{.CreatedBy}}" myapp:latest

# Interactive layer explorer (highly recommended)
dive myapp:latest

# Show total image size
docker images myapp:latest --format "{{.Size}}"
```

The 'dive' tool (github.com/wagoodman/dive) provides an interactive terminal UI that lets you explore each layer, see exactly which files were added/modified/deleted, and calculate wasted space. It's invaluable for optimizing Dockerfiles.
Let's consolidate our understanding of container images:
What's next:
With a deep understanding of container images, we'll explore Container Registries—the infrastructure that stores, distributes, and secures your images. You'll learn about public and private registries, authentication, access control, and registry operations at scale.
You now have comprehensive knowledge of container images—from their internal structure and layering mechanics to security practices and optimization techniques. This understanding is essential for building production-grade containerized systems.