Kubernetes itself is a powerful container orchestration platform, but its true strength lies in the ecosystem built around it. The Cloud Native Computing Foundation (CNCF) hosts over 150 projects, and countless more exist in the wider community—all designed to extend Kubernetes' capabilities.
From package management (Helm) to GitOps (ArgoCD, Flux) to observability (Prometheus, Grafana) to service mesh (Istio, Linkerd), these tools transform Kubernetes from an orchestrator into a complete cloud-native platform. Understanding this ecosystem helps you choose the right tools and avoid reinventing wheels.
By the end of this page, you will have a comprehensive map of the Kubernetes ecosystem. You'll understand the major tool categories, when to use each, and how they fit together to build production-grade platforms. This knowledge enables you to make informed decisions when designing Kubernetes-based infrastructure.
The Cloud Native Computing Foundation (CNCF) is the home of Kubernetes and dozens of related projects. Understanding the CNCF landscape helps you discover established, community-vetted tools.
CNCF Project Maturity Levels:
| Stage | Meaning | Examples |
|---|---|---|
| Sandbox | Early-stage, experimental projects | OpenFunction, Karmada, KubeVirt |
| Incubating | Growing adoption, moving toward graduation | OpenTelemetry, Kyverno, Crossplane |
| Graduated | Production-ready, widely adopted | Kubernetes, Prometheus, Envoy, containerd, Helm |
Key CNCF Graduated Projects (Production-Ready):
| Category | Project | Purpose |
|---|---|---|
| Orchestration | Kubernetes | Container orchestration |
| Runtime | containerd | Container runtime |
| Runtime | CRI-O | Kubernetes container runtime |
| Observability | Prometheus | Metrics and alerting |
| Observability | Jaeger | Distributed tracing |
| Observability | Fluentd | Log aggregation |
| Service Proxy | Envoy | L7 proxy and service mesh data plane |
| Service Mesh | Linkerd | Lightweight service mesh |
| Package Management | Helm | Kubernetes package manager |
| Security | OPA (Gatekeeper) | Policy-as-code |
| Storage | Rook | Cloud-native storage |
| Networking | CoreDNS | Cluster DNS |
| CI/CD | Flux | GitOps toolkit |
| CI/CD | Argo | Workflows, GitOps, events |
Visit landscape.cncf.io for an interactive map of the entire cloud-native ecosystem. It's overwhelming at first, but invaluable for discovering tools. Filter by category and maturity level to find solutions for specific needs.
Helm is the package manager for Kubernetes. It packages applications as Charts—templated, versioned bundles of Kubernetes manifests.
Why Helm?
- Templating: one chart serves many environments, configured through values
- Versioned releases with upgrade and rollback support
- Dependency management between charts
- A large ecosystem of ready-made charts for common software
Core Concepts:
| Term | Definition |
|---|---|
| Chart | Package of templated Kubernetes manifests |
| Release | Instance of a chart running in the cluster |
| Repository | Collection of charts (like a package registry) |
| Values | Configuration parameters for a chart |
# Add a chart repository
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

# Search for charts
helm search repo nginx

# Install a chart (creates a release)
helm install my-nginx bitnami/nginx --namespace web --create-namespace

# Install with custom values
helm install my-nginx bitnami/nginx -f my-values.yaml

# Upgrade a release
helm upgrade my-nginx bitnami/nginx --set replicaCount=3

# Rollback to previous version
helm rollback my-nginx 1

# List releases
helm list -A

# Uninstall a release
helm uninstall my-nginx -n web

# Template locally (see generated manifests without installing)
helm template my-nginx bitnami/nginx -f my-values.yaml

Creating Your Own Charts:
Helm charts follow a standard structure:
mychart/
├── Chart.yaml # Chart metadata (name, version)
├── values.yaml # Default configuration values
├── templates/ # Kubernetes manifest templates
│ ├── deployment.yaml
│ ├── service.yaml
│ ├── _helpers.tpl # Template helper functions
│ └── NOTES.txt # Post-install instructions
└── charts/ # Dependency charts
Template Syntax:
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ .Release.Name }}-app
spec:
replicas: {{ .Values.replicaCount }}
template:
spec:
containers:
- name: app
image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
Helm uses Go templates for parameterization; Kustomize uses patches for customization. Helm is better for distributing reusable packages; Kustomize is better for customizing existing manifests. Many teams use both—Helm for third-party apps, Kustomize for in-house apps.
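To make the Kustomize side of that comparison concrete, here is a rough sketch of a production overlay; the base path, Deployment name, and image are hypothetical:

# kustomization.yaml (hypothetical production overlay)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base                 # reuse the shared base manifests unchanged
patches:
  - target:
      kind: Deployment
      name: web-app
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 5
images:
  - name: myorg/web-app
    newTag: v1.4.2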
GitOps is an operational framework where:
- Git is the single source of truth for the desired cluster state
- Changes are made declaratively, through commits and pull requests rather than kubectl
- An in-cluster agent continuously reconciles the live state with what Git describes

GitOps Benefits:
- Every change is versioned and auditable in Git history
- Rollback is a git revert; the cluster follows automatically
- Environments are reproducible and consistent
- Drift detection: manual changes are surfaced or reverted

Major GitOps Tools:
- ArgoCD: application-centric GitOps with a rich web UI and multi-cluster support
- Flux: a lightweight, CRD-driven GitOps toolkit built from composable controllers
# ArgoCD Application resource
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: web-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/k8s-manifests.git
    targetRevision: main
    path: apps/web-app/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true      # Delete resources not in Git
      selfHeal: true   # Revert manual changes
    syncOptions:
      - CreateNamespace=true

Keep application code and Kubernetes manifests in separate repositories. This separates concerns and allows different CI/CD pipelines. The app repo builds images; the GitOps repo deploys them. Update the GitOps repo when new images are ready.
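For comparison with the ArgoCD Application above, a roughly equivalent Flux configuration might look like this sketch; the repository URL and path are hypothetical and mirror the ArgoCD example:

# Flux: point the cluster at a Git repository...
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: k8s-manifests
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/myorg/k8s-manifests.git
  ref:
    branch: main
---
# ...and reconcile a path from that repository into the cluster
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: web-app
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: k8s-manifests
  path: ./apps/web-app/overlays/production
  prune: true                  # delete resources removed from Git
  targetNamespace: production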
Observability is essential for operating Kubernetes at scale. The three pillars—metrics, logs, and traces—provide different views into system behavior.
The Observability Stack:
| Pillar | Purpose | Popular Tools |
|---|---|---|
| Metrics | Numeric measurements over time | Prometheus, Grafana, Datadog, VictoriaMetrics |
| Logs | Text records of events | Loki, Elasticsearch, Fluentd, Fluent Bit |
| Traces | Request path through services | Jaeger, Zipkin, Tempo, OpenTelemetry |
Prometheus + Grafana (The Standard Stack):
Prometheus:
- Pull-based metrics collection: scrapes /metrics endpoints on a schedule
- Time-series storage with the PromQL query language
- Alerting rules evaluated in the server and routed through Alertmanager

Grafana:
- Dashboards and visualization on top of Prometheus (and many other data sources)
- Large library of community dashboards for Kubernetes, nodes, and applications
- Alerting and annotations driven by the same queries
# ServiceMonitor tells Prometheus what to scrape
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: web-app
  namespace: monitoring
  labels:
    release: prometheus        # Match Prometheus operator selector
spec:
  selector:
    matchLabels:
      app: web-app
  namespaceSelector:
    matchNames:
      - production
  endpoints:
    - port: metrics            # Port name from Service
      interval: 30s
      path: /metrics

---
# Example PromQL queries:

# CPU usage by pod
# rate(container_cpu_usage_seconds_total{namespace="production"}[5m])

# Memory usage percentage
# container_memory_usage_bytes / container_memory_limit_bytes * 100

# Request rate
# sum(rate(http_requests_total[5m])) by (service)

# Error rate
# sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))

OpenTelemetry (The Future):
OpenTelemetry is becoming the standard for collecting all three pillars:
- One set of vendor-neutral APIs and SDKs for metrics, logs, and traces
- The OpenTelemetry Collector receives, processes, and exports telemetry to any backend
- Instrument once, then switch backends (Prometheus, Jaeger, commercial APM) without code changes
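As a sketch of what that looks like in practice, a minimal OpenTelemetry Collector configuration could receive OTLP data and fan it out to Prometheus and Jaeger; the exporter endpoints are illustrative assumptions:

# Minimal OpenTelemetry Collector configuration (illustrative; adjust exporters to your backends)
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch: {}

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"          # Prometheus scrapes this endpoint
  otlp/jaeger:
    endpoint: "jaeger-collector:4317" # assumed in-cluster Jaeger OTLP endpoint
    tls:
      insecure: true

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]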
The Prometheus Operator (kube-prometheus-stack Helm chart) provides a complete observability setup: Prometheus, Grafana, Alertmanager, and pre-configured Kubernetes dashboards. It's the fastest path to production observability.
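Installing it is a few Helm commands, using the chart and repository published by the prometheus-community project:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace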
A service mesh is a dedicated infrastructure layer for handling service-to-service communication. It moves networking concerns (load balancing, retries, circuit breaking, mTLS) out of application code and into the infrastructure.
How Service Mesh Works:
Service Mesh Architecture:

┌───────────────────────────────────────────────────┐
│                   Control Plane                   │
│  (Istiod, Linkerd Control Plane, Consul Connect)  │
│  • Configuration management                       │
│  • Certificate authority (mTLS)                   │
│  • Service discovery                              │
└─────────────────────────┬─────────────────────────┘
                          │ Push configuration
                          ▼
┌───────────────────────────────────────────────────┐
│                     Data Plane                    │
│                                                   │
│   Pod A                     Pod B                 │
│   ┌─────────┐               ┌─────────┐           │
│   │   App   │               │   App   │           │
│   └────┬────┘               └────┬────┘           │
│   ┌────▼────┐               ┌────▼────┐           │
│   │  Envoy  │◄─── mTLS ────►│  Envoy  │           │
│   │ Sidecar │               │ Sidecar │           │
│   └─────────┘               └─────────┘           │
│                                                   │
│  Sidecar handles: mTLS, load balancing, retries,  │
│  circuit breaking, rate limiting, observability   │
└───────────────────────────────────────────────────┘

Popular Service Meshes:
| Mesh | Characteristics | Best For |
|---|---|---|
| Istio | Feature-rich, complex, Google-backed | Enterprises needing advanced features |
| Linkerd | Lightweight, fast, simple | Teams wanting simplicity and performance |
| Consul Connect | HashiCorp ecosystem integration | Multi-datacenter, VMs + K8s |
| Cilium Service Mesh | eBPF-based, sidecar-less option | High performance, kernel-level |
When Do You Need a Service Mesh?
| Need | Service Mesh Helps |
|---|---|
| mTLS everywhere | Yes, automated certificate management |
| Advanced traffic management | Yes, canary, A/B, traffic mirroring |
| Observability without code changes | Yes, automatic metrics and tracing |
| Circuit breaking, retries | Yes, configurable resilience patterns |
| Small cluster, simple app | Probably overkill |
Service meshes add latency (extra hops), resource overhead (sidecars), and operational complexity. Evaluate carefully—many teams don't need a mesh. Start with native Kubernetes networking and add a mesh when you have clear requirements it addresses.
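If you do adopt a mesh, traffic management is usually the first capability teams reach for. As a hedged sketch of a canary rollout using Istio (service, namespace, and subset names are hypothetical):

# Hypothetical Istio canary: send 90% of traffic to v1, 10% to v2
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: web-app
  namespace: production
spec:
  hosts:
    - web-app                  # Kubernetes Service name
  http:
    - route:
        - destination:
            host: web-app
            subset: v1
          weight: 90
        - destination:
            host: web-app
            subset: v2
          weight: 10
---
# DestinationRule defines the subsets referenced above
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: web-app
  namespace: production
spec:
  host: web-app
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2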
Kubernetes security is multi-layered. Various tools address different aspects of the security posture.
Policy Enforcement:
| Tool | Approach | Use Case |
|---|---|---|
| Pod Security Standards | Built-in Pod Security Admission controller | Enforce the privileged/baseline/restricted profiles |
| OPA Gatekeeper | General-purpose policy engine | Custom policies, compliance |
| Kyverno | Kubernetes-native policies | Easier YAML syntax, mutations |
| Kubewarden | WebAssembly policies | High performance, portable policies |
# Kyverno policy: Require resource limits
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-limits
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-limits
      match:
        resources:
          kinds:
            - Pod
      validate:
        message: "CPU and memory limits are required"
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    memory: "?*"
                    cpu: "?*"

Image Scanning:
| Tool | Features |
|---|---|
| Trivy | Fast, comprehensive (vulnerabilities, misconfig, secrets) |
| Grype | Anchore's open-source scanner |
| Snyk | Commercial with free tier, developer-focused |
| Clair | Red Hat's scanner, used by Quay |
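As an example of the CI side, Trivy can gate a pipeline on scan results; the image name here is hypothetical, and severities should be tuned to your policy:

# Fail the build (non-zero exit code) on HIGH or CRITICAL vulnerabilities
trivy image --severity HIGH,CRITICAL --exit-code 1 myorg/web-app:v1.4.2

# Scan Kubernetes manifests and other IaC for misconfigurations
trivy config ./k8s/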
Secrets Management:
| Tool | Approach |
|---|---|
| External Secrets Operator | Sync secrets from Vault, AWS SM, etc. |
| Sealed Secrets | Encrypt secrets for Git storage |
| HashiCorp Vault | Enterprise secrets management (with CSI driver) |
| SOPS | Encrypt files with various key providers |
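A sketch of the External Secrets Operator approach, assuming a ClusterSecretStore named aws-secrets-manager has already been configured; the secret names and paths are hypothetical:

# Sync a secret from an external store (e.g. AWS Secrets Manager) into the cluster
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager    # SecretStore/ClusterSecretStore configured separately
    kind: ClusterSecretStore
  target:
    name: db-credentials         # Kubernetes Secret to create
  data:
    - secretKey: password        # key in the resulting Kubernetes Secret
      remoteRef:
        key: prod/db             # path in the external store
        property: password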
Layer your security: scan images in CI, enforce policies at admission, use network policies for segmentation, and rotate secrets regularly. No single tool covers everything—combine them for comprehensive security.
Kubernetes provides a storage abstraction through PersistentVolumes and PersistentVolumeClaims. Various solutions implement the actual storage.
Storage Types:
| Category | Examples | Use Case |
|---|---|---|
| Cloud Provider | AWS EBS, GCP PD, Azure Disk | Managed block storage in cloud |
| Cloud File | AWS EFS, GCP Filestore, Azure Files | Shared file storage (RWX) |
| Distributed | Ceph/Rook, Longhorn, OpenEBS | Self-managed, bare-metal |
| Local | Local PVs, TopoLVM | High performance, node-local |
| Object Storage | MinIO (S3-compatible) | Large-scale object storage |
Container Storage Interface (CSI):
CSI is the standard for storage drivers in Kubernetes:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast-ssd
provisioner: ebs.csi.aws.com # CSI driver
parameters:
type: gp3
iops: "10000"
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
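A workload then requests storage from that class through a PersistentVolumeClaim, for example:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-volume
  namespace: production
spec:
  accessModes:
    - ReadWriteOnce            # block storage: single-node access
  storageClassName: fast-ssd   # references the StorageClass above
  resources:
    requests:
      storage: 50Gi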
Key Storage Considerations:
- Access modes: ReadWriteOnce (single node) vs ReadWriteMany (shared across nodes)
- Reclaim policy: whether data is deleted or retained when a claim is removed
- Volume binding mode: WaitForFirstConsumer avoids zone and scheduling mismatches
- Performance (IOPS, throughput, latency) and cost per gigabyte
- Backup and disaster recovery (volume snapshots, off-cluster backups)
Rook is a CNCF graduated project that runs Ceph on Kubernetes, turning the nodes' local disks into self-managed, highly available block, file, and object storage. Ideal for bare metal or when you need independence from cloud storage services.
Kubernetes' networking model is extensible at multiple layers.
Container Network Interface (CNI) Plugins:
| Plugin | Key Features | Best For |
|---|---|---|
| Calico | Network policies, BGP, eBPF mode | General purpose, policy focus |
| Cilium | eBPF-based, L7 policies, service mesh (optional) | Performance, security |
| Flannel | Simple VXLAN overlay | Simple clusters, getting started |
| Weave Net | Encrypted networking | Security-focused small clusters |
| AWS VPC CNI | Native VPC networking | EKS (default) |
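Policy-capable CNIs such as Calico and Cilium enforce the standard NetworkPolicy API. A minimal example restricting ingress to an API Pod (the labels and port are hypothetical):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-only
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080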
Ingress Controllers:
Ingress controllers implement the Ingress resource for HTTP(S) routing:
| Controller | Type | Features |
|---|---|---|
| NGINX Ingress | General purpose | Most popular, highly configurable |
| Traefik | Cloud-native | Auto Let's Encrypt, dashboard |
| HAProxy Ingress | Enterprise | High performance |
| Contour | Envoy-based | Gateway API support |
| AWS ALB Ingress | Cloud-specific | Native ALB integration |
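Whichever controller you choose, it consumes the same Ingress resource; controller-specific behavior typically lives in annotations. A minimal example (the hostname and Service name are hypothetical):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /   # controller-specific tuning via annotations
spec:
  ingressClassName: nginx
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-service
                port:
                  number: 80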
Gateway API (The Future):
Gateway API is the next-generation ingress specification:
- Role-oriented: infrastructure teams own Gateways, application teams own Routes
- More expressive than Ingress: header matching, traffic splitting, and multiple protocols without controller-specific annotations
- Portable across implementations through conformance testing
# Gateway API example
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: main-gateway
spec:
  gatewayClassName: nginx
  listeners:
    - name: http
      protocol: HTTP
      port: 80
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        certificateRefs:
          - name: tls-secret
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: web-route
spec:
  parentRefs:
    - name: main-gateway
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api
      backendRefs:
        - name: api-service
          port: 8080

Gateway API has reached general availability and is becoming the standard. If starting new projects, consider Gateway API over Ingress. It's more powerful and will be the long-term direction, and most ingress controllers now support it.
Developing for Kubernetes requires specialized tools to bridge local development and cluster environments.
Local Kubernetes Options:
| Tool | How It Works | Best For |
|---|---|---|
| Docker Desktop | Built-in K8s cluster | Mac/Windows, simple usage |
| minikube | VM or container-based cluster | Feature-rich, multiple drivers |
| kind | Kubernetes in Docker | CI pipelines, multi-node testing |
| k3s | Lightweight K8s distribution | Edge, IoT, resource-constrained |
| Rancher Desktop | Docker/containerd + K8s GUI | Docker Desktop alternative |
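For example, kind can create a multi-node test cluster from a small config file (a sketch):

# kind-cluster.yaml: one control-plane node and two workers
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker

Create it with kind create cluster --name dev --config kind-cluster.yaml, and tear it down with kind delete cluster --name dev.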
Development and Debugging Tools:
| Tool | Purpose |
|---|---|
| Skaffold | Build/push/deploy loop automation |
| Tilt | Live development environment with UI |
| Telepresence | Route cluster traffic to local machine |
| kubectl debug | Ephemeral debug containers in running Pods |
| Lens | Desktop Kubernetes IDE |
| k9s | Terminal-based cluster UI |
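As a sketch of the inner-loop automation these tools provide, a minimal Skaffold configuration might look like the following; the image name and manifest paths are hypothetical, and the schema version may differ in your installation:

# skaffold.yaml: rebuild the image and redeploy manifests on every change
apiVersion: skaffold/v4beta6
kind: Config
build:
  artifacts:
    - image: myorg/web-app     # built from the local Dockerfile
manifests:
  rawYaml:
    - k8s/*.yaml               # applied after each successful build

Running skaffold dev then watches the source tree, rebuilding and redeploying as files change.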
kubectl Essential Plugins:
# Install krew (kubectl plugin manager)
# Then install useful plugins:
kubectl krew install ctx ns neat tree # Context, namespace, cleanup
kubectl krew install images # Show container images
kubectl krew install resource-capacity # Node resource usage
kubectl krew install sniff # Network packet capture
kubectl krew install debug # Container debugging
k9s provides a fast, terminal-based UI for Kubernetes. Navigate resources with keyboard shortcuts, view logs, exec into Pods, and manage contexts—all faster than typing kubectl commands. It's a productivity multiplier for operators.
You can run Kubernetes in many ways—from self-managed clusters to fully managed cloud services.
Managed Kubernetes Services:
| Service | Provider | Key Features |
|---|---|---|
| EKS | AWS | Deep AWS integration, Fargate for serverless |
| GKE | Google Cloud | Autopilot mode, advanced networking |
| AKS | Azure | Azure AD integration, Azure Arc for hybrid |
| DigitalOcean Kubernetes | DigitalOcean | Simplicity, good developer experience |
| Linode Kubernetes Engine | Linode/Akamai | Cost-effective, straightforward |
Self-Managed Distributions:
| Distribution | Focus | Use Case |
|---|---|---|
| kubeadm | Upstream, minimal | DIY clusters, learning |
| k3s | Lightweight | Edge, IoT, dev, resource-limited |
| RKE2 | Rancher's secure K8s | Air-gapped, compliance-focused |
| OpenShift | Red Hat's enterprise K8s | Enterprises, full platform |
| Tanzu | VMware's K8s portfolio | vSphere integration |
Choosing Between Self-Managed and Managed:
| Factor | Self-Managed | Managed |
|---|---|---|
| Control | Full control | Limited to provider options |
| Complexity | High—you manage everything | Low—provider manages control plane |
| Cost | Lower software cost, higher ops cost | Higher service cost, lower ops cost |
| Upgrades | Your responsibility | Provider assisted or automatic |
| Best For | Specific requirements, cost optimization | Most production workloads |
Unless you have specific requirements (compliance, air-gap, extreme customization), start with a managed Kubernetes service. The operational overhead of managing control plane components is significant. Focus your team's energy on applications, not infrastructure.
We've explored the vast Kubernetes ecosystem. Let's consolidate the key takeaways:
- Start with the CNCF landscape; prefer graduated and incubating projects for production use
- Helm packages and distributes applications; Kustomize customizes manifests; many teams use both
- GitOps tools (ArgoCD, Flux) make Git the source of truth and keep clusters continuously in sync
- Observability rests on metrics, logs, and traces; Prometheus and Grafana are the standard starting point
- Service meshes add mTLS, traffic management, and resilience at the cost of complexity; adopt one only with clear requirements
- Security is layered: image scanning, admission policies, network segmentation, and externalized secrets
- Storage (CSI) and networking (CNI, Ingress, Gateway API) are pluggable; match drivers to your environment
- Prefer managed Kubernetes unless specific requirements demand self-management
Module Complete:
You now have a comprehensive understanding of Kubernetes architecture—from core components and resources to the broader ecosystem of tools. This knowledge enables you to design, deploy, and operate production Kubernetes clusters effectively.
Congratulations! You've completed the Kubernetes Architecture module. You understand the component architecture, core resources, control plane and nodes, declarative configuration, and the ecosystem tools that make Kubernetes a complete platform. Apply this knowledge to build robust, scalable containerized applications.