Kubernetes has become the de facto standard for container orchestration, powering workloads from the smallest startups to the largest enterprises and hyperscalers. But beneath its powerful abstractions lies a carefully designed distributed system with distinct components, each serving a critical purpose.
Understanding Kubernetes components isn't just academic knowledge—it's essential for debugging production issues, capacity planning, security hardening, and designing resilient architectures. When your application deployment fails, knowing whether the problem lies with the API Server, Scheduler, Controller Manager, or kubelet fundamentally changes your troubleshooting approach.
By the end of this page, you will have a comprehensive understanding of every core Kubernetes component, how they communicate, their failure modes, and how this architecture enables the self-healing, declarative nature of Kubernetes. You'll be able to reason about cluster behavior at the component level—a skill that separates operators who merely deploy from those who truly understand.
Kubernetes follows a control plane / data plane architecture pattern common in distributed systems. This separation provides clear boundaries between the components that make decisions and the components that execute work.
The Control Plane (historically called the master components) is the brain of the cluster. It maintains the desired state, makes scheduling decisions, responds to cluster events, and exposes the API. The control plane doesn't run your application workloads—it orchestrates them.
The Data Plane (the worker nodes) is where your containers actually run. Each node hosts the components necessary to run Pods and communicate with the control plane. The data plane executes the decisions made by the control plane.
| Component Type | Components | Primary Responsibility |
|---|---|---|
| Control Plane | kube-apiserver, etcd, kube-scheduler, kube-controller-manager, cloud-controller-manager | Cluster state management, scheduling, reconciliation |
| Data Plane | kubelet, kube-proxy, container runtime | Running containers, networking, health monitoring |
Why this separation matters:
This architectural division enables several critical capabilities:
Scalability: Control plane and data plane can scale independently. You can add hundreds of worker nodes without proportionally scaling control plane components.
Isolation: Control plane failures don't immediately kill running workloads. If the API server goes down, existing Pods continue running—you just can't make changes.
Security: The control plane can be isolated in separate network segments, reducing attack surface. Worker nodes only need limited access to specific control plane endpoints.
Maintenance: Control plane components can be upgraded or restarted with minimal impact on running workloads.
In production, control plane components are typically run with high availability—usually three or more replicas spread across availability zones. Managed Kubernetes services (EKS, GKE, AKS) abstract this away, managing the control plane for you while you focus on the data plane.
The kube-apiserver is the central hub of all Kubernetes communication. Every interaction with the cluster—whether from kubectl, controllers, the scheduler, or kubelets—goes through the API server. It's not just a gateway; it's the single source of truth for the cluster's current state.
Core Responsibilities:
API Endpoint Exposure: Serves the Kubernetes REST API over HTTPS, handling CRUD operations for all Kubernetes objects (Pods, Services, ConfigMaps, etc.)
Authentication & Authorization: Validates the identity of all requests (via certificates, tokens, etc.) and enforces RBAC policies to determine what actions are permitted.
Admission Control: Runs admission controllers that can mutate or validate requests before they're persisted. This is where policies like resource quotas, pod security standards, and webhook-based validations are enforced.
etcd Gateway: Serves as the only component that directly communicates with etcd. All cluster state changes go through the API server to etcd.
Watch Mechanism: Supports efficient watches that allow clients to subscribe to changes rather than polling. This is how controllers learn about new or modified resources.
Request Flow Through the API Server:

```
1. Request Arrives
   └─► Authentication
       ├─ Client Certificate
       ├─ Bearer Token
       ├─ OpenID Connect
       └─ Webhook Token Authentication
2. Authorization (after authentication succeeds)
   └─► RBAC Check
       ├─ Role/ClusterRole lookup
       ├─ RoleBinding/ClusterRoleBinding check
       └─ Decision: Allow/Deny
3. Admission Controllers (if authorized)
   ├─► Mutating Admission
   │   ├─ Default values injection
   │   ├─ Sidecar injection (Istio, etc.)
   │   └─ Label/annotation addition
   └─► Validating Admission
       ├─ Resource quota check
       ├─ Pod security standards
       └─ Custom webhook validations
4. Persistence (if all checks pass)
   └─► Write to etcd
       └─► Return success response
5. Notification
   └─► Watch subscribers notified of change
```

Scalability Characteristics:
The API server is designed to be horizontally scalable. In high-availability setups, multiple API server instances run behind a load balancer. Because all state is stored in etcd, any API server instance can handle any request. This stateless design is critical for production deployments.
Key Performance Considerations:
While the API server is horizontally scalable, it remains the central communication hub. If all API server instances become unavailable, you lose the ability to make any cluster changes—though existing workloads continue running. This makes API server availability critical for operational control.
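The sequential pipeline described above can be modeled as a chain of stages that each either transform the request or reject it. The sketch below is a deliberately simplified illustration; the handler names, the toy quota rule, and the dict standing in for etcd are all assumptions for demonstration, not the real API server machinery:

```python
class Forbidden(Exception):
    """Raised when any pipeline stage rejects the request."""
    pass

def authenticate(req):
    # Stand-in for cert/token/OIDC checks.
    if not req.get("token"):
        raise Forbidden("401: no valid credentials")
    return req

def authorize(req):
    # Stand-in for an RBAC Role/RoleBinding lookup.
    if req["verb"] not in req.get("allowed_verbs", []):
        raise Forbidden("403: RBAC denies '%s'" % req["verb"])
    return req

def mutating_admission(req):
    # Mutating webhooks may inject defaults before validation.
    req["object"].setdefault("labels", {})
    return req

def validating_admission(req):
    # Toy quota policy standing in for validating webhooks.
    if req["object"].get("replicas", 0) > 10:
        raise Forbidden("quota exceeded")
    return req

def handle(req, store):
    """Auth -> authz -> mutate -> validate -> persist, in that order."""
    for stage in (authenticate, authorize, mutating_admission, validating_admission):
        req = stage(req)
    store[req["object"]["name"]] = req["object"]  # stand-in for the etcd write
    return req["object"]

etcd = {}
req = {"token": "abc", "verb": "create", "allowed_verbs": ["create"],
       "object": {"name": "web", "replicas": 3}}
print(handle(req, etcd))
```

The ordering matters: mutation happens before validation so that injected defaults are themselves validated, mirroring the real mutating-then-validating admission sequence.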
etcd is a distributed, consistent key-value store that serves as Kubernetes' backing store for all cluster data. Every object you create—every Pod, Service, ConfigMap, Secret—is persisted in etcd. It's not just storage; it's the foundation of Kubernetes' consistency guarantees.
Why etcd?
Kubernetes requires a storage system with very specific properties:
Strong Consistency: When the API server writes a Pod spec, that write must be immediately visible to all readers. etcd provides this through the Raft consensus algorithm.
Watch Support: Controllers need to react to changes. etcd's native watch capability allows efficient event notification without polling.
Distributed & Fault-Tolerant: etcd can tolerate node failures while maintaining consistency. A 3-node cluster survives 1 failure; a 5-node cluster survives 2.
Transactional Operations: Compare-and-swap operations enable safe concurrent updates—critical for controllers competing to update resources.
| Cluster Size | Failure Tolerance | Recommended Use Case |
|---|---|---|
| 1 node | 0 failures | Development/testing only |
| 3 nodes | 1 failure | Small production clusters |
| 5 nodes | 2 failures | Large production clusters requiring higher availability |
| 7 nodes | 3 failures | Rarely needed; additional nodes add consensus overhead |
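The failure tolerances in this table follow from simple majority arithmetic, sketched here in Python (a generic illustration of Raft quorum math, not etcd's actual code):

```python
def quorum(cluster_size: int) -> int:
    """Raft requires a strict majority of members to commit a write."""
    return cluster_size // 2 + 1

def failure_tolerance(cluster_size: int) -> int:
    """How many members can fail while a majority remains reachable."""
    return cluster_size - quorum(cluster_size)

for n in (1, 3, 4, 5, 7):
    print(f"{n}-node cluster: quorum={quorum(n)}, tolerates {failure_tolerance(n)} failure(s)")
```

Note that even sizes buy nothing: a 4-node cluster still tolerates only 1 failure while adding consensus overhead, which is why odd cluster sizes are the standard recommendation.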
Data Organization in etcd:
Kubernetes organizes data in etcd using a hierarchical key structure:
/registry/pods/<namespace>/<pod-name>
/registry/services/<namespace>/<service-name>
/registry/secrets/<namespace>/<secret-name>
/registry/deployments/<namespace>/<deployment-name>
This organization enables efficient prefix-based watches (watch all pods in a namespace) and range queries.
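To get a feel for why this key layout helps, here is a toy model using an ordered in-memory dict standing in for etcd's keyspace (the keys and helper are illustrative only, not how the API server actually queries etcd):

```python
# Toy stand-in for etcd's ordered keyspace: a prefix scan over sorted keys
# emulates how "list all pods in a namespace" becomes a single range read.
store = {
    "/registry/pods/default/web-1": "pod-spec-1",
    "/registry/pods/default/web-2": "pod-spec-2",
    "/registry/pods/kube-system/coredns-abc": "pod-spec-3",
    "/registry/services/default/web": "svc-spec",
}

def list_by_prefix(store, prefix):
    """Range read: every key under a prefix, like `etcdctl get --prefix`."""
    return {k: v for k, v in sorted(store.items()) if k.startswith(prefix)}

print(list_by_prefix(store, "/registry/pods/default/"))
```

Because keys sharing a prefix are adjacent in sorted order, both range reads and prefix-based watches touch one contiguous slice of the keyspace instead of scanning everything.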
The Raft Consensus Algorithm:
etcd uses Raft for distributed consensus. Here's how it works at a high level:
Losing etcd data means losing your entire cluster configuration. Regular etcd snapshots are essential. Many production incidents have been caused by etcd corruption or loss. Use `etcdctl snapshot save` regularly and store backups off-cluster.
Performance Tuning Considerations:
The kube-scheduler watches for newly created Pods that have no assigned node and selects an appropriate node for them to run on. This is a non-trivial problem—the scheduler must consider resource requirements, affinity/anti-affinity rules, taints and tolerations, data locality, and many other factors.
The Scheduling Process:
Scheduling happens in two phases:
1. Filtering Phase (Predicates) The scheduler filters out nodes that cannot run the Pod:
2. Scoring Phase (Priorities) Among feasible nodes, the scheduler scores each to find the optimal placement:
Scheduling Decision Flow:

```
Pod Created (no nodeName) ──► Scheduler Picks Up
                                     │
                                     ▼
                         ┌─────────────────────┐
                         │   FILTERING PHASE   │
                         └─────────────────────┘
                                     │
       ┌─────────────────────────────┼─────────────────────────────┐
       ▼                             ▼                             ▼
 Node Affinity?                Resource Fit?                 Tolerations?
       │                             │                             │
       └─────────────────────────────┼─────────────────────────────┘
                                     ▼
                   Feasible Nodes (pass all filters)
                                     │
                                     ▼
                         ┌─────────────────────┐
                         │    SCORING PHASE    │
                         └─────────────────────┘
                                     │
       ┌─────────────────────────────┼─────────────────────────────┐
       ▼                             ▼                             ▼
 Resource Balance             Spreading Score              Locality Score
 (LeastRequested)           (Pod Anti-Affinity)             (Volume Zone)
       │                             │                             │
       └─────────────────────────────┼─────────────────────────────┘
                                     ▼
                Final Score = Σ (weight × individual_score)
                                     │
                                     ▼
                     Node with Highest Score Wins
                                     │
                                     ▼
                   Bind Pod to Node (write to etcd)
```

Scheduler Extensibility:
The default scheduler can be extended through several mechanisms:
Common Scheduling Failures:
| Failure Reason | Typical Cause | Resolution |
|---|---|---|
| Insufficient cpu/memory | Resource requests exceed available capacity | Add nodes, reduce requests, or use cluster autoscaler |
| No nodes match node selector | Labels mismatch | Correct labels or update selector |
| Taints not tolerated | Node tainted without matching toleration | Add toleration or remove taint |
| Volume zone conflict | Volume in different zone than viable nodes | Create volume in correct zone |
In large clusters (thousands of nodes), scheduler performance becomes critical. The default scheduler uses techniques like node scoring caching and parallel evaluation. For extremely large clusters, consider running multiple schedulers or using scheduling policies to limit the search space.
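The filter-then-score flow can be sketched in a few lines of Python. The node properties, the taint/toleration check, and the least-loaded scoring rule below are simplified stand-ins for the real scheduler framework's plugins, not its actual implementation:

```python
def schedule(pod, nodes):
    # Filtering phase: drop nodes that cannot run the Pod at all.
    feasible = [
        n for n in nodes
        if n["free_cpu"] >= pod["cpu"]
        and n["free_mem"] >= pod["mem"]
        and all(t in pod["tolerations"] for t in n["taints"])
    ]
    if not feasible:
        return None  # Pod stays Pending with a FailedScheduling event

    # Scoring phase: prefer the node with the most headroom left after
    # placement (a rough stand-in for the LeastRequested priority).
    def score(n):
        return (n["free_cpu"] - pod["cpu"]) + (n["free_mem"] - pod["mem"])

    return max(feasible, key=score)["name"]

nodes = [
    {"name": "node-a", "free_cpu": 2,  "free_mem": 4,  "taints": []},
    {"name": "node-b", "free_cpu": 8,  "free_mem": 16, "taints": []},
    {"name": "node-c", "free_cpu": 16, "free_mem": 32, "taints": ["gpu-only"]},
]
pod = {"cpu": 1, "mem": 2, "tolerations": []}
print(schedule(pod, nodes))  # prints "node-b": node-c is filtered out by its taint
```

Note how filtering is a hard gate (node-c has the most capacity but is excluded) while scoring only ranks the survivors; this two-phase split is exactly what keeps constraints and preferences separate.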
The kube-controller-manager runs a collection of controllers that watch the cluster state and work to move the current state toward the desired state. This is the heart of Kubernetes' declarative model—you specify what you want, and controllers make it happen.
The Controller Pattern:
Every controller follows the same pattern:
Key Controllers Bundled in kube-controller-manager:
| Controller | Watches | Manages | Key Behavior |
|---|---|---|---|
| ReplicaSet Controller | ReplicaSets, Pods | Pod count | Creates/deletes Pods to match desired replica count |
| Deployment Controller | Deployments, ReplicaSets | ReplicaSet versions | Manages rolling updates, rollbacks, and revision history |
| Node Controller | Nodes | Node health | Marks nodes as unhealthy, evicts Pods from dead nodes |
| Service Account Controller | Namespaces | ServiceAccounts | Creates default service account in new namespaces |
| Endpoint Controller | Services, Pods | Endpoints | Updates endpoint lists as Pods come and go |
| Job Controller | Jobs, Pods | Job completions | Creates Pods for Jobs, tracks completion/failure |
| Namespace Controller | Namespaces | Namespace deletion | Cleans up all resources when namespace deleted |
| PV/PVC Controller | PersistentVolumes, PersistentVolumeClaims | Volume binding | Matches claims to volumes, handles reclaim |
Reconciliation in Action—ReplicaSet Example:
Consider a ReplicaSet with `replicas: 3`:
This loop runs continuously. If a Pod dies, the controller creates a replacement. If you scale to 5, the controller creates 2 more. The desired state is always maintained.
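The reconciliation logic above can be sketched generically. This is a simplified model of one level-triggered reconcile pass, not the actual ReplicaSet controller code; the pod dicts and action tuples are illustrative assumptions:

```python
def reconcile(desired_replicas, current_pods):
    """One pass of a level-triggered reconcile: compare desired state
    against observed state and return the actions needed to converge."""
    running = [p for p in current_pods if p["phase"] == "Running"]
    diff = desired_replicas - len(running)
    if diff > 0:
        return [("create", None)] * diff           # too few: create the gap
    if diff < 0:
        return [("delete", p["name"]) for p in running[:-diff]]  # too many
    return []                                      # converged: nothing to do

pods = [{"name": "web-1", "phase": "Running"},
        {"name": "web-2", "phase": "Failed"}]
print(reconcile(3, pods))  # two creates: only one Pod is actually Running
print(reconcile(0, pods))  # one delete: scale to zero
```

Because the function looks only at current state, it gives the same answer whether it missed ten events or none, which is the property that makes level-triggered controllers resilient.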
Leader Election:
In HA setups with multiple control plane nodes, only one instance of kube-controller-manager actively runs controllers—the leader. Others are on standby. If the leader fails, another instance acquires the leader lock and takes over. This prevents conflicting actions from multiple controllers.
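Leader election rests on an atomic compare-and-swap over a shared lock record. Here is a toy in-process model of that claim/standby/takeover sequence; real implementations use Kubernetes Lease objects with renew deadlines, and the class and names below are illustrative assumptions:

```python
import threading

class LeaderLock:
    """Toy compare-and-swap lock modeling a leader-election record."""
    def __init__(self):
        self._holder = None
        self._mutex = threading.Lock()  # stands in for the store's atomicity

    def try_acquire(self, candidate):
        with self._mutex:
            if self._holder is None:      # compare: is the record unclaimed?
                self._holder = candidate  # swap: claim it atomically
                return True
            return False                  # someone else leads; stay on standby

    def release(self):
        with self._mutex:
            self._holder = None

lock = LeaderLock()
print(lock.try_acquire("cm-0"))  # True: first instance becomes leader
print(lock.try_acquire("cm-1"))  # False: second instance stays on standby
lock.release()                   # leader fails / lets its lease lapse
print(lock.try_acquire("cm-1"))  # True: standby takes over
```

The atomicity of the compare-and-swap is what guarantees at most one active leader, and therefore that no two controller-manager instances fight over the same resources.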
Controllers don't guarantee instant state convergence. They operate asynchronously in a level-triggered manner—they reconcile based on the current state, not individual events. This makes them resilient to missed events but means there's always some lag between desired state change and actual state convergence.
The cloud-controller-manager (CCM) contains controllers that interact with cloud provider APIs. This component was extracted from kube-controller-manager to allow cloud providers to develop their integration at their own pace without being tied to Kubernetes release cycles.
Controllers in the CCM:
Node Controller (Cloud Portion)
Route Controller
Service Controller
- Provisions cloud load balancers for `type: LoadBalancer` services

Cloud Controller Manager in Action:

```
Service type: LoadBalancer created in API
        │
        ▼
Service Controller (in CCM) watches
        │
        ▼
CCM calls cloud provider API to create LB
(e.g., CreateLoadBalancer on AWS ELB/NLB, GCP GCLB)
        │
        ▼
Cloud provider provisions load balancer, returns IP
        │
        ▼
CCM updates Service.status.loadBalancer.ingress
with external IP/hostname
        │
        ▼
Traffic flows: Internet → Cloud LB → NodePort → Pod
```

Provider Implementations:
Each major cloud provider maintains their own CCM:
Running Without CCM:
On bare-metal or on-premises deployments, you typically don't run a CCM. Without it:
- `type: LoadBalancer` services remain stuck in the `Pending` state (use MetalLB or similar for bare-metal LB)

In managed Kubernetes services (EKS, GKE, AKS), the CCM is pre-configured and managed for you. You simply create a LoadBalancer service, and a real cloud load balancer appears. Understanding CCM matters when troubleshooting cloud integration issues or running self-managed clusters.
The kubelet is the primary node agent—it runs on every worker node and is responsible for ensuring containers are running as specified. It's the component that actually makes Pods come to life.
Core Responsibilities:
Pod Lifecycle Management
Volume Management
Resource Enforcement
Health Monitoring
kubelet Pod Lifecycle:

```
1. Pod Scheduled to Node (nodeName set by scheduler)
        │
        ▼
2. kubelet Sees Pod (watching pods assigned to its node)
        │
        ▼
3. Volume Setup
   • Mount ConfigMaps, Secrets, PVCs
   • Wait for volume attachment (if cloud volumes)
        │
        ▼
4. Image Pull
   • Check local cache
   • Pull from registry if needed (using imagePullSecrets)
        │
        ▼
5. Container Creation
   • Create sandbox (pause container for networking)
   • Create app containers via Container Runtime Interface
   • Apply cgroup limits, security contexts
        │
        ▼
6. Container Start
   • Start containers in order (init containers first)
   • Execute postStart lifecycle hooks
        │
        ▼
7. Probe Execution (continuous)
   • Startup probe → Liveness probe → Readiness probe
   • Update container status based on results
        │
        ▼
8. Status Reporting
   • Update Pod status in API server
   • Report node conditions and capacity
```

Container Runtime Interface (CRI):
kubelet doesn't run containers directly—it delegates to a container runtime via CRI. This abstraction allows swapping runtimes:
| Runtime | Description | Use Case |
|---|---|---|
| containerd | Industry-standard runtime, graduated CNCF project | Default for most distributions |
| CRI-O | Lightweight runtime optimized for Kubernetes | OpenShift, minimalist setups |
| Docker (via cri-dockerd) | Docker Engine with CRI shim | Legacy compatibility |
Pod Eviction:
When node resources are exhausted, kubelet evicts Pods based on priority:
If kubelet fails, the node stops reporting status. After the `node-monitor-grace-period` (default 40s), the node is marked `NotReady`. After the `pod-eviction-timeout` (default 5m), Pods are evicted to other nodes. Monitor kubelet health carefully—a silent kubelet failure can leave you with phantom nodes.
kube-proxy runs on every node and implements the Kubernetes Service concept. When you create a Service, kube-proxy configures the node's networking rules to direct traffic to the appropriate Pods.
What kube-proxy Does:
Operating Modes:
kube-proxy can operate in different modes with significant performance implications:
| Mode | Mechanism | Performance | When to Use |
|---|---|---|---|
| iptables | Linux iptables rules for NAT/forwarding | Good for <1000 services | Default, widely compatible |
| IPVS | Linux IPVS (IP Virtual Server) for load balancing | Better scalability, O(1) lookup | Large clusters with many services |
| userspace (legacy) | Traffic proxied through userspace process | Poor, high latency | Deprecated, avoid |
How Service Traffic Flows (iptables mode):
When a Pod accesses a ClusterIP service:
1. The Pod sends a packet to `10.96.0.1:80` (the service ClusterIP)
2. iptables DNAT rules rewrite the destination to a selected backend Pod IP (e.g., `10.244.1.5:8080`)
3. The rewritten packet is routed to that Pod, whether it runs on this node or another

IPVS Mode Advantages:
IPVS uses hash tables for O(1) rule lookup versus O(n) iptables chain traversal:
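The complexity difference can be illustrated abstractly: a linear scan stands in for iptables chain traversal, and a dict lookup for IPVS's kernel hash table. This is a conceptual sketch of the lookup cost only, not real datapath code:

```python
# iptables mode: each Service adds rules to a chain that packets
# traverse linearly -> O(n) in the number of Services.
rules = [("10.96.0.%d" % i, "backend-%d" % i) for i in range(1000)]

def iptables_lookup(dst):
    for ip, backend in rules:   # linear chain traversal
        if ip == dst:
            return backend
    return None                 # no matching rule

# IPVS mode: services live in a hash table -> O(1) average lookup.
ipvs_table = dict(rules)

def ipvs_lookup(dst):
    return ipvs_table.get(dst)

# Same answer either way; the cost of finding it is what differs.
assert iptables_lookup("10.96.0.999") == ipvs_lookup("10.96.0.999") == "backend-999"
```

With 1,000 services the worst-case iptables path checks 1,000 rules per packet while the hash lookup stays constant, which is why IPVS scales better as service counts grow.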
Some CNI plugins (Cilium, Calico eBPF mode) can replace kube-proxy entirely. They implement service load balancing using eBPF for even better performance and additional features like session affinity and native load balancing without NAT.
The container runtime is the software responsible for actually running containers. While often overlooked, choosing and understanding your container runtime affects security, performance, and compatibility.
The OCI Standard:
The Open Container Initiative (OCI) defines standards for container images and runtimes:
OCI compliance means different runtimes are interchangeable at the low level.
Runtime Layers:
Container runtimes exist in layers:
kubelet
↓ (CRI - Container Runtime Interface)
High-Level Runtime (containerd, CRI-O)
↓ (OCI Runtime Spec)
Low-Level Runtime (runc, crun, kata-runtime)
↓
Linux Kernel (namespaces, cgroups, seccomp)
| Runtime | Type | Key Characteristics | Best For |
|---|---|---|---|
| containerd | High-level | CNCF graduated, Docker-extracted, production-proven | General Kubernetes deployments |
| CRI-O | High-level | Kubernetes-focused, minimal, stable | OpenShift, security-focused deployments |
| runc | Low-level (OCI) | Reference implementation, standard Linux containers | Default low-level runtime |
| crun | Low-level (OCI) | Written in C, faster startup than runc | Performance-sensitive workloads |
| gVisor (runsc) | Low-level (OCI) | User-space kernel for isolation | Untrusted/multi-tenant workloads |
| Kata Containers | Low-level (OCI) | Lightweight VMs for containers | Strong isolation requirements |
containerd Architecture:
containerd (the most common high-level runtime) provides:
Security Runtimes:
For enhanced isolation (multi-tenant clusters, untrusted code), specialized runtimes provide stronger boundaries:
Kubernetes RuntimeClass lets you specify different runtimes per Pod. Run trusted workloads with runc for performance, untrusted workloads with gVisor for security—all in the same cluster.
Now that we understand each component individually, let's trace a complete workflow to see how they work together. We'll follow the lifecycle of a Deployment from creation to running Pods.
Step-by-Step: Creating a Deployment
Complete Workflow: `kubectl apply -f deployment.yaml`

```
 1. kubectl sends Deployment to API Server
    POST /apis/apps/v1/namespaces/default/deployments
         │
         ▼
 2. API Server authenticates, authorizes, runs admission
    webhooks, then persists Deployment to etcd
         │
         ▼
 3. Deployment Controller sees new Deployment (via watch)
    Creates ReplicaSet with pod template
         │
         ▼
 4. ReplicaSet Controller sees new ReplicaSet
    Creates N Pod objects (pods have no nodeName yet)
         │
         ▼
 5. Scheduler sees unscheduled Pods
    Runs filter/score, assigns each Pod to a node
         │
         ▼
 6. kubelet on each assigned node sees Pod
    Pulls images, mounts volumes, creates containers
         │
         ▼
 7. Container Runtime (containerd) runs containers
    OS-level isolation via namespaces, cgroups
         │
         ▼
 8. kubelet updates Pod status → API Server → etcd
    Pod shows as Running
         │
         ▼
 9. Endpoint Controller sees Running Pods
    Updates Endpoints for Services selecting these Pods
         │
         ▼
10. kube-proxy sees updated Endpoints
    Updates iptables/IPVS rules on all nodes
         │
         ▼
RESULT: Traffic to Service ClusterIP reaches Pods 🎉
```

Key Observations:
etcd is touched only by the API server: All other components communicate via API server watches
Controllers are event-driven but level-triggered: They react to changes but always reconcile based on current state, not event history
No component directly calls another: Communication is via shared state in etcd, accessed through the API server
Asynchronous by design: Each controller operates independently; there's no synchronous orchestration
Self-healing emerges from reconciliation: If a Pod dies, ReplicaSet controller notices and creates a replacement—no central coordinator needed
We've explored every core component of the Kubernetes architecture. Let's consolidate the key takeaways:
What's Next:
Now that you understand the components, the next page dives into the core Kubernetes objects—Pods, Deployments, and Services. You'll see how these abstractions build on the component architecture to provide powerful application management capabilities.
You now have a comprehensive understanding of Kubernetes component architecture. This foundation will inform your ability to debug cluster issues, design resilient deployments, and make informed decisions about Kubernetes configurations. Next, we'll explore how Pods, Deployments, and Services work together.