Provisioning storage is just the beginning. Production environments require sophisticated patterns for data protection, recovery, migration, and optimization. The difference between a development cluster and a production-ready system often lies in how data persistence is handled.
This page covers the patterns that transform basic Kubernetes storage into a production-grade data platform: backup and restore strategies, disaster recovery architectures, data migration techniques, storage tiering, local volume optimization, and operational best practices that protect your most critical asset.
By the end of this page, you will understand backup and restore patterns for Kubernetes data, disaster recovery architectures, data migration strategies, storage tiering approaches, local volume patterns for high performance, and operational patterns for production data management.
Kubernetes backup strategies must address multiple layers: cluster state, application configuration, and persistent data. Each layer requires different approaches and tools.
Backup layers:
| Layer | What to Backup | Tools/Methods | Frequency |
|---|---|---|---|
| Cluster State | etcd, control plane configs | etcdctl snapshot, kubeadm | Daily + before upgrades |
| Kubernetes Objects | Deployments, Services, ConfigMaps, Secrets | kubectl, Velero, Kasten | After changes, hourly |
| Application Config | Helm values, GitOps manifests | Git repository | Continuous (version control) |
| Persistent Data | PVC contents, databases | Volume snapshots, app-native backup | Hourly/daily (RPO-dependent) |
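For the cluster-state layer, an etcd snapshot can be automated in-cluster. A minimal sketch, assuming a kubeadm-style control plane with etcd serving on localhost and certificates under `/etc/kubernetes/pki/etcd`; the schedule, image tag, node label, and backup path are illustrative assumptions:

```yaml
# Sketch: daily etcd snapshot via a CronJob pinned to a control-plane node
# Assumes kubeadm defaults; adjust cert paths and tolerations for your cluster
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: "0 2 * * *"  # Daily at 02:00
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          tolerations:
            - key: node-role.kubernetes.io/control-plane
              operator: Exists
              effect: NoSchedule
          hostNetwork: true
          containers:
            - name: etcd-backup
              image: registry.k8s.io/etcd:3.5.9-0  # Ships with etcdctl
              command:
                - /bin/sh
                - -c
                - |
                  ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-$(date +%Y%m%d).db \
                    --endpoints=https://127.0.0.1:2379 \
                    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
                    --cert=/etc/kubernetes/pki/etcd/server.crt \
                    --key=/etc/kubernetes/pki/etcd/server.key
              volumeMounts:
                - name: etcd-certs
                  mountPath: /etc/kubernetes/pki/etcd
                  readOnly: true
                - name: backup
                  mountPath: /backup
          restartPolicy: OnFailure
          volumes:
            - name: etcd-certs
              hostPath:
                path: /etc/kubernetes/pki/etcd
            - name: backup
              hostPath:
                path: /var/backups/etcd  # Ship off-node (e.g. to S3) in practice
```

A snapshot on the node itself is only step one; copy it off-node so a control-plane disk failure does not take the backup with it.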
Volume snapshot-based backup:
Volume snapshots are the most straightforward approach for backing up PVCs:
```yaml
# Comprehensive backup with Velero (backup application + data)
# Install Velero first: velero install --provider <cloud> ...

# Backup policy: namespace with all resources and PVCs
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: production-backup
  namespace: velero
spec:
  # What to backup
  includedNamespaces:
    - production
    - payment-service
  includedResources:
    - '*'
  # Exclude temporary resources
  excludedResources:
    - events
    - events.events.k8s.io
  # Snapshot PVCs using the volume snapshotter
  snapshotVolumes: true
  # Label selector for specific pods
  labelSelector:
    matchLabels:
      backup: enabled
  # Time-to-live for the backup
  ttl: 720h  # 30 days
  # Storage location
  storageLocation: default
---
# Scheduled backup (hourly)
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: hourly-backup
  namespace: velero
spec:
  schedule: "0 * * * *"  # Every hour
  template:
    includedNamespaces:
      - production
    snapshotVolumes: true
    ttl: 168h  # 7 days
    storageLocation: default
---
# Cross-region backup location
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: dr-region
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: velero-backups-dr
    prefix: production
  config:
    region: us-west-2  # Different from primary (us-east-1)
---
# Volume snapshot location for DR
apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: dr-snapshots
  namespace: velero
spec:
  provider: aws
  config:
    region: us-west-2
```

Volume snapshots are crash-consistent, not application-consistent. For databases, use application-native backup tools (pg_dump, mysqldump, mongodump) or freeze I/O before snapshots. Velero supports pre-backup hooks for this purpose.
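The restore half is a Velero `Restore` object that references a completed backup. A minimal sketch; the backup name and the target namespace in the mapping are illustrative:

```yaml
# Restore the production namespace from a named Velero backup
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: production-restore
  namespace: velero
spec:
  backupName: production-backup  # Must match an existing Backup
  includedNamespaces:
    - production
  restorePVs: true  # Recreate PVs from volume snapshots
  # Optionally restore into a different namespace:
  namespaceMapping:
    production: production-restored
```

Restoring into a mapped namespace is a safe way to rehearse recovery without touching the live workload.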
Application-native backup patterns:
For databases and stateful applications, application-native backups provide stronger consistency guarantees:
```yaml
# CronJob for PostgreSQL backup to S3
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
  namespace: production
spec:
  schedule: "0 */6 * * *"  # Every 6 hours
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: postgres:15
              env:
                - name: PGHOST
                  value: "postgresql.production.svc.cluster.local"
                - name: PGUSER
                  valueFrom:
                    secretKeyRef:
                      name: postgres-credentials
                      key: username
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: postgres-credentials
                      key: password
                - name: AWS_ACCESS_KEY_ID
                  valueFrom:
                    secretKeyRef:
                      name: aws-s3-credentials
                      key: access-key
                - name: AWS_SECRET_ACCESS_KEY
                  valueFrom:
                    secretKeyRef:
                      name: aws-s3-credentials
                      key: secret-key
              command:
                - /bin/bash
                - -c
                - |
                  set -e
                  TIMESTAMP=$(date +%Y%m%d_%H%M%S)
                  BACKUP_FILE="postgres_backup_$TIMESTAMP.sql.gz"
                  echo "Starting PostgreSQL backup..."
                  pg_dumpall | gzip > /tmp/$BACKUP_FILE
                  echo "Uploading to S3..."
                  apt-get update && apt-get install -y awscli
                  aws s3 cp /tmp/$BACKUP_FILE s3://database-backups/postgres/$BACKUP_FILE
                  echo "Backup completed: $BACKUP_FILE"
          restartPolicy: OnFailure
---
# Velero pre-backup hook for a consistent MySQL snapshot
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
  namespace: production
spec:
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
      annotations:
        # Velero reads backup hooks from the *pod* template annotations
        pre.hook.backup.velero.io/container: mysql
        pre.hook.backup.velero.io/command: '["/bin/sh", "-c", "mysql -u root -p$MYSQL_ROOT_PASSWORD -e \"FLUSH TABLES WITH READ LOCK; SYSTEM sleep 5;\""]'
        pre.hook.backup.velero.io/timeout: 30s
        post.hook.backup.velero.io/container: mysql
        post.hook.backup.velero.io/command: '["/bin/sh", "-c", "mysql -u root -p$MYSQL_ROOT_PASSWORD -e \"UNLOCK TABLES;\""]'
    spec:
      # ... pod spec (mysql container, volumes)
```

Disaster recovery (DR) for Kubernetes storage requires planning for failure scenarios ranging from individual node failures to complete region outages. DR strategies are characterized by two key metrics: RPO (Recovery Point Objective), the maximum acceptable data loss measured backward in time from the failure, and RTO (Recovery Time Objective), the maximum acceptable time to restore service.
DR strategies by RPO/RTO:
| Strategy | RPO | RTO | Cost | Complexity |
|---|---|---|---|---|
| Backup & Restore | Hours | Hours | Low | Low |
| Pilot Light | Minutes-Hours | Minutes | Medium | Medium |
| Warm Standby | Minutes | Minutes | High | High |
| Active-Active | Zero/Near-zero | Seconds | Very High | Very High |
Pattern 1: Backup and Restore (Cold DR)
The simplest approach: keep backups in the DR region and restore on demand when disaster strikes.
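A sketch of the cold-DR loop, assuming Velero is configured with the cross-region `dr-region` backup location from the Velero example above; the schedule name and TTL are illustrative:

```yaml
# Daily backup written directly to the DR region's storage location
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-dr-backup
  namespace: velero
spec:
  schedule: "0 3 * * *"  # Daily at 03:00 - RPO is up to 24h
  template:
    includedNamespaces:
      - production
    snapshotVolumes: true
    storageLocation: dr-region  # Cross-region BackupStorageLocation
    volumeSnapshotLocations:
      - dr-snapshots
    ttl: 720h
---
# On disaster: build a fresh cluster in the DR region, install Velero
# pointed at the same bucket, then restore:
#   velero restore create --from-backup daily-dr-backup-<timestamp>
# RTO is dominated by cluster build + restore time (hours, not minutes)
```

The low cost of this pattern comes directly from accepting the long RTO: nothing runs in the DR region until it is needed.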
Pattern 2: Pilot Light (Warm Infrastructure)
Core infrastructure running in DR region, scaled down, ready to activate:
```yaml
# DR region: minimal cluster with critical data sync
# Primary region: us-east-1, DR region: us-west-2

# StatefulSet in DR region - scaled to 0, ready to activate
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: database
  namespace: production
  annotations:
    dr.example.com/role: standby
spec:
  replicas: 0  # Scaled down - no pods running
  serviceName: database
  selector:
    matchLabels:
      app: database
  template:
    metadata:
      labels:
        app: database
    spec:
      containers:
        - name: postgres
          image: postgres:15
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 100Gi
---
# Data sync - replicate snapshots to the DR region
# AWS example: cross-region EBS snapshot copy
apiVersion: batch/v1
kind: CronJob
metadata:
  name: snapshot-replicator
  namespace: velero
spec:
  schedule: "*/15 * * * *"  # Every 15 minutes
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: replicate
              image: amazon/aws-cli:latest
              env:
                - name: SOURCE_REGION
                  value: "us-east-1"
                - name: DEST_REGION
                  value: "us-west-2"
              command:
                - /bin/sh
                - -c
                - |
                  # Get the latest snapshots for production PVC volumes
                  SNAPSHOTS=$(aws ec2 describe-snapshots \
                    --region $SOURCE_REGION \
                    --filters "Name=tag:kubernetes.io/created-for/pvc/namespace,Values=production" \
                    --query 'Snapshots[*].SnapshotId' --output text)
                  for SNAP in $SNAPSHOTS; do
                    echo "Copying snapshot $SNAP to $DEST_REGION"
                    aws ec2 copy-snapshot \
                      --source-region $SOURCE_REGION \
                      --source-snapshot-id $SNAP \
                      --destination-region $DEST_REGION \
                      --description "DR copy of $SNAP"
                  done
          restartPolicy: OnFailure
---
# DR activation runbook (conceptual - trigger via CI/CD or manually)
# 1. Scale up the StatefulSet: kubectl scale sts database --replicas=3
# 2. Update DNS to the DR region
# 3. Scale up application deployments
# 4. Verify connectivity and data integrity
# 5. Monitor and validate
```

Pattern 3: Active-Active (Multi-Region)
Both regions actively serving traffic with synchronized data:
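Storage-level replication alone rarely delivers active-active; the synchronization usually lives in the database itself. A conceptual sketch using a database with built-in multi-region consensus replication (CockroachDB here is illustrative, as are the image tag and the `db-west.example.com` join address; any consensus-replicated store works), with each region running its own StatefulSet on region-local disks:

```yaml
# Region A cluster (us-east-1) - an identical StatefulSet runs in region B
# Cross-region sync is handled by the database's own consensus replication,
# not by the storage layer; each region uses fast regional volumes
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cockroachdb
  namespace: production
spec:
  serviceName: cockroachdb
  replicas: 3
  selector:
    matchLabels:
      app: cockroachdb
  template:
    metadata:
      labels:
        app: cockroachdb
    spec:
      containers:
        - name: cockroachdb
          image: cockroachdb/cockroach:v23.1.11  # Illustrative version
          command:
            - /cockroach/cockroach
            - start
            - --insecure                   # Demo only - use certificates in production
            - --locality=region=us-east-1  # Region B sets region=us-west-2
            # Join addresses span both regions (requires cross-region networking)
            - --join=cockroachdb-0.cockroachdb.production.svc.cluster.local,db-west.example.com
          volumeMounts:
            - name: data
              mountPath: /cockroach/cockroach-data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 100Gi
```

The storage layer stays simple (plain regional PVCs); the complexity and cost move into cross-region networking, latency-aware data placement, and global load balancing.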
Calculate the cost of downtime ($/hour) and of data loss ($/MB lost), then compare against DR infrastructure costs to find the right balance. Most applications don't need active-active; pilot light with good automation often achieves an acceptable RTO at much lower cost.
Data migration in Kubernetes is required for various scenarios: moving between storage classes, migrating to different cloud providers, resizing volumes, or consolidating data. Each scenario requires different approaches.
Migration scenarios:
| Scenario | Challenge | Recommended Approach |
|---|---|---|
| Storage class change | PVC storageClassName is immutable | Clone to new PVC or restore from snapshot |
| Volume resize (shrink) | Cannot shrink PVCs | Create smaller PVC, copy data, swap |
| Cross-namespace migration | PVC is namespace-scoped | Clone via snapshot, create in new namespace |
| Cross-cluster migration | No direct PVC transfer | Snapshot copy or application-level backup/restore |
| Cloud provider migration | Different storage backends | Application-level backup, object storage transfer |
Pattern 1: Clone via Volume Snapshot
The cleanest approach when changing storage class or migrating within the same cluster/cloud:
```yaml
# Step 1: Create a snapshot of the source PVC
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: migration-snapshot
  namespace: production
spec:
  volumeSnapshotClassName: ebs-snapshot-class
  source:
    persistentVolumeClaimName: old-database-data
---
# Step 2: Create a new PVC from the snapshot with a different storage class
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: new-database-data
  namespace: production
spec:
  storageClassName: fast-ssd  # New storage class
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi  # Can be larger than the original
  dataSource:
    name: migration-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
---
# Step 3: Update the workload to use the new PVC
# Option A: Direct switch (downtime)
# Option B: Blue-green deployment with the new PVC

# Step 4: After the migration is verified, delete the old PVC
# kubectl delete pvc old-database-data
```

Pattern 2: rsync-based migration
For cross-cluster or cross-cloud migrations where snapshots aren't viable:
```yaml
# Job to sync data between PVCs using rsync
# Use case: cross-cluster migration, fine-grained control
apiVersion: batch/v1
kind: Job
metadata:
  name: data-migration
  namespace: production
spec:
  template:
    spec:
      containers:
        - name: rsync
          image: instrumentisto/rsync-ssh:latest
          command:
            - /bin/sh
            - -c
            - |
              echo "Starting rsync migration..."
              # Option 1: Local rsync between mounted volumes
              rsync -avz --progress /source/ /destination/
              # Option 2: Remote rsync (if migrating to a different cluster)
              # rsync -avz --progress /source/ user@remote-host:/destination/
              echo "Migration complete. Verifying..."
              diff -r /source /destination && echo "Verified OK" || echo "MISMATCH DETECTED"
          volumeMounts:
            - name: source
              mountPath: /source
              readOnly: true
            - name: destination
              mountPath: /destination
      restartPolicy: OnFailure
      volumes:
        - name: source
          persistentVolumeClaim:
            claimName: old-pvc
        - name: destination
          persistentVolumeClaim:
            claimName: new-pvc
---
# For live migration with minimal downtime, rsync in multiple passes:
# 1. Initial sync while the application runs (can take hours for large data)
# 2. Quiesce the application (stop writes)
# 3. Final incremental sync (seconds to minutes)
# 4. Switch the application to the new PVC
# 5. Resume the application
```

Most migration patterns require application downtime during the final cutover, so plan migrations during maintenance windows. For zero downtime, use database-native replication to synchronize data, then fail over at the database level rather than the storage level.
Not all data is equal—hot data needs fast storage, while cold data can tolerate slower, cheaper options. Storage tiering optimizes costs while maintaining performance where it matters.
Tiering strategies:
```yaml
# Tiered storage class hierarchy
# Hot tier: io2 for production databases
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hot-tier
  labels:
    tier: hot
    cost-per-gb: high
provisioner: ebs.csi.aws.com
parameters:
  type: io2
  iops: "16000"
  encrypted: "true"
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
---
# Warm tier: gp3 for general workloads
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: warm-tier
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  labels:
    tier: warm
    cost-per-gb: medium
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
# Cold tier: st1 for archives and logs
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cold-tier
  labels:
    tier: cold
    cost-per-gb: low
provisioner: ebs.csi.aws.com
parameters:
  type: st1
  encrypted: "true"
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
# Example: multi-tier application deployment
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: log-processor
spec:
  # ... standard spec
  volumeClaimTemplates:
    # Hot data: active processing
    - metadata:
        name: hot-data
      spec:
        storageClassName: hot-tier
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi
    # Cold data: processed logs
    - metadata:
        name: cold-data
      spec:
        storageClassName: cold-tier
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 500Gi
---
# Resource quota per tier (prevent budget overruns)
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-tier-quotas
  namespace: production
spec:
  hard:
    # Limit hot tier to 1TB per namespace
    hot-tier.storageclass.storage.k8s.io/requests.storage: 1Ti
    # Limit warm tier to 5TB
    warm-tier.storageclass.storage.k8s.io/requests.storage: 5Ti
    # Cold tier is more flexible
    cold-tier.storageclass.storage.k8s.io/requests.storage: 20Ti
```

Some storage systems (NetApp, Portworx, Robin) offer automated data tiering, moving data between tiers based on access patterns. For cloud-native setups, consider using application logic to archive old data to object storage (S3, GCS) and keep only hot data on PVCs.
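The archive-to-object-storage approach can be as simple as a CronJob that moves files older than a threshold off the hot PVC. A sketch assuming an AWS CLI image with credentials available (e.g. via IRSA); the PVC name `hot-data`, the bucket name, and the 30-day threshold are illustrative:

```yaml
# Move files untouched for 30+ days from the hot-tier PVC to S3,
# then delete the local copy
apiVersion: batch/v1
kind: CronJob
metadata:
  name: archive-cold-data
  namespace: production
spec:
  schedule: "0 1 * * 0"  # Weekly, Sunday 01:00
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: archiver
              image: amazon/aws-cli:latest
              command:
                - /bin/sh
                - -c
                - |
                  # Find old files and ship them to S3, preserving paths
                  find /data -type f -mtime +30 | while read -r f; do
                    aws s3 cp "$f" "s3://archive-bucket/cold${f#/data}" && rm -f "$f"
                  done
              volumeMounts:
                - name: hot-data
                  mountPath: /data
          restartPolicy: OnFailure
          volumes:
            - name: hot-data
              persistentVolumeClaim:
                claimName: hot-data  # Illustrative PVC name
```

Pair this with an S3 lifecycle policy (e.g. transition to Glacier) so the archived objects themselves tier down over time.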
Local volumes use storage directly attached to nodes, bypassing network storage overhead. They offer the highest performance but sacrifice the flexibility of network-attached storage.
Local volume characteristics:

- Highest throughput and lowest latency: no network hop between pod and disk
- Pods are pinned to the node holding the volume via PV nodeAffinity
- No built-in replication: a failed node or disk means that replica's data is gone
- Static provisioning by default; an external provisioner can automate disk discovery
- Volumes cannot follow a pod to another node, so rescheduling waits on the original node
Local Persistent Volume configuration:
```yaml
# Local volume StorageClass (no provisioner - static only)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-nvme
provisioner: kubernetes.io/no-provisioner  # No dynamic provisioning
volumeBindingMode: WaitForFirstConsumer    # Essential for local volumes
---
# Local PersistentVolume for an NVMe disk
# Must be created manually per disk/node
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-nvme-node1-disk1
  labels:
    node: node-1
    disk: nvme0n1
spec:
  capacity:
    storage: 1Ti
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-nvme
  local:
    path: /mnt/nvme-disk1  # Pre-formatted and mounted
  # Critical: nodeAffinity ties the PV to a specific node
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-1
---
# PVC for local storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-local-storage
  namespace: production
spec:
  storageClassName: local-nvme
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Gi
---
# StatefulSet with local storage and pod anti-affinity
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: high-perf-database
spec:
  serviceName: database
  replicas: 3
  selector:
    matchLabels:
      app: database
  template:
    metadata:
      labels:
        app: database
    spec:
      # Schedule to nodes labeled as having local storage
      nodeSelector:
        storage.kubernetes.io/local-nvme: "true"
      # Spread across nodes (each node has its own local storage)
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: database
              topologyKey: kubernetes.io/hostname
      containers:
        - name: database
          image: scylladb/scylla:5.2
          volumeMounts:
            - name: data
              mountPath: /var/lib/scylla
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        storageClassName: local-nvme
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Ti
```

Local volume provisioner:
The sig-storage-local-static-provisioner can automate local PV creation by discovering and managing local disks:
```yaml
# Local volume provisioner ConfigMap
# Discovers disks in specified directories and creates PVs automatically
apiVersion: v1
kind: ConfigMap
metadata:
  name: local-provisioner-config
  namespace: kube-system
data:
  storageClassMap: |
    local-nvme:
      hostDir: /mnt/disks
      mountDir: /mnt/disks
      volumeMode: Filesystem
      fsType: ext4
      blockCleanerCommand:
        - "/scripts/shred.sh"
        - "2"
# The provisioner discovers disks in /mnt/disks/ and creates PVs
# Symlink format: /mnt/disks/<uniquename> -> /dev/nvme0n1p1
```

Use local volumes for latency-sensitive databases (ScyllaDB, Cassandra, Redis), applications with built-in replication that can handle node failures, caching layers where data is reconstructible, and analytics workloads that need local shuffle space. Always pair local volumes with application-level replication for HA.
Not all data needs persistence. Ephemeral storage patterns optimize for temporary data that can be regenerated or is only needed during pod lifetime.
Ephemeral volume types:
| Type | Backing | Persistence | Sharing | Use Case |
|---|---|---|---|---|
| emptyDir | Memory or disk | Pod lifetime | Same-pod containers | Scratch space, temp files |
| emptyDir (memory) | tmpfs (RAM) | Pod lifetime | Same-pod containers | In-memory cache, secrets |
| configMap | etcd → memory | Pod lifetime | Read-only | Configuration files |
| secret | etcd → memory (tmpfs) | Pod lifetime | Read-only | Credentials, certificates |
| Generic Ephemeral | StorageClass provisioned | Pod lifetime | Same-pod containers | Per-pod scratch with specific characteristics |
```yaml
# Pattern 1: emptyDir for scratch space
apiVersion: v1
kind: Pod
metadata:
  name: data-processor
spec:
  containers:
    - name: processor
      image: processor:v1
      volumeMounts:
        # Disk-backed scratch space
        - name: scratch
          mountPath: /scratch
        # Memory-backed for fastest access
        - name: cache
          mountPath: /cache
  volumes:
    # Standard emptyDir (uses node disk)
    - name: scratch
      emptyDir:
        sizeLimit: 5Gi  # Limit to prevent node resource exhaustion
    # Memory-backed emptyDir (tmpfs)
    - name: cache
      emptyDir:
        medium: Memory
        sizeLimit: 512Mi  # Counts against the container memory limit
---
# Pattern 2: Sidecar communication via emptyDir
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  containers:
    # Main application
    - name: app
      image: myapp:v1
      volumeMounts:
        - name: shared
          mountPath: /shared
    # Log shipper sidecar
    - name: log-shipper
      image: fluent/fluent-bit:latest
      volumeMounts:
        - name: shared
          mountPath: /logs
          readOnly: true
  volumes:
    - name: shared
      emptyDir: {}
---
# Pattern 3: Generic ephemeral volume (CSI-provisioned per-pod storage)
# Useful when you need StorageClass features but no persistence
apiVersion: v1
kind: Pod
metadata:
  name: ml-training
spec:
  containers:
    - name: trainer
      image: tensorflow/tensorflow:latest-gpu
      volumeMounts:
        - name: training-scratch
          mountPath: /data/scratch
  volumes:
    - name: training-scratch
      ephemeral:
        volumeClaimTemplate:
          metadata:
            labels:
              type: ephemeral-scratch
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: fast-ssd
            resources:
              requests:
                storage: 500Gi
  # The PVC is auto-created with the pod, auto-deleted when the pod terminates
---
# Pattern 4: Resource limits for ephemeral storage
apiVersion: v1
kind: Pod
metadata:
  name: bounded-ephemeral
spec:
  containers:
    - name: app
      image: myapp:v1
      volumeMounts:
        - name: scratch
          mountPath: /scratch
      resources:
        requests:
          # Ephemeral-storage requests cover emptyDir + the container writable layer
          ephemeral-storage: 2Gi
        limits:
          ephemeral-storage: 5Gi  # Pod is evicted if it exceeds this
  volumes:
    - name: scratch
      emptyDir: {}
```

Kubernetes can evict pods that exceed ephemeral storage limits or when nodes come under disk pressure. Always set ephemeral-storage resource limits and monitor the kubelet's imagefs and nodefs pressure signals.
Proactive monitoring prevents storage-related outages. Key metrics, alerts, and dashboards are essential for production storage operations.
Critical storage metrics:
- `kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes`: alert at 80%, critical at 90%
- `kubelet_volume_stats_inodes_used / kubelet_volume_stats_inodes`: alert at 80%; many small files can exhaust inodes before capacity
- `kube_persistentvolumeclaim_status_phase`: alert on Pending > 5 minutes, on Lost immediately
- `kube_persistentvolume_status_phase`: alert on Failed, and on Released without an action plan

```yaml
# Prometheus alerting rules for Kubernetes storage
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: storage-alerts
  namespace: monitoring
spec:
  groups:
    - name: kubernetes-storage
      rules:
        # Capacity alerts
        - alert: VolumeAlmostFull
          expr: |
            (kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes) > 0.80
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Volume {{ $labels.persistentvolumeclaim }} is over 80% full"
            description: "Volume in namespace {{ $labels.namespace }} is {{ humanizePercentage $value }} full"
            runbook_url: "https://runbooks.example.com/volume-almost-full"
        - alert: VolumeCriticallyFull
          expr: |
            (kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes) > 0.90
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "CRITICAL: Volume {{ $labels.persistentvolumeclaim }} is over 90% full"
        # Inode alerts
        - alert: VolumeInodesExhausting
          expr: |
            (kubelet_volume_stats_inodes_used / kubelet_volume_stats_inodes) > 0.80
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Volume {{ $labels.persistentvolumeclaim }} inodes over 80%"
        # PVC issues
        - alert: PVCPendingTooLong
          expr: |
            kube_persistentvolumeclaim_status_phase{phase="Pending"} == 1
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "PVC {{ $labels.persistentvolumeclaim }} stuck in Pending"
            description: "Check StorageClass provisioner, quotas, and cloud API limits"
        - alert: PVCLost
          expr: |
            kube_persistentvolumeclaim_status_phase{phase="Lost"} == 1
          for: 1m
          labels:
            severity: critical
          annotations:
            summary: "CRITICAL: PVC {{ $labels.persistentvolumeclaim }} is LOST"
            description: "Underlying PV is no longer available. Data may be lost."
        # PV issues
        - alert: PVFailed
          expr: |
            kube_persistentvolume_status_phase{phase="Failed"} == 1
          for: 1m
          labels:
            severity: critical
          annotations:
            summary: "CRITICAL: PV {{ $labels.persistentvolume }} is FAILED"
        # Orphaned PVCs (bound but not used by any pod)
        - alert: OrphanedPVC
          expr: |
            kube_persistentvolumeclaim_status_phase{phase="Bound"} == 1
            unless on(namespace, persistentvolumeclaim)
            kube_pod_spec_volumes_persistentvolumeclaims_info
          for: 24h
          labels:
            severity: info
          annotations:
            summary: "PVC {{ $labels.persistentvolumeclaim }} is bound but unused for 24h"
            description: "Consider cleanup if no longer needed to save costs"
```

Data persistence patterns transform basic Kubernetes storage into production-grade infrastructure. These patterns address the full lifecycle of data: protection, recovery, migration, optimization, and operational visibility.
Module complete:
You have now completed the Kubernetes Storage module. You understand the full stack, from Persistent Volumes through Storage Classes, StatefulSet storage patterns, cloud provider integration, and production data persistence patterns. With this foundation you can design, deploy, and operate storage architectures that meet the performance, availability, and cost requirements of enterprise workloads.