Kubernetes storage abstractions—Persistent Volumes, Storage Classes, and CSI drivers—are designed to be cloud-agnostic. However, understanding the specific behaviors, constraints, and capabilities of each cloud provider's storage offerings is essential for production deployments.
Each cloud provider offers distinct storage services: AWS provides EBS and EFS, Google Cloud provides Persistent Disk and Filestore, and Azure provides Managed Disks and Azure Files.
These services differ in performance characteristics, availability models, pricing structures, and operational behaviors. This page explores how to effectively integrate these cloud storage services with Kubernetes for reliable, performant storage.
By the end of this page, you will understand AWS EBS integration with Kubernetes, Google Persistent Disk configurations, Azure Managed Disk options, cloud file storage services (EFS, Filestore, Azure Files), CSI driver installation and configuration, cross-AZ and cross-region considerations, and cloud-specific best practices.
Amazon Elastic Block Store (EBS) is the primary block storage service for AWS EC2 instances and the most common storage backend for Kubernetes on AWS (EKS or self-managed).
EBS fundamentals:
| Volume Type | Use Case | IOPS | Throughput | Cost |
|---|---|---|---|---|
| gp3 | General purpose, balanced workloads | 3,000-16,000 | 125-1,000 MB/s | Lowest $/GB |
| gp2 | Legacy general purpose | 100-16,000 (burst) | 128-250 MB/s | Higher than gp3 |
| io2/io2 Block Express | Critical databases, high IOPS | 64,000-256,000 | 1,000-4,000 MB/s | Highest |
| st1 | Big data, log processing | 500 (baseline) | Up to 500 MB/s (burst) | Low $/GB |
| sc1 | Cold data, infrequent access | 250 (baseline) | Up to 250 MB/s (burst) | Lowest |
```yaml
# AWS EBS CSI Driver Storage Classes
# Requires: aws-ebs-csi-driver installed in cluster

# High-performance gp3 for production
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3-fast
  annotations:
    description: "High-performance gp3 with 16K IOPS, 1000 MB/s throughput"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "16000"             # Provisioned IOPS (3000 baseline, up to 16000)
  throughput: "1000"        # MB/s (125 baseline, up to 1000)
  encrypted: "true"         # Encryption at rest
  kmsKeyId: alias/ebs-prod  # Customer-managed KMS key
  fsType: ext4
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# Default gp3 for general workloads
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
  fsType: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# io2 for critical databases
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-io2-critical
provisioner: ebs.csi.aws.com
parameters:
  type: io2
  iops: "64000"             # Up to 64,000 IOPS
  encrypted: "true"
  kmsKeyId: alias/ebs-critical
  # For io2 Block Express (256K IOPS, 4 GB/s):
  # blockExpress: "true"
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# Throughput-optimized for big data
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-st1-throughput
provisioner: ebs.csi.aws.com
parameters:
  type: st1
  encrypted: "true"
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
```

EBS volumes are zone-local—a volume in us-east-1a cannot attach to a node in us-east-1b. Always use `volumeBindingMode: WaitForFirstConsumer` to ensure volumes are created in the same zone as the pod that will use them.
Installing the AWS EBS CSI Driver:
For EKS clusters, the EBS CSI driver is typically installed as an EKS addon:
```bash
# Create IAM OIDC provider for the cluster
eksctl utils associate-iam-oidc-provider --cluster my-cluster --approve

# Create IAM role for the CSI driver
eksctl create iamserviceaccount \
  --name ebs-csi-controller-sa \
  --namespace kube-system \
  --cluster my-cluster \
  --role-name AmazonEKS_EBS_CSI_DriverRole \
  --role-only \
  --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
  --approve

# Install the addon
aws eks create-addon \
  --cluster-name my-cluster \
  --addon-name aws-ebs-csi-driver \
  --service-account-role-arn arn:aws:iam::<AWS_ACCOUNT_ID>:role/AmazonEKS_EBS_CSI_DriverRole
```
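With the driver installed, a workload requests a class through a PersistentVolumeClaim. Because of `WaitForFirstConsumer`, the EBS volume is not created until the pod schedules, so it lands in the pod's zone. A minimal sketch using the `ebs-gp3-fast` class defined earlier (the claim, pod, and image names are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data               # illustrative name
spec:
  accessModes:
    - ReadWriteOnce            # EBS is single-node block storage
  storageClassName: ebs-gp3-fast
  resources:
    requests:
      storage: 100Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: nginx:1.25
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: app-data
```

Until the pod is scheduled, `kubectl get pvc app-data` will show the claim as Pending with a "waiting for first consumer" event—this is expected behavior, not an error.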
Google Persistent Disk (PD) is GCP's block storage service, tightly integrated with GKE (Google Kubernetes Engine). GCP PD offers some unique features compared to other cloud providers, including regional persistent disks for high availability.
Persistent Disk characteristics:
| Disk Type | Use Case | IOPS (max) | Throughput (max) | Notes |
|---|---|---|---|---|
| pd-standard | Cost-effective, lower performance | 3,000 read, 15,000 write | 120 MB/s | HDD-backed |
| pd-balanced | General purpose workloads | 80,000 | 1,200 MB/s | SSD, good price/performance |
| pd-ssd | Performance-sensitive workloads | 100,000 | 1,200 MB/s | SSD, higher IOPS than balanced |
| pd-extreme | Highest performance databases | 120,000 | 2,400 MB/s | Provisioned IOPS, highest cost |
| hyperdisk-* | Next-gen, decoupled sizing | 350,000 | 5,000 MB/s | Independent IOPS/throughput/capacity |
```yaml
# GCP Persistent Disk CSI Driver Storage Classes
# GKE clusters typically have pd.csi.storage.gke.io pre-installed

# High-performance SSD for databases
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-ssd-fast
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
  disk-encryption-kms-key: projects/<PROJECT>/locations/<LOCATION>/keyRings/<RING>/cryptoKeys/<KEY>
  fsType: ext4
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# Extreme performance for critical databases
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-extreme
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-extreme
  provisioned-iops-on-create: "100000"      # Up to 120,000
  provisioned-throughput-on-create: "2400"  # MB/s
  fsType: xfs
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# Regional persistent disk for HA (multi-zone replication)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-regional-ssd
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
  replication-type: regional-pd
  # Available zones for replication
  # topology.gke.io/zone: us-central1-a,us-central1-b
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# Balanced for general workloads (default)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-balanced
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-balanced
  fsType: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
```

Regional Persistent Disks:
Google Cloud offers Regional Persistent Disks that synchronously replicate data across two zones in a region. This provides:
Trade-offs of Regional PD:
Use Regional Persistent Disks for: production databases requiring zone-level HA, critical stateful services where even minutes of downtime are unacceptable, and applications where synchronous replication latency is acceptable. For read replicas or services with application-level replication, zonal disks are more cost-effective.
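Consuming a regional disk looks identical to a zonal one from the workload's perspective; only the StorageClass differs. A minimal sketch against the `pd-regional-ssd` class defined above (the claim name and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-regional    # illustrative name
spec:
  accessModes:
    - ReadWriteOnce          # regional PD is still single-writer
  storageClassName: pd-regional-ssd
  resources:
    requests:
      storage: 200Gi
```

If the zone hosting the pod fails, the pod can be rescheduled into the disk's second zone and attach the surviving replica, which is what makes this pattern useful for zone-level HA.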
Azure Managed Disks are Microsoft Azure's block storage solution, offering multiple performance tiers and redundancy options for AKS (Azure Kubernetes Service) workloads.
Azure Managed Disk tiers:
| SKU | Use Case | IOPS (max) | Throughput (max) | Durability |
|---|---|---|---|---|
| Standard HDD (Standard_LRS) | Dev/test, backups | 2,000 | 500 MB/s | 3 copies in datacenter |
| Standard SSD (StandardSSD_LRS) | Web servers, light databases | 6,000 | 750 MB/s | 3 copies in datacenter |
| Premium SSD (Premium_LRS) | Production databases | 20,000 | 900 MB/s | 3 copies in datacenter |
| Premium SSD v2 | High-performance databases | 80,000 | 1,200 MB/s | 3 copies in datacenter |
| Ultra Disk | Highest I/O workloads | 160,000 | 4,000 MB/s | 3 copies in datacenter |
| Premium_ZRS | Zone-redundant premium | 20,000 | 900 MB/s | 3 copies across 3 zones |
```yaml
# Azure Disk CSI Driver Storage Classes
# AKS clusters have disk.csi.azure.com pre-installed

# Premium SSD for production
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-premium
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_LRS
  cachingMode: ReadOnly   # ReadOnly, ReadWrite, or None
  diskEncryptionSetID: /subscriptions/<SUB>/resourceGroups/<RG>/providers/Microsoft.Compute/diskEncryptionSets/<DES>
  fsType: ext4
  # enableBursting: "true"  # Enable burst for Premium SSDs < 512 GiB
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# Premium SSD v2 for high-performance workloads
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-premium-v2
provisioner: disk.csi.azure.com
parameters:
  skuName: PremiumV2_LRS
  DiskIOPSReadWrite: "80000"   # Up to 80,000 IOPS
  DiskMBpsReadWrite: "1200"    # Up to 1,200 MB/s
  LogicalSectorSize: "512"     # 512 or 4096
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# Zone-redundant storage for HA
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-zrs
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_ZRS
  fsType: ext4
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# Ultra Disk for extreme performance
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-ultra
provisioner: disk.csi.azure.com
parameters:
  skuName: UltraSSD_LRS
  DiskIOPSReadWrite: "160000"
  DiskMBpsReadWrite: "4000"
  cachingMode: None   # Ultra Disks don't support caching
  # Note: Ultra Disks require specific VM types and AZ availability
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# Standard SSD for dev/test
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-standard
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: disk.csi.azure.com
parameters:
  skuName: StandardSSD_LRS
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
```

Azure Managed Disks support host caching (ReadOnly, ReadWrite, None). ReadOnly caching improves read performance for read-heavy workloads. ReadWrite caching (only for OS disks) risks data loss on host failure. Databases typically use ReadOnly or None.
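The practical benefit of the `azure-zrs` class above is attach flexibility: a ZRS disk can be attached by a node in any zone of the region, so a pod can reschedule across zones after a node or zone failure and reattach its data, which LRS disks (pinned to one zone) cannot do. A minimal sketch (the claim name and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data-zrs       # illustrative name
spec:
  accessModes:
    - ReadWriteOnce        # still single-node attach at any one time
  storageClassName: azure-zrs
  resources:
    requests:
      storage: 256Gi
```

ZRS trades some write latency for this zone-level durability, so benchmark before using it for latency-sensitive databases.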
Block storage (EBS, PD, Azure Disk) is limited to single-node access (RWO). For workloads requiring shared storage across multiple pods (RWX), cloud-native file storage services are essential.
Comparison of cloud file services:
| Cloud | Service | Protocol | Access Modes | Use Cases |
|---|---|---|---|---|
| AWS | EFS | NFS v4.1 | RWO, ROX, RWX | Shared web content, ML training data, cross-AZ |
| AWS | FSx for Lustre | Lustre | RWO, ROX, RWX | HPC, ML training, big data |
| AWS | FSx for NetApp ONTAP | NFS/SMB | RWO, ROX, RWX | Enterprise workloads, NAS migration |
| GCP | Filestore | NFS v3 | RWO, ROX, RWX | Shared storage, legacy NFS apps |
| GCP | Filestore Enterprise | NFS v3 | RWO, ROX, RWX | HA NFS, regional availability |
| Azure | Azure Files | SMB 3.0/NFS 4.1 | RWO, ROX, RWX | Shared files, Windows workloads |
| Azure | Azure NetApp Files | NFS/SMB | RWO, ROX, RWX | High-performance enterprise |
```yaml
# AWS EFS CSI Driver StorageClass
# Requires: aws-efs-csi-driver installed
# EFS is regional, accessible from all AZs

# Dynamic provisioning with Access Points
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-shared
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap   # Use Access Points
  fileSystemId: fs-0123456789abcdef0
  directoryPerms: "755"
  gidRangeStart: "1000"
  gidRangeEnd: "2000"
  basePath: "/dynamic_provisioning"
  # Encryption in transit
  # encryptInTransit: "true"
reclaimPolicy: Delete
volumeBindingMode: Immediate   # EFS is zone-agnostic
---
# Static provisioning: use existing EFS
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-static
spec:
  capacity:
    storage: 5Ti   # EFS is elastic; this value is informational
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-static
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-0123456789abcdef0
    volumeAttributes:
      encryptInTransit: "true"
```

Use block storage (EBS, PD, Azure Disk) for databases, single-pod workloads, and performance-critical applications. Use file storage (EFS, Filestore, Azure Files) for shared content, multi-pod access, and workloads requiring RWX. File storage typically has higher latency but offers sharing capabilities.
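Because EFS supports ReadWriteMany, one claim can back every replica of a Deployment, which is the point of using file storage here. A minimal sketch against the `efs-shared` class above (names, replica count, and image are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-content       # illustrative name
spec:
  accessModes:
    - ReadWriteMany          # possible because EFS is NFS-backed
  storageClassName: efs-shared
  resources:
    requests:
      storage: 10Gi          # informational for elastic EFS
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                # all replicas mount the same volume
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25
          volumeMounts:
            - name: content
              mountPath: /usr/share/nginx/html
      volumes:
        - name: content
          persistentVolumeClaim:
            claimName: shared-content
```

The replicas may land in different availability zones; EFS mount targets make the same filesystem reachable from each.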
Container Storage Interface (CSI) drivers are the standard mechanism for integrating storage backends with Kubernetes. Proper installation, configuration, and maintenance of CSI drivers is essential for reliable storage.
CSI driver components:
```bash
# List installed CSI drivers
kubectl get csidrivers

# Example output:
# NAME                    ATTACHREQUIRED   PODINFOONMOUNT   STORAGECAPACITY   TOKENREQUESTS
# disk.csi.azure.com      true             false            true
# ebs.csi.aws.com         true             false            false
# efs.csi.aws.com         false            false            false
# file.csi.azure.com      false            true             false
# pd.csi.storage.gke.io   true             false            false

# Inspect a CSI driver
kubectl describe csidriver ebs.csi.aws.com

# Check controller pods
kubectl get pods -n kube-system -l app=ebs-csi-controller

# Check node pods (DaemonSet)
kubectl get pods -n kube-system -l app=ebs-csi-node

# View CSI driver logs
kubectl logs -n kube-system -l app=ebs-csi-controller -c csi-provisioner

# Check VolumeAttachments for debugging
kubectl get volumeattachments
kubectl describe volumeattachment <name>
```

Common CSI driver issues:
| Symptom | Possible Cause | Resolution |
|---|---|---|
| PVC stuck in Pending | Provisioner not running | Check controller pod status, review logs |
| PVC stuck in Pending | Missing IAM permissions | Verify service account roles (IRSA for AWS, Workload Identity for GCP) |
| Pod stuck in ContainerCreating | Volume attach failed | Check VolumeAttachment status, verify node labels |
| Pod stuck in ContainerCreating | Mount failed | Check node plugin logs, verify filesystem support |
| Expansion not working | Driver/StorageClass doesn't support expansion | Verify allowVolumeExpansion and driver capabilities |
| Slow provisioning | API rate limiting | Implement exponential backoff, check cloud quotas |
CSI driver updates can affect running workloads. Always review release notes, test in staging, and use rolling updates. Some updates may require node cordoning/draining for the node plugin DaemonSet.
Storage availability and data placement across availability zones and regions is one of the most critical considerations for production Kubernetes deployments. Each cloud provider handles this differently.
Zone affinity and storage topology:
```yaml
# Pattern 1: StatefulSet with zone anti-affinity and zone-aware PVs
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: database-ha
spec:
  serviceName: database
  replicas: 3
  selector:
    matchLabels:
      app: database
  template:
    metadata:
      labels:
        app: database
    spec:
      # Spread pods across zones
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: database
      # Alternative: pod anti-affinity
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: database
              topologyKey: topology.kubernetes.io/zone
      containers:
        - name: database
          image: postgres:15
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        # StorageClass with WaitForFirstConsumer ensures
        # each PV is created in the same zone as the pod
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 100Gi
---
# Pattern 2: HA with regional storage (GCP Regional PD)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: regional-pd
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
  replication-type: regional-pd
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.kubernetes.io/zone
        values:
          - us-central1-a
          - us-central1-b
---
# Pattern 3: Shared storage across zones (AWS EFS)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-cross-zone
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-0123456789abcdef0
  directoryPerms: "755"
# EFS is regional, so Immediate binding is safe
volumeBindingMode: Immediate
```

Cross-region storage replication is typically handled at the application level or through cloud-specific mechanisms (EBS Snapshots + cross-region copy, GCP async replication, Azure Site Recovery). Kubernetes storage abstractions are generally region-local.
Volume snapshots enable point-in-time backups of persistent volumes. All major cloud providers support snapshots through their CSI drivers, using the Kubernetes VolumeSnapshot API.
Snapshot workflow:
```yaml
# Step 1: Create a VolumeSnapshotClass
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ebs-snapshot-class
driver: ebs.csi.aws.com
deletionPolicy: Retain   # Delete or Retain
parameters:
  # AWS-specific: encrypt snapshot
  # encrypted: "true"
---
# Step 2: Create a VolumeSnapshot from an existing PVC
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: database-snapshot-2024-01-08
  namespace: production
spec:
  volumeSnapshotClassName: ebs-snapshot-class
  source:
    # Reference to the PVC to snapshot
    persistentVolumeClaimName: data-database-0
---
# Step 3: Check snapshot status
# kubectl get volumesnapshot database-snapshot-2024-01-08
# NAME                           READYTOUSE   SOURCEPVC         SNAPSHOTCONTENT
# database-snapshot-2024-01-08   true         data-database-0   snapcontent-xxx
---
# Step 4: Create a new PVC from the snapshot (restore or clone)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-restored
  namespace: production
spec:
  storageClassName: fast-ssd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi   # Must be >= snapshot size
  dataSource:
    name: database-snapshot-2024-01-08
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
---
# Alternative: Clone directly from a PVC (no snapshot)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-clone
  namespace: production
spec:
  storageClassName: fast-ssd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  dataSource:
    name: data-database-0   # Source PVC
    kind: PersistentVolumeClaim
```

Snapshot automation:
For production backups, automate snapshot creation using CronJobs or dedicated snapshot controllers:
```yaml
# CronJob for automated snapshots
apiVersion: batch/v1
kind: CronJob
metadata:
  name: database-snapshot
  namespace: production
spec:
  schedule: "0 2 * * *"   # Daily at 2 AM
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: snapshot-creator
          containers:
            - name: snapshot
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - |
                  DATE=$(date +%Y-%m-%d-%H%M)
                  cat <<EOF | kubectl apply -f -
                  apiVersion: snapshot.storage.k8s.io/v1
                  kind: VolumeSnapshot
                  metadata:
                    name: database-snapshot-$DATE
                    namespace: production
                    labels:
                      app: database
                      backup-type: scheduled
                  spec:
                    volumeSnapshotClassName: ebs-snapshot-class
                    source:
                      persistentVolumeClaimName: data-database-0
                  EOF
          restartPolicy: OnFailure
```

For application consistency, pause writes or use application-specific backup tools before snapshotting. Snapshots are crash-consistent but not necessarily application-consistent. For databases, use pg_start_backup/pg_stop_backup or equivalent before/after snapshots.
Each cloud provider has unique characteristics that affect storage performance, availability, and cost. Here are key best practices for each major platform:
Effective cloud storage integration requires understanding both Kubernetes abstractions and cloud-provider-specific behaviors. The combination of Storage Classes, CSI drivers, and cloud storage services enables flexible, performant storage for any workload.
What's next:
We'll explore data persistence patterns for Kubernetes—advanced strategies for backup, disaster recovery, data migration, and hybrid cloud storage that build on the cloud storage foundations covered in this page.
You now understand cloud provider storage integration comprehensively—from AWS EBS through Google Persistent Disk, Azure Managed Disks, file storage services, CSI driver management, and cross-zone considerations. This knowledge enables production-ready storage on any major cloud platform.