Kubernetes storage abstractions—Persistent Volumes, Storage Classes, and CSI drivers—are designed to be cloud-agnostic. However, understanding the specific behaviors, constraints, and capabilities of each cloud provider's storage offerings is essential for production deployments.
Each cloud provider offers distinct storage services: AWS provides EBS and EFS, Google Cloud provides Persistent Disk and Filestore, and Azure provides Managed Disks and Azure Files.
These services differ in performance characteristics, availability models, pricing structures, and operational behaviors. This page explores how to effectively integrate these cloud storage services with Kubernetes for reliable, performant storage.
By the end of this page, you will understand AWS EBS integration with Kubernetes, Google Persistent Disk configurations, Azure Managed Disk options, cloud file storage services (EFS, Filestore, Azure Files), CSI driver installation and configuration, cross-AZ and cross-region considerations, and cloud-specific best practices.
Amazon Elastic Block Store (EBS) is the primary block storage service for AWS EC2 instances and the most common storage backend for Kubernetes on AWS (EKS or self-managed).
EBS fundamentals:
| Volume Type | Use Case | IOPS | Throughput | Cost |
|---|---|---|---|---|
| gp3 | General purpose, balanced workloads | 3,000-16,000 | 125-1,000 MB/s | Lowest $/GB |
| gp2 | Legacy general purpose | 100-16,000 (burst) | 128-250 MB/s | Higher than gp3 |
| io2/io2 Block Express | Critical databases, high IOPS | 64,000-256,000 | 1,000-4,000 MB/s | Highest |
| st1 | Big data, log processing | 500 (baseline) | Up to 500 MB/s (burst) | Low $/GB |
| sc1 | Cold data, infrequent access | 250 (baseline) | Up to 250 MB/s (burst) | Lowest |
```yaml
# AWS EBS CSI Driver Storage Classes
# Requires: aws-ebs-csi-driver installed in cluster

# High-performance gp3 for production
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3-fast
  annotations:
    description: "High-performance gp3 with 16K IOPS, 1000 MB/s throughput"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "16000"             # Provisioned IOPS (3000 baseline, up to 16000)
  throughput: "1000"        # MB/s (125 baseline, up to 1000)
  encrypted: "true"         # Encryption at rest
  kmsKeyId: alias/ebs-prod  # Customer-managed KMS key
  fsType: ext4
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# Default gp3 for general workloads
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
  fsType: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# io2 for critical databases
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-io2-critical
provisioner: ebs.csi.aws.com
parameters:
  type: io2
  iops: "64000"             # Up to 64,000 IOPS
  encrypted: "true"
  kmsKeyId: alias/ebs-critical
  # For io2 Block Express (256K IOPS, 4 GB/s):
  # blockExpress: "true"
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# Throughput-optimized for big data
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-st1-throughput
provisioner: ebs.csi.aws.com
parameters:
  type: st1
  encrypted: "true"
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
```

EBS volumes are zone-local—a volume in us-east-1a cannot attach to a node in us-east-1b. Always use `volumeBindingMode: WaitForFirstConsumer` to ensure volumes are created in the same zone as the pod that will use them.
Installing the AWS EBS CSI Driver:
For EKS clusters, the EBS CSI driver is typically installed as an EKS addon:
```bash
# Create IAM OIDC provider for the cluster
eksctl utils associate-iam-oidc-provider --cluster my-cluster --approve

# Create IAM role for the CSI driver
eksctl create iamserviceaccount \
  --name ebs-csi-controller-sa \
  --namespace kube-system \
  --cluster my-cluster \
  --role-name AmazonEKS_EBS_CSI_DriverRole \
  --role-only \
  --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
  --approve

# Install the addon
aws eks create-addon \
  --cluster-name my-cluster \
  --addon-name aws-ebs-csi-driver \
  --service-account-role-arn arn:aws:iam::<AWS_ACCOUNT_ID>:role/AmazonEKS_EBS_CSI_DriverRole
```
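With the driver installed, a workload requests a class through a PersistentVolumeClaim. Because of `WaitForFirstConsumer`, the EBS volume is not created until the pod schedules, so it lands in the pod's zone. A minimal sketch using the `ebs-gp3-fast` class defined earlier (the claim, pod, and image names are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data               # illustrative name
spec:
  accessModes:
    - ReadWriteOnce            # EBS is single-node block storage
  storageClassName: ebs-gp3-fast
  resources:
    requests:
      storage: 100Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: nginx:1.25
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: app-data
```

Until the pod is scheduled, `kubectl get pvc app-data` will show the claim as Pending with a "waiting for first consumer" event—this is expected behavior, not an error.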
Google Persistent Disk (PD) is GCP's block storage service, tightly integrated with GKE (Google Kubernetes Engine). GCP PD offers some unique features compared to other cloud providers, including regional persistent disks for high availability.
Persistent Disk characteristics:
| Disk Type | Use Case | IOPS (max) | Throughput (max) | Notes |
|---|---|---|---|---|
| pd-standard | Cost-effective, lower performance | 3,000 read, 15,000 write | 120 MB/s | HDD-backed |
| pd-balanced | General purpose workloads | 80,000 | 1,200 MB/s | SSD, good price/performance |
| pd-ssd | Performance-sensitive workloads | 100,000 | 1,200 MB/s | SSD, higher IOPS than balanced |
| pd-extreme | Highest performance databases | 120,000 | 2,400 MB/s | Provisioned IOPS, highest cost |
| hyperdisk-* | Next-gen, decoupled sizing | 350,000 | 5,000 MB/s | Independent IOPS/throughput/capacity |
```yaml
# GCP Persistent Disk CSI Driver Storage Classes
# GKE clusters typically have pd.csi.storage.gke.io pre-installed

# High-performance SSD for databases
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-ssd-fast
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
  disk-encryption-kms-key: projects/<PROJECT>/locations/<LOCATION>/keyRings/<RING>/cryptoKeys/<KEY>
  fsType: ext4
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# Extreme performance for critical databases
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-extreme
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-extreme
  provisioned-iops-on-create: "100000"      # Up to 120,000
  provisioned-throughput-on-create: "2400"  # MB/s
  fsType: xfs
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# Regional persistent disk for HA (multi-zone replication)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-regional-ssd
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
  replication-type: regional-pd
  # Available zones for replication
  # topology.gke.io/zone: us-central1-a,us-central1-b
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# Balanced for general workloads (default)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-balanced
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-balanced
  fsType: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
```

Regional Persistent Disks:
Google Cloud offers Regional Persistent Disks that synchronously replicate data across two zones in a region. This provides:
Trade-offs of Regional PD:
Use Regional Persistent Disks for: production databases requiring zone-level HA, critical stateful services where even minutes of downtime are unacceptable, and applications where synchronous replication latency is acceptable. For read replicas or services with application-level replication, zonal disks are more cost-effective.
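Consuming a regional disk looks identical to a zonal one from the workload's perspective; only the StorageClass differs. A minimal sketch against the `pd-regional-ssd` class defined above (the claim name and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-regional    # illustrative name
spec:
  accessModes:
    - ReadWriteOnce          # regional PD is still single-writer
  storageClassName: pd-regional-ssd
  resources:
    requests:
      storage: 200Gi
```

If the zone hosting the pod fails, the pod can be rescheduled into the disk's second zone and attach the surviving replica, which is what makes this pattern useful for zone-level HA.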
Azure Managed Disks are Microsoft Azure's block storage solution, offering multiple performance tiers and redundancy options for AKS (Azure Kubernetes Service) workloads.
Azure Managed Disk tiers:
| SKU | Use Case | IOPS (max) | Throughput (max) | Durability |
|---|---|---|---|---|
| Standard HDD (Standard_LRS) | Dev/test, backups | 2,000 | 500 MB/s | 3 copies in datacenter |
| Standard SSD (StandardSSD_LRS) | Web servers, light databases | 6,000 | 750 MB/s | 3 copies in datacenter |
| Premium SSD (Premium_LRS) | Production databases | 20,000 | 900 MB/s | 3 copies in datacenter |
| Premium SSD v2 | High-performance databases | 80,000 | 1,200 MB/s | 3 copies in datacenter |
| Ultra Disk | Highest I/O workloads | 160,000 | 4,000 MB/s | 3 copies in datacenter |
| Premium_ZRS | Zone-redundant premium | 20,000 | 900 MB/s | 3 copies across 3 zones |
```yaml
# Azure Disk CSI Driver Storage Classes
# AKS clusters have disk.csi.azure.com pre-installed

# Premium SSD for production
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-premium
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_LRS
  cachingMode: ReadOnly   # ReadOnly, ReadWrite, or None
  diskEncryptionSetID: /subscriptions/<SUB>/resourceGroups/<RG>/providers/Microsoft.Compute/diskEncryptionSets/<DES>
  fsType: ext4
  # enableBursting: "true"  # Enable burst for Premium SSDs < 512 GiB
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# Premium SSD v2 for high-performance workloads
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-premium-v2
provisioner: disk.csi.azure.com
parameters:
  skuName: PremiumV2_LRS
  DiskIOPSReadWrite: "80000"   # Up to 80,000 IOPS
  DiskMBpsReadWrite: "1200"    # Up to 1,200 MB/s
  LogicalSectorSize: "512"     # 512 or 4096
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# Zone-redundant storage for HA
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-zrs
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_ZRS
  fsType: ext4
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# Ultra Disk for extreme performance
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-ultra
provisioner: disk.csi.azure.com
parameters:
  skuName: UltraSSD_LRS
  DiskIOPSReadWrite: "160000"
  DiskMBpsReadWrite: "4000"
  cachingMode: None   # Ultra Disks don't support caching
  # Note: Ultra Disks require specific VM types and AZ availability
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# Standard SSD for dev/test
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-standard
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: disk.csi.azure.com
parameters:
  skuName: StandardSSD_LRS
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
```

Azure Managed Disks support host caching (ReadOnly, ReadWrite, None). ReadOnly caching improves read performance for read-heavy workloads. ReadWrite caching (only for OS disks) risks data loss on host failure. Databases typically use ReadOnly or None.
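The practical benefit of the `azure-zrs` class above is attach flexibility: a ZRS disk can be attached by a node in any zone of the region, so a pod can reschedule across zones after a node or zone failure and reattach its data, which LRS disks (pinned to one zone) cannot do. A minimal sketch (the claim name and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data-zrs       # illustrative name
spec:
  accessModes:
    - ReadWriteOnce        # still single-node attach at any one time
  storageClassName: azure-zrs
  resources:
    requests:
      storage: 256Gi
```

ZRS trades some write latency for this zone-level durability, so benchmark before using it for latency-sensitive databases.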
Block storage (EBS, PD, Azure Disk) is limited to single-node access (RWO). For workloads requiring shared storage across multiple pods (RWX), cloud-native file storage services are essential.
Comparison of cloud file services:
| Cloud | Service | Protocol | Access Modes | Use Cases |
|---|---|---|---|---|
| AWS | EFS | NFS v4.1 | RWO, ROX, RWX | Shared web content, ML training data, cross-AZ |
| AWS | FSx for Lustre | Lustre | RWO, ROX, RWX | HPC, ML training, big data |
| AWS | FSx for NetApp ONTAP | NFS/SMB | RWO, ROX, RWX | Enterprise workloads, NAS migration |
| GCP | Filestore | NFS v3 | RWO, ROX, RWX | Shared storage, legacy NFS apps |
| GCP | Filestore Enterprise | NFS v3 | RWO, ROX, RWX | HA NFS, regional availability |
| Azure | Azure Files | SMB 3.0/NFS 4.1 | RWO, ROX, RWX | Shared files, Windows workloads |
| Azure | Azure NetApp Files | NFS/SMB | RWO, ROX, RWX | High-performance enterprise |
```yaml
# AWS EFS CSI Driver StorageClass
# Requires: aws-efs-csi-driver installed
# EFS is regional, accessible from all AZs

# Dynamic provisioning with Access Points
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-shared
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap   # Use Access Points
  fileSystemId: fs-0123456789abcdef0
  directoryPerms: "755"
  gidRangeStart: "1000"
  gidRangeEnd: "2000"
  basePath: "/dynamic_provisioning"
  # Encryption in transit
  # encryptInTransit: "true"
reclaimPolicy: Delete
volumeBindingMode: Immediate   # EFS is zone-agnostic
---
# Static provisioning: use existing EFS
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-static
spec:
  capacity:
    storage: 5Ti   # EFS is elastic; this value is informational
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-static
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-0123456789abcdef0
    volumeAttributes:
      encryptInTransit: "true"
```

Use block storage (EBS, PD, Azure Disk) for databases, single-pod workloads, and performance-critical applications. Use file storage (EFS, Filestore, Azure Files) for shared content, multi-pod access, and workloads requiring RWX. File storage typically has higher latency but offers sharing capabilities.
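Because EFS supports ReadWriteMany, one claim can back every replica of a Deployment, which is the point of using file storage here. A minimal sketch against the `efs-shared` class above (names, replica count, and image are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-content       # illustrative name
spec:
  accessModes:
    - ReadWriteMany          # possible because EFS is NFS-backed
  storageClassName: efs-shared
  resources:
    requests:
      storage: 10Gi          # informational for elastic EFS
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                # all replicas mount the same volume
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25
          volumeMounts:
            - name: content
              mountPath: /usr/share/nginx/html
      volumes:
        - name: content
          persistentVolumeClaim:
            claimName: shared-content
```

The replicas may land in different availability zones; EFS mount targets make the same filesystem reachable from each.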
Container Storage Interface (CSI) drivers are the standard mechanism for integrating storage backends with Kubernetes. Proper installation, configuration, and maintenance of CSI drivers is essential for reliable storage.
CSI driver components:
```bash
# List installed CSI drivers
kubectl get csidrivers

# Example output:
# NAME                    ATTACHREQUIRED   PODINFOONMOUNT   STORAGECAPACITY   TOKENREQUESTS
# disk.csi.azure.com      true             false            true
# ebs.csi.aws.com         true             false            false
# efs.csi.aws.com         false            false            false
# file.csi.azure.com      false            true             false
# pd.csi.storage.gke.io   true             false            false

# Inspect a CSI driver
kubectl describe csidriver ebs.csi.aws.com

# Check controller pods
kubectl get pods -n kube-system -l app=ebs-csi-controller

# Check node pods (DaemonSet)
kubectl get pods -n kube-system -l app=ebs-csi-node

# View CSI driver logs
kubectl logs -n kube-system -l app=ebs-csi-controller -c csi-provisioner

# Check VolumeAttachments for debugging
kubectl get volumeattachments
kubectl describe volumeattachment <name>
```

Common CSI driver issues:
| Symptom | Possible Cause | Resolution |
|---|---|---|
| PVC stuck in Pending | Provisioner not running | Check controller pod status, review logs |
| PVC stuck in Pending | Missing IAM permissions | Verify service account roles (IRSA for AWS, Workload Identity for GCP) |
| Pod stuck in ContainerCreating | Volume attach failed | Check VolumeAttachment status, verify node labels |
| Pod stuck in ContainerCreating | Mount failed | Check node plugin logs, verify filesystem support |
| Expansion not working | Driver/StorageClass doesn't support expansion | Verify allowVolumeExpansion and driver capabilities |
| Slow provisioning | API rate limiting | Implement exponential backoff, check cloud quotas |
CSI driver updates can affect running workloads. Always review release notes, test in staging, and use rolling updates. Some updates may require node cordoning/draining for the node plugin DaemonSet.
Storage availability and data placement across availability zones and regions is one of the most critical considerations for production Kubernetes deployments. Each cloud provider handles this differently.
Zone affinity and storage topology:
```yaml
# Pattern 1: StatefulSet with zone anti-affinity and zone-aware PVs
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: database-ha
spec:
  serviceName: database
  replicas: 3
  selector:
    matchLabels:
      app: database
  template:
    metadata:
      labels:
        app: database
    spec:
      # Spread pods across zones
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: database
      # Alternative: pod anti-affinity
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: database
              topologyKey: topology.kubernetes.io/zone
      containers:
        - name: database
          image: postgres:15
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        # StorageClass with WaitForFirstConsumer ensures
        # each PV is created in the same zone as the pod
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 100Gi
---
# Pattern 2: HA with regional storage (GCP Regional PD)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: regional-pd
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
  replication-type: regional-pd
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.kubernetes.io/zone
        values:
          - us-central1-a
          - us-central1-b
---
# Pattern 3: Shared storage across zones (AWS EFS)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-cross-zone
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-0123456789abcdef0
  directoryPerms: "755"
# EFS is regional, so Immediate binding is safe
volumeBindingMode: Immediate
```

Cross-region storage replication is typically handled at the application level or through cloud-specific mechanisms (EBS Snapshots + cross-region copy, GCP async replication, Azure Site Recovery). Kubernetes storage abstractions are generally region-local.
Volume snapshots enable point-in-time backups of persistent volumes. All major cloud providers support snapshots through their CSI drivers, using the Kubernetes VolumeSnapshot API.
Snapshot workflow:
```yaml
# Step 1: Create a VolumeSnapshotClass
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ebs-snapshot-class
driver: ebs.csi.aws.com
deletionPolicy: Retain   # Delete or Retain
parameters:
  # AWS-specific: encrypt snapshot
  # encrypted: "true"
---
# Step 2: Create a VolumeSnapshot from an existing PVC
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: database-snapshot-2024-01-08
  namespace: production
spec:
  volumeSnapshotClassName: ebs-snapshot-class
  source:
    # Reference to the PVC to snapshot
    persistentVolumeClaimName: data-database-0
---
# Step 3: Check snapshot status
# kubectl get volumesnapshot database-snapshot-2024-01-08
# NAME                           READYTOUSE   SOURCEPVC         SNAPSHOTCONTENT
# database-snapshot-2024-01-08   true         data-database-0   snapcontent-xxx
---
# Step 4: Create a new PVC from the snapshot (restore or clone)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-restored
  namespace: production
spec:
  storageClassName: fast-ssd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi   # Must be >= snapshot size
  dataSource:
    name: database-snapshot-2024-01-08
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
---
# Alternative: Clone directly from a PVC (no snapshot)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-clone
  namespace: production
spec:
  storageClassName: fast-ssd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  dataSource:
    name: data-database-0   # Source PVC
    kind: PersistentVolumeClaim
```

Snapshot automation:
For production backups, automate snapshot creation using CronJobs or dedicated snapshot controllers:
```yaml
# CronJob for automated snapshots
apiVersion: batch/v1
kind: CronJob
metadata:
  name: database-snapshot
  namespace: production
spec:
  schedule: "0 2 * * *"   # Daily at 2 AM
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: snapshot-creator
          containers:
            - name: snapshot
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - |
                  DATE=$(date +%Y-%m-%d-%H%M)
                  cat <<EOF | kubectl apply -f -
                  apiVersion: snapshot.storage.k8s.io/v1
                  kind: VolumeSnapshot
                  metadata:
                    name: database-snapshot-$DATE
                    namespace: production
                    labels:
                      app: database
                      backup-type: scheduled
                  spec:
                    volumeSnapshotClassName: ebs-snapshot-class
                    source:
                      persistentVolumeClaimName: data-database-0
                  EOF
          restartPolicy: OnFailure
```

For application consistency, pause writes or use application-specific backup tools before snapshotting. Snapshots are crash-consistent but not necessarily application-consistent. For databases, use pg_start_backup/pg_stop_backup or equivalent before/after snapshots.
Each cloud provider has unique characteristics that affect storage performance, availability, and cost. Here are key best practices for each major platform:
Effective cloud storage integration requires understanding both Kubernetes abstractions and cloud-provider-specific behaviors. The combination of Storage Classes, CSI drivers, and cloud storage services enables flexible, performant storage for any workload.
What's next:
We'll explore data persistence patterns for Kubernetes—advanced strategies for backup, disaster recovery, data migration, and hybrid cloud storage that build on the cloud storage foundations covered in this page.
You now understand cloud provider storage integration comprehensively—from AWS EBS through Google Persistent Disk, Azure Managed Disks, file storage services, CSI driver management, and cross-zone considerations. This knowledge enables production-ready storage on any major cloud platform.