A fundamental promise of GitOps is that your Git repository becomes a complete description of your cluster state. If the repository disappears, so does the authoritative record of what your cluster should look like; conversely, if the cluster disappears, you can recreate it entirely from Git. This promise requires modeling everything declaratively—not just application deployments, but infrastructure components, networking, security policies, observability stacks, and cross-cutting concerns.
Achieving this level of declarative completeness is both technically challenging and culturally transformative. It requires rethinking how we manage configuration, secrets, and the inevitable edge cases that seem to require imperative intervention.
This page teaches you how to structure a fully declarative cluster state. You'll learn to organize infrastructure layers, manage secrets safely in Git, handle cross-cutting concerns, and design a repository structure that scales from single-cluster to enterprise multi-cluster deployments.
A Kubernetes cluster isn't a flat collection of resources—it has layers with dependencies, ownership boundaries, and different change velocities. Understanding these layers is essential for structuring your GitOps repository effectively.
The Infrastructure Layer Cake:
Think of cluster state as a layered cake, where each layer depends on the layers below it: namespaces, CRDs, and other cluster-scoped primitives at the bottom; shared infrastructure such as the ingress controller, cert-manager, and secrets tooling above them; then observability and security components; and finally application workloads at the top.
Why Layers Matter:
Dependency Ordering: You can't deploy an Ingress before the Ingress Controller exists. You can't create a Certificate before cert-manager is running. Layers encode these dependencies.
Change Velocity: Infrastructure changes infrequently (weekly/monthly). Applications change constantly (hourly/daily). Separating layers prevents high-velocity application changes from affecting slow-moving infrastructure.
Ownership Boundaries: The platform team owns the infrastructure layers; product teams own the application layer at the top. Clear ownership prevents conflicts and enables parallel workflows.
Blast Radius Containment: A broken application deployment affects one team. A broken infrastructure deployment affects everyone. Separation allows different testing and approval requirements.
Both ArgoCD and Flux support explicit dependencies. In ArgoCD, use sync waves and the app-of-apps pattern. In Flux, Kustomizations can declare dependsOn relationships. These mechanisms ensure layers are applied in the correct order, even when manifests are committed simultaneously.
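As a rough sketch of both mechanisms—the Kustomization names in dependsOn, the path, and the issuer details are illustrative, not taken from a specific repository:

```yaml
# Flux: a Kustomization that waits for cert-manager before applying issuers
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cluster-issuers
  namespace: flux-system
spec:
  interval: 10m
  path: ./infrastructure/base/cert-manager/cluster-issuers
  prune: true
  sourceRef:
    kind: GitRepository
    name: fleet-infra
  dependsOn:
    - name: cert-manager   # Reconciled only after this Kustomization is Ready
---
# ArgoCD: sync waves order resources within an Application — lower waves sync first,
# so the cert-manager chart can sit in wave 0 and its issuers in wave 1.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
  annotations:
    argocd.argoproj.io/sync-wave: "1"
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: platform@example.com              # Illustrative contact address
    privateKeySecretRef:
      name: letsencrypt-production-account-key
    solvers:
      - http01:
          ingress:
            class: nginx
```

Flux will not reconcile cluster-issuers until the cert-manager Kustomization reports Ready; ArgoCD applies resources in ascending wave order within an Application.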
Infrastructure components require careful declarative modeling. Unlike applications with simple Deployment + Service patterns, infrastructure often involves CRDs, cluster-scoped resources, and complex configurations.
```
# Infrastructure Layer Organization
infrastructure/
├── base/
│   ├── namespaces/
│   │   ├── kustomization.yaml
│   │   ├── ingress-nginx.yaml
│   │   ├── cert-manager.yaml
│   │   ├── monitoring.yaml
│   │   └── security.yaml
│   │
│   ├── crds/
│   │   ├── kustomization.yaml
│   │   └── README.md                  # CRDs often installed by operators
│   │
│   ├── ingress-nginx/
│   │   ├── kustomization.yaml
│   │   ├── namespace.yaml
│   │   ├── helmrelease.yaml           # Or raw manifests
│   │   └── ingress-class.yaml
│   │
│   ├── cert-manager/
│   │   ├── kustomization.yaml
│   │   ├── namespace.yaml
│   │   ├── helmrelease.yaml
│   │   └── cluster-issuers/
│   │       ├── letsencrypt-staging.yaml
│   │       └── letsencrypt-production.yaml
│   │
│   ├── external-secrets/
│   │   ├── kustomization.yaml
│   │   ├── namespace.yaml
│   │   ├── helmrelease.yaml
│   │   └── cluster-secret-store.yaml
│   │
│   └── monitoring/
│       ├── kustomization.yaml
│       ├── prometheus/
│       │   ├── kustomization.yaml
│       │   └── helmrelease.yaml
│       └── grafana/
│           ├── kustomization.yaml
│           ├── helmrelease.yaml
│           └── dashboards/
│               ├── kubernetes-cluster.yaml
│               └── nginx-ingress.yaml
│
└── overlays/
    ├── production/
    │   ├── kustomization.yaml         # Patches for prod
    │   ├── ingress-nginx-patch.yaml
    │   └── prometheus-patch.yaml
    └── staging/
        ├── kustomization.yaml
        └── ingress-nginx-patch.yaml
```

Helm Charts in GitOps:
Many infrastructure components are distributed as Helm charts. In GitOps, you have two options:
Option 1: Render and Commit (Template in CI)
```bash
helm template ingress-nginx ingress-nginx/ingress-nginx \
  --values values-production.yaml \
  > manifests/ingress-nginx.yaml
```
Pros: Full visibility into generated manifests, easy diffing, no Helm at runtime. Cons: Upgrade process is more complex, generated files can be large.
Option 2: GitOps Operator Renders (HelmRelease CRD) The GitOps operator (ArgoCD or Flux's helm-controller) renders charts at sync time. Pros: Simpler workflow, native Helm features, easier upgrades. Cons: Less visibility pre-deployment, requires trusting runtime rendering.
Both approaches are valid. Option 2 is more common for infrastructure because it's simpler and infrastructure changes are heavily reviewed anyway.
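Flux expresses Option 2 as a HelmRelease (a full example follows below); the ArgoCD equivalent is an Application whose source points at a Helm repository and is rendered at sync time. A minimal sketch—the chart version and values shown are illustrative:

```yaml
# ArgoCD Application that renders the ingress-nginx chart at sync time (sketch)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ingress-nginx
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://kubernetes.github.io/ingress-nginx
    chart: ingress-nginx
    targetRevision: 4.8.3            # Pin a chart version (illustrative)
    helm:
      values: |
        controller:
          replicaCount: 3
  destination:
    server: https://kubernetes.default.svc
    namespace: ingress-nginx
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```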
```yaml
# HelmRelease for ingress-nginx (Flux style)
apiVersion: helm.toolkit.fluxcd.io/v2beta2
kind: HelmRelease
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
spec:
  interval: 1h
  chart:
    spec:
      chart: ingress-nginx
      version: "4.8.x"
      sourceRef:
        kind: HelmRepository
        name: ingress-nginx
        namespace: flux-system
      interval: 24h
  install:
    crds: CreateReplace
    remediation:
      retries: 3
  upgrade:
    crds: CreateReplace
    cleanupOnFail: true
    remediation:
      retries: 3
      remediateLastFailure: true
  values:
    controller:
      replicaCount: 3

      # Resource requests/limits
      resources:
        requests:
          cpu: 100m
          memory: 256Mi
        limits:
          cpu: 500m
          memory: 512Mi

      # High availability
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app.kubernetes.io/name: ingress-nginx
              topologyKey: kubernetes.io/hostname

      # Service configuration
      service:
        type: LoadBalancer
        externalTrafficPolicy: Local
        annotations:
          service.beta.kubernetes.io/aws-load-balancer-type: nlb
          service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"

      # Metrics for monitoring
      metrics:
        enabled: true
        serviceMonitor:
          enabled: true
          namespace: monitoring

      # Security
      admissionWebhooks:
        enabled: true

    # Default backend
    defaultBackend:
      enabled: true
      replicaCount: 2
```

Custom Resource Definitions (CRDs) require special handling. They must exist before any Custom Resources that use them. Some Helm charts include CRDs; others install them separately. Flux's install.crds and upgrade.crds options control this behavior. ArgoCD's sync waves can order CRD installation. Get this wrong, and deployments fail silently waiting for nonexistent CRDs.
Secrets are the Achilles' heel of GitOps. The fundamental promise is "everything in Git," yet you cannot commit plaintext secrets to Git. Resolving that tension requires a deliberate secret-management architecture.
The Secret Management Spectrum:
| Solution | Approach | Pros | Cons |
|---|---|---|---|
| Sealed Secrets | Encrypt with cluster public key, commit to Git | Simple, no external deps, GitOps-native | Secrets tied to cluster, key rotation complex |
| SOPS | Encrypt with KMS/PGP, commit to Git | Multi-key support, flexible, works offline | Requires decryption step, key management |
| External Secrets Operator | Sync from Vault/AWS/GCP to K8s Secrets | Centralized secret management, rotation | External dependency, network requirement |
| Vault Agent Injector | Inject secrets via webhook at pod start | Dynamic secrets, fine-grained access | Vault dependency, performance impact |
| CSI Secret Store | Mount secrets as volumes from providers | Standard interface, multiple backends | CSI driver complexity, mount semantics |
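As a rough sketch of the SOPS row above, a .sops.yaml at the repository root declares which keys encrypt which paths; the KMS ARN and age recipient below are placeholders:

```yaml
# .sops.yaml — sketch of SOPS creation rules (key identifiers are placeholders)
creation_rules:
  - path_regex: .*/overlays/production/.*\.enc\.yaml$
    encrypted_regex: ^(data|stringData)$   # Only encrypt secret payloads
    kms: arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID
  - path_regex: .*/overlays/staging/.*\.enc\.yaml$
    encrypted_regex: ^(data|stringData)$
    age: age1examplepublickeyplaceholder0000000000000000000000000000
```

Flux can decrypt SOPS-encrypted manifests natively through the Kustomization's decryption settings; with ArgoCD this is commonly handled via a plugin.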
Bitnami Sealed Secrets encrypts secrets using the cluster's public key. Only the Sealed Secrets controller running in the cluster can decrypt them.
```bash
# Create a regular Kubernetes secret
kubectl create secret generic db-credentials \
  --from-literal=username=myuser \
  --from-literal=password=supersecret \
  --dry-run=client -o yaml > secret.yaml

# Encrypt it using the cluster's public key
kubeseal --format yaml \
  --controller-namespace kube-system \
  --controller-name sealed-secrets-controller \
  < secret.yaml > sealed-secret.yaml

# The sealed-secret.yaml is SAFE to commit to Git!
cat sealed-secret.yaml
```

Encrypted-in-Git solutions (Sealed Secrets, SOPS) require re-encryption and a new commit when secrets rotate. External secret stores (Vault, AWS Secrets Manager) separate rotation from Git—secrets update in the backend, and the operator syncs the new values automatically. For high-rotation secrets, external stores are strongly preferred.
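For contrast, here is a rough sketch of the External Secrets Operator approach; the ClusterSecretStore name and backend paths are illustrative assumptions:

```yaml
# ExternalSecret synced from an external backend (names are illustrative)
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: my-service
spec:
  refreshInterval: 1h                 # Re-sync from the backend hourly
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets-manager         # e.g. defined in infrastructure/base/external-secrets/
  target:
    name: db-credentials              # Kubernetes Secret created and updated by the operator
    creationPolicy: Owner
  data:
    - secretKey: username
      remoteRef:
        key: prod/my-service/db
        property: username
    - secretKey: password
      remoteRef:
        key: prod/my-service/db
        property: password
```

Only the ExternalSecret—a pointer, not a secret—lives in Git; the operator creates and refreshes the actual Kubernetes Secret from the backend.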
Applications are the highest layer in your declarative cluster state. They change most frequently and are typically owned by product teams rather than platform teams. The patterns for organizing application state need to balance developer autonomy with cluster-wide consistency.
```
# Application Layer Organization
apps/
├── my-service/
│   ├── base/
│   │   ├── kustomization.yaml
│   │   ├── deployment.yaml
│   │   ├── service.yaml
│   │   ├── hpa.yaml
│   │   ├── pdb.yaml
│   │   ├── configmap.yaml
│   │   └── service-account.yaml
│   │
│   └── overlays/
│       ├── development/
│       │   ├── kustomization.yaml
│       │   ├── replicas-patch.yaml    # 1 replica
│       │   └── resources-patch.yaml   # Lower limits
│       │
│       ├── staging/
│       │   ├── kustomization.yaml
│       │   ├── replicas-patch.yaml    # 2 replicas
│       │   └── ingress.yaml           # Staging domain
│       │
│       └── production/
│           ├── kustomization.yaml
│           ├── replicas-patch.yaml    # 5 replicas
│           ├── ingress.yaml           # Production domain
│           ├── resources-patch.yaml   # Higher limits
│           └── external-secret.yaml   # Prod secrets
│
├── another-service/
│   └── ...
│
└── _templates/
    ├── web-service/                   # Reusable template
    │   ├── kustomization.yaml
    │   ├── deployment.yaml
    │   └── service.yaml
    └── background-worker/
        ├── kustomization.yaml
        └── deployment.yaml
```

Kustomize Overlay Pattern:
The base + overlay pattern is fundamental to GitOps. The base contains the complete, production-ready configuration. Overlays make minimal, targeted modifications for different environments.
Key Principles:
Base must be deployable — The base should work in production. Overlays simplify for lower environments, not augment for production.
Overlays are minimal — Only patch what differs. Don't duplicate entire files; use Kustomize patches.
Environment parity — The more overlays diverge from base, the less confidence you have that staging reflects production.
Consistent naming — Use the same namespace names across environments when possible. Different clusters, same logical structure.
```yaml
# apps/my-service/base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: my-service

# Common labels for all resources
commonLabels:
  app.kubernetes.io/name: my-service
  app.kubernetes.io/component: backend
  app.kubernetes.io/part-of: my-platform

resources:
  - deployment.yaml
  - service.yaml
  - hpa.yaml
  - pdb.yaml
  - configmap.yaml
  - service-account.yaml

# Images can be overridden by overlays or GitOps tools
images:
  - name: my-service
    newName: registry.example.com/my-service
    newTag: latest    # Overridden by image automation
```

Image tags should be updated in exactly one place. With GitOps image automation (Flux Image Automation or Argo CD Image Updater), the tool commits updated tags to your repository. Without automation, CI pipelines should update a single file (typically the overlay's kustomization.yaml) that specifies the image tag.
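To round out the base-plus-overlay structure, a production overlay might look like the following sketch; the file names mirror the tree above, while the patch targets and pinned tag are illustrative:

```yaml
# apps/my-service/overlays/production/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../../base
  - ingress.yaml                  # Production domain
  - external-secret.yaml          # Prod secrets

patches:
  - path: replicas-patch.yaml     # e.g. bumps replicas to 5
    target:
      kind: Deployment
      name: my-service
  - path: resources-patch.yaml    # e.g. raises CPU/memory limits
    target:
      kind: Deployment
      name: my-service

images:
  - name: my-service
    newTag: v1.4.2                # Pinned per environment, or maintained by image automation
```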
Many cluster resources cut across the layer model—they're used by multiple applications or affect the entire cluster. Managing these requires careful design to avoid duplication while maintaining clear ownership.
Pattern: Policy with Defaults and Overrides
A common pattern for cross-cutting concerns is for the platform team to define safe, restrictive defaults, while application teams layer on their own, more specific rules. For example, with network policies:
```yaml
# Platform team defines defaults
# infrastructure/security/default-network-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  # Applied to every namespace by the platform overlay
spec:
  podSelector: {}            # All pods
  policyTypes:
    - Ingress
  ingress: []                # Deny all ingress by default
---
# Allow intra-namespace communication
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}    # Same namespace
---
# Application team adds their specific rules
# apps/my-service/base/network-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: my-service-ingress
spec:
  podSelector:
    matchLabels:
      app: my-service
  policyTypes:
    - Ingress
  ingress:
    # Allow from ingress controller
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
          podSelector:
            matchLabels:
              app.kubernetes.io/name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8080
    # Allow from specific consuming service
    - from:
        - namespaceSelector:
            matchLabels:
              name: consumer-service
          podSelector:
            matchLabels:
              app: consumer-service
```

For complex policy requirements, dedicated policy engines like OPA Gatekeeper or Kyverno provide more power than raw Kubernetes RBAC and admission control. They can enforce policies like "all Deployments must have resource limits", "all images must come from approved registries", or "all pods must have security contexts". These engines integrate well with GitOps.
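As a small illustration of the policy-engine approach, a Kyverno ClusterPolicy requiring resource limits might look roughly like this (the policy name and scope are illustrative); stored in the infrastructure layer, it is reconciled by the GitOps operator like any other manifest:

```yaml
# Sketch: Kyverno ClusterPolicy requiring CPU/memory limits on all containers
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  validationFailureAction: Enforce    # Reject non-compliant resources at admission
  rules:
    - name: check-container-limits
      match:
        any:
          - resources:
              kinds:
                - Deployment
      validate:
        message: "All containers must define CPU and memory limits."
        pattern:
          spec:
            template:
              spec:
                containers:
                  - resources:
                      limits:
                        cpu: "?*"       # Any non-empty value
                        memory: "?*"
```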
Enterprise deployments typically span multiple clusters—different regions, environments, or even cloud providers. Extending declarative state management across clusters requires patterns for sharing configuration while allowing cluster-specific customization.
```
# Multi-Cluster Repository Organization
fleet-infra/
├── clusters/
│   ├── production-us-east/
│   │   ├── flux-system/            # Flux installation for this cluster
│   │   ├── infrastructure.yaml     # Points to infrastructure overlays
│   │   ├── apps.yaml               # Points to app overlays
│   │   └── cluster-config.yaml     # Cluster-specific variables
│   │
│   ├── production-eu-west/
│   │   ├── flux-system/
│   │   ├── infrastructure.yaml
│   │   ├── apps.yaml
│   │   └── cluster-config.yaml
│   │
│   ├── staging-us-east/
│   │   └── ...
│   │
│   └── development/
│       └── ...
│
├── infrastructure/
│   ├── base/                       # Shared infrastructure base
│   └── overlays/
│       ├── production/             # Production-specific
│       └── staging/                # Staging-specific
│
├── apps/
│   ├── service-a/
│   │   ├── base/
│   │   └── overlays/
│   │       ├── production/         # Shared across prod clusters
│   │       └── staging/
│   └── service-b/
│       └── ...
│
└── cluster-definitions/
    ├── production-us-east.yaml     # Cluster metadata
    ├── production-eu-west.yaml
    └── staging-us-east.yaml
```

Patterns for Multi-Cluster Configuration:
1. Inheritance Model Common base configurations with cluster-specific overlays. Each cluster's directory contains only what's unique to that cluster.
2. Variable Substitution
Flux's postBuild substitution replaces variables at reconciliation time—inline values via substitute, or values drawn from per-cluster ConfigMaps and Secrets via substituteFrom. Define ${CLUSTER_NAME}, ${REGION}, and ${ENVIRONMENT} once and let each cluster supply its own values.
3. Generator Pattern (ArgoCD) ApplicationSets use generators to create Applications for each cluster. A matrix generator can create apps × clusters combinations from a single definition.
4. Hub-and-Spoke A management cluster hosts the GitOps operator, deploying to multiple workload clusters. Common in regulated environments where separation is required.
```yaml
# Cluster-specific configuration (stored per-cluster)
# clusters/production-us-east/cluster-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-config
  namespace: flux-system
data:
  CLUSTER_NAME: production-us-east
  REGION: us-east-1
  ENVIRONMENT: production
  INGRESS_DOMAIN: us-east.production.example.com
  REPLICA_MINIMUM: "3"
---
# Kustomization with variable substitution
# clusters/production-us-east/apps.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: fleet-infra
  path: ./apps/service-a/overlays/production
  prune: true
  # Substitute variables from cluster-config
  postBuild:
    substitute:
      CLUSTER_NAME: production-us-east    # Fallback
    substituteFrom:
      - kind: ConfigMap
        name: cluster-config
---
# Application manifest can reference variables
# apps/service-a/overlays/production/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-a
  annotations:
    cluster: "${CLUSTER_NAME}"
    region: "${REGION}"
spec:
  replicas: ${REPLICA_MINIMUM}    # Substituted at reconcile time
  template:
    spec:
      containers:
        - name: service-a
          env:
            - name: CLUSTER_NAME
              value: "${CLUSTER_NAME}"
```

ApplicationSets excel at multi-cluster management. Using the cluster generator, a single ApplicationSet can deploy an application to all registered clusters. Combined with the matrix generator, you can create every combination of app × environment × cluster from one definition.
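A minimal sketch of that idea, assuming clusters are already registered with ArgoCD—the repository URL and path are illustrative:

```yaml
# ApplicationSet that deploys service-a to every registered cluster (sketch)
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: service-a
  namespace: argocd
spec:
  generators:
    - clusters: {}                     # One Application per registered cluster
  template:
    metadata:
      name: 'service-a-{{name}}'       # {{name}} = cluster name from the generator
    spec:
      project: default
      source:
        repoURL: https://github.com/example/fleet-infra
        targetRevision: main
        path: apps/service-a/overlays/production
      destination:
        server: '{{server}}'           # API server URL from the generator
        namespace: service-a
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```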
We've explored how to model your entire cluster as declarative, Git-versioned state—from infrastructure layers through application workloads. The key patterns: layer the repository by dependency order, change velocity, and ownership; keep secrets out of plaintext with encrypted-in-Git or externally synced solutions; build environments as minimal overlays on a production-ready base; and extend the same model across clusters with variables, generators, or a hub-and-spoke topology.
What's Next:
With your cluster state modeled declaratively—from infrastructure layers to application workloads, with proper secret management and multi-cluster patterns—the next page covers automated reconciliation: the continuous loop that keeps your live clusters in sync with the desired state in Git, including drift detection, health assessment, and failure handling.