What Is IaC? - Learning Module

Version Control for Infrastructure

The Git Revolution for Infrastructure

It's 3 AM. Your pager goes off. Production is down. A misconfigured firewall rule is blocking all traffic. You fix it manually, restore service, and go back to sleep. A week later, your colleague 'updates' the firewall configuration from a different script, overwriting your fix. Production goes down again.

This nightmare scenario—loss of changes, conflicting modifications, and no record of what happened—is the daily reality of infrastructure management without version control. The same problems that led software development to adopt version control decades ago afflict infrastructure teams that haven't embraced Infrastructure as Code.

Version control transforms infrastructure operations from chaos to order.

When infrastructure is code, Git becomes the single source of truth. Every change is tracked. Every modification is reviewable. History is preserved forever. And the entire engineering organization can collaborate on infrastructure with the same workflows they use for application code.

What You Will Learn

By the end of this page, you will understand how to apply version control to infrastructure, the workflows and branching strategies that work best, how code review improves infrastructure quality, the audit and compliance benefits, and the practices that distinguish mature IaC teams from those just getting started.

Why Version Control for Infrastructure

Version control for software development solved fundamental problems: tracking changes, enabling collaboration, maintaining history, and supporting parallel development. Infrastructure faces these exact same problems—often in more acute forms because infrastructure failures have immediate, visible impact.

The Problems Version Control Solves:

Version Control Benefits for Infrastructure

•Complete History — Know exactly what changed, when, and by whom. Answer 'what was the configuration last Tuesday?' in seconds.
•Blame and Accountability — git blame shows who introduced every line. Not for punishment—for understanding context and asking the right person questions.
•Rollback Capability — If a change breaks things, revert the commit and re-apply the previous configuration. The old configuration is always one git revert away.
•Branching and Isolation — Test infrastructure changes in isolation before merging to main. Multiple engineers can work in parallel, and conflicts surface at merge time instead of in production.
•Code Review — Changes go through peer review before deployment, catching errors and spreading knowledge.
•Documentation by Commit Message — Every change comes with an explanation of why it was made. The repository becomes its own documentation.

The Cultural Shift:

Adopting version control for infrastructure represents a cultural change as much as a technical one. It means:

No more SSH-and-edit — Changes go through the repository, not ad-hoc terminal commands
No more undocumented modifications — Every change has a commit message explaining why
No more 'works on my machine' — The repository is the authoritative source
No more tribal knowledge — Knowledge is captured in code and commit history

The Golden Rule

If it's not in Git, it should not exist in production. This golden rule transforms infrastructure operations: it creates accountability, enables automation, and keeps the code an accurate reflection of what is actually running.
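
One way to make the golden rule checkable is a set comparison between what the repository declares and what exists in the environment. The sketch below is illustrative only: the resource identifiers are made up, and in practice the two sets would come from your IaC tool's state and your cloud provider's inventory rather than hard-coded values.

```python
from typing import Set


def find_unmanaged(declared: Set[str], deployed: Set[str]) -> Set[str]:
    """Return resources that exist in production but are not declared in Git."""
    return deployed - declared


def find_missing(declared: Set[str], deployed: Set[str]) -> Set[str]:
    """Return resources declared in Git that do not exist in production."""
    return declared - deployed


# Hypothetical identifiers, for illustration only.
declared_in_git = {"aws_vpc.main", "aws_db_instance.users"}
running_in_prod = {
    "aws_vpc.main",
    "aws_db_instance.users",
    "aws_instance.manual_debug_box",  # created by hand, never committed
}

print(sorted(find_unmanaged(declared_in_git, running_in_prod)))
```

Anything reported by find_unmanaged is a candidate for either importing into code or removing from production.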

Repository Structure and Organization

How you organize your infrastructure code repository significantly impacts maintainability, collaboration, and automation. There are several established patterns, each with trade-offs.

Monorepo vs. Polyrepo:

Monorepo (Single Repository)

•All infrastructure code in one repository
•Pros: Atomic changes across services, shared modules, unified CI/CD
•Cons: Large repo size, broader blast radius for mistakes, complex permissions
•Best for: Smaller teams, tightly coupled infrastructure, unified deployment

Polyrepo (Multiple Repositories)

•Separate repositories per team/service/environment
•Pros: Clear ownership, isolated blast radius, simpler permissions
•Cons: Cross-repo changes are complex, module versioning required
•Best for: Larger orgs, independent teams, microservices architectures

A Common Monorepo Structure:

infrastructure-repo-structure
infrastructure/
├── README.md                    # Repository documentation
├── .github/
│   └── workflows/               # GitHub Actions CI/CD pipelines
│       ├── terraform-plan.yml
│       └── terraform-apply.yml
│
├── modules/                     # Reusable Terraform modules
│   ├── vpc/                     # VPC module used across environments
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── eks-cluster/             # Kubernetes cluster module
│   ├── rds-postgres/            # Database module
│   └── ...
│
├── environments/                # Environment-specific configurations
│   ├── production/
│   │   ├── us-east-1/           # Region-specific within environment
│   │   │   ├── main.tf
│   │   │   ├── variables.tf
│   │   │   ├── terraform.tfvars # Environment-specific values
│   │   │   └── backend.tf       # State backend configuration
│   │   └── eu-west-1/
│   ├── staging/
│   │   └── us-east-1/
│   └── development/
│       └── us-east-1/
│
├── policies/                    # OPA/Sentinel policies for validation
│   ├── cost-controls.rego
│   ├── security-baseline.rego
│   └── naming-conventions.rego
│
└── scripts/                     # Helper scripts (imperative tasks)
    ├── import-existing.sh
    └── state-migration.sh

Keep Modules Separate from Environments

The separation between reusable 'modules' and environment-specific 'roots' (the environments directory) is crucial. Modules encapsulate logic; environments instantiate modules with specific values. This enables reuse while maintaining environment-specific customization.
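
A pipeline can exploit this separation to decide which environment roots need a fresh plan when a change arrives. The following is a minimal sketch, assuming the directory layout shown above (environments/<env>/<region>/ and modules/<name>/). The function name and the rule that any module change affects every root are simplifying assumptions, not something a particular tool prescribes.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Sequence, Set

# Assumed environment roots, mirroring the example repository layout.
ALL_ROOTS = (
    "environments/production/us-east-1",
    "environments/production/eu-west-1",
    "environments/staging/us-east-1",
    "environments/development/us-east-1",
)


def affected_roots(
    changed_files: Iterable[str], all_roots: Sequence[str] = ALL_ROOTS
) -> List[str]:
    """Map changed file paths to the environment roots that need a new plan."""
    affected: Set[str] = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        if parts and parts[0] == "modules":
            # Simplifying assumption: a module change may affect every root.
            return sorted(all_roots)
        if len(parts) >= 4 and parts[0] == "environments":
            root = "/".join(parts[:3])
            if root in all_roots:
                affected.add(root)
    return sorted(affected)


print(affected_roots(["environments/staging/us-east-1/main.tf"]))
```

A more precise version would inspect each root's module sources instead of treating every module change as global, at the cost of parsing the configuration.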

Branching Strategies for Infrastructure

Infrastructure code benefits from well-defined branching strategies, but the requirements differ somewhat from application code. Infrastructure changes are often immediate and difficult to 'deploy to staging first' in the traditional sense.

Common Branching Patterns:

Branching Strategies Comparison
Strategy	Description	Best For
Trunk-Based Development	All changes go to main; feature flags for incomplete work	Small teams, frequent deployments, mature testing
GitFlow	Feature branches, develop branch, release branches	Release cycles, multiple versions in production
Environment Branches	Separate branches per environment (main=prod, staging, dev)	Clear environment progression, manual promotions
PR-per-Environment	Single main branch; PRs target different apply paths	GitOps, automated pipelines, audit trails

The Recommended Pattern: Trunk-Based with Environment Directories

For most teams, the best approach combines trunk-based development with environment-specific directories:

Single main branch — All merged code represents the desired state
Environment directories — /environments/production/, /environments/staging/, etc.
Feature branches — Short-lived branches for developing changes
Pull Requests — All changes reviewed before merge
Automated pipelines — Merge to main triggers automatic plan; manual approval for apply

typical-workflow.md
# Typical IaC Change Workflow
 
1. Engineer creates feature branch from main:
   git checkout -b feature/add-new-database
 
2. Makes changes to relevant environment:
   # Edit environments/staging/us-east-1/main.tf
   # Add new database resource
 
3. Commits with meaningful message:
   git commit -m "Add PostgreSQL database for user service
   
   - Creates db.t3.medium RDS instance
   - Configures in private subnet
   - Enables automated backups
   - Related: JIRA-1234"
 
4. Opens Pull Request to main:
   # Automated checks run:
   # - terraform fmt -check (formatting)
   # - terraform validate (syntax and internal consistency)
   # - terraform plan (preview changes)
   # - Policy checks (security, cost)
 
5. Code review by peer:
   # Reviewer checks:
   # - Is the change correct?
   # - Are there security implications?
   # - Does the plan output look right?
 
6. After approval, merge to main:
   # Triggers automatic plan for the affected environment
   # Manual approval gate for apply
 
7. Apply executes:
   # Infrastructure updated
   # State file updated
   # Notification sent
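
The two gates in this workflow (automated checks plus review before merge, manual approval before apply) can be modeled in a few lines. This is a sketch of the decision logic only. The check names and the PullRequest class are hypothetical, and real enforcement would normally live in your Git hosting platform's branch protection and your CI system rather than in custom code.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical names for the automated checks listed in step 4.
REQUIRED_CHECKS = ("fmt", "validate", "plan", "policy")


@dataclass
class PullRequest:
    checks: Dict[str, bool] = field(default_factory=dict)  # check name -> passed
    approvals: int = 0
    merged: bool = False


def failing_checks(pr: PullRequest) -> List[str]:
    """Return required checks that are missing or did not pass."""
    return [name for name in REQUIRED_CHECKS if not pr.checks.get(name, False)]


def can_merge(pr: PullRequest, min_approvals: int = 1) -> bool:
    """Merge gate: every required check passed and enough reviewers approved."""
    return not failing_checks(pr) and pr.approvals >= min_approvals


def can_apply(pr: PullRequest, manual_approval: bool) -> bool:
    """Apply gate: the change is merged and a human approved the apply."""
    return pr.merged and manual_approval


pr = PullRequest(
    checks={"fmt": True, "validate": True, "plan": True, "policy": False},
    approvals=1,
)
print(failing_checks(pr), can_merge(pr))
```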

Avoid Long-Lived Branches

Long-lived feature branches cause merge conflicts and drift from main. For infrastructure, this is particularly dangerous—you might be developing against an outdated understanding of current state. Keep branches short-lived (hours to days, not weeks).

Code Review for Infrastructure

Code review is perhaps the most valuable practice enabled by version control. For infrastructure, where mistakes can have immediate and severe consequences, code review provides a critical safety net.

What Reviewers Should Check:

Infrastructure Code Review Checklist

•Plan Output — Does the generated plan match the stated intent? Any unexpected destroys or recreates?
•Security Implications — Are security groups, IAM policies, and encryption configured correctly?
•Cost Impact — Are expensive resources justified? Could a smaller instance work?
•Naming Conventions — Do names follow organizational standards?
•State Safety — Could this change cause state corruption or data loss?
•Rollback Plan — If this fails, how do we recover?
•Dependencies — Does this change affect other teams or services?
•Documentation — Are the commit message and any inline documentation sufficient?

The Power of Plan Output in PRs:

Many CI/CD pipelines for Terraform are configured to run terraform plan automatically and post the output to the Pull Request. This is immensely valuable:

Reviewers see exactly what will change
Unexpected modifications are immediately visible
Resource counts (2 to add, 1 to change, 0 to destroy) provide quick sanity checks
Drift is detected if actual state differs from expected

plan-output-in-pr.md
# Terraform Plan Output (Posted to PR)
 
Terraform will perform the following actions:
 
  # aws_db_instance.users will be created
  + resource "aws_db_instance" "users" {
      + allocated_storage      = 100
      + engine                 = "postgres"
      + engine_version         = "14.9"
      + identifier             = "users-db"
      + instance_class         = "db.t3.medium"
      + multi_az               = true
      + storage_encrypted      = true
      ...
    }
 
  # aws_security_group.db will be created
  + resource "aws_security_group" "db" {
      + name   = "users-db-sg"
      + vpc_id = "vpc-12345"
      ...
    }
 
Plan: 2 to add, 0 to change, 0 to destroy.
 
---
✅ Plan looks correct
✅ Security group restricts to VPC only
✅ Multi-AZ enabled for production
⚠️  Note: Estimated cost ~$150/month
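
The resource counts in the summary line lend themselves to an automated sanity check. The sketch below parses a summary of the form "Plan: N to add, N to change, N to destroy." and flags any plan that destroys resources. It assumes exactly that text format. For anything beyond a quick check, the machine-readable output of terraform show -json on a saved plan file is a sturdier input than scraping text.

```python
import re
from typing import Dict, Optional

PLAN_SUMMARY = re.compile(
    r"Plan: (?P<add>\d+) to add, (?P<change>\d+) to change, (?P<destroy>\d+) to destroy\."
)


def parse_plan_summary(plan_text: str) -> Optional[Dict[str, int]]:
    """Extract add/change/destroy counts, or None if no summary line is found."""
    match = PLAN_SUMMARY.search(plan_text)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def needs_extra_review(counts: Dict[str, int]) -> bool:
    """Flag plans that destroy anything so a reviewer looks twice."""
    return counts["destroy"] > 0


counts = parse_plan_summary("Plan: 2 to add, 0 to change, 0 to destroy.")
print(counts, needs_extra_review(counts))
```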

Require Reviews for Production

At minimum, require code review for all production infrastructure changes. Many teams also require reviews for staging. Development environments might allow self-merge to reduce friction, but even there, the audit trail of commits remains valuable.

Commit Messages and Living Documentation

In version-controlled infrastructure, commit messages serve as the primary documentation of why changes were made. The code shows what; the commits explain why. This makes commit message quality critically important.

Anatomy of an Excellent Infrastructure Commit:

excellent-commit-message.txt
feat(vpc): Add NAT Gateways for private egress
 
Previously, instances in private subnets had no internet access,
blocking package updates and external API calls. This change:
 
- Adds a NAT Gateway in each AZ for high availability
- Routes outbound traffic through NAT for private subnets
- Enables instances to reach internet without being directly exposed
 
Cost impact: ~$100/month per NAT Gateway ($300/month total)
 
Alternatives considered:
- VPC endpoints: More cost-effective for AWS services only, but we
  need access to external APIs (Stripe, Twilio)
- NAT instances: Lower cost but requires managing EC2, not worth
  the operational overhead for our team size
 
Related: INFRA-456
Approved-by: security-team@example.com

Commit Message Best Practices:

Infrastructure Commit Message Standards

•Imperative subject line — 'Add NAT Gateway' not 'Added NAT Gateway' or 'Adding NAT Gateway'
•Explain the why — Future you (and colleagues) need to understand the reasoning, not just the change
•Document alternatives considered — Show this wasn't the first idea; you evaluated options
•Include cost impact — For any resource that incurs significant cost
•Link to tickets — Connect to issue trackers for full context
•Note approvals — If security or other teams reviewed, mention it
•Keep subject under 50 chars — Enables readable logs and tooling integration
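
Several of these standards can be checked mechanically, for example in a commit-msg hook or a CI step. The following is a rough sketch under stated assumptions: the ticket pattern (uppercase project key, hyphen, number), the 50-character limit, and the "ends in -ed or -ing" test for non-imperative verbs are all heuristics chosen for illustration, and the last one will misfire on verbs such as "Bring".

```python
import re
from typing import List

MAX_SUBJECT_LENGTH = 50
TICKET_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")  # e.g. INFRA-456
TYPE_PREFIX = re.compile(r"^\w+(\([\w-]+\))?: ")  # e.g. "feat(vpc): "


def lint_commit_message(message: str) -> List[str]:
    """Return a list of problems found in a commit message (empty if none)."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    if not subject.strip():
        return ["subject line is empty"]

    problems: List[str] = []
    if len(subject) > MAX_SUBJECT_LENGTH:
        problems.append(f"subject is longer than {MAX_SUBJECT_LENGTH} characters")

    # Ignore an optional type/scope prefix before checking the verb.
    words = TYPE_PREFIX.sub("", subject).split()
    if words and words[0].lower().endswith(("ed", "ing")):
        problems.append("subject should use the imperative mood")

    if len(lines) > 1 and lines[1].strip():
        problems.append("separate subject from body with a blank line")
    if len(lines) < 3:
        problems.append("body explaining why is missing")
    if not TICKET_PATTERN.search(message):
        problems.append("no ticket reference found")
    return problems


print(lint_commit_message("Added stuff"))
```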

Commit History as Decision Record

When done well, your Git history becomes an architectural decision record. 'Why do we have three NAT gateways?' can be answered by searching the commit history. This living documentation is far less likely to go stale than a separate wiki, because each explanation is attached to the change it describes.

Audit and Compliance Benefits

Regulatory compliance (SOC 2, HIPAA, PCI-DSS, etc.) requires demonstrating control over infrastructure changes. Version control, combined with enforced pull request reviews, produces most of this evidence as a by-product of normal work, transforming compliance from a painful audit scramble into a routine process.

Compliance Questions Version Control Answers:

Compliance Questions and Git Answers
Auditor Question	Without Version Control	With Version Control
Who made this change?	Check CloudTrail, correlate timestamps, interview staff	git log --author or git blame instantly
When was this change made?	Parse multiple log sources	git log shows timestamp and chronology
Was the change approved?	Check email threads, Slack, meeting notes	PR approval recorded on the hosting platform and referenced by the merge commit
What was the previous configuration?	Maybe backups exist? Maybe documentation?	git show HEAD~1:file.tf shows exact previous state
What changes happened in Q3?	Weeks of log correlation	git log --since='2024-07-01' --until='2024-10-01'
Can you prove change control exists?	Document manual processes	Show PR workflow with required reviews
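
Date-range questions like the Q3 example are easy to script. The sketch below computes the boundaries of a calendar quarter and builds the matching git log argument list. The default path and the function names are assumptions for illustration. The command is only constructed here, not executed, so you can pass it to subprocess.run inside a real repository.

```python
from datetime import date
from typing import List, Tuple


def quarter_bounds(year: int, quarter: int) -> Tuple[date, date]:
    """Return (first day of the quarter, first day of the next quarter)."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be between 1 and 4")
    start_month = 3 * (quarter - 1) + 1
    start = date(year, start_month, 1)
    end = date(year + 1, 1, 1) if quarter == 4 else date(year, start_month + 3, 1)
    return start, end


def git_log_command(
    year: int, quarter: int, path: str = "environments/production"
) -> List[str]:
    """Build a git log invocation limited to one quarter and one directory."""
    start, end = quarter_bounds(year, quarter)
    return [
        "git",
        "log",
        f"--since={start.isoformat()}",
        f"--until={end.isoformat()}",
        "--",
        path,
    ]


print(" ".join(git_log_command(2024, 3)))
```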

Automated Compliance Evidence:

Modern IaC pipelines can automatically generate compliance evidence:

PR records — Each change has associated discussion, approvals, and timestamps
Plan outputs — Archived plans show what was intended
Apply logs — Recorded output proves what was executed
Policy check results — Automated security scans documented per change
Signed Git commits — Cryptographically tie each commit to the signer's key (for high-security environments)

Compliance Becomes Continuous

Instead of an annual panic preparing for audits, compliance becomes continuous. Every PR that passes review and policy checks leaves behind its own evidence of change control. Auditors can be given read access to the repository and review months of change history with little extra preparation.

Managing Secrets in Version Control

One critical challenge when version-controlling infrastructure: secrets must never be committed to Git. Database passwords, API keys, and certificates don't belong in your repository, even a private one. Git history is designed to be permanent: a secret committed even briefly stays in history (and in every clone) unless the history is rewritten, so treat any committed secret as compromised and rotate it.

Approaches for Secret Management:

Secret Management Strategies
Approach	Description	Best For
Environment Variables	Secrets injected at runtime, not in code	Simple setups, CI/CD pipelines
Secrets Managers	AWS Secrets Manager, HashiCorp Vault, Azure Key Vault	Production systems, rotation capabilities
Encrypted Files	SOPS, git-crypt encrypt secrets in repo	When secrets must be versioned alongside code
External References	Code references secret by name, value stored elsewhere	Terraform with data sources, Kubernetes external secrets
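
As an illustration of the environment-variable approach, a helper script or pipeline step can read a secret at runtime and fail fast when it is absent, so the value never appears in the repository. This is a minimal sketch. The function and exception names are hypothetical, and the variable name in the usage line is only an example.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret was not injected into the environment."""


def require_secret(name: str) -> str:
    """Read a secret from the environment; never fall back to a default."""
    value = os.environ.get(name, "")
    if not value:
        # Report only the variable name, never the secret value itself.
        raise MissingSecretError(f"environment variable {name} is not set")
    return value


try:
    password = require_secret("TF_VAR_database_password")
except MissingSecretError as error:
    print(f"refusing to continue: {error}")
```

Failing loudly is a deliberate choice here: a silent default would let a pipeline run with an empty or placeholder credential.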

secrets-handling.tf
# BAD: Never do this!
# resource "aws_db_instance" "bad" {
#   password = "super-secret-password123"  # EXPOSED IN GIT HISTORY FOREVER
# }
 
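# GOOD: Reference from secrets manager
# (the resolved value is still written to Terraform state, so encrypt
# the state backend and restrict access to it)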
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "production/database/password"
}
 
resource "aws_db_instance" "good" {
  identifier = "production-db"
  # ... other config
  
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}
 
# GOOD: Use Terraform variables, inject at runtime
variable "database_password" {
  type        = string
  sensitive   = true
  description = "Database password - provide via TF_VAR_database_password"
}
 
resource "aws_db_instance" "also_good" {
  identifier = "production-db"
  # ... other config
  
  password = var.database_password
}

Pre-commit Hooks Are Essential

Use pre-commit hooks to scan for secrets before they're committed. Tools like gitleaks, trufflehog, and detect-secrets can prevent accidental secret exposure. Also use .gitignore to exclude files likely to contain secrets (*.tfvars with sensitive values, .env files).
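
To show the idea behind such scanners, here is a deliberately tiny pattern-based check. It is not how gitleaks, trufflehog, or detect-secrets are implemented, and its two patterns (an AWS access key ID shape and a quoted literal assigned to password) are assumptions chosen for illustration. Use a maintained tool for real protection.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; real scanners ship far larger rule sets.
PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "hard-coded password": re.compile(r'password\s*=\s*"[^"]+"', re.IGNORECASE),
}


def scan_text(text: str) -> List[Tuple[int, str]]:
    """Return (line number, finding label) pairs for suspicious lines."""
    findings: List[Tuple[int, str]] = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label))
    return findings


sample = "\n".join(
    [
        'password = "super-secret-password123"',
        "password = var.database_password",
        'access_key = "AKIAIOSFODNN7EXAMPLE"',
    ]
)
for line_number, label in scan_text(sample):
    print(f"line {line_number}: {label}")
```

A literal assigned to password is flagged, while a reference to a variable or data source is not, which mirrors the difference between the bad and good Terraform examples.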

Summary: Version Control for Infrastructure

We've covered the comprehensive application of version control to infrastructure. Let's consolidate the key insights:

Key Takeaways

•Version control brings software engineering to infrastructure — History, collaboration, review, and automation all become possible.
•Repository structure matters — Choose between monorepo and polyrepo based on team size and coupling needs.
•Trunk-based development works best for most teams — Short-lived branches, environment directories, and automated pipelines.
•Code review is the safety net — PRs with plan output catch errors before production impact.
•Commit messages are documentation — Explain why, not just what, to create a living architectural record.
•Compliance becomes continuous — Audit trails are built into the workflow, not prepared separately.
•Secrets require special handling — Never commit secrets; use secrets managers and pre-commit hooks.

What's Next:

With version control in place, we can now achieve the ultimate goals of Infrastructure as Code: Reproducibility and Consistency. The next page explores how IaC eliminates configuration drift, ensures environments match, and enables reliable infrastructure at any scale.

You now understand how to apply version control to infrastructure code, including repository organization, branching strategies, code review practices, commit message standards, compliance benefits, and secret management. These practices form the foundation for professional infrastructure operations.