It's 3 AM. Your pager goes off. Production is down. A misconfigured firewall rule is blocking all traffic. You fix it manually, restore service, and go back to sleep. A week later, your colleague 'updates' the firewall configuration from a different script, overwriting your fix. Production goes down again.
This nightmare scenario—loss of changes, conflicting modifications, and no record of what happened—is the daily reality of infrastructure management without version control. The same problems that led software development to adopt version control decades ago afflict infrastructure teams that haven't embraced Infrastructure as Code.
Version control transforms infrastructure operations from chaos to order.
When infrastructure is code, Git becomes the single source of truth. Every change is tracked. Every modification is reviewable. History is preserved forever. And the entire engineering organization can collaborate on infrastructure with the same workflows they use for application code.
By the end of this page, you will understand how to apply version control to infrastructure, the workflows and branching strategies that work best, how code review improves infrastructure quality, the audit and compliance benefits, and the practices that distinguish mature IaC teams from those just getting started.
Version control for software development solved fundamental problems: tracking changes, enabling collaboration, maintaining history, and supporting parallel development. Infrastructure faces these exact same problems—often in more acute forms because infrastructure failures have immediate, visible impact.
The Problems Version Control Solves:
- `git blame` shows who introduced every line. Not for punishment, but for understanding context and asking the right person questions.
- A bad change is a `git revert` away.

The Cultural Shift:
Adopting version control for infrastructure represents a cultural change as much as a technical one. It means:
If it's not in Git, it should not exist in production. This golden rule transforms infrastructure operations. It creates accountability, enables automation, and ensures that the code always reflects reality.
How you organize your infrastructure code repository significantly impacts maintainability, collaboration, and automation. There are several established patterns, each with trade-offs.
Monorepo vs. Polyrepo:
A Common Monorepo Structure:
```
infrastructure/
├── README.md                      # Repository documentation
├── .github/
│   └── workflows/                 # GitHub Actions CI/CD pipelines
│       ├── terraform-plan.yml
│       └── terraform-apply.yml
│
├── modules/                       # Reusable Terraform modules
│   ├── vpc/                       # VPC module used across environments
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── eks-cluster/               # Kubernetes cluster module
│   ├── rds-postgres/              # Database module
│   └── ...
│
├── environments/                  # Environment-specific configurations
│   ├── production/
│   │   ├── us-east-1/             # Region-specific within environment
│   │   │   ├── main.tf
│   │   │   ├── variables.tf
│   │   │   ├── terraform.tfvars   # Environment-specific values
│   │   │   └── backend.tf         # State backend configuration
│   │   └── eu-west-1/
│   ├── staging/
│   │   └── us-east-1/
│   └── development/
│       └── us-east-1/
│
├── policies/                      # OPA/Sentinel policies for validation
│   ├── cost-controls.rego
│   ├── security-baseline.rego
│   └── naming-conventions.rego
│
└── scripts/                       # Helper scripts (imperative tasks)
    ├── import-existing.sh
    └── state-migration.sh
```

The separation between reusable 'modules' and environment-specific 'roots' (the environments directory) is crucial. Modules encapsulate logic; environments instantiate modules with specific values. This enables reuse while maintaining environment-specific customization.
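One practical consequence of this layout is that a pipeline can work out which environment roots a change touches and plan only those. The following is a minimal Python sketch of that idea, not part of any Terraform tooling. The function name `affected_roots`, the rule that any change under `modules/` re-plans every root, and the assumption that each root sits at `environments/<env>/<region>/` are illustrative choices based on the tree above.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

ENV_DIR = "environments"   # directory holding environment roots (from the tree above)
MODULES_DIR = "modules"    # directory holding shared modules (from the tree above)


def affected_roots(changed_paths: Iterable[str], all_roots: List[str]) -> List[str]:
    """Return the sorted environment roots that need a fresh plan.

    changed_paths: repo-relative paths touched by a change.
    all_roots: every known root, e.g. "environments/staging/us-east-1".
    """
    affected = set()
    for raw in changed_paths:
        parts = PurePosixPath(raw).parts
        if not parts:
            continue
        if parts[0] == MODULES_DIR:
            # Assumption: a shared module may be used anywhere, so
            # conservatively re-plan every root.
            return sorted(all_roots)
        if parts[0] == ENV_DIR and len(parts) >= 4:
            # environments/<env>/<region>/<file> -> environments/<env>/<region>
            root = "/".join(parts[:3])
            if root in all_roots:
                affected.add(root)
    return sorted(affected)
```

A change to `environments/staging/us-east-1/main.tf` maps to the single staging root, while a change to `modules/vpc/main.tf` maps to all roots. A real pipeline could narrow the module case by tracing which roots reference the changed module; the conservative rule here trades extra plans for simplicity.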
Infrastructure code benefits from well-defined branching strategies, but the requirements differ somewhat from application code. Infrastructure changes are often immediate and difficult to 'deploy to staging first' in the traditional sense.
Common Branching Patterns:
| Strategy | Description | Best For |
|---|---|---|
| Trunk-Based Development | All changes go to main; feature flags for incomplete work | Small teams, frequent deployments, mature testing |
| GitFlow | Feature branches, develop branch, release branches | Release cycles, multiple versions in production |
| Environment Branches | Separate branches per environment (main=prod, staging, dev) | Clear environment progression, manual promotions |
| PR-per-Environment | Single main branch; PRs target different apply paths | GitOps, automated pipelines, audit trails |
The Recommended Pattern: Trunk-Based with Environment Directories
For most teams, the best approach combines trunk-based development with environment-specific directories:
/environments/production/, /environments/staging/, etc.
```
# Typical IaC Change Workflow

1. Engineer creates feature branch from main:
   git checkout -b feature/add-new-database

2. Makes changes to relevant environment:
   # Edit environments/staging/main.tf
   # Add new database resource

3. Commits with meaningful message:
   git commit -m "Add PostgreSQL database for user service

   - Creates db.t3.medium RDS instance
   - Configures in private subnet
   - Enables automated backups
   - Related: JIRA-1234"

4. Opens Pull Request to main:
   # Automated checks run:
   # - Terraform fmt (formatting)
   # - Terraform validate (syntax)
   # - Terraform plan (preview changes)
   # - Policy checks (security, cost)

5. Code review by peer:
   # Reviewer checks:
   # - Is the change correct?
   # - Are there security implications?
   # - Does the plan output look right?

6. After approval, merge to main:
   # Triggers automatic plan for production
   # Manual approval gate for apply

7. Apply executes:
   # Infrastructure updated
   # State file updated
   # Notification sent
```

Long-lived feature branches cause merge conflicts and drift from main. For infrastructure, this is particularly dangerous: you might be developing against an outdated understanding of current state. Keep branches short-lived (hours to days, not weeks).
Code review is perhaps the most valuable practice enabled by version control. For infrastructure, where mistakes can have immediate and severe consequences, code review provides a critical safety net.
What Reviewers Should Check:
The Power of Plan Output in PRs:
Many CI/CD pipelines for Terraform are configured to run `terraform plan` on every pull request and post the output as a comment. This is immensely valuable:

```
# Terraform Plan Output (Posted to PR)

Terraform will perform the following actions:

  # aws_db_instance.users will be created
  + resource "aws_db_instance" "users" {
      + allocated_storage = 100
      + engine            = "postgres"
      + engine_version    = "14.9"
      + identifier        = "users-db"
      + instance_class    = "db.t3.medium"
      + multi_az          = true
      + storage_encrypted = true
      ...
    }

  # aws_security_group.db will be created
  + resource "aws_security_group" "db" {
      + name   = "users-db-sg"
      + vpc_id = "vpc-12345"
      ...
    }

Plan: 2 to add, 0 to change, 0 to destroy.

---
✅ Plan looks correct
✅ Security group restricts to VPC only
✅ Multi-AZ enabled for production
⚠️ Note: Estimated cost ~$150/month
```

At minimum, require code review for all production infrastructure changes. Many teams also require reviews for staging. Development environments might allow self-merge to reduce friction, but even there, the audit trail of commits remains valuable.
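The summary line at the end of a plan ("Plan: 2 to add, 0 to change, 0 to destroy.") is a convenient hook for automation, for example to demand an extra approval whenever something would be destroyed. The sketch below, in Python, parses that line from the human-readable plan text. The names `PlanSummary`, `parse_plan_summary`, and `needs_extra_review` are hypothetical, and the "any destroy needs extra review" rule is an assumed policy rather than something Terraform enforces. Scraping text is fragile; the JSON produced by `terraform show -json` on a saved plan is likely a sturdier input for a real pipeline.

```python
import re
from typing import NamedTuple, Optional

# Matches the summary line of a text plan. The optional "N to import" part
# is tolerated because newer Terraform versions may print it.
SUMMARY_RE = re.compile(
    r"Plan: (?:\d+ to import, )?(\d+) to add, (\d+) to change, (\d+) to destroy\."
)


class PlanSummary(NamedTuple):
    add: int
    change: int
    destroy: int


def parse_plan_summary(plan_text: str) -> Optional[PlanSummary]:
    """Extract add/change/destroy counts, or None if no summary line is present."""
    match = SUMMARY_RE.search(plan_text)
    if match is None:
        return None
    return PlanSummary(*(int(group) for group in match.groups()))


def needs_extra_review(summary: PlanSummary) -> bool:
    """Assumed policy: any destroy action calls for an additional approval."""
    return summary.destroy > 0
```

For the plan shown above, `parse_plan_summary` yields two additions and no destroys, so the assumed policy would not ask for an extra approval.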
In version-controlled infrastructure, commit messages serve as the primary documentation of why changes were made. The code shows what; the commits explain why. This makes commit message quality critically important.
Anatomy of an Excellent Infrastructure Commit:
```
feat(vpc): Add NAT Gateway for private subnet egress

Previously, instances in private subnets had no internet access,
blocking package updates and external API calls. This change:

- Adds a NAT Gateway in each AZ for high availability
- Routes outbound traffic through NAT for private subnets
- Enables instances to reach internet without being directly exposed

Cost impact: ~$100/month per NAT Gateway ($300/month total)

Alternatives considered:
- VPC endpoints: More cost-effective for AWS services only, but
  we need access to external APIs (Stripe, Twilio)
- NAT instances: Lower cost but requires managing EC2, not worth
  the operational overhead for our team size

Related: INFRA-456
Approved-by: security-team@example.com
```

Commit Message Best Practices:
When done well, your Git history becomes an architectural decision record. 'Why do we have three NAT gateways?' can be answered by searching the commit history. Because each explanation is attached to the change it describes, this documentation cannot drift away from the code the way a separate wiki page can.
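A convention like the one in the example commit is easier to keep when a hook or CI job checks it. The following Python sketch checks three things taken from that example: a `type(scope): summary` subject, a body separated by a blank line, and a `Related:` ticket line. The function name `check_commit_message`, the 72-character subject limit, and the decision to make `Related:` mandatory are assumptions for illustration, not rules stated by the source.

```python
import re
from typing import List

# Subject shape taken from the example: "feat(vpc): Add NAT Gateway ..."
SUBJECT_RE = re.compile(r"^[a-z]+\([a-z0-9-]+\): .+$")
MAX_SUBJECT_LEN = 72  # assumed limit; adjust to the team's convention


def check_commit_message(message: str) -> List[str]:
    """Return a list of problems; an empty list means the message passes."""
    problems = []
    lines = message.strip("\n").split("\n")
    subject = lines[0]

    if not SUBJECT_RE.match(subject):
        problems.append("subject should look like 'type(scope): summary'")
    if len(subject) > MAX_SUBJECT_LEN:
        problems.append(f"subject is longer than {MAX_SUBJECT_LEN} characters")
    # A body needs at least: subject, blank line, one line of explanation.
    if len(lines) < 3 or lines[1].strip() != "":
        problems.append("add a blank line and a body explaining why")
    if not any(line.startswith("Related:") for line in lines[2:]):
        problems.append("missing 'Related:' ticket reference")
    return problems
```

Run against the NAT Gateway commit above, this returns no problems; a bare `fix stuff` fails the subject, body, and ticket checks. Such a check only verifies shape, so review is still what judges whether the explanation is any good.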
Regulatory compliance (SOC 2, HIPAA, PCI-DSS, etc.) requires demonstrating control over infrastructure changes. Version control produces much of this evidence as a by-product of normal work, turning compliance from a painful audit scramble into a routine process.
Compliance Questions Version Control Answers:
| Auditor Question | Without Version Control | With Version Control |
|---|---|---|
| Who made this change? | Check CloudTrail, correlate timestamps, interview staff | `git log --author` or `git blame` instantly |
| When was this change made? | Parse multiple log sources | `git log` shows timestamp and chronology |
| Was the change approved? | Check email threads, Slack, meeting notes | PR approval recorded on the pull request behind the merge commit |
| What was the previous configuration? | Maybe backups exist? Maybe documentation? | `git show HEAD~1:file.tf` shows exact previous state |
| What changes happened in Q3? | Weeks of log correlation | `git log --since='2024-07-01' --until='2024-10-01'` |
| Can you prove change control exists? | Document manual processes | Show PR workflow with required reviews |
Automated Compliance Evidence:
Modern IaC pipelines can automatically generate compliance evidence:
Instead of annual panic preparing for audits, compliance becomes continuous. Every PR that passes review and policy checks leaves its own evidence behind. Auditors can be given read access to the repository and see months of change control with little extra preparation.
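The "what changed in Q3" question from the table can be turned into a small report. The Python sketch below builds a `git log` command for a date range and parses its output into records an auditor could read. The function names and the choice of fields (hash, author, ISO date, subject) are illustrative assumptions; the format placeholders `%H`, `%an`, `%aI`, `%s`, and `%x09` (tab) are standard `git log` pretty-format codes.

```python
import subprocess
from typing import Dict, List

# Tab-separated: full hash, author name, strict ISO 8601 author date, subject.
LOG_FORMAT = "%H%x09%an%x09%aI%x09%s"


def git_log_command(since: str, until: str, path: str = ".") -> List[str]:
    """Build the git log invocation for changes to `path` in a date range."""
    return [
        "git", "log",
        f"--since={since}",
        f"--until={until}",
        f"--pretty=format:{LOG_FORMAT}",
        "--", path,
    ]


def parse_log(output: str) -> List[Dict[str, str]]:
    """Turn the tab-separated log output into one dict per commit."""
    records = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # maxsplit=3 keeps any tabs inside the subject intact.
        commit, author, date, subject = line.split("\t", 3)
        records.append(
            {"commit": commit, "author": author, "date": date, "subject": subject}
        )
    return records


def changes_between(since: str, until: str, path: str = ".", repo: str = ".") -> List[Dict[str, str]]:
    """Run git in `repo` and return the parsed change records."""
    result = subprocess.run(
        git_log_command(since, until, path),
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return parse_log(result.stdout)
```

Calling `changes_between("2024-07-01", "2024-10-01", "environments/production")` inside a clone would list production changes for the quarter. Approval evidence lives on the hosting platform's pull request records, so a fuller report would join these commits with that data.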
One critical challenge when version-controlling infrastructure: secrets must never be committed to Git. Database passwords, API keys, and certificates don't belong in your repository, even a private one. Git history is effectively permanent: a secret committed even briefly stays in history, and in every clone, unless the history is rewritten, so it should be treated as compromised and rotated.
Approaches for Secret Management:
| Approach | Description | Best For |
|---|---|---|
| Environment Variables | Secrets injected at runtime, not in code | Simple setups, CI/CD pipelines |
| Secrets Managers | AWS Secrets Manager, HashiCorp Vault, Azure Key Vault | Production systems, rotation capabilities |
| Encrypted Files | SOPS, git-crypt encrypt secrets in repo | When secrets must be versioned alongside code |
| External References | Code references secret by name, value stored elsewhere | Terraform with data sources, Kubernetes external secrets |
```hcl
# BAD: Never do this!
# resource "aws_db_instance" "bad" {
#   password = "super-secret-password123"  # EXPOSED IN GIT HISTORY FOREVER
# }

# GOOD: Reference from secrets manager
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "production/database/password"
}

resource "aws_db_instance" "good" {
  identifier = "production-db"
  # ... other config
  password   = data.aws_secretsmanager_secret_version.db_password.secret_string
}

# GOOD: Use Terraform variables, inject at runtime
variable "database_password" {
  type        = string
  sensitive   = true
  description = "Database password - provide via TF_VAR_database_password"
}

resource "aws_db_instance" "also_good" {
  identifier = "production-db"
  # ... other config
  password   = var.database_password
}

# Note: in both GOOD variants the password is still written to Terraform
# state, so the state backend must be encrypted and access-controlled.
```

Use pre-commit hooks to scan for secrets before they're committed. Tools like gitleaks, trufflehog, and detect-secrets can prevent accidental secret exposure. Also use .gitignore to exclude files likely to contain secrets (*.tfvars with sensitive values, .env files).
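To make the idea of a pre-commit scan concrete, here is a deliberately small Python sketch of what such a scanner does: match each line against a few patterns and report hits. The three patterns and the name `scan_text` are illustrative assumptions and far from complete. Dedicated tools such as gitleaks, trufflehog, and detect-secrets cover many more cases, so this shows the mechanism rather than replacing them.

```python
import re
from typing import List, Tuple

# A tiny, illustrative pattern set. Real scanners ship hundreds of rules.
PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # A quoted literal assigned to "password"; references such as
    # var.database_password are unquoted and therefore not matched.
    "hard-coded password": re.compile(
        r"""\bpassword\s*=\s*["'][^"']+["']""", re.IGNORECASE
    ),
    "private key header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_text(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, finding_label) pairs for suspicious lines."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings
```

Applied to the Terraform example above, the quoted password in the BAD block is flagged, even though it is commented out, while the secrets manager reference and the variable reference pass. A hook would call `scan_text` on each staged file and abort the commit when the result is non-empty.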
We've covered the comprehensive application of version control to infrastructure. Let's consolidate the key insights:
What's Next:
With version control in place, we can now achieve the ultimate goals of Infrastructure as Code: Reproducibility and Consistency. The next page explores how IaC eliminates configuration drift, ensures environments match, and enables reliable infrastructure at any scale.
You now understand how to apply version control to infrastructure code, including repository organization, branching strategies, code review practices, commit message standards, compliance benefits, and secret management. These practices form the foundation for professional infrastructure operations.