In the era of manual infrastructure management, changes were approved through meetings, tickets, and verbal agreements. The actual implementation was a black box—reviewers approved an intent, not the specific configuration that would be applied. Modern infrastructure practices flip this model entirely.
Pull Request (PR) workflows for infrastructure bring the discipline of code review to operational changes. Just as developers review each other's code for bugs, style, and design, infrastructure PRs enable teams to review the exact configuration changes, their impact on existing resources, and compliance with organizational policies—all before any modification reaches production.
By the end of this page, you will understand how to structure PR workflows for infrastructure changes, the key elements that must be automated within the PR lifecycle, how to handle different risk levels appropriately, and the review practices that catch issues before they become incidents. You will be equipped to design PR processes that balance velocity with safety.
In infrastructure-as-code workflows, the Pull Request becomes the fundamental unit of change. Every modification—from adjusting a single security group rule to provisioning an entire new environment—flows through a PR that captures intent, enables review, and creates an audit trail.
What a Good Infrastructure PR Contains:

    ## Infrastructure Change Request

    ### Summary
    <!-- Brief description of what this change does -->

    ### Motivation
    <!-- Why is this change needed? Link to ticket/issue if applicable -->
    Relates to: #123

    ### Change Type
    <!-- Check all that apply -->
    - [ ] New resource creation
    - [ ] Resource modification
    - [ ] Resource deletion
    - [ ] Configuration update
    - [ ] Security/IAM change

    ### Environments Affected
    <!-- Which environments will this change impact? -->
    - [ ] Development
    - [ ] Staging
    - [ ] Production

    ### Risk Assessment
    <!-- Estimate the risk level of this change -->
    - [ ] 🟢 Low - No production impact, easily reversible
    - [ ] 🟡 Medium - Limited production impact, rollback available
    - [ ] 🔴 High - Significant production impact, careful review needed

    ### Pre-Deployment Checklist
    - [ ] Terraform plan reviewed
    - [ ] Security scan passed
    - [ ] Policy checks passed
    - [ ] Tested in lower environment (if applicable)
    - [ ] Runbook updated (if applicable)
    - [ ] Monitoring/alerting in place

    ### Rollback Plan
    <!-- How will we revert if something goes wrong? -->

    ### Additional Notes
    <!-- Any other context for reviewers -->

PR templates ensure consistent information across all infrastructure changes. Reviewers know exactly where to find what they need, and authors are guided to provide essential context. Most Git platforms support automatic template injection for PRs.
The power of PR workflows comes from automation. Every PR should trigger a series of automated checks that validate the change before any human reviewer needs to look at it. This approach catches obvious issues early and lets reviewers focus on higher-level concerns.
The Automated Check Pipeline:
| Check Type | Tools | What It Catches | When to Block |
|---|---|---|---|
| Syntax Validation | terraform validate, yamllint | Parse errors, invalid references, schema violations | Always - invalid syntax can't proceed |
| Formatting | terraform fmt, prettier | Inconsistent style, indentation issues | Usually - maintain codebase consistency |
| Security Scanning | tfsec, checkov, trivy | Security misconfigurations, exposed secrets, vulnerabilities | High/critical findings block; low can warn |
| Policy Compliance | OPA, Sentinel, Conftest | Policy violations, naming standards, required tags | Depends on policy severity |
| Cost Estimation | infracost, terraform plan | Unexpected cost changes, budget exceedance | Block if exceeds threshold |
| Plan Generation | terraform plan, pulumi preview | What will actually happen on apply | Block if plan fails |
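The "When to Block" column can be read as a small decision function. The following is a minimal sketch of that logic, assuming each check reports findings in a normalized shape. The `Finding` type, the severity names, and the cost threshold are all hypothetical, and formatting failures are treated as blocking here even though the table says "usually".

```python
from dataclasses import dataclass

BLOCKING_SEVERITIES = {"high", "critical"}  # assumption: mirrors the security row
COST_INCREASE_LIMIT = 500.0                 # assumption: monthly increase threshold


@dataclass
class Finding:
    check: str               # "syntax", "format", "security", "policy", "cost", or "plan"
    severity: str = "low"    # only meaningful for security and policy findings
    cost_delta: float = 0.0  # only meaningful for cost findings


def is_blocking(finding: Finding) -> bool:
    """Apply the blocking rules from the table to a single finding."""
    if finding.check in ("syntax", "format", "plan"):
        return True
    if finding.check in ("security", "policy"):
        return finding.severity in BLOCKING_SEVERITIES
    if finding.check == "cost":
        return finding.cost_delta > COST_INCREASE_LIMIT
    return False


def gate(findings: list[Finding]) -> tuple[list[Finding], list[Finding]]:
    """Split findings into those that block the merge and those that only warn."""
    blocking = [f for f in findings if is_blocking(f)]
    warnings = [f for f in findings if not is_blocking(f)]
    return blocking, warnings
```

Keeping the rules in one place makes it easier to tighten or relax a threshold without touching each individual check.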

    name: Infrastructure PR Checks

    on:
      pull_request:
        paths:
          - 'infrastructure/**'
          - 'modules/**'

    permissions:
      contents: read
      pull-requests: write
      security-events: write
      # Assumes AWS_PLAN_ROLE is assumed via GitHub OIDC, which needs this permission
      id-token: write

    jobs:
      # Stage 1: Fast validation (seconds)
      validate:
        name: Validate Configuration
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4

          - name: Setup Terraform
            uses: hashicorp/setup-terraform@v3

          - name: Format Check
            run: terraform fmt -check -recursive -diff

          - name: Initialize
            run: terraform init -backend=false
            working-directory: infrastructure

          - name: Validate
            run: terraform validate
            working-directory: infrastructure

      # Stage 2: Security scanning (1-2 minutes)
      security:
        name: Security Analysis
        runs-on: ubuntu-latest
        needs: validate
        steps:
          - uses: actions/checkout@v4

          - name: Run tfsec
            uses: aquasecurity/tfsec-action@v1.0.0
            with:
              soft_fail: false

          - name: Run Checkov
            uses: bridgecrewio/checkov-action@v12
            with:
              directory: infrastructure
              framework: terraform
              output_format: cli,sarif
              output_file_path: console,checkov-results.sarif

          - name: Upload SARIF
            uses: github/codeql-action/upload-sarif@v2
            with:
              sarif_file: checkov-results.sarif

      # Stage 3: Cost estimation
      cost:
        name: Cost Analysis
        runs-on: ubuntu-latest
        needs: validate
        steps:
          - uses: actions/checkout@v4

          - name: Setup Infracost
            uses: infracost/actions/setup@v2
            with:
              api-key: ${{ secrets.INFRACOST_API_KEY }}

          - name: Generate Cost Breakdown
            run: |
              infracost breakdown --path=infrastructure \
                --format=json \
                --out-file=/tmp/infracost.json

          - name: Post Cost Comment
            uses: infracost/actions/comment@v1
            with:
              path: /tmp/infracost.json
              behavior: update

      # Stage 4: Generate and post plan
      plan:
        name: Terraform Plan
        runs-on: ubuntu-latest
        needs: [validate, security]
        environment: plan
        steps:
          - uses: actions/checkout@v4

          - name: Setup Terraform
            uses: hashicorp/setup-terraform@v3

          - name: Configure AWS Credentials
            uses: aws-actions/configure-aws-credentials@v4
            with:
              role-to-assume: ${{ secrets.AWS_PLAN_ROLE }}
              aws-region: us-west-2

          - name: Terraform Init
            run: terraform init
            working-directory: infrastructure

          - name: Terraform Plan
            id: plan
            run: |
              # Without pipefail, tee's exit status would mask a failed plan
              set -o pipefail
              terraform plan -no-color -out=tfplan 2>&1 | tee plan.txt
            working-directory: infrastructure

          - name: Analyze Plan
            id: analyze
            run: |
              # Count change types. grep -c already prints 0 when nothing matches
              # (and exits 1), so "|| true" only suppresses the non-zero exit.
              CREATES=$(grep -c "will be created" plan.txt || true)
              UPDATES=$(grep -c "will be updated" plan.txt || true)
              DESTROYS=$(grep -c "will be destroyed" plan.txt || true)
              REPLACES=$(grep -c "must be replaced" plan.txt || true)

              echo "creates=$CREATES" >> "$GITHUB_OUTPUT"
              echo "updates=$UPDATES" >> "$GITHUB_OUTPUT"
              echo "destroys=$DESTROYS" >> "$GITHUB_OUTPUT"
              echo "replaces=$REPLACES" >> "$GITHUB_OUTPUT"

              # Determine risk level
              if [ "$DESTROYS" -gt 0 ] || [ "$REPLACES" -gt 0 ]; then
                echo "risk=HIGH" >> "$GITHUB_OUTPUT"
              elif [ "$UPDATES" -gt 5 ]; then
                echo "risk=MEDIUM" >> "$GITHUB_OUTPUT"
              else
                echo "risk=LOW" >> "$GITHUB_OUTPUT"
              fi
            working-directory: infrastructure

          - name: Post Plan to PR
            uses: actions/github-script@v7
            with:
              script: |
                const fs = require('fs');
                const plan = fs.readFileSync('infrastructure/plan.txt', 'utf8');
                const risk = '${{ steps.analyze.outputs.risk }}';
                const riskBadge = {
                  LOW: '🟢 Low Risk',
                  MEDIUM: '🟡 Medium Risk',
                  HIGH: '🔴 High Risk'
                }[risk];

                const body = `## Terraform Plan

                ### Risk Assessment: ${riskBadge}

                | Change Type | Count |
                |-------------|-------|
                | Create | ${{ steps.analyze.outputs.creates }} |
                | Update | ${{ steps.analyze.outputs.updates }} |
                | Destroy | ${{ steps.analyze.outputs.destroys }} |
                | Replace | ${{ steps.analyze.outputs.replaces }} |

                <details>
                <summary>Show Plan Output</summary>

                \`\`\`hcl
                ${plan.substring(0, 60000)}
                \`\`\`

                </details>
                ${risk === 'HIGH' ? '\n⚠️ **This change includes destructive operations. Additional review required.**' : ''}
                `;

                await github.rest.issues.createComment({
                  owner: context.repo.owner,
                  repo: context.repo.repo,
                  issue_number: context.issue.number,
                  body
                });

          - name: Save Plan Artifact
            uses: actions/upload-artifact@v4
            with:
              name: tfplan-${{ github.event.pull_request.number }}
              path: infrastructure/tfplan
              retention-days: 7

Notice the check dependency chain: the fast check (validate) runs first, and the slower checks (security, cost, plan) run only if it passes. This fails fast on obvious issues and saves compute time. Security scanning and cost estimation run in parallel once validation passes; the plan job waits for both validation and security.
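Counting phrases in human-readable plan text is fragile, because the wording can change between Terraform versions. A sketch of the same risk rule applied to the machine-readable plan (the output of `terraform show -json tfplan`) might look like the following. The thresholds copy the workflow above; the function name and return shape are assumptions for illustration.

```python
import json
from collections import Counter


def summarize_plan(plan_json: str) -> dict:
    """Count change types in a JSON plan and derive a coarse risk level."""
    counts = Counter()
    for change in json.loads(plan_json).get("resource_changes", []):
        actions = change["change"]["actions"]
        # A replacement is reported as a delete/create pair, in either order.
        if "create" in actions and "delete" in actions:
            counts["replace"] += 1
        elif actions in (["create"], ["update"], ["delete"]):
            counts[actions[0]] += 1
        # "no-op" and "read" actions are ignored.

    # Same rule as the shell step: any destroy or replace is high risk.
    if counts["delete"] or counts["replace"]:
        risk = "HIGH"
    elif counts["update"] > 5:
        risk = "MEDIUM"
    else:
        risk = "LOW"

    return {
        "create": counts["create"],
        "update": counts["update"],
        "delete": counts["delete"],
        "replace": counts["replace"],
        "risk": risk,
    }
```

The resulting dictionary maps directly onto the table in the plan comment.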
One of the most valuable patterns in infrastructure PRs is automatically posting the execution plan as a comment. This makes the plan visible to all reviewers directly in the PR interface, eliminating the need to run commands locally or navigate to CI systems.
What an Effective Plan Comment Shows:
Tools That Implement This Pattern:
atlantis apply commands

Example Plan Comment Flow:
Atlantis popularized the pattern of using PR comments as commands: 'atlantis plan' to regenerate the plan, 'atlantis apply' to apply after approval. This keeps all interaction within the PR interface, creating a complete audit trail of who planned, who approved, and who applied.
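To make the comment-as-command idea concrete, here is a minimal sketch of how a bot might recognize such commands in a PR comment. It is not Atlantis's implementation: the `Command` type and `parse_comment` function are hypothetical, and any flags after the action are passed through without interpretation.

```python
import shlex
from typing import NamedTuple, Optional

ALLOWED_ACTIONS = {"plan", "apply"}


class Command(NamedTuple):
    action: str      # "plan" or "apply"
    args: list[str]  # remaining tokens, e.g. ["-d", "infrastructure"]


def parse_comment(comment: str) -> Optional[Command]:
    """Return a Command if the first line of the comment is a bot command."""
    lines = comment.strip().splitlines()
    if not lines:
        return None
    try:
        tokens = shlex.split(lines[0])
    except ValueError:
        # Unbalanced quotes: treat as an ordinary comment.
        return None
    if len(tokens) < 2 or tokens[0] != "atlantis" or tokens[1] not in ALLOWED_ACTIONS:
        return None
    return Command(tokens[1], tokens[2:])
```

A real bot would also record the comment author alongside the parsed command, since that is what turns the PR thread into an audit trail.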
Not all infrastructure changes carry equal risk. A tweak to a development environment doesn't warrant the same scrutiny as modifying production database configuration. Effective PR workflows implement risk-based review requirements that match approval rigor to change impact.
Categorizing Change Risk:
| Risk Level | Examples | Review Requirements | Approval Gates |
|---|---|---|---|
| 🟢 Low | Update tags, modify dev resources, formatting fixes | 1 reviewer, auto-merge eligible | Automated checks pass |
| 🟡 Medium | Add new resources, modify staging config, update policies | 2 reviewers from different teams | Checks pass + human approval |
| 🔴 High | Modify production resources, IAM changes, network updates | Senior engineer + security review | Multiple approvals required |
| ⛔ Critical | Delete production resources, modify encryption, change VPCs | Architecture review + management sign-off | Extended approval chain |
Implementing Risk Classification:
Risk classification can be automated based on:
/production/ require more review than /development/

    # GitHub CODEOWNERS for Infrastructure Repository
    # Note: when several patterns match a file, the last matching pattern wins.

    # Default: Platform team reviews all infrastructure
    * @org/platform-team

    # Production infrastructure requires senior approval
    /infrastructure/production/ @org/platform-seniors @org/sre-oncall

    # Security-sensitive resources need security team
    /infrastructure/*/iam/ @org/security-team @org/platform-seniors
    /infrastructure/*/kms/ @org/security-team @org/platform-seniors
    /infrastructure/*/vpc/ @org/network-team @org/platform-seniors

    # Database changes require DBA review
    /infrastructure/*/rds/ @org/dba-team @org/platform-team
    /infrastructure/*/dynamodb/ @org/dba-team @org/platform-team

    # Development environment can have lighter review
    # (listed after the rules above, so it also overrides them for development paths)
    /infrastructure/development/ @org/platform-team

    # CI/CD configuration changes need platform team
    /.github/workflows/ @org/platform-seniors


    {
      "required_pull_request_reviews": {
        "dismiss_stale_reviews": true,
        "require_code_owner_reviews": true,
        "required_approving_review_count": 2,
        "require_last_push_approval": true
      },
      "required_status_checks": {
        "strict": true,
        "checks": [
          { "context": "validate" },
          { "context": "security" },
          { "context": "plan" },
          { "context": "cost" }
        ]
      },
      "enforce_admins": true,
      "required_linear_history": true,
      "allow_force_pushes": false,
      "allow_deletions": false,
      "required_conversation_resolution": true
    }

Risk classification should reflect actual risk, not bureaucratic checkboxes. If every change is marked 'high risk,' reviewers become fatigued and rubber-stamp approvals. Calibrate your risk levels so 'high risk' truly means 'pay extra attention.'
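Path-based classification is the simplest rule to automate. The sketch below maps a PR's changed files to a risk level, assuming a repository laid out like the CODEOWNERS example. The patterns, the level names, and the choice to treat unrecognized paths as medium risk are all illustrative assumptions.

```python
from fnmatch import fnmatch

LEVELS = ["low", "medium", "high"]  # ordered from least to most risky

# (glob pattern, risk level). Note that fnmatch's "*" also matches "/".
RULES = [
    ("infrastructure/*/iam/*", "high"),
    ("infrastructure/production/*", "high"),
    ("infrastructure/staging/*", "medium"),
    ("infrastructure/development/*", "low"),
]

DEFAULT_LEVEL = "medium"  # assumption: unknown paths get a human look


def classify_file(path: str) -> str:
    """Return the highest risk level among the rules matching one path."""
    matched = [level for pattern, level in RULES if fnmatch(path, pattern)]
    if not matched:
        return DEFAULT_LEVEL
    return max(matched, key=LEVELS.index)


def classify(changed_files: list[str]) -> str:
    """Return the highest risk level across all changed files in a PR."""
    if not changed_files:
        return "low"
    return max((classify_file(p) for p in changed_files), key=LEVELS.index)
```

Unlike CODEOWNERS, where the last matching pattern wins, this sketch takes the most severe match, so an IAM change inside the development tree is still classified as high risk.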
Reviewing infrastructure PRs differs from reviewing application code. Reviewers must evaluate not just the code correctness but the operational impact, security implications, and blast radius of changes. Here's a structured approach to effective infrastructure reviews:
The Infrastructure PR Review Checklist:
Common Review Questions to Ask:
A well-written PR should make the reviewer's job easy. If reviewers regularly have to ask basic clarifying questions, improve your PR templates and guidance for authors. Reviewers shouldn't have to dig for context—it should be provided upfront.
The merge event triggers the actual infrastructure deployment. How merges are handled affects deployment reliability, history clarity, and rollback capability.
Git Merge Strategies for Infrastructure:
| Strategy | History Appearance | Rollback Ease | Best For |
|---|---|---|---|
| Merge Commit | Preserves branch history, merge commits visible | Moderate - identify merge commit to revert | Complex changes with meaningful branch history |
| Squash Merge | Single commit per PR, clean linear history | Easy - revert single commit | Most infrastructure changes; recommended default |
| Rebase Merge | Linear history, preserves individual commits | Moderate - may need to revert multiple commits | When individual commits are meaningful |
Most infrastructure repositories benefit from squash merging by default:
Post-Merge Deployment Flow:
After merge, the deployment workflow activates. The key is connecting the approved plan to the actual apply:

    name: Apply Infrastructure

    on:
      push:
        branches: [main]
        paths:
          - 'infrastructure/**'

    concurrency:
      group: terraform-apply
      cancel-in-progress: false

    jobs:
      apply:
        name: Apply Infrastructure Changes
        runs-on: ubuntu-latest
        environment: production
        steps:
          - uses: actions/checkout@v4

          - name: Get PR Number
            id: pr
            run: |
              PR_NUM=$(gh pr list --search "${{ github.sha }}" \
                --state merged --json number -q '.[0].number')
              echo "number=$PR_NUM" >> "$GITHUB_OUTPUT"
            env:
              GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

          - name: Download Plan Artifact
            id: download
            uses: dawidd6/action-download-artifact@v2
            with:
              workflow: pr-checks.yaml
              pr: ${{ steps.pr.outputs.number }}
              name: tfplan-${{ steps.pr.outputs.number }}
              path: infrastructure/
            continue-on-error: true

          - name: Setup Terraform
            uses: hashicorp/setup-terraform@v3

          - name: Terraform Init
            run: terraform init
            working-directory: infrastructure

          - name: Validate Plan Still Applicable
            id: validate
            if: steps.download.outcome == 'success'
            run: |
              # Check that the saved plan file can be read. This does not detect
              # drift; Terraform itself rejects a stale plan at apply time.
              terraform show tfplan -no-color > /dev/null 2>&1
              echo "valid=true" >> "$GITHUB_OUTPUT"
            working-directory: infrastructure
            continue-on-error: true

          - name: Apply Saved Plan
            if: steps.validate.outputs.valid == 'true'
            run: |
              terraform apply -auto-approve tfplan
            working-directory: infrastructure

          - name: Regenerate and Apply (if saved plan invalid)
            if: steps.validate.outputs.valid != 'true'
            run: |
              echo "⚠️ Saved plan no longer valid. Regenerating..."
              terraform plan -out=tfplan
              terraform apply -auto-approve tfplan
            working-directory: infrastructure

          - name: Post-Apply Verification
            run: |
              # Run verification scripts
              ./scripts/verify-deployment.sh
            working-directory: infrastructure

If significant time passes between PR approval and merge, the saved plan may no longer be valid (the infrastructure has drifted, or another party has changed the resources). Robust workflows either validate that the plan is still applicable or require re-planning. Never blindly apply an old plan.
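One way to make "is this plan still applicable?" explicit is to store a little metadata next to the plan artifact and compare it at apply time. The sketch below assumes the PR workflow records the base commit, the state serial, and a timestamp; the `PlanMetadata` type, the field names, and the 24-hour limit are hypothetical choices, not something Terraform or GitHub provides.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_PLAN_AGE = timedelta(hours=24)  # assumption: team-specific freshness limit


@dataclass(frozen=True)
class PlanMetadata:
    base_sha: str         # commit on main the plan was generated against
    state_serial: int     # serial number of the state when the plan was created
    created_at: datetime  # timezone-aware creation time


def stale_reasons(meta: PlanMetadata, current_base_sha: str,
                  current_state_serial: int, now: datetime) -> list[str]:
    """Return the reasons a saved plan should not be applied (empty means OK)."""
    reasons = []
    if meta.base_sha != current_base_sha:
        reasons.append("main has moved since the plan was generated")
    if meta.state_serial != current_state_serial:
        reasons.append("state was modified after the plan was generated")
    if now - meta.created_at > MAX_PLAN_AGE:
        reasons.append("plan is older than the allowed maximum age")
    return reasons
```

An apply job could print these reasons and require a fresh, re-reviewed plan rather than regenerating and applying automatically.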
Unlike application code where conflicts are resolved through Git merge mechanics, infrastructure PRs face an additional challenge: state conflicts. Two PRs might not conflict in code but could conflict when applied because they both target the same infrastructure resources.
Types of Infrastructure Conflicts:
| Conflict Type | Example | Detection | Resolution |
|---|---|---|---|
| Git Conflict | Both PRs modify same Terraform file | Git merge conflict | Standard merge resolution |
| Resource Conflict | Both PRs modify same AWS resource | Second plan shows unexpected changes | Coordinate changes, rebase PR |
| State Lock | Two applies attempted simultaneously | State lock error | Wait for first apply, then retry |
| Dependency Conflict | PR A creates resource, PR B depends on it | Plan failure in PR B | Merge in dependency order |
Strategies for Managing Concurrent Changes:
Require branch up-to-date — Force PRs to be rebased on latest main before merge. Ensures sequential application.
Lock during plan — Some teams lock state during the plan phase to guarantee plan matches apply.
Single apply queue — Use concurrency controls to ensure only one apply runs at a time.
Resource partitioning — Organize infrastructure so teams rarely need to modify the same resources.
Cross-PR coordination — Use PR labels or comments to flag related changes that must merge in order.
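Resource conflicts from the table above are invisible to Git but can be detected by comparing plans. As a rough sketch, assuming each open PR's plan is available as JSON (from `terraform show -json`), the overlap in changed resource addresses can be flagged for coordination. The function names are illustrative.

```python
import json


def changed_addresses(plan_json: str) -> set[str]:
    """Collect addresses of resources the plan would create, update, or delete."""
    plan = json.loads(plan_json)
    return {
        change["address"]
        for change in plan.get("resource_changes", [])
        if change["change"]["actions"] not in (["no-op"], ["read"])
    }


def overlapping_resources(plan_a: str, plan_b: str) -> list[str]:
    """Return resource addresses that both plans intend to change."""
    return sorted(changed_addresses(plan_a) & changed_addresses(plan_b))
```

A non-empty result is a prompt for the two authors to agree on a merge order, not proof that the changes are incompatible.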

    # Ensure PR is based on latest main before allowing merge
    name: Require Fresh Branch

    on:
      pull_request:
        types: [opened, synchronize, reopened]

    permissions:
      contents: read
      pull-requests: write

    jobs:
      check-freshness:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
            with:
              fetch-depth: 0
              # Check out the PR head. The default checkout for pull_request events
              # is a merge commit that already contains main, so the ancestor
              # check below would always pass.
              ref: ${{ github.event.pull_request.head.sha }}

          - name: Check if branch is current
            id: check
            run: |
              git fetch origin main

              # Check if main is an ancestor of this branch
              if git merge-base --is-ancestor origin/main HEAD; then
                echo "✅ Branch is up to date with main"
                echo "fresh=true" >> "$GITHUB_OUTPUT"
              else
                echo "❌ Branch needs rebase on main"
                echo "fresh=false" >> "$GITHUB_OUTPUT"
              fi

          - name: Post status if stale
            if: steps.check.outputs.fresh == 'false'
            uses: actions/github-script@v7
            with:
              script: |
                await github.rest.issues.createComment({
                  owner: context.repo.owner,
                  repo: context.repo.repo,
                  issue_number: context.issue.number,
                  body: `⚠️ This branch is behind \`main\`. Please rebase to ensure the plan reflects current infrastructure state.

                \`\`\`
                git fetch origin main
                git rebase origin/main
                git push --force-with-lease
                \`\`\``
                });

          - name: Fail if stale
            if: steps.check.outputs.fresh == 'false'
            run: exit 1

Organizations using a single infrastructure repository can enforce strict merge ordering and detect conflicts earlier. With multiple repositories, cross-repo coordination requires additional tooling or very disciplined communication.
Despite best intentions, emergencies happen. Systems break, incidents occur, and sometimes the normal PR process is too slow. Every organization needs break-glass procedures that allow emergency changes while maintaining accountability.
When Break-Glass Is Appropriate:
Break-Glass Principles:
Implementation Options:
emergency/ prefix auto-merge with reduced checks

The Backfill Requirement:
Emergency changes must be reconciled with the source of truth. If you made a change outside normal process, you must:
If break-glass procedures are used frequently, it indicates the normal process is too slow or burdensome. Trending emergency bypass usage is a signal to improve the standard workflow, not to normalize working around it.
Pull Request workflows transform infrastructure changes from opaque operations into transparent, reviewable, and auditable processes. The key principles to remember:
What's Next:
With PR workflows established, the next page covers Automated Testing for IaC—the testing strategies that catch issues before human review, from policy tests to integration tests to compliance validation.
You now understand how to structure Pull Request workflows for infrastructure, the automated checks that should run, how to implement risk-based reviews, and how to handle edge cases like concurrent changes and emergencies.