CI/CD for Infrastructure

Level: Advanced

Duration: 90 mins

Topic: CI/CD for Infrastructure

Pull Request Workflows

Infrastructure Changes as Code Reviews

In the era of manual infrastructure management, changes were approved through meetings, tickets, and verbal agreements. The actual implementation was a black box—reviewers approved an intent, not the specific configuration that would be applied. Modern infrastructure practices flip this model entirely.

Pull Request (PR) workflows for infrastructure bring the discipline of code review to operational changes. Just as developers review each other's code for bugs, style, and design, infrastructure PRs enable teams to review the exact configuration changes, their impact on existing resources, and compliance with organizational policies—all before any modification reaches production.

What You Will Learn

By the end of this page, you will understand how to structure PR workflows for infrastructure changes, the key elements that must be automated within the PR lifecycle, how to handle different risk levels appropriately, and the review practices that catch issues before they become incidents. You will be equipped to design PR processes that balance velocity with safety.

The Pull Request as the Unit of Change

In infrastructure-as-code workflows, the Pull Request becomes the fundamental unit of change. Every modification—from adjusting a single security group rule to provisioning an entire new environment—flows through a PR that captures intent, enables review, and creates an audit trail.

What a Good Infrastructure PR Contains:

Elements of an Effective Infrastructure PR

•Intent Description — Why is this change being made? What problem does it solve or what feature does it enable?
•Configuration Changes — The actual code diff: Terraform, Kubernetes manifests, Ansible playbooks, etc.
•Execution Plan — Automated preview of exactly what resources will be created, modified, or destroyed
•Validation Results — Outputs from linters, security scanners, policy checks, and tests
•Affected Environments — Which environments will be impacted and in what order
•Rollback Procedure — How to revert if the change causes issues (often: 'revert this commit')
•Related Links — Associated tickets, design docs, runbooks, or related PRs

pr-template.md
Markdown
## Infrastructure Change Request
 
### Summary
<!-- Brief description of what this change does -->
 
### Motivation
<!-- Why is this change needed? Link to ticket/issue if applicable -->
Relates to: #123
 
### Change Type
<!-- Check all that apply -->
- [ ] New resource creation
- [ ] Resource modification
- [ ] Resource deletion
- [ ] Configuration update
- [ ] Security/IAM change
 
### Environments Affected
<!-- Which environments will this change impact? -->
- [ ] Development
- [ ] Staging
- [ ] Production
 
### Risk Assessment
<!-- Estimate the risk level of this change -->
- [ ] 🟢 Low - No production impact, easily reversible
- [ ] 🟡 Medium - Limited production impact, rollback available
- [ ] 🔴 High - Significant production impact, careful review needed
 
### Pre-Deployment Checklist
- [ ] Terraform plan reviewed
- [ ] Security scan passed
- [ ] Policy checks passed
- [ ] Tested in lower environment (if applicable)
- [ ] Runbook updated (if applicable)
- [ ] Monitoring/alerting in place
 
### Rollback Plan
<!-- How will we revert if something goes wrong? -->
 
### Additional Notes
<!-- Any other context for reviewers -->

Templates Reduce Cognitive Load

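PR templates ensure consistent information across all infrastructure changes. Reviewers know exactly where to find what they need, and authors are guided to provide essential context. Most Git platforms can pre-fill new PRs from a template file: GitHub reads .github/pull_request_template.md, and GitLab reads merge request templates from .gitlab/merge_request_templates/.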
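A template only helps if authors actually fill it in. The following is a minimal Python sketch of a check that flags sections left empty. The section names come from the template above; treating these four as mandatory, and the function name `missing_sections`, are assumptions made for illustration.

```python
import re

# Headings taken from the template above. Treating these four as mandatory
# is an assumption of this sketch, not a rule stated by the template.
REQUIRED_SECTIONS = ["Summary", "Motivation", "Risk Assessment", "Rollback Plan"]


def missing_sections(pr_body: str) -> list[str]:
    """Return required sections that are absent or hold only template comments."""
    # Drop HTML comments so untouched template hints do not count as content.
    body = re.sub(r"<!--.*?-->", "", pr_body, flags=re.DOTALL)
    # Split on level-3 headings: [preamble, title1, text1, title2, text2, ...]
    parts = re.split(r"^###[ \t]+(.+?)[ \t]*$", body, flags=re.MULTILINE)
    sections = {
        parts[i].strip(): parts[i + 1].strip()
        for i in range(1, len(parts) - 1, 2)
    }
    return [name for name in REQUIRED_SECTIONS if not sections.get(name)]
```

A CI job could call this on the PR description and fail when the returned list is non-empty. Note that a checklist with no box ticked still counts as content here; a stricter check would look for `[x]`.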

Automated Checks in the PR Lifecycle

The power of PR workflows comes from automation. Every PR should trigger a series of automated checks that validate the change before any human reviewer needs to look at it. This approach catches obvious issues early and lets reviewers focus on higher-level concerns.

The Automated Check Pipeline:

Automated PR Checks for Infrastructure
Check Type	Tools	What It Catches	When to Block
Syntax Validation	terraform validate, yamllint	Parse errors, invalid references, schema violations	Always - invalid syntax can't proceed
Formatting	terraform fmt, prettier	Inconsistent style, indentation issues	Usually - maintain codebase consistency
Security Scanning	tfsec, checkov, trivy	Security misconfigurations, exposed secrets, vulnerabilities	High/critical findings block; low can warn
Policy Compliance	OPA, Sentinel, Conftest	Policy violations, naming standards, required tags	Depends on policy severity
Cost Estimation	infracost, terraform plan	Unexpected cost changes, budget exceedance	Block if exceeds threshold
Plan Generation	terraform plan, pulumi preview	What will actually happen on apply	Block if plan fails
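
The "When to Block" column can be expressed as a small gating function. This Python sketch assumes scanner output has already been normalized into findings with a severity; the `Finding` type, the severity names, and the cost threshold parameter are hypothetical and not tied to any particular tool's output format.

```python
from dataclasses import dataclass

SEVERITY_ORDER = {"LOW": 0, "MEDIUM": 1, "HIGH": 2, "CRITICAL": 3}


@dataclass(frozen=True)
class Finding:
    check: str     # e.g. "security" or "policy"
    severity: str  # one of the SEVERITY_ORDER keys
    message: str


def gate(findings, block_at="HIGH", monthly_cost_delta=0.0, cost_threshold=None):
    """Return ("block" | "warn" | "pass", reasons) for a set of check results."""
    reasons = []
    blocked = False
    for f in findings:
        if SEVERITY_ORDER[f.severity] >= SEVERITY_ORDER[block_at]:
            blocked = True
            reasons.append(f"BLOCK {f.check}: {f.message}")
        else:
            reasons.append(f"WARN {f.check}: {f.message}")
    # Cost blocks only when a threshold is configured and exceeded.
    if cost_threshold is not None and monthly_cost_delta > cost_threshold:
        blocked = True
        reasons.append(
            f"BLOCK cost: +{monthly_cost_delta:.2f}/month exceeds {cost_threshold:.2f}"
        )
    if blocked:
        return "block", reasons
    return ("warn" if reasons else "pass"), reasons
```

Keeping the threshold (`block_at`) as a parameter lets a team start in warn-only mode and tighten the gate later without changing the scanners.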

pr-checks.yaml
GitHub Actions
name: Infrastructure PR Checks
 
on:
  pull_request:
    paths:
      - 'infrastructure/**'
      - 'modules/**'
 
permissions:
  contents: read
  pull-requests: write
  security-events: write
  id-token: write  # needed if the AWS role in the plan job is assumed via GitHub OIDC
 
jobs:
  # Stage 1: Fast validation (seconds)
  validate:
    name: Validate Configuration
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        
      - name: Format Check
        run: terraform fmt -check -recursive -diff
        
      - name: Initialize
        run: terraform init -backend=false
        working-directory: infrastructure
        
      - name: Validate
        run: terraform validate
        working-directory: infrastructure
 
  # Stage 2: Security scanning (1-2 minutes)
  security:
    name: Security Analysis
    runs-on: ubuntu-latest
    needs: validate
    steps:
      - uses: actions/checkout@v4
      
      - name: Run tfsec
        uses: aquasecurity/tfsec-action@v1.0.0
        with:
          soft_fail: false
          
      - name: Run Checkov
        uses: bridgecrewio/checkov-action@v12
        with:
          directory: infrastructure
          framework: terraform
          output_format: cli,sarif
          output_file_path: console,checkov-results.sarif
          
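      - name: Upload SARIF
        # Run even when Checkov reports failures, so findings still reach code scanning
        if: success() || failure()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: checkov-results.sarif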
 
  # Stage 3: Cost estimation
  cost:
    name: Cost Analysis
    runs-on: ubuntu-latest
    needs: validate
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Infracost
        uses: infracost/actions/setup@v2
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}
          
      - name: Generate Cost Breakdown
        run: |
          infracost breakdown --path=infrastructure \
            --format=json \
            --out-file=/tmp/infracost.json
            
      - name: Post Cost Comment
        uses: infracost/actions/comment@v1
        with:
          path: /tmp/infracost.json
          behavior: update
 
  # Stage 4: Generate and post plan
  plan:
    name: Terraform Plan
    runs-on: ubuntu-latest
    needs: [validate, security]
    environment: plan
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_PLAN_ROLE }}
          aws-region: us-west-2
          
      - name: Terraform Init
        run: terraform init
        working-directory: infrastructure
        
      - name: Terraform Plan
        id: plan
        run: |
          # Without pipefail, tee's exit status would mask a failed plan
          set -o pipefail
          terraform plan -no-color -out=tfplan 2>&1 | tee plan.txt
        working-directory: infrastructure
        
      - name: Analyze Plan
        id: analyze
        run: |
          # Count change types. grep -c already prints 0 when nothing matches
          # (while exiting non-zero), so "|| echo 0" would emit a second 0.
          CREATES=$(grep -c "will be created" plan.txt || true)
          UPDATES=$(grep -c "will be updated" plan.txt || true)
          DESTROYS=$(grep -c "will be destroyed" plan.txt || true)
          REPLACES=$(grep -c "must be replaced" plan.txt || true)
          
          echo "creates=$CREATES" >> $GITHUB_OUTPUT
          echo "updates=$UPDATES" >> $GITHUB_OUTPUT
          echo "destroys=$DESTROYS" >> $GITHUB_OUTPUT
          echo "replaces=$REPLACES" >> $GITHUB_OUTPUT
          
          # Determine risk level
          if [ "$DESTROYS" -gt 0 ] || [ "$REPLACES" -gt 0 ]; then
            echo "risk=HIGH" >> $GITHUB_OUTPUT
          elif [ "$UPDATES" -gt 5 ]; then
            echo "risk=MEDIUM" >> $GITHUB_OUTPUT
          else
            echo "risk=LOW" >> $GITHUB_OUTPUT
          fi
        working-directory: infrastructure
        
      - name: Post Plan to PR
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const plan = fs.readFileSync('infrastructure/plan.txt', 'utf8');
            const risk = '${{ steps.analyze.outputs.risk }}';
            
            const riskBadge = {
              LOW: '🟢 Low Risk',
              MEDIUM: '🟡 Medium Risk', 
              HIGH: '🔴 High Risk'
            }[risk];
            
            const body = `## Terraform Plan
            
            ### Risk Assessment: ${riskBadge}
            
            | Change Type | Count |
            |-------------|-------|
            | Create | ${{ steps.analyze.outputs.creates }} |
            | Update | ${{ steps.analyze.outputs.updates }} |
            | Destroy | ${{ steps.analyze.outputs.destroys }} |
            | Replace | ${{ steps.analyze.outputs.replaces }} |
            
            <details>
            <summary>Show Plan Output</summary>
            
            \`\`\`hcl
            ${plan.substring(0, 60000)}
            \`\`\`
            
            </details>
            ${risk === 'HIGH' ? '\n⚠️ **This change includes destructive operations. Additional review required.**' : ''}
            `;
            
            // Await the call so an API error fails the step instead of being dropped
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body
            });
            
      - name: Save Plan Artifact
        uses: actions/upload-artifact@v4
        with:
          name: tfplan-${{ github.event.pull_request.number }}
          path: infrastructure/tfplan
          retention-days: 7

Check Ordering Matters

Notice the check dependency chain: fast checks (validate) run first, slower checks (security, plan) only run if fast checks pass. This fails fast on obvious issues and saves compute time. Security scanning and cost estimation can run in parallel once validation passes.
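
To make the ordering concrete, the `needs` relationships from the workflow above can be treated as a dependency graph and grouped into waves of jobs that may run in parallel. This is a small Python sketch using the standard library's `graphlib`; the `execution_waves` helper is illustrative and is not how GitHub Actions schedules jobs internally.

```python
from graphlib import TopologicalSorter

# Job ids and their `needs` lists, copied from the workflow above.
NEEDS = {
    "validate": [],
    "security": ["validate"],
    "cost": ["validate"],
    "plan": ["validate", "security"],
}


def execution_waves(needs):
    """Group jobs into waves; jobs in the same wave have no mutual dependency."""
    sorter = TopologicalSorter(needs)
    sorter.prepare()  # raises graphlib.CycleError on circular `needs`
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

For the graph above this yields three waves: validation alone, then security and cost together, then the plan.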

The Plan Comment Pattern

One of the most valuable patterns in infrastructure PRs is automatically posting the execution plan as a comment. This makes the plan visible to all reviewers directly in the PR interface, eliminating the need to run commands locally or navigate to CI systems.

What an Effective Plan Comment Shows:

Plan Comment Best Practices

•Summary statistics — Quick counts of creates, updates, deletes at the top
•Risk indicator — Visual badge or label indicating risk level
•Full plan in collapsible section — Entire plan available but not overwhelming the comment
•Highlight destructive changes — Special callout for any resource destruction
•Diff formatting — Terraform's plan output with +/- showing changes
•Update on push — Replace or update the comment when new commits are pushed
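
The practices above can be sketched as two small Python functions: one renders the comment body, the other finds a previous plan comment so it can be updated rather than duplicated. The hidden `MARKER` string and both function names are assumptions for illustration; the 60000-character cut-off mirrors the workflow shown earlier.

```python
MARKER = "<!-- terraform-plan-comment -->"  # hypothetical hidden marker
MAX_PLAN_CHARS = 60000                      # same cut-off as the workflow above
FENCE = "`" * 3                             # Markdown code fence


def build_plan_comment(plan_text: str, creates: int, updates: int,
                       destroys: int, replaces: int) -> str:
    """Render the comment: counts first, warning next, full plan collapsed."""
    lines = [
        MARKER,
        "## Terraform Plan",
        "",
        f"Create: {creates} | Update: {updates} | "
        f"Destroy: {destroys} | Replace: {replaces}",
        "",
    ]
    if destroys > 0 or replaces > 0:
        lines += ["**Warning: this change includes destructive operations.**", ""]
    lines += [
        "<details>",
        "<summary>Show Plan Output</summary>",
        "",
        FENCE + "diff",
        plan_text[:MAX_PLAN_CHARS],
        FENCE,
        "",
        "</details>",
    ]
    if len(plan_text) > MAX_PLAN_CHARS:
        lines.append("_Plan output truncated; see the CI logs for the full plan._")
    return "\n".join(lines)


def find_existing_comment(comments):
    """Return the id of an earlier plan comment, or None.

    `comments` is a list of dicts with "id" and "body" keys, as returned by
    the GitHub REST API when listing issue comments.
    """
    for comment in comments:
        if MARKER in (comment.get("body") or ""):
            return comment["id"]
    return None
```

When `find_existing_comment` returns an id, the caller updates that comment; otherwise it creates a new one. This keeps a single, current plan per PR instead of one comment per push.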

Tools That Implement This Pattern:

Atlantis — Purpose-built for Terraform PR workflows; posts plan comments and accepts atlantis apply commands
Terraform Cloud/Enterprise — Integrated plan posting in VCS integrations
GitHub Actions with terraform-pr-comment — Community actions for posting formatted plans
Spacelift — SaaS platform with native PR integration
env0 — Similar SaaS with advanced collaboration features

Atlantis Command Pattern

Atlantis popularized the pattern of using PR comments as commands: 'atlantis plan' to regenerate the plan, 'atlantis apply' to apply after approval. This keeps all interaction within the PR interface, creating a complete audit trail of who planned, who approved, and who applied.

Risk-Based Review Requirements

Not all infrastructure changes carry equal risk. A tweak to a development environment doesn't warrant the same scrutiny as modifying production database configuration. Effective PR workflows implement risk-based review requirements that match approval rigor to change impact.

Categorizing Change Risk:

Infrastructure Change Risk Matrix
Risk Level	Examples	Review Requirements	Approval Gates
🟢 Low	Update tags, modify dev resources, formatting fixes	1 reviewer, auto-merge eligible	Automated checks pass
🟡 Medium	Add new resources, modify staging config, update policies	2 reviewers from different teams	Checks pass + human approval
🔴 High	Modify production resources, IAM changes, network updates	Senior engineer + security review	Multiple approvals required
⛔ Critical	Delete production resources, modify encryption, change VPCs	Architecture review + management sign-off	Extended approval chain

Implementing Risk Classification:

Risk classification can be automated based on:

Path-based rules — Files in /production/ require more review than /development/
Change type detection — Terraform plan analysis: deletes/replaces = higher risk
Resource type sensitivity — IAM, KMS, VPC changes always high risk
Label-based signaling — Authors can label PRs with risk level, but automation validates
Policy engines — OPA/Sentinel rules that classify risk programmatically
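
As a rough illustration, several of these signals can be combined in one function that reads the machine-readable plan (the output of `terraform show -json tfplan`, whose `resource_changes` entries carry a `type` and a `change.actions` list). The sensitive-type prefixes, the production path prefix, and the update threshold of 5 are assumptions chosen to mirror the examples on this page, not fixed rules.

```python
# Assumed for this sketch: which resource types and paths count as sensitive.
SENSITIVE_PREFIXES = ("aws_iam_", "aws_kms_", "aws_vpc")
PRODUCTION_PREFIX = "infrastructure/production/"


def classify_risk(plan: dict, changed_paths: list[str]) -> str:
    """Classify a change as LOW, MEDIUM, or HIGH from plan JSON and file paths."""
    updates = 0
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if "delete" in actions:  # plain delete, or delete+create (replace)
            return "HIGH"
        if actions in (["no-op"], ["read"]):
            continue
        if rc["type"].startswith(SENSITIVE_PREFIXES):
            return "HIGH"
        if "update" in actions:
            updates += 1
    # Path-based rule: anything touching production gets the stricter review.
    if any(path.startswith(PRODUCTION_PREFIX) for path in changed_paths):
        return "HIGH"
    return "MEDIUM" if updates > 5 else "LOW"
```

Reading the JSON plan is generally less brittle than matching phrases in the human-readable plan text, because the `actions` values are part of a documented format.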

CODEOWNERS

# GitHub CODEOWNERS for Infrastructure Repository
 
# Order matters: when several patterns match a file, the last one wins.

# Default: Platform team reviews all infrastructure
* @org/platform-team

# Development environment can have lighter review
/infrastructure/development/ @org/platform-team

# Production infrastructure requires senior approval
/infrastructure/production/ @org/platform-seniors @org/sre-oncall

# Security-sensitive resources need security team in every environment,
# so these rules come after the per-environment rules above
/infrastructure/*/iam/ @org/security-team @org/platform-seniors
/infrastructure/*/kms/ @org/security-team @org/platform-seniors
/infrastructure/*/vpc/ @org/network-team @org/platform-seniors

# Database changes require DBA review
/infrastructure/*/rds/ @org/dba-team @org/platform-team
/infrastructure/*/dynamodb/ @org/dba-team @org/platform-team

# CI/CD configuration changes need senior platform review
/.github/workflows/ @org/platform-seniors

branch-protection.json
GitHub
{
  "required_pull_request_reviews": {
    "dismiss_stale_reviews": true,
    "require_code_owner_reviews": true,
    "required_approving_review_count": 2,
    "require_last_push_approval": true
  },
  "required_status_checks": {
    "strict": true,
    "checks": [
      { "context": "Validate Configuration" },
      { "context": "Security Analysis" },
      { "context": "Terraform Plan" },
      { "context": "Cost Analysis" }
    ]
  },
  "enforce_admins": true,
  "required_linear_history": true,
  "allow_force_pushes": false,
  "allow_deletions": false,
  "required_conversation_resolution": true
}

Don't Let Risk Classification Become Theater

Risk classification should reflect actual risk, not bureaucratic checkboxes. If every change is marked 'high risk,' reviewers become fatigued and rubber-stamp approvals. Calibrate your risk levels so 'high risk' truly means 'pay extra attention.'

Review Best Practices for Infrastructure PRs

Reviewing infrastructure PRs differs from reviewing application code. Reviewers must evaluate not just the code correctness but the operational impact, security implications, and blast radius of changes. Here's a structured approach to effective infrastructure reviews:

The Infrastructure PR Review Checklist:

What to Check in Every Infrastructure PR

•Understand Intent — Does the PR description clearly explain what and why? Can you understand the change without looking at code?
•Review the Plan, Not Just the Code — The plan shows actual impact. A one-line code change might cause widespread resource replacement.
•Check for Destructive Operations — Any deletes or replaces? Are they expected? Is data at risk?
•Validate Security Implications — Are ports being opened? Is access being broadened? Do security scan results look reasonable?
•Consider Blast Radius — If this goes wrong, what breaks? Is the impact contained or widespread?
•Verify Rollback Path — Can this be reverted safely? Are there one-way doors (database schema changes, deleted resources)?
•Check for Drift Introduction — Does this change make sense given current state? Or is it based on outdated assumptions?
•Review Naming and Tagging — Do resources follow naming conventions? Are cost allocation tags present?

Ineffective Review Practices

•Approving without reading the plan
•Rubber-stamping from familiar authors
•Ignoring security scan warnings
•Skipping cost impact review
•Not questioning destructive changes
•Approving 'just for dev' without scrutiny

Effective Review Practices

•Always read the plan summary first
•Ask clarifying questions liberally
•Verify security findings addressed
•Check cost estimates for surprises
•Require justification for deletes
•Apply consistent standards everywhere

Common Review Questions to Ask:

"I see this will replace the RDS instance—do we have a backup strategy?"
"This opens port 22 to 0.0.0.0/0—is this intentional for debugging, and will it be closed after?"
"The cost estimate shows $500/month increase—has this been budgeted?"
"I notice we're using instance type X—have you considered Y for this use case?"
"This change affects production. Has it been tested in staging?"

The Burden Is on the Author

A well-written PR should make the reviewer's job easy. If reviewers regularly have to ask basic clarifying questions, improve your PR templates and guidance for authors. Reviewers shouldn't have to dig for context—it should be provided upfront.

Merge Strategies and Post-Merge Flow

The merge event triggers the actual infrastructure deployment. How merges are handled affects deployment reliability, history clarity, and rollback capability.

Git Merge Strategies for Infrastructure:

Merge Strategy Comparison
Strategy	History Appearance	Rollback Ease	Best For
Merge Commit	Preserves branch history, merge commits visible	Moderate - identify merge commit to revert	Complex changes with meaningful branch history
Squash Merge	Single commit per PR, clean linear history	Easy - revert single commit	Most infrastructure changes; recommended default
Rebase Merge	Linear history, preserves individual commits	Moderate - may need to revert multiple commits	When individual commits are meaningful

Most infrastructure repositories benefit from squash merging by default:

Each PR becomes one commit, making the history scannable
Rollback is straightforward: revert one commit
Commit messages can be crafted to be descriptive of the overall change
Branch cleanup details (fixup commits, WIP) are hidden

Post-Merge Deployment Flow:

After merge, the deployment workflow activates. The key is connecting the approved plan to the actual apply:

post-merge-apply.yaml
GitHub Actions
name: Apply Infrastructure
 
on:
  push:
    branches: [main]
    paths:
      - 'infrastructure/**'
 
concurrency:
  group: terraform-apply
  cancel-in-progress: false
 
jobs:
  apply:
    name: Apply Infrastructure Changes
    runs-on: ubuntu-latest
    environment: production
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Get PR Number
        id: pr
        run: |
          PR_NUM=$(gh pr list --search "${{ github.sha }}" \
            --state merged --json number -q '.[0].number')
          echo "number=$PR_NUM" >> $GITHUB_OUTPUT
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          
      - name: Download Plan Artifact
        id: download
        # Artifacts uploaded with upload-artifact@v4 need a newer release of this action than v2
        uses: dawidd6/action-download-artifact@v6
        with:
          workflow: pr-checks.yaml
          pr: ${{ steps.pr.outputs.number }}
          name: tfplan-${{ steps.pr.outputs.number }}
          path: infrastructure/
        continue-on-error: true
          
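      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          # AWS_APPLY_ROLE is a placeholder secret name. With OIDC the job
          # also needs the id-token: write permission.
          role-to-assume: ${{ secrets.AWS_APPLY_ROLE }}
          aws-region: us-west-2

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        
      - name: Terraform Init
        run: terraform init
        working-directory: infrastructure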
        
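      - name: Validate Plan Still Applicable
        id: validate
        if: steps.download.outcome == 'success'
        run: |
          # This confirms only that the plan file is present and readable.
          # Staleness is enforced by `terraform apply`, which rejects a saved
          # plan if the state has changed since the plan was created.
          terraform show tfplan -no-color > /dev/null 2>&1
          echo "valid=true" >> $GITHUB_OUTPUT
        working-directory: infrastructure
        continue-on-error: true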
        
      - name: Apply Saved Plan
        if: steps.validate.outputs.valid == 'true'
        run: |
          terraform apply -auto-approve tfplan
        working-directory: infrastructure
        
      - name: Regenerate and Apply (if saved plan invalid)
        if: steps.validate.outputs.valid != 'true'
        run: |
          echo "⚠️ Saved plan no longer valid. Regenerating..."
          terraform plan -out=tfplan
          terraform apply -auto-approve tfplan
        working-directory: infrastructure
        
      - name: Post-Apply Verification
        run: |
          # Run verification scripts
          ./scripts/verify-deployment.sh
        working-directory: infrastructure

Plan Expiration Risk

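If significant time passes between PR approval and merge, the saved plan may no longer be valid (infrastructure drifted, resources changed by another party). Terraform itself rejects a saved plan as stale when the state has been modified since the plan was created, but it does not notice changes made outside Terraform and does not enforce a maximum plan age. Robust workflows either validate that the plan is still applicable or require re-planning. Never blindly apply an old plan.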
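
One way to add such a validation is to store a little metadata next to the saved plan and check it before applying. The Python sketch below is illustrative only: the `PlanMetadata` record, the idea of comparing against the approved PR head commit, and the 24-hour default are assumptions, and the check complements Terraform's own stale-plan detection rather than replacing it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class PlanMetadata:
    head_sha: str         # PR head commit the plan was generated from
    created_at: datetime  # when the plan was generated (UTC)


def plan_is_applicable(meta: PlanMetadata, approved_head_sha: str,
                       now: datetime,
                       max_age: timedelta = timedelta(hours=24)):
    """Return (ok, reason) for applying a saved plan."""
    if meta.head_sha != approved_head_sha:
        return False, "plan was generated from a different commit than the one approved"
    if now - meta.created_at > max_age:
        return False, "plan is older than the allowed maximum age"
    return True, "ok"


# Usage with made-up values:
meta = PlanMetadata("abc123", datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc))
ok, reason = plan_is_applicable(
    meta, "abc123", now=datetime(2024, 1, 1, 18, 0, tzinfo=timezone.utc)
)
```

Comparing against the PR head commit rather than the commit on main matters with squash merges, where the merged commit gets a new SHA.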

Handling Conflicts and Concurrent Changes

Unlike application code where conflicts are resolved through Git merge mechanics, infrastructure PRs face an additional challenge: state conflicts. Two PRs might not conflict in code but could conflict when applied because they both target the same infrastructure resources.

Types of Infrastructure Conflicts:

Infrastructure Conflict Types
Conflict Type	Example	Detection	Resolution
Git Conflict	Both PRs modify same Terraform file	Git merge conflict	Standard merge resolution
Resource Conflict	Both PRs modify same AWS resource	Second plan shows unexpected changes	Coordinate changes, rebase PR
State Lock	Two applies attempted simultaneously	State lock error	Wait for first apply, then retry
Dependency Conflict	PR A creates resource, PR B depends on it	Plan failure in PR B	Merge in dependency order

Strategies for Managing Concurrent Changes:

Require branch up-to-date — Force PRs to be rebased on latest main before merge. Ensures sequential application.
Lock during plan — Some teams lock state during the plan phase to guarantee plan matches apply.
Single apply queue — Use concurrency controls to ensure only one apply runs at a time.
Resource partitioning — Organize infrastructure so teams rarely need to modify the same resources.
Cross-PR coordination — Use PR labels or comments to flag related changes that must merge in order.
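
Resource conflicts between open PRs can be surfaced before merge by comparing the resource addresses each plan would change. This Python sketch assumes each PR's plan is available as JSON from `terraform show -json`; the function names and the PR labels used as dictionary keys are hypothetical.

```python
def changed_addresses(plan: dict) -> set[str]:
    """Addresses of resources the plan would actually change."""
    return {
        rc["address"]
        for rc in plan.get("resource_changes", [])
        if rc["change"]["actions"] not in (["no-op"], ["read"])
    }


def find_conflicts(plans: dict[str, dict]) -> dict[str, list[str]]:
    """Map each resource touched by more than one PR to the PRs touching it."""
    owners: dict[str, list[str]] = {}
    for pr, plan in plans.items():
        for address in changed_addresses(plan):
            owners.setdefault(address, []).append(pr)
    return {addr: sorted(prs) for addr, prs in owners.items() if len(prs) > 1}
```

A non-empty result is a prompt to coordinate: merge one PR first, then rebase and re-plan the other.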

require-freshness.yaml
GitHub Actions
# Ensure PR is based on latest main before allowing merge
name: Require Fresh Branch
 
on:
  pull_request:
    types: [opened, synchronize, reopened]
 
jobs:
  check-freshness:
    runs-on: ubuntu-latest
    steps:
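      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
          # Check out the PR head itself. The default merge ref already
          # contains main, so the ancestor check below would always pass.
          ref: ${{ github.event.pull_request.head.sha }}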
          
      - name: Check if branch is current
        id: check
        run: |
          git fetch origin main
          
          # Check if main is an ancestor of this branch
          if git merge-base --is-ancestor origin/main HEAD; then
            echo "✅ Branch is up to date with main"
            echo "fresh=true" >> $GITHUB_OUTPUT
          else
            echo "❌ Branch needs rebase on main"
            echo "fresh=false" >> $GITHUB_OUTPUT
          fi
          
      - name: Post status if stale
        if: steps.check.outputs.fresh == 'false'
        uses: actions/github-script@v7
        with:
          script: |
            github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: `⚠️ This branch is behind \`main\`. Please rebase to ensure the plan reflects current infrastructure state.
              
              \`\`\`
              git fetch origin main
              git rebase origin/main
              git push --force-with-lease
              \`\`\``
            });
            
      - name: Fail if stale
        if: steps.check.outputs.fresh == 'false'
        run: exit 1

The Monorepo Advantage

Organizations using a single infrastructure repository can enforce strict merge ordering and detect conflicts earlier. With multiple repositories, cross-repo coordination requires additional tooling—or very disciplined communication.

Emergency and Break-Glass Procedures

Despite best intentions, emergencies happen. Systems break, incidents occur, and sometimes the normal PR process is too slow. Every organization needs break-glass procedures that allow emergency changes while maintaining accountability.

When Break-Glass Is Appropriate:

Active incident requiring immediate infrastructure fix
Security vulnerability requiring emergency patch
Complete outage where normal tooling may be unavailable
Time-critical compliance requirement

Break-Glass Principles:

Emergency Change Guidelines

•Document while acting — Create a ticket/issue immediately, even if terse
•Minimize change scope — Fix the immediate problem only; save improvements for normal flow
•Notify stakeholders — Alert the team that emergency access is being used
•Audit everything — Ensure actions are logged even if reviews are bypassed
•Backfill promptly — Within 24-48 hours, create a normal PR that captures the emergency change
•Post-incident review — Analyze why normal processes didn't work fast enough

Implementation Options:

Admin bypass — Repository admins can merge without approvals (audited)
Emergency branch — Changes to emergency/ prefix auto-merge with reduced checks
Separate emergency repo — Emergency access isolated to specific repository with different rules
Manual override — Direct infrastructure access with automatic drift detection and alerting

The Backfill Requirement:

Emergency changes must be reconciled with the source of truth. If you made a change outside normal process, you must:

•Create a PR that captures the emergency change in code
•Document why emergency process was used
•Link to incident report or ticket
•Get retroactive review and approval
•Close the loop on any drift introduced
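
Closing the loop on drift can be verified mechanically. `terraform plan -detailed-exitcode` exits with 0 when there is nothing to change, 2 when the plan contains changes, and 1 on error, so a scheduled job can tell whether the backfill PR really matches what was changed by hand. The Python wrapper below is a sketch; the function names and the choice of flags are assumptions.

```python
import subprocess


def interpret_plan_exit_code(code: int) -> str:
    """Map `terraform plan -detailed-exitcode` exit codes to a status."""
    if code == 0:
        return "in-sync"  # code and real infrastructure agree
    if code == 2:
        return "drift"    # plan is non-empty: something still differs
    return "error"


def check_drift(workdir: str) -> str:
    """Run a read-only plan in `workdir` and report the drift status."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_plan_exit_code(result.returncode)
```

After an emergency change is backfilled and merged, `check_drift` should report "in-sync"; a "drift" result means the code does not yet capture everything that was done manually.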

Break-Glass Is Not a Shortcut

If break-glass procedures are used frequently, it indicates the normal process is too slow or burdensome. Trending emergency bypass usage is a signal to improve the standard workflow, not to normalize working around it.

Summary: Pull Request Workflows

Pull Request workflows transform infrastructure changes from opaque operations into transparent, reviewable, and auditable processes. The key principles to remember:

Key Takeaways

•The PR is the unit of change — Every infrastructure modification flows through a documented, reviewable PR
•Automate everything possible — Validation, security scanning, plan generation, and cost estimation should run automatically
•Post plans as comments — Reviewers should see exactly what will happen without running commands
•Match review rigor to risk — Low-risk changes can have lighter review; high-risk requires scrutiny
•Review the plan, not just the code — A small code change can cause large infrastructure impact
•Use saved plans for apply — The plan reviewed must be the plan applied
•Handle concurrent changes carefully — Require rebasing, use state locks, queue applies
•Maintain emergency procedures — Break-glass for genuine emergencies, but always backfill

What's Next:

With PR workflows established, the next page covers Automated Testing for IaC—the testing strategies that catch issues before human review, from policy tests to integration tests to compliance validation.

Page Complete

You now understand how to structure Pull Request workflows for infrastructure, the automated checks that should run, how to implement risk-based reviews, and how to handle edge cases like concurrent changes and emergencies.

3 / 5

Loading learning content...

System Design (HLD)CI/CD for Infrastructure

CI/CD for Infrastructure

LevelAdvanced

Duration90 mins

TopicCI/CD for Infrastructure

3 / 5

Pull Request Workflows

Infrastructure Changes as Code Reviews

What You Will Learn

The Pull Request as the Unit of Change

What a Good Infrastructure PR Contains:

Elements of an Effective Infrastructure PR

•Intent Description — Why is this change being made? What problem does it solve or what feature does it enable?
•Configuration Changes — The actual code diff: Terraform, Kubernetes manifests, Ansible playbooks, etc.
•Execution Plan — Automated preview of exactly what resources will be created, modified, or destroyed
•Validation Results — Outputs from linters, security scanners, policy checks, and tests
•Affected Environments — Which environments will be impacted and in what order
•Rollback Procedure — How to revert if the change causes issues (often: 'revert this commit')
•Related Links — Associated tickets, design docs, runbooks, or related PRs

pr-template.md
Markdown
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
## Infrastructure Change Request
 
### Summary
<!-- Brief description of what this change does -->
 
### Motivation
<!-- Why is this change needed? Link to ticket/issue if applicable -->
Relates to: #123
 
### Change Type
<!-- Check all that apply -->
- [ ] New resource creation
- [ ] Resource modification
- [ ] Resource deletion
- [ ] Configuration update
- [ ] Security/IAM change
 
### Environments Affected
<!-- Which environments will this change impact? -->
- [ ] Development
- [ ] Staging
- [ ] Production
 
### Risk Assessment
<!-- Estimate the risk level of this change -->
- [ ] 🟢 Low - No production impact, easily reversible
- [ ] 🟡 Medium - Limited production impact, rollback available
- [ ] 🔴 High - Significant production impact, careful review needed
 
### Pre-Deployment Checklist
- [ ] Terraform plan reviewed
- [ ] Security scan passed
- [ ] Policy checks passed
- [ ] Tested in lower environment (if applicable)
- [ ] Runbook updated (if applicable)
- [ ] Monitoring/alerting in place
 
### Rollback Plan
<!-- How will we revert if something goes wrong? -->
 
### Additional Notes
<!-- Any other context for reviewers -->

Templates Reduce Cognitive Load

Automated Checks in the PR Lifecycle

The Automated Check Pipeline:


Automated PR Checks for Infrastructure
Check Type	Tools	What It Catches	When to Block
Syntax Validation	terraform validate, yamllint	Parse errors, invalid references, schema violations	Always - invalid syntax can't proceed
Formatting	terraform fmt, prettier	Inconsistent style, indentation issues	Usually - maintain codebase consistency
Security Scanning	tfsec, checkov, trivy	Security misconfigurations, exposed secrets, vulnerabilities	High/critical findings block; low can warn
Policy Compliance	OPA, Sentinel, Conftest	Policy violations, naming standards, required tags	Depends on policy severity
Cost Estimation	infracost, terraform plan	Unexpected cost changes, budget overruns	Block if the increase exceeds a set threshold
Plan Generation	terraform plan, pulumi preview	What will actually happen on apply	Block if plan fails
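The "When to Block" column can be read as a small decision function. The sketch below is one possible encoding, not a prescribed policy: the `CheckResult` shape, the check names, and the cost threshold are all assumptions, and formatting failures are treated as blocking even though the table says "usually".

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    check: str                 # "syntax", "formatting", "security", "policy", "cost", "plan"
    passed: bool
    severity: str = "low"      # highest finding severity: low / medium / high / critical
    monthly_cost_delta: float = 0.0


# Hypothetical threshold; the table only says "exceeds threshold".
COST_THRESHOLD_USD = 200.0
BLOCKING_SEVERITIES = {"high", "critical"}


def decide(result: CheckResult) -> str:
    """Return "pass", "warn", or "block" for a single check."""
    if result.check in ("syntax", "formatting", "plan"):
        return "pass" if result.passed else "block"
    if result.check in ("security", "policy"):
        if result.passed:
            return "pass"
        return "block" if result.severity in BLOCKING_SEVERITIES else "warn"
    if result.check == "cost":
        return "block" if result.monthly_cost_delta > COST_THRESHOLD_USD else "pass"
    raise ValueError(f"unknown check type: {result.check}")


def overall(results: list) -> str:
    """Combine per-check decisions: any block wins, then any warn."""
    decisions = [decide(r) for r in results]
    if "block" in decisions:
        return "block"
    return "warn" if "warn" in decisions else "pass"
```

Keeping the policy in one function makes it easy to review changes to the blocking rules themselves through the same PR process.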

pr-checks.yaml
GitHub Actions
name: Infrastructure PR Checks
 
on:
  pull_request:
    paths:
      - 'infrastructure/**'
      - 'modules/**'
 
permissions:
  contents: read
  pull-requests: write
  security-events: write
 
jobs:
  # Stage 1: Fast validation (seconds)
  validate:
    name: Validate Configuration
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        
      - name: Format Check
        run: terraform fmt -check -recursive -diff
        
      - name: Initialize
        run: terraform init -backend=false
        working-directory: infrastructure
        
      - name: Validate
        run: terraform validate
        working-directory: infrastructure
 
  # Stage 2: Security scanning (1-2 minutes)
  security:
    name: Security Analysis
    runs-on: ubuntu-latest
    needs: validate
    steps:
      - uses: actions/checkout@v4
      
      - name: Run tfsec
        uses: aquasecurity/tfsec-action@v1.0.0
        with:
          soft_fail: false
          
      - name: Run Checkov
        uses: bridgecrewio/checkov-action@v12
        with:
          directory: infrastructure
          framework: terraform
          output_format: cli,sarif
          output_file_path: console,checkov-results.sarif
          
      - name: Upload SARIF
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: checkov-results.sarif
 
  # Stage 3: Cost estimation
  cost:
    name: Cost Analysis
    runs-on: ubuntu-latest
    needs: validate
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Infracost
        uses: infracost/actions/setup@v2
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}
          
      - name: Generate Cost Breakdown
        run: |
          infracost breakdown --path=infrastructure \
            --format=json \
            --out-file=/tmp/infracost.json
            
      - name: Post Cost Comment
        uses: infracost/actions/comment@v1
        with:
          path: /tmp/infracost.json
          behavior: update
 
  # Stage 4: Generate and post plan
  plan:
    name: Terraform Plan
    runs-on: ubuntu-latest
    needs: [validate, security]
    environment: plan
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_PLAN_ROLE }}
          aws-region: us-west-2
          
      - name: Terraform Init
        run: terraform init
        working-directory: infrastructure
        
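      - name: Terraform Plan
        id: plan
        run: |
          # Without pipefail, tee's exit status would hide a failed plan
          set -o pipefail
          terraform plan -no-color -out=tfplan 2>&1 | tee plan.txt
        working-directory: infrastructure
        
      - name: Analyze Plan
        id: analyze
        run: |
          # Count change types. grep -c prints 0 but exits non-zero when nothing
          # matches, so "|| echo 0" would produce "0" twice and break the
          # integer comparisons below; "|| true" keeps the single 0.
          CREATES=$(grep -c "will be created" plan.txt || true)
          UPDATES=$(grep -c "will be updated" plan.txt || true)
          DESTROYS=$(grep -c "will be destroyed" plan.txt || true)
          REPLACES=$(grep -c "must be replaced" plan.txt || true)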
          
          echo "creates=$CREATES" >> $GITHUB_OUTPUT
          echo "updates=$UPDATES" >> $GITHUB_OUTPUT
          echo "destroys=$DESTROYS" >> $GITHUB_OUTPUT
          echo "replaces=$REPLACES" >> $GITHUB_OUTPUT
          
          # Determine risk level
          if [ "$DESTROYS" -gt 0 ] || [ "$REPLACES" -gt 0 ]; then
            echo "risk=HIGH" >> $GITHUB_OUTPUT
          elif [ "$UPDATES" -gt 5 ]; then
            echo "risk=MEDIUM" >> $GITHUB_OUTPUT
          else
            echo "risk=LOW" >> $GITHUB_OUTPUT
          fi
        working-directory: infrastructure
        
      - name: Post Plan to PR
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const plan = fs.readFileSync('infrastructure/plan.txt', 'utf8');
            const risk = '${{ steps.analyze.outputs.risk }}';
            
            const riskBadge = {
              LOW: '🟢 Low Risk',
              MEDIUM: '🟡 Medium Risk', 
              HIGH: '🔴 High Risk'
            }[risk];
            
            const body = `## Terraform Plan
            
            ### Risk Assessment: ${riskBadge}
            
            | Change Type | Count |
            |-------------|-------|
            | Create | ${{ steps.analyze.outputs.creates }} |
            | Update | ${{ steps.analyze.outputs.updates }} |
            | Destroy | ${{ steps.analyze.outputs.destroys }} |
            | Replace | ${{ steps.analyze.outputs.replaces }} |
            
            <details>
            <summary>Show Plan Output</summary>
            
            \`\`\`hcl
            ${plan.substring(0, 60000)}
            \`\`\`
            
            </details>
            ${risk === 'HIGH' ? '\n⚠️ **This change includes destructive operations. Additional review required.**' : ''}
            `;
            
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body
            });
            
      - name: Save Plan Artifact
        uses: actions/upload-artifact@v4
        with:
          name: tfplan-${{ github.event.pull_request.number }}
          path: infrastructure/tfplan
          retention-days: 7
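Counting phrases in the human-readable plan works, but the text format is not a stable interface. A possible alternative, sketched here, is to analyze the machine-readable plan produced by `terraform show -json tfplan`, whose `resource_changes` entries list the actions for each resource. The function name is illustrative; the risk thresholds mirror the workflow above.

```python
import json


def summarize_plan(plan_json: str) -> dict:
    """Count change types from `terraform show -json tfplan` output."""
    plan = json.loads(plan_json)
    counts = {"creates": 0, "updates": 0, "destroys": 0, "replaces": 0}
    for rc in plan.get("resource_changes", []):
        actions = set(rc["change"]["actions"])
        if {"create", "delete"} <= actions:
            counts["replaces"] += 1      # replacement lists both actions
        elif "create" in actions:
            counts["creates"] += 1
        elif "update" in actions:
            counts["updates"] += 1
        elif "delete" in actions:
            counts["destroys"] += 1
        # "no-op" and "read" entries are ignored

    # Same thresholds as the shell version in the workflow
    if counts["destroys"] or counts["replaces"]:
        risk = "HIGH"
    elif counts["updates"] > 5:
        risk = "MEDIUM"
    else:
        risk = "LOW"
    return {**counts, "risk": risk}
```

In a workflow this would run after the plan step, reading the JSON from a file or from standard input.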

Check Ordering Matters

The Plan Comment Pattern

What an Effective Plan Comment Shows:

Plan Comment Best Practices

•Summary statistics — Quick counts of creates, updates, deletes at the top
•Risk indicator — Visual badge or label indicating risk level
•Full plan in collapsible section — Entire plan available but not overwhelming the comment
•Highlight destructive changes — Special callout for any resource destruction
•Diff formatting — Terraform's plan output with +/- showing changes
•Update on push — Replace or update the comment when new commits are pushed
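These practices can be sketched as a function that builds the comment body, plus a helper that finds an earlier comment so it can be updated on the next push. The hidden marker string and both function names are assumptions for illustration; the 60,000-character cut-off matches the workflow shown earlier.

```python
from typing import Optional

MARKER = "<!-- terraform-plan-comment -->"  # hypothetical marker for locating the comment later
MAX_PLAN_CHARS = 60000                      # same cut-off as the workflow above
FENCE = "`" * 3                             # Markdown code fence


def build_plan_comment(plan_text: str, counts: dict, risk: str) -> str:
    """Assemble the Markdown body: summary first, full plan collapsed."""
    badge = {"LOW": "🟢 Low Risk", "MEDIUM": "🟡 Medium Risk", "HIGH": "🔴 High Risk"}[risk]
    lines = [
        MARKER,
        "## Terraform Plan",
        "",
        f"### Risk Assessment: {badge}",
        "",
        "| Change Type | Count |",
        "|-------------|-------|",
    ]
    for label in ("creates", "updates", "destroys", "replaces"):
        lines.append(f"| {label.capitalize()} | {counts.get(label, 0)} |")
    if counts.get("destroys", 0) or counts.get("replaces", 0):
        lines += ["", "⚠️ **This change includes destructive operations. Additional review required.**"]
    lines += ["", "<details>", "<summary>Show Plan Output</summary>", "",
              FENCE + "diff", plan_text[:MAX_PLAN_CHARS], FENCE]
    if len(plan_text) > MAX_PLAN_CHARS:
        lines.append("_Plan truncated; see the workflow logs for the full output._")
    lines += ["", "</details>"]
    return "\n".join(lines)


def find_existing_comment(comments: list) -> Optional[int]:
    """Return the id of a previous plan comment, or None if there is none."""
    for comment in comments:
        if MARKER in comment.get("body", ""):
            return comment["id"]
    return None
```

If `find_existing_comment` returns an id, the caller would update that comment; otherwise it would create a new one. Either way the PR carries a single, current plan.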

Tools That Implement This Pattern:

Atlantis — Purpose-built for Terraform PR workflows; posts plan comments and accepts atlantis apply commands
Terraform Cloud/Enterprise — Integrated plan posting in VCS integrations
GitHub Actions with terraform-pr-comment — Community actions for posting formatted plans
Spacelift — SaaS platform with native PR integration
env0 — Similar SaaS with advanced collaboration features

[Diagram: Example Plan Comment Flow]

Atlantis Command Pattern

Risk-Based Review Requirements

Categorizing Change Risk:

Infrastructure Change Risk Matrix
Risk Level	Examples	Review Requirements	Approval Gates
🟢 Low	Update tags, modify dev resources, formatting fixes	1 reviewer, auto-merge eligible	Automated checks pass
🟡 Medium	Add new resources, modify staging config, update policies	2 reviewers from different teams	Checks pass + human approval
🔴 High	Modify production resources, IAM changes, network updates	Senior engineer + security review	Multiple approvals required
⛔ Critical	Delete production resources, modify encryption, change VPCs	Architecture review + management sign-off	Extended approval chain

Implementing Risk Classification:

Risk classification can be automated based on:

Path-based rules — Files in /production/ require more review than /development/
Change type detection — Terraform plan analysis: deletes/replaces = higher risk
Resource type sensitivity — IAM, KMS, VPC changes always high risk
Label-based signaling — Authors can label PRs with risk level, but automation validates
Policy engines — OPA/Sentinel rules that classify risk programmatically
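A minimal sketch of how several of these signals might combine: each signal can raise the risk level, and the author's label can never lower it. The directory layout, the list of sensitive resource-type prefixes, and the rule that deletes touching production count as critical are assumptions chosen to match the examples on this page.

```python
from fnmatch import fnmatch

LEVELS = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]

# Illustrative rules only; adjust to the repository's real layout.
PROD_PATTERN = "infrastructure/production/*"
PATH_RULES = [(PROD_PATTERN, "HIGH"), ("infrastructure/staging/*", "MEDIUM")]
SENSITIVE_PREFIXES = ("aws_iam_", "aws_kms_", "aws_vpc")


def classify(changed_files, resource_changes, author_label=None):
    """Return the highest risk level implied by any signal.

    resource_changes: list of {"type": str, "actions": [str, ...]} entries,
    for example derived from the plan's JSON representation.
    """
    level = 0
    touches_prod = any(fnmatch(p, PROD_PATTERN) for p in changed_files)

    # Path-based rules
    for path in changed_files:
        for pattern, risk in PATH_RULES:
            if fnmatch(path, pattern):
                level = max(level, LEVELS.index(risk))

    for change in resource_changes:
        # Resource type sensitivity
        if change["type"].startswith(SENSITIVE_PREFIXES):
            level = max(level, LEVELS.index("HIGH"))
        # Change type detection: deletes and replaces both include "delete"
        if "delete" in change["actions"]:
            floor = "CRITICAL" if touches_prod else "HIGH"
            level = max(level, LEVELS.index(floor))

    # Author label may raise the level but is never trusted to lower it
    if author_label in LEVELS:
        level = max(level, LEVELS.index(author_label))
    return LEVELS[level]
```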

CODEOWNERS

# GitHub CODEOWNERS for Infrastructure Repository
 
# Default: Platform team reviews all infrastructure
* @org/platform-team
 
# Development environment can have lighter review
/infrastructure/development/ @org/platform-team

# Production infrastructure requires senior approval
/infrastructure/production/ @org/platform-seniors @org/sre-oncall

# The last matching pattern wins, so the resource-specific rules below are
# listed after the environment rules and override them in every environment.

# Security-sensitive resources need security team
/infrastructure/*/iam/ @org/security-team @org/platform-seniors
/infrastructure/*/kms/ @org/security-team @org/platform-seniors
/infrastructure/*/vpc/ @org/network-team @org/platform-seniors

# Database changes require DBA review
/infrastructure/*/rds/ @org/dba-team @org/platform-team
/infrastructure/*/dynamodb/ @org/dba-team @org/platform-team
 
# CI/CD configuration changes need platform team
/.github/workflows/ @org/platform-seniors
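Rule order matters in a CODEOWNERS file because only the last matching pattern assigns owners. The sketch below uses a deliberately simplified matcher (anchored directory patterns and a bare `*` only, not the full pattern syntax) to show how moving one rule changes who is asked to review an IAM change in development.

```python
from fnmatch import fnmatchcase


def matches(pattern: str, path: str) -> bool:
    """Simplified matcher: a bare '*' or an anchored directory pattern.

    Within a directory pattern, '*' matches exactly one path segment.
    """
    if pattern == "*":
        return True
    pat_parts = pattern.strip("/").split("/")
    path_parts = path.strip("/").split("/")
    if len(path_parts) <= len(pat_parts):
        return False  # the path must be a file somewhere below the directory
    return all(fnmatchcase(seg, pat) for pat, seg in zip(pat_parts, path_parts))


def owners_for(path: str, rules: list) -> list:
    """Return the owners from the LAST matching (pattern, owners) rule."""
    owners = []
    for pattern, rule_owners in rules:
        if matches(pattern, path):
            owners = rule_owners
    return owners


rules = [
    ("*", ["@org/platform-team"]),
    ("/infrastructure/development/", ["@org/platform-team"]),
    ("/infrastructure/production/", ["@org/platform-seniors", "@org/sre-oncall"]),
    ("/infrastructure/*/iam/", ["@org/security-team", "@org/platform-seniors"]),
]

# With the IAM rule last, security reviews IAM changes even in development.
print(owners_for("infrastructure/development/iam/roles.tf", rules))

# If the development rule is moved to the end, it silently overrides the IAM rule.
reordered = rules[:1] + rules[2:] + rules[1:2]
print(owners_for("infrastructure/development/iam/roles.tf", reordered))
```

Broad environment rules therefore belong before narrow, security-sensitive ones.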

branch-protection.json
GitHub
{
  "required_pull_request_reviews": {
    "dismiss_stale_reviews": true,
    "require_code_owner_reviews": true,
    "required_approving_review_count": 2,
    "require_last_push_approval": true
  },
  "required_status_checks": {
    "strict": true,
    "checks": [
      { "context": "Validate Configuration" },
      { "context": "Security Analysis" },
      { "context": "Terraform Plan" },
      { "context": "Cost Analysis" }
    ]
  },
  "enforce_admins": true,
  "required_linear_history": true,
  "allow_force_pushes": false,
  "allow_deletions": false,
  "required_conversation_resolution": true
}

Don't Let Risk Classification Become Theater

Review Best Practices for Infrastructure PRs

The Infrastructure PR Review Checklist:

What to Check in Every Infrastructure PR

•Understand Intent — Does the PR description clearly explain what and why? Can you understand the change without looking at code?
•Review the Plan, Not Just the Code — The plan shows actual impact. A one-line code change might cause widespread resource replacement.
•Check for Destructive Operations — Any deletes or replaces? Are they expected? Is data at risk?
•Validate Security Implications — Are ports being opened? Is access being broadened? Do security scan results look reasonable?
•Consider Blast Radius — If this goes wrong, what breaks? Is the impact contained or widespread?
•Verify Rollback Path — Can this be reverted safely? Are there one-way doors (database schema changes, deleted resources)?
•Check for Drift Introduction — Does this change make sense given current state? Or is it based on outdated assumptions?
•Review Naming and Tagging — Do resources follow naming conventions? Are cost allocation tags present?

Ineffective Review Practices

•Approving without reading the plan
•Rubber-stamping from familiar authors
•Ignoring security scan warnings
•Skipping cost impact review
•Not questioning destructive changes
•Approving 'just for dev' without scrutiny

Effective Review Practices

•Always read the plan summary first
•Ask clarifying questions liberally
•Verify security findings addressed
•Check cost estimates for surprises
•Require justification for deletes
•Apply consistent standards everywhere

Common Review Questions to Ask:

"I see this will replace the RDS instance—do we have a backup strategy?"
"This opens port 22 to 0.0.0.0/0—is this intentional for debugging, and will it be closed after?"
"The cost estimate shows $500/month increase—has this been budgeted?"
"I notice we're using instance type X—have you considered Y for this use case?"
"This change affects production. Has it been tested in staging?"

The Burden Is on the Author

Merge Strategies and Post-Merge Flow

The merge event triggers the actual infrastructure deployment. How merges are handled affects deployment reliability, history clarity, and rollback capability.

Git Merge Strategies for Infrastructure:

Merge Strategy Comparison
Strategy	History Appearance	Rollback Ease	Best For
Merge Commit	Preserves branch history, merge commits visible	Moderate - revert the merge commit with git revert -m 1	Complex changes with meaningful branch history
Squash Merge	Single commit per PR, clean linear history	Easy - revert single commit	Most infrastructure changes; recommended default
Rebase Merge	Linear history, preserves individual commits	Moderate - may need to revert multiple commits	When individual commits are meaningful

Most infrastructure repositories benefit from squash merging by default:

Each PR becomes one commit, making the history scannable
Rollback is straightforward: revert one commit
The squashed commit message can describe the change as a whole
Work-in-progress noise (fixup and WIP commits) stays out of the main history

Post-Merge Deployment Flow:

After merge, the deployment workflow activates. The key is connecting the approved plan to the actual apply:


post-merge-apply.yaml
GitHub Actions
name: Apply Infrastructure
 
on:
  push:
    branches: [main]
    paths:
      - 'infrastructure/**'
 
concurrency:
  group: terraform-apply
  cancel-in-progress: false
 
jobs:
  apply:
    name: Apply Infrastructure Changes
    runs-on: ubuntu-latest
    environment: production
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Get PR Number
        id: pr
        run: |
          PR_NUM=$(gh pr list --search "${{ github.sha }}" \
            --state merged --json number -q '.[0].number')
          echo "number=$PR_NUM" >> $GITHUB_OUTPUT
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          
      - name: Download Plan Artifact
        id: download
        uses: dawidd6/action-download-artifact@v6
        with:
          workflow: pr-checks.yaml
          pr: ${{ steps.pr.outputs.number }}
          name: tfplan-${{ steps.pr.outputs.number }}
          path: infrastructure/
        continue-on-error: true
          
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        
      - name: Terraform Init
        run: terraform init
        working-directory: infrastructure
        
      - name: Validate Plan Still Applicable
        id: validate
        if: steps.download.outcome == 'success'
        run: |
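          # This only confirms the artifact is a readable plan file. Terraform itself
          # rejects a stale plan at apply time if the state changed after it was created.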
          terraform show tfplan -no-color > /dev/null 2>&1
          echo "valid=true" >> $GITHUB_OUTPUT
        working-directory: infrastructure
        continue-on-error: true
        
      - name: Apply Saved Plan
        if: steps.validate.outputs.valid == 'true'
        run: |
          terraform apply -auto-approve tfplan
        working-directory: infrastructure
        
      - name: Regenerate and Apply (if saved plan invalid)
        if: steps.validate.outputs.valid != 'true'
        run: |
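          # Caution: this applies a plan that no reviewer has seen. Stricter
          # setups fail here and require a fresh plan on a new PR instead.
          echo "⚠️ Saved plan no longer valid. Regenerating..."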
          terraform plan -out=tfplan
          terraform apply -auto-approve tfplan
        working-directory: infrastructure
        
      - name: Post-Apply Verification
        run: |
          # Run verification scripts
          ./scripts/verify-deployment.sh
        working-directory: infrastructure
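The choice between applying the saved plan and planning again can also be made explicit. This is a sketch under stated assumptions: the pipeline is assumed to record the state's `serial` number and a timestamp alongside the plan artifact, and the action names are hypothetical. Terraform still performs its own staleness check at apply time; this only decides earlier which path to take.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_PLAN_AGE = timedelta(days=7)  # matches the artifact retention in the PR workflow


def choose_apply_action(plan_created_at: Optional[datetime],
                        state_serial_at_plan: Optional[int],
                        current_state_serial: int,
                        now: Optional[datetime] = None) -> str:
    """Return "apply-saved-plan" or "replan-and-review"."""
    now = now or datetime.now(timezone.utc)
    if plan_created_at is None or state_serial_at_plan is None:
        return "replan-and-review"   # no saved plan artifact was found
    if now - plan_created_at > MAX_PLAN_AGE:
        return "replan-and-review"   # plan is older than the team allows
    if state_serial_at_plan != current_state_serial:
        return "replan-and-review"   # state changed since the plan was reviewed
    return "apply-saved-plan"
```

Routing the second outcome to a human approval step, rather than applying automatically, keeps the reviewed plan and the applied plan the same.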

Plan Expiration Risk

Handling Conflicts and Concurrent Changes

Types of Infrastructure Conflicts:

Infrastructure Conflict Types
Conflict Type	Example	Detection	Resolution
Git Conflict	Both PRs modify same Terraform file	Git merge conflict	Standard merge resolution
Resource Conflict	Both PRs modify same AWS resource	Second plan shows unexpected changes	Coordinate changes, rebase PR
State Lock	Two applies attempted simultaneously	State lock error	Wait for first apply, then retry
Dependency Conflict	PR A creates resource, PR B depends on it	Plan failure in PR B	Merge in dependency order

Strategies for Managing Concurrent Changes:

Require branch up-to-date — Force PRs to be rebased on latest main before merge. Ensures sequential application.
Lock during plan — Some teams lock state during the plan phase to guarantee plan matches apply.
Single apply queue — Use concurrency controls to ensure only one apply runs at a time.
Resource partitioning — Organize infrastructure so teams rarely need to modify the same resources.
Cross-PR coordination — Use PR labels or comments to flag related changes that must merge in order.
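Resource conflicts between open PRs can be spotted before merge by comparing their plans. A minimal sketch, assuming each PR's plan is available in the JSON form produced by `terraform show -json`; the function names are illustrative.

```python
def changed_addresses(plan: dict) -> set:
    """Resource addresses that a plan would actually change."""
    return {
        rc["address"]
        for rc in plan.get("resource_changes", [])
        if set(rc["change"]["actions"]) - {"no-op", "read"}
    }


def overlapping_resources(plan_a: dict, plan_b: dict) -> list:
    """Addresses changed by both plans, sorted for stable output."""
    return sorted(changed_addresses(plan_a) & changed_addresses(plan_b))
```

A non-empty result is a cue to coordinate: merge one PR first, then rebase and re-plan the other.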

require-freshness.yaml
GitHub Actions
# Ensure PR is based on latest main before allowing merge
name: Require Fresh Branch
 
on:
  pull_request:
    types: [opened, synchronize, reopened]
 
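permissions:
  contents: read
  pull-requests: write  # lets the workflow post the rebase reminder

jobs:
  check-freshness:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          # pull_request runs check out a synthetic merge commit by default, which
          # already contains the base branch; check out the PR head instead so the
          # ancestor test below is meaningful.
          ref: ${{ github.event.pull_request.head.sha }}
          fetch-depth: 0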
          
      - name: Check if branch is current
        id: check
        run: |
          git fetch origin main
          
          # Check if main is an ancestor of this branch
          if git merge-base --is-ancestor origin/main HEAD; then
            echo "✅ Branch is up to date with main"
            echo "fresh=true" >> $GITHUB_OUTPUT
          else
            echo "❌ Branch needs rebase on main"
            echo "fresh=false" >> $GITHUB_OUTPUT
          fi
          
      - name: Post status if stale
        if: steps.check.outputs.fresh == 'false'
        uses: actions/github-script@v7
        with:
          script: |
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: `⚠️ This branch is behind \`main\`. Please rebase to ensure the plan reflects current infrastructure state.
              
              \`\`\`
              git fetch origin main
              git rebase origin/main
              git push --force-with-lease
              \`\`\``
            });
            
      - name: Fail if stale
        if: steps.check.outputs.fresh == 'false'
        run: exit 1

The Monorepo Advantage

Emergency and Break-Glass Procedures

When Break-Glass Is Appropriate:

Active incident requiring immediate infrastructure fix
Security vulnerability requiring emergency patch
Complete outage where normal tooling may be unavailable
Time-critical compliance requirement

Break-Glass Principles:

Emergency Change Guidelines

•Document while acting — Create a ticket/issue immediately, even if terse
•Minimize change scope — Fix the immediate problem only; save improvements for normal flow
•Notify stakeholders — Alert the team that emergency access is being used
•Audit everything — Ensure actions are logged even if reviews are bypassed
•Backfill promptly — Within 24-48 hours, create a normal PR that captures the emergency change
•Post-incident review — Analyze why normal processes didn't work fast enough

Implementation Options:

Admin bypass — Repository admins can merge without approvals (audited)
Emergency branch — Changes to emergency/ prefix auto-merge with reduced checks
Separate emergency repo — Emergency access isolated to specific repository with different rules
Manual override — Direct infrastructure access with automatic drift detection and alerting

The Backfill Requirement:

Emergency changes must be reconciled with the source of truth. If you made a change outside normal process, you must:

•Create a PR that captures the emergency change in code
•Document why emergency process was used
•Link to incident report or ticket
•Get retroactive review and approval
•Close the loop on any drift introduced
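The backfill requirement is easier to enforce when overdue items are surfaced automatically. A minimal sketch, assuming emergency changes are recorded somewhere with a ticket reference and a timestamp; the `EmergencyChange` record and the 48-hour deadline (the upper end of the window mentioned above) are illustrative choices.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

BACKFILL_DEADLINE = timedelta(hours=48)  # upper bound of the 24-48 hour window


@dataclass
class EmergencyChange:
    ticket: str                          # incident or ticket reference
    applied_at: datetime                 # when the out-of-band change was made
    backfill_pr: Optional[int] = None    # PR number once the change is captured in code


def overdue_backfills(changes: list, now: datetime) -> list:
    """Tickets whose emergency change has no backfill PR past the deadline."""
    return [
        c.ticket
        for c in changes
        if c.backfill_pr is None and now - c.applied_at > BACKFILL_DEADLINE
    ]
```

A scheduled job could run this daily and notify the team about anything it returns.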

Break-Glass Is Not a Shortcut

Summary: Pull Request Workflows

Pull Request workflows transform infrastructure changes from opaque operations into transparent, reviewable, and auditable processes. The key principles to remember:

Key Takeaways

•The PR is the unit of change — Every infrastructure modification flows through a documented, reviewable PR
•Automate everything possible — Validation, security scanning, plan generation, and cost estimation should run automatically
•Post plans as comments — Reviewers should see exactly what will happen without running commands
•Match review rigor to risk — Low-risk changes can have lighter review; high-risk requires scrutiny
•Review the plan, not just the code — A small code change can cause large infrastructure impact
•Use saved plans for apply — The plan reviewed must be the plan applied
•Handle concurrent changes carefully — Require rebasing, use state locks, queue applies
•Maintain emergency procedures — Break-glass for genuine emergencies, but always backfill
