Feature flags create a testing challenge that many teams underestimate. Each boolean flag doubles the number of possible code paths: with 5 independent flags you have 2⁵ = 32 possible states; with 10 flags, 2¹⁰ = 1,024. Testing every combination quickly becomes infeasible, but testing none of them is negligent.
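To make the arithmetic concrete, here is a tiny sketch (plain TypeScript, nothing project-specific) contrasting exhaustive state coverage with per-flag isolation testing, which grows linearly instead of exponentially:

```typescript
// Number of distinct flag states for n independent boolean flags: 2^n.
function exhaustiveStates(flagCount: number): number {
  return 2 ** flagCount;
}

// Testing each flag's two variants in isolation needs only 2n test groups.
function isolatedVariantTests(flagCount: number): number {
  return 2 * flagCount;
}

console.log(exhaustiveStates(5));      // 32
console.log(exhaustiveStates(10));     // 1024
console.log(isolatedVariantTests(10)); // 20 test groups instead of 1024 states
```

The gap only widens: at 20 flags, exhaustive coverage would need over a million states while isolation testing needs 40 test groups.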
The question isn't whether to test flagged code, but how to test it intelligently. This page covers strategies that maximize test coverage while keeping the test suite maintainable. We'll explore how to test both flag variants, mock flag state effectively, verify flag removal safety, and design test infrastructure that scales with your flag usage.
By the end of this page, you will understand how to structure tests for feature-flagged code, mocking strategies for flag state, testing strategies for multi-flag scenarios, integration testing approaches, and techniques for safely verifying flag removal.
The core principle of testing flagged code is simple: test both variants, but not all combinations.
For each flag, create tests that verify:

- correct behavior when the flag is OFF (the existing path), and
- correct behavior when the flag is ON (the new path).

You do NOT need to test every combination of multiple flags. Instead, test each flag's behavior in isolation, then use integration tests to verify the most critical multi-flag scenarios.
```typescript
// Basic pattern: Test both variants of a feature flag

describe("CheckoutService", () => {
  let checkoutService: CheckoutService;
  let mockFeatureFlags: MockFeatureFlagService;

  beforeEach(() => {
    mockFeatureFlags = new MockFeatureFlagService();
    checkoutService = new CheckoutService(mockFeatureFlags, ...otherDeps);
  });

  describe("when new-checkout-v2 flag is OFF", () => {
    beforeEach(() => {
      mockFeatureFlags.setFlag("new-checkout-v2", false);
    });

    it("should use legacy payment processor", async () => {
      const result = await checkoutService.processPayment(order);
      expect(result.processor).toBe("stripe-v1");
    });

    it("should calculate tax using legacy algorithm", async () => {
      const result = await checkoutService.calculateTotal(order);
      expect(result.taxCalculation).toBe("legacy");
    });

    it("should render legacy checkout UI template", async () => {
      const html = await checkoutService.renderCheckout(user);
      expect(html).toContain("checkout-legacy-form");
    });
  });

  describe("when new-checkout-v2 flag is ON", () => {
    beforeEach(() => {
      mockFeatureFlags.setFlag("new-checkout-v2", true);
    });

    it("should use new payment processor", async () => {
      const result = await checkoutService.processPayment(order);
      expect(result.processor).toBe("stripe-v2");
    });

    it("should calculate tax using new algorithm", async () => {
      const result = await checkoutService.calculateTotal(order);
      expect(result.taxCalculation).toBe("tax-service-v2");
    });

    it("should render new checkout UI template", async () => {
      const html = await checkoutService.renderCheckout(user);
      expect(html).toContain("checkout-v2-form");
    });
  });
});
```

Grouping tests by flag state (as shown above) makes it immediately clear what behavior is expected for each variant. When the flag is removed, you can delete the entire "flag OFF" describe block and know exactly which tests to update.
Effective testing requires easy, reliable ways to control flag state. Never call the real flag service during unit tests—always use mocks.
Create a mock implementation of your feature flag abstraction:
```typescript
// Comprehensive mock feature flag service for testing

export class MockFeatureFlagService implements FeatureFlagService {
  private flags = new Map<string, any>();
  private evaluationLog: FlagEvaluation[] = [];
  private contextualFlags = new Map<
    string,
    { matcher: (ctx: EvaluationContext) => boolean; value: any }
  >();

  // Simple flag setting
  setFlag(flagKey: string, value: any): void {
    this.flags.set(flagKey, value);
  }

  // Set multiple flags at once
  setFlags(flags: Record<string, any>): void {
    for (const [key, value] of Object.entries(flags)) {
      this.flags.set(key, value);
    }
  }

  // Contextual flag setting (different values for different users)
  setFlagForContext(
    flagKey: string,
    contextMatcher: (ctx: EvaluationContext) => boolean,
    value: any
  ): void {
    // Store the matcher for evaluation time
    this.contextualFlags.set(flagKey, { matcher: contextMatcher, value });
  }

  // Core evaluation
  isEnabled(flagKey: string, context?: EvaluationContext): boolean {
    const result = this.getValue(flagKey, context, false);

    // Log evaluation for assertions
    this.evaluationLog.push({
      flagKey,
      context,
      returnedValue: result,
      timestamp: new Date(),
    });

    return result;
  }

  getString(flagKey: string, context?: EvaluationContext, defaultValue = ""): string {
    return this.getValue(flagKey, context, defaultValue);
  }

  getNumber(flagKey: string, context?: EvaluationContext, defaultValue = 0): number {
    return this.getValue(flagKey, context, defaultValue);
  }

  getJSON<T>(flagKey: string, context?: EvaluationContext, defaultValue?: T): T {
    return this.getValue(flagKey, context, defaultValue);
  }

  private getValue<T>(flagKey: string, context?: EvaluationContext, defaultValue?: T): T {
    // Check contextual flags first
    const contextual = this.contextualFlags.get(flagKey);
    if (contextual && context && contextual.matcher(context)) {
      return contextual.value;
    }

    // Fall back to simple flags
    if (this.flags.has(flagKey)) {
      return this.flags.get(flagKey);
    }

    // Return default
    return defaultValue as T;
  }

  // Test utilities
  reset(): void {
    this.flags.clear();
    this.contextualFlags.clear();
    this.evaluationLog = [];
  }

  getEvaluationLog(): FlagEvaluation[] {
    return [...this.evaluationLog];
  }

  assertFlagWasEvaluated(flagKey: string): void {
    const found = this.evaluationLog.some(e => e.flagKey === flagKey);
    if (!found) {
      throw new Error(`Expected flag "${flagKey}" to be evaluated, but it was not`);
    }
  }

  assertFlagWasNotEvaluated(flagKey: string): void {
    const found = this.evaluationLog.some(e => e.flagKey === flagKey);
    if (found) {
      throw new Error(`Expected flag "${flagKey}" to NOT be evaluated, but it was`);
    }
  }
}

interface FlagEvaluation {
  flagKey: string;
  context?: EvaluationContext;
  returnedValue: any;
  timestamp: Date;
}
```

Create helper functions that make setting up flag states ergonomic:
```typescript
// Test helpers for feature flag testing

// Helper to run a test with specific flag state
export function withFlags(
  flagService: MockFeatureFlagService,
  flags: Record<string, any>,
  testFn: () => Promise<void> | void
): () => Promise<void> {
  return async () => {
    const previousFlags = new Map(flagService["flags"]);
    try {
      flagService.setFlags(flags);
      await testFn();
    } finally {
      flagService["flags"] = previousFlags;
    }
  };
}

// Usage
it(
  "should handle premium features when enabled",
  withFlags(mockFlags, { "premium-features": true }, async () => {
    const result = await service.getPricingTier(user);
    expect(result).toBe("premium");
  })
);

// Parameterized testing for both flag states
export function testBothFlagStates(
  description: string,
  flagKey: string,
  testCases: {
    whenOff: (flagService: MockFeatureFlagService) => void;
    whenOn: (flagService: MockFeatureFlagService) => void;
  }
): void {
  describe(description, () => {
    describe(`when ${flagKey} is OFF`, () => {
      beforeEach(() => {
        mockFeatureFlags.setFlag(flagKey, false);
      });
      testCases.whenOff(mockFeatureFlags);
    });

    describe(`when ${flagKey} is ON`, () => {
      beforeEach(() => {
        mockFeatureFlags.setFlag(flagKey, true);
      });
      testCases.whenOn(mockFeatureFlags);
    });
  });
}

// Usage
testBothFlagStates("payment processing", "stripe-v2-integration", {
  whenOff: (flags) => {
    it("uses legacy Stripe API", async () => { /* ... */ });
    it("handles legacy error codes", async () => { /* ... */ });
  },
  whenOn: (flags) => {
    it("uses new Stripe API", async () => { /* ... */ });
    it("handles new error codes", async () => { /* ... */ });
  },
});
```

When multiple flags interact, testing every combination is impractical. Instead, use strategic sampling:
Only test combinations that represent real production scenarios or known interaction points:
```typescript
// Strategic multi-flag testing

describe("Checkout with multiple flags", () => {
  // Define the critical combinations to test
  const criticalCombinations = [
    // Baseline: all flags off (current production behavior)
    {
      name: "all legacy systems",
      flags: {
        "new-checkout-ui": false,
        "stripe-v2": false,
        "dynamic-pricing": false,
      },
    },
    // Target: all flags on (full new experience)
    {
      name: "all new systems",
      flags: {
        "new-checkout-ui": true,
        "stripe-v2": true,
        "dynamic-pricing": true,
      },
    },
    // Known interaction: new UI with legacy payment
    {
      name: "new UI with legacy payment",
      flags: {
        "new-checkout-ui": true,
        "stripe-v2": false,
        "dynamic-pricing": false,
      },
    },
    // Risky combination identified in design review
    {
      name: "dynamic pricing with legacy payment",
      flags: {
        "new-checkout-ui": false,
        "stripe-v2": false,
        "dynamic-pricing": true,
      },
    },
  ];

  criticalCombinations.forEach(({ name, flags }) => {
    describe(`with ${name}`, () => {
      beforeEach(() => {
        mockFeatureFlags.setFlags(flags);
      });

      it("should complete checkout successfully", async () => {
        const result = await checkoutService.completeCheckout(order);
        expect(result.success).toBe(true);
      });

      it("should calculate correct total", async () => {
        const result = await checkoutService.calculateTotal(order);
        expect(result.total).toBeGreaterThan(0);
      });

      it("should send correct confirmation email", async () => {
        await checkoutService.completeCheckout(order);
        expect(mockEmailService.sentEmails).toHaveLength(1);
      });
    });
  });
});
```

Instead of testing all combinations, you can also use pairwise testing, which covers every pair of flag values. This drastically reduces test cases while catching most interaction bugs:
```typescript
// Pairwise testing for multi-flag scenarios
// With 4 flags and 2 values each, full coverage = 16 tests
// Pairwise coverage = 6 tests (covering all pairs)

const pairwiseCombinations = [
  { flagA: true, flagB: true, flagC: true, flagD: true },
  { flagA: true, flagB: false, flagC: false, flagD: false },
  { flagA: false, flagB: true, flagC: false, flagD: true },
  { flagA: false, flagB: false, flagC: true, flagD: false },
  { flagA: true, flagB: true, flagC: false, flagD: false },
  { flagA: false, flagB: false, flagC: true, flagD: true },
];

// Tools like PICT (Microsoft) or jenny can generate these automatically.
// For complex flag sets, see: https://www.pairwise.org/tools.html

pairwiseCombinations.forEach((flags, index) => {
  describe(`pairwise combination ${index + 1}`, () => {
    beforeEach(() => {
      mockFeatureFlags.setFlags({
        "feature-a": flags.flagA,
        "feature-b": flags.flagB,
        "feature-c": flags.flagC,
        "feature-d": flags.flagD,
      });
    });

    it("should not throw errors during processing", async () => {
      await expect(service.process()).resolves.not.toThrow();
    });

    it("should maintain data consistency", async () => {
      const result = await service.process();
      expect(invariantHolds(result)).toBe(true);
    });
  });
});
```

Resist the temptation to test every flag combination. With 10 flags, that's over 1,000 test cases. Instead, test: (1) the all-off state, (2) the all-on state, (3) each flag individually, (4) known risky combinations, and (5) pairwise coverage if needed. This provides 80% of the value with 5% of the tests.
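If you hand-maintain a pairwise table like the one above, it is worth asserting in a test that it really does cover every pair. A small checker might look like this (a sketch, not tied to any particular tool; `Combo` and the flag names are illustrative):

```typescript
type Combo = Record<string, boolean>;

// Returns true if every (flagX = a, flagY = b) pair, for a, b in {true, false},
// appears in at least one combination.
function coversAllPairs(combos: Combo[], flagNames: string[]): boolean {
  for (let i = 0; i < flagNames.length; i++) {
    for (let j = i + 1; j < flagNames.length; j++) {
      for (const a of [true, false]) {
        for (const b of [true, false]) {
          const hit = combos.some(
            c => c[flagNames[i]] === a && c[flagNames[j]] === b
          );
          if (!hit) return false; // this pair is never exercised
        }
      }
    }
  }
  return true;
}

const combos: Combo[] = [
  { flagA: true, flagB: true, flagC: true, flagD: true },
  { flagA: true, flagB: false, flagC: false, flagD: false },
  { flagA: false, flagB: true, flagC: false, flagD: true },
  { flagA: false, flagB: false, flagC: true, flagD: false },
  { flagA: true, flagB: true, flagC: false, flagD: false },
  { flagA: false, flagB: false, flagC: true, flagD: true },
];

console.log(coversAllPairs(combos, ["flagA", "flagB", "flagC", "flagD"])); // true
```

Run this as a regular unit test so that adding a fifth flag (or deleting a row) fails loudly instead of silently shrinking coverage.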
Integration tests verify that flag-controlled behavior works correctly in end-to-end scenarios. The challenge is controlling flag state in environments with real flag services.
Most flag platforms support test-specific overrides:
```typescript
// Integration test with flag overrides

describe("Checkout Integration Tests", () => {
  let app: TestApp;

  beforeAll(async () => {
    app = await createTestApp({
      // Connect to test flag environment
      flagEnvironment: "integration-test",
    });
  });

  describe("new checkout flow", () => {
    beforeEach(async () => {
      // Override flag for this test session
      await app.flagService.overrideFlag({
        flagKey: "new-checkout-v2",
        value: true,
        scope: "test-session",
        sessionId: currentTestSessionId,
      });
    });

    afterEach(async () => {
      // Clear overrides
      await app.flagService.clearOverrides(currentTestSessionId);
    });

    it("should complete checkout through new flow", async () => {
      const response = await app.request()
        .post("/api/checkout")
        .set("X-Test-Session-Id", currentTestSessionId)
        .send(validOrder);

      expect(response.status).toBe(200);
      expect(response.body.checkoutVersion).toBe("v2");
    });
  });
});

// Alternative: Header-based flag overrides for E2E tests
describe("E2E Checkout Tests", () => {
  it("should use new checkout when override header is set", async () => {
    const response = await fetch("/checkout", {
      headers: {
        // Special header recognized by flag service
        "X-Feature-Override": "new-checkout-v2=true",
      },
    });

    const html = await response.text();
    expect(html).toContain("checkout-v2-container");
  });
});
```

Use dedicated test users that are targeted by flags:
```typescript
// Test users with specific flag targeting

// In flag configuration
const newCheckoutFlag = {
  key: "new-checkout-v2",
  rules: [
    // Integration test users always get new checkout
    {
      type: "attribute",
      attribute: "email",
      operator: "matches",
      value: ".*@test\\.internal\\.company\\.com$",
      return: true,
    },
    // Production users get gradual rollout
    {
      type: "percentage",
      percentage: 25,
    },
  ],
};

// In integration tests
describe("Checkout Integration with Test Users", () => {
  const testUser = {
    email: "checkout-v2-tester@test.internal.company.com",
    id: "test-user-123",
  };

  it("should always get new checkout for test user", async () => {
    const response = await api
      .post("/api/checkout")
      .authenticate(testUser)
      .send(validOrder);

    expect(response.body.checkoutVersion).toBe("v2");
  });
});

// Dedicated test user pools per flag variant
const testUserPools = {
  "new-checkout-v2": {
    enabled: [
      "checkout-v2-enabled-1@test.internal.company.com",
      "checkout-v2-enabled-2@test.internal.company.com",
    ],
    disabled: [
      "checkout-v2-disabled-1@test.internal.company.com",
      "checkout-v2-disabled-2@test.internal.company.com",
    ],
  },
};
```

Integration tests can interfere with each other if they modify shared flag state. Use test-session-scoped overrides, dedicated test users, or reset flag state between tests. Flaky tests are often caused by leaked flag state.
The most dangerous moment in a flag's lifecycle is removal. The code changes, dead code paths are deleted, and there's always the chance something was depending on that "dead" code. Here's how to test flag removal safely.
Before removing the flag from code, verify it's safe:
```typescript
// Pre-removal verification checklist (automated)

interface FlagRemovalSafetyCheck {
  flagKey: string;
  checks: SafetyCheck[];
}

interface SafetyCheck {
  name: string;
  check: (flagKey: string) => Promise<{ passed: boolean; message: string }>;
}

const safetyChecks: SafetyCheck[] = [
  {
    name: "Flag always returns expected value",
    check: async (flagKey: string) => {
      const stats = await analytics.getFlagStats(flagKey, {
        timeRange: days(30),
      });

      // Should always return true if we're removing the "off" path
      const expectedValue = true;
      const alwaysReturnsExpected = stats.returnedValues.every(
        v => v.value === expectedValue
      );

      return {
        passed: alwaysReturnsExpected,
        message: alwaysReturnsExpected
          ? "Flag has returned consistent value for 30 days"
          : `Flag returned unexpected values: ${JSON.stringify(stats.returnedValues)}`,
      };
    },
  },
  {
    name: "No error rate increase when enabled",
    check: async (flagKey: string) => {
      const metrics = await observability.getErrorMetrics({
        flagKey,
        compareEnabledVsDisabled: true,
        timeRange: days(14),
      });

      const errorRateIncrease = metrics.enabledErrorRate - metrics.disabledErrorRate;
      const threshold = 0.01; // 1% increase threshold

      return {
        passed: errorRateIncrease < threshold,
        message: errorRateIncrease < threshold
          ? "Error rate is stable when flag is enabled"
          : `Error rate increased by ${(errorRateIncrease * 100).toFixed(2)}% when enabled`,
      };
    },
  },
  {
    name: "All tests pass with flag hardcoded",
    check: async (flagKey: string) => {
      // Run test suite with flag forced to target value
      const testResult = await ci.runTestsWithFlagOverride({
        [flagKey]: true, // Hardcode to the value we're keeping
      });

      return {
        passed: testResult.failed === 0,
        message: testResult.failed === 0
          ? "All tests pass with flag hardcoded to target value"
          : `${testResult.failed} tests failed`,
      };
    },
  },
  {
    name: "No code references to disabled path",
    check: async (flagKey: string) => {
      const references = await codeSearch.findFlagReferences(flagKey);
      const disabledPathRefs = references.filter(
        ref => ref.isDisabledPath
      );

      return {
        passed: disabledPathRefs.length === 0,
        message: disabledPathRefs.length === 0
          ? "No active code depends on disabled path"
          : `Found ${disabledPathRefs.length} references that may depend on disabled path`,
      };
    },
  },
];

// Run before creating removal PR
async function verifyFlagRemovalSafety(flagKey: string): Promise<SafetyReport> {
  const results = [];

  for (const check of safetyChecks) {
    const result = await check.check(flagKey);
    results.push({ name: check.name, ...result });
  }

  const allPassed = results.every(r => r.passed);

  return {
    flagKey,
    safe: allPassed,
    checks: results,
    recommendation: allPassed
      ? "Safe to remove"
      : "Address failed checks before removal",
  };
}
```

Remove the flag from a small percentage of traffic first, then expand:
```typescript
// Canary flag removal process

/**
 * Instead of removing the flag entirely, first hardcode
 * the value for a percentage of traffic
 */

// Step 1: Modify flag to hardcode for 5% of users
const checkoutFlag = {
  key: "new-checkout-v2",
  rules: [
    // 5% "canary" - behave as if flag is removed (always true)
    {
      type: "percentage",
      percentage: 5,
      return: true,
      variant: "canary-removal",
    },
    // Remaining 95% - normal flag evaluation
    {
      type: "percentage",
      percentage: 100,
      return: featureFlags.isEnabled("new-checkout-v2"),
      variant: "normal",
    },
  ],
};

// Step 2: Monitor canary group for issues
async function monitorCanaryRemoval(flagKey: string): Promise<void> {
  const metrics = await observability.compareVariants({
    flagKey,
    variants: ["canary-removal", "normal"],
    metrics: ["error_rate", "latency_p99", "conversion_rate"],
    alertThreshold: 0.05, // 5% degradation triggers alert
  });

  if (metrics.degradation > 0.05) {
    await alerting.sendCritical({
      message: `Flag removal canary showing degradation: ${flagKey}`,
      details: metrics,
    });
  }
}

// Step 3: Gradually increase canary percentage
// 5% → 25% → 50% → 100% → remove code

// Step 4: Only after 100% canary is stable, remove the flag code
```

Ironically, you can use a meta-flag to control flag removal. Create a 'remove-checkout-v2-flag' flag that, when enabled, causes the checkout-v2 flag to behave as if it's always enabled. This gives you instant rollback capability during the removal process.
Beyond testing behavior, you need to verify that the right users get the right flag values. Targeting rule bugs can expose features to wrong audiences or exclude intended users.
```typescript
// Testing flag targeting logic

describe("new-checkout-v2 targeting rules", () => {
  const flagKey = "new-checkout-v2";

  describe("internal user targeting", () => {
    it("should enable for internal users", () => {
      const result = featureFlags.isEnabled(flagKey, {
        userId: "user-123",
        email: "engineer@company.com",
      });
      expect(result).toBe(true);
    });

    it("should not auto-enable for external users with similar domains", () => {
      const result = featureFlags.isEnabled(flagKey, {
        userId: "user-456",
        email: "hacker@malicious-company.com",
      });
      // Should fall through to percentage targeting
      expect(result).toBeDefined();
    });
  });

  describe("percentage rollout", () => {
    it("should be deterministic for same user", () => {
      const context = { userId: "consistent-user-123" };

      // Run multiple times - should always return same value
      const results = Array(100).fill(null).map(() =>
        featureFlags.isEnabled(flagKey, context)
      );

      const uniqueResults = [...new Set(results)];
      expect(uniqueResults).toHaveLength(1); // All same value
    });

    it("should match expected rollout percentage approximately", () => {
      // Generate many unique users and check distribution
      const sampleSize = 10000;
      const users = Array(sampleSize).fill(null).map((_, i) => ({
        userId: `test-user-${i}`,
      }));

      const enabledCount = users.filter(user =>
        featureFlags.isEnabled(flagKey, user)
      ).length;

      const actualPercentage = enabledCount / sampleSize;
      const expectedPercentage = 0.25; // 25% rollout
      const tolerance = 0.03; // ±3%

      expect(actualPercentage).toBeGreaterThan(expectedPercentage - tolerance);
      expect(actualPercentage).toBeLessThan(expectedPercentage + tolerance);
    });
  });

  describe("segment targeting", () => {
    it("should enable for premium tier users", () => {
      const result = featureFlags.isEnabled(flagKey, {
        userId: "premium-user",
        subscriptionTier: "premium",
      });
      expect(result).toBe(true);
    });

    it("should not enable for free tier users (outside rollout)", () => {
      // Use user ID known to be outside rollout percentage
      const result = featureFlags.isEnabled(flagKey, {
        userId: "free-user-outside-rollout",
        subscriptionTier: "free",
      });
      expect(result).toBe(false);
    });
  });

  describe("geographic targeting", () => {
    it("should enable for US users", () => {
      const result = featureFlags.isEnabled(flagKey, {
        userId: "us-user",
        country: "US",
      });
      expect(result).toBe(true);
    });

    it("should not enable for EU users yet", () => {
      const result = featureFlags.isEnabled(flagKey, {
        userId: "eu-user",
        country: "DE",
      });
      expect(result).toBe(false);
    });
  });
});
```

Targeting rule tests are distinct from behavior tests. Behavior tests verify what happens when the flag is on or off; targeting tests verify who gets which value. Both are necessary for complete coverage.
As flag usage grows, invest in test infrastructure that makes flag testing frictionless:
```typescript
// Comprehensive flag test infrastructure

// 1. Jest/Vitest setup with automatic flag mocking
// jest.setup.ts
import { mockFeatureFlagService } from "./test-utils/mock-flags";

beforeEach(() => {
  // Reset flags before each test
  mockFeatureFlagService.reset();

  // Fail the test if a flag is evaluated without explicit test state
  jest.spyOn(console, "warn").mockImplementation((msg) => {
    if (msg.includes("Flag evaluated without explicit test state")) {
      throw new Error(msg);
    }
  });
});

// 2. Flag coverage tracking
class FlagCoverageTracker {
  private covered = new Map<string, Set<any>>();

  recordEvaluation(flagKey: string, returnedValue: any): void {
    if (!this.covered.has(flagKey)) {
      this.covered.set(flagKey, new Set());
    }
    this.covered.get(flagKey)!.add(returnedValue);
  }

  getReport(): FlagCoverageReport {
    const registeredFlags = flagRegistry.getAllFlags();

    return registeredFlags.map(flag => {
      const testedValues = this.covered.get(flag.key) || new Set();
      const expectedValues = flag.possibleValues;

      return {
        flagKey: flag.key,
        testedValues: [...testedValues],
        expectedValues,
        coverage: testedValues.size / expectedValues.length,
        missing: expectedValues.filter(v => !testedValues.has(v)),
      };
    });
  }
}

// 3. CI configuration for matrix testing (.github/workflows/test.yml):
//
// name: Tests with Flag Matrix
// jobs:
//   test:
//     strategy:
//       matrix:
//         flag-state: [all-off, all-on, production-config]
//     steps:
//       - run: npm test
//         env:
//           FLAG_TEST_MODE: ${{ matrix.flag-state }}

// 4. Flag state determinism checking
export function ensureDeterministicFlagState(testFn: () => void): void {
  // Run the test twice and verify flag evaluations are identical
  const run1 = captureEvaluations(testFn);
  const run2 = captureEvaluations(testFn);

  expect(run1).toEqual(run2);
}
```

We've covered comprehensive strategies for testing feature-flagged code: test both variants of every flag, mock flag state rather than calling the real service, sample multi-flag combinations strategically, control flag state in integration tests, and verify removal safety before deleting code.
Module Complete:
You now have a comprehensive understanding of feature flags: what they are, how to design them, how to manage their lifecycle, and how to test them effectively. Feature flags are a powerful tool for modern software delivery, enabling safer deployments, experimentation, and operational control. With the patterns and practices covered in this module, you can adopt flags confidently while avoiding the common pitfalls that lead to technical debt.
Congratulations! You've completed the Feature Flags module. You now understand the complete feature flag lifecycle: from creation and design patterns, through governance and lifecycle management, to comprehensive testing strategies. Apply these practices to enjoy the benefits of flags—progressive rollouts, instant rollback, experimentation—without accumulating crippling technical debt.