Testing microservices requires running multiple services, databases, message brokers, and supporting infrastructure. Unlike monolithic applications where a single process can be tested in isolation, microservices demand sophisticated environment strategies. The difference between teams that ship confidently and teams that fear every deployment often comes down to their test environment practices.
Poor environment management manifests as 'it works on my machine' failures, tests that pass locally but fail in CI, staging environments that drift from production, and hours spent debugging environment configuration instead of actual bugs. Disciplined test environment management is essential infrastructure for microservices success.
By the end of this page, you will understand how to design a comprehensive test environment strategy for microservices, including local development environments, CI/CD environments, ephemeral preview environments, and production testing approaches. You'll learn patterns for environment parity, data management, and infrastructure as code.
Microservices testing requires multiple environment types, each serving different purposes and making different trade-offs between fidelity, cost, and speed.
The Environment Hierarchy:
| Environment | Purpose | Fidelity | Cost | Speed | Isolation |
|---|---|---|---|---|---|
| Local | Developer iteration | Low-Medium | Free | Instant | Complete |
| CI | Automated verification | Medium | Low | Minutes | Complete |
| Preview/Ephemeral | PR validation | High | Medium | Minutes | Per PR |
| Staging | Release validation | High | Medium-High | Always on | Shared |
| Production | Live verification | Perfect | High | Always on | Careful design |
Environment Fidelity Spectrum:
Low Fidelity: Single service running locally with mocked dependencies. Fast, cheap, but may miss integration issues.
Medium Fidelity: Multiple services running locally with real databases but simplified infrastructure. Good for most integration testing.
High Fidelity: Complete system running in cloud with near-production configuration. Required for E2E and release validation.
Production Fidelity: Actual production environment, possibly with feature flags or traffic mirroring. Only way to verify production-specific issues.
The key insight: you need all of these, not just one. Each environment type catches different categories of bugs at different stages of development.
The closer your test environment matches production, the more representative your tests are—but the more expensive and slower they become. Strive for maximum parity where it matters (infrastructure, configuration, data shapes) and acceptable differences where it doesn't (scale, geographic distribution, real user data).
The local development environment is where developers spend most of their time. A well-designed local setup enables fast iteration, realistic testing, and minimal context-switching to remote environments.
Goals for Local Environment:
```yaml
# docker-compose.yml - Local development environment
version: '3.8'

# Profiles allow selective startup: docker compose --profile web up
services:
  # Shared infrastructure - always starts
  postgres:
    image: postgres:15-alpine
    ports:
      - "5432:5432"
    environment:
      POSTGRES_PASSWORD: postgres
      POSTGRES_USER: postgres
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./scripts/init-dbs.sql:/docker-entrypoint-initdb.d/init.sql
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 5s
      retries: 5

  kafka:
    image: confluentinc/cp-kafka:7.4.0
    ports:
      - "9092:9092"
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,HOST:PLAINTEXT
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:29092,CONTROLLER://0.0.0.0:9093,HOST://0.0.0.0:9092
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,HOST://localhost:9092
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      CLUSTER_ID: 'local-dev-cluster-001'
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: 'true'
    healthcheck:
      test: ["CMD-SHELL", "kafka-broker-api-versions --bootstrap-server localhost:29092"]
      interval: 10s
      timeout: 10s
      retries: 10

  # Optional: Kafka UI for debugging
  kafka-ui:
    image: provectuslabs/kafka-ui:latest
    profiles: ["debug"]
    ports:
      - "8080:8080"
    environment:
      KAFKA_CLUSTERS_0_NAME: local
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: kafka:29092

  # Services - selectively started based on needs
  order-service:
    build:
      context: ./services/order
      target: development
    profiles: ["order", "web", "all"]
    ports:
      - "3001:3000"
    environment:
      NODE_ENV: development
      DATABASE_URL: postgresql://postgres:postgres@postgres:5432/orders
      KAFKA_BROKERS: kafka:29092
      REDIS_URL: redis://redis:6379
      USER_SERVICE_URL: http://user-service:3000
      PRODUCT_SERVICE_URL: http://product-service:3000
    volumes:
      - ./services/order/src:/app/src
    depends_on:
      postgres:
        condition: service_healthy
      kafka:
        condition: service_healthy
    command: npm run dev

  user-service:
    build:
      context: ./services/user
      target: development
    profiles: ["user", "web", "all"]
    ports:
      - "3002:3000"
    environment:
      NODE_ENV: development
      DATABASE_URL: postgresql://postgres:postgres@postgres:5432/users
      REDIS_URL: redis://redis:6379
    volumes:
      - ./services/user/src:/app/src
    depends_on:
      postgres:
        condition: service_healthy
    command: npm run dev

  product-service:
    build:
      context: ./services/product
      target: development
    profiles: ["product", "web", "all"]
    ports:
      - "3003:3000"
    environment:
      NODE_ENV: development
      DATABASE_URL: postgresql://postgres:postgres@postgres:5432/products
      ELASTICSEARCH_URL: http://elasticsearch:9200
    volumes:
      - ./services/product/src:/app/src
    depends_on:
      postgres:
        condition: service_healthy
    command: npm run dev

  # Optional: Only for product search development
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.9.0
    profiles: ["product", "search", "all"]
    ports:
      - "9200:9200"
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    healthcheck:
      test: ["CMD-SHELL", "curl -sf http://localhost:9200/_cluster/health"]
      interval: 10s
      timeout: 10s
      retries: 10

volumes:
  postgres_data:
```

```sql
-- Database initialization script
-- scripts/init-dbs.sql
CREATE DATABASE orders;
CREATE DATABASE users;
CREATE DATABASE products;
```

Notice the volume mounts for source code.
This enables hot reloading—code changes reflect immediately without container rebuilds. Combined with watch mode (npm run dev), developers can iterate on code changes in seconds, not minutes. This speed is essential for maintaining flow state during development.
CI/CD environments run automated tests on every commit and deploy validated changes. They must be reproducible, fast to provision, and completely isolated between runs.
CI Environment Requirements:
```yaml
# .github/workflows/ci.yml - CI pipeline with proper environment management
name: CI

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

env:
  # Use consistent versions across all jobs
  NODE_VERSION: '20'
  DOCKER_BUILDKIT: 1
  COMPOSE_DOCKER_CLI_BUILD: 1

jobs:
  # Build and push images first (once)
  build:
    runs-on: ubuntu-latest
    outputs:
      image_tag: ${{ steps.meta.outputs.tags }}
    steps:
      - uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Login to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ghcr.io/${{ github.repository }}
          tags: |
            type=sha,prefix=
            type=ref,event=pr

      # Build all service images in parallel
      - name: Build and push images
        run: |
          for service in order user product; do
            docker buildx build \
              --push \
              --cache-from type=gha,scope=$service \
              --cache-to type=gha,mode=max,scope=$service \
              -t ghcr.io/${{ github.repository }}/$service:${{ github.sha }} \
              -f services/$service/Dockerfile \
              services/$service &
          done
          wait

  # Unit and integration tests (parallel per service)
  test:
    needs: build
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        service: [order, user, product]
    services:
      # GitHub Actions native service containers
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: test
          POSTGRES_USER: test
          POSTGRES_DB: test
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
      redis:
        image: redis:7
        ports:
          - 6379:6379
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'
          cache-dependency-path: services/${{ matrix.service }}/package-lock.json

      - name: Install dependencies
        working-directory: services/${{ matrix.service }}
        run: npm ci

      - name: Run migrations
        working-directory: services/${{ matrix.service }}
        run: npm run db:migrate
        env:
          DATABASE_URL: postgresql://test:test@localhost:5432/test

      - name: Run unit tests
        working-directory: services/${{ matrix.service }}
        run: npm run test:unit -- --coverage

      - name: Run integration tests
        working-directory: services/${{ matrix.service }}
        run: npm run test:integration
        env:
          DATABASE_URL: postgresql://test:test@localhost:5432/test
          REDIS_URL: redis://localhost:6379

      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: services/${{ matrix.service }}/coverage/lcov.info
          flags: ${{ matrix.service }}

  # Contract tests after unit/integration
  contract-tests:
    needs: [build, test]
    runs-on: ubuntu-latest
    strategy:
      matrix:
        service: [order, user, product]
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}

      - name: Install dependencies
        working-directory: services/${{ matrix.service }}
        run: npm ci

      - name: Run consumer contract tests
        working-directory: services/${{ matrix.service }}
        run: npm run test:pact:consumer

      - name: Publish pacts
        if: github.event_name == 'push'
        working-directory: services/${{ matrix.service }}
        run: npm run pact:publish
        env:
          PACT_BROKER_BASE_URL: ${{ secrets.PACT_BROKER_URL }}
          PACT_BROKER_TOKEN: ${{ secrets.PACT_BROKER_TOKEN }}
          GIT_COMMIT: ${{ github.sha }}
          GIT_BRANCH: ${{ github.ref_name }}

  # E2E tests with full environment
  e2e:
    needs: [build, test, contract-tests]
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v4

      - name: Create Docker network
        run: docker network create e2e-network

      - name: Start infrastructure
        run: |
          docker compose -f docker-compose.ci.yml up -d postgres redis kafka
          ./scripts/wait-healthy.sh

      - name: Start services
        run: |
          docker compose -f docker-compose.ci.yml up -d order-service user-service product-service gateway
          ./scripts/wait-for-services.sh
        env:
          IMAGE_TAG: ${{ github.sha }}

      - name: Run E2E tests
        run: npx playwright test
        env:
          E2E_BASE_URL: http://localhost

      - name: Collect logs on failure
        if: failure()
        run: docker compose -f docker-compose.ci.yml logs > docker-logs.txt

      - name: Upload artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: e2e-results
          path: |
            playwright-report/
            test-results/
            docker-logs.txt
```

GitHub Actions, GitLab CI, and other platforms offer native service containers—infrastructure that runs alongside your job. These are faster to start than Docker Compose and provide automatic health checking. Use service containers for simple databases and caches; use Docker Compose for complex multi-service setups.
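The pipeline above calls `./scripts/wait-for-services.sh` before running the E2E suite, but the script itself isn't shown. A minimal readiness poller, sketched here in TypeScript rather than shell, gives the idea; the service names, ports, and the assumption that each service exposes a `/health` endpoint are illustrative, not taken from the pipeline.

```typescript
// wait-for-services.ts - hypothetical readiness poller (a sketch, not the
// real wait-for-services.sh). Assumes each service answers GET /health
// with a 200 once it is ready to serve traffic.
const services = [
  { name: "order-service", url: "http://localhost:3001/health" },
  { name: "user-service", url: "http://localhost:3002/health" },
  { name: "product-service", url: "http://localhost:3003/health" },
];

async function waitForService(name: string, url: string, timeoutMs = 120_000): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      const res = await fetch(url);
      if (res.ok) {
        console.log(`${name} is ready`);
        return;
      }
    } catch {
      // Service not accepting connections yet - keep polling
    }
    await new Promise((resolve) => setTimeout(resolve, 2_000));
  }
  throw new Error(`${name} did not become healthy within ${timeoutMs}ms`);
}

// Fail the CI job fast if any service never becomes healthy
await Promise.all(services.map((s) => waitForService(s.name, s.url)));
```

The important property is a hard timeout: a service that never comes up should fail the job with a clear message rather than letting the E2E step time out with a cryptic connection error.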
Ephemeral preview environments—also called review apps or PR environments—provide isolated, full-stack environments for each pull request. They enable realistic testing of changes before merge without polluting shared environments.
Benefits of Preview Environments:
```yaml
# Preview environment deployment with Kubernetes
# .github/workflows/preview.yml
name: Preview Environment

on:
  pull_request:
    types: [opened, synchronize, reopened, closed]

env:
  PREVIEW_NAMESPACE: preview-pr-${{ github.event.pull_request.number }}

jobs:
  deploy-preview:
    if: github.event.action != 'closed'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Configure kubectl
        uses: azure/k8s-set-context@v3
        with:
          kubeconfig: ${{ secrets.KUBE_CONFIG }}

      - name: Create namespace
        run: |
          kubectl create namespace ${{ env.PREVIEW_NAMESPACE }} --dry-run=client -o yaml | kubectl apply -f -
          # Label for automatic cleanup
          kubectl label namespace ${{ env.PREVIEW_NAMESPACE }} \
            preview=true \
            pr-number=${{ github.event.pull_request.number }} \
            created-at=$(date +%s) \
            --overwrite

      - name: Build and push images
        run: |
          for service in order user product gateway; do
            docker build -t ${{ secrets.REGISTRY }}/$service:pr-${{ github.event.pull_request.number }} \
              services/$service
            docker push ${{ secrets.REGISTRY }}/$service:pr-${{ github.event.pull_request.number }}
          done

      - name: Deploy infrastructure
        run: |
          helm upgrade --install infra ./charts/preview-infra \
            --namespace ${{ env.PREVIEW_NAMESPACE }} \
            --set postgresql.postgresPassword=${{ secrets.PREVIEW_DB_PASSWORD }} \
            --wait

      - name: Run migrations
        run: |
          for service in order user product; do
            kubectl run migrate-$service \
              --namespace ${{ env.PREVIEW_NAMESPACE }} \
              --image ${{ secrets.REGISTRY }}/$service:pr-${{ github.event.pull_request.number }} \
              --restart=Never \
              --command -- npm run db:migrate
            kubectl wait --for=condition=complete job/migrate-$service \
              --namespace ${{ env.PREVIEW_NAMESPACE }} \
              --timeout=120s
          done

      - name: Deploy services
        run: |
          helm upgrade --install services ./charts/microservices \
            --namespace ${{ env.PREVIEW_NAMESPACE }} \
            --set image.tag=pr-${{ github.event.pull_request.number }} \
            --set ingress.host=pr-${{ github.event.pull_request.number }}.${{ secrets.PREVIEW_DOMAIN }} \
            --wait

      - name: Run smoke tests
        run: |
          export E2E_BASE_URL=https://pr-${{ github.event.pull_request.number }}.${{ secrets.PREVIEW_DOMAIN }}
          npx playwright test --grep @smoke

      - name: Comment on PR
        uses: actions/github-script@v7
        with:
          script: |
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## 🚀 Preview Environment Ready

              **URL:** https://pr-${context.issue.number}.${{ secrets.PREVIEW_DOMAIN }}

              This environment will be automatically destroyed when the PR is closed.

              **Services deployed:**
              - Order Service
              - User Service
              - Product Service
              - API Gateway

              **Smoke tests:** ✅ Passed`
            })

  # Cleanup when PR is closed
  cleanup-preview:
    if: github.event.action == 'closed'
    runs-on: ubuntu-latest
    steps:
      - name: Configure kubectl
        uses: azure/k8s-set-context@v3
        with:
          kubeconfig: ${{ secrets.KUBE_CONFIG }}

      - name: Delete namespace
        run: kubectl delete namespace ${{ env.PREVIEW_NAMESPACE }} --ignore-not-found

      - name: Comment on PR
        uses: actions/github-script@v7
        with:
          script: |
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: '🧹 Preview environment has been cleaned up.'
            })
```

```yaml
# Automatic cleanup CronJob for orphaned preview environments
apiVersion: batch/v1
kind: CronJob
metadata:
  name: preview-cleanup
  namespace: preview-system
spec:
  schedule: "0 * * * *"  # Every hour
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: preview-cleanup
          containers:
            - name: cleanup
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - |
                  # Delete preview namespaces older than 48 hours
                  CUTOFF=$(($(date +%s) - 172800))
                  for ns in $(kubectl get ns -l preview=true -o jsonpath='{.items[*].metadata.name}'); do
                    CREATED=$(kubectl get ns $ns -o jsonpath='{.metadata.labels.created-at}')
                    if [ "$CREATED" -lt "$CUTOFF" ]; then
                      echo "Deleting stale preview namespace: $ns"
                      kubectl delete ns $ns
                    fi
                  done
          restartPolicy: Never
```

Preview environments can become expensive if not managed carefully. Each PR running its own databases and services adds up. Implement automatic cleanup (on PR close and after time limits), use smaller instance sizes than production, and consider shared databases with schema-per-PR isolation for database-heavy workloads.
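The schema-per-PR idea can be as simple as provisioning a dedicated Postgres schema on a shared instance for each pull request. The sketch below uses node-postgres; the env var `SHARED_DATABASE_URL`, the `pr_<number>` naming convention, and the helper names are assumptions for illustration, not part of the workflow above.

```typescript
// provision-pr-schema.ts - hypothetical helper for schema-per-PR isolation
// on a shared Postgres instance (a sketch under assumed naming conventions).
import { Client } from "pg";

export async function provisionPrSchema(prNumber: number): Promise<string> {
  const schema = `pr_${prNumber}`;
  const client = new Client({ connectionString: process.env.SHARED_DATABASE_URL });
  await client.connect();
  try {
    // One schema per PR keeps data isolated without a database per preview env
    await client.query(`CREATE SCHEMA IF NOT EXISTS ${schema}`);
  } finally {
    await client.end();
  }
  // Services deployed for this PR then point their search_path at the schema
  return schema;
}

// Cleanup mirrors the namespace deletion: drop the schema when the PR closes
export async function dropPrSchema(prNumber: number): Promise<void> {
  const client = new Client({ connectionString: process.env.SHARED_DATABASE_URL });
  await client.connect();
  try {
    await client.query(`DROP SCHEMA IF EXISTS pr_${prNumber} CASCADE`);
  } finally {
    await client.end();
  }
}
```

The trade-off: a shared instance is cheaper and faster to provision than one database per PR, but a runaway migration or load test in one PR can affect its neighbours.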
The staging environment is the final validation step before production. It should mirror production as closely as possible while remaining safe for testing.
Staging Environment Principles:
| Aspect | Should Match Production? | Notes |
|---|---|---|
| Infrastructure (K8s, databases) | Yes | Same types, smaller scale acceptable |
| Configuration structure | Yes | Same config keys, different values |
| Service versions | Yes | Staging runs what ships to production next |
| Network topology | Yes | Same VPCs, load balancers, ingress |
| Monitoring/alerting | Yes | Catch observability issues early |
| Data volume | Reduced OK | Representative patterns, smaller scale |
| Real user data | No | Use synthetic or anonymized data |
| Third-party integrations | Sandbox mode | Use test/sandbox APIs |
```hcl
# Terraform: Staging environment that mirrors production structure
# infrastructure/environments/staging/main.tf

module "vpc" {
  source = "../../modules/vpc"

  environment = "staging"
  cidr_block  = "10.1.0.0/16" # Different from prod: 10.0.0.0/16

  # Same AZ structure as production
  azs = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

module "eks" {
  source = "../../modules/eks"

  cluster_name = "microservices-staging"
  vpc_id       = module.vpc.vpc_id
  subnet_ids   = module.vpc.private_subnet_ids

  # Smaller node groups than production
  node_groups = {
    general = {
      instance_types = ["t3.large"] # Prod uses t3.xlarge
      min_size       = 2            # Prod uses 3
      max_size       = 6            # Prod uses 15
      desired_size   = 3            # Prod uses 6
    }
  }
}

module "databases" {
  source = "../../modules/rds"

  for_each = {
    orders   = { db_name = "orders" }
    users    = { db_name = "users" }
    products = { db_name = "products" }
  }

  identifier = "staging-${each.key}"
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnet_ids

  # Same engine versions as production
  engine         = "postgres"
  engine_version = "15.4"

  # Smaller instance class
  instance_class = "db.t3.medium" # Prod uses db.r5.large

  # Single AZ (not multi-AZ like prod)
  multi_az = false

  # Same encryption settings
  storage_encrypted = true
  kms_key_id        = module.kms.key_id
}

module "redis" {
  source = "../../modules/elasticache"

  cluster_id = "staging-cache"
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnet_ids

  # Same engine version
  engine_version = "7.0"

  # Smaller node type
  node_type       = "cache.t3.small" # Prod uses cache.r5.large
  num_cache_nodes = 1                # Prod uses 2
}

module "kafka" {
  source = "../../modules/msk"

  cluster_name = "staging-events"
  vpc_id       = module.vpc.vpc_id
  subnet_ids   = module.vpc.private_subnet_ids

  # Same Kafka version
  kafka_version = "3.5.1"

  # Smaller brokers
  broker_node_type = "kafka.t3.small" # Prod uses kafka.m5.large
  number_of_nodes  = 2                # Prod uses 3
}

# Outputs for service configuration
output "database_endpoints" {
  value = { for k, v in module.databases : k => v.endpoint }
}

output "redis_endpoint" {
  value = module.redis.endpoint
}

output "kafka_bootstrap_servers" {
  value = module.kafka.bootstrap_servers
}
```

Never copy production data to staging—it creates compliance and security risks. Instead, generate synthetic data that matches production patterns: same data shapes, realistic volumes, representative edge cases. Tools like Faker can generate realistic-looking data; custom scripts can match your domain's specific patterns.
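As a concrete example of that approach, a small seeding script along these lines can populate staging with realistic but entirely synthetic users. It uses @faker-js/faker; the record shape, row count, and where the rows end up are assumptions for illustration, not the real schema.

```typescript
// seed-staging-users.ts - synthetic data generation for staging (a sketch;
// the user shape and volume are assumptions, not the actual schema).
import { faker } from "@faker-js/faker";

interface SyntheticUser {
  email: string;
  fullName: string;
  country: string;
  createdAt: Date;
}

function generateUsers(count: number): SyntheticUser[] {
  return Array.from({ length: count }, () => ({
    // Realistic-looking values with no relation to any real customer
    email: faker.internet.email().toLowerCase(),
    fullName: faker.person.fullName(),
    country: faker.location.countryCode(),
    // Spread creation dates over the past year to mimic real growth patterns
    createdAt: faker.date.recent({ days: 365 }),
  }));
}

// A reduced but representative volume compared to production
const users = generateUsers(50_000);
console.log(`Generated ${users.length} synthetic users, e.g.`, users[0]);
// Insert into the staging database with your usual data-access layer here.
```

The goal is to match production patterns (shapes, distributions, edge cases such as long names or unusual locales), not to replay production records.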
No staging environment perfectly replicates production. Real users, real data volumes, real traffic patterns, and real third-party integrations only exist in production. Mature organizations embrace production testing—carefully controlled validation in the live environment.
Production Testing Techniques:
```typescript
// Synthetic monitoring - continuous production validation
// src/monitoring/synthetics/checkout-flow.ts

import type { Page } from "playwright";
import { SyntheticMonitor } from "./framework";

export const checkoutFlowMonitor = new SyntheticMonitor({
  name: "checkout-flow",
  schedule: "*/5 * * * *", // Every 5 minutes
  locations: ["us-east-1", "us-west-2", "eu-west-1"],

  async run({ page, metrics }) {
    const startTime = Date.now();

    try {
      // Use dedicated test account
      await page.goto(process.env.PRODUCTION_URL!);
      await this.login(page, process.env.SYNTHETIC_USER!, process.env.SYNTHETIC_PASSWORD!);

      // Browse to product
      await page.goto("/products/synthetic-test-product");
      metrics.recordTiming("product_page_load", Date.now() - startTime);

      // Add to cart
      await page.click('[data-action="add-to-cart"]');
      await page.waitForSelector('[data-testid="cart-updated"]');
      metrics.recordTiming("add_to_cart", Date.now() - startTime);

      // Proceed to checkout
      await page.click('[data-action="proceed-to-checkout"]');
      await page.waitForSelector('[data-testid="checkout-form"]');
      metrics.recordTiming("checkout_page_load", Date.now() - startTime);

      // Verify payment options load
      await page.waitForSelector('[data-testid="payment-methods"]');

      // Don't actually complete purchase - just verify flow works
      await page.click('[data-action="cancel-checkout"]');

      metrics.recordSuccess();
      metrics.recordTiming("total_flow", Date.now() - startTime);
    } catch (error) {
      metrics.recordFailure(error);

      // Capture diagnostics on failure
      await page.screenshot({ path: `/tmp/synthetic-failure-${Date.now()}.png` });

      // Alert on-call if critical path is broken
      if (this.isWithinBusinessHours()) {
        await this.sendAlert({
          severity: "high",
          title: "Checkout flow synthetic failing",
          details: (error as Error).message,
        });
      }
    }
  },

  async login(page: Page, email: string, password: string) {
    await page.goto("/login");
    await page.fill('[data-testid="email"]', email);
    await page.fill('[data-testid="password"]', password);
    await page.click('[data-action="login"]');
    await page.waitForSelector('[data-testid="user-menu"]');
  },

  isWithinBusinessHours(): boolean {
    const hour = new Date().getHours();
    return hour >= 9 && hour < 18;
  },
});
```

```yaml
# Canary deployment with automated rollback
# kubernetes/rollout-strategy.yaml
---
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: order-service
spec:
  replicas: 10
  strategy:
    canary:
      # Canary steps
      steps:
        - setWeight: 5            # 5% traffic to canary
        - pause: { duration: 5m }
        - setWeight: 20           # 20% traffic
        - pause: { duration: 10m }
        - setWeight: 50           # 50% traffic
        - pause: { duration: 15m }
        - setWeight: 100          # Full rollout

      # Analysis for automatic rollback
      analysis:
        templates:
          - templateName: success-rate
        startingStep: 1 # Start analysis after first step
        args:
          - name: service-name
            value: order-service

      # Automatic rollback triggers
      canaryMetadata:
        labels:
          role: canary
      stableMetadata:
        labels:
          role: stable
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 1m
      successCondition: result[0] >= 0.99 # 99% success rate required
      failureLimit: 3
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}", status=~"2.."}[5m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
    - name: latency-p99
      interval: 1m
      successCondition: result[0] <= 500 # P99 under 500ms
      failureLimit: 3
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            histogram_quantile(0.99,
              sum(rate(http_request_duration_seconds_bucket{service="{{args.service-name}}"}[5m])) by (le)
            ) * 1000
```

Production testing must be done carefully. Use dedicated test accounts that don't affect real users. Ensure synthetic transactions are identifiable and excluded from business metrics. Design feature flags for quick rollback. Always have a clear blast radius—know exactly what could go wrong and how to fix it.
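One practical way to keep synthetic transactions out of business metrics, as suggested above, is to tag them explicitly and branch on that tag when recording business events. The sketch below assumes Express and a custom `x-synthetic-test` header; neither is part of the monitor framework shown earlier.

```typescript
// synthetic-traffic.ts - tag synthetic requests and exclude them from
// business metrics (a sketch; the header name and metrics call are assumptions).
import type { Request, Response, NextFunction } from "express";

const SYNTHETIC_HEADER = "x-synthetic-test";

type TaggedRequest = Request & { isSynthetic?: boolean };

// Mark the request so downstream code can tell synthetic traffic apart
export function detectSyntheticTraffic(req: Request, _res: Response, next: NextFunction) {
  (req as TaggedRequest).isSynthetic = req.header(SYNTHETIC_HEADER) === "true";
  next();
}

// Record business metrics only for real user traffic. Synthetic checks still
// flow through technical metrics (latency, error rate), so canary analysis
// and alerting keep working.
export function recordOrderPlaced(req: Request, orderTotalCents: number) {
  if ((req as TaggedRequest).isSynthetic) {
    return; // Keep revenue and conversion dashboards clean
  }
  // e.g. metricsClient.increment("orders_placed") would go here
  console.log(`order placed: ${orderTotalCents} cents`);
}
```

The synthetic monitor would then send the header on every request, and the gateway can strip it from external traffic so real users cannot spoof it.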
Configuration differences between environments are a major source of 'works in staging, fails in production' bugs. Proper configuration management ensures environments differ only in intended ways.
Configuration Principles:
```typescript
// Layered configuration pattern
// src/config/index.ts

import { z } from "zod";
import { baseConfig } from "./base";
import { developmentOverrides } from "./development";
import { stagingOverrides } from "./staging";
import { productionOverrides } from "./production";
// deepMerge and setPath are small object helpers assumed to live alongside this module
import { deepMerge, setPath } from "./utils";

// Schema validation ensures configuration is complete and correct
const ConfigSchema = z.object({
  // Environment identification
  environment: z.enum(["development", "staging", "production"]),

  // Service configuration
  service: z.object({
    name: z.string(),
    version: z.string(),
    port: z.number().int().positive(),
  }),

  // Database configuration
  database: z.object({
    host: z.string(),
    port: z.number().int().positive(),
    name: z.string(),
    user: z.string(),
    password: z.string(), // From secrets
    poolSize: z.number().int().positive(),
    ssl: z.boolean(),
  }),

  // Cache configuration
  cache: z.object({
    host: z.string(),
    port: z.number().int().positive(),
    ttlSeconds: z.number().int().positive(),
  }),

  // Kafka configuration
  kafka: z.object({
    brokers: z.array(z.string()),
    clientId: z.string(),
    ssl: z.boolean(),
    sasl: z.object({
      mechanism: z.enum(["plain", "scram-sha-256", "scram-sha-512"]),
      username: z.string(),
      password: z.string(),
    }).optional(),
  }),

  // External services
  integrations: z.object({
    paymentGateway: z.object({
      baseUrl: z.string().url(),
      apiKey: z.string(),
      testMode: z.boolean(),
    }),
    emailService: z.object({
      baseUrl: z.string().url(),
      apiKey: z.string(),
      sandboxMode: z.boolean(),
    }),
  }),

  // Feature flags
  features: z.object({
    newCheckoutFlow: z.boolean(),
    enhancedSearch: z.boolean(),
    betaFeatures: z.boolean(),
  }),

  // Observability
  observability: z.object({
    logLevel: z.enum(["debug", "info", "warn", "error"]),
    tracingEnabled: z.boolean(),
    metricsEnabled: z.boolean(),
  }),
});

type Config = z.infer<typeof ConfigSchema>;

function loadConfig(): Config {
  const environment = process.env.NODE_ENV || "development";

  // Start with base config
  let config: unknown = baseConfig;

  // Apply environment-specific overrides
  switch (environment) {
    case "development":
      config = deepMerge(config, developmentOverrides);
      break;
    case "staging":
      config = deepMerge(config, stagingOverrides);
      break;
    case "production":
      config = deepMerge(config, productionOverrides);
      break;
    default:
      throw new Error(`Unknown environment: ${environment}`);
  }

  // Inject secrets from environment variables
  config = injectSecrets(config as object);

  // Validate and return
  const result = ConfigSchema.safeParse(config);
  if (!result.success) {
    console.error("Configuration validation failed:");
    console.error(result.error.format());
    throw new Error("Invalid configuration");
  }

  return result.data;
}

// Secret injection from environment variables
function injectSecrets(config: object): object {
  const secrets = {
    "database.password": process.env.DATABASE_PASSWORD,
    "kafka.sasl.password": process.env.KAFKA_PASSWORD,
    "integrations.paymentGateway.apiKey": process.env.PAYMENT_GATEWAY_API_KEY,
    "integrations.emailService.apiKey": process.env.EMAIL_SERVICE_API_KEY,
  };

  let result = { ...config };
  for (const [path, value] of Object.entries(secrets)) {
    if (value) {
      result = setPath(result, path, value);
    }
  }
  return result;
}

export const config = loadConfig();

// Validate on startup
console.log(`Configuration loaded for environment: ${config.environment}`);
console.log(`Features enabled: ${Object.entries(config.features)
  .filter(([_, v]) => v)
  .map(([k, _]) => k)
  .join(", ")}`);
```

Store secrets in environment variables, not configuration files.
Use a secrets manager (HashiCorp Vault, AWS Secrets Manager) to inject secrets at runtime. Configuration files in version control should contain structure and non-sensitive values; secrets are always injected from secure sources.
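If secrets live in a secrets manager rather than plain environment variables, the `injectSecrets` step can fetch them at startup instead. The sketch below uses the AWS SDK for Secrets Manager; the secret name `order-service/production` and its JSON layout are assumptions for illustration.

```typescript
// fetch-secrets.ts - load runtime secrets from AWS Secrets Manager
// (a sketch; the secret name and its JSON structure are assumptions).
import { SecretsManagerClient, GetSecretValueCommand } from "@aws-sdk/client-secrets-manager";

interface RuntimeSecrets {
  databasePassword: string;
  paymentGatewayApiKey: string;
}

export async function fetchRuntimeSecrets(
  secretId = "order-service/production",
): Promise<RuntimeSecrets> {
  const client = new SecretsManagerClient({});
  const response = await client.send(new GetSecretValueCommand({ SecretId: secretId }));

  if (!response.SecretString) {
    throw new Error(`Secret ${secretId} has no string value`);
  }

  // The secret is assumed to be stored as a JSON document, one key per credential
  const parsed = JSON.parse(response.SecretString);
  return {
    databasePassword: parsed.DATABASE_PASSWORD,
    paymentGatewayApiKey: parsed.PAYMENT_GATEWAY_API_KEY,
  };
}
```

Because the same code path runs in staging and production (only the secret name differs), this keeps the configuration structure identical across environments, which is exactly the parity goal described above.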
Test environment management is foundational infrastructure for microservices testing. Well-designed environments enable fast feedback, reliable tests, and confident deployments. Poor environment practices lead to constant friction, flaky tests, and deployment fear.
Module Complete:
Congratulations! You've completed the Testing Microservices module. You now understand the complete testing strategy for distributed systems—from unit tests that run in milliseconds to production monitoring that runs continuously. The testing pyramid is replicated across every service, and contract testing enables the independent deployment that makes microservices valuable.
The key takeaway: testing microservices is not about more tests or different tests—it's about the right tests at the right level. Unit tests verify logic, integration tests verify infrastructure, contract tests verify compatibility, E2E tests verify journeys, and the right environments make all of these reliable.
With unit testing, integration testing, contract testing, end-to-end testing, and test environment management in your toolkit, you are equipped to build and maintain reliable microservices at scale.