Every public-facing API needs a way to identify who is making requests. Before we dive into OAuth flows, JWTs, or mutual TLS, there exists a simpler, universal mechanism that virtually every API relies upon: API keys. Despite their apparent simplicity, API key management is fraught with security pitfalls that have led to some of the most devastating data breaches in history.
From Twilio's 2022 breach traced back to compromised API credentials to countless GitHub-exposed secrets causing millions in damages, improper API key management remains one of the most common yet preventable security failures in modern software systems. Understanding how to properly generate, distribute, store, rotate, and revoke API keys isn't just good practice—it's essential infrastructure security.
By the end of this page, you will understand the complete lifecycle of API keys, including secure generation algorithms, entropy requirements, distribution strategies, storage best practices, rotation policies, revocation mechanisms, and monitoring approaches. You'll be equipped to design API key systems that can withstand sophisticated attacks while maintaining developer experience.
An API key is a unique identifier used to authenticate a request to an API. Unlike user credentials (username/password) which authenticate individual humans, API keys typically authenticate applications, services, or systems—they answer the question "which application is making this request?" rather than "which user is making this request?"
The Anatomy of an API Request with API Key:
When a client makes an API request with an API key, the flow typically looks like this:
This simple flow belies significant complexity in implementation. Each step has security implications that, if ignored, can expose your entire system to compromise.
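As a rough illustration of that flow, the sketch below walks a request through key extraction, hashing, lookup, and a status check. The `X-API-Key` header name, the in-memory `KEY_STORE`, and the `authenticate` function are assumptions made for this example and are not tied to any particular framework.

```python
import hashlib
from typing import Mapping, Optional

# Hypothetical in-memory store: SHA-256 hash of the key -> key metadata.
# A real system would back this with a database.
KEY_STORE = {
    hashlib.sha256(b"sk_test_example").hexdigest(): {
        "app": "billing-service",
        "revoked": False,
    }
}


def authenticate(headers: Mapping[str, str]) -> Optional[dict]:
    """Return key metadata for a valid request, or None to reject it."""
    # 1. Extract the key from the (assumed) X-API-Key header.
    api_key = headers.get("X-API-Key")
    if not api_key:
        return None

    # 2. Hash the presented key; only hashes are kept server-side.
    key_hash = hashlib.sha256(api_key.encode()).hexdigest()

    # 3. Look up the hash, then 4. check that the key is still active.
    record = KEY_STORE.get(key_hash)
    if record is None or record["revoked"]:
        return None
    return record
```

Each numbered comment marks a point where a mistake (logging the raw key, skipping the status check) would weaken the whole scheme.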
| Characteristic | API Keys | OAuth Tokens | JWTs | mTLS Certificates |
|---|---|---|---|---|
| Primary Use Case | Service-to-service, developer access | User-delegated access | Stateless authentication | High-security service mesh |
| Lifetime | Long-lived (months/years) | Short-lived (minutes/hours) | Short-lived with refresh | Medium (certificate validity) |
| Revocation | Immediate (database lookup) | Requires token introspection | Difficult until expiry | CRL/OCSP required |
| Complexity | Low | High | Medium | Very High |
| Statefulness | Stateful (server stores key) | Can be stateless/stateful | Stateless | Stateful (PKI required) |
| Self-Contained | No (requires lookup) | Depends on implementation | Yes (claims in token) | Yes (cert contains identity) |
A common misconception is that API keys provide strong security on their own. In reality, API keys are like house keys: they prove you have authorization, but anyone who copies them gains the same access. Always pair API keys with additional security measures (TLS, IP restrictions, request signing) in production systems.
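One of those additional measures, IP restriction, can be sketched with Python's standard `ipaddress` module. The function name `ip_allowed` and the example networks are illustrative assumptions.

```python
import ipaddress
from typing import Iterable


def ip_allowed(client_ip: str, allowed_cidrs: Iterable[str]) -> bool:
    """Return True if client_ip falls inside any of the allowed networks."""
    address = ipaddress.ip_address(client_ip)
    return any(
        address in ipaddress.ip_network(cidr, strict=False)
        for cidr in allowed_cidrs
    )


# A key restricted to one documentation-range network:
print(ip_allowed("203.0.113.7", ["203.0.113.0/24"]))   # True
print(ip_allowed("198.51.100.1", ["203.0.113.0/24"]))  # False
```

With a check like this in place, a copied key is only useful to an attacker who can also send traffic from an allowed network.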
The security of your API key system begins at generation. A weak key generation process can render all subsequent security measures meaningless. Let's examine the critical aspects of secure key generation.
Entropy Requirements:
Entropy measures the unpredictability of your key. Attack resistance depends directly on having sufficient entropy:
With 256 bits of entropy, an attacker attempting brute force faces up to 2^256 (roughly 1.2 × 10^77) possible keys, a search space far beyond any feasible computation.
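The arithmetic behind entropy figures is simple enough to check directly. The sketch below assumes keys are drawn uniformly at random and that the attacker guesses at a constant rate; the helper names and the guess rate are illustrative.

```python
import math


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy of a uniformly random string: length * log2(alphabet_size)."""
    return length * math.log2(alphabet_size)


def years_to_exhaust(bits: float, guesses_per_second: float) -> float:
    """Years needed to try every possible key at a constant guess rate."""
    seconds_per_year = 365 * 24 * 60 * 60
    return 2 ** bits / guesses_per_second / seconds_per_year


# 32 random bytes written as 64 hex characters:
print(entropy_bits(16, 64))            # 256.0
# An 8-character lowercase-alphanumeric key, by contrast:
print(round(entropy_bits(36, 8), 1))   # 41.4
```

At an assumed one billion guesses per second, the 8-character key space is exhausted in under an hour, while the 256-bit space remains out of reach at any realistic rate.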
```python
import base64
import hashlib
import secrets
from typing import Tuple


class SecureAPIKeyGenerator:
    """
    Production-grade API key generator with proper entropy,
    prefix support, and checksum validation.
    """

    # Prefix identifies key type and enables quick routing
    PREFIX_LIVE = "sk_live_"       # Production keys
    PREFIX_TEST = "sk_test_"       # Testing/sandbox keys
    PREFIX_RESTRICTED = "rk_"      # Restricted scope keys

    KEY_BYTES = 32        # 256 bits of entropy
    CHECKSUM_LENGTH = 4   # Characters appended for typo detection

    @classmethod
    def generate_key(cls, environment: str = "live") -> Tuple[str, str]:
        """
        Generate a cryptographically secure API key.

        Returns:
            Tuple of (full_key, key_hash) where:
            - full_key: The complete API key to give to the client
            - key_hash: SHA-256 hash to store in database
        """
        # Use secrets module (CSPRNG) - NEVER use random module
        raw_bytes = secrets.token_bytes(cls.KEY_BYTES)

        # URL-safe base64 encoding for easy transmission
        key_body = base64.urlsafe_b64encode(raw_bytes).decode("utf-8")

        # Remove padding for cleaner keys
        key_body = key_body.rstrip("=")

        # Add environment-specific prefix
        prefix = cls.PREFIX_LIVE if environment == "live" else cls.PREFIX_TEST

        # Add 4-character checksum for typo detection
        checksum = cls._compute_checksum(key_body)
        full_key = f"{prefix}{key_body}_{checksum}"

        # CRITICAL: Never store the raw key, only its hash
        key_hash = hashlib.sha256(full_key.encode()).hexdigest()

        return full_key, key_hash

    @classmethod
    def _compute_checksum(cls, key_body: str) -> str:
        """Compute 4-character checksum for typo detection."""
        hash_bytes = hashlib.sha256(key_body.encode()).digest()
        encoded = base64.urlsafe_b64encode(hash_bytes[:3]).decode("utf-8")
        return encoded[:cls.CHECKSUM_LENGTH]

    @classmethod
    def validate_format(cls, api_key: str) -> bool:
        """
        Validate API key format WITHOUT checking database.
        Useful for quick rejection of malformed keys.
        """
        # Check prefix
        valid_prefixes = (cls.PREFIX_LIVE, cls.PREFIX_TEST, cls.PREFIX_RESTRICTED)
        prefix = next((p for p in valid_prefixes if api_key.startswith(p)), None)
        if prefix is None:
            return False

        # The body and checksum are URL-safe base64 and may themselves contain
        # '_', so split by position instead of splitting on the separator.
        remainder = api_key[len(prefix):]
        separator_index = -(cls.CHECKSUM_LENGTH + 1)
        if len(remainder) < cls.CHECKSUM_LENGTH + 2 or remainder[separator_index] != "_":
            return False

        # Verify checksum (compare as bytes so non-ASCII input cannot raise)
        key_body = remainder[:separator_index]
        provided_checksum = remainder[-cls.CHECKSUM_LENGTH:]
        expected_checksum = cls._compute_checksum(key_body)
        return secrets.compare_digest(
            provided_checksum.encode(), expected_checksum.encode()
        )

    @classmethod
    def hash_key_for_lookup(cls, api_key: str) -> str:
        """Hash the key for secure database lookup."""
        return hashlib.sha256(api_key.encode()).hexdigest()


# Example usage
if __name__ == "__main__":
    generator = SecureAPIKeyGenerator()

    # Generate production key
    api_key, key_hash = generator.generate_key("live")

    print(f"API Key (give to client): {api_key}")
    print(f"Key Hash (store in DB): {key_hash}")
    print(f"Format Valid: {generator.validate_format(api_key)}")
```

- Use a cryptographically secure random source: Python's secrets module, Go's crypto/rand, Node's crypto.randomBytes(). Never use Math.random() or similar pseudo-random generators.
- Prefixes such as sk_live_ vs sk_test_ prevent accidental use of test keys in production and vice versa. They also allow quick routing at the edge.

How you store API keys determines your blast radius when (not if) your database is compromised. The fundamental principle is simple: never store what you don't need to store, and always hash what you must store.
The One-Way Hash Model:
The safest storage model treats API keys like passwords:
This model means even a complete database breach yields nothing usable—attackers get hashes that can't be reversed into working keys.
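A minimal sketch of the one-way hash model follows. The `stored_hashes` dictionary stands in for a database table, and `issue_key` and `verify_key` are hypothetical helper names; only the SHA-256 hash is ever persisted.

```python
import hashlib
import secrets
from typing import Dict, Optional

# Hypothetical stand-in for the key table: key_hash -> owner.
stored_hashes: Dict[str, str] = {}


def issue_key(owner: str) -> str:
    """Create a key, persist only its hash, and return the raw key once."""
    raw_key = "sk_test_" + secrets.token_urlsafe(32)
    key_hash = hashlib.sha256(raw_key.encode()).hexdigest()
    stored_hashes[key_hash] = owner
    return raw_key


def verify_key(presented_key: str) -> Optional[str]:
    """Return the owner if the presented key hashes to a stored value."""
    presented_hash = hashlib.sha256(presented_key.encode()).hexdigest()
    return stored_hashes.get(presented_hash)
```

Unlike passwords, randomly generated keys already carry high entropy, so a fast unsalted hash such as SHA-256 is generally considered adequate here; a slow password hash would add latency to every request without a matching security gain.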
```sql
-- API Keys table with security-first design
CREATE TABLE api_keys (
    -- Internal identifier (never exposed)
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),

    -- SHA-256 hash of the full API key (for lookup)
    key_hash VARCHAR(64) UNIQUE NOT NULL,

    -- Key prefix for display (e.g., "sk_live_abc...")
    -- Only first 12 chars stored for identification
    key_prefix VARCHAR(20) NOT NULL,

    -- Owner information
    organization_id UUID NOT NULL REFERENCES organizations(id),
    created_by_user_id UUID REFERENCES users(id),

    -- Human-readable name for the key
    name VARCHAR(255) NOT NULL,

    -- Key metadata
    environment VARCHAR(20) NOT NULL CHECK (environment IN ('live', 'test')),

    -- Permissions (JSON array of allowed scopes)
    scopes JSONB NOT NULL DEFAULT '[]',

    -- Rate limiting configuration
    rate_limit_per_minute INTEGER DEFAULT 1000,
    rate_limit_per_day INTEGER DEFAULT 100000,

    -- IP restrictions (NULL = allow all)
    allowed_ips INET[] DEFAULT NULL,
    allowed_cidrs CIDR[] DEFAULT NULL,

    -- Lifecycle management
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    last_used_at TIMESTAMPTZ,
    expires_at TIMESTAMPTZ,  -- NULL = never expires
    revoked_at TIMESTAMPTZ,
    revoked_by_user_id UUID REFERENCES users(id),
    revocation_reason TEXT,

    -- Key version for rotation tracking
    version INTEGER NOT NULL DEFAULT 1,

    -- Sanity check: a key cannot expire before it was created
    CONSTRAINT valid_expiration CHECK (expires_at IS NULL OR expires_at > created_at)
);

-- Index for fast key lookup by hash (most critical index)
CREATE UNIQUE INDEX idx_api_keys_hash ON api_keys(key_hash) WHERE revoked_at IS NULL;

-- Index for listing keys by organization
CREATE INDEX idx_api_keys_org ON api_keys(organization_id, created_at DESC);

-- Index for finding expired keys (cleanup job)
CREATE INDEX idx_api_keys_expires ON api_keys(expires_at)
    WHERE expires_at IS NOT NULL AND revoked_at IS NULL;

-- Index for usage tracking updates
CREATE INDEX idx_api_keys_last_used ON api_keys(last_used_at) WHERE revoked_at IS NULL;

-- ================================================
-- API Key Usage Log (for analytics and auditing)
-- ================================================
CREATE TABLE api_key_usage_log (
    id BIGSERIAL,
    key_id UUID NOT NULL REFERENCES api_keys(id),

    -- Request metadata
    timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    endpoint VARCHAR(500) NOT NULL,
    method VARCHAR(10) NOT NULL,
    status_code INTEGER NOT NULL,
    response_time_ms INTEGER,

    -- Client information
    client_ip INET NOT NULL,
    user_agent TEXT,

    -- Request identifiers
    request_id UUID,

    -- Partition by month for efficient cleanup
    created_month DATE NOT NULL DEFAULT date_trunc('month', NOW()),

    -- On a partitioned table the primary key must include the partition key
    PRIMARY KEY (id, created_month)
) PARTITION BY RANGE (created_month);

-- Create partitions (example for a single month)
CREATE TABLE api_key_usage_log_2024_01 PARTITION OF api_key_usage_log
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Index for querying by key
CREATE INDEX idx_usage_log_key_time ON api_key_usage_log(key_id, timestamp DESC);
```

The full API key should NEVER appear in: database tables, application logs, error messages, Sentry/exception tracking, analytics events, or debugging outputs. Even partial exposure (like the last 8 characters) can aid attackers in brute-force attempts.
The Data Model Trade-offs:
Your storage architecture must balance several concerns:
| Concern | Approach | Trade-off |
|---|---|---|
| Lookup speed | Hash index on key_hash | Hash computation on every request |
| Auditability | Store key_prefix for display | Slightly more storage |
| Revocation | Soft delete with revoked_at | Requires index filter |
| Rate limiting | Store limits per key | More complex validation |
| Usage analytics | Separate log table | Storage growth over time |
Caching Considerations:
For high-traffic APIs, hitting the database on every request is prohibitive. Implement a caching layer with careful attention to cache invalidation on revocation:
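One possible shape for such a caching layer is a small TTL cache with an explicit invalidation hook, sketched here in-process for brevity. The `KeyCache` class and its method names are assumptions for illustration; a multi-instance deployment would more likely use a shared cache.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class KeyCache:
    """Tiny in-process TTL cache for key metadata (illustrative only)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, key_hash: str) -> Optional[dict]:
        entry = self._entries.get(key_hash)
        if entry is None:
            return None
        expires_at, metadata = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it so the caller falls back to the database.
            del self._entries[key_hash]
            return None
        return metadata

    def put(self, key_hash: str, metadata: dict) -> None:
        self._entries[key_hash] = (self._clock() + self._ttl, metadata)

    def invalidate(self, key_hash: str) -> None:
        """Call on revocation so the key stops validating from cache."""
        self._entries.pop(key_hash, None)
```

The TTL bounds how long a revoked key can keep working if an invalidation is missed, and `invalidate` closes that window in the common case.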
Generating secure keys means nothing if they're distributed insecurely. The distribution phase is often the weakest link, with keys exposed in emails, chat messages, or insecure channels.
The Golden Rule: Keys Are Shown Once
When a user creates an API key:
- Show only a masked form (e.g., sk_live_abc...xyz) for identification

This model ensures that even if your admin panel is compromised, attackers cannot extract existing keys; they can only create new ones (which triggers alerts).
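The show-once rule can be sketched as a creation routine that hands back the raw key exactly once and persists only a hash plus a masked display form. The names `mask_key` and `create_key_response`, and the choice of 11 leading and 3 trailing visible characters, are assumptions chosen to match the masked example above.

```python
import hashlib
import secrets
from typing import Tuple


def mask_key(api_key: str, visible_prefix: int = 11, visible_suffix: int = 3) -> str:
    """Return a display-safe form such as 'sk_live_abc...xyz'."""
    if len(api_key) <= visible_prefix + visible_suffix:
        return "..."  # Too short to mask meaningfully; reveal nothing
    return f"{api_key[:visible_prefix]}...{api_key[-visible_suffix:]}"


def create_key_response(name: str) -> Tuple[str, dict]:
    """Return (raw_key, record); only the record is ever persisted."""
    raw_key = "sk_live_" + secrets.token_urlsafe(32)
    record = {
        "name": name,
        "key_hash": hashlib.sha256(raw_key.encode()).hexdigest(),
        "display": mask_key(raw_key),
    }
    return raw_key, record


print(mask_key("sk_live_abcdefghijklmnopqrstuvwxyz"))  # sk_live_abc...xyz
```

Because the record never contains the raw key, later dashboard views can show only the masked form no matter who is logged in.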
Dashboard UX for Key Management:
A well-designed key management interface should:
For enterprise customers or CI/CD integration, provide a secure API for key creation that returns the key exactly once in the response. The API should require elevated authentication (MFA token, short-lived grant) and log all creations. Never provide a 'list keys' endpoint that returns full key values.
Key rotation is the practice of periodically replacing API keys to limit the window of exposure from potential compromises. Even if a key is leaked, regular rotation ensures the leak has limited impact.
Why Rotate Keys?
Rotation Strategies:
| Strategy | How It Works | Pros | Cons | Best For |
|---|---|---|---|---|
| Scheduled Rotation | Keys automatically expire after fixed period (30/60/90 days) | Predictable, enforceable | Disruption if not planned | High-security environments |
| Grace Period Rotation | New key issued, old key valid for overlap period (7 days) | Zero downtime, smooth transition | Brief dual-key window | Production APIs |
| On-Demand Rotation | User manually rotates when needed | Flexible, user-controlled | May never happen | Developer-facing APIs |
| Automated Pipeline Rotation | CI/CD rotates keys as part of deployment | Automated, frequent | Complex setup | Infrastructure-as-code shops |
| Event-Triggered Rotation | Rotate on suspicious activity, personnel change | Responsive to threats | Requires monitoring | Security-conscious organizations |
```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional, Tuple
from dataclasses import dataclass
import logging

logger = logging.getLogger(__name__)


@dataclass
class APIKey:
    id: str
    key_hash: str
    key_prefix: str
    name: str
    environment: str
    organization_id: str
    scopes: List[str]
    rate_limits: Dict[str, Any]
    version: int
    created_at: datetime
    expires_at: Optional[datetime]
    revoked_at: Optional[datetime]


class KeyRotationService:
    """
    Implements zero-downtime API key rotation with grace periods.
    """

    # How long both keys are valid during rotation
    GRACE_PERIOD_DAYS = 7

    # Warning threshold before expiration
    WARNING_THRESHOLD_DAYS = 14

    def __init__(self, key_repository, key_generator, notification_service):
        self.key_repository = key_repository
        self.key_generator = key_generator
        self.notification_service = notification_service

    def rotate_key(
        self,
        current_key_id: str,
        initiated_by: str,
        reason: str = "scheduled_rotation"
    ) -> Tuple[str, datetime]:
        """
        Rotate an API key with grace period for zero-downtime transition.

        Returns:
            Tuple of (new_api_key, old_key_expiration)
        """
        # Fetch current key
        current_key = self.key_repository.get_by_id(current_key_id)
        if not current_key:
            raise ValueError(f"Key {current_key_id} not found")

        if current_key.revoked_at:
            raise ValueError("Cannot rotate a revoked key")

        # Generate new key (its record gets an incremented version below)
        new_api_key, new_key_hash = self.key_generator.generate_key(
            environment=current_key.environment
        )

        # Calculate when old key should expire
        old_key_expiration = datetime.now(timezone.utc) + timedelta(
            days=self.GRACE_PERIOD_DAYS
        )

        # Create new key entry
        new_key_record = self.key_repository.create(
            key_hash=new_key_hash,
            # First 12 chars only, so the value fits the key_prefix column
            key_prefix=new_api_key[:12] + "...",
            organization_id=current_key.organization_id,
            version=current_key.version + 1,
            scopes=current_key.scopes,            # Inherit permissions
            rate_limits=current_key.rate_limits,  # Inherit limits
            name=f"{current_key.name} (rotated)",
            previous_key_id=current_key.id,
        )

        # Schedule old key expiration (soft delete)
        self.key_repository.schedule_expiration(
            key_id=current_key.id,
            expires_at=old_key_expiration
        )

        # Log the rotation event
        self.key_repository.log_event(
            key_id=current_key.id,
            event_type="rotation_initiated",
            initiated_by=initiated_by,
            reason=reason,
            new_key_id=new_key_record.id,
            old_key_expires_at=old_key_expiration,
        )

        # Notify organization admins
        self.notification_service.notify_key_rotation(
            organization_id=current_key.organization_id,
            old_key_prefix=current_key.key_prefix,
            old_key_expires_at=old_key_expiration,
            reason=reason,
        )

        logger.info(
            f"Rotated key {current_key.id} -> {new_key_record.id}, "
            f"old key expires at {old_key_expiration}"
        )

        return new_api_key, old_key_expiration

    def check_keys_needing_rotation(self) -> list:
        """
        Find keys approaching expiration that need rotation warnings.
        Run this as a scheduled job.
        """
        warning_threshold = datetime.now(timezone.utc) + timedelta(
            days=self.WARNING_THRESHOLD_DAYS
        )

        keys = self.key_repository.find_expiring_before(warning_threshold)

        for key in keys:
            self.notification_service.notify_key_expiring(
                organization_id=key.organization_id,
                key_prefix=key.key_prefix,
                expires_at=key.expires_at,
                days_remaining=(key.expires_at - datetime.now(timezone.utc)).days
            )

        return keys

    def emergency_revoke_all(
        self,
        organization_id: str,
        initiated_by: str,
        reason: str
    ) -> int:
        """
        Emergency revocation of all keys for an organization.
        Use in case of suspected breach.
        """
        revoked_count = self.key_repository.revoke_all_for_org(
            organization_id=organization_id,
            revoked_by=initiated_by,
            reason=reason,
        )

        self.notification_service.notify_emergency_revocation(
            organization_id=organization_id,
            revoked_count=revoked_count,
            reason=reason,
        )

        logger.critical(
            f"EMERGENCY: Revoked {revoked_count} keys for org {organization_id}. "
            f"Reason: {reason}"
        )

        return revoked_count
```

During the grace period, both old and new keys are valid. This is a security trade-off: it enables zero-downtime rotation but briefly expands the attack surface. Keep grace periods as short as operationally feasible (3-7 days). For emergency rotations (suspected compromise), set the grace period to zero, which amounts to immediate revocation.
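The grace-period rule reduces to a small validity check at request time. The sketch below assumes each key carries optional `expires_at` and `revoked_at` timestamps; the function name `is_key_active` is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_key_active(
    expires_at: Optional[datetime],
    revoked_at: Optional[datetime],
    now: Optional[datetime] = None,
) -> bool:
    """A key is usable only if it is not revoked and not yet expired."""
    now = now or datetime.now(timezone.utc)
    if revoked_at is not None:
        return False
    if expires_at is not None and expires_at <= now:
        return False
    return True


# During a 7-day grace period the old key still validates...
rotated_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
old_expiry = rotated_at + timedelta(days=7)
print(is_key_active(old_expiry, None, now=rotated_at + timedelta(days=3)))  # True
# ...and stops validating once the grace period has passed.
print(is_key_active(old_expiry, None, now=rotated_at + timedelta(days=8)))  # False
```

Setting the old key's expiry equal to the rotation time gives the zero-grace behavior described for emergency rotations.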
The ability to instantly revoke a compromised key is perhaps the most critical capability of your key management system. When GitHub scans commits and finds exposed credentials, when a security researcher reports a leak, when an employee is terminated—you need revocation to work immediately and completely.
Revocation Requirements:
The Cache Invalidation Challenge:
If you're caching key metadata for performance (which you should at scale), revocation must invalidate cached entries. This is a distributed systems problem with several solutions:
| Approach | Latency | Complexity | Best For |
|---|---|---|---|
| Direct invalidation | Instant | High | Small clusters |
| Pub/sub broadcast | Near-instant | Medium | Multi-region |
| Short TTL caching | Max TTL seconds | Low | Most systems |
| Hybrid | Near-instant | Medium-High | High-security + scale |
For most systems, short TTL caching (60-120 seconds) provides an acceptable balance. For high-security systems where even 60 seconds of post-revocation access is unacceptable, implement pub/sub invalidation.
```go
package apikey

import (
	"context"
	"encoding/json"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// KeyRepository, AuditLogger, AuditEvent, KeyMetadata, ErrKeyNotFound, and
// ErrKeyRevoked are assumed to be defined elsewhere in this package.

// RevocationService handles API key revocation with cache invalidation
type RevocationService struct {
	db          *KeyRepository
	redis       *redis.Client
	pubsub      *redis.PubSub
	auditLogger *AuditLogger
}

// RevocationEvent is published when a key is revoked
type RevocationEvent struct {
	KeyHash   string    `json:"key_hash"`
	KeyID     string    `json:"key_id"`
	OrgID     string    `json:"org_id"`
	RevokedAt time.Time `json:"revoked_at"`
	RevokedBy string    `json:"revoked_by"`
	Reason    string    `json:"reason"`
}

const (
	revocationChannel = "api_key_revocations"
	keyCacheTTL       = 60 * time.Second
	keyPrefix         = "apikey:"
)

// RevokeKey immediately revokes an API key and broadcasts invalidation
func (s *RevocationService) RevokeKey(
	ctx context.Context,
	keyID string,
	revokedBy string,
	reason string,
) error {
	// 1. Update database (source of truth)
	key, err := s.db.RevokeKey(ctx, keyID, revokedBy, reason)
	if err != nil {
		return fmt.Errorf("failed to revoke key in database: %w", err)
	}

	// 2. Immediately delete the cached entry
	cacheKey := keyPrefix + key.KeyHash
	if err := s.redis.Del(ctx, cacheKey).Err(); err != nil {
		// Log but don't fail - cache will expire
		s.auditLogger.Warn("cache delete failed", "key_id", keyID, "error", err)
	}

	// 3. Broadcast revocation to all instances
	event := RevocationEvent{
		KeyHash:   key.KeyHash,
		KeyID:     key.ID,
		OrgID:     key.OrganizationID,
		RevokedAt: time.Now().UTC(),
		RevokedBy: revokedBy,
		Reason:    reason,
	}

	eventJSON, err := json.Marshal(event)
	if err != nil {
		return fmt.Errorf("failed to marshal revocation event: %w", err)
	}

	if err := s.redis.Publish(ctx, revocationChannel, eventJSON).Err(); err != nil {
		// Log critical - other instances won't know about revocation
		s.auditLogger.Error("CRITICAL: revocation broadcast failed",
			"key_id", keyID,
			"error", err,
		)
		// Don't return error - revocation succeeded in DB
	}

	// 4. Log to audit trail
	s.auditLogger.Log(AuditEvent{
		Type:      "key_revoked",
		KeyID:     keyID,
		Actor:     revokedBy,
		Reason:    reason,
		Timestamp: event.RevokedAt,
	})

	return nil
}

// StartRevocationListener subscribes to revocation events
func (s *RevocationService) StartRevocationListener(ctx context.Context) {
	s.pubsub = s.redis.Subscribe(ctx, revocationChannel)

	go func() {
		ch := s.pubsub.Channel()
		for {
			select {
			case <-ctx.Done():
				s.pubsub.Close()
				return
			case msg, ok := <-ch:
				if !ok {
					// Channel closed: stop instead of reading nil messages
					return
				}
				var event RevocationEvent
				if err := json.Unmarshal([]byte(msg.Payload), &event); err != nil {
					s.auditLogger.Error("failed to unmarshal revocation event",
						"error", err,
					)
					continue
				}

				// Invalidate the cached entry
				cacheKey := keyPrefix + event.KeyHash
				s.redis.Del(ctx, cacheKey)

				s.auditLogger.Info("processed revocation broadcast",
					"key_id", event.KeyID,
				)
			}
		}
	}()
}

// ValidateKey checks if a key is valid, using cache with revocation awareness
func (s *RevocationService) ValidateKey(
	ctx context.Context,
	keyHash string,
) (*KeyMetadata, error) {
	cacheKey := keyPrefix + keyHash

	// Try cache first
	cached, err := s.redis.Get(ctx, cacheKey).Bytes()
	if err == nil {
		var metadata KeyMetadata
		if err := json.Unmarshal(cached, &metadata); err == nil {
			// An empty KeyID marks a cached negative result ("{}");
			// it must never be treated as a valid key
			if metadata.KeyID == "" {
				return nil, ErrKeyNotFound
			}
			// Check if cached entry is marked as revoked
			if metadata.RevokedAt != nil {
				return nil, ErrKeyRevoked
			}
			return &metadata, nil
		}
	}

	// Cache miss - query database
	key, err := s.db.GetKeyByHash(ctx, keyHash)
	if err != nil {
		return nil, err
	}

	if key == nil {
		// Cache negative result briefly to prevent repeated lookups
		s.redis.Set(ctx, cacheKey, []byte("{}"), 30*time.Second)
		return nil, ErrKeyNotFound
	}

	if key.RevokedAt != nil {
		return nil, ErrKeyRevoked
	}

	// Cache valid key
	metadata := &KeyMetadata{
		KeyID:      key.ID,
		OrgID:      key.OrganizationID,
		Scopes:     key.Scopes,
		RateLimits: key.RateLimits,
		RevokedAt:  key.RevokedAt,
	}

	cacheBytes, _ := json.Marshal(metadata)
	s.redis.Set(ctx, cacheKey, cacheBytes, keyCacheTTL)

	return metadata, nil
}
```

Effective API key management requires continuous monitoring to detect anomalies, prevent abuse, and support forensic investigation. Your monitoring strategy should cover both operational metrics and security events.
Key Metrics to Track:
Alert Conditions:
Not all anomalies require immediate action. Design your alerting to distinguish between:
| Severity | Condition | Response Time | Example |
|---|---|---|---|
| Critical | Key used after revocation attempt | Immediate | Revocation failed, key still valid |
| High | Abnormal usage spike (>1000% baseline) | Minutes | Possible credential compromise |
| Medium | Auth failures from single IP | Hours | Brute-force attempt |
| Low | Key approaching expiration | Days | Rotation reminder |
| Info | New key created | None | Audit trail only |
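Two of the rows above translate into a very small classifier, sketched here. The function name `classify_usage` is hypothetical, and ">1000% baseline" is read as more than ten times the baseline request rate, which is an interpretation rather than something the table specifies.

```python
from typing import Optional


def classify_usage(
    current_rpm: float,
    baseline_rpm: float,
    used_after_revocation: bool = False,
) -> Optional[str]:
    """Map simple usage signals to an alert severity, or None for no alert."""
    # Any use of a key after revocation means revocation did not take effect.
    if used_after_revocation:
        return "critical"
    # Assumed reading of ">1000% baseline": more than 10x the usual rate.
    if baseline_rpm > 0 and current_rpm > 10 * baseline_rpm:
        return "high"
    return None


print(classify_usage(1200, 100))  # high
print(classify_usage(900, 100))   # None
```

In practice the baseline would come from a rolling average per key, so that a naturally busy key is not flagged for its normal load.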
Partner with GitHub, GitGuardian, and similar secret scanning services. When they detect your API key pattern in public repositories, you should receive alerts and consider automatic revocation. Services like GitHub Advanced Security can be configured to notify you in real-time when keys matching your pattern are exposed.
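Secret scanners work from a pattern that describes your key format. As a sketch, the regular expression below assumes keys of the form sk_live_ or sk_test_, followed by a 43-character URL-safe base64 body, an underscore, and a 4-character checksum; the pattern and the `find_exposed_keys` helper are illustrative and would need to match whatever format you actually issue.

```python
import re
from typing import List

# Assumed key format: prefix + 43-char URL-safe body + "_" + 4-char checksum.
# Lookarounds stop the match from bleeding into adjacent key-like characters.
KEY_PATTERN = re.compile(
    r"(?<![A-Za-z0-9_-])"
    r"sk_(?:live|test)_[A-Za-z0-9_-]{43}_[A-Za-z0-9_-]{4}"
    r"(?![A-Za-z0-9_-])"
)


def find_exposed_keys(text: str) -> List[str]:
    """Return every substring of text that looks like one of our API keys."""
    return KEY_PATTERN.findall(text)


sample = 'API_KEY = "sk_live_' + "A" * 43 + '_abcd"'
print(len(find_exposed_keys(sample)))  # 1
```

A distinctive prefix and fixed length keep false positives low, which is one more reason to give keys a recognizable structure.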
API key management is foundational to API security. Let's consolidate the essential practices:
What's Next:
API keys provide identification but not verification—they prove who is making a request but don't prevent request tampering. The next page explores HMAC Authentication, which adds cryptographic verification to ensure that requests haven't been modified in transit and truly originate from the claimed sender.
You now understand the complete lifecycle of API key management: from secure generation and storage through distribution, rotation, and revocation. You've learned the architectural patterns for building key management systems that can scale while maintaining security. Next, we'll add cryptographic authentication to prevent request tampering.