Systems frequently need to use sensitive data without exposing its actual values. Developers need production-like test data without real Social Security numbers. Customer support agents need to verify accounts without seeing full credit card numbers. Analytics teams need to track user behavior without accessing personal identifiers. Data masking and tokenization are the techniques that make this possible.
These aren't just compliance checkboxes—they're fundamental tools for implementing least privilege at the data level. By replacing or obscuring sensitive values, you minimize exposure surface area while maintaining the data's utility for legitimate purposes.
By the end of this page, you will understand the differences between masking, tokenization, and encryption; learn when to apply each technique; master implementation patterns for distributed systems; and design architectures that balance security with operational requirements.
Before diving into implementation, it's crucial to understand what distinguishes these protection techniques. Each serves different purposes and has different security properties.
Definitions:

- Masking: irreversibly obscures a sensitive value while preserving enough structure for display or testing, for example `***-**-1234` for an SSN.
- Tokenization: replaces a sensitive value with a non-sensitive surrogate that references the original stored in a secure vault, for example `tok_7d8f3a2b` for a credit card number.
- Encryption: reversibly transforms data into ciphertext that can only be read with the decryption key, for example `aGVsbG8gd29ybGQ=`.

| Property | Masking | Tokenization | Encryption |
|---|---|---|---|
| Reversibility | No - one-way transformation | Yes - with token vault access | Yes - with decryption key |
| Format Preservation | Yes - maintains structure | Configurable - can preserve format | No - ciphertext is different format |
| Original Data Location | Destroyed | Token vault (separate system) | Same location (in encrypted form) |
| Key/Vault Required | No | Yes - vault access needed | Yes - key required |
| Performance Impact | Minimal | Vault lookup latency | CPU for encrypt/decrypt |
| Compliance Scope Reduction | Yes - masked data not regulated | Yes - tokens not regulated | No - encrypted data still regulated |
| Use in Queries/Indexes | Limited - partial matching only | Yes - token can be indexed | No - must decrypt first |
| Analytics Utility | Limited - lossy | Preserves relationships | Requires decryption |
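
To make the contrast concrete, here is a minimal sketch, in TypeScript with Node's built-in crypto module, of the same card number run through each technique. The card value is a sample Luhn-valid test number, and the in-memory "vault" comment stands in for whatever vault store you actually use:

```typescript
import * as crypto from 'crypto';

const card = '4532015112830366'; // sample test card number

// Masking: one-way; only the last four digits survive.
const masked = '*'.repeat(card.length - 4) + card.slice(-4);

// Tokenization: a random surrogate; the real value lives only in the vault.
const token = `tok_${crypto.randomBytes(8).toString('hex')}`;
// vault.store(token, card)  // the mapping is held in a separate system

// Encryption: reversible with the key; the ciphertext is still cardholder data.
const key = crypto.randomBytes(32);
const iv = crypto.randomBytes(12);
const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
const ciphertext = Buffer.concat([cipher.update(card, 'utf8'), cipher.final()]);
const authTag = cipher.getAuthTag(); // required to verify integrity on decrypt

console.log({ masked, token, ciphertext: ciphertext.toString('base64') });
```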
When to Use Each Technique:
| Scenario | Recommended Technique | Rationale |
|---|---|---|
| Non-production test data | Masking | No need for reversibility; reduces compliance scope |
| Customer support display | Masking | Verification without full access |
| Payment processing | Tokenization | PCI DSS scope reduction; need to complete transactions |
| Cross-system data sharing | Tokenization | Maintain referential integrity without exposing data |
| Data at rest protection | Encryption | Need full access for authorized users |
| Database field storage | Encryption or Tokenization | Depends on access patterns and compliance needs |
A critical distinction: encrypted data is still considered sensitive under most regulations. If you encrypt a credit card number and store it, you're still 'storing' cardholder data for PCI DSS purposes. Tokenization, by contrast, removes the sensitive data from your system entirely (it's in the vault), reducing your compliance scope.
Data masking transforms sensitive values into non-sensitive representations while preserving data utility for specific purposes. Different masking techniques trade off between security (how much is hidden) and utility (what can still be done with the data).
Masking Technique Taxonomy:
| Technique | Description | Example | Best For |
|---|---|---|---|
| Character Substitution | Replace characters with fixed symbols | John → J*** | Display masking, partial reveal |
| Nulling/Deletion | Remove value entirely | SSN: null | Data not needed for purpose |
| Shuffling | Randomize values within dataset | Swap names between records | Statistical analysis preservation |
| Variance | Add random noise to numeric values | Salary: $50,000 → $48,732 | Analytics on approximations |
| Substitution | Replace with realistic fake values | John Smith → James Wilson | Realistic test data |
| Date Aging | Shift dates by random/fixed offset | 1980-03-15 → 1982-07-23 | Preserve date relationships |
| Truncation | Remove portion of value | 4532-xxxx-xxxx-1234 → 1234 | Last-4 identification |
| Hashing | One-way cryptographic hash | john@email.com → a3f2c8d... | De-identification with consistency |
```typescript
// Enterprise Data Masking Service
// Applies configurable masking strategies to sensitive data

interface MaskingRule {
  fieldPattern: string; // Field name pattern (regex)
  dataType: DataType;
  strategy: MaskingStrategy;
  config: MaskingConfig;
}

enum MaskingStrategy {
  CHARACTER_MASK = 'character_mask',
  SUBSTITUTION = 'substitution',
  HASH = 'hash',
  SHUFFLE = 'shuffle',
  VARIANCE = 'variance',
  TRUNCATE = 'truncate',
  NULLIFY = 'nullify',
  FORMAT_PRESERVING = 'format_preserving',
}

interface MaskingConfig {
  maskChar?: string;          // Character for masking (default: *)
  revealFirst?: number;       // Characters to show at start
  revealLast?: number;        // Characters to show at end
  hashAlgorithm?: string;     // For hash strategy
  variancePercent?: number;   // For numeric variance
  substitutionSource?: string; // Data source for substitutions
  preserveFormat?: boolean;   // Maintain character types (letter→letter)
}

class DataMaskingService {
  private rules: Map<string, MaskingRule[]>;
  private substitutionData: SubstitutionDataProvider;
  private deterministicSeed: string;

  constructor(config: MaskingServiceConfig) {
    this.rules = this.loadRules(config.rulesPath);
    this.substitutionData = new SubstitutionDataProvider(config.substitutionDb);
    this.deterministicSeed = config.seed || crypto.randomBytes(32).toString('hex');
  }

  async maskRecord<T extends object>(
    record: T,
    context: MaskingContext
  ): Promise<T> {
    const masked = { ...record } as any;

    for (const [fieldPath, value] of this.traverseObject(record)) {
      const applicableRules = this.findApplicableRules(fieldPath, value);

      if (applicableRules.length > 0) {
        const rule = applicableRules[0]; // First matching rule
        const maskedValue = await this.applyMasking(
          value,
          rule,
          context,
          fieldPath
        );
        this.setFieldValue(masked, fieldPath, maskedValue);
      }
    }

    return masked;
  }

  private async applyMasking(
    value: any,
    rule: MaskingRule,
    context: MaskingContext,
    fieldPath: string
  ): Promise<any> {
    if (value === null || value === undefined) {
      return value;
    }

    switch (rule.strategy) {
      case MaskingStrategy.CHARACTER_MASK:
        return this.characterMask(String(value), rule.config);

      case MaskingStrategy.SUBSTITUTION:
        return this.substitute(value, rule, context);

      case MaskingStrategy.HASH:
        return this.hashMask(String(value), rule.config, fieldPath);

      case MaskingStrategy.VARIANCE:
        return this.applyVariance(Number(value), rule.config);

      case MaskingStrategy.TRUNCATE:
        return this.truncate(String(value), rule.config);

      case MaskingStrategy.NULLIFY:
        return null;

      case MaskingStrategy.FORMAT_PRESERVING:
        return this.formatPreservingMask(String(value), rule.config);

      case MaskingStrategy.SHUFFLE:
        // Shuffle is applied at dataset level, not individual record
        throw new Error('Shuffle strategy requires dataset-level masking');

      default:
        return this.characterMask(String(value), { maskChar: '*' });
    }
  }

  private characterMask(value: string, config: MaskingConfig): string {
    const maskChar = config.maskChar || '*';
    const revealFirst = config.revealFirst || 0;
    const revealLast = config.revealLast || 0;

    if (value.length <= revealFirst + revealLast) {
      return maskChar.repeat(value.length);
    }

    const start = value.substring(0, revealFirst);
    const end = value.substring(value.length - revealLast);
    const middleLength = value.length - revealFirst - revealLast;
    const masked = maskChar.repeat(middleLength);

    return start + masked + end;
  }

  private async substitute(
    value: any,
    rule: MaskingRule,
    context: MaskingContext
  ): Promise<any> {
    // Deterministic substitution for consistency across records
    const seed = this.generateDeterministicSeed(value, context.recordId);

    switch (rule.dataType) {
      case DataType.NAME:
        return this.substitutionData.getRandomName(seed);
      case DataType.EMAIL:
        return this.substitutionData.getRandomEmail(seed);
      case DataType.PHONE:
        return this.substitutionData.getRandomPhone(seed);
      case DataType.ADDRESS:
        return this.substitutionData.getRandomAddress(seed);
      case DataType.SSN:
        return this.substitutionData.getRandomSSN(seed);
      default:
        return value;
    }
  }

  private hashMask(
    value: string,
    config: MaskingConfig,
    fieldPath: string
  ): string {
    const algorithm = config.hashAlgorithm || 'sha256';
    // Include field path and seed to prevent cross-field rainbow attacks
    const saltedValue = `${this.deterministicSeed}:${fieldPath}:${value}`;

    return crypto
      .createHash(algorithm)
      .update(saltedValue)
      .digest('hex')
      .substring(0, 16); // Truncate for readability
  }

  private formatPreservingMask(value: string, config: MaskingConfig): string {
    // Replace each character with same type (letter→letter, digit→digit)
    let result = '';
    const revealFirst = config.revealFirst || 0;
    const revealLast = config.revealLast || 0;

    for (let i = 0; i < value.length; i++) {
      if (i < revealFirst || i >= value.length - revealLast) {
        result += value[i];
      } else {
        result += this.replacePreservingFormat(value[i]);
      }
    }

    return result;
  }

  private replacePreservingFormat(char: string): string {
    if (/[a-z]/.test(char)) return 'x';
    if (/[A-Z]/.test(char)) return 'X';
    if (/[0-9]/.test(char)) return '0';
    return char; // Preserve special characters, spaces, etc.
  }

  private applyVariance(value: number, config: MaskingConfig): number {
    const variancePercent = config.variancePercent || 10;
    const variance = value * (variancePercent / 100);
    const offset = (Math.random() * 2 - 1) * variance;
    return Math.round(value + offset);
  }

  // Dataset-level shuffling for referential consistency
  async shuffleDataset(
    records: any[],
    fieldsToShuffle: string[]
  ): Promise<any[]> {
    const shuffled = records.map(r => ({ ...r }));

    for (const field of fieldsToShuffle) {
      // Extract all values for this field
      const values = shuffled.map(r => this.getFieldValue(r, field));

      // Shuffle the values array
      for (let i = values.length - 1; i > 0; i--) {
        const j = Math.floor(Math.random() * (i + 1));
        [values[i], values[j]] = [values[j], values[i]];
      }

      // Apply shuffled values back
      shuffled.forEach((record, index) => {
        this.setFieldValue(record, field, values[index]);
      });
    }

    return shuffled;
  }
}
```

Deterministic masking (same input always produces same output) preserves referential integrity—the same email always masks to the same value. Random masking provides better security but breaks foreign key relationships. Choose based on whether you need cross-record consistency.
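To see why that matters for joins, here is a minimal sketch of deterministic pseudonymization using an HMAC with a shared secret pepper. The table names and pepper handling are illustrative assumptions, not part of the service above:

```typescript
import * as crypto from 'crypto';

// Secret "pepper" held only by the masking job; never shipped with the masked data.
const PEPPER = process.env.MASKING_PEPPER ?? 'dev-only-pepper';

// Deterministic: the same input always yields the same pseudonym.
function pseudonymizeEmail(email: string): string {
  const digest = crypto
    .createHmac('sha256', PEPPER)
    .update(email.trim().toLowerCase())
    .digest('hex')
    .slice(0, 12);
  return `user_${digest}@masked.example.com`;
}

// users.email and orders.customerEmail mask to the same value,
// so a join on the masked column still links a user to their orders.
const users = [{ id: 1, email: 'Jane.Doe@example.org' }];
const orders = [{ orderId: 'A7', customerEmail: 'jane.doe@example.org' }];

const maskedUsers = users.map(u => ({ ...u, email: pseudonymizeEmail(u.email) }));
const maskedOrders = orders.map(o => ({ ...o, customerEmail: pseudonymizeEmail(o.customerEmail) }));

console.log(maskedUsers[0].email === maskedOrders[0].customerEmail); // true
```

Swapping the HMAC for a fresh random value per record would hide more, but the two tables would no longer join.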
Dynamic Data Masking (DDM) applies masking at query time based on the requesting user's authorization level. Unlike static masking (which transforms stored data), DDM stores data in original form and masks on read. This enables role-based data access where privileged users see full values while others see masked versions.
DDM Architecture:
```typescript
// Application-Level Dynamic Data Masking
// Masks query results based on caller authorization

interface MaskingPolicy {
  fieldPath: string;
  defaultMask: MaskingStrategy;
  exemptRoles: string[];    // Roles that see unmasked data
  exemptPurposes: string[]; // Approved purposes for full access
}

interface DataAccessContext {
  userId: string;
  roles: string[];
  accessPurpose: string;
  clientType: 'internal' | 'external' | 'api';
}

class DynamicMaskingMiddleware {
  private policies: Map<string, MaskingPolicy>;
  private maskingService: DataMaskingService;
  private auditLogger: DataAccessAuditLogger;

  async processQueryResult<T>(
    results: T[],
    entityType: string,
    context: DataAccessContext
  ): Promise<T[]> {
    const entityPolicies = this.getPoliciesForEntity(entityType);

    if (entityPolicies.length === 0) {
      return results; // No masking policies for this entity
    }

    const maskedResults: T[] = [];
    const accessLog: FieldAccessLog[] = [];

    for (const record of results) {
      const maskedRecord = { ...record } as any;

      for (const policy of entityPolicies) {
        const fieldValue = this.getFieldValue(record, policy.fieldPath);

        if (fieldValue === undefined || fieldValue === null) {
          continue;
        }

        // Check if user is exempt from masking
        const isExempt = this.checkExemption(policy, context);

        if (!isExempt) {
          // Apply masking
          const maskedValue = await this.maskingService.maskValue(
            fieldValue,
            policy.defaultMask
          );
          this.setFieldValue(maskedRecord, policy.fieldPath, maskedValue);

          accessLog.push({
            field: policy.fieldPath,
            masked: true,
            reason: 'policy_enforcement',
          });
        } else {
          accessLog.push({
            field: policy.fieldPath,
            masked: false,
            reason: this.getExemptionReason(policy, context),
          });
        }
      }

      maskedResults.push(maskedRecord);
    }

    // Audit log the access
    await this.auditLogger.logDataAccess({
      userId: context.userId,
      entityType,
      recordCount: results.length,
      fieldAccess: accessLog,
      timestamp: new Date(),
      purpose: context.accessPurpose,
    });

    return maskedResults;
  }

  private checkExemption(
    policy: MaskingPolicy,
    context: DataAccessContext
  ): boolean {
    // Check role-based exemption
    if (policy.exemptRoles.some(role => context.roles.includes(role))) {
      return true;
    }

    // Check purpose-based exemption
    if (policy.exemptPurposes.includes(context.accessPurpose)) {
      return true;
    }

    return false;
  }

  // SQL View approach for database-level DDM
  generateMaskedView(
    tableName: string,
    policies: MaskingPolicy[],
    roleVariable: string = 'CURRENT_ROLE()'
  ): string {
    const selectClauses = [];
    const columns = this.getTableColumns(tableName);

    for (const column of columns) {
      const policy = policies.find(p => p.fieldPath === column.name);

      if (!policy) {
        selectClauses.push(column.name);
        continue;
      }

      // Generate CASE statement for role-based masking
      const exemptRolesStr = policy.exemptRoles
        .map(r => `'${r}'`)
        .join(', ');

      const maskExpression = this.generateSQLMaskExpression(
        column.name,
        column.type,
        policy.defaultMask
      );

      selectClauses.push(`
        CASE
          WHEN ${roleVariable} IN (${exemptRolesStr}) THEN ${column.name}
          ELSE ${maskExpression}
        END AS ${column.name}
      `);
    }

    return `
      CREATE OR REPLACE VIEW ${tableName}_masked AS
      SELECT ${selectClauses.join(',\n      ')}
      FROM ${tableName}
    `;
  }

  private generateSQLMaskExpression(
    columnName: string,
    columnType: string,
    strategy: MaskingStrategy
  ): string {
    switch (strategy) {
      case MaskingStrategy.CHARACTER_MASK:
        return `CONCAT(SUBSTRING(${columnName}, 1, 1), '****')`;
      case MaskingStrategy.TRUNCATE:
        return `RIGHT(${columnName}, 4)`;
      case MaskingStrategy.NULLIFY:
        return 'NULL';
      case MaskingStrategy.HASH:
        return `LEFT(SHA2(${columnName}, 256), 16)`;
      default:
        return `'[MASKED]'`;
    }
  }
}
```

Consistent masking patterns can leak information. If the masked SSN `***-**-1234` always appears in the same records as `John S***`, attackers can correlate. Use techniques like row-level security in combination with DDM, and monitor for suspicious query patterns.
Tokenization replaces sensitive data with non-sensitive tokens while storing the actual values in a secure token vault. Unlike masking (which is lossy), tokenization is reversible—the original value can be retrieved by authorized systems. This makes it ideal for scenarios where you need to process sensitive data later but don't want to store it in your primary systems.
Payment Card Industry (PCI) Tokenization Example:
When you save a credit card for future purchases, the merchant typically doesn't store your actual card number. Instead:
1. Your card number goes directly to the payment processor's tokenization service.
2. The processor returns a token (e.g., tok_7f8d3a2b), which is all the merchant stores.
3. For future charges, the merchant sends the token back to the processor, which detokenizes it and completes the transaction.

Result: Merchant's systems never contain card data, dramatically reducing PCI DSS compliance scope. A sketch of the merchant-side flow follows.
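Here is a minimal sketch of that flow from the merchant's side. The `ProcessorClient` interface and its `tokenize`/`chargeToken` methods are hypothetical stand-ins for whatever payment processor API you actually integrate, and `db` is left untyped for brevity:

```typescript
// Hypothetical processor client; method names are illustrative, not a real payment API.
interface ProcessorClient {
  tokenize(cardNumber: string): Promise<{ token: string; lastFour: string }>;
  chargeToken(token: string, amountCents: number): Promise<{ approved: boolean }>;
}

async function saveCardOnFile(
  processor: ProcessorClient,
  db: any,
  customerId: string,
  pan: string
): Promise<void> {
  // The card number goes straight to the processor; only the token comes back.
  const { token, lastFour } = await processor.tokenize(pan);
  await db.customers.update(customerId, { cardToken: token, cardLastFour: lastFour });
  // Nothing stored locally is cardholder data, which is what shrinks PCI scope.
}

async function chargeCardOnFile(
  processor: ProcessorClient,
  db: any,
  customerId: string,
  amountCents: number
): Promise<{ approved: boolean }> {
  const { cardToken } = await db.customers.get(customerId);
  // The merchant never sees the real PAN again; the processor detokenizes internally.
  return processor.chargeToken(cardToken, amountCents);
}
```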
```typescript
// Enterprise Tokenization Vault Service
// Securely stores sensitive data and issues non-sensitive tokens

interface TokenRecord {
  token: string;
  vault: string;           // Logical vault (e.g., 'payment_cards', 'ssn')
  encryptedValue: Buffer;  // AES-256-GCM encrypted original
  keyVersion: number;      // For key rotation
  metadata: TokenMetadata;
  createdAt: Date;
  expiresAt?: Date;
  accessCount: number;
  lastAccessedAt?: Date;
}

interface TokenMetadata {
  format: string;     // Original format for validation
  lastFour?: string;  // For display (cards, SSN)
  dataType: string;
  ownerId: string;    // Data subject ID
  createdBy: string;  // System that created token
}

interface TokenizationRequest {
  value: string;
  vault: string;
  ownerId: string;
  expirationDays?: number;
  preserveFormat?: boolean;
}

class TokenizationVaultService {
  private db: TokenDatabase;
  private keyManager: KeyManagementService;
  private auditLogger: VaultAuditLogger;
  private formatPreserver: FormatPreservingEncryptor;

  async tokenize(request: TokenizationRequest): Promise<TokenResult> {
    // Validate input format matches expected pattern
    this.validateFormat(request.value, request.vault);

    // Check for existing token (idempotency)
    const existingToken = await this.findExistingToken(
      request.value,
      request.vault,
      request.ownerId
    );

    if (existingToken) {
      return {
        token: existingToken.token,
        isNew: false,
        metadata: existingToken.metadata,
      };
    }

    // Generate token
    const token = request.preserveFormat
      ? await this.generateFormatPreservingToken(request.value, request.vault)
      : this.generateRandomToken(request.vault);

    // Encrypt the sensitive value
    const currentKey = await this.keyManager.getCurrentKey('tokenization');
    const encryptedValue = await this.encrypt(request.value, currentKey);

    // Extract safe metadata (last 4 digits, etc.)
    const metadata = this.extractSafeMetadata(request.value, request.vault);

    // Store token record
    const record: TokenRecord = {
      token,
      vault: request.vault,
      encryptedValue,
      keyVersion: currentKey.version,
      metadata: {
        ...metadata,
        ownerId: request.ownerId,
        createdBy: this.getCurrentSystemId(),
      },
      createdAt: new Date(),
      expiresAt: request.expirationDays
        ? new Date(Date.now() + request.expirationDays * 24 * 60 * 60 * 1000)
        : undefined,
      accessCount: 0,
    };

    await this.db.store(record);

    // Audit log
    await this.auditLogger.logTokenization({
      token,
      vault: request.vault,
      ownerId: request.ownerId,
      timestamp: new Date(),
    });

    return {
      token,
      isNew: true,
      metadata: record.metadata,
    };
  }

  async detokenize(
    token: string,
    context: DetokenizationContext
  ): Promise<string> {
    // Fetch token record
    const record = await this.db.getByToken(token);

    if (!record) {
      throw new TokenNotFoundError(`Token ${token} not found`);
    }

    // Check expiration
    if (record.expiresAt && record.expiresAt < new Date()) {
      throw new TokenExpiredError(`Token ${token} has expired`);
    }

    // Authorize the detokenization request
    await this.authorizeDetokenization(record, context);

    // Retrieve encryption key (may need old version for rotation)
    const key = await this.keyManager.getKey(
      'tokenization',
      record.keyVersion
    );

    // Decrypt the value
    const value = await this.decrypt(record.encryptedValue, key);

    // Update access metadata
    await this.db.updateAccessMetadata(token, {
      accessCount: record.accessCount + 1,
      lastAccessedAt: new Date(),
    });

    // Audit log
    await this.auditLogger.logDetokenization({
      token,
      vault: record.vault,
      accessedBy: context.requesterId,
      purpose: context.purpose,
      timestamp: new Date(),
    });

    return value;
  }

  async deleteToken(token: string, context: DeletionContext): Promise<void> {
    const record = await this.db.getByToken(token);

    if (!record) {
      return; // Idempotent deletion
    }

    // Verify deletion authorization
    await this.authorizeDeletion(record, context);

    // Delete the record (actual value is gone)
    await this.db.delete(token);

    // Audit log
    await this.auditLogger.logTokenDeletion({
      token,
      vault: record.vault,
      ownerId: record.metadata.ownerId,
      deletedBy: context.requesterId,
      reason: context.reason,
      timestamp: new Date(),
    });
  }

  // Batch operations for efficient processing
  async batchTokenize(
    requests: TokenizationRequest[]
  ): Promise<Map<string, TokenResult>> {
    const results = new Map<string, TokenResult>();

    // Use transaction for atomicity
    await this.db.transaction(async (tx) => {
      for (const request of requests) {
        const result = await this.tokenize(request);
        results.set(request.value, result);
      }
    });

    return results;
  }

  // For data subject rights - delete all tokens for a user
  async deleteAllForOwner(ownerId: string, reason: string): Promise<number> {
    const tokens = await this.db.getByOwner(ownerId);

    for (const token of tokens) {
      await this.deleteToken(token.token, {
        requesterId: 'system',
        reason: reason,
      });
    }

    return tokens.length;
  }

  private generateRandomToken(vault: string): string {
    const prefix = vault.substring(0, 3).toLowerCase();
    const random = crypto.randomBytes(16).toString('hex');
    return `${prefix}_${random}`;
  }

  private async generateFormatPreservingToken(
    value: string,
    vault: string
  ): Promise<string> {
    // Format-preserving encryption keeps the token the same format as original
    // Useful for systems expecting specific formats (e.g., 16-digit card number)
    return this.formatPreserver.encrypt(value, vault);
  }

  private extractSafeMetadata(value: string, vault: string): Partial<TokenMetadata> {
    switch (vault) {
      case 'payment_cards':
        return {
          format: 'card',
          lastFour: value.slice(-4),
          dataType: 'credit_card',
        };
      case 'ssn':
        return {
          format: 'ssn',
          lastFour: value.replace(/-/g, '').slice(-4),
          dataType: 'ssn',
        };
      default:
        return {
          format: 'general',
          dataType: vault,
        };
    }
  }
}
```

The token vault becomes the highest-value target in your architecture—it contains all the sensitive data. Apply maximum security controls: hardware security modules (HSMs) for key management, strict access controls, comprehensive audit logging, and network isolation. Consider cloud-managed vault services (AWS Secrets Manager, HashiCorp Vault) for their hardened security posture.
Format-Preserving Encryption (FPE) is a specialized encryption technique that produces ciphertext in the same format as the plaintext. A 16-digit credit card number encrypts to another 16-digit number, a 9-digit SSN encrypts to another 9-digit number. This enables encryption of data in systems with fixed-format field constraints without schema changes.
FPE Use Cases:

- Encrypting card numbers or SSNs in legacy databases with fixed-width, format-validated columns, avoiding schema changes
- Protecting values that pass through intermediate systems which validate field formats
- Generating format-compatible tokens (as in the vault service above)

Common FPE Algorithms:
| Algorithm | Standard | Alphabet Support | Security Level |
|---|---|---|---|
| FF1 | NIST SP 800-38G | Any character set | AES-based, well-analyzed |
| FF3-1 | NIST SP 800-38G Rev 1 | Any character set | AES-based, corrected from FF3 |
| FE1 | Academic | Numeric only | Feistel-based, less common |
| BPS | Proprietary | Alphanumeric | Vendor-specific |
```typescript
// Format-Preserving Encryption Implementation
// Uses FF1 algorithm per NIST SP 800-38G

interface FPEConfig {
  key: Buffer;      // 128, 192, or 256 bits
  tweak: Buffer;    // Context-specific tweak
  alphabet: string; // Valid characters (e.g., '0123456789')
}

class FormatPreservingEncryptor {
  private ff1: FF1Algorithm;

  constructor() {
    this.ff1 = new FF1Algorithm();
  }

  async encryptPreservingFormat(
    plaintext: string,
    config: FPEConfig
  ): Promise<string> {
    // Validate input contains only alphabet characters
    this.validateAlphabet(plaintext, config.alphabet);

    // Extract format-relevant characters and their positions
    const { extracted, template } = this.extractCharacters(
      plaintext,
      config.alphabet
    );

    if (extracted.length === 0) {
      return plaintext; // No encryptable characters
    }

    // Convert to numeric representation
    const numericPlaintext = this.toNumeric(extracted, config.alphabet);

    // Apply FF1 encryption
    const numericCiphertext = await this.ff1.encrypt(
      numericPlaintext,
      config.key,
      config.tweak,
      config.alphabet.length
    );

    // Convert back to original alphabet
    const encryptedChars = this.fromNumeric(
      numericCiphertext,
      config.alphabet
    );

    // Reinsert into original format template
    return this.applyTemplate(encryptedChars, template);
  }

  async decryptPreservingFormat(
    ciphertext: string,
    config: FPEConfig
  ): Promise<string> {
    const { extracted, template } = this.extractCharacters(
      ciphertext,
      config.alphabet
    );

    const numericCiphertext = this.toNumeric(extracted, config.alphabet);

    const numericPlaintext = await this.ff1.decrypt(
      numericCiphertext,
      config.key,
      config.tweak,
      config.alphabet.length
    );

    const decryptedChars = this.fromNumeric(
      numericPlaintext,
      config.alphabet
    );

    return this.applyTemplate(decryptedChars, template);
  }

  // Extract alphabet characters while preserving format template
  private extractCharacters(
    input: string,
    alphabet: string
  ): { extracted: string; template: (number | string)[] } {
    const extracted: string[] = [];
    const template: (number | string)[] = [];
    let extractedIndex = 0;

    for (const char of input) {
      if (alphabet.includes(char)) {
        extracted.push(char);
        template.push(extractedIndex++);
      } else {
        template.push(char); // Non-alphabet char preserved in template
      }
    }

    return {
      extracted: extracted.join(''),
      template,
    };
  }

  // Apply encrypted characters back into format template
  private applyTemplate(
    chars: string,
    template: (number | string)[]
  ): string {
    let result = '';

    for (const item of template) {
      if (typeof item === 'number') {
        result += chars[item];
      } else {
        result += item; // Preserved non-alphabet character
      }
    }

    return result;
  }

  // Example: Encrypt credit card preserving format
  encryptCreditCard(cardNumber: string, key: Buffer): Promise<string> {
    // Remove spaces/dashes for encryption, preserve positions
    const config: FPEConfig = {
      key,
      tweak: Buffer.from('credit_card'), // Domain-specific tweak
      alphabet: '0123456789',
    };

    return this.encryptPreservingFormat(cardNumber, config);
  }

  // Example: Encrypt SSN preserving format
  encryptSSN(ssn: string, key: Buffer): Promise<string> {
    const config: FPEConfig = {
      key,
      tweak: Buffer.from('social_security'),
      alphabet: '0123456789',
    };

    // Input: 123-45-6789
    // Output: xxx-xx-xxxx (different digits, same format)
    return this.encryptPreservingFormat(ssn, config);
  }
}

// Usage example
async function example() {
  const fpe = new FormatPreservingEncryptor();
  const key = crypto.randomBytes(32); // 256-bit key

  // Credit card encryption
  const card = '4532-1234-5678-9012';
  const encryptedCard = await fpe.encryptCreditCard(card, key);
  console.log(encryptedCard); // '8291-7456-3012-4567' (same format)

  // SSN encryption
  const ssn = '123-45-6789';
  const encryptedSSN = await fpe.encryptSSN(ssn, key);
  console.log(encryptedSSN); // '847-92-1356' (same format)
}
```

FPE provides weaker security than AES-GCM for the same key size because it operates on smaller data blocks. It's suitable for format constraints but shouldn't be the primary encryption for highly sensitive data in transit. Use standard encryption where format preservation isn't required.
Development and testing environments need realistic data but must never contain actual production sensitive data. Test data generation creates synthetic data that mimics production characteristics—realistic names, valid email formats, properly formatted credit cards—without using real information.
Test Data Generation Strategies:
| Approach | Description | Pros | Cons |
|---|---|---|---|
| Synthetic Generation | Create entirely fake data from scratch | No production data exposure; unlimited volume | May miss edge cases; requires careful schema understanding |
| Production Masking | Copy production, apply static masking | Preserves realistic distributions | Risk of incomplete masking; needs production copy |
| Subset + Mask | Extract subset of production, mask sensitive fields | Realistic relationships; manageable size | Referential integrity challenges |
| Schema-Based Generation | Generate from database schema with constraints | Respects constraints; deterministic | May lack realistic distributions |
```typescript
// Realistic Test Data Generation Service
// Generates synthetic data that mimics production characteristics

import { faker } from '@faker-js/faker';

interface DataGenerationSchema {
  tableName: string;
  count: number;
  fields: FieldGenerator[];
  relationships?: RelationshipSpec[];
}

interface FieldGenerator {
  name: string;
  type: GeneratorType;
  options?: GeneratorOptions;
}

class TestDataGeneratorService {
  private generatedData: Map<string, any[]> = new Map();
  private relationshipResolver: RelationshipResolver;

  constructor() {
    this.relationshipResolver = new RelationshipResolver();
  }

  async generateDataset(
    schemas: DataGenerationSchema[]
  ): Promise<Map<string, any[]>> {
    // Sort by dependencies to ensure foreign keys are satisfied
    const sortedSchemas = this.topologicalSort(schemas);

    for (const schema of sortedSchemas) {
      const records = await this.generateTable(schema);
      this.generatedData.set(schema.tableName, records);
    }

    return this.generatedData;
  }

  private async generateTable(schema: DataGenerationSchema): Promise<any[]> {
    const records: any[] = [];

    for (let i = 0; i < schema.count; i++) {
      const record: any = {};

      for (const field of schema.fields) {
        record[field.name] = await this.generateField(field, i, schema);
      }

      // Resolve foreign key relationships
      if (schema.relationships) {
        for (const rel of schema.relationships) {
          record[rel.foreignKey] = this.relationshipResolver.resolve(
            rel,
            this.generatedData,
            i
          );
        }
      }

      records.push(record);
    }

    return records;
  }

  private async generateField(
    field: FieldGenerator,
    index: number,
    schema: DataGenerationSchema
  ): Promise<any> {
    const options = field.options || {};

    switch (field.type) {
      // Personal Information (all synthetic)
      case 'firstName':
        return faker.person.firstName();
      case 'lastName':
        return faker.person.lastName();
      case 'fullName':
        return faker.person.fullName();
      case 'email':
        // Generate unique, testable emails
        return `user${index}@testdomain-${schema.tableName}.example.com`;
      case 'phone':
        // Use clearly fake patterns
        return faker.phone.number('555-###-####');

      // Financial Data (valid format, fake values)
      case 'creditCard':
        // Generate Luhn-valid test card numbers
        return this.generateTestCreditCard(options.cardType);
      case 'ssn':
        // Use reserved test SSN ranges (900-999 prefix)
        return `9${faker.string.numeric(2)}-${faker.string.numeric(2)}-${faker.string.numeric(4)}`;
      case 'bankAccount':
        return faker.finance.accountNumber();

      // Addresses (real-looking but fictional)
      case 'address':
        return {
          street: faker.location.streetAddress(),
          city: faker.location.city(),
          state: faker.location.state(),
          zip: faker.location.zipCode(),
          country: 'Test Country',
        };

      // Temporal Data
      case 'dateOfBirth':
        return faker.date.birthdate({
          min: options.minAge || 18,
          max: options.maxAge || 80,
          mode: 'age',
        });
      case 'timestamp':
        return faker.date.between({
          from: options.from || new Date('2020-01-01'),
          to: options.to || new Date(),
        });

      // Business Data
      case 'companyName':
        return faker.company.name();
      case 'jobTitle':
        return faker.person.jobTitle();
      case 'amount':
        return faker.number.float({
          min: options.min || 0,
          max: options.max || 10000,
          multipleOf: 0.01,
        });

      // Sequential/Reference Fields
      case 'uuid':
        return faker.string.uuid();
      case 'sequence':
        return index + (options.startAt || 1);

      // Custom Patterns
      case 'pattern':
        return faker.helpers.replaceSymbols(options.pattern);
      case 'enum':
        return faker.helpers.arrayElement(options.values);
      case 'weighted':
        return this.weightedRandom(options.weights);

      default:
        return faker.lorem.word();
    }
  }

  private generateTestCreditCard(type?: string): string {
    // Test card numbers that pass Luhn but are clearly fake
    const testCards: Record<string, string> = {
      visa: '4111111111111111',
      mastercard: '5500000000000004',
      amex: '340000000000009',
      discover: '6011000000000004',
    };
    return testCards[type || 'visa'];
  }

  private weightedRandom(weights: Record<string, number>): string {
    const entries = Object.entries(weights);
    const total = entries.reduce((sum, [_, w]) => sum + w, 0);
    let random = Math.random() * total;

    for (const [value, weight] of entries) {
      random -= weight;
      if (random <= 0) return value;
    }

    return entries[0][0];
  }

  // Generate consistent test datasets for reproducible tests
  generateDeterministic(
    schemas: DataGenerationSchema[],
    seed: number
  ): Promise<Map<string, any[]>> {
    faker.seed(seed);
    return this.generateDataset(schemas);
  }
}

// Usage Example
const schema: DataGenerationSchema = {
  tableName: 'customers',
  count: 1000,
  fields: [
    { name: 'id', type: 'uuid' },
    { name: 'firstName', type: 'firstName' },
    { name: 'lastName', type: 'lastName' },
    { name: 'email', type: 'email' },
    { name: 'phone', type: 'phone' },
    { name: 'ssn', type: 'ssn' },
    { name: 'createdAt', type: 'timestamp' },
    { name: 'tier', type: 'weighted', options: {
      weights: { 'free': 0.7, 'premium': 0.25, 'enterprise': 0.05 }
    }},
  ],
};
```

Make test data obviously fake: emails at @example.com, phone numbers in the 555-xxx-xxxx range (reserved for fiction), SSNs starting with 9xx (reserved for testing). This prevents accidental use of test data in production and makes accidental data exposure obvious.
Successfully implementing masking and tokenization requires careful architectural decisions. Here are patterns learned from enterprise deployments.
Key Implementation Considerations:
Anti-Patterns to Avoid:
| Anti-Pattern | Problem | Better Approach |
|---|---|---|
| Tokenizing encrypted data | Double transformation adds complexity without security benefit | Tokenize OR encrypt, not both |
| Storing tokens and values together | Compromise exposes both | Token vault must be separate system |
| Predictable token generation | Enables token guessing attacks | Use cryptographically secure random generation |
| Skipping detokenization audit | Cannot detect token abuse | Log every detokenization with context |
| Infinite token validity | Stale tokens accumulate | Set expiration, implement token refresh |
| Trusting token metadata | Metadata can be spoofed | Validate token ownership on every access |
Tokens protect data at rest and in transit, but detokenization must still be authorized. Ensure robust access control for detokenization APIs, not just the vault's internal storage. A compromised service with detokenization access can extract all values.
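As one way to enforce that, here is a minimal sketch of an authorization gate placed in front of the detokenization call. The policy shape, the allow-lists of service identities and purposes, and the rate-counter callback are all hypothetical; substitute whatever policy engine and workload identity source you actually run:

```typescript
// Hypothetical policy: which callers may detokenize which vaults, and for what purpose.
interface DetokenizePolicy {
  vault: string;
  allowedServices: string[];
  allowedPurposes: string[];
  maxCallsPerMinute: number;
}

interface Caller {
  serviceId: string; // From mTLS cert or signed service token, not a request header
  purpose: string;
}

class DetokenizationGate {
  constructor(
    private policies: DetokenizePolicy[],
    private rateCounter: (serviceId: string) => number // Calls seen in the last minute
  ) {}

  authorize(caller: Caller, vault: string): void {
    const policy = this.policies.find(p => p.vault === vault);
    if (!policy) throw new Error(`No detokenization policy for vault ${vault}`);
    if (!policy.allowedServices.includes(caller.serviceId)) {
      throw new Error(`${caller.serviceId} may not detokenize ${vault}`);
    }
    if (!policy.allowedPurposes.includes(caller.purpose)) {
      throw new Error(`Purpose ${caller.purpose} not approved for ${vault}`);
    }
    if (this.rateCounter(caller.serviceId) > policy.maxCallsPerMinute) {
      // A compromised caller bulk-extracting values should trip this before the vault drains.
      throw new Error(`Rate limit exceeded for ${caller.serviceId}`);
    }
  }
}
```

Pairing a gate like this with the per-call audit logging shown in the vault service gives you both prevention and detection for detokenization abuse.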
Data masking and tokenization are essential techniques for protecting sensitive data while maintaining system functionality. Understanding when to apply each technique—and how to implement them correctly—is crucial for both security and compliance.
Key Takeaways:
Next Steps:
With data transformation techniques mastered, we'll explore how long to keep data. The next page covers Data Retention Policies—the frameworks and implementations for defining how long data is kept, when it's archived, and when it must be permanently deleted.
You now understand the differences between masking, tokenization, and encryption, and when to apply each. You can implement data masking strategies, design tokenization architectures, use format-preserving encryption, and generate safe test data. Next, we'll explore data retention policies and their implementation.