Systems frequently need to use sensitive data without exposing its actual values. Developers need production-like test data without real Social Security numbers. Customer support agents need to verify accounts without seeing full credit card numbers. Analytics teams need to track user behavior without accessing personal identifiers. Data masking and tokenization are the techniques that make this possible.
These aren't just compliance checkboxes—they're fundamental tools for implementing least privilege at the data level. By replacing or obscuring sensitive values, you minimize exposure surface area while maintaining the data's utility for legitimate purposes.
By the end of this page, you will understand the differences between masking, tokenization, and encryption; learn when to apply each technique; master implementation patterns for distributed systems; and design architectures that balance security with operational requirements.
Before diving into implementation, it's crucial to understand what distinguishes these protection techniques. Each serves different purposes and has different security properties.
Definitions:

- Masking: irreversibly obscures a sensitive value while preserving enough structure for display or testing, for example `***-**-1234` for an SSN.
- Tokenization: replaces a sensitive value with a non-sensitive surrogate that references the original stored in a secure vault, for example `tok_7d8f3a2b` for a credit card number.
- Encryption: reversibly transforms data into ciphertext that can only be read with the decryption key, for example `aGVsbG8gd29ybGQ=`.

| Property | Masking | Tokenization | Encryption |
|---|---|---|---|
| Reversibility | No - one-way transformation | Yes - with token vault access | Yes - with decryption key |
| Format Preservation | Yes - maintains structure | Configurable - can preserve format | No - ciphertext is different format |
| Original Data Location | Destroyed | Token vault (separate system) | Same location (in encrypted form) |
| Key/Vault Required | No | Yes - vault access needed | Yes - key required |
| Performance Impact | Minimal | Vault lookup latency | CPU for encrypt/decrypt |
| Compliance Scope Reduction | Yes - masked data not regulated | Yes - tokens not regulated | No - encrypted data still regulated |
| Use in Queries/Indexes | Limited - partial matching only | Yes - token can be indexed | No - must decrypt first |
| Analytics Utility | Limited - lossy | Preserves relationships | Requires decryption |
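
To make the contrast concrete, here is a minimal sketch, in TypeScript with Node's built-in crypto module, of the same card number run through each technique. The card value is a sample Luhn-valid test number, and the in-memory "vault" comment stands in for whatever vault store you actually use:

```typescript
import * as crypto from 'crypto';

const card = '4532015112830366'; // sample test card number

// Masking: one-way; only the last four digits survive.
const masked = '*'.repeat(card.length - 4) + card.slice(-4);

// Tokenization: a random surrogate; the real value lives only in the vault.
const token = `tok_${crypto.randomBytes(8).toString('hex')}`;
// vault.store(token, card)  // the mapping is held in a separate system

// Encryption: reversible with the key; the ciphertext is still cardholder data.
const key = crypto.randomBytes(32);
const iv = crypto.randomBytes(12);
const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
const ciphertext = Buffer.concat([cipher.update(card, 'utf8'), cipher.final()]);
const authTag = cipher.getAuthTag(); // required to verify integrity on decrypt

console.log({ masked, token, ciphertext: ciphertext.toString('base64') });
```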
When to Use Each Technique:
| Scenario | Recommended Technique | Rationale |
|---|---|---|
| Non-production test data | Masking | No need for reversibility; reduces compliance scope |
| Customer support display | Masking | Verification without full access |
| Payment processing | Tokenization | PCI DSS scope reduction; need to complete transactions |
| Cross-system data sharing | Tokenization | Maintain referential integrity without exposing data |
| Data at rest protection | Encryption | Need full access for authorized users |
| Database field storage | Encryption or Tokenization | Depends on access patterns and compliance needs |
A critical distinction: encrypted data is still considered sensitive under most regulations. If you encrypt a credit card number and store it, you're still 'storing' cardholder data for PCI DSS purposes. Tokenization, by contrast, removes the sensitive data from your system entirely (it's in the vault), reducing your compliance scope.
Data masking transforms sensitive values into non-sensitive representations while preserving data utility for specific purposes. Different masking techniques trade off between security (how much is hidden) and utility (what can still be done with the data).
Masking Technique Taxonomy:
| Technique | Description | Example | Best For |
|---|---|---|---|
| Character Substitution | Replace characters with fixed symbols | John → J*** | Display masking, partial reveal |
| Nulling/Deletion | Remove value entirely | SSN: null | Data not needed for purpose |
| Shuffling | Randomize values within dataset | Swap names between records | Statistical analysis preservation |
| Variance | Add random noise to numeric values | Salary: $50,000 → $48,732 | Analytics on approximations |
| Substitution | Replace with realistic fake values | John Smith → James Wilson | Realistic test data |
| Date Aging | Shift dates by random/fixed offset | 1980-03-15 → 1982-07-23 | Preserve date relationships |
| Truncation | Remove portion of value | 4532-xxxx-xxxx-1234 → 1234 | Last-4 identification |
| Hashing | One-way cryptographic hash | john@email.com → a3f2c8d... | De-identification with consistency |
```typescript
// Enterprise Data Masking Service
// Applies configurable masking strategies to sensitive data

interface MaskingRule {
  fieldPattern: string; // Field name pattern (regex)
  dataType: DataType;
  strategy: MaskingStrategy;
  config: MaskingConfig;
}

enum MaskingStrategy {
  CHARACTER_MASK = 'character_mask',
  SUBSTITUTION = 'substitution',
  HASH = 'hash',
  SHUFFLE = 'shuffle',
  VARIANCE = 'variance',
  TRUNCATE = 'truncate',
  NULLIFY = 'nullify',
  FORMAT_PRESERVING = 'format_preserving',
}

interface MaskingConfig {
  maskChar?: string;          // Character for masking (default: *)
  revealFirst?: number;       // Characters to show at start
  revealLast?: number;        // Characters to show at end
  hashAlgorithm?: string;     // For hash strategy
  variancePercent?: number;   // For numeric variance
  substitutionSource?: string; // Data source for substitutions
  preserveFormat?: boolean;   // Maintain character types (letter→letter)
}

class DataMaskingService {
  private rules: Map<string, MaskingRule[]>;
  private substitutionData: SubstitutionDataProvider;
  private deterministicSeed: string;

  constructor(config: MaskingServiceConfig) {
    this.rules = this.loadRules(config.rulesPath);
    this.substitutionData = new SubstitutionDataProvider(config.substitutionDb);
    this.deterministicSeed = config.seed || crypto.randomBytes(32).toString('hex');
  }

  async maskRecord<T extends object>(
    record: T,
    context: MaskingContext
  ): Promise<T> {
    const masked = { ...record } as any;

    for (const [fieldPath, value] of this.traverseObject(record)) {
      const applicableRules = this.findApplicableRules(fieldPath, value);

      if (applicableRules.length > 0) {
        const rule = applicableRules[0]; // First matching rule
        const maskedValue = await this.applyMasking(
          value,
          rule,
          context,
          fieldPath
        );
        this.setFieldValue(masked, fieldPath, maskedValue);
      }
    }

    return masked;
  }

  private async applyMasking(
    value: any,
    rule: MaskingRule,
    context: MaskingContext,
    fieldPath: string
  ): Promise<any> {
    if (value === null || value === undefined) {
      return value;
    }

    switch (rule.strategy) {
      case MaskingStrategy.CHARACTER_MASK:
        return this.characterMask(String(value), rule.config);

      case MaskingStrategy.SUBSTITUTION:
        return this.substitute(value, rule, context);

      case MaskingStrategy.HASH:
        return this.hashMask(String(value), rule.config, fieldPath);

      case MaskingStrategy.VARIANCE:
        return this.applyVariance(Number(value), rule.config);

      case MaskingStrategy.TRUNCATE:
        return this.truncate(String(value), rule.config);

      case MaskingStrategy.NULLIFY:
        return null;

      case MaskingStrategy.FORMAT_PRESERVING:
        return this.formatPreservingMask(String(value), rule.config);

      case MaskingStrategy.SHUFFLE:
        // Shuffle is applied at dataset level, not individual record
        throw new Error('Shuffle strategy requires dataset-level masking');

      default:
        return this.characterMask(String(value), { maskChar: '*' });
    }
  }

  private characterMask(value: string, config: MaskingConfig): string {
    const maskChar = config.maskChar || '*';
    const revealFirst = config.revealFirst || 0;
    const revealLast = config.revealLast || 0;

    if (value.length <= revealFirst + revealLast) {
      return maskChar.repeat(value.length);
    }

    const start = value.substring(0, revealFirst);
    const end = value.substring(value.length - revealLast);
    const middleLength = value.length - revealFirst - revealLast;
    const masked = maskChar.repeat(middleLength);

    return start + masked + end;
  }

  private async substitute(
    value: any,
    rule: MaskingRule,
    context: MaskingContext
  ): Promise<any> {
    // Deterministic substitution for consistency across records
    const seed = this.generateDeterministicSeed(value, context.recordId);

    switch (rule.dataType) {
      case DataType.NAME:
        return this.substitutionData.getRandomName(seed);
      case DataType.EMAIL:
        return this.substitutionData.getRandomEmail(seed);
      case DataType.PHONE:
        return this.substitutionData.getRandomPhone(seed);
      case DataType.ADDRESS:
        return this.substitutionData.getRandomAddress(seed);
      case DataType.SSN:
        return this.substitutionData.getRandomSSN(seed);
      default:
        return value;
    }
  }

  private hashMask(
    value: string,
    config: MaskingConfig,
    fieldPath: string
  ): string {
    const algorithm = config.hashAlgorithm || 'sha256';
    // Include field path and seed to prevent cross-field rainbow attacks
    const saltedValue = `${this.deterministicSeed}:${fieldPath}:${value}`;

    return crypto
      .createHash(algorithm)
      .update(saltedValue)
      .digest('hex')
      .substring(0, 16); // Truncate for readability
  }

  private formatPreservingMask(value: string, config: MaskingConfig): string {
    // Replace each character with same type (letter→letter, digit→digit)
    let result = '';
    const revealFirst = config.revealFirst || 0;
    const revealLast = config.revealLast || 0;

    for (let i = 0; i < value.length; i++) {
      if (i < revealFirst || i >= value.length - revealLast) {
        result += value[i];
      } else {
        result += this.replacePreservingFormat(value[i]);
      }
    }

    return result;
  }

  private replacePreservingFormat(char: string): string {
    if (/[a-z]/.test(char)) return 'x';
    if (/[A-Z]/.test(char)) return 'X';
    if (/[0-9]/.test(char)) return '0';
    return char; // Preserve special characters, spaces, etc.
  }

  private applyVariance(value: number, config: MaskingConfig): number {
    const variancePercent = config.variancePercent || 10;
    const variance = value * (variancePercent / 100);
    const offset = (Math.random() * 2 - 1) * variance;
    return Math.round(value + offset);
  }

  // Dataset-level shuffling for referential consistency
  async shuffleDataset(
    records: any[],
    fieldsToShuffle: string[]
  ): Promise<any[]> {
    const shuffled = records.map(r => ({ ...r }));

    for (const field of fieldsToShuffle) {
      // Extract all values for this field
      const values = shuffled.map(r => this.getFieldValue(r, field));

      // Shuffle the values array
      for (let i = values.length - 1; i > 0; i--) {
        const j = Math.floor(Math.random() * (i + 1));
        [values[i], values[j]] = [values[j], values[i]];
      }

      // Apply shuffled values back
      shuffled.forEach((record, index) => {
        this.setFieldValue(record, field, values[index]);
      });
    }

    return shuffled;
  }
}
```

Deterministic masking (same input always produces same output) preserves referential integrity—the same email always masks to the same value. Random masking provides better security but breaks foreign key relationships. Choose based on whether you need cross-record consistency.
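To see why that matters for joins, here is a minimal sketch of deterministic pseudonymization using an HMAC with a shared secret pepper. The table names and pepper handling are illustrative assumptions, not part of the service above:

```typescript
import * as crypto from 'crypto';

// Secret "pepper" held only by the masking job; never shipped with the masked data.
const PEPPER = process.env.MASKING_PEPPER ?? 'dev-only-pepper';

// Deterministic: the same input always yields the same pseudonym.
function pseudonymizeEmail(email: string): string {
  const digest = crypto
    .createHmac('sha256', PEPPER)
    .update(email.trim().toLowerCase())
    .digest('hex')
    .slice(0, 12);
  return `user_${digest}@masked.example.com`;
}

// users.email and orders.customerEmail mask to the same value,
// so a join on the masked column still links a user to their orders.
const users = [{ id: 1, email: 'Jane.Doe@example.org' }];
const orders = [{ orderId: 'A7', customerEmail: 'jane.doe@example.org' }];

const maskedUsers = users.map(u => ({ ...u, email: pseudonymizeEmail(u.email) }));
const maskedOrders = orders.map(o => ({ ...o, customerEmail: pseudonymizeEmail(o.customerEmail) }));

console.log(maskedUsers[0].email === maskedOrders[0].customerEmail); // true
```

Swapping the HMAC for a fresh random value per record would hide more, but the two tables would no longer join.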
Dynamic Data Masking (DDM) applies masking at query time based on the requesting user's authorization level. Unlike static masking (which transforms stored data), DDM stores data in original form and masks on read. This enables role-based data access where privileged users see full values while others see masked versions.
DDM Architecture:
```typescript
// Application-Level Dynamic Data Masking
// Masks query results based on caller authorization

interface MaskingPolicy {
  fieldPath: string;
  defaultMask: MaskingStrategy;
  exemptRoles: string[];    // Roles that see unmasked data
  exemptPurposes: string[]; // Approved purposes for full access
}

interface DataAccessContext {
  userId: string;
  roles: string[];
  accessPurpose: string;
  clientType: 'internal' | 'external' | 'api';
}

class DynamicMaskingMiddleware {
  private policies: Map<string, MaskingPolicy>;
  private maskingService: DataMaskingService;
  private auditLogger: DataAccessAuditLogger;

  async processQueryResult<T>(
    results: T[],
    entityType: string,
    context: DataAccessContext
  ): Promise<T[]> {
    const entityPolicies = this.getPoliciesForEntity(entityType);

    if (entityPolicies.length === 0) {
      return results; // No masking policies for this entity
    }

    const maskedResults: T[] = [];
    const accessLog: FieldAccessLog[] = [];

    for (const record of results) {
      const maskedRecord = { ...record } as any;

      for (const policy of entityPolicies) {
        const fieldValue = this.getFieldValue(record, policy.fieldPath);

        if (fieldValue === undefined || fieldValue === null) {
          continue;
        }

        // Check if user is exempt from masking
        const isExempt = this.checkExemption(policy, context);

        if (!isExempt) {
          // Apply masking
          const maskedValue = await this.maskingService.maskValue(
            fieldValue,
            policy.defaultMask
          );
          this.setFieldValue(maskedRecord, policy.fieldPath, maskedValue);

          accessLog.push({
            field: policy.fieldPath,
            masked: true,
            reason: 'policy_enforcement',
          });
        } else {
          accessLog.push({
            field: policy.fieldPath,
            masked: false,
            reason: this.getExemptionReason(policy, context),
          });
        }
      }

      maskedResults.push(maskedRecord);
    }

    // Audit log the access
    await this.auditLogger.logDataAccess({
      userId: context.userId,
      entityType,
      recordCount: results.length,
      fieldAccess: accessLog,
      timestamp: new Date(),
      purpose: context.accessPurpose,
    });

    return maskedResults;
  }

  private checkExemption(
    policy: MaskingPolicy,
    context: DataAccessContext
  ): boolean {
    // Check role-based exemption
    if (policy.exemptRoles.some(role => context.roles.includes(role))) {
      return true;
    }

    // Check purpose-based exemption
    if (policy.exemptPurposes.includes(context.accessPurpose)) {
      return true;
    }

    return false;
  }

  // SQL View approach for database-level DDM
  generateMaskedView(
    tableName: string,
    policies: MaskingPolicy[],
    roleVariable: string = 'CURRENT_ROLE()'
  ): string {
    const selectClauses = [];
    const columns = this.getTableColumns(tableName);

    for (const column of columns) {
      const policy = policies.find(p => p.fieldPath === column.name);

      if (!policy) {
        selectClauses.push(column.name);
        continue;
      }

      // Generate CASE statement for role-based masking
      const exemptRolesStr = policy.exemptRoles
        .map(r => `'${r}'`)
        .join(', ');

      const maskExpression = this.generateSQLMaskExpression(
        column.name,
        column.type,
        policy.defaultMask
      );

      selectClauses.push(`
        CASE
          WHEN ${roleVariable} IN (${exemptRolesStr}) THEN ${column.name}
          ELSE ${maskExpression}
        END AS ${column.name}
      `);
    }

    return `
      CREATE OR REPLACE VIEW ${tableName}_masked AS
      SELECT ${selectClauses.join(',\n      ')}
      FROM ${tableName}
    `;
  }

  private generateSQLMaskExpression(
    columnName: string,
    columnType: string,
    strategy: MaskingStrategy
  ): string {
    switch (strategy) {
      case MaskingStrategy.CHARACTER_MASK:
        return `CONCAT(SUBSTRING(${columnName}, 1, 1), '****')`;
      case MaskingStrategy.TRUNCATE:
        return `RIGHT(${columnName}, 4)`;
      case MaskingStrategy.NULLIFY:
        return 'NULL';
      case MaskingStrategy.HASH:
        return `LEFT(SHA2(${columnName}, 256), 16)`;
      default:
        return `'[MASKED]'`;
    }
  }
}
```

Consistent masking patterns can leak information. If the masked SSN `***-**-1234` always appears in the same records as `John S***`, attackers can correlate. Use techniques like row-level security in combination with DDM, and monitor for suspicious query patterns.
Tokenization replaces sensitive data with non-sensitive tokens while storing the actual values in a secure token vault. Unlike masking (which is lossy), tokenization is reversible—the original value can be retrieved by authorized systems. This makes it ideal for scenarios where you need to process sensitive data later but don't want to store it in your primary systems.
Payment Card Industry (PCI) Tokenization Example:
When you save a credit card for future purchases, the merchant typically doesn't store your actual card number. Instead:
1. Your card number goes directly to the payment processor's tokenization service.
2. The processor returns a token (e.g., tok_7f8d3a2b), which is all the merchant stores.
3. For future charges, the merchant sends the token back to the processor, which detokenizes it and completes the transaction.

Result: Merchant's systems never contain card data, dramatically reducing PCI DSS compliance scope. A sketch of the merchant-side flow follows.
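Here is a minimal sketch of that flow from the merchant's side. The `ProcessorClient` interface and its `tokenize`/`chargeToken` methods are hypothetical stand-ins for whatever payment processor API you actually integrate, and `db` is left untyped for brevity:

```typescript
// Hypothetical processor client; method names are illustrative, not a real payment API.
interface ProcessorClient {
  tokenize(cardNumber: string): Promise<{ token: string; lastFour: string }>;
  chargeToken(token: string, amountCents: number): Promise<{ approved: boolean }>;
}

async function saveCardOnFile(
  processor: ProcessorClient,
  db: any,
  customerId: string,
  pan: string
): Promise<void> {
  // The card number goes straight to the processor; only the token comes back.
  const { token, lastFour } = await processor.tokenize(pan);
  await db.customers.update(customerId, { cardToken: token, cardLastFour: lastFour });
  // Nothing stored locally is cardholder data, which is what shrinks PCI scope.
}

async function chargeCardOnFile(
  processor: ProcessorClient,
  db: any,
  customerId: string,
  amountCents: number
): Promise<{ approved: boolean }> {
  const { cardToken } = await db.customers.get(customerId);
  // The merchant never sees the real PAN again; the processor detokenizes internally.
  return processor.chargeToken(cardToken, amountCents);
}
```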
```typescript
// Enterprise Tokenization Vault Service
// Securely stores sensitive data and issues non-sensitive tokens

interface TokenRecord {
  token: string;
  vault: string;           // Logical vault (e.g., 'payment_cards', 'ssn')
  encryptedValue: Buffer;  // AES-256-GCM encrypted original
  keyVersion: number;      // For key rotation
  metadata: TokenMetadata;
  createdAt: Date;
  expiresAt?: Date;
  accessCount: number;
  lastAccessedAt?: Date;
}

interface TokenMetadata {
  format: string;     // Original format for validation
  lastFour?: string;  // For display (cards, SSN)
  dataType: string;
  ownerId: string;    // Data subject ID
  createdBy: string;  // System that created token
}

interface TokenizationRequest {
  value: string;
  vault: string;
  ownerId: string;
  expirationDays?: number;
  preserveFormat?: boolean;
}

class TokenizationVaultService {
  private db: TokenDatabase;
  private keyManager: KeyManagementService;
  private auditLogger: VaultAuditLogger;
  private formatPreserver: FormatPreservingEncryptor;

  async tokenize(request: TokenizationRequest): Promise<TokenResult> {
    // Validate input format matches expected pattern
    this.validateFormat(request.value, request.vault);

    // Check for existing token (idempotency)
    const existingToken = await this.findExistingToken(
      request.value,
      request.vault,
      request.ownerId
    );

    if (existingToken) {
      return {
        token: existingToken.token,
        isNew: false,
        metadata: existingToken.metadata,
      };
    }

    // Generate token
    const token = request.preserveFormat
      ? await this.generateFormatPreservingToken(request.value, request.vault)
      : this.generateRandomToken(request.vault);

    // Encrypt the sensitive value
    const currentKey = await this.keyManager.getCurrentKey('tokenization');
    const encryptedValue = await this.encrypt(request.value, currentKey);

    // Extract safe metadata (last 4 digits, etc.)
    const metadata = this.extractSafeMetadata(request.value, request.vault);

    // Store token record
    const record: TokenRecord = {
      token,
      vault: request.vault,
      encryptedValue,
      keyVersion: currentKey.version,
      metadata: {
        ...metadata,
        ownerId: request.ownerId,
        createdBy: this.getCurrentSystemId(),
      },
      createdAt: new Date(),
      expiresAt: request.expirationDays
        ? new Date(Date.now() + request.expirationDays * 24 * 60 * 60 * 1000)
        : undefined,
      accessCount: 0,
    };

    await this.db.store(record);

    // Audit log
    await this.auditLogger.logTokenization({
      token,
      vault: request.vault,
      ownerId: request.ownerId,
      timestamp: new Date(),
    });

    return {
      token,
      isNew: true,
      metadata: record.metadata,
    };
  }

  async detokenize(
    token: string,
    context: DetokenizationContext
  ): Promise<string> {
    // Fetch token record
    const record = await this.db.getByToken(token);

    if (!record) {
      throw new TokenNotFoundError(`Token ${token} not found`);
    }

    // Check expiration
    if (record.expiresAt && record.expiresAt < new Date()) {
      throw new TokenExpiredError(`Token ${token} has expired`);
    }

    // Authorize the detokenization request
    await this.authorizeDetokenization(record, context);

    // Retrieve encryption key (may need old version for rotation)
    const key = await this.keyManager.getKey(
      'tokenization',
      record.keyVersion
    );

    // Decrypt the value
    const value = await this.decrypt(record.encryptedValue, key);

    // Update access metadata
    await this.db.updateAccessMetadata(token, {
      accessCount: record.accessCount + 1,
      lastAccessedAt: new Date(),
    });

    // Audit log
    await this.auditLogger.logDetokenization({
      token,
      vault: record.vault,
      accessedBy: context.requesterId,
      purpose: context.purpose,
      timestamp: new Date(),
    });

    return value;
  }

  async deleteToken(token: string, context: DeletionContext): Promise<void> {
    const record = await this.db.getByToken(token);

    if (!record) {
      return; // Idempotent deletion
    }

    // Verify deletion authorization
    await this.authorizeDeletion(record, context);

    // Delete the record (actual value is gone)
    await this.db.delete(token);

    // Audit log
    await this.auditLogger.logTokenDeletion({
      token,
      vault: record.vault,
      ownerId: record.metadata.ownerId,
      deletedBy: context.requesterId,
      reason: context.reason,
      timestamp: new Date(),
    });
  }

  // Batch operations for efficient processing
  async batchTokenize(
    requests: TokenizationRequest[]
  ): Promise<Map<string, TokenResult>> {
    const results = new Map<string, TokenResult>();

    // Use transaction for atomicity
    await this.db.transaction(async (tx) => {
      for (const request of requests) {
        const result = await this.tokenize(request);
        results.set(request.value, result);
      }
    });

    return results;
  }

  // For data subject rights - delete all tokens for a user
  async deleteAllForOwner(ownerId: string, reason: string): Promise<number> {
    const tokens = await this.db.getByOwner(ownerId);

    for (const token of tokens) {
      await this.deleteToken(token.token, {
        requesterId: 'system',
        reason: reason,
      });
    }

    return tokens.length;
  }

  private generateRandomToken(vault: string): string {
    const prefix = vault.substring(0, 3).toLowerCase();
    const random = crypto.randomBytes(16).toString('hex');
    return `${prefix}_${random}`;
  }

  private async generateFormatPreservingToken(
    value: string,
    vault: string
  ): Promise<string> {
    // Format-preserving encryption keeps the token the same format as original
    // Useful for systems expecting specific formats (e.g., 16-digit card number)
    return this.formatPreserver.encrypt(value, vault);
  }

  private extractSafeMetadata(value: string, vault: string): Partial<TokenMetadata> {
    switch (vault) {
      case 'payment_cards':
        return {
          format: 'card',
          lastFour: value.slice(-4),
          dataType: 'credit_card',
        };
      case 'ssn':
        return {
          format: 'ssn',
          lastFour: value.replace(/-/g, '').slice(-4),
          dataType: 'ssn',
        };
      default:
        return {
          format: 'general',
          dataType: vault,
        };
    }
  }
}
```

The token vault becomes the highest-value target in your architecture—it contains all the sensitive data. Apply maximum security controls: hardware security modules (HSMs) for key management, strict access controls, comprehensive audit logging, and network isolation. Consider cloud-managed vault services (AWS Secrets Manager, HashiCorp Vault) for their hardened security posture.
Format-Preserving Encryption (FPE) is a specialized encryption technique that produces ciphertext in the same format as the plaintext. A 16-digit credit card number encrypts to another 16-digit number, a 9-digit SSN encrypts to another 9-digit number. This enables encryption of data in systems with fixed-format field constraints without schema changes.
FPE Use Cases:

- Encrypting card numbers or SSNs in legacy databases with fixed-width, format-validated columns, avoiding schema changes
- Protecting values that pass through intermediate systems which validate field formats
- Generating format-compatible tokens (as in the vault service above)

Common FPE Algorithms:
| Algorithm | Standard | Alphabet Support | Security Level |
|---|---|---|---|
| FF1 | NIST SP 800-38G | Any character set | AES-based, well-analyzed |
| FF3-1 | NIST SP 800-38G Rev 1 | Any character set | AES-based, corrected from FF3 |
| FE1 | Academic | Numeric only | Feistel-based, less common |
| BPS | Proprietary | Alphanumeric | Vendor-specific |
```typescript
// Format-Preserving Encryption Implementation
// Uses FF1 algorithm per NIST SP 800-38G

interface FPEConfig {
  key: Buffer;      // 128, 192, or 256 bits
  tweak: Buffer;    // Context-specific tweak
  alphabet: string; // Valid characters (e.g., '0123456789')
}

class FormatPreservingEncryptor {
  private ff1: FF1Algorithm;

  constructor() {
    this.ff1 = new FF1Algorithm();
  }

  async encryptPreservingFormat(
    plaintext: string,
    config: FPEConfig
  ): Promise<string> {
    // Validate input contains only alphabet characters
    this.validateAlphabet(plaintext, config.alphabet);

    // Extract format-relevant characters and their positions
    const { extracted, template } = this.extractCharacters(
      plaintext,
      config.alphabet
    );

    if (extracted.length === 0) {
      return plaintext; // No encryptable characters
    }

    // Convert to numeric representation
    const numericPlaintext = this.toNumeric(extracted, config.alphabet);

    // Apply FF1 encryption
    const numericCiphertext = await this.ff1.encrypt(
      numericPlaintext,
      config.key,
      config.tweak,
      config.alphabet.length
    );

    // Convert back to original alphabet
    const encryptedChars = this.fromNumeric(
      numericCiphertext,
      config.alphabet
    );

    // Reinsert into original format template
    return this.applyTemplate(encryptedChars, template);
  }

  async decryptPreservingFormat(
    ciphertext: string,
    config: FPEConfig
  ): Promise<string> {
    const { extracted, template } = this.extractCharacters(
      ciphertext,
      config.alphabet
    );

    const numericCiphertext = this.toNumeric(extracted, config.alphabet);

    const numericPlaintext = await this.ff1.decrypt(
      numericCiphertext,
      config.key,
      config.tweak,
      config.alphabet.length
    );

    const decryptedChars = this.fromNumeric(
      numericPlaintext,
      config.alphabet
    );

    return this.applyTemplate(decryptedChars, template);
  }

  // Extract alphabet characters while preserving format template
  private extractCharacters(
    input: string,
    alphabet: string
  ): { extracted: string; template: (number | string)[] } {
    const extracted: string[] = [];
    const template: (number | string)[] = [];
    let extractedIndex = 0;

    for (const char of input) {
      if (alphabet.includes(char)) {
        extracted.push(char);
        template.push(extractedIndex++);
      } else {
        template.push(char); // Non-alphabet char preserved in template
      }
    }

    return {
      extracted: extracted.join(''),
      template,
    };
  }

  // Apply encrypted characters back into format template
  private applyTemplate(
    chars: string,
    template: (number | string)[]
  ): string {
    let result = '';

    for (const item of template) {
      if (typeof item === 'number') {
        result += chars[item];
      } else {
        result += item; // Preserved non-alphabet character
      }
    }

    return result;
  }

  // Example: Encrypt credit card preserving format
  encryptCreditCard(cardNumber: string, key: Buffer): Promise<string> {
    // Remove spaces/dashes for encryption, preserve positions
    const config: FPEConfig = {
      key,
      tweak: Buffer.from('credit_card'), // Domain-specific tweak
      alphabet: '0123456789',
    };

    return this.encryptPreservingFormat(cardNumber, config);
  }

  // Example: Encrypt SSN preserving format
  encryptSSN(ssn: string, key: Buffer): Promise<string> {
    const config: FPEConfig = {
      key,
      tweak: Buffer.from('social_security'),
      alphabet: '0123456789',
    };

    // Input: 123-45-6789
    // Output: xxx-xx-xxxx (different digits, same format)
    return this.encryptPreservingFormat(ssn, config);
  }
}

// Usage example
async function example() {
  const fpe = new FormatPreservingEncryptor();
  const key = crypto.randomBytes(32); // 256-bit key

  // Credit card encryption
  const card = '4532-1234-5678-9012';
  const encryptedCard = await fpe.encryptCreditCard(card, key);
  console.log(encryptedCard); // '8291-7456-3012-4567' (same format)

  // SSN encryption
  const ssn = '123-45-6789';
  const encryptedSSN = await fpe.encryptSSN(ssn, key);
  console.log(encryptedSSN); // '847-92-1356' (same format)
}
```

FPE provides weaker security than AES-GCM for the same key size because it operates on smaller data blocks. It's suitable for format constraints but shouldn't be the primary encryption for highly sensitive data in transit. Use standard encryption where format preservation isn't required.
Development and testing environments need realistic data but must never contain actual production sensitive data. Test data generation creates synthetic data that mimics production characteristics—realistic names, valid email formats, properly formatted credit cards—without using real information.
Test Data Generation Strategies:
| Approach | Description | Pros | Cons |
|---|---|---|---|
| Synthetic Generation | Create entirely fake data from scratch | No production data exposure; unlimited volume | May miss edge cases; requires careful schema understanding |
| Production Masking | Copy production, apply static masking | Preserves realistic distributions | Risk of incomplete masking; needs production copy |
| Subset + Mask | Extract subset of production, mask sensitive fields | Realistic relationships; manageable size | Referential integrity challenges |
| Schema-Based Generation | Generate from database schema with constraints | Respects constraints; deterministic | May lack realistic distributions |
```typescript
// Realistic Test Data Generation Service
// Generates synthetic data that mimics production characteristics

import { faker } from '@faker-js/faker';

interface DataGenerationSchema {
  tableName: string;
  count: number;
  fields: FieldGenerator[];
  relationships?: RelationshipSpec[];
}

interface FieldGenerator {
  name: string;
  type: GeneratorType;
  options?: GeneratorOptions;
}

class TestDataGeneratorService {
  private generatedData: Map<string, any[]> = new Map();
  private relationshipResolver: RelationshipResolver;

  constructor() {
    this.relationshipResolver = new RelationshipResolver();
  }

  async generateDataset(
    schemas: DataGenerationSchema[]
  ): Promise<Map<string, any[]>> {
    // Sort by dependencies to ensure foreign keys are satisfied
    const sortedSchemas = this.topologicalSort(schemas);

    for (const schema of sortedSchemas) {
      const records = await this.generateTable(schema);
      this.generatedData.set(schema.tableName, records);
    }

    return this.generatedData;
  }

  private async generateTable(schema: DataGenerationSchema): Promise<any[]> {
    const records: any[] = [];

    for (let i = 0; i < schema.count; i++) {
      const record: any = {};

      for (const field of schema.fields) {
        record[field.name] = await this.generateField(field, i, schema);
      }

      // Resolve foreign key relationships
      if (schema.relationships) {
        for (const rel of schema.relationships) {
          record[rel.foreignKey] = this.relationshipResolver.resolve(
            rel,
            this.generatedData,
            i
          );
        }
      }

      records.push(record);
    }

    return records;
  }

  private async generateField(
    field: FieldGenerator,
    index: number,
    schema: DataGenerationSchema
  ): Promise<any> {
    const options = field.options || {};

    switch (field.type) {
      // Personal Information (all synthetic)
      case 'firstName':
        return faker.person.firstName();
      case 'lastName':
        return faker.person.lastName();
      case 'fullName':
        return faker.person.fullName();
      case 'email':
        // Generate unique, testable emails
        return `user${index}@testdomain-${schema.tableName}.example.com`;
      case 'phone':
        // Use clearly fake patterns
        return faker.phone.number('555-###-####');

      // Financial Data (valid format, fake values)
      case 'creditCard':
        // Generate Luhn-valid test card numbers
        return this.generateTestCreditCard(options.cardType);
      case 'ssn':
        // Use reserved test SSN ranges (900-999 prefix)
        return `9${faker.string.numeric(2)}-${faker.string.numeric(2)}-${faker.string.numeric(4)}`;
      case 'bankAccount':
        return faker.finance.accountNumber();

      // Addresses (real-looking but fictional)
      case 'address':
        return {
          street: faker.location.streetAddress(),
          city: faker.location.city(),
          state: faker.location.state(),
          zip: faker.location.zipCode(),
          country: 'Test Country',
        };

      // Temporal Data
      case 'dateOfBirth':
        return faker.date.birthdate({
          min: options.minAge || 18,
          max: options.maxAge || 80,
          mode: 'age',
        });
      case 'timestamp':
        return faker.date.between({
          from: options.from || new Date('2020-01-01'),
          to: options.to || new Date(),
        });

      // Business Data
      case 'companyName':
        return faker.company.name();
      case 'jobTitle':
        return faker.person.jobTitle();
      case 'amount':
        return faker.number.float({
          min: options.min || 0,
          max: options.max || 10000,
          multipleOf: 0.01,
        });

      // Sequential/Reference Fields
      case 'uuid':
        return faker.string.uuid();
      case 'sequence':
        return index + (options.startAt || 1);

      // Custom Patterns
      case 'pattern':
        return faker.helpers.replaceSymbols(options.pattern);
      case 'enum':
        return faker.helpers.arrayElement(options.values);
      case 'weighted':
        return this.weightedRandom(options.weights);

      default:
        return faker.lorem.word();
    }
  }

  private generateTestCreditCard(type?: string): string {
    // Test card numbers that pass Luhn but are clearly fake
    const testCards: Record<string, string> = {
      visa: '4111111111111111',
      mastercard: '5500000000000004',
      amex: '340000000000009',
      discover: '6011000000000004',
    };
    return testCards[type || 'visa'];
  }

  private weightedRandom(weights: Record<string, number>): string {
    const entries = Object.entries(weights);
    const total = entries.reduce((sum, [_, w]) => sum + w, 0);
    let random = Math.random() * total;

    for (const [value, weight] of entries) {
      random -= weight;
      if (random <= 0) return value;
    }

    return entries[0][0];
  }

  // Generate consistent test datasets for reproducible tests
  generateDeterministic(
    schemas: DataGenerationSchema[],
    seed: number
  ): Promise<Map<string, any[]>> {
    faker.seed(seed);
    return this.generateDataset(schemas);
  }
}

// Usage Example
const schema: DataGenerationSchema = {
  tableName: 'customers',
  count: 1000,
  fields: [
    { name: 'id', type: 'uuid' },
    { name: 'firstName', type: 'firstName' },
    { name: 'lastName', type: 'lastName' },
    { name: 'email', type: 'email' },
    { name: 'phone', type: 'phone' },
    { name: 'ssn', type: 'ssn' },
    { name: 'createdAt', type: 'timestamp' },
    { name: 'tier', type: 'weighted', options: {
      weights: { 'free': 0.7, 'premium': 0.25, 'enterprise': 0.05 }
    }},
  ],
};
```

Make test data obviously fake: emails at @example.com, phone numbers in the 555-xxx-xxxx range (reserved for fiction), SSNs starting with 9xx (reserved for testing). This prevents accidental use of test data in production and makes accidental data exposure obvious.
Successfully implementing masking and tokenization requires careful architectural decisions. Here are patterns learned from enterprise deployments.
Key Implementation Considerations:
Anti-Patterns to Avoid:
| Anti-Pattern | Problem | Better Approach |
|---|---|---|
| Tokenizing encrypted data | Double transformation adds complexity without security benefit | Tokenize OR encrypt, not both |
| Storing tokens and values together | Compromise exposes both | Token vault must be separate system |
| Predictable token generation | Enables token guessing attacks | Use cryptographically secure random generation |
| Skipping detokenization audit | Cannot detect token abuse | Log every detokenization with context |
| Infinite token validity | Stale tokens accumulate | Set expiration, implement token refresh |
| Trusting token metadata | Metadata can be spoofed | Validate token ownership on every access |
Tokens protect data at rest and in transit, but detokenization must still be authorized. Ensure robust access control for detokenization APIs, not just the vault's internal storage. A compromised service with detokenization access can extract all values.
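As one way to enforce that, here is a minimal sketch of an authorization gate placed in front of the detokenization call. The policy shape, the allow-lists of service identities and purposes, and the rate-counter callback are all hypothetical; substitute whatever policy engine and workload identity source you actually run:

```typescript
// Hypothetical policy: which callers may detokenize which vaults, and for what purpose.
interface DetokenizePolicy {
  vault: string;
  allowedServices: string[];
  allowedPurposes: string[];
  maxCallsPerMinute: number;
}

interface Caller {
  serviceId: string; // From mTLS cert or signed service token, not a request header
  purpose: string;
}

class DetokenizationGate {
  constructor(
    private policies: DetokenizePolicy[],
    private rateCounter: (serviceId: string) => number // Calls seen in the last minute
  ) {}

  authorize(caller: Caller, vault: string): void {
    const policy = this.policies.find(p => p.vault === vault);
    if (!policy) throw new Error(`No detokenization policy for vault ${vault}`);
    if (!policy.allowedServices.includes(caller.serviceId)) {
      throw new Error(`${caller.serviceId} may not detokenize ${vault}`);
    }
    if (!policy.allowedPurposes.includes(caller.purpose)) {
      throw new Error(`Purpose ${caller.purpose} not approved for ${vault}`);
    }
    if (this.rateCounter(caller.serviceId) > policy.maxCallsPerMinute) {
      // A compromised caller bulk-extracting values should trip this before the vault drains.
      throw new Error(`Rate limit exceeded for ${caller.serviceId}`);
    }
  }
}
```

Pairing a gate like this with the per-call audit logging shown in the vault service gives you both prevention and detection for detokenization abuse.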
Data masking and tokenization are essential techniques for protecting sensitive data while maintaining system functionality. Understanding when to apply each technique—and how to implement them correctly—is crucial for both security and compliance.
Key Takeaways:
Next Steps:
With data transformation techniques mastered, we'll explore how long to keep data. The next page covers Data Retention Policies—the frameworks and implementations for defining how long data is kept, when it's archived, and when it must be permanently deleted.
You now understand the differences between masking, tokenization, and encryption, and when to apply each. You can implement data masking strategies, design tokenization architectures, use format-preserving encryption, and generate safe test data. Next, we'll explore data retention policies and their implementation.