Data Protection - Learning Module

PII Handling

The Most Sensitive Data You Handle

Personally Identifiable Information (PII) represents a unique category of sensitive data that demands specialized handling. Unlike proprietary business data—where exposure causes organizational harm—PII exposure causes harm to individuals: identity theft, discrimination, stalking, financial fraud, and loss of privacy. This creates both legal obligations under privacy regulations and ethical responsibilities that go beyond compliance checklists.

Every system that processes user data—which means virtually every modern application—must grapple with PII handling. Whether you're building a simple signup form or a complex data analytics platform, the moment you collect a name, email address, or IP address, you've entered the realm of PII governance.

What You Will Learn

By the end of this page, you will be able to identify PII across its many forms, explain the legal and regulatory frameworks that govern it, apply technical controls for protecting PII in distributed systems, and design architectures that minimize PII exposure while maintaining functionality.

Understanding Personally Identifiable Information

PII is any information that can be used to identify, locate, or contact a specific individual—either directly or when combined with other information. The definition varies slightly across regulations, but the core principle remains: if data can be traced back to a person, it requires protection.

The PII Spectrum:

Not all PII is equally sensitive. Understanding this spectrum helps apply proportionate protection:

PII Sensitivity Categories
Category	Examples	Risk Level	Typical Controls
Direct Identifiers	Full name, SSN, passport number, driver's license	Very High	Encryption, strict access control, audit logging
Contact Information	Email address, phone number, physical address	High	Encryption, need-to-know access, masking in logs
Financial Data	Credit card numbers, bank accounts, income	Very High	PCI DSS compliance, tokenization, isolated storage
Health Information	Medical records, prescriptions, diagnoses	Very High	HIPAA compliance, specialized access controls
Biometrics	Fingerprints, facial recognition, voice prints	Critical	Cannot be changed if compromised, strictest controls
Quasi-Identifiers	ZIP code, birth date, gender, occupation	Medium→High	Low individually, dangerous when combined
Online Identifiers	IP addresses, device IDs, cookies	Medium	Context-dependent sensitivity, session-based handling
Behavioral Data	Purchase history, browsing patterns, location	Medium→High	Aggregation risk, pattern inference protection

The Linkability Problem:

The most challenging aspect of PII is linkability—the ability to combine seemingly innocuous data points to identify individuals. Research has demonstrated:

•An estimated 87% of the US population can be uniquely identified by the combination of birth date, gender, and 5-digit ZIP code
•Anonymized Netflix Prize movie ratings were re-identified by correlating them with public IMDb reviews
•Fitness tracker data can reveal home addresses through recurring running routes
•Aggregated location data published by a fitness app exposed the locations of secret military bases

This means PII protection extends beyond obvious identifiers. Any data that could potentially be linked to identify individuals requires careful handling.
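
One way to make linkability concrete is to count how many records share each combination of quasi-identifiers. The sketch below is a minimal illustration, not a full re-identification risk assessment; the `PersonRecord` shape, the sample data, and the function names are assumptions made for this example. A smallest group size of 1 means at least one person is unique on those fields.

```typescript
// Hypothetical record shape; field names are assumptions for illustration.
interface PersonRecord {
  zip: string;
  birthDate: string; // ISO date, e.g. "1985-03-14"
  gender: string;
}

// Count how many records share each combination of the given quasi-identifiers.
function groupSizes(
  records: PersonRecord[],
  keys: (keyof PersonRecord)[]
): Map<string, number> {
  const counts = new Map<string, number>();
  for (const record of records) {
    const groupKey = keys.map(k => record[k]).join('|');
    counts.set(groupKey, (counts.get(groupKey) ?? 0) + 1);
  }
  return counts;
}

// A dataset is k-anonymous for these keys when every group has at least k members.
function smallestGroupSize(
  records: PersonRecord[],
  keys: (keyof PersonRecord)[]
): number {
  const sizes = Array.from(groupSizes(records, keys).values());
  return sizes.length === 0 ? 0 : Math.min(...sizes);
}

const sample: PersonRecord[] = [
  { zip: '02139', birthDate: '1985-03-14', gender: 'F' },
  { zip: '02139', birthDate: '1990-07-02', gender: 'F' },
  { zip: '02140', birthDate: '1985-03-14', gender: 'M' },
  { zip: '02140', birthDate: '1990-07-02', gender: 'F' },
];

// ZIP alone leaves groups of two; adding birth date and gender makes every record unique.
const kZipOnly = smallestGroupSize(sample, ['zip']);
const kCombined = smallestGroupSize(sample, ['zip', 'birthDate', 'gender']);
```

Running a check like this before releasing a dataset shows which field combinations need generalization (for example, truncating ZIP codes or reducing birth dates to years).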

When In Doubt, Treat as PII

The regulatory trend is toward broader PII definitions. GDPR considers any data that 'relates to' an identified or identifiable person as personal data. If there's any reasonable possibility that data could be linked to an individual, apply PII protections.

The PII Regulatory Landscape

PII handling is governed by a complex patchwork of regulations that vary by jurisdiction, industry, and data type. Modern systems often must comply with multiple overlapping frameworks simultaneously. Understanding these requirements is essential for compliant system design.

Key Global Privacy Regulations:

Major Privacy Frameworks

•GDPR (EU) — General Data Protection Regulation. Applies to any organization processing EU residents' data, regardless of organization location. Requires lawful basis, purpose limitation, data minimization, individual rights (access, deletion, portability), breach notification within 72 hours, and privacy by design.
•CCPA/CPRA (California) — California Consumer Privacy Act / California Privacy Rights Act. Applies to businesses handling California residents' data above revenue/volume thresholds. Provides rights to know, delete, opt-out of sale, and non-discrimination.
•HIPAA (US Healthcare) — Health Insurance Portability and Accountability Act. Applies to healthcare providers, insurers, and their business associates. Protects Protected Health Information (PHI) with strict access, audit, and breach notification requirements.
•LGPD (Brazil) — Lei Geral de Proteção de Dados. GDPR-influenced law protecting personal data of Brazilian residents with similar requirements for consent, purpose limitation, and individual rights.
•PIPEDA (Canada) — Personal Information Protection and Electronic Documents Act. Governs how private sector organizations collect and use personal information in commercial activities.

Regulatory Requirement Comparison
Requirement	GDPR	CCPA/CPRA	HIPAA
Consent Required	Yes (or other lawful basis)	Opt-out model for sale	Authorization for disclosure
Right to Access	Yes	Yes	Yes
Right to Delete	Yes (with exceptions)	Yes (with exceptions)	No (right to amend only)
Data Portability	Yes (machine-readable)	Yes (portable, readily usable format)	No
Breach Notification	Within 72 hours to authority	General timeliness	No later than 60 days after discovery
Privacy Officer Required	DPO in certain cases	No	Yes (designated privacy official)
Cross-Border Transfer Limits	Adequacy decisions/SCCs	No	No explicit limit (Business Associate Agreements still required)
Penalties	Up to €20M or 4% of global annual turnover, whichever is higher	Up to $2,500 per violation ($7,500 if intentional)	Up to $1.5M per year per violation category

Key Principles Across Regulations:

Despite variation in specifics, privacy regulations share common principles that guide system design:

Lawfulness and Transparency — Collect and process data only with proper legal basis; inform individuals about processing.
Purpose Limitation — Use data only for stated purposes; don't repurpose without additional consent.
Data Minimization — Collect only what's necessary; don't hoard data 'just in case.'
Accuracy — Keep data current; provide mechanisms for correction.
Storage Limitation — Retain data only as long as necessary; implement automatic deletion.
Integrity and Confidentiality — Protect data with appropriate security measures.
Accountability — Demonstrate compliance through documentation, audits, and governance.
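
The storage limitation principle lends itself to automation. The following is a minimal sketch of a retention check, assuming a hypothetical `RETENTION_DAYS` policy table and `StoredRecord` shape; the purposes and periods are illustrative, not recommendations.

```typescript
// Hypothetical retention policy: purposes and periods are illustrative assumptions.
const RETENTION_DAYS: Record<string, number> = {
  support_ticket: 365,
  marketing_lead: 180,
  access_log: 30,
};

interface StoredRecord {
  id: string;
  purpose: string;
  collectedAt: Date;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Return records whose retention period has elapsed as of `now`.
// Records with an unknown purpose are flagged as well: without a documented
// purpose there is no documented reason to keep the data.
function findExpiredRecords(records: StoredRecord[], now: Date): StoredRecord[] {
  return records.filter(record => {
    const days = RETENTION_DAYS[record.purpose];
    if (days === undefined) return true;
    const ageMs = now.getTime() - record.collectedAt.getTime();
    return ageMs > days * MS_PER_DAY;
  });
}
```

A scheduled job could pass the result to a deletion routine, so that retention is enforced by the system rather than by periodic manual cleanup.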

Design for the Strictest Requirement

When operating across jurisdictions, design systems to meet the strictest applicable requirement. It's easier to relax controls where permitted than to retrofit stricter controls later. GDPR is often the benchmark for global systems.
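
As a small illustration of this rule, the sketch below picks the tightest breach-notification window among the regimes that apply to a system. The `Regime` type and function name are assumptions for this example; the two windows are the GDPR and HIPAA values from the comparison table above.

```typescript
// Regimes and windows taken from the comparison table: GDPR 72 hours, HIPAA 60 days.
type Regime = 'GDPR' | 'HIPAA';

const BREACH_NOTIFICATION_HOURS: Record<Regime, number> = {
  GDPR: 72,
  HIPAA: 60 * 24,
};

// Return the shortest notification window (in hours) among the applicable regimes,
// or undefined when none of the listed regimes applies.
function strictestNotificationWindow(regimes: Regime[]): number | undefined {
  if (regimes.length === 0) return undefined;
  return Math.min(...regimes.map(r => BREACH_NOTIFICATION_HOURS[r]));
}
```

The same pattern extends to other per-regime settings, such as retention limits or response deadlines: store each value per regime and resolve to the strictest at configuration time.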

PII Discovery and Inventory

You cannot protect what you don't know exists. PII discovery is the systematic process of identifying where personal data resides across your systems. In modern distributed architectures, PII often spreads across databases, file systems, logs, caches, analytics systems, third-party integrations, and backup storage. Comprehensive discovery requires multiple approaches.

Structured Data Discovery

•Schema Analysis — Examine database schemas for sensitive field names (ssn, email, phone, dob)
•Data Sampling — Scan actual data values using pattern matching (regex for SSN, credit card formats)
•Metadata Cataloging — Document data lineage, ownership, and classification in central registry
•API Contract Review — Analyze API specifications for PII in request/response payloads

Unstructured Data Discovery

•Log Analysis — Scan application logs for accidentally logged PII (common source of exposure)
•Document Scanning — Use OCR and NLP to identify PII in documents, images, PDFs
•Email/Chat Archives — Search communication archives for personal data
•Cloud Storage Audit — Scan object storage for files containing sensitive data

pii-discovery-service.ts
TypeScript
// PII Discovery and Inventory Service
// Automatically discovers and catalogs PII across data sources
 
interface PIIInventoryItem {
  dataSourceId: string;
  location: string; // e.g., "users.email", "logs/app.log:line:1234"
  piiType: PIIType;
  confidence: number;
  discoveryMethod: 'schema' | 'pattern' | 'ml' | 'manual';
  dataSubjectCategory: string; // e.g., 'customer', 'employee', 'prospect'
  processingPurpose: string[];
  retentionPeriod: string;
  lastScanned: Date;
}
 
enum PIIType {
  EMAIL = 'email',
  PHONE = 'phone_number',
  SSN = 'social_security_number',
  NAME = 'full_name',
  ADDRESS = 'physical_address',
  DOB = 'date_of_birth',
  IP_ADDRESS = 'ip_address',
  CREDIT_CARD = 'credit_card',
  HEALTH_DATA = 'health_information',
  BIOMETRIC = 'biometric_data',
  FINANCIAL = 'financial_data',
  LOCATION = 'location_data',
}
 
class PIIDiscoveryService {
  private patternDetectors: Map<PIIType, RegExp[]>;
  private mlClassifier: NERModel;
  private inventoryStore: PIIInventoryRepository;
 
  constructor() {
    this.patternDetectors = this.initializePatterns();
    this.mlClassifier = new NERModel('pii-detection-v3');
    this.inventoryStore = new PIIInventoryRepository();
  }
 
  private initializePatterns(): Map<PIIType, RegExp[]> {
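    // Note: these patterns use the /g flag, which makes RegExp.test()/exec()
    // stateful via lastIndex. Use String.matchAll() or reset lastIndex between inputs.
    return new Map([
      [PIIType.EMAIL, [
        /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g
      ]],
      [PIIType.SSN, [
        /\b\d{3}-\d{2}-\d{4}\b/g,
        /\b\d{9}\b/g  // Unformatted SSN; matches any 9-digit number, so expect false positives
      ]],
      [PIIType.CREDIT_CARD, [
        /\b4[0-9]{12}(?:[0-9]{3})?\b/g,  // Visa
        /\b5[1-5][0-9]{14}\b/g,  // Mastercard
        /\b3[47][0-9]{13}\b/g,  // Amex
      ]],
      [PIIType.PHONE, [
        // US numbers with optional +1 prefix. A leading \b would fail before
        // "+" or "(", so a digit lookbehind guards the start instead.
        /(?<!\d)(?:\+?1[-. ]?)?\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}\b/g
      ]],
      [PIIType.IP_ADDRESS, [
        /\b(?:\d{1,3}\.){3}\d{1,3}\b/g,  // IPv4 shape only; octet range (0-255) is not checked
        /\b(?:[a-fA-F0-9]{1,4}:){7}[a-fA-F0-9]{1,4}\b/g  // Full-form IPv6 only (no "::" compression)
      ]],
    ]);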
  }
 
  async scanDatabaseSchema(
    connectionInfo: DatabaseConnection
  ): Promise<PIIInventoryItem[]> {
    const discoveries: PIIInventoryItem[] = [];
    const schema = await this.fetchDatabaseSchema(connectionInfo);
    
    for (const table of schema.tables) {
      for (const column of table.columns) {
        const piiIndicators = this.analyzeColumnName(column.name);
        
        if (piiIndicators.length > 0) {
          // Schema name suggests PII - verify with data sampling
          const sampleData = await this.sampleColumn(
            connectionInfo, 
            table.name, 
            column.name,
            100  // Sample size
          );
          
          const confirmedTypes = await this.confirmPIIWithPatterns(sampleData);
          
          for (const piiType of confirmedTypes) {
            discoveries.push({
              dataSourceId: connectionInfo.id,
              location: `${table.name}.${column.name}`,
              piiType: piiType.type,
              confidence: piiType.confidence,
              discoveryMethod: 'schema',
              dataSubjectCategory: this.inferDataSubject(table.name),
              processingPurpose: [],  // Requires manual annotation
              retentionPeriod: 'unspecified',
              lastScanned: new Date(),
            });
          }
        }
      }
    }
    
    return discoveries;
  }
 
  async scanLogs(
    logSource: LogSource,
    timeRange: TimeRange
  ): Promise<PIIDiscoveryResult> {
    const discoveries: PIIInventoryItem[] = [];
    const alertableFindings: PIIAlert[] = [];
    
    const logStream = await this.createLogStream(logSource, timeRange);
    
    for await (const logEntry of logStream) {
      const lineFindings = await this.scanText(logEntry.message);
      
      for (const finding of lineFindings) {
        // PII in logs is always concerning - alert immediately
        alertableFindings.push({
          severity: 'high',
          location: `${logSource.name}:${logEntry.timestamp}`,
          piiType: finding.type,
          recommendation: 'Implement log sanitization',
          sample: this.redactForAlert(finding.match),
        });
        
        discoveries.push({
          dataSourceId: logSource.id,
          location: `logs/${logSource.name}`,
          piiType: finding.type,
          confidence: finding.confidence,
          discoveryMethod: 'pattern',
          dataSubjectCategory: 'unknown',
          processingPurpose: ['logging'],
          retentionPeriod: logSource.retentionPolicy,
          lastScanned: new Date(),
        });
      }
    }
    
    return { discoveries, alerts: alertableFindings };
  }
 
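  private analyzeColumnName(name: string): PIIType[] {
    const indicators: PIIType[] = [];
    // Normalize camelCase and punctuation into underscore-delimited tokens so that
    // keywords match whole tokens only. Plain substring matching would flag
    // "zip" as an IP address and "hotel" as a phone number.
    const normalized = '_' + name
      .replace(/([a-z0-9])([A-Z])/g, '$1_$2')
      .toLowerCase()
      .split(/[^a-z0-9]+/)
      .filter(Boolean)
      .join('_') + '_';
    
    const mappings: [string[], PIIType][] = [
      [['email', 'mail', 'e_mail'], PIIType.EMAIL],
      [['phone', 'mobile', 'cell', 'tel'], PIIType.PHONE],
      [['ssn', 'social_security', 'taxpayer_id'], PIIType.SSN],
      [['name', 'first_name', 'last_name', 'full_name'], PIIType.NAME],
      [['address', 'street', 'city', 'zip', 'postal'], PIIType.ADDRESS],
      [['dob', 'birth', 'birthday', 'date_of_birth'], PIIType.DOB],
      [['ip', 'ip_address', 'client_ip'], PIIType.IP_ADDRESS],
      [['card', 'credit', 'payment'], PIIType.CREDIT_CARD],
    ];
    
    // Trade-off: run-together names such as "emailaddress" are no longer matched
    // here; data sampling remains the backstop for those columns.
    for (const [keywords, piiType] of mappings) {
      if (keywords.some(k => normalized.includes(`_${k}_`))) {
        indicators.push(piiType);
      }
    }
    
    return indicators;
  }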
 
  async generateInventoryReport(): Promise<PIIInventoryReport> {
    const inventory = await this.inventoryStore.getAll();
    
    return {
      totalPIILocations: inventory.length,
      byType: this.groupBy(inventory, 'piiType'),
      byDataSource: this.groupBy(inventory, 'dataSourceId'),
      byDataSubject: this.groupBy(inventory, 'dataSubjectCategory'),
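      // Types rated Very High or Critical in the sensitivity table
      highRiskLocations: inventory.filter(
        i => i.piiType === PIIType.SSN || 
             i.piiType === PIIType.CREDIT_CARD ||
             i.piiType === PIIType.FINANCIAL ||
             i.piiType === PIIType.HEALTH_DATA ||
             i.piiType === PIIType.BIOMETRIC
      ),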
      unclassifiedLocations: inventory.filter(
        i => !i.processingPurpose || i.processingPurpose.length === 0
      ),
      reportGeneratedAt: new Date(),
    };
  }
}
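
The card-number patterns in the service above match any digit run with the right prefix and length, so order numbers and other identifiers can trigger them. A common way to cut false positives is to keep only candidates that pass the Luhn checksum used by payment card numbers. The sketch below is a standalone helper; the name `passesLuhn` is an assumption, not part of the service.

```typescript
// Luhn checksum: starting from the rightmost digit, double every second digit,
// subtract 9 from any result above 9, and require the total to be divisible by 10.
function passesLuhn(candidate: string): boolean {
  const digits = candidate.replace(/\D/g, ''); // ignore spaces and dashes
  if (digits.length < 13 || digits.length > 19) return false;

  let sum = 0;
  let shouldDouble = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let digit = digits.charCodeAt(i) - 48; // '0' has char code 48
    if (shouldDouble) {
      digit *= 2;
      if (digit > 9) digit -= 9;
    }
    sum += digit;
    shouldDouble = !shouldDouble;
  }
  return sum % 10 === 0;
}

const validTestCard = passesLuhn('4111 1111 1111 1111'); // widely used test number
const offByOne = passesLuhn('4111111111111112');
```

A pattern match that also passes the checksum deserves a higher confidence score than a pattern match alone; roughly one in ten random digit strings still passes, so the check reduces false positives rather than eliminating them.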

Continuous Discovery

PII discovery isn't a one-time activity. New features add new data fields, schema migrations change storage locations, and developers inadvertently log sensitive data. Implement continuous scanning with alerts for newly discovered PII.
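
A minimal sketch of the alerting step, assuming each scan produces a flat list of findings: compare the latest scan against the known inventory and surface only what is new. The `Finding` shape and function name are hypothetical and simpler than the inventory item used earlier.

```typescript
// Simplified finding shape for illustration (an assumption, not the full inventory item).
interface Finding {
  location: string; // e.g. "users.email" or "logs/checkout-service"
  piiType: string;
}

// Return findings from the latest scan that are not yet in the known inventory.
function findNewFindings(known: Finding[], latest: Finding[]): Finding[] {
  const seen = new Set(known.map(f => `${f.location}::${f.piiType}`));
  return latest.filter(f => !seen.has(`${f.location}::${f.piiType}`));
}

const knownInventory: Finding[] = [
  { location: 'users.email', piiType: 'email' },
];
const latestScan: Finding[] = [
  { location: 'users.email', piiType: 'email' },
  { location: 'logs/checkout-service', piiType: 'credit_card' },
];

// Only the log finding is new, so only it would raise an alert.
const toAlert = findNewFindings(knownInventory, latestScan);
```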

PII Protection Controls

Once you've identified where PII exists, you must apply appropriate protection controls. These controls operate at multiple layers, providing defense in depth against various threat vectors.

Layered Protection Architecture:

PII Protection Control Layers

•Access Control — Limit who can access PII to those with legitimate business need. Implement role-based access with least-privilege principles.
•Encryption — Protect PII at rest and in transit using strong encryption. Consider field-level encryption for highly sensitive data.
•Data Masking — Hide full PII values from users who don't need complete data. Show partial values (xxx-xx-1234) where possible.
•Tokenization — Replace PII with tokens for processing that doesn't require actual values. Only detokenize when necessary.
•Anonymization/Pseudonymization — Remove or replace direct identifiers for analytics use cases while preserving data utility.
•Audit Logging — Record all access to PII for compliance and forensic investigation. Include who, what, when, and why.
•Data Minimization — Don't collect PII you don't need. Review existing collections and purge unnecessary data.
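
To illustrate the field-level encryption mentioned above, here is a minimal sketch using AES-256-GCM from Node's built-in `crypto` module. The `EncryptedField` shape and function names are assumptions for this example, and the key is generated in-process only for demonstration; a real deployment would obtain it from a key management service.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

// Everything needed to decrypt one field, stored alongside the record (base64).
interface EncryptedField {
  iv: string;
  authTag: string;
  ciphertext: string;
}

function encryptField(plaintext: string, key: Buffer): EncryptedField {
  const iv = randomBytes(12); // 96-bit nonce; must be unique per encryption under a key
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return {
    iv: iv.toString('base64'),
    authTag: cipher.getAuthTag().toString('base64'),
    ciphertext: ciphertext.toString('base64'),
  };
}

function decryptField(field: EncryptedField, key: Buffer): string {
  const decipher = createDecipheriv('aes-256-gcm', key, Buffer.from(field.iv, 'base64'));
  decipher.setAuthTag(Buffer.from(field.authTag, 'base64'));
  // final() throws if the ciphertext or tag was tampered with
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(field.ciphertext, 'base64')),
    decipher.final(),
  ]);
  return plaintext.toString('utf8');
}

// Demo key only: in practice the 32-byte key would come from a KMS or secrets manager.
const demoKey = randomBytes(32);
const storedSsn = encryptField('123-45-6789', demoKey);
const recoveredSsn = decryptField(storedSsn, demoKey);
```

Because GCM is authenticated, a modified ciphertext or tag fails decryption instead of silently yielding corrupted data, which matters when encrypted fields pass through systems you do not fully control.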

pii-protection-middleware.ts
TypeScript
// PII-Aware API Middleware
// Applies protection controls based on data sensitivity and user authorization
 
interface PIIAccessContext {
  userId: string;
  userRoles: string[];
  accessPurpose: string;
  clientIP: string;
  requestId: string;
  consentScope?: string[];
}
 
interface PIIField {
  fieldPath: string;
  piiType: PIIType;
  accessLevel: 'full' | 'masked' | 'tokenized' | 'denied';
}
 
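class PIIProtectionMiddleware {
  // Dependencies are injected so the fields are always initialized and can be
  // replaced with test doubles
  constructor(
    private readonly accessPolicy: PIIAccessPolicyEngine,
    private readonly maskingService: DataMaskingService,
    private readonly tokenService: TokenizationService,
    private readonly auditLogger: PIIAuditLogger
  ) {}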
 
  async processRequest(
    request: APIRequest,
    context: PIIAccessContext
  ): Promise<APIRequest> {
    // Check if request contains PII
    const piiFields = await this.detectPIIInRequest(request.body);
    
    if (piiFields.length > 0) {
      // Verify consent for each PII field
      for (const field of piiFields) {
        const consentValid = await this.verifyConsent(
          field,
          context.accessPurpose,
          context.consentScope
        );
        
        if (!consentValid) {
          throw new ConsentRequiredError(
            `Consent not provided for ${field.piiType} processing`
          );
        }
      }
      
      // Log PII write operation
      await this.auditLogger.logPIIWrite({
        requestId: context.requestId,
        userId: context.userId,
        piiTypes: piiFields.map(f => f.piiType),
        purpose: context.accessPurpose,
        timestamp: new Date(),
      });
    }
    
    return request;
  }
 
  async processResponse(
    response: APIResponse,
    context: PIIAccessContext
  ): Promise<APIResponse> {
    const piiFields = await this.detectPIIInResponse(response.body);
    
    if (piiFields.length === 0) {
      return response;
    }
    
    // Determine access level for each PII field
    const fieldAccessLevels = await Promise.all(
      piiFields.map(field => 
        this.determineAccessLevel(field, context)
      )
    );
    
    // Apply appropriate protection to each field
    let protectedBody = response.body;
    
    for (let i = 0; i < piiFields.length; i++) {
      const field = piiFields[i];
      const accessLevel = fieldAccessLevels[i];
      
      switch (accessLevel) {
        case 'denied':
          protectedBody = this.removeField(protectedBody, field.fieldPath);
          break;
        case 'masked':
          protectedBody = await this.maskField(
            protectedBody, 
            field.fieldPath, 
            field.piiType
          );
          break;
        case 'tokenized':
          protectedBody = await this.tokenizeField(
            protectedBody, 
            field.fieldPath
          );
          break;
        case 'full':
          // No transformation needed, but log access
          break;
      }
    }
    
    // Log PII access
    await this.auditLogger.logPIIAccess({
      requestId: context.requestId,
      userId: context.userId,
      piiFields: piiFields.map((f, i) => ({
        type: f.piiType,
        accessLevel: fieldAccessLevels[i],
      })),
      purpose: context.accessPurpose,
      timestamp: new Date(),
    });
    
    return { ...response, body: protectedBody };
  }
 
  private async determineAccessLevel(
    field: PIIField,
    context: PIIAccessContext
  ): Promise<'full' | 'masked' | 'tokenized' | 'denied'> {
    // Check explicit permission for this PII type
    const permission = await this.accessPolicy.checkPermission(
      context.userRoles,
      field.piiType,
      context.accessPurpose
    );
    
    if (!permission.allowed) {
      return 'denied';
    }
    
    // Determine appropriate access level based on permission scope
    if (permission.scope === 'full') {
      return 'full';
    }
    
    // For partial access, mask or tokenize based on use case
    if (context.accessPurpose === 'analytics') {
      return 'tokenized'; // Preserve linkability without exposing value
    }
    
    return 'masked'; // Show partial value for verification
  }
 
  private async maskField(
    data: any,
    fieldPath: string,
    piiType: PIIType
  ): Promise<any> {
    const value = this.getFieldValue(data, fieldPath);
    if (!value) return data;
    
    const maskedValue = this.maskingService.mask(value, piiType);
    return this.setFieldValue(data, fieldPath, maskedValue);
  }
}
 
// Masking strategies by PII type
class DataMaskingService {
  mask(value: string, piiType: PIIType): string {
    switch (piiType) {
      case PIIType.EMAIL: {
        // j***@example.com
        const atIndex = value.lastIndexOf('@');
        if (atIndex < 1) {
          // Not a well-formed address: fall back to full redaction
          return '*'.repeat(value.length);
        }
        return `${value[0]}***${value.slice(atIndex)}`;
      }
        
      case PIIType.SSN:
        // ***-**-1234
        return `***-**-${value.slice(-4)}`;
        
      case PIIType.CREDIT_CARD:
        // **** **** **** 1234
        return `**** **** **** ${value.slice(-4)}`;
        
      case PIIType.PHONE:
        // (***) ***-1234
        return `(***) ***-${value.slice(-4)}`;
        
      case PIIType.NAME:
        // J*** D**
        // Split on any whitespace and drop empty parts; an empty part would
        // make repeat() throw on a negative count
        return value.split(/\s+/)
          .filter(part => part.length > 0)
          .map(part => `${part[0]}${'*'.repeat(Math.min(part.length - 1, 3))}`)
          .join(' ');
        
      default:
        // Generic redaction (note: preserves the length of the original value)
        return '*'.repeat(value.length);
    }
  }
}

Masking Is Not Anonymization

Masked data (showing last 4 digits of SSN, for example) is still PII. The partial data plus context may still identify individuals. Use masking for display purposes but implement proper access controls and logging as if dealing with full values.
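
When a downstream use such as analytics needs to link records but never display the value, keyed pseudonymization is a common alternative to masking. The sketch below uses HMAC-SHA256 from Node's built-in `crypto` module; the function name and the normalization step are assumptions for this example. Pseudonymized values remain personal data for anyone holding the key, so the key needs the same protection as the data itself.

```typescript
import { createHmac } from 'node:crypto';

// Derive a stable pseudonym from an identifier. The same input and key always give
// the same output, so records can still be joined; without the key, the value cannot
// be recomputed by hashing guessed inputs, unlike a plain unsalted hash.
function pseudonymize(value: string, secretKey: string): string {
  const normalized = value.trim().toLowerCase(); // treat "A@x.com " and "a@x.com" alike
  return createHmac('sha256', secretKey).update(normalized).digest('hex');
}

// Demo key only: a real key would be loaded from a secrets manager.
const analyticsKey = 'demo-key-for-illustration-only';
const pseudonymA = pseudonymize('Jane.Doe@example.com', analyticsKey);
const pseudonymB = pseudonymize(' jane.doe@example.com', analyticsKey);
const pseudonymOtherKey = pseudonymize('jane.doe@example.com', 'a-different-key');
```

Using a separate key per dataset or per recipient prevents pseudonyms from being joined across contexts that were never meant to be linked.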

Consent Management Architecture

Privacy regulations require a documented legal basis for processing personal data. Under GDPR, consent is one of six lawful bases, and when you rely on it you must be able to demonstrate that it was given. Managing consent at scale requires dedicated infrastructure that captures, stores, and enforces user preferences across distributed systems.

Consent Management Requirements:

Key Consent Management Capabilities

•Granular Consent Capture — Obtain specific consent for each processing purpose, not blanket consent for everything.
•Consent Versioning — Track which version of privacy policy/terms the user consented to, enabling compliance with changing requirements.
•Easy Withdrawal — Provide simple mechanisms to withdraw consent, and propagate withdrawal across systems.
•Consent Evidence — Maintain auditable proof of consent with timestamp, version, and mechanism (check box, API call, etc.).
•Purpose Processing Mapping — Link specific data processing activities to the consent that authorizes them.
•Real-time Enforcement — Check consent status before processing, not just at collection time.

consent-management-service.ts
TypeScript
// Enterprise Consent Management Platform
// Captures, stores, and enforces user consent across distributed systems
 
import { randomUUID } from 'node:crypto';
 
interface ConsentRecord {
  id: string;  // Unique record identifier, returned to callers as proof of capture
  subjectId: string;  // The data subject (user)
  purpose: ProcessingPurpose;
  status: 'granted' | 'withdrawn' | 'expired';
  grantedAt: Date;
  withdrawnAt?: Date;
  expiresAt?: Date;
  policyVersion: string;
  captureMethod: 'web_form' | 'api' | 'verbal' | 'written';
  captureContext: {
    ipAddress: string;
    userAgent: string;
    formId?: string;
    legalText: string;
  };
}
 
enum ProcessingPurpose {
  SERVICE_DELIVERY = 'service_delivery',
  MARKETING_EMAIL = 'marketing_email',
  ANALYTICS = 'analytics',
  PERSONALIZATION = 'personalization',
  THIRD_PARTY_SHARING = 'third_party_sharing',
  RESEARCH = 'research',
}
 
class ConsentManagementService {
  private consentStore: ConsentRepository;
  private eventBus: EventBus;
  private policyService: PrivacyPolicyService;
 
  async grantConsent(
    subjectId: string,
    purposes: ProcessingPurpose[],
    captureContext: ConsentCaptureContext
  ): Promise<ConsentGrantResult> {
    const currentPolicy = await this.policyService.getCurrentPolicy();
    const timestamp = new Date();
    
    // One record per purpose, so each purpose can be withdrawn independently
    const records: ConsentRecord[] = purposes.map(purpose => ({
      id: randomUUID(),
      subjectId,
      purpose,
      status: 'granted',
      grantedAt: timestamp,
      policyVersion: currentPolicy.version,
      captureMethod: captureContext.method,
      captureContext: {
        ipAddress: captureContext.ipAddress,
        userAgent: captureContext.userAgent,
        formId: captureContext.formId,
        legalText: currentPolicy.getConsentText(purpose),
      },
    }));
    
    // Atomically store all consent records
    await this.consentStore.storeConsents(records);
    
    // Publish consent events for downstream systems
    for (const record of records) {
      await this.eventBus.publish('consent.granted', {
        subjectId,
        purpose: record.purpose,
        timestamp,
        policyVersion: currentPolicy.version,
      });
    }
    
    return {
      success: true,
      consentIds: records.map(r => r.id),
      effectiveDate: timestamp,
    };
  }
 
  async withdrawConsent(
    subjectId: string,
    purposes: ProcessingPurpose[]
  ): Promise<ConsentWithdrawalResult> {
    const timestamp = new Date();
    
    // Mark consents as withdrawn
    await this.consentStore.withdrawConsents(subjectId, purposes, timestamp);
    
    // Publish withdrawal events - downstream systems must react
    for (const purpose of purposes) {
      await this.eventBus.publish('consent.withdrawn', {
        subjectId,
        purpose,
        timestamp,
        // Include action requirements for downstream systems
        requiredActions: this.getRequiredActionsForWithdrawal(purpose),
      });
    }
    
    return {
      success: true,
      withdrawnPurposes: purposes,
      effectiveDate: timestamp,
    };
  }
 
  async checkConsent(
    subjectId: string,
    purpose: ProcessingPurpose
  ): Promise<ConsentCheckResult> {
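    // Assumes the repository returns the most recent record for this purpose
    // regardless of status; otherwise the 'withdrawn' branch below is unreachable
    const consent = await this.consentStore.getActiveConsent(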
      subjectId,
      purpose
    );
    
    if (!consent) {
      return {
        hasConsent: false,
        reason: 'no_consent_on_record',
      };
    }
    
    if (consent.status === 'withdrawn') {
      return {
        hasConsent: false,
        reason: 'consent_withdrawn',
        withdrawnAt: consent.withdrawnAt,
      };
    }
    
    if (consent.expiresAt && consent.expiresAt < new Date()) {
      return {
        hasConsent: false,
        reason: 'consent_expired',
        expiredAt: consent.expiresAt,
      };
    }
    
    // Check if policy version requires re-consent
    const currentPolicy = await this.policyService.getCurrentPolicy();
    if (this.requiresReconsent(consent.policyVersion, currentPolicy.version)) {
      return {
        hasConsent: false,
        reason: 'policy_updated_requires_reconsent',
        previousPolicyVersion: consent.policyVersion,
        currentPolicyVersion: currentPolicy.version,
      };
    }
    
    return {
      hasConsent: true,
      grantedAt: consent.grantedAt,
      policyVersion: consent.policyVersion,
    };
  }
 
  async getSubjectConsents(
    subjectId: string
  ): Promise<ConsentRecord[]> {
    // For data subject access requests (DSAR)
    return this.consentStore.getAllConsents(subjectId);
  }
 
  private getRequiredActionsForWithdrawal(
    purpose: ProcessingPurpose
  ): RequiredAction[] {
    // Partial<> because not every purpose has downstream actions defined;
    // a full Record would not compile with missing keys
    const actionMap: Partial<Record<ProcessingPurpose, RequiredAction[]>> = {
      [ProcessingPurpose.MARKETING_EMAIL]: [
        { system: 'email_platform', action: 'unsubscribe' },
        { system: 'crm', action: 'update_preferences' },
      ],
      [ProcessingPurpose.ANALYTICS]: [
        { system: 'analytics', action: 'stop_tracking' },
        { system: 'data_warehouse', action: 'exclude_from_analysis' },
      ],
      [ProcessingPurpose.THIRD_PARTY_SHARING]: [
        { system: 'integration_platform', action: 'revoke_data_shares' },
        { system: 'partners', action: 'notify_data_deletion' },
      ],
      // ... other mappings
    };
    
    return actionMap[purpose] ?? [];
  }
}

Consent Propagation Challenge:

In distributed systems, consent granted in one service must be respected across all services that process the data. This requires:

Centralized Consent Store — Single source of truth for consent status, queried by all processing systems.
Event-Driven Updates — Publish consent changes to event bus for real-time propagation.
Pre-Processing Checks — Services must verify consent before processing, not assume based on data presence.
Graceful Degradation — Define behavior when consent store is unavailable (fail closed vs. cached consent).
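
The graceful degradation choice can be sketched as a small gate in front of the consent store. In this minimal example, a successful lookup refreshes a local cache; if the store is unavailable, a recent cached decision is honored and anything else is denied (fail closed). `ConsentGate`, `ConsentLookup`, and the staleness parameter are hypothetical names for illustration.

```typescript
// Hypothetical lookup signature: resolves to true when consent is on record,
// rejects when the consent store cannot be reached.
type ConsentLookup = (subjectId: string, purpose: string) => Promise<boolean>;

interface CachedDecision {
  allowed: boolean;
  cachedAt: number; // epoch milliseconds
}

class ConsentGate {
  private cache = new Map<string, CachedDecision>();
  private lookup: ConsentLookup;
  private maxStalenessMs: number;
  private now: () => number;

  constructor(lookup: ConsentLookup, maxStalenessMs: number, now: () => number = Date.now) {
    this.lookup = lookup;
    this.maxStalenessMs = maxStalenessMs;
    this.now = now;
  }

  async isAllowed(subjectId: string, purpose: string): Promise<boolean> {
    const key = `${subjectId}:${purpose}`;
    try {
      const allowed = await this.lookup(subjectId, purpose);
      this.cache.set(key, { allowed, cachedAt: this.now() });
      return allowed;
    } catch {
      // Store unavailable: honor a recent cached decision, otherwise deny.
      const cached = this.cache.get(key);
      if (cached !== undefined && this.now() - cached.cachedAt <= this.maxStalenessMs) {
        return cached.allowed;
      }
      return false; // fail closed
    }
  }
}
```

The staleness window is the design decision: a longer window keeps services running through outages but delays the effect of a withdrawal, so it should be short for purposes where withdrawal must take effect quickly.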

Implementing Data Subject Rights

Privacy regulations grant individuals rights over their personal data. Implementing these rights at scale requires systematic processes, cross-system coordination, and careful handling to meet regulatory timelines.

Key Data Subject Rights:

Data Subject Rights Implementation
Right	Description	Regulatory Source	Implementation Challenge
Right to Access	Obtain copy of all personal data held	GDPR Art. 15, CCPA	Data discovery across distributed systems
Right to Rectification	Correct inaccurate personal data	GDPR Art. 16	Propagating corrections to replicas/caches
Right to Erasure	'Right to be forgotten' - delete personal data	GDPR Art. 17, CCPA	Complete deletion from all systems/backups
Right to Portability	Receive data in machine-readable format	GDPR Art. 20	Standardized export format
Right to Restriction	Limit processing of disputed data	GDPR Art. 18	Processing flags across systems
Right to Object	Object to certain processing (e.g., marketing)	GDPR Art. 21	Processing purpose enforcement

dsar-processing-service.ts
TypeScript
// Data Subject Access Request (DSAR) Processing
// Orchestrates cross-system requests for data subject rights
 
interface DSARRequest {
  id: string;
  subjectId: string;
  requestType: DSARType;
  requestedAt: Date;
  deadline: Date;  // Regulatory deadline
  status: DSARStatus;
  verificationStatus: 'pending' | 'verified' | 'failed';
}
 
enum DSARType {
  ACCESS = 'access',
  ERASURE = 'erasure',
  RECTIFICATION = 'rectification',
  PORTABILITY = 'portability',
  RESTRICTION = 'restriction',
  OBJECTION = 'objection',
}
 
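class DSARProcessingService {
  private dsarStore: DSARRepository;
  private identityVerifier: IdentityVerificationService;
  private dataSources: DataSourceRegistry;
  private notificationService: NotificationService;
  // Used by checkErasureExceptions; LegalHoldService is an assumed type name
  private legalHoldService: LegalHoldService;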
 
  async submitRequest(
    subjectEmail: string,
    requestType: DSARType,
    details: DSARDetails
  ): Promise<DSARSubmissionResult> {
    // Create request record
    const request: DSARRequest = {
      id: generateId(),
      subjectId: await this.resolveSubjectId(subjectEmail),
      requestType,
      requestedAt: new Date(),
      deadline: this.calculateDeadline(requestType),
      status: 'received',
      verificationStatus: 'pending',
    };
    
    await this.dsarStore.create(request);
    
    // Initiate identity verification
    await this.identityVerifier.initiateVerification(
      request.id,
      subjectEmail,
      requestType
    );
    
    // Notify privacy team
    await this.notificationService.notifyPrivacyTeam(
      'dsar_received',
      request
    );
    
    return {
      requestId: request.id,
      deadline: request.deadline,
      nextStep: 'identity_verification',
    };
  }
 
  async processAccessRequest(requestId: string): Promise<AccessRequestResult> {
    const request = await this.dsarStore.getById(requestId);
    
    if (request.verificationStatus !== 'verified') {
      throw new Error('Identity not verified');
    }
    
    // Discover all data sources containing this subject's data
    const dataSources = await this.dataSources.getSourcesWithSubjectData(
      request.subjectId
    );
    
    // Collect data from each source
    const collectedData: SubjectDataPackage = {
      requestId,
      subjectId: request.subjectId,
      generatedAt: new Date(),
      categories: [],
    };
    
    // Track sources that fail so an incomplete export is not reported as complete
    const failedSourceIds: string[] = [];
    
    for (const source of dataSources) {
      try {
        const sourceData = await source.extractSubjectData(request.subjectId);
        
        collectedData.categories.push({
          sourceName: source.name,
          sourceType: source.type,
          dataCategory: source.dataCategory,
          records: sourceData.records,
          retentionPeriod: source.retentionPolicy,
          processingPurpose: source.processingPurpose,
        });
      } catch (error) {
        // Log extraction failure but continue with other sources
        failedSourceIds.push(source.id);
        await this.logExtractionError(requestId, source.id, error);
      }
    }
    
    // Generate exportable package
    const exportPackage = await this.generateExportPackage(
      collectedData,
      'json'  // Machine-readable format for portability
    );
    
    // Mark the request completed only if every source was extracted
    // ('partially_completed' is an assumed DSARStatus value)
    await this.dsarStore.updateStatus(
      requestId,
      failedSourceIds.length === 0 ? 'completed' : 'partially_completed'
    );
    
    return {
      requestId,
      dataPackage: exportPackage,
      sourceCount: dataSources.length,
      recordCount: this.countRecords(collectedData),
    };
  }
 
  async processErasureRequest(requestId: string): Promise<ErasureRequestResult> {
    const request = await this.dsarStore.getById(requestId);
    
    // Erasure is irreversible, so require the same identity check as access requests
    if (request.verificationStatus !== 'verified') {
      throw new Error('Identity not verified');
    }
    
    // Check for legal exceptions that require retaining data in specific sources
    const exceptions = await this.checkErasureExceptions(request.subjectId);
    // Assumes exception.source holds the same identifier as source.id
    const retainedSourceIds = new Set(exceptions.map(e => e.source));
    
    // Discover all sources, then delete from those without a retention exception
    const dataSources = await this.dataSources.getSourcesWithSubjectData(
      request.subjectId
    );
    
    const deletionResults: DeletionResult[] = [];
    
    for (const source of dataSources) {
      if (retainedSourceIds.has(source.id)) {
        continue;  // Retained for a legal reason; reported in retainedSources below
      }
      
      const result = await source.deleteSubjectData(request.subjectId, {
        includeBackups: true,
        includeLogs: true,
        cryptographicErasure: source.supportsCryptoErasure,
      });
      
      deletionResults.push({
        sourceId: source.id,
        recordsDeleted: result.deletedCount,
        backupsScheduled: result.backupDeletionScheduled,
        completedAt: result.completedAt,
      });
    }
    
    // Notify third parties who received the data
    const thirdParties = await this.getThirdPartyRecipients(request.subjectId);
    for (const party of thirdParties) {
      await this.notifyThirdPartyDeletion(party, request.subjectId);
    }
    
    // Erasure is only partial when some sources had to be retained
    const status = exceptions.length > 0 ? 'partially_completed' : 'completed';
    await this.dsarStore.updateStatus(requestId, status);
    
    return {
      requestId,
      status,
      deletedSources: deletionResults,
      retainedSources: exceptions.map(e => ({
        source: e.source,
        reason: e.legalBasis,
        retentionPeriod: e.retentionRequired,
      })),
      thirdPartiesNotified: thirdParties.length,
    };
  }
 
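  private calculateDeadline(requestType: DSARType): Date {
    const now = new Date();
    // GDPR: one month from receipt, extendable by two further months for complex requests
    // CCPA: 45 days, extendable once by a further 45 days (90 total)
    // A 30-day approximation is applied to every request type, so requestType is unused here
    const baseDays = 30;
    return new Date(now.getTime() + baseDays * 24 * 60 * 60 * 1000);
  }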
 
  private async checkErasureExceptions(
    subjectId: string
  ): Promise<ErasureException[]> {
    const exceptions: ErasureException[] = [];
    
    // Check for legal holds
    const legalHolds = await this.legalHoldService.getActiveHolds(subjectId);
    exceptions.push(...legalHolds.map(h => ({
      source: h.dataSource,
      legalBasis: 'legal_hold',
      retentionRequired: h.holdPeriod,
    })));
    
    // Check for regulatory retention requirements
    const retentionRequirements = await this.getRetentionRequirements(subjectId);
    exceptions.push(...retentionRequirements);
    
    // Check for ongoing contractual obligations
    const contracts = await this.getActiveContracts(subjectId);
    exceptions.push(...contracts.map(c => ({
      source: c.dataSource,
      legalBasis: 'contract_performance',
      retentionRequired: c.contractEndDate,
    })));
    
    return exceptions;
  }
}
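
The deadline comment in the service above names two regimes with different clocks: GDPR counts in months, CCPA in days. The sketch below shows one way the two could be computed separately. It is illustrative only: `Regulation`, `addMonthsUtc`, and `responseDeadline` are hypothetical names, and clamping to the last day of a shorter month is an assumption of this sketch (how a statutory month is counted is ultimately a legal question).

```typescript
// Hypothetical helper types; not part of the service above.
type Regulation = "GDPR" | "CCPA";

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Add calendar months in UTC, clamping to the last day of the target month
// (for example, 31 January plus one month becomes 28 or 29 February).
function addMonthsUtc(date: Date, months: number): Date {
  const result = new Date(date.getTime());
  const day = result.getUTCDate();
  result.setUTCDate(1); // avoid overflow while changing the month
  result.setUTCMonth(result.getUTCMonth() + months);
  const lastDay = new Date(
    Date.UTC(result.getUTCFullYear(), result.getUTCMonth() + 1, 0),
  ).getUTCDate();
  result.setUTCDate(Math.min(day, lastDay));
  return result;
}

function responseDeadline(
  receivedAt: Date,
  regulation: Regulation,
  extended: boolean = false,
): Date {
  if (regulation === "GDPR") {
    // One month, plus two further months when an extension applies.
    return addMonthsUtc(receivedAt, extended ? 3 : 1);
  }
  // CCPA: 45 days, plus one additional 45-day extension.
  return new Date(receivedAt.getTime() + (extended ? 90 : 45) * MS_PER_DAY);
}

const received = new Date(Date.UTC(2024, 0, 31));
console.log(responseDeadline(received, "GDPR").toISOString());
// 2024-02-29T00:00:00.000Z
```

When a request could fall under more than one regime, taking the earliest of the computed deadlines is the conservative choice.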

Erasure Is Harder Than It Sounds

True data erasure requires deletion from production databases, read replicas, caches, search indexes, analytics systems, log files, backup tapes, and any third parties who received the data. Missing even one location can result in regulatory non-compliance. Implement comprehensive data lineage tracking to ensure complete erasure.
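
As a rough illustration of lineage tracking, the sketch below records every location a subject's data is copied to and reports which locations have not yet confirmed deletion. The `LineageRegistry` class, its method names, and the `LocationKind` values are hypothetical; a real registry would be persistent and fed by the data pipelines themselves.

```typescript
// Hypothetical in-memory lineage registry for illustration only.
type LocationKind =
  | "primary_db" | "replica" | "cache" | "search_index"
  | "analytics" | "logs" | "backup" | "third_party";

interface LineageEntry {
  location: string;
  kind: LocationKind;
  erasedAt?: Date;
}

class LineageRegistry {
  private entries = new Map<string, LineageEntry[]>();

  // Call whenever a subject's data is written or copied to a new location.
  recordCopy(subjectId: string, location: string, kind: LocationKind): void {
    const list = this.entries.get(subjectId) ?? [];
    if (!list.some(e => e.location === location)) {
      list.push({ location, kind });
    }
    this.entries.set(subjectId, list);
  }

  // Call after a location confirms that deletion has finished.
  markErased(subjectId: string, location: string, at: Date = new Date()): void {
    const entry = (this.entries.get(subjectId) ?? []).find(
      e => e.location === location,
    );
    if (entry) {
      entry.erasedAt = at;
    }
  }

  // Locations still holding the subject's data; erasure is complete only when empty.
  outstanding(subjectId: string): LineageEntry[] {
    return (this.entries.get(subjectId) ?? []).filter(
      e => e.erasedAt === undefined,
    );
  }
}

const lineage = new LineageRegistry();
lineage.recordCopy("subject-42", "users-db", "primary_db");
lineage.recordCopy("subject-42", "search-cluster", "search_index");
lineage.recordCopy("subject-42", "nightly-backup", "backup");
lineage.markErased("subject-42", "users-db");
console.log(lineage.outstanding("subject-42").map(e => e.location));
// [ 'search-cluster', 'nightly-backup' ]
```

An erasure request would then be closed only once `outstanding` returns an empty list, which turns "did we miss a location?" into a query rather than a guess.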

PII in Logs and Analytics

Logs and analytics systems are often overlooked as PII repositories, yet they frequently contain personal data that was inadvertently included. A single logging statement that includes req.body can expose user data to anyone with log access. Protecting PII in these systems requires proactive sanitization and careful architecture.

Common PII Exposure Points:

Where PII Leaks into Logs

•Request/Response Logging — Full HTTP payloads logged for debugging contain form data, API bodies, and headers with tokens.
•Error Messages — Exception messages often include user context, IDs, and data values that triggered the error.
•Query Logging — Database query logs include parameter values, often including search terms and filter values.
•URL Parameters — Even 'clean' access logs include URLs, which may contain email addresses, IDs, or tokens as query parameters.
•Stack Traces — Debug traces include variable values at each stack level, potentially exposing in-memory PII.
•Analytics Events — User behavior tracking includes identifiers, session data, and contextual information.

log-sanitization.ts
TypeScript
// PII-Safe Logging Service
// Automatically sanitizes logs before persistence
 
interface SanitizationRule {
  pattern: RegExp;
  replacement: string;
  description: string;
}
 
class PIISafeLogger {
  private sanitizationRules: SanitizationRule[];
  private sensitiveFieldPaths: Set<string>;
  private baseLogger: Logger;
 
  constructor(config: LoggerConfig) {
    this.sanitizationRules = this.initializeSanitizationRules();
    this.sensitiveFieldPaths = new Set(config.sensitiveFields || [
      'password', 'token', 'secret', 'apiKey', 'authorization',
      'ssn', 'creditCard', 'email', 'phone', 'dob', 'address',
      'firstName', 'lastName', 'fullName', 'ipAddress',
    ]);
    this.baseLogger = config.baseLogger;
  }
 
  private initializeSanitizationRules(): SanitizationRule[] {
    return [
      // Email addresses
      {
        pattern: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g,
        replacement: '[EMAIL_REDACTED]',
        description: 'Email address',
      },
      // SSN patterns
      {
        pattern: /\b\d{3}-\d{2}-\d{4}\b/g,
        replacement: '[SSN_REDACTED]',
        description: 'Social Security Number',
      },
      // Credit card numbers
      {
        pattern: /\b(?:\d{4}[-\s]?){3}\d{4}\b/g,
        replacement: '[CARD_REDACTED]',
        description: 'Credit card number',
      },
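      // Phone numbers (North American formats, including "(555) 123-4567" and
      // space-separated digits; bare 10-digit numbers will also match)
      {
        pattern: /(?<!\w)(?:\+?1[-.\s]?)?(?:\(\d{3}\)|\d{3})[-.\s]?\d{3}[-.\s]?\d{4}\b/g,
        replacement: '[PHONE_REDACTED]',
        description: 'Phone number',
      },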
      // JWT tokens
      {
        pattern: /eyJ[a-zA-Z0-9_-]*\.[a-zA-Z0-9_-]*\.[a-zA-Z0-9_-]*/g,
        replacement: '[JWT_REDACTED]',
        description: 'JWT token',
      },
      // API keys (common patterns)
      {
        pattern: /api[_-]?key[=:]["']?[a-zA-Z0-9]{20,}["']?/gi,
        replacement: '[API_KEY_REDACTED]',
        description: 'API key',
      },
      // Bearer tokens
      {
        pattern: /Bearer\s+[a-zA-Z0-9._-]+/gi,
        replacement: 'Bearer [TOKEN_REDACTED]',
        description: 'Bearer token',
      },
    ];
  }
 
  log(level: LogLevel, message: string, context?: any): void {
    const sanitizedMessage = this.sanitizeString(message);
    const sanitizedContext = context ? 
      this.sanitizeObject(context) : undefined;
    
    this.baseLogger.log(level, sanitizedMessage, sanitizedContext);
  }
 
  private sanitizeString(input: string): string {
    let result = input;
    
    for (const rule of this.sanitizationRules) {
      result = result.replace(rule.pattern, rule.replacement);
    }
    
    return result;
  }
 
  private sanitizeObject(obj: any, path: string = ''): any {
    if (obj === null || obj === undefined) {
      return obj;
    }
    
    if (typeof obj === 'string') {
      return this.sanitizeString(obj);
    }
    
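    if (typeof obj !== 'object') {
      return obj;
    }
    
    // Dates and Errors have no enumerable own properties, so Object.entries
    // would silently turn them into {}; handle them explicitly
    if (obj instanceof Date) {
      return isNaN(obj.getTime()) ? 'Invalid Date' : obj.toISOString();
    }
    if (obj instanceof Error) {
      return {
        name: obj.name,
        message: this.sanitizeString(obj.message),
        stack: obj.stack ? this.sanitizeString(obj.stack) : undefined,
      };
    }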
    
    if (Array.isArray(obj)) {
      return obj.map((item, index) => 
        this.sanitizeObject(item, `${path}[${index}]`)
      );
    }
    
    const result: any = {};
    
    for (const [key, value] of Object.entries(obj)) {
      const fieldPath = path ? `${path}.${key}` : key;
      const keyLower = key.toLowerCase();
      
      // Check if field name indicates sensitive data
      if (this.isSensitiveFieldName(keyLower)) {
        result[key] = '[REDACTED]';
        continue;
      }
      
      // Recursively sanitize nested objects
      result[key] = this.sanitizeObject(value, fieldPath);
    }
    
    return result;
  }
 
  private isSensitiveFieldName(fieldName: string): boolean {
    for (const sensitive of this.sensitiveFieldPaths) {
      if (fieldName.includes(sensitive.toLowerCase())) {
        return true;
      }
    }
    return false;
  }
 
  // Create sanitized request logging middleware
  createRequestLogger(): RequestHandler {
    return (req, res, next) => {
      const startTime = Date.now();
      
      // Capture sanitized request info
      const requestLog = {
        method: req.method,
        path: this.sanitizeString(req.path),
        query: this.sanitizeObject(req.query),
        // Never log full body - only structure
        bodyKeys: req.body ? Object.keys(req.body) : [],
        headers: this.sanitizeHeaders(req.headers),
        ip: '[IP_LOGGED_SEPARATELY]',  // Log to separate restricted store
      };
      
      res.on('finish', () => {
        this.log('info', 'HTTP Request', {
          ...requestLog,
          status: res.statusCode,
          duration: Date.now() - startTime,
        });
      });
      
      next();
    };
  }
 
  private sanitizeHeaders(headers: any): any {
    const sensitiveHeaders = [
      'authorization', 'cookie', 'x-api-key', 'x-auth-token',
    ];
    
    const result: any = {};
    for (const [key, value] of Object.entries(headers)) {
      if (sensitiveHeaders.includes(key.toLowerCase())) {
        result[key] = '[REDACTED]';
      } else {
        result[key] = this.sanitizeString(String(value));
      }
    }
    return result;
  }
}

Analytics Privacy Patterns:

For analytics, where you need behavioral insights without individual tracking:

Technique	Description	Trade-off
Aggregation	Report only aggregate metrics (counts, averages)	Loses individual behavior patterns
Bucketing	Group continuous values into ranges (age: 25-34)	Reduces precision
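K-Anonymity	Ensure each record is indistinguishable from at least k-1 others on its quasi-identifiers	May require data suppression
Differential Privacy	Add calibrated noise to query results	Reduces accuracy; a stronger privacy guarantee (smaller ε) requires more noise
Pseudonymization	Replace identifiers with tokens that can be re-linked only via a separately held key or mapping	Still counts as personal data under GDPR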
Full Anonymization	Irreversibly remove all identifiers	May reduce data utility significantly
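
Three of the techniques in the table (bucketing, k-anonymity by suppression, and differential privacy for a count) can be sketched in a few lines. This is a simplified illustration under stated assumptions: the `Visit` record shape, the choice of age and ZIP prefix as quasi-identifiers, and all function names are hypothetical, and a production system would normally rely on a vetted privacy library rather than hand-rolled noise.

```typescript
// Hypothetical record shape; age and ZIP code act as quasi-identifiers.
interface Visit { age: number; zip: string; }
interface GeneralizedVisit { ageBucket: string; zipPrefix: string; }

// Bucketing: map an exact age to a ten-year range such as "25-34".
function bucketAge(age: number): string {
  if (age < 15) return "0-14";
  const lower = Math.floor((age - 5) / 10) * 10 + 5;
  return `${lower}-${lower + 9}`;
}

function generalize(v: Visit): GeneralizedVisit {
  return { ageBucket: bucketAge(v.age), zipPrefix: v.zip.slice(0, 3) };
}

// K-anonymity by suppression: keep only records whose quasi-identifier
// combination is shared by at least k records.
function enforceKAnonymity(rows: GeneralizedVisit[], k: number): GeneralizedVisit[] {
  const keyOf = (r: GeneralizedVisit): string => `${r.ageBucket}|${r.zipPrefix}`;
  const counts = new Map<string, number>();
  for (const r of rows) {
    counts.set(keyOf(r), (counts.get(keyOf(r)) ?? 0) + 1);
  }
  return rows.filter(r => (counts.get(keyOf(r)) ?? 0) >= k);
}

// Differential privacy for a counting query: add Laplace noise with scale
// 1/epsilon (a count has sensitivity 1). `uniform` must return a value in
// [0, 1) and is injectable so the function can be tested deterministically.
function noisyCount(
  trueCount: number,
  epsilon: number,
  uniform: () => number = Math.random,
): number {
  // Clamp away from 0 so the logarithm below never receives 0.
  const u = Math.max(uniform(), Number.EPSILON) - 0.5;
  const noise = -(1 / epsilon) * Math.sign(u) * Math.log(1 - 2 * Math.abs(u));
  return trueCount + noise;
}

const visits: Visit[] = [
  { age: 25, zip: "94107" },
  { age: 27, zip: "94110" },
  { age: 31, zip: "94103" },
  { age: 52, zip: "10001" },
];
const safeRows = enforceKAnonymity(visits.map(generalize), 2);
console.log(safeRows.length); // 3: the lone "45-54|100" record is suppressed
console.log(noisyCount(safeRows.length, 1.0)); // a noisy value near 3
```

The trade-offs from the table are visible directly: bucketing discards the exact age, suppression drops the outlier record entirely, and a smaller `epsilon` widens the noise added to the count.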

Prevention Over Remediation

It's far easier to prevent PII from entering logs than to remove it afterward. Implement sanitization at the logging framework level, not as an afterthought. Train developers to never log full request bodies, user objects, or query parameters without explicit sanitization.
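
One way to make prevention the default is to log from an allowlist rather than redact from a denylist: a field is written only if someone has explicitly declared it safe. The sketch below is illustrative, and the event names, field names, and `buildLogPayload` helper are all hypothetical.

```typescript
// Hypothetical allowlist: each event type declares the only fields it may log.
const LOGGABLE_FIELDS = new Map<string, readonly string[]>([
  ["order_created", ["orderId", "itemCount", "currency"]],
  ["login_failed", ["reason", "attemptNumber"]],
]);

type Primitive = string | number | boolean | null;

// Build a log payload from explicitly allowlisted fields only.
// Unknown events, unlisted fields, and nested objects are dropped, so a new
// PII field added to the input never reaches the logs by accident.
function buildLogPayload(
  event: string,
  data: Record<string, unknown>,
): Record<string, Primitive> {
  const allowed = LOGGABLE_FIELDS.get(event) ?? [];
  const payload: Record<string, Primitive> = {};
  for (const field of allowed) {
    const value = data[field];
    const kind = typeof value;
    if (value === null || kind === "string" || kind === "number" || kind === "boolean") {
      payload[field] = value as Primitive;
    }
  }
  return payload;
}

const payload = buildLogPayload("order_created", {
  orderId: "o-1001",
  itemCount: 2,
  currency: "EUR",
  email: "jane@example.com",            // not allowlisted, never logged
  shippingAddress: { city: "Berlin" },  // not allowlisted, never logged
});
console.log(JSON.stringify(payload));
// {"orderId":"o-1001","itemCount":2,"currency":"EUR"}
```

Pattern-based redaction remains useful as a second layer for free-text values, but an allowlist fails safe: forgetting to update it loses a log field instead of leaking one.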

Summary: PII Handling Excellence

Handling PII properly is both a legal requirement and an ethical responsibility. As systems process ever more personal data, the consequences of mishandling grow more severe for individuals and organizations alike. Mastering PII protection is essential for any system designer working with user data.

Key Principles:

PII Handling Principles

•Understand the PII spectrum — From direct identifiers to quasi-identifiers, sensitivity varies. Design protections proportionate to risk.
•Know your regulatory landscape — GDPR, CCPA, HIPAA, and other frameworks impose specific requirements. Design for the strictest applicable regulation.
•Discovery before protection — You cannot protect what you don't know exists. Implement continuous PII discovery across all data stores.
•Layer protection controls — Access control, encryption, masking, tokenization, and audit logging provide defense in depth.
•Manage consent systematically — Consent capture, storage, and enforcement require dedicated infrastructure, not ad-hoc solutions.
•Implement data subject rights — Access, deletion, correction, and portability require cross-system orchestration and careful process design.
•Protect logs and analytics — These systems often contain more PII than realized. Implement proactive sanitization and privacy-preserving analytics.

Next Steps:

With PII handling understood, we'll explore techniques for protecting data while maintaining utility. The next page covers Data Masking and Tokenization—methods for using data in non-production environments and across systems without exposing actual sensitive values.

Page Complete

You now understand PII as a category of data requiring specialized protection. You can identify PII across its many forms, navigate the regulatory landscape, implement discovery and protection controls, manage consent, and fulfill data subject rights. Next, we'll explore data masking and tokenization as techniques for protecting data while maintaining functionality.