Personally Identifiable Information (PII) represents a unique category of sensitive data that demands specialized handling. Unlike proprietary business data—where exposure causes organizational harm—PII exposure causes harm to individuals: identity theft, discrimination, stalking, financial fraud, and loss of privacy. This creates both legal obligations under privacy regulations and ethical responsibilities that go beyond compliance checklists.
Every system that processes user data—which means virtually every modern application—must grapple with PII handling. Whether you're building a simple signup form or a complex data analytics platform, the moment you collect a name, email address, or IP address, you've entered the realm of PII governance.
By the end of this page, you will be able to identify PII across its many forms, navigate the legal and regulatory frameworks that govern it, apply technical controls for protecting PII in distributed systems, and design architectures that minimize PII exposure while maintaining functionality.
PII is any information that can be used to identify, locate, or contact a specific individual—either directly or when combined with other information. The definition varies slightly across regulations, but the core principle remains: if data can be traced back to a person, it requires protection.
The PII Spectrum:
Not all PII is equally sensitive. Understanding this spectrum helps apply proportionate protection:
| Category | Examples | Risk Level | Typical Controls |
|---|---|---|---|
| Direct Identifiers | Full name, SSN, passport number, driver's license | Very High | Encryption, strict access control, audit logging |
| Contact Information | Email address, phone number, physical address | High | Encryption, need-to-know access, masking in logs |
| Financial Data | Credit card numbers, bank accounts, income | Very High | PCI DSS compliance, tokenization, isolated storage |
| Health Information | Medical records, prescriptions, diagnoses | Very High | HIPAA compliance, specialized access controls |
| Biometrics | Fingerprints, facial recognition, voice prints | Critical | Cannot be changed if compromised, strictest controls |
| Quasi-Identifiers | ZIP code, birth date, gender, occupation | Medium→High | Low individually, dangerous when combined |
| Online Identifiers | IP addresses, device IDs, cookies | Medium | Context-dependent sensitivity, session-based handling |
| Behavioral Data | Purchase history, browsing patterns, location | Medium→High | Aggregation risk, pattern inference protection |
The Linkability Problem:
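The most challenging aspect of PII is linkability: the ability to combine seemingly innocuous data points to identify individuals. Research has repeatedly demonstrated this risk. A widely cited example is Latanya Sweeney's finding that an estimated 87% of the U.S. population can be uniquely identified by the combination of 5-digit ZIP code, gender, and date of birth alone.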
This means PII protection extends beyond obvious identifiers. Any data that could potentially be linked to identify individuals requires careful handling.
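To make the linkability risk concrete, the sketch below measures how many records in a dataset are unique on a combination of quasi-identifiers. The `QuasiRecord` shape, its field names, and the sample values are illustrative assumptions, not part of any schema discussed on this page.

```typescript
// Hypothetical record shape; the field names are assumptions for illustration.
interface QuasiRecord {
  zip: string;
  birthDate: string; // ISO date, e.g. "1985-03-14"
  gender: string;
}

// Count how many records share each quasi-identifier combination.
function groupSizes(records: QuasiRecord[]): Map<string, number> {
  const sizes = new Map<string, number>();
  for (const r of records) {
    const key = [r.zip, r.birthDate, r.gender].join("|");
    sizes.set(key, (sizes.get(key) ?? 0) + 1);
  }
  return sizes;
}

// Fraction of records whose combination appears exactly once in the dataset.
// A high value means the "anonymous" columns single out most individuals.
function uniquenessRate(records: QuasiRecord[]): number {
  if (records.length === 0) return 0;
  let unique = 0;
  groupSizes(records).forEach((size) => {
    if (size === 1) unique += 1;
  });
  return unique / records.length;
}

const sample: QuasiRecord[] = [
  { zip: "02139", birthDate: "1985-03-14", gender: "F" },
  { zip: "02139", birthDate: "1985-03-14", gender: "F" },
  { zip: "02139", birthDate: "1990-07-01", gender: "M" },
  { zip: "94103", birthDate: "1978-11-23", gender: "F" },
];

console.log(uniquenessRate(sample)); // 0.5 (two of the four records are unique)
```

Running a check like this before releasing a dataset gives a rough signal of whether quasi-identifiers need to be generalized or suppressed first.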
The regulatory trend is toward broader PII definitions. GDPR considers any data that 'relates to' an identified or identifiable person as personal data. If there's any reasonable possibility that data could be linked to an individual, apply PII protections.
PII handling is governed by a complex patchwork of regulations that vary by jurisdiction, industry, and data type. Modern systems often must comply with multiple overlapping frameworks simultaneously. Understanding these requirements is essential for compliant system design.
Key Global Privacy Regulations:
| Requirement | GDPR | CCPA/CPRA | HIPAA |
|---|---|---|---|
| Consent Required | Yes (or other lawful basis) | Opt-out model for sale | Authorization for disclosure |
| Right to Access | Yes | Yes | Yes |
| Right to Delete | Yes (with exceptions) | Yes (with exceptions) | No (right to request amendment only) |
| Data Portability | Yes (machine-readable) | Yes (specific format) | No |
| Breach Notification | 72 hours to authority | Without unreasonable delay | Within 60 days of discovery |
| Privacy Officer Required | DPO in certain cases | No | Yes (designated privacy official) |
| Cross-Border Transfer Limits | Adequacy decisions/SCCs | No | Business Associate Agreements |
| Penalties | Up to €20M or 4% of global annual revenue, whichever is higher | Up to $2,500 per violation ($7,500 if intentional) | Up to $1.5M per year per violation category |
Key Principles Across Regulations:
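Despite variation in specifics, privacy regulations share common principles that guide system design, including purpose limitation (use data only for the purposes stated at collection), data minimization (collect only what is needed), storage limitation (delete data once it is no longer needed), security of processing, transparency, and accountability.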
When operating across jurisdictions, design systems to meet the strictest applicable requirement. It's easier to relax controls where permitted than to retrofit stricter controls later. GDPR is often the benchmark for global systems.
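The "strictest applicable requirement" rule can be sketched as a merge over per-regulation profiles. The profile fields and the `strictestRequirements` helper are hypothetical. The numbers are the ones quoted on this page (72-hour and 60-day breach notification, 30-day and 45-day request deadlines) and should be confirmed against current legal guidance before use.

```typescript
// Hypothetical per-regulation profile; undefined means "no fixed deadline modeled here".
interface RegulationProfile {
  breachNotificationHours?: number;
  dsarResponseDays?: number;
}

interface StrictestRequirements {
  breachNotificationHours: number | undefined;
  dsarResponseDays: number | undefined;
}

// Illustrative values taken from the comparison on this page.
const PROFILES: Record<string, RegulationProfile> = {
  GDPR: { breachNotificationHours: 72, dsarResponseDays: 30 },
  CCPA: { dsarResponseDays: 45 },
  HIPAA: { breachNotificationHours: 60 * 24 },
};

// Smallest defined value, or undefined when no regulation sets one.
function minDefined(values: Array<number | undefined>): number | undefined {
  const defined = values.filter((v): v is number => v !== undefined);
  return defined.length > 0 ? Math.min(...defined) : undefined;
}

// For deadlines, "strictest" means the shortest one across all applicable regulations.
function strictestRequirements(applicable: string[]): StrictestRequirements {
  const profiles = applicable.map((name) => {
    const profile = PROFILES[name];
    if (!profile) throw new Error(`Unknown regulation: ${name}`);
    return profile;
  });
  return {
    breachNotificationHours: minDefined(profiles.map((p) => p.breachNotificationHours)),
    dsarResponseDays: minDefined(profiles.map((p) => p.dsarResponseDays)),
  };
}

console.log(strictestRequirements(["GDPR", "CCPA", "HIPAA"]));
```

Deadlines are only one dimension. Requirements such as consent models or transfer restrictions do not reduce to a single number and need case-by-case review.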
You cannot protect what you don't know exists. PII discovery is the systematic process of identifying where personal data resides across your systems. In modern distributed architectures, PII often spreads across databases, file systems, logs, caches, analytics systems, third-party integrations, and backup storage. Comprehensive discovery requires multiple approaches.
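Pattern-based detection is one of these approaches, and on its own it tends to over-match: any 16-digit order number looks like a card number to a regular expression. A common refinement is to validate candidates with the Luhn checksum before recording a finding. The sketch below assumes a deliberately loose candidate pattern, and the `findCardNumbers` helper is hypothetical.

```typescript
// Luhn checksum: double every second digit from the right, subtract 9 from
// results above 9, and require the total to be divisible by 10.
function passesLuhn(candidate: string): boolean {
  const digits = candidate.replace(/[\s-]/g, "");
  if (!/^\d{13,19}$/.test(digits)) return false;
  let sum = 0;
  let doubleNext = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // '0' is char code 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

// Find 13-19 digit runs (optionally separated by spaces or hyphens) and keep
// only those that pass the checksum.
function findCardNumbers(text: string): string[] {
  const candidates = text.match(/\b(?:\d[ -]?){12,18}\d\b/g) ?? [];
  return candidates.filter(passesLuhn);
}

const line = "order 4111 1111 1111 1111 ref 1234567890123456";
console.log(findCardNumbers(line)); // [ '4111 1111 1111 1111' ]
```

The checksum removes most random digit strings but not all of them (roughly one in ten passes by chance), so findings still benefit from a confidence score and manual review.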
```typescript
// PII Discovery and Inventory Service
// Automatically discovers and catalogs PII across data sources.
// NERModel, PIIInventoryRepository, DatabaseConnection, LogSource, TimeRange,
// and the result/alert types are assumed to be defined elsewhere.

interface PIIInventoryItem {
  dataSourceId: string;
  location: string; // e.g., "users.email", "logs/app.log:line:1234"
  piiType: PIIType;
  confidence: number;
  discoveryMethod: 'schema' | 'pattern' | 'ml' | 'manual';
  dataSubjectCategory: string; // e.g., 'customer', 'employee', 'prospect'
  processingPurpose: string[];
  retentionPeriod: string;
  lastScanned: Date;
}

enum PIIType {
  EMAIL = 'email',
  PHONE = 'phone_number',
  SSN = 'social_security_number',
  NAME = 'full_name',
  ADDRESS = 'physical_address',
  DOB = 'date_of_birth',
  IP_ADDRESS = 'ip_address',
  CREDIT_CARD = 'credit_card',
  HEALTH_DATA = 'health_information',
  BIOMETRIC = 'biometric_data',
  FINANCIAL = 'financial_data',
  LOCATION = 'location_data',
}

class PIIDiscoveryService {
  private patternDetectors: Map<PIIType, RegExp[]>;
  private mlClassifier: NERModel;
  private inventoryStore: PIIInventoryRepository;

  constructor() {
    this.patternDetectors = this.initializePatterns();
    this.mlClassifier = new NERModel('pii-detection-v3');
    this.inventoryStore = new PIIInventoryRepository();
  }

  private initializePatterns(): Map<PIIType, RegExp[]> {
    // Explicit type arguments keep TypeScript from widening the tuple entries
    return new Map<PIIType, RegExp[]>([
      [PIIType.EMAIL, [
        /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g,
      ]],
      [PIIType.SSN, [
        /\b\d{3}-\d{2}-\d{4}\b/g,
        /\b\d{9}\b/g, // Unformatted SSN (high false-positive rate)
      ]],
      [PIIType.CREDIT_CARD, [
        /\b4[0-9]{12}(?:[0-9]{3})?\b/g, // Visa
        /\b5[1-5][0-9]{14}\b/g, // Mastercard
        /\b3[47][0-9]{13}\b/g, // Amex
      ]],
      [PIIType.PHONE, [
        // No leading \b: a word boundary cannot precede "+" or "(" after a space
        /(?<!\d)(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b/g,
      ]],
      [PIIType.IP_ADDRESS, [
        /\b(?:\d{1,3}\.){3}\d{1,3}\b/g, // IPv4
        /\b(?:[a-fA-F0-9]{1,4}:){7}[a-fA-F0-9]{1,4}\b/g, // IPv6 (full form only)
      ]],
    ]);
  }

  async scanDatabaseSchema(
    connectionInfo: DatabaseConnection
  ): Promise<PIIInventoryItem[]> {
    const discoveries: PIIInventoryItem[] = [];
    const schema = await this.fetchDatabaseSchema(connectionInfo);

    for (const table of schema.tables) {
      for (const column of table.columns) {
        const piiIndicators = this.analyzeColumnName(column.name);
        if (piiIndicators.length === 0) continue;

        // Schema name suggests PII - verify with data sampling
        const sampleData = await this.sampleColumn(
          connectionInfo,
          table.name,
          column.name,
          100 // Sample size
        );
        const confirmedTypes = await this.confirmPIIWithPatterns(sampleData);

        for (const piiType of confirmedTypes) {
          discoveries.push({
            dataSourceId: connectionInfo.id,
            location: `${table.name}.${column.name}`,
            piiType: piiType.type,
            confidence: piiType.confidence,
            discoveryMethod: 'schema',
            dataSubjectCategory: this.inferDataSubject(table.name),
            processingPurpose: [], // Requires manual annotation
            retentionPeriod: 'unspecified',
            lastScanned: new Date(),
          });
        }
      }
    }
    return discoveries;
  }

  async scanLogs(
    logSource: LogSource,
    timeRange: TimeRange
  ): Promise<PIIDiscoveryResult> {
    const discoveries: PIIInventoryItem[] = [];
    const alertableFindings: PIIAlert[] = [];
    const logStream = await this.createLogStream(logSource, timeRange);

    for await (const logEntry of logStream) {
      const lineFindings = await this.scanText(logEntry.message);

      for (const finding of lineFindings) {
        // PII in logs is always concerning - alert immediately
        alertableFindings.push({
          severity: 'high',
          location: `${logSource.name}:${logEntry.timestamp}`,
          piiType: finding.type,
          recommendation: 'Implement log sanitization',
          sample: this.redactForAlert(finding.match),
        });

        discoveries.push({
          dataSourceId: logSource.id,
          location: `logs/${logSource.name}`,
          piiType: finding.type,
          confidence: finding.confidence,
          discoveryMethod: 'pattern',
          dataSubjectCategory: 'unknown',
          processingPurpose: ['logging'],
          retentionPeriod: logSource.retentionPolicy,
          lastScanned: new Date(),
        });
      }
    }
    return { discoveries, alerts: alertableFindings };
  }

  private analyzeColumnName(name: string): PIIType[] {
    const indicators: PIIType[] = [];
    const lowerName = name.toLowerCase();

    // Substring matching is deliberately broad (e.g. 'ip' also matches 'zip');
    // candidates are confirmed by data sampling in scanDatabaseSchema.
    const mappings: [string[], PIIType][] = [
      [['email', 'mail', 'e_mail'], PIIType.EMAIL],
      [['phone', 'mobile', 'cell', 'tel'], PIIType.PHONE],
      [['ssn', 'social_security', 'taxpayer_id'], PIIType.SSN],
      [['name', 'first_name', 'last_name', 'full_name'], PIIType.NAME],
      [['address', 'street', 'city', 'zip', 'postal'], PIIType.ADDRESS],
      [['dob', 'birth', 'birthday', 'date_of_birth'], PIIType.DOB],
      [['ip', 'ip_address', 'client_ip'], PIIType.IP_ADDRESS],
      [['card', 'credit', 'payment'], PIIType.CREDIT_CARD],
    ];

    for (const [patterns, piiType] of mappings) {
      if (patterns.some(p => lowerName.includes(p))) {
        indicators.push(piiType);
      }
    }
    return indicators;
  }

  async generateInventoryReport(): Promise<PIIInventoryReport> {
    const inventory = await this.inventoryStore.getAll();

    return {
      totalPIILocations: inventory.length,
      byType: this.groupBy(inventory, 'piiType'),
      byDataSource: this.groupBy(inventory, 'dataSourceId'),
      byDataSubject: this.groupBy(inventory, 'dataSubjectCategory'),
      highRiskLocations: inventory.filter(
        i =>
          i.piiType === PIIType.SSN ||
          i.piiType === PIIType.HEALTH_DATA ||
          i.piiType === PIIType.BIOMETRIC
      ),
      unclassifiedLocations: inventory.filter(
        i => !i.processingPurpose || i.processingPurpose.length === 0
      ),
      reportGeneratedAt: new Date(),
    };
  }

  // Helper methods (fetchDatabaseSchema, sampleColumn, confirmPIIWithPatterns,
  // inferDataSubject, createLogStream, scanText, redactForAlert, groupBy) omitted.
}
```

PII discovery isn't a one-time activity. New features add new data fields, schema migrations change storage locations, and developers inadvertently log sensitive data. Implement continuous scanning with alerts for newly discovered PII.
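One way to turn repeated scans into alerts for newly discovered PII is to diff each scan against the previous inventory. This is a minimal sketch. The `InventoryEntry` shape is a trimmed-down, hypothetical stand-in for the inventory items above, and `findNewPIILocations` is an illustrative helper rather than part of the service.

```typescript
// Hypothetical, reduced inventory entry used only for comparison between scans.
interface InventoryEntry {
  dataSourceId: string;
  location: string;
  piiType: string;
}

// Identity of a finding: same source, same location, same PII type.
function inventoryKey(entry: InventoryEntry): string {
  return `${entry.dataSourceId}::${entry.location}::${entry.piiType}`;
}

// Return entries present in the current scan but absent from the previous one,
// without reporting the same new finding twice.
function findNewPIILocations(
  previous: InventoryEntry[],
  current: InventoryEntry[]
): InventoryEntry[] {
  const seen = new Set(previous.map(inventoryKey));
  const fresh: InventoryEntry[] = [];
  for (const entry of current) {
    const key = inventoryKey(entry);
    if (!seen.has(key)) {
      seen.add(key);
      fresh.push(entry);
    }
  }
  return fresh;
}

const lastWeek: InventoryEntry[] = [
  { dataSourceId: "db-main", location: "users.email", piiType: "email" },
];
const today: InventoryEntry[] = [
  { dataSourceId: "db-main", location: "users.email", piiType: "email" },
  { dataSourceId: "db-main", location: "orders.notes", piiType: "phone_number" },
  { dataSourceId: "db-main", location: "orders.notes", piiType: "phone_number" },
];

console.log(findNewPIILocations(lastWeek, today)); // one entry: orders.notes
```

Each new entry is a candidate for an alert and for manual annotation of its processing purpose and retention period.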
Once you've identified where PII exists, you must apply appropriate protection controls. These controls operate at multiple layers, providing defense in depth against various threat vectors.
Layered Protection Architecture:
```typescript
// PII-Aware API Middleware
// Applies protection controls based on data sensitivity and user authorization.
// APIRequest, APIResponse, the policy/token/audit services, and ConsentRequiredError
// are assumed to be defined elsewhere.

interface PIIAccessContext {
  userId: string;
  userRoles: string[];
  accessPurpose: string;
  clientIP: string;
  requestId: string;
  consentScope?: string[];
}

type PIIAccessLevel = 'full' | 'masked' | 'tokenized' | 'denied';

interface PIIField {
  fieldPath: string;
  piiType: PIIType;
  accessLevel: PIIAccessLevel;
}

class PIIProtectionMiddleware {
  // Dependencies are injected so the fields are always initialized
  constructor(
    private readonly accessPolicy: PIIAccessPolicyEngine,
    private readonly maskingService: DataMaskingService,
    private readonly tokenService: TokenizationService,
    private readonly auditLogger: PIIAuditLogger
  ) {}

  async processRequest(
    request: APIRequest,
    context: PIIAccessContext
  ): Promise<APIRequest> {
    // Check if request contains PII
    const piiFields = await this.detectPIIInRequest(request.body);

    if (piiFields.length > 0) {
      // Verify consent for each PII field
      for (const field of piiFields) {
        const consentValid = await this.verifyConsent(
          field,
          context.accessPurpose,
          context.consentScope
        );
        if (!consentValid) {
          throw new ConsentRequiredError(
            `Consent not provided for ${field.piiType} processing`
          );
        }
      }

      // Log PII write operation
      await this.auditLogger.logPIIWrite({
        requestId: context.requestId,
        userId: context.userId,
        piiTypes: piiFields.map(f => f.piiType),
        purpose: context.accessPurpose,
        timestamp: new Date(),
      });
    }
    return request;
  }

  async processResponse(
    response: APIResponse,
    context: PIIAccessContext
  ): Promise<APIResponse> {
    const piiFields = await this.detectPIIInResponse(response.body);
    if (piiFields.length === 0) {
      return response;
    }

    // Determine access level for each PII field
    const fieldAccessLevels = await Promise.all(
      piiFields.map(field => this.determineAccessLevel(field, context))
    );

    // Apply appropriate protection to each field
    let protectedBody = response.body;
    for (let i = 0; i < piiFields.length; i++) {
      const field = piiFields[i];
      const accessLevel = fieldAccessLevels[i];

      switch (accessLevel) {
        case 'denied':
          protectedBody = this.removeField(protectedBody, field.fieldPath);
          break;
        case 'masked':
          protectedBody = await this.maskField(
            protectedBody,
            field.fieldPath,
            field.piiType
          );
          break;
        case 'tokenized':
          protectedBody = await this.tokenizeField(
            protectedBody,
            field.fieldPath
          );
          break;
        case 'full':
          // No transformation needed; the access is still audited below
          break;
      }
    }

    // Log PII access
    await this.auditLogger.logPIIAccess({
      requestId: context.requestId,
      userId: context.userId,
      piiFields: piiFields.map((f, i) => ({
        type: f.piiType,
        accessLevel: fieldAccessLevels[i],
      })),
      purpose: context.accessPurpose,
      timestamp: new Date(),
    });

    return { ...response, body: protectedBody };
  }

  private async determineAccessLevel(
    field: PIIField,
    context: PIIAccessContext
  ): Promise<PIIAccessLevel> {
    // Check explicit permission for this PII type
    const permission = await this.accessPolicy.checkPermission(
      context.userRoles,
      field.piiType,
      context.accessPurpose
    );
    if (!permission.allowed) {
      return 'denied';
    }

    // Determine appropriate access level based on permission scope
    if (permission.scope === 'full') {
      return 'full';
    }

    // For partial access, mask or tokenize based on use case
    if (context.accessPurpose === 'analytics') {
      return 'tokenized'; // Preserve linkability without exposing value
    }
    return 'masked'; // Show partial value for verification
  }

  private async maskField(
    data: any,
    fieldPath: string,
    piiType: PIIType
  ): Promise<any> {
    const value = this.getFieldValue(data, fieldPath);
    if (!value) return data;

    const maskedValue = this.maskingService.mask(value, piiType);
    return this.setFieldValue(data, fieldPath, maskedValue);
  }

  // Helper methods (detectPIIInRequest, detectPIIInResponse, verifyConsent,
  // removeField, tokenizeField, getFieldValue, setFieldValue) omitted.
}

// Masking strategies by PII type
class DataMaskingService {
  mask(value: string, piiType: PIIType): string {
    switch (piiType) {
      case PIIType.EMAIL: {
        // j***@example.com
        const atIndex = value.lastIndexOf('@');
        if (atIndex < 1) return '*'.repeat(value.length); // Not a well-formed address
        return `${value[0]}***@${value.slice(atIndex + 1)}`;
      }
      case PIIType.SSN:
        // ***-**-1234
        return `***-**-${value.slice(-4)}`;
      case PIIType.CREDIT_CARD:
        // **** **** **** 1234
        return `**** **** **** ${value.slice(-4)}`;
      case PIIType.PHONE:
        // (***) ***-1234
        return `(***) ***-${value.slice(-4)}`;
      case PIIType.NAME:
        // J*** D**
        return value
          .split(' ')
          .filter(part => part.length > 0) // Skip empty parts from repeated spaces
          .map(part => `${part[0]}${'*'.repeat(Math.min(part.length - 1, 3))}`)
          .join(' ');
      default:
        // Generic redaction
        return '*'.repeat(value.length);
    }
  }
}
```

Masked data (showing last 4 digits of SSN, for example) is still PII. The partial data plus context may still identify individuals. Use masking for display purposes but implement proper access controls and logging as if dealing with full values.
Privacy regulations require documented consent for processing personal data (or other lawful bases under GDPR). Managing consent at scale requires dedicated infrastructure that captures, stores, and enforces user preferences across distributed systems.
Consent Management Requirements:
```typescript
// Enterprise Consent Management Platform
// Captures, stores, and enforces user consent across distributed systems.
// ConsentRepository, EventBus, PrivacyPolicyService, generateId, and the
// result types are assumed to be defined elsewhere.

interface ConsentRecord {
  id: string; // Returned to callers as proof of capture
  subjectId: string; // The data subject (user)
  purpose: ProcessingPurpose;
  status: 'granted' | 'withdrawn' | 'expired';
  grantedAt: Date;
  withdrawnAt?: Date;
  expiresAt?: Date;
  policyVersion: string;
  captureMethod: 'web_form' | 'api' | 'verbal' | 'written';
  captureContext: {
    ipAddress: string;
    userAgent: string;
    formId?: string;
    legalText: string;
  };
}

enum ProcessingPurpose {
  SERVICE_DELIVERY = 'service_delivery',
  MARKETING_EMAIL = 'marketing_email',
  ANALYTICS = 'analytics',
  PERSONALIZATION = 'personalization',
  THIRD_PARTY_SHARING = 'third_party_sharing',
  RESEARCH = 'research',
}

class ConsentManagementService {
  constructor(
    private readonly consentStore: ConsentRepository,
    private readonly eventBus: EventBus,
    private readonly policyService: PrivacyPolicyService
  ) {}

  async grantConsent(
    subjectId: string,
    purposes: ProcessingPurpose[],
    captureContext: ConsentCaptureContext
  ): Promise<ConsentGrantResult> {
    const currentPolicy = await this.policyService.getCurrentPolicy();
    const timestamp = new Date();

    const records = purposes.map((purpose): ConsentRecord => ({
      id: generateId(),
      subjectId,
      purpose,
      status: 'granted',
      grantedAt: timestamp,
      policyVersion: currentPolicy.version,
      captureMethod: captureContext.method,
      captureContext: {
        ipAddress: captureContext.ipAddress,
        userAgent: captureContext.userAgent,
        formId: captureContext.formId,
        legalText: currentPolicy.getConsentText(purpose),
      },
    }));

    // Atomically store all consent records
    await this.consentStore.storeConsents(records);

    // Publish consent events for downstream systems
    for (const record of records) {
      await this.eventBus.publish('consent.granted', {
        subjectId,
        purpose: record.purpose,
        timestamp,
        policyVersion: currentPolicy.version,
      });
    }

    return {
      success: true,
      consentIds: records.map(r => r.id),
      effectiveDate: timestamp,
    };
  }

  async withdrawConsent(
    subjectId: string,
    purposes: ProcessingPurpose[]
  ): Promise<ConsentWithdrawalResult> {
    const timestamp = new Date();

    // Mark consents as withdrawn
    await this.consentStore.withdrawConsents(subjectId, purposes, timestamp);

    // Publish withdrawal events - downstream systems must react
    for (const purpose of purposes) {
      await this.eventBus.publish('consent.withdrawn', {
        subjectId,
        purpose,
        timestamp,
        // Include action requirements for downstream systems
        requiredActions: this.getRequiredActionsForWithdrawal(purpose),
      });
    }

    return {
      success: true,
      withdrawnPurposes: purposes,
      effectiveDate: timestamp,
    };
  }

  async checkConsent(
    subjectId: string,
    purpose: ProcessingPurpose
  ): Promise<ConsentCheckResult> {
    const consent = await this.consentStore.getActiveConsent(
      subjectId,
      purpose
    );

    if (!consent) {
      return {
        hasConsent: false,
        reason: 'no_consent_on_record',
      };
    }

    if (consent.status === 'withdrawn') {
      return {
        hasConsent: false,
        reason: 'consent_withdrawn',
        withdrawnAt: consent.withdrawnAt,
      };
    }

    if (consent.expiresAt && consent.expiresAt < new Date()) {
      return {
        hasConsent: false,
        reason: 'consent_expired',
        expiredAt: consent.expiresAt,
      };
    }

    // Check if policy version requires re-consent
    const currentPolicy = await this.policyService.getCurrentPolicy();
    if (this.requiresReconsent(consent.policyVersion, currentPolicy.version)) {
      return {
        hasConsent: false,
        reason: 'policy_updated_requires_reconsent',
        previousPolicyVersion: consent.policyVersion,
        currentPolicyVersion: currentPolicy.version,
      };
    }

    return {
      hasConsent: true,
      grantedAt: consent.grantedAt,
      policyVersion: consent.policyVersion,
    };
  }

  async getSubjectConsents(
    subjectId: string
  ): Promise<ConsentRecord[]> {
    // For data subject access requests (DSAR)
    return this.consentStore.getAllConsents(subjectId);
  }

  private getRequiredActionsForWithdrawal(
    purpose: ProcessingPurpose
  ): RequiredAction[] {
    // Partial: not every purpose has downstream actions mapped
    const actionMap: Partial<Record<ProcessingPurpose, RequiredAction[]>> = {
      [ProcessingPurpose.MARKETING_EMAIL]: [
        { system: 'email_platform', action: 'unsubscribe' },
        { system: 'crm', action: 'update_preferences' },
      ],
      [ProcessingPurpose.ANALYTICS]: [
        { system: 'analytics', action: 'stop_tracking' },
        { system: 'data_warehouse', action: 'exclude_from_analysis' },
      ],
      [ProcessingPurpose.THIRD_PARTY_SHARING]: [
        { system: 'integration_platform', action: 'revoke_data_shares' },
        { system: 'partners', action: 'notify_data_deletion' },
      ],
      // ... other mappings
    };
    return actionMap[purpose] ?? [];
  }

  // requiresReconsent(previousVersion, currentVersion) omitted.
}
```

Consent Propagation Challenge:
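In distributed systems, consent granted in one service must be respected across all services that process the data. This requires a single authoritative consent record, reliable propagation of grant and withdrawal events (such as consent.granted and consent.withdrawn) to every downstream system, and a consent check at each point where the data is processed.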
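A downstream service might honor propagated consent by keeping a small local cache that it updates from consent events and consults before processing. The sketch below is one possible shape under stated assumptions: the `LocalConsentCache` class is hypothetical, timestamps are modeled as epoch milliseconds, and events may arrive out of order.

```typescript
// Assumed event shape, reusing the consent.granted / consent.withdrawn names.
interface ConsentEvent {
  type: "consent.granted" | "consent.withdrawn";
  subjectId: string;
  purpose: string;
  timestamp: number; // epoch milliseconds (an assumption for this sketch)
}

class LocalConsentCache {
  private readonly state = new Map<string, { granted: boolean; at: number }>();

  // Apply an event unless a newer (or equally recent) one was already applied.
  // This keeps a late-arriving grant from overriding a more recent withdrawal.
  apply(event: ConsentEvent): void {
    const key = `${event.subjectId}:${event.purpose}`;
    const existing = this.state.get(key);
    if (existing && existing.at >= event.timestamp) return;
    this.state.set(key, {
      granted: event.type === "consent.granted",
      at: event.timestamp,
    });
  }

  // Default deny: no record means no consent.
  isAllowed(subjectId: string, purpose: string): boolean {
    return this.state.get(`${subjectId}:${purpose}`)?.granted ?? false;
  }
}

const cache = new LocalConsentCache();
cache.apply({ type: "consent.granted", subjectId: "u1", purpose: "analytics", timestamp: 100 });
cache.apply({ type: "consent.withdrawn", subjectId: "u1", purpose: "analytics", timestamp: 200 });
// A delayed duplicate of an older grant must not re-enable processing.
cache.apply({ type: "consent.granted", subjectId: "u1", purpose: "analytics", timestamp: 150 });

console.log(cache.isAllowed("u1", "analytics")); // false
```

A cache like this trades freshness for latency, so services handling high-risk processing may still prefer a synchronous check against the authoritative consent store.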
Privacy regulations grant individuals rights over their personal data. Implementing these rights at scale requires systematic processes, cross-system coordination, and careful handling to meet regulatory timelines.
Key Data Subject Rights:
| Right | Description | Regulatory Source | Implementation Challenge |
|---|---|---|---|
| Right to Access | Obtain copy of all personal data held | GDPR Art. 15, CCPA | Data discovery across distributed systems |
| Right to Rectification | Correct inaccurate personal data | GDPR Art. 16 | Propagating corrections to replicas/caches |
| Right to Erasure | 'Right to be forgotten' - delete personal data | GDPR Art. 17, CCPA | Complete deletion from all systems/backups |
| Right to Portability | Receive data in machine-readable format | GDPR Art. 20 | Standardized export format |
| Right to Restriction | Limit processing of disputed data | GDPR Art. 18 | Processing flags across systems |
| Right to Object | Object to certain processing (e.g., marketing) | GDPR Art. 21 | Processing purpose enforcement |
```typescript
// Data Subject Access Request (DSAR) Processing
// Orchestrates cross-system requests for data subject rights.
// DSARStatus, the repository/registry/service types, generateId, and the
// result types are assumed to be defined elsewhere.

interface DSARRequest {
  id: string;
  subjectId: string;
  requestType: DSARType;
  requestedAt: Date;
  deadline: Date; // Regulatory deadline
  status: DSARStatus;
  verificationStatus: 'pending' | 'verified' | 'failed';
}

enum DSARType {
  ACCESS = 'access',
  ERASURE = 'erasure',
  RECTIFICATION = 'rectification',
  PORTABILITY = 'portability',
  RESTRICTION = 'restriction',
  OBJECTION = 'objection',
}

class DSARProcessingService {
  constructor(
    private readonly dsarStore: DSARRepository,
    private readonly identityVerifier: IdentityVerificationService,
    private readonly dataSources: DataSourceRegistry,
    private readonly notificationService: NotificationService,
    private readonly legalHoldService: LegalHoldService // Used by checkErasureExceptions
  ) {}

  async submitRequest(
    subjectEmail: string,
    requestType: DSARType,
    details: DSARDetails
  ): Promise<DSARSubmissionResult> {
    // Create request record
    const request: DSARRequest = {
      id: generateId(),
      subjectId: await this.resolveSubjectId(subjectEmail),
      requestType,
      requestedAt: new Date(),
      deadline: this.calculateDeadline(requestType),
      status: 'received',
      verificationStatus: 'pending',
    };
    await this.dsarStore.create(request);

    // Initiate identity verification
    await this.identityVerifier.initiateVerification(
      request.id,
      subjectEmail,
      requestType
    );

    // Notify privacy team
    await this.notificationService.notifyPrivacyTeam('dsar_received', request);

    return {
      requestId: request.id,
      deadline: request.deadline,
      nextStep: 'identity_verification',
    };
  }

  async processAccessRequest(requestId: string): Promise<AccessRequestResult> {
    const request = await this.dsarStore.getById(requestId);
    if (request.verificationStatus !== 'verified') {
      throw new Error('Identity not verified');
    }

    // Discover all data sources containing this subject's data
    const dataSources = await this.dataSources.getSourcesWithSubjectData(
      request.subjectId
    );

    // Collect data from each source
    const collectedData: SubjectDataPackage = {
      requestId,
      subjectId: request.subjectId,
      generatedAt: new Date(),
      categories: [],
    };

    for (const source of dataSources) {
      try {
        const sourceData = await source.extractSubjectData(request.subjectId);
        collectedData.categories.push({
          sourceName: source.name,
          sourceType: source.type,
          dataCategory: source.dataCategory,
          records: sourceData.records,
          retentionPeriod: source.retentionPolicy,
          processingPurpose: source.processingPurpose,
        });
      } catch (error) {
        // Log extraction failure but continue with other sources
        await this.logExtractionError(requestId, source.id, error);
      }
    }

    // Generate exportable package
    const exportPackage = await this.generateExportPackage(
      collectedData,
      'json' // Machine-readable format for portability
    );

    // Update request status
    await this.dsarStore.updateStatus(requestId, 'completed');

    return {
      requestId,
      dataPackage: exportPackage,
      sourceCount: dataSources.length,
      recordCount: this.countRecords(collectedData),
    };
  }

  async processErasureRequest(requestId: string): Promise<ErasureRequestResult> {
    const request = await this.dsarStore.getById(requestId);

    // Never delete data on behalf of an unverified requester
    if (request.verificationStatus !== 'verified') {
      throw new Error('Identity not verified');
    }

    // Check for legal exceptions preventing deletion.
    // Simplification: any exception halts the whole request; a production system
    // would typically still delete from sources the exception does not cover.
    const exceptions = await this.checkErasureExceptions(request.subjectId);
    if (exceptions.length > 0) {
      return {
        requestId,
        status: 'partially_completed',
        deletedSources: [],
        retainedSources: exceptions.map(e => ({
          source: e.source,
          reason: e.legalBasis,
          retentionPeriod: e.retentionRequired,
        })),
      };
    }

    // Discover and delete from all sources
    const dataSources = await this.dataSources.getSourcesWithSubjectData(
      request.subjectId
    );
    const deletionResults: DeletionResult[] = [];

    for (const source of dataSources) {
      const result = await source.deleteSubjectData(request.subjectId, {
        includeBackups: true,
        includeLogs: true,
        cryptographicErasure: source.supportsCryptoErasure,
      });
      deletionResults.push({
        sourceId: source.id,
        recordsDeleted: result.deletedCount,
        backupsScheduled: result.backupDeletionScheduled,
        completedAt: result.completedAt,
      });
    }

    // Notify third parties who received the data
    const thirdParties = await this.getThirdPartyRecipients(request.subjectId);
    for (const party of thirdParties) {
      await this.notifyThirdPartyDeletion(party, request.subjectId);
    }

    return {
      requestId,
      status: 'completed',
      deletedSources: deletionResults,
      thirdPartiesNotified: thirdParties.length,
    };
  }

  private calculateDeadline(requestType: DSARType): Date {
    const now = new Date();
    // GDPR: 30 days, extendable to 90 for complex requests
    // CCPA: 45 days, extendable to 90
    // The stricter 30-day baseline is applied to every request type.
    const baseDays = 30;
    return new Date(now.getTime() + baseDays * 24 * 60 * 60 * 1000);
  }

  private async checkErasureExceptions(
    subjectId: string
  ): Promise<ErasureException[]> {
    const exceptions: ErasureException[] = [];

    // Check for legal holds
    const legalHolds = await this.legalHoldService.getActiveHolds(subjectId);
    exceptions.push(...legalHolds.map(h => ({
      source: h.dataSource,
      legalBasis: 'legal_hold',
      retentionRequired: h.holdPeriod,
    })));

    // Check for regulatory retention requirements
    const retentionRequirements = await this.getRetentionRequirements(subjectId);
    exceptions.push(...retentionRequirements);

    // Check for ongoing contractual obligations
    const contracts = await this.getActiveContracts(subjectId);
    exceptions.push(...contracts.map(c => ({
      source: c.dataSource,
      legalBasis: 'contract_performance',
      retentionRequired: c.contractEndDate,
    })));

    return exceptions;
  }

  // Helper methods (resolveSubjectId, logExtractionError, generateExportPackage,
  // countRecords, getThirdPartyRecipients, notifyThirdPartyDeletion,
  // getRetentionRequirements, getActiveContracts) omitted.
}
```

True data erasure requires deletion from production databases, read replicas, caches, search indexes, analytics systems, log files, backup tapes, and any third parties who received the data. Missing even one location can result in regulatory non-compliance. Implement comprehensive data lineage tracking to ensure complete erasure.
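Lineage tracking pays off when an erasure request can be checked against it. The sketch below compares the locations a hypothetical lineage registry lists for a subject with the deletion confirmations actually received, and reports anything still outstanding. The `LineageEntry` and `ErasureConfirmation` shapes and the location names are assumptions for illustration.

```typescript
// Storage kinds mirror the locations listed above.
type StorageKind =
  | "primary_db" | "read_replica" | "cache" | "search_index"
  | "analytics" | "logs" | "backup" | "third_party";

// Hypothetical lineage record: where a subject's data is known to live.
interface LineageEntry {
  subjectId: string;
  location: string;
  kind: StorageKind;
}

// Hypothetical confirmation emitted by each system after it deletes the data.
interface ErasureConfirmation {
  subjectId: string;
  location: string;
}

// Return every known location for the subject that has not confirmed deletion.
function findUnerasedLocations(
  lineage: LineageEntry[],
  confirmations: ErasureConfirmation[],
  subjectId: string
): LineageEntry[] {
  const confirmed = new Set(
    confirmations
      .filter((c) => c.subjectId === subjectId)
      .map((c) => c.location)
  );
  return lineage.filter(
    (entry) => entry.subjectId === subjectId && !confirmed.has(entry.location)
  );
}

const lineage: LineageEntry[] = [
  { subjectId: "u1", location: "postgres.users", kind: "primary_db" },
  { subjectId: "u1", location: "redis.sessions", kind: "cache" },
  { subjectId: "u1", location: "s3.nightly-backups", kind: "backup" },
  { subjectId: "u2", location: "postgres.users", kind: "primary_db" },
];
const confirmations: ErasureConfirmation[] = [
  { subjectId: "u1", location: "postgres.users" },
  { subjectId: "u1", location: "redis.sessions" },
];

console.log(findUnerasedLocations(lineage, confirmations, "u1")); // the backup entry
```

An erasure request would only be marked complete once this list is empty or every remaining entry is covered by a documented legal exception.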
Logs and analytics systems are often overlooked as PII repositories, yet they frequently contain personal data that was inadvertently included. A single logging statement that includes req.body can expose user data to anyone with log access. Protecting PII in these systems requires proactive sanitization and careful architecture.
Common PII Exposure Points:
```typescript
// PII-Safe Logging Service
// Automatically sanitizes logs before persistence.
// Logger, LoggerConfig, LogLevel, and RequestHandler (e.g. from a web framework)
// are assumed to be defined elsewhere.

interface SanitizationRule {
  pattern: RegExp;
  replacement: string;
  description: string;
}

class PIISafeLogger {
  private sanitizationRules: SanitizationRule[];
  private sensitiveFieldPaths: Set<string>;
  private baseLogger: Logger;

  constructor(config: LoggerConfig) {
    this.sanitizationRules = this.initializeSanitizationRules();
    this.sensitiveFieldPaths = new Set(config.sensitiveFields ?? [
      'password', 'token', 'secret', 'apiKey', 'authorization',
      'ssn', 'creditCard', 'email', 'phone', 'dob', 'address',
      'firstName', 'lastName', 'fullName', 'ipAddress',
    ]);
    this.baseLogger = config.baseLogger;
  }

  private initializeSanitizationRules(): SanitizationRule[] {
    // Rules run in order, so card numbers are redacted before the phone rule
    // can match part of them.
    return [
      // Email addresses
      {
        pattern: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g,
        replacement: '[EMAIL_REDACTED]',
        description: 'Email address',
      },
      // SSN patterns
      {
        pattern: /\b\d{3}-\d{2}-\d{4}\b/g,
        replacement: '[SSN_REDACTED]',
        description: 'Social Security Number',
      },
      // Credit card numbers
      {
        pattern: /\b(?:\d{4}[-\s]?){3}\d{4}\b/g,
        replacement: '[CARD_REDACTED]',
        description: 'Credit card number',
      },
      // Phone numbers (no leading \b: it cannot precede "+" or "(" after a space)
      {
        pattern: /(?<!\d)(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b/g,
        replacement: '[PHONE_REDACTED]',
        description: 'Phone number',
      },
      // JWT tokens
      {
        pattern: /eyJ[a-zA-Z0-9_-]*\.[a-zA-Z0-9_-]*\.[a-zA-Z0-9_-]*/g,
        replacement: '[JWT_REDACTED]',
        description: 'JWT token',
      },
      // API keys (common patterns)
      {
        pattern: /api[_-]?key[=:]["']?[a-zA-Z0-9]{20,}["']?/gi,
        replacement: '[API_KEY_REDACTED]',
        description: 'API key',
      },
      // Bearer tokens
      {
        pattern: /Bearer\s+[a-zA-Z0-9._-]+/gi,
        replacement: 'Bearer [TOKEN_REDACTED]',
        description: 'Bearer token',
      },
    ];
  }

  log(level: LogLevel, message: string, context?: any): void {
    const sanitizedMessage = this.sanitizeString(message);
    const sanitizedContext = context
      ? this.sanitizeObject(context)
      : undefined;
    this.baseLogger.log(level, sanitizedMessage, sanitizedContext);
  }

  private sanitizeString(input: string): string {
    let result = input;
    for (const rule of this.sanitizationRules) {
      result = result.replace(rule.pattern, rule.replacement);
    }
    return result;
  }

  private sanitizeObject(obj: any, path: string = ''): any {
    if (obj === null || obj === undefined) {
      return obj;
    }
    if (typeof obj === 'string') {
      return this.sanitizeString(obj);
    }
    if (typeof obj !== 'object') {
      return obj;
    }
    if (Array.isArray(obj)) {
      return obj.map((item, index) =>
        this.sanitizeObject(item, `${path}[${index}]`)
      );
    }

    const result: any = {};
    for (const [key, value] of Object.entries(obj)) {
      const fieldPath = path ? `${path}.${key}` : key;
      const keyLower = key.toLowerCase();

      // Check if field name indicates sensitive data
      if (this.isSensitiveFieldName(keyLower)) {
        result[key] = '[REDACTED]';
        continue;
      }

      // Recursively sanitize nested objects
      result[key] = this.sanitizeObject(value, fieldPath);
    }
    return result;
  }

  private isSensitiveFieldName(fieldName: string): boolean {
    // Substring match: 'userEmail' and 'email_address' are both caught
    for (const sensitive of this.sensitiveFieldPaths) {
      if (fieldName.includes(sensitive.toLowerCase())) {
        return true;
      }
    }
    return false;
  }

  // Create sanitized request logging middleware
  createRequestLogger(): RequestHandler {
    return (req, res, next) => {
      const startTime = Date.now();

      // Capture sanitized request info
      const requestLog = {
        method: req.method,
        path: this.sanitizeString(req.path),
        query: this.sanitizeObject(req.query),
        // Never log full body - only structure
        bodyKeys: req.body ? Object.keys(req.body) : [],
        headers: this.sanitizeHeaders(req.headers),
        ip: '[IP_LOGGED_SEPARATELY]', // Log to separate restricted store
      };

      res.on('finish', () => {
        this.log('info', 'HTTP Request', {
          ...requestLog,
          status: res.statusCode,
          duration: Date.now() - startTime,
        });
      });

      next();
    };
  }

  private sanitizeHeaders(headers: any): any {
    const sensitiveHeaders = [
      'authorization', 'cookie', 'x-api-key', 'x-auth-token',
    ];
    const result: any = {};
    for (const [key, value] of Object.entries(headers)) {
      if (sensitiveHeaders.includes(key.toLowerCase())) {
        result[key] = '[REDACTED]';
      } else {
        result[key] = this.sanitizeString(String(value));
      }
    }
    return result;
  }
}
```

Analytics Privacy Patterns:
For analytics, where you need behavioral insights without individual tracking:
| Technique | Description | Trade-off |
|---|---|---|
| Aggregation | Report only aggregate metrics (counts, averages) | Loses individual behavior patterns |
| Bucketing | Group continuous values into ranges (age: 25-34) | Reduces precision |
| K-Anonymity | Ensure each record is indistinguishable from at least k-1 others on its quasi-identifiers | May require data suppression |
| Differential Privacy | Add calibrated noise to queries | Reduces query accuracy slightly |
| Pseudonymization | Replace identifiers with reversible tokens | Still counts as personal data |
| Full Anonymization | Irreversibly remove all identifiers | May reduce data utility significantly |
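As an illustration of the differential privacy row, the sketch below adds Laplace noise to a count query. It is a minimal teaching example under stated assumptions: the function names are hypothetical, the query sensitivity defaults to 1 (one person changes a count by at most one), and `Math.random` is not a cryptographically secure source, so a vetted library would be preferable for real releases.

```typescript
// Draw one sample from a Laplace(0, scale) distribution by inverse transform sampling.
// The random source is injectable so the behavior can be tested deterministically.
function sampleLaplace(scale: number, rng: () => number = Math.random): number {
  // Keep u strictly inside (-0.5, 0.5) so the logarithm below stays finite.
  const u = Math.max(rng(), Number.EPSILON) - 0.5;
  const sign = u < 0 ? -1 : 1;
  return -scale * sign * Math.log(1 - 2 * Math.abs(u));
}

// Release a count with noise calibrated to sensitivity / epsilon.
// Smaller epsilon means more noise and stronger privacy.
function noisyCount(
  trueCount: number,
  epsilon: number,
  rng: () => number = Math.random,
  sensitivity: number = 1
): number {
  if (epsilon <= 0) throw new Error("epsilon must be positive");
  return trueCount + sampleLaplace(sensitivity / epsilon, rng);
}

// With a fixed random value of 0.5 the noise term is zero, which is handy in tests.
console.log(noisyCount(100, 1.0, () => 0.5)); // 100
// Normal use draws fresh noise on every call.
console.log(noisyCount(100, 1.0));
```

Each released answer consumes part of a privacy budget, so repeated queries over the same data need their epsilon values tracked and capped.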
It's far easier to prevent PII from entering logs than to remove it afterward. Implement sanitization at the logging framework level, not as an afterthought. Train developers to never log full request bodies, user objects, or query parameters without explicit sanitization.
Handling PII properly is both a legal requirement and an ethical responsibility. As systems process ever more personal data, the consequences of mishandling it grow more severe for individuals and organizations alike. Mastering PII protection is essential for any system designer working with user data.
Key Principles:
Next Steps:
With PII handling understood, we'll explore techniques for protecting data while maintaining utility. The next page covers Data Masking and Tokenization—methods for using data in non-production environments and across systems without exposing actual sensitive values.
You now understand PII as a category of data requiring specialized protection. You can identify PII across its many forms, navigate the regulatory landscape, implement discovery and protection controls, manage consent, and fulfill data subject rights. Next, we'll explore data masking and tokenization as techniques for protecting data while maintaining functionality.