Every organization handles data—but not all data is created equal. A public blog post requires different protection than a customer's Social Security number. A marketing campaign document needs different controls than production database credentials. Data classification is the systematic process of categorizing data based on its sensitivity level and the impact of its unauthorized disclosure, modification, or destruction.
Without proper classification, organizations face a dangerous binary choice: either protect everything with maximum security (impractical and expensive) or treat all data the same (inevitably leaving sensitive data under-protected). Classification provides the foundation for proportionate protection—applying the right level of security to the right data at the right cost.
By the end of this page, you will understand the principles of data classification, learn to design and implement classification frameworks suitable for enterprise systems, recognize the relationship between classification and other security controls, and master strategies for maintaining accurate classification at scale across distributed systems.
Data classification might seem like bureaucratic overhead, just another checkbox in a compliance audit. But in practice, classification is the decision framework that enables every other data security control. Without knowing what data you have and how sensitive it is, you cannot make informed decisions about:

- Which data must be encrypted, at rest and in transit
- Who should be granted access, and under what conditions
- What level of logging and monitoring each system requires
- How long data may be retained, and how it must be deleted
Classification transforms vague security intentions into concrete, enforceable policies.
Without classification, organizations often err in both directions simultaneously: over-protecting trivial data (creating friction and cost) while under-protecting critical data (creating breach risk). Classification resolves this paradox by enabling proportionate, defensible security decisions.
A classification framework defines the categories used to classify data, the criteria for assignment, and the handling requirements for each level. Well-designed frameworks share common characteristics: they are simple enough for consistent application, comprehensive enough to cover all data types, and aligned with business and regulatory requirements.
Framework Design Principles:
| Level | Also Known As | Description | Example Data |
|---|---|---|---|
| Public | Unclassified, Open | Data intended for public consumption with no confidentiality requirements | Marketing materials, public APIs, open-source code |
| Internal | Private, Internal Use Only | Data for internal use that shouldn't be public but poses low risk if disclosed | Internal documentation, organization charts, non-sensitive policies |
| Confidential | Sensitive, Restricted | Business-sensitive data whose disclosure could harm the organization | Financial reports, strategic plans, customer lists, source code |
| Highly Confidential | Secret, Strictly Confidential | Most sensitive data requiring maximum protection | Trade secrets, M&A information, encryption keys, PII, credentials |
Industry-Standard Frameworks:
Rather than designing from scratch, organizations often adapt established frameworks, such as the impact-based categorization in NIST SP 800-60, the information classification guidance in ISO/IEC 27001, or government models (e.g., Confidential/Secret/Top Secret).
The four-tier model balances simplicity with granularity. Fewer levels reduce classification errors; more levels enable finer-grained controls. Four levels work well for most organizations.
It's easier to add classification levels than to remove them. Start with three or four levels and only add more when you have clear evidence that existing levels don't provide sufficient granularity for meaningful policy differentiation.
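The four-tier model above maps naturally onto an ordered type in code. A minimal TypeScript sketch (the names are illustrative, not a standard API):

```typescript
// Four-tier classification model; numeric values encode sensitivity order,
// so "at least this sensitive" checks become simple comparisons.
enum ClassificationLevel {
  Public = 0,
  Internal = 1,
  Confidential = 2,
  HighlyConfidential = 3,
}

// Does the data's level meet or exceed a required minimum?
function meetsMinimum(
  level: ClassificationLevel,
  minimum: ClassificationLevel
): boolean {
  return level >= minimum;
}
```

Keeping the order explicit in one place means policy code never hard-codes level names when it only needs relative sensitivity.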
The hardest part of classification isn't defining levels—it's determining which level applies to specific data. Classification criteria provide objective guidelines that enable consistent classification across the organization, reducing reliance on subjective judgment.
Effective criteria consider multiple dimensions of data sensitivity:
```yaml
# Classification Decision Tree
# Apply rules in order; first matching rule determines classification

classification_rules:
  - name: "Regulatory PII"
    description: "Personal data protected by GDPR, CCPA, or similar"
    criteria:
      - data_type: ["SSN", "national_id", "passport", "biometric", "health_data"]
      - contains_direct_identifier: true
    classification: "highly_confidential"
    handling:
      encryption: "required_at_rest_and_transit"
      access_logging: "required"
      retention: "per_regulation"
      deletion: "cryptographic_erase"

  - name: "Authentication Credentials"
    description: "Secrets that grant system access"
    criteria:
      - data_type: ["password", "api_key", "certificate", "private_key", "token"]
    classification: "highly_confidential"
    handling:
      encryption: "required_at_rest_and_transit"
      access_logging: "required"
      storage: "secrets_manager_only"
      rotation: "mandatory"

  - name: "Financial Data"
    description: "Data with direct financial impact"
    criteria:
      - data_type: ["payment_card", "bank_account", "salary", "revenue"]
    classification: "confidential"
    handling:
      encryption: "required_at_rest_and_transit"
      access_control: "need_to_know"
      audit_trail: "required"

  - name: "Business Sensitive"
    description: "Proprietary business information"
    criteria:
      - data_type: ["strategic_plan", "customer_list", "pricing", "source_code"]
    classification: "confidential"
    handling:
      encryption: "required_in_transit"
      access_control: "role_based"
      external_sharing: "restricted"

  - name: "Internal Operations"
    description: "Internal documents and communications"
    criteria:
      - internal_only: true
      - public_impact: "low"
    classification: "internal"
    handling:
      encryption: "recommended_in_transit"
      access_control: "employee_only"

  - name: "Default Public"
    description: "All other data"
    criteria:
      - default: true
    classification: "public"
    handling:
      encryption: "optional"
      access_control: "open"
```

The Role of Data Owners:
Ultimately, classification requires human judgment. Data owners—typically business stakeholders responsible for data domains—make final classification decisions. The framework provides guidance, but owners understand the business context that technology cannot capture.
Effective programs establish named owners for every data domain, documented criteria that guide classification decisions, and clear escalation paths for ambiguous cases.
Manual classification doesn't scale. In systems processing millions of records daily, human review is impossible. Automated classification uses pattern matching, machine learning, and metadata analysis to classify data at scale, with human oversight for edge cases.
Automated Classification Approaches:
```typescript
// Enterprise Data Classification Service
// Combines pattern matching, ML inference, and policy rules

interface ClassificationResult {
  classification: ClassificationLevel;
  confidence: number;
  matchedRules: string[];
  dataTypes: string[];
  requiresReview: boolean;
}

enum ClassificationLevel {
  PUBLIC = 'public',
  INTERNAL = 'internal',
  CONFIDENTIAL = 'confidential',
  HIGHLY_CONFIDENTIAL = 'highly_confidential',
}

interface DetectedDataType {
  type: string;
  confidence: number;
  location: string;
  sample?: string; // Masked for logging
}

class DataClassificationService {
  private patternDetectors: PatternDetector[];
  private mlClassifier: MLClassificationModel;
  private policyEngine: PolicyEngine;

  constructor(config: ClassificationConfig) {
    this.patternDetectors = [
      new SSNDetector(),
      new CreditCardDetector(),
      new EmailDetector(),
      new PhoneNumberDetector(),
      new APIKeyDetector(),
      new PasswordPatternDetector(),
    ];
    this.mlClassifier = new MLClassificationModel(config.modelEndpoint);
    this.policyEngine = new PolicyEngine(config.policies);
  }

  async classifyData(
    content: string | Buffer,
    metadata: DataMetadata
  ): Promise<ClassificationResult> {
    // Layer 1: Pattern-based detection (fast, high-precision)
    const patternMatches = await this.runPatternDetection(content);

    // Layer 2: ML-based detection (comprehensive, moderate precision)
    const mlDetections = await this.mlClassifier.analyze(content);

    // Layer 3: Metadata analysis (schema, location, origin)
    const metadataSignals = this.analyzeMetadata(metadata);

    // Combine detections with deduplication
    const allDetections = this.mergeDetections(
      patternMatches,
      mlDetections,
      metadataSignals
    );

    // Apply policy rules to determine classification
    const classification = await this.policyEngine.evaluate(
      allDetections,
      metadata
    );

    // Flag low-confidence results for human review
    const requiresReview =
      classification.confidence < 0.85 ||
      allDetections.some(d => d.confidence < 0.7);

    // Emit classification event for audit and monitoring
    await this.emitClassificationEvent({
      dataId: metadata.id,
      classification: classification.level,
      confidence: classification.confidence,
      timestamp: new Date(),
      requiresReview,
    });

    return {
      classification: classification.level,
      confidence: classification.confidence,
      matchedRules: classification.matchedRules,
      dataTypes: allDetections.map(d => d.type),
      requiresReview,
    };
  }

  private async runPatternDetection(
    content: string | Buffer
  ): Promise<DetectedDataType[]> {
    const results: DetectedDataType[] = [];
    const contentStr = content.toString();

    for (const detector of this.patternDetectors) {
      const matches = await detector.detect(contentStr);
      results.push(...matches.map(m => ({
        type: detector.dataType,
        confidence: 0.95, // Pattern matches have high confidence
        location: m.location,
        sample: this.maskSample(m.value),
      })));
    }

    return results;
  }

  private analyzeMetadata(metadata: DataMetadata): DetectedDataType[] {
    const signals: DetectedDataType[] = [];

    // Column/field name analysis
    const sensitiveNamePatterns = {
      'ssn': 'social_security_number',
      'password': 'credential',
      'secret': 'credential',
      'credit_card': 'payment_card',
      'salary': 'financial_data',
      'dob': 'personal_data',
      'birth': 'personal_data',
    };

    for (const [pattern, dataType] of Object.entries(sensitiveNamePatterns)) {
      if (metadata.fieldName?.toLowerCase().includes(pattern)) {
        signals.push({
          type: dataType,
          confidence: 0.8,
          location: `field:${metadata.fieldName}`,
        });
      }
    }

    // Source system analysis
    if (metadata.sourceSystem) {
      const systemClassifications = this.policyEngine
        .getSystemDefaultClassification(metadata.sourceSystem);
      if (systemClassifications) {
        signals.push(...systemClassifications);
      }
    }

    return signals;
  }

  private maskSample(value: string): string {
    // Never log actual sensitive data
    if (value.length <= 4) return '****';
    return value.substring(0, 2) +
      '*'.repeat(value.length - 4) +
      value.substring(value.length - 2);
  }
}
```

Combining pattern matching (high precision, limited recall) with ML classification (broader coverage, lower precision) provides better overall detection. Pattern matchers catch well-defined sensitive data; ML catches novel variations and unstructured content.
When classified data is combined, copied, or transformed, the resulting data inherits classification from its sources. This classification inheritance follows a critical principle: the highest classification wins.
Inheritance Rules:
| Data Source A | Data Source B | Combined Result | Reason |
|---|---|---|---|
| Public customer names | Public company names | Public | No escalation—both inputs are public |
| Internal performance data | Public market data | Internal | Highest classification wins |
| Confidential revenue data | Internal department list | Confidential | Highest classification wins |
| Internal individual purchases | Internal individual purchases (many) | Confidential | Aggregation reveals patterns |
| Confidential unencrypted data | Highly confidential encryption key | Highly Confidential | Key exposure enables data access |
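The highest-wins rule in the table can be expressed as a small helper. A sketch with hypothetical names; note that a simple maximum deliberately does not capture aggregation escalation, which needs separate handling:

```typescript
// Sensitivity levels ordered from least to most sensitive.
const LEVEL_ORDER = ["public", "internal", "confidential", "highly_confidential"] as const;
type Level = (typeof LEVEL_ORDER)[number];

// A derived dataset inherits the maximum classification of its inputs.
function combineClassifications(sources: Level[]): Level {
  return sources.reduce<Level>(
    (max, lvl) =>
      LEVEL_ORDER.indexOf(lvl) > LEVEL_ORDER.indexOf(max) ? lvl : max,
    "public"
  );
}
```

For example, joining confidential revenue data with an internal department list yields a confidential result, matching the third row of the table.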
The Aggregation Risk Challenge:
The most insidious classification challenge is aggregation risk: individually innocuous data that becomes sensitive when combined. Classic examples:

- Quasi-identifiers: birthdate, ZIP code, and gender are each harmless alone, but together can re-identify a large fraction of individuals
- Location data: a single GPS ping reveals little, but a history of pings exposes home addresses and daily routines
- Purchase records: one transaction is mundane, but a full purchase history can reveal health conditions, finances, and habits
Mitigating aggregation risk requires classifying derived and aggregated datasets in their own right rather than merely inheriting from inputs, monitoring large-scale joins and bulk exports, and applying stricter controls to views that combine many records or sources.
Upgrading classification is automatic; downgrading requires explicit, audited procedures. Declassification typically requires documented business justification, owner approval, verification that sensitivity has genuinely reduced, and audit trail maintenance.
Classification is not a one-time activity—it's an ongoing governance responsibility. Data sensitivity changes over time, regulations evolve, and systems grow. A mature classification program requires continuous governance to remain accurate and effective.
Classification Lifecycle:
```typescript
// Classification Governance Service
// Manages classification lifecycle with full audit trail

interface ClassificationChange {
  dataAssetId: string;
  previousClassification: ClassificationLevel;
  newClassification: ClassificationLevel;
  reason: string;
  justification: string;
  approver?: string;
  effectiveDate: Date;
}

interface ClassificationReview {
  dataAssetId: string;
  reviewer: string;
  reviewDate: Date;
  outcome: 'confirmed' | 'reclassification_needed' | 'escalated';
  notes: string;
  nextReviewDate: Date;
}

class ClassificationGovernanceService {
  private auditLog: AuditLogService;
  private notificationService: NotificationService;
  private approvalWorkflow: ApprovalWorkflowService;

  async requestReclassification(
    request: ReclassificationRequest
  ): Promise<ReclassificationResult> {
    // Validate requester authorization
    await this.validateRequesterAccess(request.requesterId, request.dataAssetId);

    const currentClassification = await this.getCurrentClassification(
      request.dataAssetId
    );

    // Determine if approval is required
    const requiresApproval = this.determineApprovalRequirement(
      currentClassification,
      request.newClassification
    );

    if (requiresApproval) {
      // Initiate approval workflow for classification changes
      const workflow = await this.approvalWorkflow.initiate({
        type: 'classification_change',
        dataAssetId: request.dataAssetId,
        currentState: currentClassification,
        requestedState: request.newClassification,
        justification: request.justification,
        requester: request.requesterId,
        approvers: await this.getRequiredApprovers(
          request.dataAssetId,
          currentClassification,
          request.newClassification
        ),
      });

      return {
        status: 'pending_approval',
        workflowId: workflow.id,
        estimatedCompletionTime: workflow.estimatedCompletionTime,
      };
    }

    // Apply classification change immediately for non-approval cases
    return this.applyClassificationChange({
      dataAssetId: request.dataAssetId,
      previousClassification: currentClassification,
      newClassification: request.newClassification,
      reason: 'owner_request',
      justification: request.justification,
      effectiveDate: new Date(),
    });
  }

  async conductPeriodicReview(
    dataAssetId: string,
    reviewerId: string
  ): Promise<ClassificationReview> {
    const asset = await this.getDataAsset(dataAssetId);
    const currentClassification = asset.classification;

    // Run automated classification check
    const automatedAssessment = await this.classificationService
      .assessCurrentClassification(dataAssetId);

    // Prepare review with automated findings
    const review: ClassificationReview = {
      dataAssetId,
      reviewer: reviewerId,
      reviewDate: new Date(),
      outcome: 'confirmed',
      notes: '',
      nextReviewDate: this.calculateNextReviewDate(currentClassification),
    };

    // Flag discrepancies for human review
    if (automatedAssessment.recommendedClassification !== currentClassification) {
      review.outcome = 'reclassification_needed';
      review.notes = `Automated assessment suggests ` +
        `${automatedAssessment.recommendedClassification}. ` +
        `Current: ${currentClassification}. ` +
        `Reason: ${automatedAssessment.reasoning}`;

      await this.notificationService.notifyDataOwner(
        asset.ownerId,
        'classification_review_discrepancy',
        { dataAssetId, finding: review.notes }
      );
    }

    // Log review in audit trail
    await this.auditLog.log({
      event: 'classification_review',
      dataAssetId,
      reviewer: reviewerId,
      outcome: review.outcome,
      timestamp: review.reviewDate,
    });

    return review;
  }

  private determineApprovalRequirement(
    current: ClassificationLevel,
    requested: ClassificationLevel
  ): boolean {
    // Downgrading always requires approval
    if (this.classificationOrdinal(requested) < this.classificationOrdinal(current)) {
      return true;
    }

    // Upgrading to highest level requires verification
    if (requested === ClassificationLevel.HIGHLY_CONFIDENTIAL) {
      return true;
    }

    return false;
  }

  private classificationOrdinal(level: ClassificationLevel): number {
    const order = {
      [ClassificationLevel.PUBLIC]: 0,
      [ClassificationLevel.INTERNAL]: 1,
      [ClassificationLevel.CONFIDENTIAL]: 2,
      [ClassificationLevel.HIGHLY_CONFIDENTIAL]: 3,
    };
    return order[level];
  }
}
```

Key Governance Metrics:
Effective classification programs track:
| Metric | Description | Target |
|---|---|---|
| Classification Coverage | % of data assets with assigned classification | >95% |
| Review Currency | % of assets reviewed within required period | >90% |
| Automated Classification Rate | % of classifications assigned automatically | >70% |
| Exception Rate | % of assets requiring manual exception handling | <10% |
| Reclassification Frequency | Rate of classification changes over time | Stable or declining |
| Detection Lag | Time between data creation and classification | <24 hours |
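Several of these metrics fall out directly from an asset inventory. An illustrative sketch, assuming a minimal (hypothetical) asset record shape:

```typescript
// Minimal asset record for metric computation; fields are illustrative.
interface AssetRecord {
  id: string;
  classification?: string; // undefined means not yet classified
  createdAt: Date;
  classifiedAt?: Date;
}

// Classification Coverage: % of assets with an assigned classification.
function coveragePercent(assets: AssetRecord[]): number {
  if (assets.length === 0) return 100;
  const classified = assets.filter(a => a.classification !== undefined).length;
  return (classified / assets.length) * 100;
}

// Detection Lag: hours between data creation and classification,
// or null if the asset is still unclassified.
function detectionLagHours(asset: AssetRecord): number | null {
  if (!asset.classifiedAt) return null;
  return (asset.classifiedAt.getTime() - asset.createdAt.getTime()) / 3_600_000;
}
```

Tracking these per team or per system, rather than only organization-wide, makes it easier to spot where the classification program is falling behind.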
Classification is only valuable when it drives security controls. The classification level must translate into concrete, enforceable requirements across the technology stack. This integration occurs at multiple layers:
Classification-Driven Control Matrix:
| Control | Public | Internal | Confidential | Highly Confidential |
|---|---|---|---|---|
| Encryption in Transit | Optional (HTTPS) | TLS 1.2+ required | TLS 1.3 required | mTLS + TLS 1.3 |
| Encryption at Rest | Not required | Recommended | AES-256 required | AES-256 + envelope encryption |
| Access Control | Open | Authenticated users | Role-based + need-to-know | MFA + explicit approval |
| Access Logging | Basic (errors only) | Standard logging | Detailed audit logging | Full audit + real-time alerts |
| Data Masking | Not required | Not required | Non-production environments | All non-authorized access |
| Retention Limit | Business discretion | 5 years default | Per data type policy | Minimum necessary + legal hold |
| Deletion Method | Standard delete | Logical delete | Secure overwrite | Cryptographic erasure + verification |
| Backup Requirements | Standard | Standard + offsite | Encrypted + tested recovery | Encrypted + geographic isolation |
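Encoding the matrix as data, rather than scattering level checks through enforcement code, keeps the policy auditable and easy to update. A partial sketch covering three of the controls above (field names and values are illustrative, not a standard schema):

```typescript
type Level = "public" | "internal" | "confidential" | "highly_confidential";

// Subset of the handling requirements from the control matrix.
interface HandlingPolicy {
  encryptionInTransit: string;
  encryptionAtRest: string;
  accessLogging: string;
}

// One row per classification level; enforcement code reads this table.
const CONTROL_MATRIX: Record<Level, HandlingPolicy> = {
  public: {
    encryptionInTransit: "optional",
    encryptionAtRest: "not_required",
    accessLogging: "errors_only",
  },
  internal: {
    encryptionInTransit: "tls_1.2_plus",
    encryptionAtRest: "recommended",
    accessLogging: "standard",
  },
  confidential: {
    encryptionInTransit: "tls_1.3",
    encryptionAtRest: "aes_256",
    accessLogging: "detailed_audit",
  },
  highly_confidential: {
    encryptionInTransit: "mtls_plus_tls_1.3",
    encryptionAtRest: "aes_256_envelope",
    accessLogging: "full_audit_realtime_alerts",
  },
};

function requiredControls(level: Level): HandlingPolicy {
  return CONTROL_MATRIX[level];
}
```

With this shape, adding a new control (say, retention) is a type change that forces every level's row to be filled in, so no classification level silently lacks a policy.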
Policy Enforcement Architecture:
Modern systems implement classification-aware controls at multiple layers:

- API gateway: request-level authorization and throttling for sensitive endpoints
- Application data access layer: query filtering, field-level masking, and access logging
- Database: row- and column-level security, encryption at rest
- Storage and backup: encryption, retention enforcement, and geographic placement
The goal is defense in depth: if one layer fails, others continue enforcing classification requirements.
```typescript
// Classification-Aware Data Access Layer
// Enforces classification policies on all data operations

interface DataAccessContext {
  userId: string;
  userRole: string;
  userClearance: ClassificationLevel;
  accessReason: string;
  requestId: string;
}

class ClassificationAwareRepository<T> {
  private baseRepository: Repository<T>;
  private policyEngine: ClassificationPolicyEngine;
  private auditLogger: DataAccessAuditLogger;
  private maskingService: DataMaskingService;

  async findById(
    id: string,
    context: DataAccessContext
  ): Promise<T | null> {
    const record = await this.baseRepository.findById(id);
    if (!record) return null;

    const classification = await this.getRecordClassification(record);

    // Check access authorization
    const accessDecision = await this.policyEngine.evaluateAccess({
      requester: context,
      resource: { id, classification },
      operation: 'read',
    });

    if (!accessDecision.allowed) {
      await this.auditLogger.logDeniedAccess({
        userId: context.userId,
        resourceId: id,
        classification,
        reason: accessDecision.reason,
        timestamp: new Date(),
      });
      throw new AccessDeniedError(
        `Access denied to ${classification} data: ${accessDecision.reason}`
      );
    }

    // Apply masking based on clearance level delta
    const maskedRecord = await this.applyAppropriateMasking(
      record,
      classification,
      context.userClearance
    );

    // Log successful access for audit trail
    await this.auditLogger.logDataAccess({
      userId: context.userId,
      resourceId: id,
      classification,
      operation: 'read',
      masked: maskedRecord !== record,
      timestamp: new Date(),
      requestId: context.requestId,
    });

    return maskedRecord;
  }

  private async applyAppropriateMasking(
    record: T,
    resourceClassification: ClassificationLevel,
    userClearance: ClassificationLevel
  ): Promise<T> {
    // Users with matching or higher clearance see unmasked data
    if (this.classificationOrdinal(userClearance) >=
        this.classificationOrdinal(resourceClassification)) {
      return record;
    }

    // Apply field-level masking based on classification delta
    return this.maskingService.applyMasking(record, {
      resourceClassification,
      viewerClearance: userClearance,
    });
  }

  async query(
    criteria: QueryCriteria,
    context: DataAccessContext
  ): Promise<T[]> {
    // Add classification filter to prevent returning inaccessible records
    const classificationFilter = this.buildClassificationFilter(
      context.userClearance
    );
    const enhancedCriteria = {
      ...criteria,
      filters: [...(criteria.filters || []), classificationFilter],
    };

    const results = await this.baseRepository.query(enhancedCriteria);

    // Log bulk access
    await this.auditLogger.logBulkAccess({
      userId: context.userId,
      query: this.sanitizeQueryForLogging(criteria),
      resultCount: results.length,
      timestamp: new Date(),
      requestId: context.requestId,
    });

    return results;
  }

  private buildClassificationFilter(
    userClearance: ClassificationLevel
  ): QueryFilter {
    // Return only records at or below user's clearance level
    const accessibleLevels = this.getAccessibleLevels(userClearance);
    return {
      field: 'classification',
      operator: 'in',
      value: accessibleLevels,
    };
  }
}
```

When classification cannot be determined (system errors, missing metadata), default to the highest classification level. It's better to over-protect temporarily than to under-protect even briefly. Unclassified data should be treated as highly confidential until properly assessed.
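The fail-closed default for undetermined classification reduces to a one-line rule. A minimal sketch, assuming classification metadata may be missing or unreadable:

```typescript
type Level = "public" | "internal" | "confidential" | "highly_confidential";

// Fail closed: missing or unreadable classification metadata is treated
// as the most restrictive level until the data is properly assessed.
function effectiveClassification(stored: Level | undefined | null): Level {
  return stored ?? "highly_confidential";
}
```

Placing this rule in a single shared function prevents individual call sites from accidentally defaulting open.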
Data classification is the foundational discipline that enables proportionate, effective data protection. Without it, organizations either over-invest in protecting trivial data or under-invest in protecting critical data. With it, security investments align with actual risk.
Key Takeaways:

- Classification is the decision framework behind proportionate protection: the right controls, for the right data, at the right cost
- A simple three- or four-tier framework applied consistently beats an elaborate one applied inconsistently
- Automated classification (pattern matching, ML, metadata analysis) is essential at scale, with human review reserved for low-confidence cases
- Derived data inherits the highest classification of its sources, and aggregation can push sensitivity higher still
- Upgrades are automatic; downgrades require explicit, audited approval
- Classification only delivers value when it drives enforceable controls for encryption, access, logging, retention, and deletion
Next Steps:
With classification as our foundation, we can now explore specific data protection challenges. The next page examines PII Handling—the specialized requirements for personal data that create both legal obligations and ethical responsibilities for system designers.
You now understand data classification as the foundation of data protection. You can design classification frameworks, implement automated classification systems, manage classification inheritance, and integrate classification with security controls. Next, we'll explore the specific requirements for handling personally identifiable information (PII).