Refactoring Toward Better Abstractions

Level: Intermediate

Duration: 90 mins

Identifying Abstraction Opportunities

The Art of Seeing What Isn't There Yet

Every great abstraction in software history began as a pattern that someone recognized in concrete code. The HashMap, the Iterator, the Observer—none of these were invented in isolation. They emerged when thoughtful engineers looked at repetitive, similar-but-different code and asked: What is the essential shape of this problem?

Identifying abstraction opportunities is the first and most critical step in any refactoring journey. Before you can extract an interface or introduce an abstract class, you must develop the ability to see the potential for abstraction hiding within concrete implementations. This skill separates engineers who merely maintain code from those who evolve systems into more elegant, flexible architectures.

What You Will Learn

By the end of this page, you will be able to recognize code patterns that signal abstraction opportunities, name the specific code smells that indicate missing abstractions, distinguish essential shared structure from accidental duplication, and apply a systematic approach to identifying where abstractions will provide genuine value.

Why Identification Comes First

Refactoring toward better abstractions is a high-leverage activity—when done correctly, it dramatically improves code maintainability, testability, and extensibility. But abstraction is also dangerous. Wrong abstractions are worse than no abstractions at all, because they impose the cognitive overhead of indirection without providing the benefits of genuine simplification.

This is why identification is so critical. Before you invest effort in extracting interfaces or creating abstract classes, you must ensure:

There is genuine commonality — Not just superficial similarity, but true shared structure or behavior
The abstraction will be used — Abstracting for hypothetical future needs often creates dead code
The abstraction captures the right thing — Missing the essence leads to leaky or awkward abstractions
The timing is right — Premature abstraction is as problematic as no abstraction

The Wrong Abstraction Trap

Sandi Metz famously stated: 'Duplication is far cheaper than the wrong abstraction.' When you extract the wrong commonality, every team member who touches the code must understand the flawed abstraction, work around its limitations, and resist the temptation to add more special cases. Wrong abstractions accumulate complexity faster than they eliminate it.

The recognition skill:

Identifying abstraction opportunities is fundamentally a pattern recognition skill. You are looking for:

Structural patterns — Similar shapes appearing across different parts of the codebase
Behavioral patterns — Similar sequences of operations with minor variations
Conceptual patterns — Different implementations representing the same abstract concept
Evolution patterns — Places where requirements changes consistently cause cascading modifications

Let's examine each of these in depth, building a practical toolkit for spotting abstraction opportunities in real codebases.

Code Smells That Signal Missing Abstractions

Code smells are surface-level symptoms of deeper design problems. Several specific smells strongly indicate that an abstraction is missing. Learning to recognize these gives you a systematic way to identify refactoring opportunities.

Understanding the relationship: A code smell is not itself a problem—it's a signal that a problem might exist. The smell of duplicated code doesn't automatically mean you need an abstraction; it means you should investigate whether an abstraction would be beneficial. Always apply judgment before acting on a smell.

Primary Smells Indicating Missing Abstraction

•Duplicated Code Blocks — The most obvious smell. When you see nearly identical code in multiple places, there's likely an abstraction that could unify them. Key distinction: Look for structural similarity, not just textual similarity. Two loops that look different but implement the same pattern are candidates for abstraction.
•Parallel Inheritance Hierarchies — When adding a class to one hierarchy requires adding a corresponding class to another hierarchy, you're missing an abstraction that could connect them. This often indicates a need for a bridge pattern or strategy extraction.
•Switch/If-Else Chains on Type — When you see code that checks the type of an object and performs different operations based on the type, you're looking at polymorphism waiting to be extracted. Each case in the switch is a potential subtype of an abstraction.
•Feature Envy — When a method accesses more data from another object than from its own, it suggests the behavior should be on that other object, or there's a missing abstraction that should own this responsibility.
•Shotgun Surgery — When a single change requires modifications to many different classes, the cross-cutting concern should likely be abstracted into a single location.
•Divergent Change — When a class is changed for many different reasons, it likely contains multiple responsibilities that should be separated into distinct abstractions.

smell_examples.ts
TypeScript
// SMELL: Switch on Type — Classic missing abstraction signal
class ReportGenerator {
    generateReport(data: any, format: string): string {
        // Every new format requires modifying this method
        switch (format) {
            case 'pdf':
                return this.formatAsPdf(data);
            case 'html':
                return this.formatAsHtml(data);
            case 'csv':
                return this.formatAsCsv(data);
            case 'json':
                return this.formatAsJson(data);
            // Adding 'xml' requires changing this class!
            default:
                throw new Error(`Unknown format: ${format}`);
        }
    }
}
 
// SMELL: Duplicated Structural Pattern — Same shape, different details
class OrderProcessor {
    processOnlineOrder(order: OnlineOrder): void {
        this.validateOrder(order);
        this.calculateTotal(order);
        this.applyOnlineDiscount(order);    // Only variation
        this.chargeCard(order);
        this.sendConfirmation(order);
    }
 
    processPhoneOrder(order: PhoneOrder): void {
        this.validateOrder(order);
        this.calculateTotal(order);
        this.applyPhoneDiscount(order);     // Only variation
        this.chargeCard(order);
        this.sendConfirmation(order);
    }
 
    processInStoreOrder(order: InStoreOrder): void {
        this.validateOrder(order);
        this.calculateTotal(order);
        this.applyInStoreDiscount(order);   // Varies per order type
        this.chargeCash(order);             // Slight variation
        this.printReceipt(order);           // Slight variation
    }
}
 
// SMELL: Feature Envy — Method uses another object's data extensively
class InvoiceCalculator {
    calculateTotal(invoice: Invoice): number {
        // This method knows too much about Invoice internals
        let total = 0;
        for (const item of invoice.items) {
            total += item.price * item.quantity;
        }
        total -= invoice.discount;
        total += total * invoice.taxRate;
        total += invoice.shippingCost;
        total -= invoice.loyaltyPoints * 0.01;
        return total;
    }
    // The calculation logic likely belongs ON Invoice
}

Reading the smells:

Each smell tells a different story:

The switch on format reveals a missing ReportFormatter abstraction. New formats should be added by creating new classes, not modifying existing code.
The duplicated order processing reveals a missing DiscountStrategy abstraction. The overall workflow is nearly identical: the discount step varies in every case, and the in-store flow additionally differs in how it takes payment and confirms the order.
The feature envy suggests that calculateTotal should be a method on Invoice itself, or that a dedicated abstraction for computing totals (for example, a TotalCalculator) is missing.

These smells are your starting points for deeper investigation, not automatic triggers for refactoring.
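As a rough illustration of where the first smell can lead, here is a minimal sketch of a ReportFormatter abstraction replacing the switch. Everything in it is an assumption made for illustration: the ReportData shape, the register method, and the two formatter classes are hypothetical, and the CSV output is deliberately naive (no quoting or escaping).

```typescript
// Hypothetical sketch: ReportData, ReportFormatter and the formatter classes
// are illustrative names, not part of the original listing.
type ReportData = { title: string; rows: Array<Record<string, string | number>> };

interface ReportFormatter {
    readonly format: string;
    render(data: ReportData): string;
}

class JsonFormatter implements ReportFormatter {
    readonly format = 'json';
    render(data: ReportData): string {
        return JSON.stringify(data);
    }
}

class CsvFormatter implements ReportFormatter {
    readonly format = 'csv';
    render(data: ReportData): string {
        if (data.rows.length === 0) return '';
        // Naive CSV: header names come from the first row, values are not escaped.
        const headers = Object.keys(data.rows[0]);
        const lines = data.rows.map(row => headers.map(h => String(row[h])).join(','));
        return [headers.join(',')].concat(lines).join('\n');
    }
}

class ReportGenerator {
    private readonly formatters = new Map<string, ReportFormatter>();

    // Supporting a new format means registering a new class, not editing a switch.
    register(formatter: ReportFormatter): this {
        this.formatters.set(formatter.format, formatter);
        return this;
    }

    generateReport(data: ReportData, format: string): string {
        const formatter = this.formatters.get(format);
        if (!formatter) throw new Error(`Unknown format: ${format}`);
        return formatter.render(data);
    }
}

const generator = new ReportGenerator()
    .register(new JsonFormatter())
    .register(new CsvFormatter());

const csvReport = generator.generateReport(
    { title: 'Sales', rows: [{ region: 'EU', total: 10 }, { region: 'US', total: 20 }] },
    'csv',
);
```

Whether a registry like this is worth the indirection still depends on the investigation described above; with only one or two formats, the switch may be the simpler choice.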

The Rule of Three

One of the most practical heuristics for identifying abstraction opportunities is the Rule of Three: you abstract when you see the same pattern three times. Not twice—three times.

Why three?

With two examples, you can't reliably distinguish:

Genuine shared structure from coincidental similarity
The stable core from the variable parts
Which variations matter from which don't

Two Instances (Risky)

•Similarity might be coincidental
•Can't determine which parts vary
•Abstraction might be premature
•Risk of wrong abstraction is high
•Future instances may not fit the pattern

Three+ Instances (Clearer)

•Pattern confirmed across multiple cases
•Stable core becomes visible
•Variable parts become clear
•Abstraction design is informed by evidence
•Confidence in the abstraction's applicability

Applying the rule:

When you encounter duplication for the first time, note it but don't act. When you see it again, mark it as a candidate for abstraction. When you see it a third time, you have enough evidence to design an abstraction that genuinely captures the common pattern.

Example in practice:

Suppose you're building an e-commerce system:

First occurrence: You write validation logic for credit card payments
Second occurrence: You need validation for PayPal payments—similar structure, different details
Third occurrence: You add Apple Pay validation—now the pattern is unmistakable

At this point, you have three concrete examples to inform your abstraction:

What do all three share? (validation workflow, error handling, logging)
Where do they differ? (API calls, credential handling, response parsing)
What interface would accommodate all three plus future payment methods?

The Rule of Three gives you the evidence to answer these questions confidently.
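With three concrete cases in hand, the split between stable core and variable parts might be sketched as follows. This is only one possible shape, under stated assumptions: PaymentRequest, PaymentValidator, and the three validator classes are hypothetical names, and the rules inside them are simplified placeholders rather than real payment checks.

```typescript
// Hypothetical names throughout; the validation rules are simplified placeholders.
interface ValidationResult { isValid: boolean; errors: string[] }

interface PaymentRequest { amount: number; details: Record<string, string> }

abstract class PaymentValidator {
    // Stable core shared by all three cases: the workflow and error collection.
    validate(request: PaymentRequest): ValidationResult {
        const errors: string[] = [];
        if (request.amount <= 0) errors.push('Amount must be positive');
        for (const error of this.checkDetails(request.details)) errors.push(error);
        return { isValid: errors.length === 0, errors };
    }

    // Variable part: each payment method checks its own details.
    protected abstract checkDetails(details: Record<string, string>): string[];
}

class CreditCardValidator extends PaymentValidator {
    protected checkDetails(details: Record<string, string>): string[] {
        return /^\d{13,19}$/.test(details['cardNumber'] ?? '') ? [] : ['Invalid card number'];
    }
}

class PayPalValidator extends PaymentValidator {
    protected checkDetails(details: Record<string, string>): string[] {
        return (details['email'] ?? '').indexOf('@') !== -1 ? [] : ['Invalid PayPal account email'];
    }
}

class ApplePayValidator extends PaymentValidator {
    protected checkDetails(details: Record<string, string>): string[] {
        return (details['token'] ?? '').length > 0 ? [] : ['Missing Apple Pay token'];
    }
}
```

Had the abstraction been designed after the credit card case alone, it would probably have baked card-specific fields into the shared interface; the second and third cases are what show that only the workflow is common.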

The Rule Is a Guideline, Not a Law

Sometimes you know from domain experience that more instances are coming. A team building a payment system that already supports credit cards and PayPal can be fairly sure that further methods such as Apple Pay will follow. In such cases, abstracting at two instances is reasonable. The rule exists to prevent premature abstraction, not to block informed judgment.

Structural vs Conceptual Similarity

One of the most subtle skills in identifying abstraction opportunities is distinguishing between structural similarity (code that looks the same) and conceptual similarity (code that represents the same abstract idea).

The critical distinction:

Structural similarity is about syntax—the actual characters and statements appearing in the code
Conceptual similarity is about semantics—the meaning and purpose of the code

Good abstractions capture conceptual similarity. Bad abstractions are often created to eliminate structural similarity that is actually coincidental.

similarity_distinction.ts
TypeScript
// TRAP: Structural similarity that is NOT conceptual
// These loops look similar but represent unrelated concepts
 
function calculateAverageAge(users: User[]): number {
    let sum = 0;
    for (const user of users) {
        sum += user.age;
    }
    return users.length > 0 ? sum / users.length : 0;
}
 
function calculateTotalRevenue(orders: Order[]): number {
    let sum = 0;
    for (const order of orders) {
        sum += order.amount;
    }
    return sum;  // Different return logic!
}
 
// BAD: Forcing an abstraction due to structural similarity
function aggregate<T>(items: T[], getValue: (item: T) => number): number {
    let sum = 0;
    for (const item of items) {
        sum += getValue(item);
    }
    return sum;  // Forces both functions into same mold
}
// calculateAverageAge now needs awkward post-processing!
 
// OPPORTUNITY: Conceptual similarity that SHOULD be abstracted
// These represent the same concept: validating an entity before persistence
 
function validateUserBeforeSave(user: User): ValidationResult {
    const errors: string[] = [];
    if (!user.email || !isValidEmail(user.email)) {
        errors.push('Invalid email');
    }
    if (!user.name || user.name.length < 2) {
        errors.push('Name too short');
    }
    return { isValid: errors.length === 0, errors };
}
 
function validateOrderBeforeSave(order: Order): ValidationResult {
    const errors: string[] = [];
    if (!order.items || order.items.length === 0) {
        errors.push('Order must have items');
    }
    if (order.total < 0) {
        errors.push('Total cannot be negative');
    }
    return { isValid: errors.length === 0, errors };
}
 
// GOOD: Abstraction captures the concept, not just the structure
interface Validatable {
    validate(): ValidationResult;
}
 
class User implements Validatable {
    validate(): ValidationResult {
        const errors: string[] = [];
        // User-specific validation rules
        return { isValid: errors.length === 0, errors };
    }
}
 
class Order implements Validatable {
    validate(): ValidationResult {
        const errors: string[] = [];
        // Order-specific validation rules  
        return { isValid: errors.length === 0, errors };
    }
}
 
// Now we can write code that works with ANY validatable entity
function saveIfValid<T extends Validatable>(entity: T, repository: Repository<T>): boolean {
    const result = entity.validate();
    if (result.isValid) {
        repository.save(entity);
        return true;
    }
    return false;
}

How to tell the difference:

Ask these questions to distinguish structural from conceptual similarity:

Would the similar code change for the same reason? — If changes in user validation requirements would also trigger changes in order validation, they likely share a conceptual basis.
Do the implementations serve the same purpose in their respective contexts? — Both calculateAverageAge and calculateTotalRevenue perform aggregation, but they serve fundamentally different business purposes with different semantics.
Would combining them require artificial parameters or flags? — If unifying the code requires adding includeAverage: boolean or similar flags, the similarity is likely superficial.
Do domain experts use the same language for both? — If stakeholders talk about 'validating before save' for both users and orders, there's conceptual alignment.

The key insight: Conceptual similarity is about roles and responsibilities, not about loops and conditionals. Two pieces of code can look completely different but represent the same abstraction. Two pieces of code can look identical but have nothing conceptually in common.
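The "artificial flag" question can be made concrete with a small sketch. The helper below is a hypothetical merge of the two aggregation functions, written only to show what the smell looks like; the includeAverage parameter and the standalone function names are assumptions for illustration.

```typescript
// Hypothetical sketch of the smell: one helper forced to serve two unrelated purposes.
function aggregate<T>(
    items: T[],
    getValue: (item: T) => number,
    includeAverage: boolean,   // the flag exists only to paper over a conceptual difference
): number {
    let sum = 0;
    for (const item of items) sum += getValue(item);
    if (includeAverage) {
        return items.length > 0 ? sum / items.length : 0;
    }
    return sum;
}

// Call sites now carry a boolean whose meaning is invisible without reading the helper.
const averageAge = aggregate([{ age: 30 }, { age: 40 }], u => u.age, true);
const totalRevenue = aggregate([{ amount: 5 }, { amount: 7 }], o => o.amount, false);

// Keeping the two concepts separate states the intent directly.
function averageAgeOf(users: Array<{ age: number }>): number {
    if (users.length === 0) return 0;
    return users.reduce((sum, user) => sum + user.age, 0) / users.length;
}

function totalRevenueOf(orders: Array<{ amount: number }>): number {
    return orders.reduce((sum, order) => sum + order.amount, 0);
}
```

Both versions compute the same numbers; the difference is that the separate functions can evolve independently when, say, revenue needs currency handling and ages do not.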

Change Patterns as Abstraction Signals

Some of the strongest signals for missing abstractions come not from examining code at rest, but from observing how code changes over time. Your version control history is a goldmine of abstraction opportunities.

The insight: If changes consistently require touching multiple files in the same pattern, there's likely an abstraction that could consolidate that change pattern into a single location.

Change Patterns That Signal Missing Abstractions

•Co-changing Files — When files A, B, and C are always modified together in commits, they likely share a responsibility that should be abstracted. Run 'git log' analysis to find these clusters.
•Copy-Paste-Modify Commits — When commits frequently show the same change applied to multiple locations with slight variations, you're maintaining duplicate code that begs for unification.
•Cascading Additions — When adding a new feature requires adding parallel implementations in multiple places (e.g., new payment type requires updates to PaymentProcessor, PaymentValidator, PaymentLogger), there's a missing abstraction that could encapsulate all aspects of a 'payment method.'
•Regression Patterns — When fixing a bug in one place frequently introduces (or reveals) the same bug in similar locations, the duplicated logic should be abstracted to a single point of truth.
•Hesitant Refactoring — When developers repeatedly start refactoring duplicated code but abandon the effort because 'it's too risky,' it often means the right abstraction isn't clear—time to study the code more deeply.

git_analysis_commands.bash
Bash
# Find files that frequently change together
# This reveals hidden coupling that might benefit from abstraction

# Count how often each pair of files appears in the same commit
git log --name-only --pretty=format:'--%h' | \
  awk '
    /^--/ { n = 0; next }   # commit marker: start a new file list
    NF {                    # non-empty line: a file path
      for (i = 0; i < n; i++) print files[i] " <-> " $0
      files[n++] = $0
    }' | \
  sort | \
  uniq -c | \
  sort -rn | \
  head -20

# Find files with high churn (frequently modified)
# High-churn files often contain multiple responsibilities
git log --name-only --pretty=format: --since="6 months ago" | \
  awk 'NF' | \
  sort | \
  uniq -c | \
  sort -rn | \
  head -20

# Count changes per TypeScript file in a specific directory
git log --name-only --pretty=format: -- src/payments/ | \
  grep -E '\.ts$' | \
  sort | \
  uniq -c | \
  sort -rn

# Look for "copy-paste" hints in commit messages
# (multiple --grep patterns are OR-ed; -i ignores case)
git log --oneline -i --grep="similar" --grep="same as" --grep="like" | \
  head -20
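If you prefer to do the co-change analysis in code, the counting step can be sketched in TypeScript. This is a minimal sketch that assumes the commits have already been parsed into lists of file paths (for example from git log --name-only output); the Commit type and both function names are hypothetical.

```typescript
// Hypothetical input shape: each commit is the list of file paths it touched.
type Commit = string[];

// Count how many commits touched each unordered pair of files.
function countCoChanges(commits: Commit[]): Map<string, number> {
    const counts = new Map<string, number>();
    for (const files of commits) {
        // De-duplicate and sort so that (a, b) and (b, a) map to the same key.
        const unique = Array.from(new Set(files)).sort();
        for (let i = 0; i < unique.length; i++) {
            for (let j = i + 1; j < unique.length; j++) {
                const key = `${unique[i]} <-> ${unique[j]}`;
                counts.set(key, (counts.get(key) ?? 0) + 1);
            }
        }
    }
    return counts;
}

// Return the most frequently co-changing pairs, highest count first.
function topCoChangingPairs(commits: Commit[], limit: number): Array<[string, number]> {
    return Array.from(countCoChanges(commits).entries())
        .sort((a, b) => b[1] - a[1])
        .slice(0, limit);
}
```

Pair counting grows quadratically with the number of files in a commit, so very large commits (bulk renames, formatting sweeps) are usually worth filtering out before analysis.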

The 'Changelog Test'

Try writing a changelog entry for a hypothetical change. If you find yourself writing 'Updated PaymentProcessor, PaymentValidator, PaymentLogger, and PaymentReporter to support Venmo,' the abstraction is screaming to exist. A well-abstracted system's changelog would simply read: 'Added Venmo payment method.'

Domain-Driven Abstraction Discovery

Some of the most valuable abstraction opportunities aren't visible in the code at all—they're visible in the language of the domain. Domain-Driven Design (DDD) teaches us that the vocabulary used by domain experts often reveals abstractions that should exist in code.

The principle: When domain experts consistently use a term that has no corresponding type in your codebase, you've likely found a missing abstraction.

Domain Language → Missing Abstractions
Domain Expert Says	Code Currently Has	Missing Abstraction
'The customer's order history shows...'	List<Order> scattered across multiple services	OrderHistory value object or aggregate
'Apply the discount policy to the cart'	Multiple if-else blocks checking conditions	DiscountPolicy interface with implementations
'The shipment tracking shows three events'	String[] with parsing logic everywhere	ShipmentEvent entity with TrackingTimeline aggregate
'Check if the user has permission'	Boolean checks duplicated across methods	Permission or AccessControl abstraction
'The pricing tier determines the rate'	Switch statements on tier names	PricingTier abstraction with polymorphic pricing
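
To make the first row of the table concrete, here is a minimal sketch of an OrderHistory value object wrapping what would otherwise be a bare list of orders. The Order shape and every method on OrderHistory are assumptions chosen for illustration, not a prescribed design.

```typescript
// Hypothetical sketch: the Order shape and the OrderHistory methods are illustrative.
interface Order { id: string; placedAt: Date; total: number }

class OrderHistory {
    private readonly orders: readonly Order[];

    constructor(orders: readonly Order[]) {
        // Copy and sort once so every consumer sees the same chronological view.
        this.orders = orders.slice().sort((a, b) => a.placedAt.getTime() - b.placedAt.getTime());
    }

    get count(): number {
        return this.orders.length;
    }

    totalSpent(): number {
        return this.orders.reduce((sum, order) => sum + order.total, 0);
    }

    // Returns a new history; the original is never mutated.
    since(date: Date): OrderHistory {
        return new OrderHistory(this.orders.filter(o => o.placedAt.getTime() >= date.getTime()));
    }

    mostRecent(): Order | undefined {
        return this.orders[this.orders.length - 1];
    }
}
```

Questions that domain experts ask about "the order history" now have one obvious home, instead of being re-derived from raw lists in each service.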

Listening for abstraction opportunities:

Pay attention when domain experts:

Use nouns that aren't types — 'Campaign,' 'Workflow,' 'Subscription Period' — if these words appear in meetings but not in code, consider adding them.
Describe behaviors that span multiple objects — 'The checkout process validates, reserves, and charges' — the 'checkout process' might deserve to be a first-class object.
Distinguish cases you've conflated — 'That's a promotional discount, not a loyalty discount' — you might have one Discount class where you need two.
Name implicit concepts — 'The SLA for premium customers is different' — 'SLA' is a concept that should probably exist in code.

The ubiquitous language principle: Successful abstractions use the same vocabulary as the domain. When your code says applyRateModifier but stakeholders say apply discount policy, the abstraction isn't just missing—it's concealed behind technical jargon.
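
A sketch of what code that speaks the domain's language might look like, using the promotional versus loyalty distinction mentioned above. All names here (Cart, DiscountPolicy, PromotionalDiscount, LoyaltyDiscount, applyDiscountPolicy) and the rules they encode are hypothetical; amounts are assumed to be integer cents to sidestep floating-point rounding.

```typescript
// Hypothetical sketch; amounts are integer cents, and the discount rules are placeholders.
interface Cart { subtotalCents: number; loyaltyPoints: number }

interface DiscountPolicy {
    readonly name: string;
    discountFor(cart: Cart): number;
}

class PromotionalDiscount implements DiscountPolicy {
    readonly name = 'promotional';
    constructor(private readonly percent: number) {}

    discountFor(cart: Cart): number {
        return Math.round((cart.subtotalCents * this.percent) / 100);
    }
}

class LoyaltyDiscount implements DiscountPolicy {
    readonly name = 'loyalty';
    constructor(private readonly centsPerPoint: number) {}

    discountFor(cart: Cart): number {
        // A loyalty discount can never exceed the cart subtotal.
        return Math.min(cart.subtotalCents, cart.loyaltyPoints * this.centsPerPoint);
    }
}

// Reads the way stakeholders speak: "apply the discount policy to the cart".
function applyDiscountPolicy(cart: Cart, policy: DiscountPolicy): number {
    return Math.max(0, cart.subtotalCents - policy.discountFor(cart));
}
```

The two classes exist because domain experts distinguish the two cases, not because the code happened to contain two branches.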

The Whiteboard Test

Invite a domain expert to sketch a business process on a whiteboard while explaining it. Every box they draw is a potential class. Every arrow is a potential method. Every label is a potential type name. The domain expert is drawing your missing abstractions without knowing it.

Anti-Patterns in Abstraction Identification

Just as there are signals that indicate genuine abstraction opportunities, there are traps that lead to false positives—situations that look like abstraction opportunities but actually aren't. Recognizing these anti-patterns protects you from creating wrong abstractions.

False Positives to Avoid

•Speculative Generality — 'We might need to support multiple databases someday, so let's abstract the data layer now.' If the need isn't demonstrated, the abstraction is speculative and often wrong when real requirements arrive.
•Utility Function Creep — 'These three services all call the same utility function, so there must be an abstraction.' Shared utilities don't imply abstraction opportunities; sometimes a function is just a function.
•Configuration Similarity — 'These config files have the same structure, so let's create a generic config handler.' Configuration similarity is often coincidental, not conceptual.
•Line Count Obsession — 'This function is 100 lines, so it needs to be abstracted.' Long functions sometimes indicate missing abstractions, but not always. Some procedures are inherently sequential and simpler as one unit.
•Framework-Driven Abstraction — 'The framework uses Repository pattern, so everything needs a repository.' Following framework patterns blindly leads to abstractions that don't match your domain.
•Premature DRY — 'These two lines are the same, let's extract a method.' Small-scale duplication is often fine; the overhead of abstraction outweighs the cost of the duplication.

false_positives.ts
TypeScript
// ANTI-PATTERN: Speculative Generality
// "We might need other notification channels someday..."
 
// Current reality: Only email is used
// Created abstraction:
interface NotificationChannel {
    send(message: Message): Promise<void>;
}
 
class EmailChannel implements NotificationChannel { /* ... */ }
class SmsChannel implements NotificationChannel { /* ... */ }      // Never used
class PushChannel implements NotificationChannel { /* ... */ }    // Never used  
class SlackChannel implements NotificationChannel { /* ... */ }   // Never used
 
// Result: Four classes, three of which are dead code
// Maintenance burden for no benefit
 
// ANTI-PATTERN: Premature DRY
// "These two functions both validate email format..."
 
// Before (simple, clear):
function validateUserEmail(email: string): boolean {
    return /^[^@]+@[^@]+\.[^@]+$/.test(email);
}
 
function validateOrderContactEmail(email: string): boolean {
    return /^[^@]+@[^@]+\.[^@]+$/.test(email);
}
 
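// After "DRY" refactoring (unnecessary abstraction).
// Shown next to the "before" version for comparison only: the two versions
// would not coexist in one file, and EmailValidator is a hypothetical class.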
const emailValidator = new EmailValidator({
    pattern: /^[^@]+@[^@]+\.[^@]+$/,
    errorMessage: "Invalid email format"
});
 
function validateUserEmail(email: string): boolean {
    return emailValidator.validate(email);
}
 
function validateOrderContactEmail(email: string): boolean {
    return emailValidator.validate(email);
}
 
// Result: More code, more indirection, no real benefit
// The duplication was FINE—two lines of identical regex
 
// BETTER APPROACH: Wait for evidence
// When you ACTUALLY need different email validation rules:
// - User emails: Must be from non-disposable domains
// - Order contact: Can be any valid email
// - Marketing: Must have opt-in confirmed
// THEN abstract, with real requirements informing the design

The YAGNI Principle

You Aren't Gonna Need It. Abstractions created for hypothetical future requirements almost never match actual future requirements. The best time to abstract is when you have concrete evidence of need—when the third example appears, when the same change pattern recurs, when domain experts name the missing concept.

A Systematic Discovery Process

Let's consolidate everything into a systematic process for identifying abstraction opportunities. This process can be applied during code reviews, dedicated refactoring sessions, or whenever you sense that code could be improved.

Abstraction Opportunity Discovery Checklist

•Smell Detection — Scan for code smells: duplication, type-switching, feature envy, parallel hierarchies. Mark candidates without immediately acting.
•Rule of Three Check — For each candidate, count concrete instances. If fewer than three exist, monitor rather than abstract.
•Conceptual Analysis — Determine if similarity is structural or conceptual. Ask: 'Would these change for the same reason? Do they serve the same purpose?'
•Change History Review — Check version control for co-changing files, copy-paste-modify patterns, and cascading changes.
•Domain Language Audit — Compare domain expert vocabulary to codebase types. Note concepts that are discussed but not represented.
•False Positive Filter — Challenge each candidate: Is this speculative? Is the duplication actually fine? Am I following a framework blindly?
•Priority Assessment — For surviving candidates, estimate impact: How often is this code changed? How much complexity would the abstraction remove?
•Document Decision — Record whether to abstract now, later, or never—and why. This prevents rehashing the same analysis.
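
The filtering steps of the checklist can be sketched as a small triage function. This is a deliberately simplified sketch: the AbstractionCandidate fields, the Decision values, and the order of the checks are assumptions for illustration, and real triage involves more judgment than four fields can capture.

```typescript
// Hypothetical sketch of the checklist as code; field names and rule order are assumptions.
type Decision = 'abstract-now' | 'monitor' | 'reject';

interface AbstractionCandidate {
    name: string;
    instances: number;            // concrete occurrences found in the codebase
    conceptuallySimilar: boolean; // would the occurrences change for the same reason?
    speculative: boolean;         // justified only by hypothetical future needs?
    moreInstancesExpected: boolean; // domain knowledge says further cases are coming
}

function triage(candidate: AbstractionCandidate): { decision: Decision; reason: string } {
    // False positive filter comes first: speculation and coincidence are rejected outright.
    if (candidate.speculative) {
        return { decision: 'reject', reason: 'No demonstrated need (speculative generality)' };
    }
    if (!candidate.conceptuallySimilar) {
        return { decision: 'reject', reason: 'Similarity is structural, not conceptual' };
    }
    // Rule of Three, relaxed when domain knowledge says more instances are coming.
    if (candidate.instances >= 3) {
        return { decision: 'abstract-now', reason: 'Three or more concrete instances' };
    }
    if (candidate.instances === 2 && candidate.moreInstancesExpected) {
        return { decision: 'abstract-now', reason: 'Two instances plus informed expectation of more' };
    }
    return { decision: 'monitor', reason: 'Fewer than three instances so far' };
}
```

Recording the returned reason alongside the decision covers the last checklist item, so the same candidate does not have to be re-argued in the next review.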

When to run this process:

During code review — Quick smell detection and Rule of Three check
Before major feature work — Full process for the affected area
Quarterly tech debt review — Systematic application across the codebase
After repeated 'this code is hard to change' complaints — Deep analysis of the problematic area

What comes next:

Once you've identified a genuine abstraction opportunity, the next steps are:

Extract Interface — Define the contract that the abstraction provides
Introduce Abstract Class — If shared implementation is needed
Iterate and Refine — Improve the abstraction based on usage feedback

These topics are covered in the following pages of this module.

Page Complete

You now understand how to identify abstraction opportunities systematically—recognizing code smells, applying the Rule of Three, distinguishing structural from conceptual similarity, reading change patterns, and avoiding false positives. The next page covers extracting interfaces: the first concrete step in realizing an abstraction opportunity.

1 / 4

Loading learning content...

System Design HLDRefactoring Toward Better Abstractions

Refactoring Toward Better Abstractions

LevelIntermediate

Duration90 mins

TopicRefactoring Toward Better Abstractions

1 / 4

Identifying Abstraction Opportunities

The Art of Seeing What Isn't There Yet

What You Will Learn

Why Identification Comes First

This is why identification is so critical. Before you invest effort in extracting interfaces or creating abstract classes, you must ensure:

There is genuine commonality — Not just superficial similarity, but true shared structure or behavior
The abstraction will be used — Abstracting for hypothetical future needs often creates dead code
The abstraction captures the right thing — Missing the essence leads to leaky or awkward abstractions
The timing is right — Premature abstraction is as problematic as no abstraction

The Wrong Abstraction Trap

The recognition skill:

Identifying abstraction opportunities is fundamentally a pattern recognition skill. You are looking for:

Structural patterns — Similar shapes appearing across different parts of the codebase
Behavioral patterns — Similar sequences of operations with minor variations
Conceptual patterns — Different implementations representing the same abstract concept
Evolution patterns — Places where requirements changes consistently cause cascading modifications

Let's examine each of these in depth, building a practical toolkit for spotting abstraction opportunities in real codebases.

Code Smells That Signal Missing Abstractions

Primary Smells Indicating Missing Abstraction

•Duplicated Code Blocks — The most obvious smell. When you see nearly identical code in multiple places, there's likely an abstraction that could unify them. Key distinction: Look for structural similarity, not just textual similarity. Two loops that look different but implement the same pattern are candidates for abstraction.
•Parallel Inheritance Hierarchies — When adding a class to one hierarchy requires adding a corresponding class to another hierarchy, you're missing an abstraction that could connect them. This often indicates a need for a bridge pattern or strategy extraction.
•Switch/If-Else Chains on Type — When you see code that checks the type of an object and performs different operations based on the type, you're looking at polymorphism waiting to be extracted. Each case in the switch is a potential subtype of an abstraction.
•Feature Envy — When a method accesses more data from another object than from its own, it suggests the behavior should be on that other object, or there's a missing abstraction that should own this responsibility.
•Shotgun Surgery — When a single change requires modifications to many different classes, the cross-cutting concern should likely be abstracted into a single location.
•Divergent Change — When a class is changed for many different reasons, it likely contains multiple responsibilities that should be separated into distinct abstractions.

smell_examples.ts
TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
// SMELL: Switch on Type — Classic missing abstraction signal
class ReportGenerator {
    generateReport(data: any, format: string): string {
        // Every new format requires modifying this method
        switch (format) {
            case 'pdf':
                return this.formatAsPdf(data);
            case 'html':
                return this.formatAsHtml(data);
            case 'csv':
                return this.formatAsCsv(data);
            case 'json':
                return this.formatAsJson(data);
            // Adding 'xml' requires changing this class!
            default:
                throw new Error(`Unknown format: ${format}`);
        }
    }
}
 
// SMELL: Duplicated Structural Pattern — Same shape, different details
class OrderProcessor {
    processOnlineOrder(order: OnlineOrder): void {
        this.validateOrder(order);
        this.calculateTotal(order);
        this.applyOnlineDiscount(order);    // Only variation
        this.chargeCard(order);
        this.sendConfirmation(order);
    }
 
    processPhoneOrder(order: PhoneOrder): void {
        this.validateOrder(order);
        this.calculateTotal(order);
        this.applyPhoneDiscount(order);     // Only variation
        this.chargeCard(order);
        this.sendConfirmation(order);
    }
 
    processInStoreOrder(order: InStoreOrder): void {
        this.validateOrder(order);
        this.calculateTotal(order);
        this.applyInStoreDiscount(order);   // Only variation
        this.chargeCash(order);             // Slight variation
        this.printReceipt(order);           // Slight variation
    }
}
 
// SMELL: Feature Envy — Method uses another object's data extensively
class InvoiceCalculator {
    calculateTotal(invoice: Invoice): number {
        // This method knows too much about Invoice internals
        let total = 0;
        for (const item of invoice.items) {
            total += item.price * item.quantity;
        }
        total -= invoice.discount;
        total += total * invoice.taxRate;
        total += invoice.shippingCost;
        total -= invoice.loyaltyPoints * 0.01;
        return total;
    }
    // The calculation logic likely belongs ON Invoice
}

Reading the smells:

Each smell tells a different story:

The switch on format reveals a missing ReportFormatter abstraction. New formats should be added by creating new classes, not modifying existing code.
The duplicated order processing reveals a missing DiscountStrategy abstraction. The workflow is nearly identical across all three methods: the discount step varies every time, and the in-store flow also swaps the payment and confirmation steps.
The feature envy suggests that calculateTotal should be a method on Invoice itself, or that the pricing rules belong in a separate TotalCalculator abstraction that Invoice delegates to.

These smells are your starting points for deeper investigation, not automatic triggers for refactoring.
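
If deeper investigation does confirm the first smell, one possible shape for the missing abstraction is sketched below. It is an illustration under assumptions, not the design of any real system: the ReportFormatter name comes from the text above, while FormatterRegistry, its methods, and the two deliberately tiny formatters are hypothetical.

```typescript
// Hypothetical sketch: each format becomes its own class behind one interface.
interface ReportFormatter {
    format(data: Record<string, unknown>): string;
}

class JsonFormatter implements ReportFormatter {
    format(data: Record<string, unknown>): string {
        return JSON.stringify(data);
    }
}

class CsvFormatter implements ReportFormatter {
    // Simplified on purpose: one header row, one value row, no quoting or escaping.
    format(data: Record<string, unknown>): string {
        const keys = Object.keys(data);
        const values = keys.map((key) => String(data[key]));
        return `${keys.join(',')}\n${values.join(',')}`;
    }
}

// Hypothetical registry: adding a format means registering a new class,
// not editing a switch statement.
class FormatterRegistry {
    private readonly formatters = new Map<string, ReportFormatter>();

    register(name: string, formatter: ReportFormatter): void {
        this.formatters.set(name, formatter);
    }

    generateReport(data: Record<string, unknown>, name: string): string {
        const formatter = this.formatters.get(name);
        if (!formatter) {
            throw new Error(`Unknown format: ${name}`);
        }
        return formatter.format(data);
    }
}

const registry = new FormatterRegistry();
registry.register('json', new JsonFormatter());
registry.register('csv', new CsvFormatter());
```

With this shape, supporting 'xml' would mean writing one new class and one register call; generateReport itself would not change.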

The Rule of Three

One of the most practical heuristics for identifying abstraction opportunities is the Rule of Three: you abstract when you see the same pattern three times. Not twice—three times.

Why three?

With two examples, you can't reliably distinguish:

Genuine shared structure from coincidental similarity
The stable core from the variable parts
Which variations matter from which don't

Two Instances (Risky)

•Similarity might be coincidental
•Can't determine which parts vary
•Abstraction might be premature
•Risk of wrong abstraction is high
•Future instances may not fit the pattern

Three+ Instances (Clearer)

•Pattern confirmed across multiple cases
•Stable core becomes visible
•Variable parts become clear
•Abstraction design is informed by evidence
•Confidence in the abstraction's applicability

Applying the rule:

Example in practice:

Suppose you're building an e-commerce system:

First occurrence: You write validation logic for credit card payments
Second occurrence: You need validation for PayPal payments—similar structure, different details
Third occurrence: You add Apple Pay validation—now the pattern is unmistakable

At this point, you have three concrete examples to inform your abstraction:

What do all three share? (validation workflow, error handling, logging)
Where do they differ? (API calls, credential handling, response parsing)
What interface would accommodate all three plus future payment methods?

The Rule of Three gives you the evidence to answer these questions confidently.
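
To make the three questions concrete, here is a minimal sketch of an interface that three such validators might converge on. Everything in it is an assumption made for illustration: the PaymentValidator and PaymentRequest names, the field names, and the simplistic credential rules are hypothetical, and real payment validation involves far more than this.

```typescript
// Hypothetical types for illustration; field names are assumptions.
interface PaymentRequest {
    amountCents: number;
    credential: string; // card number, PayPal email, or Apple Pay token
}

interface ValidationOutcome {
    ok: boolean;
    errors: string[];
}

// The varying part: each payment method checks its own credential.
interface PaymentValidator {
    readonly method: string;
    checkCredential(credential: string): string | null; // error message, or null if fine
}

const creditCard: PaymentValidator = {
    method: 'credit-card',
    checkCredential: (c) => (/^\d{13,19}$/.test(c) ? null : 'Card number must be 13-19 digits'),
};

const payPal: PaymentValidator = {
    method: 'paypal',
    checkCredential: (c) => (c.includes('@') ? null : 'PayPal account must be an email'),
};

const applePay: PaymentValidator = {
    method: 'apple-pay',
    checkCredential: (c) => (c.length > 0 ? null : 'Missing Apple Pay token'),
};

// The stable core: the shared workflow and error collection live in one place.
function validatePayment(validator: PaymentValidator, request: PaymentRequest): ValidationOutcome {
    const errors: string[] = [];
    if (request.amountCents <= 0) {
        errors.push('Amount must be positive');
    }
    const credentialError = validator.checkCredential(request.credential);
    if (credentialError !== null) {
        errors.push(`${validator.method}: ${credentialError}`);
    }
    return { ok: errors.length === 0, errors };
}
```

The interface is small because three examples showed that only the credential check differs; with just two examples, it would have been harder to know that the amount check belongs in the shared core.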

The Rule Is a Guideline, Not a Law

Structural vs Conceptual Similarity

The critical distinction:

Structural similarity is about syntax—the actual characters and statements appearing in the code
Conceptual similarity is about semantics—the meaning and purpose of the code

Good abstractions capture conceptual similarity. Bad abstractions are often created to eliminate structural similarity that is actually coincidental.

similarity_distinction.ts
TypeScript
// TRAP: Structural similarity that is NOT conceptual
// These loops look similar but represent unrelated concepts
 
function calculateAverageAge(users: User[]): number {
    let sum = 0;
    for (const user of users) {
        sum += user.age;
    }
    return users.length > 0 ? sum / users.length : 0;
}
 
function calculateTotalRevenue(orders: Order[]): number {
    let sum = 0;
    for (const order of orders) {
        sum += order.amount;
    }
    return sum;  // Different return logic!
}
 
// BAD: Forcing an abstraction due to structural similarity
function aggregate<T>(items: T[], getValue: (item: T) => number): number {
    let sum = 0;
    for (const item of items) {
        sum += getValue(item);
    }
    return sum;  // Forces both functions into same mold
}
// calculateAverageAge now needs awkward post-processing!
 
// OPPORTUNITY: Conceptual similarity that SHOULD be abstracted
// These represent the same concept: validating an entity before persistence
 
function validateUserBeforeSave(user: User): ValidationResult {
    const errors: string[] = [];
    if (!user.email || !isValidEmail(user.email)) {
        errors.push('Invalid email');
    }
    if (!user.name || user.name.length < 2) {
        errors.push('Name too short');
    }
    return { isValid: errors.length === 0, errors };
}
 
function validateOrderBeforeSave(order: Order): ValidationResult {
    const errors: string[] = [];
    if (!order.items || order.items.length === 0) {
        errors.push('Order must have items');
    }
    if (order.total < 0) {
        errors.push('Total cannot be negative');
    }
    return { isValid: errors.length === 0, errors };
}
 
// GOOD: Abstraction captures the concept, not just the structure
interface Validatable {
    validate(): ValidationResult;
}
 
class User implements Validatable {
    validate(): ValidationResult {
        const errors: string[] = [];
        // User-specific validation rules
        return { isValid: errors.length === 0, errors };
    }
}
 
class Order implements Validatable {
    validate(): ValidationResult {
        const errors: string[] = [];
        // Order-specific validation rules  
        return { isValid: errors.length === 0, errors };
    }
}
 
// Now we can write code that works with ANY validatable entity
function saveIfValid<T extends Validatable>(entity: T, repository: Repository<T>): boolean {
    const result = entity.validate();
    if (result.isValid) {
        repository.save(entity);
        return true;
    }
    return false;
}

How to tell the difference:

Ask these questions to distinguish structural from conceptual similarity:

Would the similar code change for the same reason? — If changes in user validation requirements would also trigger changes in order validation, they likely share a conceptual basis.
Do the implementations serve the same purpose in their respective contexts? — Both calculateAverageAge and calculateTotalRevenue perform aggregation, but they serve fundamentally different business purposes with different semantics.
Would combining them require artificial parameters or flags? — If unifying the code requires adding includeAverage: boolean or similar flags, the similarity is likely superficial.
Do domain experts use the same language for both? — If stakeholders talk about 'validating before save' for both users and orders, there's conceptual alignment.
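
The third question is easiest to see in code. The sketch below shows what forcing the two aggregation functions from the earlier listing into one might look like; the aggregateValues name and its includeAverage flag are hypothetical, with the flag name borrowed from the question itself.

```typescript
// Hypothetical merged function: the flag exists only to paper over the
// fact that the two callers want different things.
function aggregateValues(values: number[], includeAverage: boolean): number {
    let sum = 0;
    for (const value of values) {
        sum += value;
    }
    if (includeAverage) {
        return values.length > 0 ? sum / values.length : 0;
    }
    return sum;
}

// Call sites become harder to read: what does `true` mean here?
const averageAge = aggregateValues([20, 30, 40], true);
const totalRevenue = aggregateValues([100, 250], false);

// Keeping two intention-revealing functions avoids the flag entirely.
function sumOf(values: number[]): number {
    return values.reduce((total, value) => total + value, 0);
}

function averageOf(values: number[]): number {
    return values.length > 0 ? sumOf(values) / values.length : 0;
}
```

A boolean parameter that selects between two behaviors is usually a sign that two separate concepts have been merged into one signature.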

Change Patterns as Abstraction Signals

The insight: If changes consistently require touching multiple files in the same pattern, there's likely an abstraction that could consolidate that change pattern into a single location.

Change Patterns That Signal Missing Abstractions

•Co-changing Files — When files A, B, and C are always modified together in commits, they likely share a responsibility that should be abstracted. Run 'git log' analysis to find these clusters.
•Copy-Paste-Modify Commits — When commits frequently show the same change applied to multiple locations with slight variations, you're maintaining duplicate code that begs for unification.
•Cascading Additions — When adding a new feature requires adding parallel implementations in multiple places (e.g., new payment type requires updates to PaymentProcessor, PaymentValidator, PaymentLogger), there's a missing abstraction that could encapsulate all aspects of a 'payment method.'
•Regression Patterns — When fixing a bug in one place frequently introduces (or reveals) the same bug in similar locations, the duplicated logic should be abstracted to a single point of truth.
•Hesitant Refactoring — When developers repeatedly start refactoring duplicated code but abandon the effort because 'it's too risky,' it often means the right abstraction isn't clear—time to study the code more deeply.
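
The cascading-additions signal can be sketched as follows. Assuming the processor, validator, and logger named above each branch on payment type, a single hypothetical PaymentMethod interface could gather the per-type pieces so that a new type is added in one place. The interface, the two classes, the pay function, and the wallet limit are all illustrative assumptions.

```typescript
// Hypothetical interface gathering everything that varies per payment type.
interface PaymentMethod {
    readonly name: string;
    validate(amountCents: number): boolean;
    charge(amountCents: number): string; // returns a receipt id
    describeForLog(amountCents: number): string;
}

class CardPayment implements PaymentMethod {
    readonly name = 'card';

    validate(amountCents: number): boolean {
        return amountCents > 0;
    }

    charge(amountCents: number): string {
        return `card-receipt-${amountCents}`;
    }

    describeForLog(amountCents: number): string {
        return `card charge of ${amountCents} cents`;
    }
}

// Adding a wallet payment touches only this one new class.
class WalletPayment implements PaymentMethod {
    readonly name = 'wallet';

    validate(amountCents: number): boolean {
        return amountCents > 0 && amountCents <= 50000; // assumed wallet limit
    }

    charge(amountCents: number): string {
        return `wallet-receipt-${amountCents}`;
    }

    describeForLog(amountCents: number): string {
        return `wallet charge of ${amountCents} cents`;
    }
}

// The generic workflow no longer needs to know which payment types exist.
function pay(method: PaymentMethod, amountCents: number, log: string[]): string | null {
    if (!method.validate(amountCents)) {
        log.push(`rejected: ${method.describeForLog(amountCents)}`);
        return null;
    }
    log.push(`accepted: ${method.describeForLog(amountCents)}`);
    return method.charge(amountCents);
}
```

Whether such a consolidation is worthwhile still depends on the other checks on this page; the sketch only shows where the parallel edits would collapse to.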

git_analysis_commands.bash
Bash
# Find files that frequently change together
# This reveals hidden coupling that might benefit from abstraction
 
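# Count file pairs that appear in the same commits
# Each commit is introduced by a marker line starting with "@@"
# (assumes no tracked path begins with "@@")
git log --name-only --pretty=format:'@@%h' | \
  awk '
    /^@@/ { n = 0; next }   # commit marker: start a new file list
    NF    { for (i = 0; i < n; i++) print files[i] " <-> " $0; files[n++] = $0 }
  ' | \
  sort | \
  uniq -c | \
  sort -rn | \
  head -20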
 
# Find files with high churn (frequently modified)
# High-churn files often contain multiple responsibilities
git log --name-only --pretty=format: --since="6 months ago" | \
  awk 'NF' | \
  sort | \
  uniq -c | \
  sort -rn | \
  head -20
 
# Analyze commit patterns for specific directory
git log --oneline --name-only -- src/payments/ | \
  grep -E '\.ts$' | \
  sort | \
  uniq -c | \
  sort -rn
 
# Look for "copy-paste" patterns in commit messages
git log --oneline --grep="similar" --grep="same as" --grep="like" | \
  head -20

The 'Changelog Test'

Domain-Driven Abstraction Discovery

The principle: When domain experts consistently use a term that has no corresponding type in your codebase, you've likely found a missing abstraction.

Domain Language → Missing Abstractions
Domain Expert Says	Code Currently Has	Missing Abstraction
'The customer's order history shows...'	List<Order> scattered across multiple services	OrderHistory value object or aggregate
'Apply the discount policy to the cart'	Multiple if-else blocks checking conditions	DiscountPolicy interface with implementations
'The shipment tracking shows three events'	String[] with parsing logic everywhere	ShipmentEvent entity with TrackingTimeline aggregate
'Check if the user has permission'	Boolean checks duplicated across methods	Permission or AccessControl abstraction
'The pricing tier determines the rate'	Switch statements on tier names	PricingTier abstraction with polymorphic pricing

Listening for abstraction opportunities:

Pay attention when domain experts:

Use nouns that aren't types — 'Campaign,' 'Workflow,' 'Subscription Period' — if these words appear in meetings but not in code, consider adding them.
Describe behaviors that span multiple objects — 'The checkout process validates, reserves, and charges' — the 'checkout process' might deserve to be a first-class object.
Distinguish cases you've conflated — 'That's a promotional discount, not a loyalty discount' — you might have one Discount class where you need two.
Name implicit concepts — 'The SLA for premium customers is different' — 'SLA' is a concept that should probably exist in code.
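
As one illustration of the third point, the sketch below splits a single conflated discount into the two cases the domain expert distinguishes. Only the promotional/loyalty distinction comes from the text; the Discount interface, the field names, and the calculation rules are hypothetical.

```typescript
// Hypothetical contract shared by both kinds of discount.
interface Discount {
    amountOffCents(subtotalCents: number): number;
}

// A promotional discount takes a percentage off the subtotal.
class PromotionalDiscount implements Discount {
    private readonly percentOff: number;

    constructor(percentOff: number) {
        this.percentOff = percentOff;
    }

    amountOffCents(subtotalCents: number): number {
        return Math.round(subtotalCents * (this.percentOff / 100));
    }
}

// A loyalty discount converts redeemed points into cents.
class LoyaltyDiscount implements Discount {
    private readonly pointsRedeemed: number;
    private readonly centsPerPoint: number;

    constructor(pointsRedeemed: number, centsPerPoint: number) {
        this.pointsRedeemed = pointsRedeemed;
        this.centsPerPoint = centsPerPoint;
    }

    amountOffCents(subtotalCents: number): number {
        // Never discount more than the amount being discounted.
        return Math.min(subtotalCents, this.pointsRedeemed * this.centsPerPoint);
    }
}

// Applies discounts in order, each to the amount left by the previous one.
function applyDiscounts(subtotalCents: number, discounts: Discount[]): number {
    let remaining = subtotalCents;
    for (const discount of discounts) {
        remaining -= discount.amountOffCents(remaining);
    }
    return Math.max(0, remaining);
}
```

Once the two cases are separate types, rules that apply to only one of them (for example, a cap on redeemed points) have an obvious home.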

The Whiteboard Test

Anti-Patterns in Abstraction Identification

False Positives to Avoid

•Speculative Generality — 'We might need to support multiple databases someday, so let's abstract the data layer now.' If the need isn't demonstrated, the abstraction is speculative and often wrong when real requirements arrive.
•Utility Function Creep — 'These three services all call the same utility function, so there must be an abstraction.' Shared utilities don't imply abstraction opportunities; sometimes a function is just a function.
•Configuration Similarity — 'These config files have the same structure, so let's create a generic config handler.' Configuration similarity is often coincidental, not conceptual.
•Line Count Obsession — 'This function is 100 lines, so it needs to be abstracted.' Long functions sometimes indicate missing abstractions, but not always. Some procedures are inherently sequential and simpler as one unit.
•Framework-Driven Abstraction — 'The framework uses Repository pattern, so everything needs a repository.' Following framework patterns blindly leads to abstractions that don't match your domain.
•Premature DRY — 'These two lines are the same, let's extract a method.' Small-scale duplication is often fine; the overhead of abstraction outweighs the cost of the duplication.

false_positives.ts
TypeScript
// ANTI-PATTERN: Speculative Generality
// "We might need other notification channels someday..."
 
// Current reality: Only email is used
// Created abstraction:
interface NotificationChannel {
    send(message: Message): Promise<void>;
}
 
class EmailChannel implements NotificationChannel { /* ... */ }
class SmsChannel implements NotificationChannel { /* ... */ }      // Never used
class PushChannel implements NotificationChannel { /* ... */ }    // Never used  
class SlackChannel implements NotificationChannel { /* ... */ }   // Never used
 
// Result: Four classes, three of which are dead code
// Maintenance burden for no benefit
 
// ANTI-PATTERN: Premature DRY
// "These two functions both validate email format..."
 
// Before (simple, clear):
function validateUserEmail(email: string): boolean {
    return /^[^@]+@[^@]+\.[^@]+$/.test(email);
}
 
function validateOrderContactEmail(email: string): boolean {
    return /^[^@]+@[^@]+\.[^@]+$/.test(email);
}
 
// After "DRY" refactoring (unnecessary abstraction):
const emailValidator = new EmailValidator({
    pattern: /^[^@]+@[^@]+\.[^@]+$/,
    errorMessage: "Invalid email format"
});
 
function validateUserEmail(email: string): boolean {
    return emailValidator.validate(email);
}
 
function validateOrderContactEmail(email: string): boolean {
    return emailValidator.validate(email);
}
 
// Result: More code, more indirection, no real benefit
// The duplication was FINE—two lines of identical regex
 
// BETTER APPROACH: Wait for evidence
// When you ACTUALLY need different email validation rules:
// - User emails: Must be from non-disposable domains
// - Order contact: Can be any valid email
// - Marketing: Must have opt-in confirmed
// THEN abstract, with real requirements informing the design

The YAGNI Principle

A Systematic Discovery Process

Abstraction Opportunity Discovery Checklist

•Smell Detection — Scan for code smells: duplication, type-switching, feature envy, parallel hierarchies. Mark candidates without immediately acting.
•Rule of Three Check — For each candidate, count concrete instances. If fewer than three exist, monitor rather than abstract.
•Conceptual Analysis — Determine if similarity is structural or conceptual. Ask: 'Would these change for the same reason? Do they serve the same purpose?'
•Change History Review — Check version control for co-changing files, copy-paste-modify patterns, and cascading changes.
•Domain Language Audit — Compare domain expert vocabulary to codebase types. Note concepts that are discussed but not represented.
•False Positive Filter — Challenge each candidate: Is this speculative? Is the duplication actually fine? Am I following a framework blindly?
•Priority Assessment — For surviving candidates, estimate impact: How often is this code changed? How much complexity would the abstraction remove?
•Document Decision — Record whether to abstract now, later, or never—and why. This prevents rehashing the same analysis.
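
Parts of the checklist can be captured as a small triage function that also produces the decision record from the last step. This is a sketch under stated assumptions: the AbstractionCandidate shape, the decision labels, and the order in which the checks run are hypothetical ways of recording the checklist's answers, not a prescribed tool.

```typescript
type Decision = 'abstract-now' | 'monitor' | 'do-not-abstract';

// Hypothetical record of the answers gathered while walking the checklist.
interface AbstractionCandidate {
    name: string;
    concreteInstances: number;     // Rule of Three check
    changesForSameReason: boolean; // conceptual analysis
    isSpeculative: boolean;        // false positive filter
}

interface DecisionRecord {
    candidate: string;
    decision: Decision;
    reason: string;
}

function triage(candidate: AbstractionCandidate): DecisionRecord {
    const record = (decision: Decision, reason: string): DecisionRecord => ({
        candidate: candidate.name,
        decision,
        reason,
    });

    if (candidate.isSpeculative) {
        return record('do-not-abstract', 'No demonstrated need yet');
    }
    if (!candidate.changesForSameReason) {
        return record('do-not-abstract', 'Similarity is structural, not conceptual');
    }
    if (candidate.concreteInstances < 3) {
        return record('monitor', 'Fewer than three concrete instances');
    }
    return record('abstract-now', 'Three or more instances that change for the same reason');
}
```

Storing the returned records alongside the code gives later reviewers the "why" as well as the decision.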

When to run this process:

During code review — Quick smell detection and Rule of Three check
Before major feature work — Full process for the affected area
Quarterly tech debt review — Systematic application across the codebase
After repeated 'this code is hard to change' complaints — Deep analysis of the problematic area

What comes next:

Once you've identified a genuine abstraction opportunity, the next steps are:

Extract Interface — Define the contract that the abstraction provides
Introduce Abstract Class — If shared implementation is needed
Iterate and Refine — Improve the abstraction based on usage feedback

These topics are covered in the following pages of this module.
