DRY Principle - Learning Module

Knowledge Duplication vs Code Duplication

The Critical Distinction

The most common mistake when applying DRY is conflating code duplication with knowledge duplication. Developers see identical lines of code and reflexively extract them into a shared function, believing they're following DRY. But if those identical lines represent different pieces of knowledge—knowledge that could legitimately evolve in different directions—the extraction creates a harmful, inappropriate coupling.

Conversely, the same piece of knowledge can be scattered across code that looks completely different: different syntax, different structures, different files, all representing the same underlying truth. This is a genuine DRY violation, even though no two blocks of code look alike.

This page will train your eye to see past surface-level code patterns and identify the semantic content—the knowledge—that DRY truly addresses.

What You Will Learn

By the end of this page, you will be able to distinguish between coincidental similarity (code that happens to look the same but represents different knowledge) and essential duplication (code that represents the same knowledge, regardless of how it looks). This skill is fundamental to applying DRY correctly.

Defining the Terms

Let's establish precise definitions to ground our discussion:

Code Duplication (Syntactic Duplication):

Code duplication refers to identical or near-identical sequences of source code appearing in multiple locations. This is what code analysis tools detect. They identify similar token sequences, similar AST structures, or similar function bodies.

Code duplication is:

•Mechanically detectable
•Measured in lines, tokens, or structural similarity
•What most developers think of first when they hear "duplication"

Knowledge Duplication (Semantic Duplication):

Knowledge duplication refers to the same meaning, rule, fact, or concept being represented in multiple places. This is what DRY actually addresses. The representations might use identical code, similar code, or entirely different code—what matters is that they encode the same knowledge.

Knowledge duplication is:

•Semantically significant: it's about what the code means
•Not always mechanically detectable
•The true target of the DRY principle

Code Duplication vs Knowledge Duplication
Aspect	Code Duplication	Knowledge Duplication
Definition	Same source code text	Same underlying meaning/rule
Detectability	Automatic tools can find it	Requires human understanding
Example	Copy-pasted function	Business rule in multiple forms
DRY violation?	Not necessarily	Always
Fix?	Depends on semantics	Create single source of truth

The Semantic Lens

When you see similar code, always ask: "What knowledge does each copy represent? Is it the same knowledge, or different knowledge that happens to have similar implementation?" This question reveals whether you have a genuine DRY violation or merely coincidental similarity.

Coincidental vs Essential Duplication

This distinction is so important that it deserves formal treatment. We'll borrow philosophical terminology to clarify:

Essential Duplication:

Duplication is essential when two pieces of code represent the same piece of knowledge and therefore must change together. If business rules require that they evolve in lockstep, they are essentially duplicated.

Essential duplication is a genuine DRY violation. It should be eliminated by creating a single authoritative representation.

Coincidental Duplication:

Duplication is coincidental (or accidental) when two pieces of code happen to look similar but represent different knowledge. They may or may not change together—their evolution is independent.

Coincidental duplication is not a DRY violation. Extracting it into shared code creates inappropriate coupling, forcing things to change together that shouldn't.

coincidental-example
TypeScript
// ❌ WRONG: Treating coincidental duplication as essential
 
// These two functions both multiply by 1.1, but for DIFFERENT reasons
function calculateTaxedPrice(basePrice: number): number {
  return basePrice * 1.1;  // 10% tax (tax law)
}
 
function calculatePriorityFee(standardFee: number): number {
  return standardFee * 1.1;  // 10% premium for priority (business policy)
}
 
// Naive DRY "fix" - HARMFUL
function applyTenPercentIncrease(amount: number): number {
  return amount * 1.1;
}
 
// WHY THIS IS WRONG:
// - Tax rate is governed by tax law (could become 8%, 12%, etc.)
// - Priority fee is a business decision (could become 15%, 20%, etc.)
// - They are INDEPENDENT pieces of knowledge
// - Changing one should NOT affect the other
// - "Sharing" this code creates coupling that doesn't reflect reality
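
One way to keep the two rules independent is to give each rate its own named constant, so a change in tax law never touches the priority fee. The sketch below is illustrative only: the constant names `TAX_RATE` and `PRIORITY_FEE_RATE` are assumptions, not part of the original example.

```typescript
// Hypothetical constants: each rate is named after the knowledge it encodes.
const TAX_RATE = 0.1;          // governed by tax law
const PRIORITY_FEE_RATE = 0.1; // governed by business policy

// The two functions share a value today, but not a reason to change.
function calculateTaxedPrice(basePrice: number): number {
  return basePrice * (1 + TAX_RATE);
}

function calculatePriorityFee(standardFee: number): number {
  return standardFee * (1 + PRIORITY_FEE_RATE);
}
```

If the priority premium later becomes 15%, only `PRIORITY_FEE_RATE` changes, and the tax calculation is untouched.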

essential-example
TypeScript
// ✅ CORRECT: Eliminating essential duplication
 
// BEFORE: Same knowledge duplicated across layers
// Frontend validation:
function validateOrder(order: Order): boolean {
  return order.items.length > 0 && 
         order.items.length <= 100 &&
         order.total >= 1;
}
 
// Backend validation (same rules, duplicated):
function validateOrderRequest(req: OrderRequest): boolean {
  return req.items.length > 0 && 
         req.items.length <= 100 &&
         req.total >= 1;
}
 
// AFTER: Single source of truth for order validation rules
const ORDER_VALIDATION = {
  MIN_ITEMS: 1,
  MAX_ITEMS: 100,
  MIN_TOTAL: 1
} as const;
 
function isValidOrder(itemCount: number, total: number): boolean {
  return itemCount >= ORDER_VALIDATION.MIN_ITEMS &&
         itemCount <= ORDER_VALIDATION.MAX_ITEMS &&
         total >= ORDER_VALIDATION.MIN_TOTAL;
}
 
// WHY THIS IS CORRECT:
// - These represent the SAME business rule
// - If "max 100 items" changes to "max 50 items", BOTH must change
// - They are the SAME piece of knowledge
// - Single source of truth ensures consistency
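
The AFTER version above does not show how the two original entry points would use the shared rule. A minimal sketch, assuming `Order` and `OrderRequest` each expose `items` and `total` (their real shapes are not given in the original), might delegate like this:

```typescript
// Hypothetical minimal shapes; the original does not define these types.
interface Order { items: unknown[]; total: number; }
interface OrderRequest { items: unknown[]; total: number; }

const ORDER_VALIDATION = {
  MIN_ITEMS: 1,
  MAX_ITEMS: 100,
  MIN_TOTAL: 1,
} as const;

// The single authoritative statement of the order validation rule.
function isValidOrder(itemCount: number, total: number): boolean {
  return itemCount >= ORDER_VALIDATION.MIN_ITEMS &&
         itemCount <= ORDER_VALIDATION.MAX_ITEMS &&
         total >= ORDER_VALIDATION.MIN_TOTAL;
}

// Frontend entry point: adapts its own type, then delegates.
function validateOrder(order: Order): boolean {
  return isValidOrder(order.items.length, order.total);
}

// Backend entry point: same rule, different input shape.
function validateOrderRequest(req: OrderRequest): boolean {
  return isValidOrder(req.items.length, req.total);
}
```

Each layer keeps its own function and input type, but neither restates the limits.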

The Coupling Test

Before extracting "duplicate" code, ask: "If I change this shared code to suit one use case, will I break the others?" If the answer is yes, the duplication is coincidental: the use cases are independent, and the shared code will fight you as they evolve differently. If the change would be correct for every caller, the duplication is essential.

Same Knowledge, Different Code

Perhaps the more insidious form of duplication is when the same knowledge appears in completely different-looking code. No code analyzer will flag this. No copy-paste detection will find it. Yet it's a genuine DRY violation with all the associated maintenance costs.

This happens when the same underlying rule or fact is implemented in different ways, different languages, or different representations. The code doesn't look duplicated, but the knowledge is—and when that knowledge changes, all implementations must be updated.

hidden-duplication-example
Hidden Duplication
// ❌ HIDDEN DUPLICATION: Same knowledge in different forms
 
// 1. Configuration file (orders.config.json)
{
  "freeShippingThreshold": 100,
  "freeShippingMessage": "Free shipping on orders of $100 or more!"
}
 
// 2. Database constraint (schema.sql)
CREATE TABLE orders (
  -- ... other fields ...
  shipping_cost DECIMAL(10,2) 
    CHECK (total >= 100.00 OR shipping_cost > 0)
);
 
// 3. Backend logic (orderService.ts)
function calculateShipping(order: Order): number {
  if (order.total >= 100) {
    return 0;  // Free shipping
  }
  return calculateStandardShipping(order);
}
 
// 4. Frontend display (CheckoutPage.tsx)
<div className="shipping-notice">
  {cart.total < 100 && (
    <span>Add ${(100 - cart.total).toFixed(2)} more for free shipping!</span>
  )}
</div>
 
// 5. API documentation (api-docs.md)
// "Orders with a total of $100 or more qualify for free shipping."
 
// THE PROBLEM:
// - The "free shipping at $100" rule appears in 5 places
// - None of these look like duplicates to automated tools
// - If the threshold changes to $75, all 5 must be updated
// - Miss one, and the system is inconsistent

Common patterns of hidden knowledge duplication:

Schema and validation — Database constraints duplicate application-level validation
Config and code — Magic numbers in config repeated in conditional logic
API and documentation — Contract details described in both OpenAPI spec and prose docs
Frontend and backend — Business rules enforced in both layers
Code and tests — Expected values hard-coded in both implementation and assertions
Comments and code — Comments that repeat what the code does (especially dangerous when code changes but comments don't)

These are all genuine DRY violations, even though no two lines of code look alike.

Consolidation Strategies

For cross-boundary knowledge duplication: define constants in one place and generate/derive others (e.g., generate validation from schema), use shared configuration sources, or create documentation from code annotations. The key is establishing a single source from which all representations can be derived.
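
As a rough illustration of deriving several representations from one source, the free-shipping rule could live in a single constant with the message, the calculation, the checkout hint, and even the constraint text computed from it. Every name below (`FREE_SHIPPING_THRESHOLD`, `shippingCheckConstraint`, and so on) is hypothetical, and `standardRate` stands in for whatever standard shipping calculation the application really uses.

```typescript
// Hypothetical single source of truth for the free-shipping rule.
const FREE_SHIPPING_THRESHOLD = 100;

function qualifiesForFreeShipping(total: number): boolean {
  return total >= FREE_SHIPPING_THRESHOLD;
}

// Marketing copy derived from the constant instead of hard-coding "$100".
function freeShippingMessage(): string {
  return `Free shipping on orders of $${FREE_SHIPPING_THRESHOLD} or more!`;
}

// Backend calculation; standardRate is a placeholder for the real lookup.
function calculateShipping(total: number, standardRate: number): number {
  return qualifiesForFreeShipping(total) ? 0 : standardRate;
}

// Frontend hint: amount still needed, or null once the order qualifies.
function amountUntilFreeShipping(total: number): string | null {
  if (qualifiesForFreeShipping(total)) return null;
  return (FREE_SHIPPING_THRESHOLD - total).toFixed(2);
}

// Schema fragment generated for a migration script rather than typed by hand.
function shippingCheckConstraint(): string {
  return `CHECK (total >= ${FREE_SHIPPING_THRESHOLD.toFixed(2)} OR shipping_cost > 0)`;
}
```

Changing the threshold to $75 then means editing one line and regenerating the derived artifacts.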

Different Knowledge, Same Code

Now let's examine the opposite case: code that looks identical but represents different pieces of knowledge. This is coincidental duplication, and extracting it into shared code is a mistake.

The danger here is that developers (and static analysis tools) see the similar code and immediately think "DRY violation!" But the similarity is shallow—an accident of current requirements. The underlying knowledge is distinct, and future changes will cause the implementations to diverge.

coincidental-examples
TypeScript
// ❌ COINCIDENTAL DUPLICATION: Same code, different knowledge
 
// Example 1: Both validate length, but for different reasons
 
// Username has 3-20 char limit due to UI constraints and database field size
function validateUsername(username: string): boolean {
  return username.length >= 3 && username.length <= 20;
}
 
// Password has 3-20 char limit due to security policies (will likely change!)
function validatePassword(password: string): boolean {
  return password.length >= 3 && password.length <= 20;
}
 
// DON'T DO THIS:
function validateStringLength(str: string): boolean {
  return str.length >= 3 && str.length <= 20;
}
 
// WHY IT'S WRONG:
// - Password requirements will change (add min 8, special chars, etc.)
// - Username might change to 5-50 for internationalization
// - They serve different purposes with different stakeholders
// - Sharing couples them artificially
 
 
// Example 2: Both format dates, but for different audiences
 
// Internal logs: machine-readable, for debugging
function formatLogTimestamp(date: Date): string {
  return date.toISOString();  // 2024-01-15T10:30:00.000Z
}
 
// User display: human-readable, locale-aware
function formatDisplayDate(date: Date): string {
  return date.toISOString();  // TEMPORARY: will add locale formatting
}
 
// These LOOK identical now, but represent:
// - Log format: technical requirement, might add timezone offset
// - Display format: UX requirement, will become locale-aware
// - Sharing them would fight future changes
 
 
// Example 3: Both calculate 5% of something
 
// Tip suggestion (social norm, varies by culture)
function suggestTip(amount: number): number {
  return amount * 0.05;
}
 
// Sales commission (company policy, set by HR)
function calculateCommission(sales: number): number {
  return sales * 0.05;
}
 
// NEVER share this:
// - Tip percentages are cultural (5%? 15%? 20%?)
// - Commission is HR policy (could become tiered, capped, etc.)
// - They have completely different "owners" of the knowledge

How to identify coincidental duplication:

Different Stakeholders — Is the same person/team responsible for both? If different stakeholders control each piece of knowledge, changes will come from different sources at different times.
Different Concepts — Even if the implementation is identical, do the concepts differ? Tax rate, discount rate, and error rate might all be 5%, but they're different concepts.
Different Evolution Paths — Can you imagine realistic scenarios where one would change but not the other? If yes, the duplication is coincidental.
Different Domains — Does the code span different bounded contexts? Each domain has its own language and may define similar-looking concepts differently.
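
To make the "different evolution paths" check concrete, here is how the username and password validators from Example 1 might look after each stakeholder has made one change. The specific future rules (5 to 50 characters, a minimum of 8, a 128-character cap, one special character) are invented for illustration.

```typescript
// Hypothetical future rules, shown only to illustrate divergence.

// Product widened the username range for internationalization.
function validateUsername(username: string): boolean {
  return username.length >= 5 && username.length <= 50;
}

// Security raised the minimum length and now requires a special character.
function validatePassword(password: string): boolean {
  const hasSpecialChar = /[^A-Za-z0-9]/.test(password);
  return password.length >= 8 && password.length <= 128 && hasSpecialChar;
}
```

Because the functions were never merged, each change stayed a local edit with no flags or parameters needed.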

The Wrong Abstraction

Sandi Metz famously stated: "Duplication is far cheaper than the wrong abstraction." When you merge coincidentally similar code, you create an abstraction that doesn't match reality. When the underlying concepts diverge, the shared code becomes a battleground of conditionals and flags, worse than the original duplication.

Case Study: Order Processing

Let's work through a detailed case study to practice distinguishing essential from coincidental duplication.

Scenario: An e-commerce platform processes orders. Several pieces of code deal with order totals and discounts. We'll analyze each potential duplication and determine whether it's essential or coincidental.

case-study-analysis
TypeScript
// CASE STUDY: Analyzing potential duplications
 
// ──────────────────────────────────────────────────────────────────
// SITUATION 1: Discount calculation in cart and invoice
// ──────────────────────────────────────────────────────────────────
 
// Cart preview (shows what customer will pay):
function calculateCartDiscount(items: CartItem[]): number {
  const subtotal = items.reduce((sum, i) => sum + i.price * i.quantity, 0);
  if (subtotal >= 100) return subtotal * 0.1;
  if (subtotal >= 50) return subtotal * 0.05;
  return 0;
}
 
// Invoice generation (legal document for billing):
function calculateInvoiceDiscount(items: InvoiceItem[]): number {
  const subtotal = items.reduce((sum, i) => sum + i.price * i.quantity, 0);
  if (subtotal >= 100) return subtotal * 0.1;
  if (subtotal >= 50) return subtotal * 0.05;
  return 0;
}
 
// VERDICT: ✅ ESSENTIAL DUPLICATION
// Reason: Both represent THE SAME business rule "discount tiers"
// If tiers change (e.g., $75 threshold), both MUST change together
// FIX: Extract to shared discount calculation
 
 
// ──────────────────────────────────────────────────────────────────
// SITUATION 2: Validation in checkout vs admin panel
// ──────────────────────────────────────────────────────────────────
 
// Customer checkout (customer-facing):
function validateCheckoutOrder(order: CustomerOrder): boolean {
  return order.items.length <= 50 &&  // Cart size limit for customers
         order.total <= 10000;         // Spending limit for fraud prevention
}
 
// Admin order creation (internal tool):
function validateAdminOrder(order: AdminOrder): boolean {
  return order.items.length <= 50 &&  // Technical limit (payment processor)
         order.total <= 10000;         // Same limit? Or different reason?
}
 
// VERDICT: ⚠️ PARTIALLY ESSENTIAL, PARTIALLY COINCIDENTAL
// - The 50-item limit: ESSENTIAL (technical constraint from payment processor)
// - The $10,000 limit: COINCIDENTAL (customer fraud prevention vs... what?)
// 
// Investigation reveals:
// - Customer limit: fraud prevention, may be raised for verified users
// - Admin limit: no real need, just copied from customer validation
//
// FIX: Extract payment processor limit; separate admin validation
 
 
// ──────────────────────────────────────────────────────────────────
// SITUATION 3: Shipping calculation in different contexts
// ──────────────────────────────────────────────────────────────────
 
// Domestic shipping:
function calculateDomesticShipping(weight: number): number {
  if (weight <= 1) return 5.99;
  if (weight <= 5) return 9.99;
  return 9.99 + (weight - 5) * 1.50;
}
 
// Return shipping (prepaid labels for returns):
function calculateReturnShipping(weight: number): number {
  if (weight <= 1) return 5.99;
  if (weight <= 5) return 9.99;
  return 9.99 + (weight - 5) * 1.50;
}
 
// VERDICT: ⚠️ LIKELY COINCIDENTAL
// Reason: These might look the same, but investigate:
// - Domestic rates come from carrier contracts
// - Return labels get negotiated discounts
// - Returns are subsidized for customer experience
// 
// They're currently identical, but:
// - Return policy could change independently
// - Carrier contract renegotiation affects domestic only
// - Marketing might subsidize returns differently
//
// FIX: Keep separate for now; document the relationship
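
The fixes proposed for Situations 1 and 2 could be sketched roughly as follows. The `LineItem` and `OrderSummary` shapes, the constant names, and the decision to drop the admin spending limit entirely are assumptions made for illustration. The case study only says the admin limit was copied without a real need.

```typescript
// Hypothetical minimal shape covering both cart and invoice line items.
interface LineItem { price: number; quantity: number; }
interface DiscountTier { minSubtotal: number; rate: number; }

// Situation 1: one home for the discount tiers, highest threshold first.
const DISCOUNT_TIERS: ReadonlyArray<DiscountTier> = [
  { minSubtotal: 100, rate: 0.1 },
  { minSubtotal: 50, rate: 0.05 },
];

function calculateOrderDiscount(items: ReadonlyArray<LineItem>): number {
  const subtotal = items.reduce((sum, i) => sum + i.price * i.quantity, 0);
  for (const tier of DISCOUNT_TIERS) {
    if (subtotal >= tier.minSubtotal) return subtotal * tier.rate;
  }
  return 0;
}

// Situation 2: only the payment processor limit is shared knowledge.
const PAYMENT_PROCESSOR_MAX_ITEMS = 50;
const CUSTOMER_FRAUD_LIMIT = 10000; // customer-only fraud prevention rule

interface OrderSummary { items: unknown[]; total: number; }

function validateCheckoutOrder(order: OrderSummary): boolean {
  return order.items.length <= PAYMENT_PROCESSOR_MAX_ITEMS &&
         order.total <= CUSTOMER_FRAUD_LIMIT;
}

// Admin validation keeps the technical limit but not the fraud limit.
function validateAdminOrder(order: OrderSummary): boolean {
  return order.items.length <= PAYMENT_PROCESSOR_MAX_ITEMS;
}
```

Both the cart preview and the invoice generator would call `calculateOrderDiscount`, while the two validators share only the constant that genuinely represents the same constraint.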

Investigation Matters

Notice that determining essential vs coincidental duplication required understanding the business. Code analysis alone can't answer these questions. You must ask: What knowledge does this represent? Who "owns" this knowledge? How might it evolve? These are semantic, not syntactic, questions.

Techniques for Identification

How do you systematically distinguish essential from coincidental duplication in practice? Here are proven techniques:

1. The Change Coupling Test:

Imagine changes to each piece of duplicated code. If changing one necessarily requires changing the other to maintain correctness, the duplication is essential. If they could change independently without breaking the system, it's coincidental.

Identification Techniques

•Stakeholder Analysis — Identify who "owns" each piece of knowledge. If the same stakeholder (person, team, department) controls both, essential coupling is likely. Different stakeholders suggest coincidental similarity.
•Concept Naming — Try to name the shared concept precisely. If you can give it a clear, specific name that both usages agree on (e.g., "order discount tiers"), it's likely essential. If the name feels forced or generic (e.g., "percentage calculation"), it's likely coincidental.
•Evolution Projection — Project the code 1-2 years forward. Imagine realistic changes. Would these copies evolve together or diverge? If you can easily imagine divergence, the current similarity is accidental.
•Domain Boundary Check — Are the duplicates in the same bounded context? Duplication across bounded contexts is often coincidental; concepts may share names but have different meanings in different domains.
•Rate of Change Analysis — How often does each piece of code change? If they change at different rates or in response to different triggers (regulations vs. marketing vs. technology), they're probably independent.

2. The Unification Test:

Pretend you've unified the duplicates. What happens when you need to change the behavior for just one use case?

•If the change naturally applies to all uses → Essential duplication was correctly unified
•If you add a conditional/flag to handle "this case" → You may have unified coincidental duplication

Many "shared" utilities eventually become riddled with conditionals: if (isForX) { ... } else if (isForY) { ... }. This is a sign that the original similarity was coincidental, and the unified code is now the "wrong abstraction."

The Flag Smell

When shared code starts accumulating boolean flags, optional parameters, or type discriminators to handle different use-case variations, it's often a sign that coincidental duplication was incorrectly unified. Consider whether the use cases should be separated again.
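
The smell can be illustrated with the shipping functions from the case study. Suppose they had been merged and returns later received their own rates. The return rates below (a flat 4.99 up to 5 units of weight, then 1.00 per extra unit) are hypothetical, as is the name `calculateShippingShared`.

```typescript
// Smell: one "shared" function steered by a boolean flag.
function calculateShippingShared(weight: number, isReturn: boolean): number {
  if (isReturn) {
    // Hypothetical return-label rates bolted on after the merge.
    if (weight <= 5) return 4.99;
    return 4.99 + (weight - 5) * 1.0;
  }
  if (weight <= 1) return 5.99;
  if (weight <= 5) return 9.99;
  return 9.99 + (weight - 5) * 1.5;
}

// Separated again: each function owns exactly one rule.
function calculateDomesticShipping(weight: number): number {
  if (weight <= 1) return 5.99;
  if (weight <= 5) return 9.99;
  return 9.99 + (weight - 5) * 1.5;
}

function calculateReturnShipping(weight: number): number {
  if (weight <= 5) return 4.99; // hypothetical return-label rate
  return 4.99 + (weight - 5) * 1.0;
}
```

The two branches of the flagged version never share a line of logic, which is a hint that they were separate pieces of knowledge all along.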

Common Misapplications

Understanding common misapplications of DRY helps avoid them. Here are patterns where developers frequently conflate code and knowledge duplication:

Common DRY Misapplications

•Cross-service "shared" libraries — Sharing code between microservices creates coupling; each service may need independent evolution.
•Test helper mega-utilities — Massive shared test setup that makes tests brittle and hard to understand.
•Generic "Utils" classes — Dumping ground for unrelated functions that happen to be used in multiple places.
•Abstract base classes for "reuse" — Inheritance hierarchies created solely to avoid repeating code, without genuine is-a relationships.

Appropriate DRY Applications

•Shared domain models — Core business entities with clear ownership and consistent meaning.
•Focused utility libraries — Specific purposes (date handling, validation) with clear semantics.
•Contract definitions — API schemas, event formats where consistency is essential.
•Configuration derivation — Generating multiple representations from a single source of truth.

The "Utils" Anti-pattern:

One of the most common signs of DRY misapplication is the utilities file that grows without bound. StringUtils, DateUtils, MathUtils—these become dumping grounds for any code used in more than one place.

The problem: these utilities often contain coincidentally similar code that serves different purposes. When requirements diverge, the utilities become battlegrounds:

// Started as simple date formatting
function formatDate(date: Date): string { ... }

// Then one use needed timezone support
function formatDate(date: Date, timezone?: string): string { ... }

// Another needed locale-specific formatting
function formatDate(date: Date, timezone?: string, locale?: string): string { ... }

// Now it's a mess of conditionals no one understands

Better: recognize that "formatting a date for logging" and "formatting a date for user display" are different knowledge, even if they started with similar code.
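
A minimal sketch of that split, assuming logs want ISO 8601 and the display side wants a long, locale-aware date. The function names follow the earlier date example, while the `locale` and `timeZone` parameters and the chosen format options are assumptions.

```typescript
// Log timestamps: machine-readable, always UTC, stable for tooling.
function formatLogTimestamp(date: Date): string {
  return date.toISOString();
}

// User-facing dates: locale and time zone belong to this concept only.
function formatDisplayDate(date: Date, locale: string, timeZone: string): string {
  return new Intl.DateTimeFormat(locale, {
    year: "numeric",
    month: "long",
    day: "numeric",
    timeZone,
  }).format(date);
}
```

Here the extra parameters are part of what "display formatting" means rather than switches between unrelated behaviors, so the log formatter never has to know about them.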

Purpose-Driven Modules

Instead of organizing shared code by how it's used (Utils), organize by what knowledge it represents. A module for "tax calculations" or "shipping rules" or "date formatting for invoices" is easier to understand and maintain than a generic "Math utils" or "String utils" grab-bag.

Summary: Knowledge vs Code Duplication

This distinction—between knowledge and code duplication—is fundamental to applying DRY correctly. Let's consolidate the key takeaways:

Key Takeaways

•DRY addresses knowledge, not code — The principle targets semantic duplication (same meaning) rather than syntactic duplication (same characters).
•Essential duplication must be eliminated — When the same piece of knowledge appears in multiple places, they must change together; consolidate them.
•Coincidental duplication should be left alone — When code looks similar but represents different knowledge, extracting it creates inappropriate coupling.
•Same knowledge can hide in different code — The most dangerous DRY violations don't look like duplicates; the knowledge is scattered across different representations.
•Different knowledge can hide in same code — Identical code that serves different purposes may need to evolve independently; don't force them together.
•Use semantic analysis techniques — Stakeholder analysis, concept naming, evolution projection, and the change coupling test help identify true duplication.
•Flags and conditionals are warning signs — When shared code starts needing parameters to handle different cases, the unification may have been a mistake.

What's next:

Now that we understand what true DRY violations look like, we'll explore DRY violations and fixes—practical patterns for identifying duplication in the wild and strategies for eliminating it effectively without creating the wrong abstraction.

Page Complete

You can now distinguish between knowledge duplication (genuine DRY violation) and code duplication (may or may not be a violation). This skill is essential for applying DRY correctly—eliminating harmful duplication while avoiding the wrong abstraction.