The most common mistake when applying DRY is conflating code duplication with knowledge duplication. Developers see identical lines of code and reflexively extract them into a shared function, believing they're following DRY. But if those identical lines represent different pieces of knowledge—knowledge that could legitimately evolve in different directions—the extraction creates a harmful, inappropriate coupling.
Conversely, the same piece of knowledge can be scattered across code that looks completely different. Different syntax, different structures, different files—but all representing the same underlying truth. This is genuine DRY violation, even though no two blocks of code look alike.
This page will train your eye to see past surface-level code patterns and identify the semantic content—the knowledge—that DRY truly addresses.
By the end of this page, you will be able to distinguish between coincidental similarity (code that happens to look the same but represents different knowledge) and essential duplication (code that represents the same knowledge, regardless of how it looks). This skill is fundamental to applying DRY correctly.
Let's establish precise definitions to ground our discussion:
Code Duplication (Syntactic Duplication):
Code duplication refers to identical or near-identical sequences of source code appearing in multiple locations. This is what code analysis tools detect. They identify similar token sequences, similar AST structures, or similar function bodies.
Code duplication is:
Knowledge Duplication (Semantic Duplication):
Knowledge duplication refers to the same meaning, rule, fact, or concept being represented in multiple places. This is what DRY actually addresses. The representations might use identical code, similar code, or entirely different code—what matters is that they encode the same knowledge.
Knowledge duplication is:
| Aspect | Code Duplication | Knowledge Duplication |
|---|---|---|
| Definition | Same source code text | Same underlying meaning/rule |
| Detectability | Automatic tools can find it | Requires human understanding |
| Example | Copy-pasted function | Business rule in multiple forms |
| DRY violation? | Not necessarily | Always |
| Fix? | Depends on semantics | Create single source of truth |
When you see similar code, always ask: "What knowledge does each copy represent? Is it the same knowledge, or different knowledge that happens to have similar implementation?" This question reveals whether you have a genuine DRY violation or merely coincidental similarity.
This distinction is so important that it deserves formal treatment. We'll borrow philosophical terminology to clarify:
Essential Duplication:
Duplication is essential when two pieces of code represent the same piece of knowledge and therefore must change together. If business rules require that they evolve in lockstep, they are essentially duplicated.
Essential duplication is a genuine DRY violation. It should be eliminated by creating a single authoritative representation.
Coincidental Duplication:
Duplication is coincidental (or accidental) when two pieces of code happen to look similar but represent different knowledge. They may or may not change together—their evolution is independent.
Coincidental duplication is not a DRY violation. Extracting it into shared code creates inappropriate coupling, forcing things to change together that shouldn't.
```typescript
// ❌ WRONG: Treating coincidental duplication as essential

// These two functions both multiply by 1.1, but for DIFFERENT reasons
function calculateTaxedPrice(basePrice: number): number {
  return basePrice * 1.1; // 10% tax (tax law)
}

function calculatePriorityFee(standardFee: number): number {
  return standardFee * 1.1; // 10% premium for priority (business policy)
}

// Naive DRY "fix" - HARMFUL
function applyTenPercentIncrease(amount: number): number {
  return amount * 1.1;
}

// WHY THIS IS WRONG:
// - Tax rate is governed by tax law (could become 8%, 12%, etc.)
// - Priority fee is a business decision (could become 15%, 20%, etc.)
// - They are INDEPENDENT pieces of knowledge
// - Changing one should NOT affect the other
// - "Sharing" this code creates coupling that doesn't reflect reality
```

```typescript
// ✅ CORRECT: Eliminating essential duplication

// BEFORE: Same knowledge duplicated across layers
// Frontend validation:
function validateOrder(order: Order): boolean {
  return order.items.length > 0 &&
         order.items.length <= 100 &&
         order.total >= 1;
}

// Backend validation (same rules, duplicated):
function validateOrderRequest(req: OrderRequest): boolean {
  return req.items.length > 0 &&
         req.items.length <= 100 &&
         req.total >= 1;
}

// AFTER: Single source of truth for order validation rules
const ORDER_VALIDATION = {
  MIN_ITEMS: 1,
  MAX_ITEMS: 100,
  MIN_TOTAL: 1
} as const;

function isValidOrder(itemCount: number, total: number): boolean {
  return itemCount >= ORDER_VALIDATION.MIN_ITEMS &&
         itemCount <= ORDER_VALIDATION.MAX_ITEMS &&
         total >= ORDER_VALIDATION.MIN_TOTAL;
}

// WHY THIS IS CORRECT:
// - These represent the SAME business rule
// - If "max 100 items" changes to "max 50 items", BOTH must change
// - They are the SAME piece of knowledge
// - Single source of truth ensures consistency
```

Before extracting "duplicate" code, ask: "If I change this shared code for one use case, will I break the other use cases?" If the answer is yes, the duplication is coincidental: the use cases are independent, and shared code will fight you when they evolve differently.
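For the tax and priority-fee example, one possible alternative to the shared helper is to keep both functions and give each multiplier its own named constant, so each rate can change without touching the other. This is only a sketch: the constant names `SALES_TAX_RATE` and `PRIORITY_FEE_PREMIUM` are hypothetical and do not come from the original example.

```typescript
// Sketch only: the constant names below are illustrative assumptions.

// Owned by tax law; changes when legislation changes.
const SALES_TAX_RATE = 0.10;

// Owned by the business; changes when pricing policy changes.
const PRIORITY_FEE_PREMIUM = 0.10;

function calculateTaxedPrice(basePrice: number): number {
  return basePrice * (1 + SALES_TAX_RATE);
}

function calculatePriorityFee(standardFee: number): number {
  return standardFee * (1 + PRIORITY_FEE_PREMIUM);
}
```

The two bodies still look alike, but raising the priority premium now means editing one constant, and the tax calculation is not affected.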
Perhaps the more insidious form of duplication is when the same knowledge appears in completely different-looking code. No code analyzer will flag this. No copy-paste detection will find it. Yet it's a genuine DRY violation with all the associated maintenance costs.
This happens when the same underlying rule or fact is implemented in different ways, different languages, or different representations. The code doesn't look duplicated, but the knowledge is—and when that knowledge changes, all implementations must be updated.
```text
// ❌ HIDDEN DUPLICATION: Same knowledge in different forms

// 1. Configuration file (orders.config.json)
{
  "freeShippingThreshold": 100,
  "freeShippingMessage": "Free shipping on orders over $100!"
}

// 2. Database constraint (schema.sql)
CREATE TABLE orders (
  -- ... other fields ...
  shipping_cost DECIMAL(10,2) CHECK (total >= 100.00 OR shipping_cost > 0)
);

// 3. Backend logic (orderService.ts)
function calculateShipping(order: Order): number {
  if (order.total >= 100) {
    return 0; // Free shipping
  }
  return calculateStandardShipping(order);
}

// 4. Frontend display (CheckoutPage.tsx)
<div className="shipping-notice">
  {cart.total < 100 && (
    <span>Add ${(100 - cart.total).toFixed(2)} more for free shipping!</span>
  )}
</div>

// 5. API documentation (api-docs.md)
// "Orders with a total of $100 or more qualify for free shipping."

// THE PROBLEM:
// - The "free shipping at $100" rule appears in 5 places
// - None of these look like duplicates to automated tools
// - If the threshold changes to $75, all 5 must be updated
// - Miss one, and the system is inconsistent
```

Common patterns of hidden knowledge duplication:
These are all genuine DRY violations, even though no two lines of code look alike.
For cross-boundary knowledge duplication: define constants in one place and generate/derive others (e.g., generate validation from schema), use shared configuration sources, or create documentation from code annotations. The key is establishing a single source from which all representations can be derived.
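As a rough sketch of the "define once, derive the rest" idea applied to the free-shipping rule, the threshold can live in one constant from which the backend check, the checkout notice, and even the SQL constraint text are derived. Every name here (`FREE_SHIPPING_THRESHOLD`, `checkoutNotice`, `shippingCheckConstraint`, and so on) is hypothetical, and generating the constraint as a string is just one assumed way a build or migration step might consume it.

```typescript
// Hypothetical single source of truth for the free-shipping rule.
const FREE_SHIPPING_THRESHOLD = 100; // order total, in dollars

// Backend rule derived from the constant.
function qualifiesForFreeShipping(total: number): boolean {
  return total >= FREE_SHIPPING_THRESHOLD;
}

// Amount still missing before the order ships free (never negative).
function amountUntilFreeShipping(total: number): number {
  return Math.max(0, FREE_SHIPPING_THRESHOLD - total);
}

// Frontend notice derived from the same rule instead of a hard-coded 100.
function checkoutNotice(total: number): string {
  return qualifiesForFreeShipping(total)
    ? "Your order ships free."
    : `Add $${amountUntilFreeShipping(total).toFixed(2)} more for free shipping!`;
}

// Constraint text that a migration generator could emit (assumed workflow).
function shippingCheckConstraint(): string {
  return `CHECK (total >= ${FREE_SHIPPING_THRESHOLD.toFixed(2)} OR shipping_cost > 0)`;
}
```

With this shape, changing the threshold to 75 is a one-line edit, and the derived representations follow the next time they are generated or rendered.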
Now let's examine the opposite case: code that looks identical but represents different pieces of knowledge. This is coincidental duplication, and extracting it into shared code is a mistake.
The danger here is that developers (and static analysis tools) see the similar code and immediately think "DRY violation!" But the similarity is shallow—an accident of current requirements. The underlying knowledge is distinct, and future changes will cause the implementations to diverge.
```typescript
// ❌ COINCIDENTAL DUPLICATION: Same code, different knowledge

// Example 1: Both validate length, but for different reasons

// Username has 3-20 char limit due to UI constraints and database field size
function validateUsername(username: string): boolean {
  return username.length >= 3 && username.length <= 20;
}

// Password has 3-20 char limit due to security policies (will likely change!)
function validatePassword(password: string): boolean {
  return password.length >= 3 && password.length <= 20;
}

// DON'T DO THIS:
function validateStringLength(str: string): boolean {
  return str.length >= 3 && str.length <= 20;
}

// WHY IT'S WRONG:
// - Password requirements will change (add min 8, special chars, etc.)
// - Username might change to 5-50 for internationalization
// - They serve different purposes with different stakeholders
// - Sharing couples them artificially

// Example 2: Both format dates, but for different audiences

// Internal logs: machine-readable, for debugging
function formatLogTimestamp(date: Date): string {
  return date.toISOString(); // 2024-01-15T10:30:00.000Z
}

// User display: human-readable, locale-aware
function formatDisplayDate(date: Date): string {
  return date.toISOString(); // TEMPORARY: will add locale formatting
}

// These LOOK identical now, but represent:
// - Log format: technical requirement, might add timezone offset
// - Display format: UX requirement, will become locale-aware
// - Sharing them would fight future changes

// Example 3: Both calculate 5% of something

// Tip suggestion (social norm, varies by culture)
function suggestTip(amount: number): number {
  return amount * 0.05;
}

// Sales commission (company policy, set by HR)
function calculateCommission(sales: number): number {
  return sales * 0.05;
}

// NEVER share this:
// - Tip percentages are cultural (5%? 15%? 20%?)
// - Commission is HR policy (could become tiered, capped, etc.)
// - They have completely different "owners" of the knowledge
```

How to identify coincidental duplication:
Different Stakeholders — Is the same person/team responsible for both? If different stakeholders control each piece of knowledge, changes will come from different sources at different times.
Different Concepts — Even if the implementation is identical, do the concepts differ? Tax rate, discount rate, and error rate might all be 5%, but they're different concepts.
Different Evolution Paths — Can you imagine realistic scenarios where one would change but not the other? If yes, the duplication is coincidental.
Different Domains — Does the code span different bounded contexts? Each domain has its own language and may define similar-looking concepts differently.
Sandi Metz famously stated: "Duplication is far cheaper than the wrong abstraction." When you merge coincidentally similar code, you create an abstraction that doesn't match reality. When the underlying concepts diverge, the shared code becomes a battleground of conditionals and flags, worse than the original duplication.
Let's work through a detailed case study to practice distinguishing essential from coincidental duplication.
Scenario: An e-commerce platform processes orders. Several pieces of code deal with order totals and discounts. We'll analyze each potential duplication and determine whether it's essential or coincidental.
```typescript
// CASE STUDY: Analyzing potential duplications

// ──────────────────────────────────────────────────────────────────
// SITUATION 1: Discount calculation in cart and invoice
// ──────────────────────────────────────────────────────────────────

// Cart preview (shows what customer will pay):
function calculateCartDiscount(items: CartItem[]): number {
  const subtotal = items.reduce((sum, i) => sum + i.price * i.quantity, 0);
  if (subtotal >= 100) return subtotal * 0.1;
  if (subtotal >= 50) return subtotal * 0.05;
  return 0;
}

// Invoice generation (legal document for billing):
function calculateInvoiceDiscount(items: InvoiceItem[]): number {
  const subtotal = items.reduce((sum, i) => sum + i.price * i.quantity, 0);
  if (subtotal >= 100) return subtotal * 0.1;
  if (subtotal >= 50) return subtotal * 0.05;
  return 0;
}

// VERDICT: ✅ ESSENTIAL DUPLICATION
// Reason: Both represent THE SAME business rule "discount tiers"
// If tiers change (e.g., $75 threshold), both MUST change together
// FIX: Extract to shared discount calculation

// ──────────────────────────────────────────────────────────────────
// SITUATION 2: Validation in checkout vs admin panel
// ──────────────────────────────────────────────────────────────────

// Customer checkout (customer-facing):
function validateCheckoutOrder(order: CustomerOrder): boolean {
  return order.items.length <= 50 &&   // Cart size limit for customers
         order.total <= 10000;         // Spending limit for fraud prevention
}

// Admin order creation (internal tool):
function validateAdminOrder(order: AdminOrder): boolean {
  return order.items.length <= 50 &&   // Technical limit (payment processor)
         order.total <= 10000;         // Same limit? Or different reason?
}

// VERDICT: ⚠️ PARTIALLY ESSENTIAL, PARTIALLY COINCIDENTAL
// - The 50-item limit: ESSENTIAL (technical constraint from payment processor)
// - The $10,000 limit: COINCIDENTAL (customer fraud prevention vs... what?)
//
// Investigation reveals:
// - Customer limit: fraud prevention, may be raised for verified users
// - Admin limit: no real need, just copied from customer validation
//
// FIX: Extract payment processor limit; separate admin validation

// ──────────────────────────────────────────────────────────────────
// SITUATION 3: Shipping calculation in different contexts
// ──────────────────────────────────────────────────────────────────

// Domestic shipping:
function calculateDomesticShipping(weight: number): number {
  if (weight <= 1) return 5.99;
  if (weight <= 5) return 9.99;
  return 9.99 + (weight - 5) * 1.50;
}

// Return shipping (prepaid labels for returns):
function calculateReturnShipping(weight: number): number {
  if (weight <= 1) return 5.99;
  if (weight <= 5) return 9.99;
  return 9.99 + (weight - 5) * 1.50;
}

// VERDICT: ⚠️ LIKELY COINCIDENTAL
// Reason: These might look the same, but investigate:
// - Domestic rates come from carrier contracts
// - Return labels get negotiated discounts
// - Returns are subsidized for customer experience
//
// They're currently identical, but:
// - Return policy could change independently
// - Carrier contract renegotiation affects domestic only
// - Marketing might subsidize returns differently
//
// FIX: Keep separate for now; document the relationship
```

Notice that determining essential vs coincidental duplication required understanding the business. Code analysis alone can't answer these questions. You must ask: What knowledge does this represent? Who "owns" this knowledge? How might it evolve? These are semantic, not syntactic, questions.
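Returning to Situation 1 of the case study, the "extract to shared discount calculation" fix might look roughly like the sketch below. The `PricedLine` interface and the `DISCOUNT_TIERS` table are assumptions introduced for illustration; the case study does not show how `CartItem` and `InvoiceItem` are actually defined, only that both expose `price` and `quantity`.

```typescript
// Assumed minimal shape shared by cart items and invoice items.
interface PricedLine {
  price: number;
  quantity: number;
}

// Hypothetical tier table: the one place where the discount tiers are stated.
// Must stay ordered from the highest threshold to the lowest.
const DISCOUNT_TIERS: ReadonlyArray<{ minSubtotal: number; rate: number }> = [
  { minSubtotal: 100, rate: 0.10 },
  { minSubtotal: 50, rate: 0.05 },
];

// Single discount calculation used by both the cart preview and the invoice.
function calculateDiscount(items: ReadonlyArray<PricedLine>): number {
  const subtotal = items.reduce((sum, i) => sum + i.price * i.quantity, 0);
  for (const tier of DISCOUNT_TIERS) {
    if (subtotal >= tier.minSubtotal) {
      return subtotal * tier.rate; // first matching tier is the highest one
    }
  }
  return 0; // below every threshold
}
```

Because both callers go through `calculateDiscount`, a change such as a new $75 tier is made once in the table and cannot drift between the cart and the invoice.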
How do you systematically distinguish essential from coincidental duplication in practice? Here are proven techniques:
1. The Change Coupling Test:
Imagine changes to each piece of duplicated code. If changing one necessarily requires changing the other to maintain correctness, the duplication is essential. If they could change independently without breaking the system, it's coincidental.
2. The Unification Test:
Pretend you've unified the duplicates. What happens when you need to change the behavior for just one use case?
Many "shared" utilities eventually become riddled with conditionals: if (isForX) { ... } else if (isForY) { ... }. This is a sign that the original similarity was coincidental, and the unified code is now the "wrong abstraction."
When shared code starts accumulating boolean flags, optional parameters, or type discriminators to handle different use-case variations, it's often a sign that coincidental duplication was incorrectly unified. Consider whether the use cases should be separated again.
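To make the warning sign concrete, here is a hedged sketch of what a merged username/password length check could turn into once the two rules diverge, followed by the separated version. The function names, the option flags, and the specific limits (8 to 128 for passwords, 50 for long names) are all invented for illustration.

```typescript
// What a coincidental merge tends to become: one function, many switches.
// Flags and limits are hypothetical.
function validateLength(
  value: string,
  opts: { isPassword?: boolean; allowLongNames?: boolean } = {}
): boolean {
  if (opts.isPassword) {
    return value.length >= 8 && value.length <= 128;
  }
  const max = opts.allowLongNames ? 50 : 20;
  return value.length >= 3 && value.length <= max;
}

// Separated again: each function states exactly one piece of knowledge.
function isValidUsername(username: string): boolean {
  return username.length >= 3 && username.length <= 20;
}

function isValidPassword(password: string): boolean {
  return password.length >= 8 && password.length <= 128;
}
```

Each flag in `validateLength` marks a point where one caller needed behavior the others did not, which is the divergence the separated functions express directly.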
Understanding common misapplications of DRY helps avoid them. Here are patterns where developers frequently conflate code and knowledge duplication:
The "Utils" Anti-pattern:
One of the most common signs of DRY misapplication is the utilities file that grows without bound. StringUtils, DateUtils, MathUtils—these become dumping grounds for any code used in more than one place.
The problem: these utilities often contain coincidentally similar code that serves different purposes. When requirements diverge, the utilities become battlegrounds:
```text
// Started as simple date formatting
formatDate(date) { ... }

// Then one use needed timezone support
formatDate(date, timezone?) { ... }

// Another needed locale-specific formatting
formatDate(date, timezone?, locale?) { ... }

// Now it's a mess of conditionals no one understands
```
Better: recognize that "formatting a date for logging" and "formatting a date for user display" are different knowledge, even if they started with similar code.
Instead of organizing shared code by the kind of value it operates on (StringUtils, DateUtils, MathUtils), organize by what knowledge it represents. A module for "tax calculations", "shipping rules", or "date formatting for invoices" is easier to understand and maintain than a generic "Math utils" or "String utils" grab-bag.
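As one possible illustration of naming code after the knowledge it holds, the two date-formatting concerns from earlier can sit in separate functions that are free to evolve apart. The `locale` parameter, its `"en-US"` default, and the fixed UTC time zone are assumptions made to keep the sketch deterministic, not recommendations.

```typescript
// Logging knowledge: machine-readable timestamps for debugging.
function formatLogTimestamp(date: Date): string {
  return date.toISOString();
}

// Display knowledge: human-readable, locale-aware dates for users.
// The default locale and the UTC time zone are illustrative assumptions.
function formatDisplayDate(date: Date, locale: string = "en-US"): string {
  return new Intl.DateTimeFormat(locale, {
    year: "numeric",
    month: "long",
    day: "numeric",
    timeZone: "UTC",
  }).format(date);
}
```

Adding a timezone offset to log lines or a new locale to the UI now touches only the function that owns that decision.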
This distinction—between knowledge and code duplication—is fundamental to applying DRY correctly. Let's consolidate the key takeaways:
What's next:
Now that we understand what true DRY violations look like, we'll explore DRY violations and fixes—practical patterns for identifying duplication in the wild and strategies for eliminating it effectively without creating the wrong abstraction.
You can now distinguish between knowledge duplication (genuine DRY violation) and code duplication (may or may not be a violation). This skill is essential for applying DRY correctly—eliminating harmful duplication while avoiding the wrong abstraction.