Repository Pattern Deep Dive

Level: Intermediate

Duration: 75 mins

Topic: Persistence Layer Design

Repository as Collection Abstraction

The Illusion of In-Memory Data

Imagine writing business logic that manipulates domain objects as if they were simply sitting in memory—adding customers to a list, removing orders, querying products by category—without ever writing a SQL statement, constructing a database connection, or worrying about transactional boundaries. The Repository pattern makes this illusion possible.

At its core, the Repository pattern is one of the most elegant abstractions in software design: it presents the persistence layer as an in-memory collection. Your domain code interacts with what appears to be a simple collection interface, while behind the scenes, the repository translates every operation into the appropriate persistence mechanism—be it relational databases, document stores, message queues, or external APIs.

What You Will Learn

By the end of this page, you will deeply understand the Repository pattern as a collection abstraction. You'll learn its origins, core principles, the problems it solves, and how it enables clean separation between your domain logic and persistence concerns. This knowledge forms the foundation for implementing robust, testable, and maintainable data access layers.

Origins and Definition

The Repository pattern was catalogued in Martin Fowler's Patterns of Enterprise Application Architecture (2002), where the pattern entry itself is credited to Edward Hieatt and Rob Mee, and was further elaborated by Eric Evans in Domain-Driven Design (2003). Both books recognized a fundamental tension in enterprise software: domain logic should remain focused on business rules, not polluted by data access concerns.

Fowler's Definition:

"A Repository mediates between the domain and data mapping layers, acting like an in-memory domain object collection. Client objects construct query specifications declaratively and submit them to Repository for satisfaction. Objects can be added to and removed from the Repository, as they can from a simple collection of objects."

Evans' Perspective (DDD):

Evans extended this concept within Domain-Driven Design, emphasizing that repositories should be designed for Aggregate Roots only. Each aggregate—the consistency boundary in your domain—should have at most one repository. The repository becomes the gateway through which aggregates are retrieved and persisted, ensuring consistency invariants are always maintained.

The Collection Metaphor

The key insight is the collection metaphor. When you add an object to a collection in memory (e.g., list.add(customer)), you don't think about serialization, storage locations, or retrieval mechanisms. The Repository pattern aims to preserve this simplicity for persistent data. You add(order) to a repository, and the storage details are handled transparently.

Historical Context:

Before the Repository pattern became widely adopted, data access code was often scattered throughout business logic. Developers would construct SQL queries inline, mix transaction management with domain operations, and tightly couple their code to specific database technologies. This created several problems:

•Testing difficulty: Business logic couldn't be tested without a database
•Technology lock-in: Switching databases required rewrites across the codebase
•Code duplication: Similar queries appeared in multiple places
•Maintainability nightmare: Finding all data access code for a given entity was nearly impossible

The Repository pattern emerged as a solution to all these problems by introducing a clean abstraction layer.

Core Principles of the Repository Pattern

Understanding the Repository pattern deeply requires grasping its fundamental principles. These aren't arbitrary design choices—they're carefully considered guidelines that ensure the pattern delivers its intended benefits.

Fundamental Principles

•Collection Semantics — A repository presents itself as an in-memory collection. It supports operations like Add, Remove, Find, and Get—just like a list or set. No SQL, no queries in the traditional sense—just collection-style operations.
•Persistence Ignorance — Domain objects retrieved through a repository should be completely unaware of how they're stored. They shouldn't contain database annotations, serialization logic, or framework-specific code. They're pure domain objects.
•Aggregate-Centric Design — In DDD terms, repositories are designed around aggregates, not individual entities. You don't have a repository for OrderLineItem—you have one for Order, which contains its line items. The aggregate is loaded and saved as a unit.
•Query Abstraction — Complex queries are expressed through the repository's interface, not through query languages exposed to calling code. The repository encapsulates how to translate business requirements into data retrieval.
•Transactional Boundary — Repositories typically work within a single transaction. Operations like add and remove don't necessarily persist immediately—they might be deferred until a Unit of Work commits the transaction.
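
The deferred persistence described in the last principle can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a production Unit of Work: InMemoryStore, UnitOfWork, and DeferredRepository are hypothetical names, and the "store" is just a list standing in for a database.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical backing store standing in for a database table.
public sealed class InMemoryStore<T>
{
    public List<T> Rows { get; } = new();
}

// Minimal unit of work: records intended changes and applies them on Commit.
public sealed class UnitOfWork<T>
{
    private readonly InMemoryStore<T> _store;
    private readonly List<T> _added = new();
    private readonly List<T> _removed = new();

    public UnitOfWork(InMemoryStore<T> store) => _store = store;

    public void RegisterAdded(T entity) => _added.Add(entity);
    public void RegisterRemoved(T entity) => _removed.Add(entity);

    // Nothing reaches the store until this method is called.
    public void Commit()
    {
        _store.Rows.AddRange(_added);
        foreach (var entity in _removed) _store.Rows.Remove(entity);
        _added.Clear();
        _removed.Clear();
    }
}

// Repository whose Add/Remove only register intent with the unit of work.
public sealed class DeferredRepository<T>
{
    private readonly UnitOfWork<T> _unitOfWork;

    public DeferredRepository(UnitOfWork<T> unitOfWork) => _unitOfWork = unitOfWork;

    public void Add(T entity) => _unitOfWork.RegisterAdded(entity);
    public void Remove(T entity) => _unitOfWork.RegisterRemoved(entity);
}
```

Calling Add on the repository leaves the store untouched; the row only appears after Commit. A real implementation would also wrap Commit in a database transaction and roll back on failure, which this sketch omits.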

Common Misconception

A Repository is not just a DAO (Data Access Object) with a different name. While DAOs are typically table-centric and expose CRUD operations for a single database table, Repositories are aggregate-centric and hide all persistence details. A Repository might interact with multiple tables, external services, or even multiple databases—but its interface never reveals this complexity.

Designing the Collection Interface

The power of the Repository pattern lies in its interface design. A well-designed repository interface should feel like a natural extension of working with in-memory collections, while providing the operations your domain actually needs.

The Minimal Collection Interface:

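At its simplest, a repository supports a handful of operations that mirror standard collection behavior: look up by identifier, enumerate, add, and remove. The interface below also declares an explicit Update method, whose role is discussed later in this section: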

IRepository.cs
/// <summary>
/// The minimal repository interface - pure collection semantics.
/// Notice: no database terminology, no SQL, no connection management.
/// </summary>
public interface IRepository<T, TId> where T : class
{
    /// <summary>
    /// Retrieves an entity by its unique identifier.
    /// Returns null if not found (or throws, depending on design choice).
    /// This mirrors: var item = collection.FirstOrDefault(x => x.Id == id);
    /// </summary>
    T? GetById(TId id);
    
    /// <summary>
    /// Retrieves all entities of this type.
    /// Use with caution - can be expensive for large datasets.
    /// This mirrors: collection.ToList();
    /// </summary>
    IEnumerable<T> GetAll();
    
    /// <summary>
    /// Adds a new entity to the repository.
    /// The entity will be persisted when the unit of work commits.
    /// This mirrors: collection.Add(item);
    /// </summary>
    void Add(T entity);
    
    /// <summary>
    /// Removes an entity from the repository.
    /// The deletion will occur when the unit of work commits.
    /// This mirrors: collection.Remove(item);
    /// </summary>
    void Remove(T entity);
    
    /// <summary>
    /// Updates an existing entity.
    /// Note: In true collection semantics, this isn't needed since objects
    /// are references. But for persistence, explicit update tracking helps.
    /// </summary>
    void Update(T entity);
}
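
To make the collection metaphor concrete, here is one possible in-memory implementation of that interface, backed by a dictionary. It is a sketch for tests and experiments rather than a reference implementation: InMemoryRepository, the idOf delegate, and the Customer record are illustrative assumptions, and the interface is repeated only so the snippet compiles on its own.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the interface above, repeated so the sketch is self-contained.
public interface IRepository<T, TId> where T : class
{
    T? GetById(TId id);
    IEnumerable<T> GetAll();
    void Add(T entity);
    void Remove(T entity);
    void Update(T entity);
}

public sealed class InMemoryRepository<T, TId> : IRepository<T, TId>
    where T : class
    where TId : notnull
{
    private readonly Dictionary<TId, T> _items = new();
    private readonly Func<T, TId> _idOf; // how to read an entity's identifier

    public InMemoryRepository(Func<T, TId> idOf) => _idOf = idOf;

    public T? GetById(TId id) =>
        _items.TryGetValue(id, out var entity) ? entity : null;

    // Returns a snapshot so callers cannot mutate the internal dictionary.
    public IEnumerable<T> GetAll() => _items.Values.ToList();

    // Throws ArgumentException if an entity with the same id already exists.
    public void Add(T entity) => _items.Add(_idOf(entity), entity);

    public void Remove(T entity) => _items.Remove(_idOf(entity));

    public void Update(T entity)
    {
        var id = _idOf(entity);
        if (!_items.ContainsKey(id))
            throw new InvalidOperationException($"No entity with id {id} to update.");
        _items[id] = entity;
    }
}

// Illustrative entity used only for this example.
public sealed record Customer(Guid Id, string Name);
```

A caller would construct it as new InMemoryRepository<Customer, Guid>(c => c.Id) and then use it exactly like any other IRepository<Customer, Guid>.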

Key Interface Design Decisions:

1. Return Types for Finding Entities:

Should GetById return null when an entity isn't found, or should it throw an exception? Both approaches are valid, but they communicate different intents:

Return null (or an optional-style wrapper): Indicates that a missing entity is a normal, expected case. The caller must handle it.
Throw exception: Indicates that a missing entity is exceptional, because business logic expects the entity to exist.

Many repositories provide both: GetById (returns null) and GetByIdOrThrow (throws EntityNotFoundException).
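
A minimal sketch of offering both lookups side by side might look like this. ProductRepository, the Product record, and the constructor shape of EntityNotFoundException are assumptions made for illustration; only the two method names come from the text above.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Assumed exception shape; adapt the message and payload to your own conventions.
public sealed class EntityNotFoundException : Exception
{
    public EntityNotFoundException(string entityName, object id)
        : base($"{entityName} with id '{id}' was not found.") { }
}

public sealed record Product(int Id, string Name);

public sealed class ProductRepository
{
    private readonly Dictionary<int, Product> _products = new();

    public void Add(Product product) => _products.Add(product.Id, product);

    // Absence is an expected outcome: the caller decides what null means.
    public Product? GetById(int id) =>
        _products.TryGetValue(id, out var product) ? product : null;

    // Absence indicates a broken precondition: fail loudly with context.
    public Product GetByIdOrThrow(int id) =>
        GetById(id) ?? throw new EntityNotFoundException(nameof(Product), id);
}
```

Building the throwing variant on top of the nullable one keeps the lookup logic in a single place.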

2. Synchronous vs. Asynchronous Operations:

In modern applications, especially those handling I/O, asynchronous operations are preferable. The examples above show synchronous interfaces for clarity, but production repositories should typically be async:

Task<T?> GetByIdAsync(TId id, CancellationToken ct = default);
Task AddAsync(T entity, CancellationToken ct = default);
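
As a rough illustration of those two signatures, the following in-memory repository implements them for a hypothetical Invoice type. No real I/O happens here, so the methods complete synchronously and return already-completed tasks; a database-backed version would await the driver or ORM instead.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Illustrative entity used only for this example.
public sealed record Invoice(Guid Id, decimal Total);

public sealed class InMemoryInvoiceRepository
{
    private readonly ConcurrentDictionary<Guid, Invoice> _items = new();

    public Task<Invoice?> GetByIdAsync(Guid id, CancellationToken ct = default)
    {
        // Honor cancellation even though this implementation never blocks.
        ct.ThrowIfCancellationRequested();
        _items.TryGetValue(id, out var invoice);
        return Task.FromResult<Invoice?>(invoice);
    }

    public Task AddAsync(Invoice invoice, CancellationToken ct = default)
    {
        ct.ThrowIfCancellationRequested();
        if (!_items.TryAdd(invoice.Id, invoice))
            throw new InvalidOperationException($"Invoice {invoice.Id} already exists.");
        return Task.CompletedTask;
    }
}
```

Accepting a CancellationToken on every method lets callers abandon a slow lookup, for example when an HTTP request is aborted.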

3. The Update Method Debate:

In pure collection semantics, there's no Update method—you modify an object in place, and since it's a reference, the collection reflects the change. However, with persistence:

•Some ORMs track changes automatically (Entity Framework, Hibernate with attached entities)
•Some require explicit save calls
•Some use CQRS patterns where updates go through command handlers

The Update method exists for clarity and to support various persistence strategies.

Beyond Basic CRUD: Query Methods

While the basic collection operations form the foundation, real-world repositories need domain-specific query methods. These methods express business requirements in terms the domain understands, not database queries the persistence layer requires.

IOrderRepository.cs
/// <summary>
/// A domain-specific repository for Order aggregates.
/// Notice how query methods use business terminology, not database terminology.
/// </summary>
public interface IOrderRepository
{
    // Basic collection operations (inherited or defined here)
    Order? GetById(OrderId id);
    void Add(Order order);
    void Remove(Order order);
    
    // Domain-specific queries - these express business requirements
    
    /// <summary>
    /// Finds orders that are ready for fulfillment (paid, stock verified).
    /// The repository knows what "ready for fulfillment" means in terms of data.
    /// </summary>
    IEnumerable<Order> FindOrdersReadyForFulfillment();
    
    /// <summary>
    /// Finds orders for a specific customer.
    /// </summary>
    IEnumerable<Order> FindByCustomer(CustomerId customerId);
    
    /// <summary>
    /// Finds orders placed within a date range.
    /// Useful for reporting and analytics.
    /// </summary>
    IEnumerable<Order> FindByDateRange(DateTime start, DateTime end);
    
    /// <summary>
    /// Finds orders that require attention - overdue, flagged, etc.
    /// Business logic defines "requires attention", repository implements it.
    /// </summary>
    IEnumerable<Order> FindOrdersRequiringAttention();
    
    /// <summary>
    /// Gets the next order number for sequencing.
    /// Some domain operations require repository involvement.
    /// </summary>
    OrderNumber GetNextOrderNumber();
    
    /// <summary>
    /// Checks if a duplicate order exists (idempotency check).
    /// </summary>
    bool ExistsWithIdempotencyKey(string idempotencyKey);
}

The Query Method Naming Convention

Notice the naming pattern: methods that return multiple entities typically start with Find (C#) or findBy (Java), while methods returning a single entity use Get or findById. This convention helps callers understand what to expect. Also, method names use domain terminology (FindOrdersReadyForFulfillment) rather than implementation details (FindByStatusPaidAndStockVerified).

The Query Method Explosion Problem:

As business requirements grow, you might be tempted to add more and more query methods:

IEnumerable<Order> FindByCustomerAndStatus(CustomerId customerId, OrderStatus status);
IEnumerable<Order> FindByCustomerAndDateRange(CustomerId customerId, DateTime start, DateTime end);
IEnumerable<Order> FindByCustomerAndStatusAndDateRange(...);
// This quickly becomes unmanageable!

Solutions to Query Method Explosion:

•Specification Pattern: Pass specification objects that encapsulate query criteria
•Query Objects: Create dedicated query classes that express complex criteria
•Read Models: Use CQRS to separate read concerns with optimized query services

We'll explore these patterns in detail later in this course.
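
As a preview, the first of these options can be sketched in a few lines. This is one simple form of the Specification pattern, evaluated in memory; the types below (a flattened Order record, ForCustomer, WithStatus, AndSpecification, InMemoryOrderRepository) are assumptions for illustration. A database-backed repository would typically translate the specification into a query, for example via expression trees, rather than filter loaded objects.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public enum OrderStatus { Pending, Paid, Shipped }

// Simplified order used only for this example.
public sealed record Order(int Id, int CustomerId, OrderStatus Status, DateTime PlacedAt);

// A specification answers one question: does this candidate match?
public interface ISpecification<T>
{
    bool IsSatisfiedBy(T candidate);
}

public sealed class ForCustomer : ISpecification<Order>
{
    private readonly int _customerId;
    public ForCustomer(int customerId) => _customerId = customerId;
    public bool IsSatisfiedBy(Order candidate) => candidate.CustomerId == _customerId;
}

public sealed class WithStatus : ISpecification<Order>
{
    private readonly OrderStatus _status;
    public WithStatus(OrderStatus status) => _status = status;
    public bool IsSatisfiedBy(Order candidate) => candidate.Status == _status;
}

// Combines two specifications; both must hold.
public sealed class AndSpecification<T> : ISpecification<T>
{
    private readonly ISpecification<T> _left;
    private readonly ISpecification<T> _right;

    public AndSpecification(ISpecification<T> left, ISpecification<T> right)
    {
        _left = left;
        _right = right;
    }

    public bool IsSatisfiedBy(T candidate) =>
        _left.IsSatisfiedBy(candidate) && _right.IsSatisfiedBy(candidate);
}

public sealed class InMemoryOrderRepository
{
    private readonly List<Order> _orders = new();

    public void Add(Order order) => _orders.Add(order);

    // One Find method replaces FindByCustomerAndStatus, FindByCustomerAndDateRange, ...
    public IReadOnlyList<Order> Find(ISpecification<Order> specification) =>
        _orders.Where(o => specification.IsSatisfiedBy(o)).ToList();
}
```

Criteria now compose at the call site, e.g. new AndSpecification<Order>(new ForCustomer(7), new WithStatus(OrderStatus.Paid)), instead of multiplying repository methods.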

What the Repository Hides

The true power of the Repository pattern becomes apparent when we examine what it encapsulates—all the complexities that your domain code never needs to know about.

Persistence Concerns Hidden by the Repository
Concern	What Domain Code Sees	What Repository Handles Internally
Database Connection	Nothing—just calls methods	Connection pooling, connection strings, timeout handling
Query Language	Typed method calls	SQL, LINQ, HQL, Cypher, MongoDB queries, etc.
Object-Relational Mapping	Pure domain objects	Column mappings, type conversions, relationship loading
Transaction Management	A unit of work boundary	BEGIN/COMMIT/ROLLBACK, isolation levels, deadlock handling
Caching	Fast responses	First/second-level caches, cache invalidation strategy
Performance Optimization	Consistent behavior	Query optimization, indexing hints, batch operations
Data Source Location	Just the data	Local DB, remote service, read replica routing, sharding
Error Handling	Domain exceptions	Constraint violations, deadlocks, timeout retries

Concrete Example: Hiding Database Complexity

Consider what happens behind a simple GetById call:

OrderRepositoryImplementation.cs
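using System;
using System.Linq;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Logging;

// What the domain code sees:
var order = orderRepository.GetById(orderId);
 
// What actually happens inside the repository implementation.
// ICacheProvider, OrderCacheDto, MapToDomain, and MapToCache are
// application-defined helpers that are not shown here.
public class SqlOrderRepository : IOrderRepository
{
    private readonly DbContext _context;
    private readonly ILogger<SqlOrderRepository> _logger;
    private readonly ICacheProvider _cache;
    
    public SqlOrderRepository(
        DbContext context,
        ILogger<SqlOrderRepository> logger,
        ICacheProvider cache)
    {
        _context = context;
        _logger = logger;
        _cache = cache;
    }
    
    public Order? GetById(OrderId id)
    {
        // 1. Check first-level cache (identity map): entities already
        //    tracked by this DbContext instance
        var cached = _context.ChangeTracker
            .Entries<Order>()
            .FirstOrDefault(e => e.Entity.Id == id)?.Entity;
        if (cached != null) return cached;
        
        // 2. Check second-level distributed cache
        var cacheKey = $"order:{id.Value}";
        var fromCache = _cache.Get<OrderCacheDto>(cacheKey);
        if (fromCache != null)
        {
            var order = MapToDomain(fromCache);
            _context.Attach(order); // Track it so later lookups return this instance
            return order;
        }
        
        // 3. Query database with optimized includes.
        //    Set<Order>() is used because the base DbContext type
        //    exposes no Orders property of its own.
        var dbOrder = _context.Set<Order>()
            .Include(o => o.LineItems)
                .ThenInclude(li => li.Product)
            .Include(o => o.ShippingAddress)
            .Include(o => o.BillingAddress)
            .AsSplitQuery() // Optimize for multiple includes
            .FirstOrDefault(o => o.Id == id);
        
        if (dbOrder == null)
        {
            _logger.LogDebug("Order {OrderId} not found", id);
            return null;
        }
        
        // 4. Populate cache for next request
        _cache.Set(cacheKey, MapToCache(dbOrder), TimeSpan.FromMinutes(5));
        
        // 5. Return the tracked aggregate. In this example the EF-mapped
        //    Order class doubles as the domain object, so no separate
        //    mapping step is needed.
        return dbOrder;
    }
    
    // Remaining IOrderRepository members omitted for brevity.
}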

The Abstraction Benefit

The domain code simply calls GetById(orderId) and receives an Order. It never knows about the cache checks, eager loading strategy, split queries, or logging. If tomorrow you need to add read replica routing or switch to a different ORM, the domain code doesn't change at all.
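
One way to see this in miniature is a caching decorator that wraps any repository behind the same interface. The sketch below uses deliberately simplified, hypothetical types (a one-method IOrderRepository, an Order record, SlowOrderRepository as a stand-in for the database-backed implementation) and a cache with no expiry or invalidation, so treat it as an illustration of the substitution, not as a caching strategy.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins for the types used in this lesson.
public sealed record Order(Guid Id, decimal Total);

public interface IOrderRepository
{
    Order? GetById(Guid id);
}

// Stand-in for the database-backed implementation; counts how often it is hit.
public sealed class SlowOrderRepository : IOrderRepository
{
    private readonly Dictionary<Guid, Order> _rows;
    public int Lookups { get; private set; }

    public SlowOrderRepository(IEnumerable<Order> seed) =>
        _rows = seed.ToDictionary(o => o.Id);

    public Order? GetById(Guid id)
    {
        Lookups++;
        return _rows.TryGetValue(id, out var order) ? order : null;
    }
}

// Decorator: same interface, adds caching in front of any inner repository.
public sealed class CachingOrderRepository : IOrderRepository
{
    private readonly IOrderRepository _inner;
    private readonly Dictionary<Guid, Order> _cache = new();

    public CachingOrderRepository(IOrderRepository inner) => _inner = inner;

    public Order? GetById(Guid id)
    {
        if (_cache.TryGetValue(id, out var hit)) return hit;
        var order = _inner.GetById(id);
        if (order != null) _cache[id] = order;
        return order;
    }
}

// Domain-side code depends only on the interface and never changes.
public sealed class OrderTotalReader
{
    private readonly IOrderRepository _orders;
    public OrderTotalReader(IOrderRepository orders) => _orders = orders;
    public decimal TotalOf(Guid id) => _orders.GetById(id)?.Total ?? 0m;
}
```

OrderTotalReader works identically whether it is handed the slow repository or the caching wrapper; only the composition root decides which one it receives.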

The Identity Map: Maintaining Object Identity

A critical aspect of the collection abstraction is identity preservation: if you retrieve the same entity twice within a scope, you should get the same object instance. This is what in-memory collections do naturally—and repositories should preserve this behavior.

The Problem Without Identity Mapping:

IdentityProblem.cs
// Without identity mapping, this causes problems:
var order1 = orderRepository.GetById(orderId);
var order2 = orderRepository.GetById(orderId);
 
order1.AddLineItem(newItem);
 
// If order1 and order2 are different instances:
Console.WriteLine(order1.LineItems.Count);  // 3
Console.WriteLine(order2.LineItems.Count);  // 2  ← WRONG!
 
// The change to order1 isn't reflected in order2 because
// they're separate instances representing the same data!
 
// Even worse, when each instance is persisted explicitly:
orderRepository.Update(order1);  // Persists 3 items
orderRepository.Update(order2);  // Overwrites with 2 items! Bug!

The Identity Map Solution:

The repository (or the underlying ORM) maintains an Identity Map—a dictionary that tracks which entity instances have been loaded for each unique identifier. When you request an entity that's already loaded, you get the same instance back.

IdentityMapExample.cs
// With identity mapping (what repositories should provide):
var order1 = orderRepository.GetById(orderId);
var order2 = orderRepository.GetById(orderId);
 
// Both variables reference the same instance
Console.WriteLine(ReferenceEquals(order1, order2));  // true
 
order1.AddLineItem(newItem);
 
// Now both see the change:
Console.WriteLine(order1.LineItems.Count);  // 3
Console.WriteLine(order2.LineItems.Count);  // 3  ← Correct!
 
// Entity Framework and Hibernate provide this automatically via their
// DbContext/Session tracking. The repository doesn't re-query the database
// for an entity that's already loaded in the current unit of work.

Identity Map Scope

The Identity Map is typically scoped to a Unit of Work or database session/context. Once the unit of work ends, the identity map is cleared. If you load the same entity in a new unit of work, you get a new instance. This scope boundary is crucial for proper transaction management and preventing memory leaks in long-running processes.
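
For repositories that are not built on an ORM, a hand-rolled identity map can be as small as a dictionary. The sketch below assumes one repository instance per unit of work and takes the actual loading logic as a delegate (loadFromStore, a hypothetical parameter) so the example stays independent of any database.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Simplified aggregate used only for this example.
public sealed class Order
{
    public Guid Id { get; }
    public List<string> LineItems { get; } = new();
    public Order(Guid id) => Id = id;
}

// Create one instance per unit of work and discard it when the unit of work ends.
public sealed class IdentityMappedOrderRepository
{
    private readonly Dictionary<Guid, Order> _identityMap = new();
    private readonly Func<Guid, Order?> _loadFromStore;

    public IdentityMappedOrderRepository(Func<Guid, Order?> loadFromStore) =>
        _loadFromStore = loadFromStore;

    public Order? GetById(Guid id)
    {
        // Already loaded in this unit of work: hand back the same instance.
        if (_identityMap.TryGetValue(id, out var tracked)) return tracked;

        // Otherwise materialize it once and remember it.
        var loaded = _loadFromStore(id);
        if (loaded != null) _identityMap[id] = loaded;
        return loaded;
    }
}
```

Two calls to GetById on the same repository instance return the same object, while a second repository instance (a new unit of work) materializes its own copy.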

Repositories and Aggregate Boundaries

In Domain-Driven Design, repositories are intimately connected to aggregates. Understanding this relationship is essential for designing proper repository interfaces.

What is an Aggregate?

An aggregate is a cluster of domain objects that are treated as a single unit for data changes. It has:

•An Aggregate Root: The single entry point for all operations on the aggregate
•Consistency Boundary: All invariants within the aggregate are always consistent
•Transaction Boundary: Changes to an aggregate are saved atomically

The Repository-Aggregate Relationship:

Each aggregate root gets its own repository. You never create repositories for entities or value objects that aren't aggregate roots.

Correct: Aggregate Root Repositories

•IOrderRepository — Order is the aggregate root
•ICustomerRepository — Customer is the aggregate root
•IProductRepository — Product is the aggregate root
•IShoppingCartRepository — Cart is the aggregate root
•Each repository loads/saves the entire aggregate

Incorrect: Non-Aggregate Repositories

•IOrderLineItemRepository — Part of Order aggregate
•IAddressRepository — Part of Customer/Order aggregate
•IMoneyRepository — Money is a value object
•ICartItemRepository — Part of Cart aggregate
•These would break aggregate consistency guarantees

Why This Matters:

If you created an IOrderLineItemRepository, you could modify line items without going through the Order aggregate root. This bypasses the Order's invariant checks:

// WRONG: Directly modifying a child entity
var lineItem = lineItemRepository.GetById(itemId);
lineItem.Quantity = 100;  // What if Order has a max items rule?
lineItemRepository.Update(lineItem);  // Order's invariants are bypassed!

// CORRECT: Going through the aggregate root
var order = orderRepository.GetById(orderId);
order.UpdateLineItemQuantity(itemId, 100);  // Order can enforce its rules
orderRepository.Update(order);  // Entire aggregate is saved consistently

The repository ensures that all access to the aggregate goes through the root, maintaining consistency invariants.
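
What "Order can enforce its rules" might look like inside the aggregate is sketched below. The specific rule (a cap of 50 units per order, MaxTotalQuantity) is an invented example, and the Order and OrderLineItem shapes are simplified assumptions; the point is only that line items cannot be changed except through the root.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Immutable child: it can only be replaced, and only by the aggregate root.
public sealed record OrderLineItem(Guid Id, int Quantity);

public sealed class Order
{
    // Assumed business rule, chosen for illustration.
    public const int MaxTotalQuantity = 50;

    private readonly List<OrderLineItem> _lineItems = new();
    public IReadOnlyList<OrderLineItem> LineItems => _lineItems;

    public void AddLineItem(Guid itemId, int quantity)
    {
        EnsureValid(quantity, TotalQuantity() + quantity);
        _lineItems.Add(new OrderLineItem(itemId, quantity));
    }

    public void UpdateLineItemQuantity(Guid itemId, int newQuantity)
    {
        var index = _lineItems.FindIndex(li => li.Id == itemId);
        if (index < 0)
            throw new InvalidOperationException($"Line item {itemId} is not part of this order.");

        var current = _lineItems[index];
        EnsureValid(newQuantity, TotalQuantity() - current.Quantity + newQuantity);
        _lineItems[index] = current with { Quantity = newQuantity };
    }

    private int TotalQuantity() => _lineItems.Sum(li => li.Quantity);

    // The invariant is checked before any state changes, so a rejected
    // operation leaves the aggregate untouched.
    private static void EnsureValid(int quantity, int resultingTotal)
    {
        if (quantity <= 0)
            throw new ArgumentOutOfRangeException(nameof(quantity), "Quantity must be positive.");
        if (resultingTotal > MaxTotalQuantity)
            throw new InvalidOperationException(
                $"Order may not exceed {MaxTotalQuantity} units in total.");
    }
}
```

Because the only repository is the one for Order, every quantity change has to pass through UpdateLineItemQuantity and therefore through the invariant check.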

Aggregate Loading Strategies

When loading an aggregate, you typically load the entire aggregate graph. For an Order, this means the order header, all line items, addresses, and any other parts of the aggregate. The repository decides how to load efficiently (eager loading, split queries, etc.), but it always returns the complete aggregate, never a partial one.

Benefits of the Collection Abstraction

The Repository pattern, properly implemented as a collection abstraction, delivers substantial benefits across multiple dimensions of software quality.

Key Benefits

•Testability — Domain logic can be tested against in-memory repository implementations, so unit tests need no database and avoid network and disk round trips. For many teams this is the most immediately visible payoff of the pattern.
•Technology Independence — Switch from SQL Server to PostgreSQL, from a relational DB to a document store, from a local database to a cloud service—all without changing domain code. The repository interface remains stable.
•Centralized Data Access Logic — All queries for an aggregate live in one place. Need to add caching? Change it in the repository. Need to add audit logging? Change it in the repository. Need to optimize a query? You know exactly where to look.
•Clear Architectural Boundaries — The repository interface forms a clean boundary between domain and infrastructure. This boundary is visible in the code, enforceable by dependency direction, and documentable for the team.
•Domain Model Purity — Domain objects remain free of persistence concerns. No [Column] attributes, no lazy-loading proxies visible in domain code, no serialization worries. Pure domain logic.
•Improved Maintainability — When persistence needs change, changes are localized to repository implementations. When domain needs change, repository interfaces evolve clearly. The separation of concerns pays long-term dividends.

The Testing Multiplier

A common experience with the Repository pattern is that testability changes the day-to-day workflow more than any other benefit. When complex domain scenarios can be exercised quickly and without database setup, it becomes cheaper to write more tests, catch bugs earlier, and refactor with confidence.
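
A database-free test setup along these lines could look as follows. Everything here is a simplified assumption for illustration: the one-method IOrderRepository, the Order class, the FulfillmentService under test, and a definition of "ready for fulfillment" reduced to "paid and not yet shipped".

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified aggregate used only for this example.
public sealed class Order
{
    public Guid Id { get; } = Guid.NewGuid();
    public bool IsPaid { get; init; }
    public bool IsShipped { get; private set; }
    public void MarkShipped() => IsShipped = true;
}

public interface IOrderRepository
{
    IEnumerable<Order> FindOrdersReadyForFulfillment();
}

// Test double: an in-memory list plays the role of the database.
public sealed class InMemoryOrderRepository : IOrderRepository
{
    private readonly List<Order> _orders;

    public InMemoryOrderRepository(params Order[] orders) => _orders = orders.ToList();

    // Assumed meaning of "ready": paid and not yet shipped.
    public IEnumerable<Order> FindOrdersReadyForFulfillment() =>
        _orders.Where(o => o.IsPaid && !o.IsShipped).ToList();
}

// Domain service under test; it only knows the interface.
public sealed class FulfillmentService
{
    private readonly IOrderRepository _orders;

    public FulfillmentService(IOrderRepository orders) => _orders = orders;

    // Ships every ready order and reports how many were shipped.
    public int ShipReadyOrders()
    {
        var shipped = 0;
        foreach (var order in _orders.FindOrdersReadyForFulfillment())
        {
            order.MarkShipped();
            shipped++;
        }
        return shipped;
    }
}
```

A test then arranges a few Order objects, passes an InMemoryOrderRepository to the service, and asserts on the returned count and on each order's IsShipped flag, with no connection string or schema involved.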

Summary: Repository as Collection

We've established the foundational understanding of the Repository pattern as a collection abstraction. Let's consolidate the key concepts:

Key Takeaways

•Collection Metaphor — A Repository presents persistent data as if it were an in-memory collection, supporting add, remove, find, and get operations with familiar semantics.
•Persistence Ignorance — Domain objects remain pure, unaware of how they're stored. All persistence complexity is encapsulated in the repository implementation.
•Aggregate-Centric Design — Repositories are created for aggregate roots only, ensuring consistency boundaries are respected and invariants maintained.
•Query Abstraction — Domain-specific query methods use business terminology, hiding SQL and query language details from domain code.
•Identity Mapping — Within a unit of work, the same entity ID always returns the same object instance, preserving reference semantics.
•Implementation Hiding — Connection management, caching, optimization, and error handling are all encapsulated, invisible to calling code.

What's Next:

Now that we understand the conceptual foundation of repositories as collections, we'll explore a critical design decision: Generic vs. Specific Repositories. Should you create one generic IRepository<T, TId> that works for all entities, or specific interfaces like IOrderRepository for each aggregate? Both approaches have advocates, and the right choice depends on your context.

Page Complete

You now understand the Repository pattern as a collection abstraction—its origins, principles, interface design, and benefits. This conceptual foundation is essential for the implementation decisions we'll explore in the following pages.

1 / 4

Loading learning content...

System Design (LLD)Persistence Layer Design

Repository Pattern Deep Dive

LevelIntermediate

Duration75 mins

TopicPersistence Layer Design

1 / 4

Repository as Collection Abstraction

The Illusion of In-Memory Data

What You Will Learn

Origins and Definition

Fowler's Definition:

"A Repository mediates between the domain and data mapping layers, acting like an in-memory domain object collection. Client objects construct query specifications declaratively and submit them to Repository for satisfaction. Objects can be added to and removed from the Repository, as they can from a simple collection of objects."

Evans' Perspective (DDD):

The Collection Metaphor

Historical Context:

Testing difficulty: Business logic couldn't be tested without a database
Technology lock-in: Switching databases required rewrites across the codebase
Code duplication: Similar queries appeared in multiple places
Maintainability nightmare: Finding all data access code for a given entity was nearly impossible

The Repository pattern emerged as a solution to all these problems by introducing a clean abstraction layer.

Core Principles of the Repository Pattern

Fundamental Principles

•Collection Semantics — A repository presents itself as an in-memory collection. It supports operations like Add, Remove, Find, and Get—just like a list or set. No SQL, no queries in the traditional sense—just collection-style operations.
•Persistence Ignorance — Domain objects retrieved through a repository should be completely unaware of how they're stored. They shouldn't contain database annotations, serialization logic, or framework-specific code. They're pure domain objects.
•Aggregate-Centric Design — In DDD terms, repositories are designed around aggregates, not individual entities. You don't have a repository for OrderLineItem—you have one for Order, which contains its line items. The aggregate is loaded and saved as a unit.
•Query Abstraction — Complex queries are expressed through the repository's interface, not through query languages exposed to calling code. The repository encapsulates how to translate business requirements into data retrieval.
•Transactional Boundary — Repositories typically work within a single transaction. Operations like add and remove don't necessarily persist immediately—they might be deferred until a Unit of Work commits the transaction.

Common Misconception

Designing the Collection Interface

The Minimal Collection Interface:

At its simplest, a repository supports four fundamental operations that mirror standard collection behavior:

IRepository.cs
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
/// <summary>
/// The minimal repository interface - pure collection semantics.
/// Notice: no database terminology, no SQL, no connection management.
/// </summary>
public interface IRepository<T, TId> where T : class
{
    /// <summary>
    /// Retrieves an entity by its unique identifier.
    /// Returns null if not found (or throws, depending on design choice).
    /// This mirrors: var item = collection.FirstOrDefault(x => x.Id == id);
    /// </summary>
    T? GetById(TId id);
    
    /// <summary>
    /// Retrieves all entities of this type.
    /// Use with caution - can be expensive for large datasets.
    /// This mirrors: collection.ToList();
    /// </summary>
    IEnumerable<T> GetAll();
    
    /// <summary>
    /// Adds a new entity to the repository.
    /// The entity will be persisted when the unit of work commits.
    /// This mirrors: collection.Add(item);
    /// </summary>
    void Add(T entity);
    
    /// <summary>
    /// Removes an entity from the repository.
    /// The deletion will occur when the unit of work commits.
    /// This mirrors: collection.Remove(item);
    /// </summary>
    void Remove(T entity);
    
    /// <summary>
    /// Updates an existing entity.
    /// Note: In true collection semantics, this isn't needed since objects
    /// are references. But for persistence, explicit update tracking helps.
    /// </summary>
    void Update(T entity);
}

Key Interface Design Decisions:

1. Return Types for Finding Entities:

Should GetById return null/undefined when an entity isn't found, or should it throw an exception? Both approaches are valid, but they communicate different intents:

Return null/Optional: Indicates that a missing entity is a normal, expected case. The caller must handle it.
Throw exception: Indicates that a missing entity is exceptional—business logic expects the entity to exist.

Many repositories provide both: GetById (returns null) and GetByIdOrThrow (throws EntityNotFoundException).

2. Synchronous vs. Asynchronous Operations:

Task<T?> GetByIdAsync(TId id, CancellationToken ct = default);
Task AddAsync(T entity, CancellationToken ct = default);

3. The Update Method Debate:

In pure collection semantics, there's no Update method—you modify an object in place, and since it's a reference, the collection reflects the change. However, with persistence:

Some ORMs track changes automatically (Entity Framework, Hibernate with attached entities)
Some require explicit save calls
Some use CQRS patterns where updates go through command handlers

The Update method exists for clarity and to support various persistence strategies.

Beyond Basic CRUD: Query Methods

IOrderRepository.cs
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
/// <summary>
/// A domain-specific repository for Order aggregates.
/// Notice how query methods use business terminology, not database terminology.
/// </summary>
public interface IOrderRepository
{
    // Basic collection operations (inherited or defined here)
    Order? GetById(OrderId id);
    void Add(Order order);
    void Remove(Order order);
    
    // Domain-specific queries - these express business requirements
    
    /// <summary>
    /// Finds orders that are ready for fulfillment (paid, stock verified).
    /// The repository knows what "ready for fulfillment" means in terms of data.
    /// </summary>
    IEnumerable<Order> FindOrdersReadyForFulfillment();
    
    /// <summary>
    /// Finds orders for a specific customer.
    /// </summary>
    IEnumerable<Order> FindByCustomer(CustomerId customerId);
    
    /// <summary>
    /// Finds orders placed within a date range.
    /// Useful for reporting and analytics.
    /// </summary>
    IEnumerable<Order> FindByDateRange(DateTime start, DateTime end);
    
    /// <summary>
    /// Finds orders that require attention - overdue, flagged, etc.
    /// Business logic defines "requires attention", repository implements it.
    /// </summary>
    IEnumerable<Order> FindOrdersRequiringAttention();
    
    /// <summary>
    /// Gets the next order number for sequencing.
    /// Some domain operations require repository involvement.
    /// </summary>
    OrderNumber GetNextOrderNumber();
    
    /// <summary>
    /// Checks if a duplicate order exists (idempotency check).
    /// </summary>
    bool ExistsWithIdempotencyKey(string idempotencyKey);
}

The Query Method Naming Convention

The Query Method Explosion Problem:

As business requirements grow, you might be tempted to add more and more query methods:

IEnumerable<Order> FindByCustomerAndStatus(CustomerId customerId, OrderStatus status);
IEnumerable<Order> FindByCustomerAndDateRange(CustomerId customerId, DateTime start, DateTime end);
IEnumerable<Order> FindByCustomerAndStatusAndDateRange(...);
// This quickly becomes unmanageable!

Solutions to Query Method Explosion:

Specification Pattern: Pass specification objects that encapsulate query criteria
Query Objects: Create dedicated query classes that express complex criteria
Read Models: Use CQRS to separate read concerns with optimized query services

We'll explore these patterns in detail later in this course.

What the Repository Hides

The true power of the Repository pattern becomes apparent when we examine what it encapsulates—all the complexities that your domain code never needs to know about.

Persistence Concerns Hidden by the Repository
Concern	What Domain Code Sees	What Repository Handles Internally
Database Connection	Nothing—just calls methods	Connection pooling, connection strings, timeout handling
Query Language	Typed method calls	SQL, LINQ, HQL, Cypher, MongoDB queries, etc.
Object-Relational Mapping	Pure domain objects	Column mappings, type conversions, relationship loading
Transaction Management	A unit of work boundary	BEGIN/COMMIT/ROLLBACK, isolation levels, deadlock handling
Caching	Fast responses	First/second-level caches, cache invalidation strategy
Performance Optimization	Consistent behavior	Query optimization, indexing hints, batch operations
Data Source Location	Just the data	Local DB, remote service, read replica routing, sharding
Error Handling	Domain exceptions	Constraint violations, deadlocks, timeout retries

Concrete Example: Hiding Database Complexity

Consider what happens behind a simple GetById call:

OrderRepositoryImplementation.cs
C#
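using System;
using System.Linq;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Logging;

// What the domain code sees:
var order = orderRepository.GetById(orderId);
 
// What actually happens inside the repository implementation.
// ICacheProvider, OrderCacheDto, MapToDomain, and MapToCache are
// application-specific helpers whose definitions are omitted here.
public class SqlOrderRepository : IOrderRepository
{
    private readonly DbContext _context;
    private readonly ILogger<SqlOrderRepository> _logger;
    private readonly ICacheProvider _cache;
    
    public SqlOrderRepository(
        DbContext context,
        ILogger<SqlOrderRepository> logger,
        ICacheProvider cache)
    {
        _context = context;
        _logger = logger;
        _cache = cache;
    }
    
    public Order? GetById(OrderId id)
    {
        // 1. Check first-level cache (identity map): entities already
        //    tracked by this DbContext
        var cached = _context.ChangeTracker
            .Entries<Order>()
            .FirstOrDefault(e => e.Entity.Id == id)?.Entity;
        if (cached != null) return cached;
        
        // 2. Check second-level distributed cache
        var cacheKey = $"order:{id.Value}";
        var fromCache = _cache.Get<OrderCacheDto>(cacheKey);
        if (fromCache != null)
        {
            var order = MapToDomain(fromCache);
            _context.Attach(order); // Track it so later lookups return this instance
            return order;
        }
        
        // 3. Query database, eagerly loading the whole aggregate.
        //    Set<Order>() is used because the base DbContext type
        //    has no Orders property.
        var dbOrder = _context.Set<Order>()
            .Include(o => o.LineItems)
                .ThenInclude(li => li.Product)
            .Include(o => o.ShippingAddress)
            .Include(o => o.BillingAddress)
            .AsSplitQuery() // One SQL query per include instead of one large join
            .FirstOrDefault(o => o.Id == id);
        
        if (dbOrder == null)
        {
            _logger.LogDebug("Order {OrderId} not found", id);
            return null;
        }
        
        // 4. Populate cache for next request
        _cache.Set(cacheKey, MapToCache(dbOrder), TimeSpan.FromMinutes(5));
        
        // 5. Return the tracked entity, which in this mapping is the domain object itself
        return dbOrder;
    }
    
    // Remaining IOrderRepository members omitted for brevity.
}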

The Abstraction Benefit

The Identity Map: Maintaining Object Identity

The Problem Without Identity Mapping:

IdentityProblem.cs
C#
// Without identity mapping, this causes problems:
var order1 = orderRepository.GetById(orderId);
var order2 = orderRepository.GetById(orderId);
 
order1.AddLineItem(newItem);
 
// If order1 and order2 are different instances:
Console.WriteLine(order1.LineItems.Count);  // 3
Console.WriteLine(order2.LineItems.Count);  // 2  ← WRONG!
 
// The change to order1 isn't reflected in order2 because
// they're separate instances representing the same data!
 
// Even worse:
orderRepository.Save(order1);  // Saves with 3 items
orderRepository.Save(order2);  // Overwrites with 2 items! Bug!

The Identity Map Solution:

IdentityMapExample.cs
C#
// With identity mapping (what repositories should provide):
var order1 = orderRepository.GetById(orderId);
var order2 = orderRepository.GetById(orderId);
 
// Both variables reference the same instance
Console.WriteLine(ReferenceEquals(order1, order2));  // true
 
order1.AddLineItem(newItem);
 
// Now both see the change:
Console.WriteLine(order1.LineItems.Count);  // 3
Console.WriteLine(order2.LineItems.Count);  // 3  ← Correct!
 
// Entity Framework and Hibernate provide this automatically via their
// DbContext/Session tracking: within one unit of work, loading an entity
// that is already tracked returns the existing instance, not a copy.
// Key lookups such as DbContext.Find can skip the database entirely.
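
To make the mechanism concrete, here is a minimal hand-rolled identity map, written as a sketch under simplifying assumptions. Order is reduced to an ID plus a list of strings, IdentityMappedOrderRepository and its StoreLoads counter are hypothetical names, and the loadFromStore delegate stands in for whatever actually reads from the database.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical minimal entity; the course's Order carries more state.
public sealed class Order
{
    public Guid Id { get; }
    public List<string> LineItems { get; } = new List<string>();

    public Order(Guid id) => Id = id;
}

// Identity map scoped to one repository instance, which here plays the
// role of one unit of work: at most one Order instance per ID.
public sealed class IdentityMappedOrderRepository
{
    private readonly Dictionary<Guid, Order> _identityMap = new Dictionary<Guid, Order>();
    private readonly Func<Guid, Order?> _loadFromStore;

    // Counts how often the underlying store was actually hit.
    public int StoreLoads { get; private set; }

    public IdentityMappedOrderRepository(Func<Guid, Order?> loadFromStore) =>
        _loadFromStore = loadFromStore;

    public Order? GetById(Guid id)
    {
        // Return the tracked instance if this unit of work already loaded it.
        if (_identityMap.TryGetValue(id, out var tracked)) return tracked;

        StoreLoads++;
        var loaded = _loadFromStore(id);

        // Remember the instance so later lookups return the same reference.
        if (loaded != null) _identityMap[id] = loaded;
        return loaded;
    }
}
```

Two consecutive GetById calls with the same ID return the same reference and hit the store only once. The map lives as long as the repository instance, so creating one repository per request or per unit of work keeps stale instances from leaking across operations.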

Identity Map Scope

Repositories and Aggregate Boundaries

In Domain-Driven Design, repositories are intimately connected to aggregates. Understanding this relationship is essential for designing proper repository interfaces.

What is an Aggregate?

An aggregate is a cluster of domain objects that are treated as a single unit for data changes. It has:

•An Aggregate Root: The single entry point for all operations on the aggregate
•Consistency Boundary: All invariants within the aggregate are always consistent
•Transaction Boundary: Changes to an aggregate are saved atomically

The Repository-Aggregate Relationship:

Each aggregate root gets its own repository. Child entities and value objects inside an aggregate never get repositories of their own.

Correct: Aggregate Root Repositories

•IOrderRepository — Order is the aggregate root
•ICustomerRepository — Customer is the aggregate root
•IProductCatalogRepository — Product is the aggregate root
•IShoppingCartRepository — Cart is the aggregate root
•Each repository loads/saves the entire aggregate

Incorrect: Non-Aggregate Repositories

•~~IOrderLineItemRepository~~ — Part of Order aggregate
•~~IAddressRepository~~ — Part of Customer/Order aggregate
•~~IMoneyRepository~~ — Money is a value object
•~~ICartItemRepository~~ — Part of Cart aggregate
•These would break aggregate consistency guarantees

Why This Matters:

If you created an IOrderLineItemRepository, you could modify line items without going through the Order aggregate root. This bypasses the Order's invariant checks:

// WRONG: Directly modifying a child entity
var lineItem = lineItemRepository.GetById(itemId);
lineItem.Quantity = 100;  // What if Order has a max items rule?
lineItemRepository.Save(lineItem);  // Order's invariants are bypassed!

// CORRECT: Going through the aggregate root
var order = orderRepository.GetById(orderId);
order.UpdateLineItemQuantity(itemId, 100);  // Order can enforce its rules
orderRepository.Save(order);  // Entire aggregate is saved consistently

Because the aggregate root's repository is the only way to load or save any part of the aggregate, every change goes through the root, which keeps the aggregate's invariants enforced.
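
For the root to enforce its rules, the aggregate itself has to guard every mutation. The following is a minimal sketch of what that could look like. The MaxTotalQuantity rule is an assumed business rule invented for illustration, and the shapes of Order and OrderLineItem are simplified, hypothetical versions of the course's types.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Child entity inside the Order aggregate (simplified, hypothetical shape).
public sealed class OrderLineItem
{
    public Guid Id { get; }
    public int Quantity { get; private set; }

    public OrderLineItem(Guid id, int quantity)
    {
        Id = id;
        Quantity = quantity;
    }

    // internal: callable only from the assembly that contains the aggregate,
    // so application code in other assemblies must go through Order.
    internal void ChangeQuantity(int quantity) => Quantity = quantity;
}

// Aggregate root: every change to its line items passes through these methods.
public sealed class Order
{
    // Assumed business rule, for illustration only.
    public const int MaxTotalQuantity = 50;

    private readonly List<OrderLineItem> _lineItems = new List<OrderLineItem>();

    public IReadOnlyList<OrderLineItem> LineItems => _lineItems;

    public void AddLineItem(OrderLineItem item)
    {
        EnsureWithinLimit(_lineItems.Sum(li => li.Quantity) + item.Quantity);
        _lineItems.Add(item);
    }

    public void UpdateLineItemQuantity(Guid itemId, int quantity)
    {
        if (quantity <= 0)
            throw new ArgumentOutOfRangeException(nameof(quantity));

        var item = _lineItems.SingleOrDefault(li => li.Id == itemId)
            ?? throw new InvalidOperationException("Line item is not part of this order.");

        // Check the invariant against the would-be total before mutating anything.
        EnsureWithinLimit(_lineItems.Sum(li => li.Quantity) - item.Quantity + quantity);
        item.ChangeQuantity(quantity);
    }

    private static void EnsureWithinLimit(int totalQuantity)
    {
        if (totalQuantity > MaxTotalQuantity)
            throw new InvalidOperationException(
                $"An order may contain at most {MaxTotalQuantity} units in total.");
    }
}
```

In this sketch a request to raise a quantity past the limit throws before any state changes, so an order that was valid when loaded is still valid when it is handed back to the repository for saving.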

Aggregate Loading Strategies

Benefits of the Collection Abstraction

The Repository pattern, properly implemented as a collection abstraction, delivers substantial benefits across multiple dimensions of software quality.

Key Benefits

•Testability — Domain logic can be tested against in-memory repository implementations, so unit tests need no database and typically run in milliseconds rather than seconds. For many teams, this alone justifies the pattern.
•Technology Independence — Moving from SQL Server to PostgreSQL, from a relational database to a document store, or from a local database to a cloud service is confined to the repository implementations. The repository interface, and the domain code that depends on it, remains stable.
•Centralized Data Access Logic — All queries for an aggregate live in one place. Need to add caching? Change it in the repository. Need to add audit logging? Change it in the repository. Need to optimize a query? You know exactly where to look.
•Clear Architectural Boundaries — The repository interface forms a clean boundary between domain and infrastructure. This boundary is visible in the code, enforceable by dependency direction, and documentable for the team.
•Domain Model Purity — Domain objects remain free of persistence concerns. No [Column] attributes, no lazy-loading proxies visible in domain code, no serialization worries. Pure domain logic.
•Improved Maintainability — When persistence needs change, changes are localized to repository implementations. When domain needs change, repository interfaces evolve clearly. The separation of concerns pays long-term dividends.
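
The testability benefit is the easiest one to demonstrate. The sketch below assumes a reduced, hypothetical IOrderRepository with three members and a made-up piece of domain logic, CustomerSpendCalculator. Neither is taken from the interface shown earlier; they only illustrate how a dictionary-backed repository can stand in for the database in a unit test.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified, hypothetical domain type.
public sealed record Order(Guid Id, Guid CustomerId, decimal Total);

// Reduced version of a repository interface, for illustration only.
public interface IOrderRepository
{
    void Add(Order order);
    Order? GetById(Guid id);
    IEnumerable<Order> FindByCustomer(Guid customerId);
}

// Test double: a dictionary stands in for the database.
public sealed class InMemoryOrderRepository : IOrderRepository
{
    private readonly Dictionary<Guid, Order> _orders = new Dictionary<Guid, Order>();

    public void Add(Order order) => _orders[order.Id] = order;

    public Order? GetById(Guid id) =>
        _orders.TryGetValue(id, out var order) ? order : null;

    public IEnumerable<Order> FindByCustomer(Guid customerId) =>
        _orders.Values.Where(o => o.CustomerId == customerId).ToList();
}

// Domain logic under test depends only on the interface, never on a database.
public sealed class CustomerSpendCalculator
{
    private readonly IOrderRepository _orders;

    public CustomerSpendCalculator(IOrderRepository orders) => _orders = orders;

    public decimal TotalSpend(Guid customerId) =>
        _orders.FindByCustomer(customerId).Sum(o => o.Total);
}
```

A test can add a few orders to InMemoryOrderRepository, pass it to CustomerSpendCalculator, and assert on the returned total without opening a connection. The production implementation of the same interface can be swapped in through dependency injection.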

The Testing Multiplier

Summary: Repository as Collection

We've established the foundational understanding of the Repository pattern as a collection abstraction. Let's consolidate the key concepts:

Key Takeaways

•Collection Metaphor — A Repository presents persistent data as if it were an in-memory collection, supporting add, remove, find, and get operations with familiar semantics.
•Persistence Ignorance — Domain objects remain pure, unaware of how they're stored. All persistence complexity is encapsulated in the repository implementation.
•Aggregate-Centric Design — Repositories are created for aggregate roots only, ensuring consistency boundaries are respected and invariants maintained.
•Query Abstraction — Domain-specific query methods use business terminology, hiding SQL and query language details from domain code.
•Identity Mapping — Within a unit of work, the same entity ID always returns the same object instance, preserving reference semantics.
•Implementation Hiding — Connection management, caching, optimization, and error handling are all encapsulated, invisible to calling code.

What's Next:
