Imagine writing business logic that manipulates domain objects as if they were simply sitting in memory—adding customers to a list, removing orders, querying products by category—without ever writing a SQL statement, constructing a database connection, or worrying about transactional boundaries. The Repository pattern makes this illusion possible.
At its core, the Repository pattern is one of the most elegant abstractions in software design: it presents the persistence layer as an in-memory collection. Your domain code interacts with what appears to be a simple collection interface, while behind the scenes, the repository translates every operation into the appropriate persistence mechanism—be it relational databases, document stores, message queues, or external APIs.
By the end of this page, you will deeply understand the Repository pattern as a collection abstraction. You'll learn its origins, core principles, the problems it solves, and how it enables clean separation between your domain logic and persistence concerns. This knowledge forms the foundation for implementing robust, testable, and maintainable data access layers.
The Repository pattern was cataloged in Martin Fowler's Patterns of Enterprise Application Architecture (2002), in a chapter contributed by Edward Hieatt and Rob Mee, and further elaborated by Eric Evans in Domain-Driven Design (2003). Both books recognized a fundamental tension in enterprise software: domain logic should remain pure and focused on business rules, not polluted by data access concerns.
Fowler's Definition:
"A Repository mediates between the domain and data mapping layers, acting like an in-memory domain object collection. Client objects construct query specifications declaratively and submit them to Repository for satisfaction. Objects can be added to and removed from the Repository, as they can from a simple collection of objects."
Evans' Perspective (DDD):
Evans extended this concept within Domain-Driven Design, emphasizing that repositories should be designed for Aggregate Roots only. Each aggregate—the consistency boundary in your domain—should have at most one repository. The repository becomes the gateway through which aggregates are retrieved and persisted, ensuring consistency invariants are always maintained.
The key insight is the collection metaphor. When you add an object to a collection in memory (e.g., list.add(customer)), you don't think about serialization, storage locations, or retrieval mechanisms. The Repository pattern aims to preserve this simplicity for persistent data. You add(order) to a repository, and the storage details are handled transparently.
Historical Context:
Before the Repository pattern became widely adopted, data access code was often scattered throughout business logic. Developers would construct SQL queries inline, mix transaction management with domain operations, and tightly couple their code to specific database technologies.
The Repository pattern emerged as a solution to these problems by introducing a clean abstraction layer between domain logic and data access.
Understanding the Repository pattern deeply requires grasping its fundamental principles. These aren't arbitrary design choices—they're carefully considered guidelines that ensure the pattern delivers its intended benefits.
- You don't create a repository for OrderLineItem; you have one for Order, which contains its line items. The aggregate is loaded and saved as a unit.
- add and remove don't necessarily persist immediately; they might be deferred until a Unit of Work commits the transaction.

A Repository is not just a DAO (Data Access Object) with a different name. While DAOs are typically table-centric and expose CRUD operations for a single database table, Repositories are aggregate-centric and hide all persistence details. A Repository might interact with multiple tables, external services, or even multiple databases, but its interface never reveals this complexity.
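The deferred persistence described above, where add and remove wait for a Unit of Work to commit, might be sketched as follows. This is a minimal illustration under stated assumptions, not a production Unit of Work: `InMemoryStore`, `DeferredRepository`, and `Commit` are hypothetical names, and a dictionary stands in for the database.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a database table keyed by entity id.
public sealed class InMemoryStore<T>
{
    public Dictionary<Guid, T> Rows { get; } = new();
}

// Sketch: Add and Remove only record intent. Nothing reaches the
// store until Commit is called (the Unit of Work boundary).
public sealed class DeferredRepository<T>
{
    private readonly InMemoryStore<T> _store;
    private readonly Func<T, Guid> _idOf;
    private readonly List<T> _pendingAdds = new();
    private readonly List<T> _pendingRemoves = new();

    public DeferredRepository(InMemoryStore<T> store, Func<T, Guid> idOf)
    {
        _store = store;
        _idOf = idOf;
    }

    public void Add(T entity) => _pendingAdds.Add(entity);

    public void Remove(T entity) => _pendingRemoves.Add(entity);

    // Applies all pending changes in one step, then clears them.
    public void Commit()
    {
        foreach (var entity in _pendingAdds) _store.Rows[_idOf(entity)] = entity;
        foreach (var entity in _pendingRemoves) _store.Rows.Remove(_idOf(entity));
        _pendingAdds.Clear();
        _pendingRemoves.Clear();
    }
}
```

Calling `Add` leaves `store.Rows` empty; the entity only appears there after `Commit`. A real implementation would typically wrap the commit in a database transaction.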
The power of the Repository pattern lies in its interface design. A well-designed repository interface should feel like a natural extension of working with in-memory collections, while providing the operations your domain actually needs.
The Minimal Collection Interface:
At its simplest, a repository supports four fundamental operations that mirror standard collection behavior:
```csharp
/// <summary>
/// The minimal repository interface - pure collection semantics.
/// Notice: no database terminology, no SQL, no connection management.
/// </summary>
public interface IRepository<T, TId> where T : class
{
    /// <summary>
    /// Retrieves an entity by its unique identifier.
    /// Returns null if not found (or throws, depending on design choice).
    /// This mirrors: var item = collection.FirstOrDefault(x => x.Id == id);
    /// </summary>
    T? GetById(TId id);

    /// <summary>
    /// Retrieves all entities of this type.
    /// Use with caution - can be expensive for large datasets.
    /// This mirrors: collection.ToList();
    /// </summary>
    IEnumerable<T> GetAll();

    /// <summary>
    /// Adds a new entity to the repository.
    /// The entity will be persisted when the unit of work commits.
    /// This mirrors: collection.Add(item);
    /// </summary>
    void Add(T entity);

    /// <summary>
    /// Removes an entity from the repository.
    /// The deletion will occur when the unit of work commits.
    /// This mirrors: collection.Remove(item);
    /// </summary>
    void Remove(T entity);

    /// <summary>
    /// Updates an existing entity.
    /// Note: In true collection semantics, this isn't needed since objects
    /// are references. But for persistence, explicit update tracking helps.
    /// </summary>
    void Update(T entity);
}
```

Key Interface Design Decisions:
1. Return Types for Finding Entities:
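Should GetById return null when an entity isn't found, or should it throw an exception? Both approaches are valid, but they communicate different intents: returning null treats a missing entity as an expected outcome that the caller must handle, while throwing treats it as an exceptional condition.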
Many repositories provide both: GetById (returns null) and GetByIdOrThrow (throws EntityNotFoundException).
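A minimal sketch of offering both lookups side by side. `EntityNotFoundException` is named in the text but its shape here is an assumption, and `Customer` and `CustomerRepository` are hypothetical types backed by a dictionary for illustration.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Assumed shape; the text only names this exception type.
public sealed class EntityNotFoundException : Exception
{
    public EntityNotFoundException(string entityName, object id)
        : base($"{entityName} with id '{id}' was not found.") { }
}

public sealed record Customer(int Id, string Name);

public sealed class CustomerRepository
{
    private readonly Dictionary<int, Customer> _items = new();

    public void Add(Customer customer) => _items[customer.Id] = customer;

    // The caller is expected to check the result for null.
    public Customer? GetById(int id) =>
        _items.TryGetValue(id, out var customer) ? customer : null;

    // The caller assumes the entity exists; a missing one is an error.
    public Customer GetByIdOrThrow(int id) =>
        GetById(id) ?? throw new EntityNotFoundException(nameof(Customer), id);
}
```

Building the throwing variant on top of the nullable one keeps the lookup logic in a single place.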
2. Synchronous vs. Asynchronous Operations:
In modern applications, especially those handling I/O, asynchronous operations are preferable. The examples above show synchronous interfaces for clarity, but production repositories should typically be async:
```csharp
Task<T?> GetByIdAsync(TId id, CancellationToken ct = default);
Task AddAsync(T entity, CancellationToken ct = default);
```
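To show how these signatures fit together, here is a small in-memory sketch. `IAsyncRepository` and `InMemoryAsyncRepository` are hypothetical names, and the `idOf` delegate is an assumption used to extract an entity's key. Because nothing here performs real I/O, the methods complete synchronously; a database-backed implementation would await its driver or ORM calls instead.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public interface IAsyncRepository<T, TId>
    where T : class
    where TId : notnull
{
    Task<T?> GetByIdAsync(TId id, CancellationToken ct = default);
    Task AddAsync(T entity, CancellationToken ct = default);
}

public sealed class InMemoryAsyncRepository<T, TId> : IAsyncRepository<T, TId>
    where T : class
    where TId : notnull
{
    private readonly ConcurrentDictionary<TId, T> _items = new();
    private readonly Func<T, TId> _idOf;

    public InMemoryAsyncRepository(Func<T, TId> idOf) => _idOf = idOf;

    public Task<T?> GetByIdAsync(TId id, CancellationToken ct = default)
    {
        // Honor cancellation before doing any work.
        ct.ThrowIfCancellationRequested();
        _items.TryGetValue(id, out var entity);
        return Task.FromResult<T?>(entity);
    }

    public Task AddAsync(T entity, CancellationToken ct = default)
    {
        ct.ThrowIfCancellationRequested();
        _items[_idOf(entity)] = entity;
        return Task.CompletedTask;
    }
}
```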
3. The Update Method Debate:
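In pure collection semantics, there's no Update method: you modify an object in place, and since it's a reference, the collection reflects the change. With persistence, however, the stored data is a separate copy of the object, and the repository may not track changes automatically (for example, when no ORM change tracker is in use or the entity is detached), so it has to be told that an entity has changed.
The Update method exists for clarity and to support various persistence strategies.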
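The following sketch illustrates why an explicit Update can be needed once storage is involved. `SnapshotProductRepository` and `Product` are hypothetical; the repository deliberately stores copies rather than references, imitating a database that holds rows rather than live objects and has no change tracking.

```csharp
#nullable enable
using System.Collections.Generic;

public sealed class Product
{
    public int Id { get; init; }
    public string Name { get; set; } = "";

    public Product Clone() => new Product { Id = Id, Name = Name };
}

// Stores copies, the way a database stores rows rather than live objects.
public sealed class SnapshotProductRepository
{
    private readonly Dictionary<int, Product> _rows = new();

    public void Add(Product product) => _rows[product.Id] = product.Clone();

    // Each call hands back a fresh copy of the stored row.
    public Product? GetById(int id) =>
        _rows.TryGetValue(id, out var row) ? row.Clone() : null;

    // Without this call, changes made to a loaded copy never reach _rows.
    public void Update(Product product) => _rows[product.Id] = product.Clone();
}
```

Renaming a product returned by `GetById` has no effect on what a later `GetById` returns until `Update` is called. An ORM with change tracking removes the need for that explicit call, which is why some repository interfaces omit it.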
While the basic collection operations form the foundation, real-world repositories need domain-specific query methods. These methods express business requirements in terms the domain understands, not database queries the persistence layer requires.
```csharp
/// <summary>
/// A domain-specific repository for Order aggregates.
/// Notice how query methods use business terminology, not database terminology.
/// </summary>
public interface IOrderRepository
{
    // Basic collection operations (inherited or defined here)
    Order? GetById(OrderId id);
    void Add(Order order);
    void Remove(Order order);

    // Domain-specific queries - these express business requirements

    /// <summary>
    /// Finds orders that are ready for fulfillment (paid, stock verified).
    /// The repository knows what "ready for fulfillment" means in terms of data.
    /// </summary>
    IEnumerable<Order> FindOrdersReadyForFulfillment();

    /// <summary>
    /// Finds orders for a specific customer.
    /// </summary>
    IEnumerable<Order> FindByCustomer(CustomerId customerId);

    /// <summary>
    /// Finds orders placed within a date range.
    /// Useful for reporting and analytics.
    /// </summary>
    IEnumerable<Order> FindByDateRange(DateTime start, DateTime end);

    /// <summary>
    /// Finds orders that require attention - overdue, flagged, etc.
    /// Business logic defines "requires attention", repository implements it.
    /// </summary>
    IEnumerable<Order> FindOrdersRequiringAttention();

    /// <summary>
    /// Gets the next order number for sequencing.
    /// Some domain operations require repository involvement.
    /// </summary>
    OrderNumber GetNextOrderNumber();

    /// <summary>
    /// Checks if a duplicate order exists (idempotency check).
    /// </summary>
    bool ExistsWithIdempotencyKey(string idempotencyKey);
}
```

Notice the naming pattern: methods that return multiple entities typically start with Find (C#) or findBy (Java), while methods returning a single entity use Get or findById. This convention helps callers understand what to expect. Also, method names use domain terminology (FindOrdersReadyForFulfillment) rather than implementation details (FindByStatusPaidAndStockVerified).
The Query Method Explosion Problem:
As business requirements grow, you might be tempted to add more and more query methods:
```csharp
IEnumerable<Order> FindByCustomerAndStatus(CustomerId customerId, OrderStatus status);
IEnumerable<Order> FindByCustomerAndDateRange(CustomerId customerId, DateTime start, DateTime end);
IEnumerable<Order> FindByCustomerAndStatusAndDateRange(...);
// This quickly becomes unmanageable!
```
Solutions to Query Method Explosion:
We'll explore these patterns in detail later in this course.
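As a preview, one commonly used way to contain this growth follows the "query specifications" mentioned in the Fowler quote earlier: callers compose small criteria objects and pass them to a single query method. The sketch below is an assumption-laden illustration, not necessarily the approach the course goes on to present. `Specification<T>`, `OrderSpecs`, and `InMemoryOrderRepository` are hypothetical names, the `Order` record is simplified to plain `int` identifiers, and the predicates run in memory. A database-backed version would more likely use expression trees so the criteria can be translated into a query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum OrderStatus { Pending, Paid, Shipped }

// Simplified for illustration; not the Order aggregate used elsewhere.
public sealed record Order(int Id, int CustomerId, OrderStatus Status, DateTime PlacedAt);

// A specification wraps one predicate and can be combined with others.
public sealed class Specification<T>
{
    private readonly Func<T, bool> _predicate;

    public Specification(Func<T, bool> predicate) => _predicate = predicate;

    public bool IsSatisfiedBy(T candidate) => _predicate(candidate);

    public Specification<T> And(Specification<T> other) =>
        new Specification<T>(c => IsSatisfiedBy(c) && other.IsSatisfiedBy(c));
}

public static class OrderSpecs
{
    public static Specification<Order> ForCustomer(int customerId) =>
        new Specification<Order>(o => o.CustomerId == customerId);

    public static Specification<Order> WithStatus(OrderStatus status) =>
        new Specification<Order>(o => o.Status == status);
}

public sealed class InMemoryOrderRepository
{
    private readonly List<Order> _orders = new();

    public void Add(Order order) => _orders.Add(order);

    // One query method replaces FindByCustomerAndStatus and its variants.
    public IReadOnlyList<Order> Find(Specification<Order> spec) =>
        _orders.Where(o => spec.IsSatisfiedBy(o)).ToList();
}
```

A caller would write `repo.Find(OrderSpecs.ForCustomer(7).And(OrderSpecs.WithStatus(OrderStatus.Paid)))`, and new combinations need no new repository methods.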
The true power of the Repository pattern becomes apparent when we examine what it encapsulates—all the complexities that your domain code never needs to know about.
| Concern | What Domain Code Sees | What Repository Handles Internally |
|---|---|---|
| Database Connection | Nothing—just calls methods | Connection pooling, connection strings, timeout handling |
| Query Language | Typed method calls | SQL, LINQ, HQL, Cypher, MongoDB queries, etc. |
| Object-Relational Mapping | Pure domain objects | Column mappings, type conversions, relationship loading |
| Transaction Management | A unit of work boundary | BEGIN/COMMIT/ROLLBACK, isolation levels, deadlock handling |
| Caching | Fast responses | First/second-level caches, cache invalidation strategy |
| Performance Optimization | Consistent behavior | Query optimization, indexing hints, batch operations |
| Data Source Location | Just the data | Local DB, remote service, read replica routing, sharding |
| Error Handling | Domain exceptions | Constraint violations, deadlocks, timeout retries |
Concrete Example: Hiding Database Complexity
Consider what happens behind a simple GetById call:
```csharp
using System;
using System.Linq;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Logging;

// What the domain code sees:
var order = orderRepository.GetById(orderId);

// What actually happens inside the repository implementation.
// ICacheProvider, OrderCacheDto, MapToDomain and MapToCache are
// application-specific helpers that are not shown here.
public class SqlOrderRepository : IOrderRepository
{
    private readonly DbContext _context;
    private readonly ILogger<SqlOrderRepository> _logger;
    private readonly ICacheProvider _cache;

    public SqlOrderRepository(
        DbContext context,
        ILogger<SqlOrderRepository> logger,
        ICacheProvider cache)
    {
        _context = context;
        _logger = logger;
        _cache = cache;
    }

    public Order? GetById(OrderId id)
    {
        // 1. Check first-level cache (identity map)
        var cached = _context.ChangeTracker
            .Entries<Order>()
            .FirstOrDefault(e => e.Entity.Id == id)?.Entity;
        if (cached != null) return cached;

        // 2. Check second-level distributed cache
        var cacheKey = $"order:{id.Value}";
        var fromCache = _cache.Get<OrderCacheDto>(cacheKey);
        if (fromCache != null)
        {
            var order = MapToDomain(fromCache);
            _context.Attach(order);
            return order;
        }

        // 3. Query database with optimized includes.
        //    Set<Order>() is used because the base DbContext type
        //    has no Orders property of its own.
        var dbOrder = _context.Set<Order>()
            .Include(o => o.LineItems)
                .ThenInclude(li => li.Product)
            .Include(o => o.ShippingAddress)
            .Include(o => o.BillingAddress)
            .AsSplitQuery() // Optimize for multiple includes
            .FirstOrDefault(o => o.Id == id);

        if (dbOrder == null)
        {
            _logger.LogDebug("Order {OrderId} not found", id);
            return null;
        }

        // 4. Populate cache for next request
        _cache.Set(cacheKey, MapToCache(dbOrder), TimeSpan.FromMinutes(5));

        // 5. Return the loaded aggregate (here the mapped entity is the domain object)
        return dbOrder;
    }

    // Remaining IOrderRepository members omitted for brevity.
}
```

The domain code simply calls GetById(orderId) and receives an Order. It never knows about the cache checks, eager loading strategy, split queries, or logging. If tomorrow you need to add read replica routing or switch to a different ORM, the domain code doesn't change at all.
A critical aspect of the collection abstraction is identity preservation: if you retrieve the same entity twice within a scope, you should get the same object instance. This is what in-memory collections do naturally—and repositories should preserve this behavior.
The Problem Without Identity Mapping:
```csharp
// Without identity mapping, this causes problems:
var order1 = orderRepository.GetById(orderId);
var order2 = orderRepository.GetById(orderId);

order1.AddLineItem(newItem);

// If order1 and order2 are different instances:
Console.WriteLine(order1.LineItems.Count); // 3
Console.WriteLine(order2.LineItems.Count); // 2 ← WRONG!

// The change to order1 isn't reflected in order2 because
// they're separate instances representing the same data!

// Even worse:
orderRepository.Save(order1); // Saves with 3 items
orderRepository.Save(order2); // Overwrites with 2 items! Bug!
```

The Identity Map Solution:
The repository (or the underlying ORM) maintains an Identity Map—a dictionary that tracks which entity instances have been loaded for each unique identifier. When you request an entity that's already loaded, you get the same instance back.
```csharp
// With identity mapping (what repositories should provide):
var order1 = orderRepository.GetById(orderId);
var order2 = orderRepository.GetById(orderId);

// Both variables reference the same instance
Console.WriteLine(ReferenceEquals(order1, order2)); // true

order1.AddLineItem(newItem);

// Now both see the change:
Console.WriteLine(order1.LineItems.Count); // 3
Console.WriteLine(order2.LineItems.Count); // 3 ← Correct!

// Entity Framework and Hibernate provide this automatically via their
// DbContext/Session tracking. The repository doesn't re-query the database
// for an entity that's already loaded in the current unit of work.
```

The Identity Map is typically scoped to a Unit of Work or database session/context. Once the unit of work ends, the identity map is cleared. If you load the same entity in a new unit of work, you get a new instance. This scope boundary is crucial for proper transaction management and preventing memory leaks in long-running processes.
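For repositories that are not backed by an ORM with built-in tracking, the Identity Map can be sketched by hand as a dictionary in front of the data source. Everything here is an illustrative assumption: `IdentityMappedOrderRepository` is a hypothetical name, the `loadFromStore` delegate stands in for a real database query, and `StoreHits` exists only to make the behavior observable.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public sealed class Order
{
    public Guid Id { get; }
    public List<string> LineItems { get; } = new();

    public Order(Guid id) => Id = id;
}

public sealed class IdentityMappedOrderRepository
{
    // Hypothetical loader standing in for a database query.
    private readonly Func<Guid, Order?> _loadFromStore;

    // The identity map: one instance per id for this repository's lifetime.
    private readonly Dictionary<Guid, Order> _loaded = new();

    // Counts how often the underlying store was actually queried.
    public int StoreHits { get; private set; }

    public IdentityMappedOrderRepository(Func<Guid, Order?> loadFromStore) =>
        _loadFromStore = loadFromStore;

    public Order? GetById(Guid id)
    {
        // Already loaded: hand back the same instance.
        if (_loaded.TryGetValue(id, out var existing))
            return existing;

        StoreHits++;
        var order = _loadFromStore(id);
        if (order != null)
            _loaded[id] = order; // remember for later lookups
        return order;
    }
}
```

Creating one such repository per unit of work gives the scoping described above: a new repository starts with an empty map, so the same identifier yields a new instance.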
In Domain-Driven Design, repositories are intimately connected to aggregates. Understanding this relationship is essential for designing proper repository interfaces.
What is an Aggregate?
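An aggregate is a cluster of domain objects that are treated as a single unit for data changes. It has a single root entity through which all outside access passes, a boundary that defines which objects belong to it, and invariants that must hold whenever the aggregate changes.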
The Repository-Aggregate Relationship:
Each aggregate root gets its own repository. You never create repositories for entities or value objects that aren't aggregate roots.
Repositories for aggregate roots:
- IOrderRepository: Order is the aggregate root
- ICustomerRepository: Customer is the aggregate root
- IProductCatalogRepository: Product is the aggregate root
- IShoppingCartRepository: Cart is the aggregate root

Repositories you should not create, because these types are not aggregate roots:
- IOrderLineItemRepository
- IAddressRepository
- IMoneyRepository
- ICartItemRepository

Why This Matters:
If you created an IOrderLineItemRepository, you could modify line items without going through the Order aggregate root. This bypasses the Order's invariant checks:
```csharp
// WRONG: Directly modifying a child entity
var lineItem = lineItemRepository.GetById(itemId);
lineItem.Quantity = 100; // What if Order has a max items rule?
lineItemRepository.Save(lineItem); // Order's invariants are bypassed!

// CORRECT: Going through the aggregate root
var order = orderRepository.GetById(orderId);
order.UpdateLineItemQuantity(itemId, 100); // Order can enforce its rules
orderRepository.Save(order); // Entire aggregate is saved consistently
```
The repository ensures that all access to the aggregate goes through the root, maintaining consistency invariants.
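To make the invariant argument concrete, here is one way the aggregate root might guard a rule when a line item changes. This is a sketch under assumptions: the `MaxTotalQuantity` limit is an invented business rule, and the shapes of `Order` and `OrderLineItem` are simplified for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class OrderLineItem
{
    public Guid Id { get; }
    public int Quantity { get; private set; }

    public OrderLineItem(Guid id, int quantity)
    {
        Id = id;
        Quantity = quantity;
    }

    // Internal: callable only within this assembly, so code outside the
    // domain assembly has to go through Order to change a quantity.
    internal void ChangeQuantity(int quantity) => Quantity = quantity;
}

public sealed class Order
{
    // Hypothetical business rule, chosen only for illustration.
    public const int MaxTotalQuantity = 50;

    private readonly List<OrderLineItem> _lineItems = new();

    public IReadOnlyList<OrderLineItem> LineItems => _lineItems;

    public void AddLineItem(OrderLineItem item)
    {
        EnsureWithinLimit(_lineItems.Sum(li => li.Quantity) + item.Quantity);
        _lineItems.Add(item);
    }

    public void UpdateLineItemQuantity(Guid itemId, int quantity)
    {
        var item = _lineItems.Single(li => li.Id == itemId);
        var newTotal = _lineItems.Sum(li => li.Quantity) - item.Quantity + quantity;

        // Validate before mutating so a rejected change leaves the order intact.
        EnsureWithinLimit(newTotal);
        item.ChangeQuantity(quantity);
    }

    private static void EnsureWithinLimit(int total)
    {
        if (total > MaxTotalQuantity)
            throw new InvalidOperationException(
                $"Order cannot exceed {MaxTotalQuantity} items in total.");
    }
}
```

Because the check lives in `Order`, a separate line-item repository would offer a path around it, which is the reason repositories are limited to aggregate roots.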
When loading an aggregate, you typically load the entire aggregate graph. For an Order, this means the order header, all line items, addresses, and any other parts of the aggregate. The repository decides how to load efficiently (eager loading, split queries, etc.), but it always returns the complete aggregate, never a partial one.
The Repository pattern, properly implemented as a collection abstraction, delivers substantial benefits across multiple dimensions of software quality.
No [Column] attributes, no lazy-loading proxies visible in domain code, no serialization worries. Pure domain logic.

Teams that adopt the Repository pattern often find that the testability benefit alone changes their development workflow. When you can test complex domain scenarios against an in-memory repository, with no database setup, tests run quickly, so you write more of them, catch more bugs, and refactor with confidence. This is perhaps the pattern's most impactful benefit in practice.
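The testability benefit can be illustrated with a small in-memory test double. This is a sketch, and all names in it (`ICustomerRepository` members, `InMemoryCustomerRepository`, `RegistrationService`, and the simplified `Customer` record) are assumptions made for the example.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public sealed record Customer(Guid Id, string Email);

public interface ICustomerRepository
{
    Customer? GetById(Guid id);
    void Add(Customer customer);
}

// Test double: a dictionary is the whole "database".
public sealed class InMemoryCustomerRepository : ICustomerRepository
{
    private readonly Dictionary<Guid, Customer> _items = new();

    public Customer? GetById(Guid id) =>
        _items.TryGetValue(id, out var customer) ? customer : null;

    public void Add(Customer customer) => _items[customer.Id] = customer;
}

// Domain service under test; it depends only on the interface.
public sealed class RegistrationService
{
    private readonly ICustomerRepository _customers;

    public RegistrationService(ICustomerRepository customers) =>
        _customers = customers;

    public Customer Register(string email)
    {
        if (string.IsNullOrWhiteSpace(email) || !email.Contains("@"))
            throw new ArgumentException("A valid email is required.", nameof(email));

        var customer = new Customer(Guid.NewGuid(), email);
        _customers.Add(customer);
        return customer;
    }
}
```

A test constructs `new RegistrationService(new InMemoryCustomerRepository())`, calls `Register`, and checks that `GetById` returns the new customer. No connection string, schema, or cleanup step is involved.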
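We've established the foundational understanding of the Repository pattern as a collection abstraction. Let's consolidate the key concepts:
- A repository presents persistence as an in-memory collection of domain objects.
- Its interface combines collection operations with domain-specific query methods and never exposes database terminology.
- It encapsulates connections, queries, mapping, caching, and error translation.
- Within a unit of work, an identity map returns the same instance for the same identifier.
- Repositories exist only for aggregate roots, and each aggregate is loaded and saved as a whole.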
What's Next:
Now that we understand the conceptual foundation of repositories as collections, we'll explore a critical design decision: Generic vs. Specific Repositories. Should you create one generic IRepository<T> that works for all entities, or specific interfaces like IOrderRepository for each aggregate? Both approaches have advocates, and the right choice depends on your context.
You now understand the Repository pattern as a collection abstraction—its origins, principles, interface design, and benefits. This conceptual foundation is essential for the implementation decisions we'll explore in the following pages.