The most powerful insight about the Repository pattern is understanding it as a collection. Not a service. Not a data access layer. A collection—like the List<T>, Set<T>, or Map<K, V> you use every day in your code.
When you work with an in-memory collection, you never think about file I/O, serialization formats, or storage locations. You simply add items, remove items, and query for items. The collection handles the rest. A repository offers the exact same experience, but the 'rest' it handles includes database persistence, caching, and data mapping.
This metaphor isn't just conceptual elegance—it has profound practical implications for how you design repository interfaces, what methods you expose, and how client code interacts with your domain's persistent objects.
This page examines the collection abstraction in depth. You'll understand which collection operations map to repositories, why certain operations are absent, how to think about repository state, and how this mental model simplifies both implementation and testing.
Consider how you interact with a standard in-memory collection:
```typescript
// In-memory collection behavior
const orders = new Map<string, Order>();

// Adding an item - it's now part of the collection
orders.set(order.id, order);

// Retrieving an item - it's returned from the collection
const retrieved = orders.get(orderId);

// Querying the collection
const customerOrders = Array.from(orders.values())
  .filter(o => o.customerId === customerId);

// Modifying a retrieved object - changes are in the collection
retrieved?.cancel(); // The order in the map is now cancelled

// Note: No "save" or "persist" operation needed
// Changes to objects in the collection are just... there

// Removing an item - it's no longer part of the collection
orders.delete(orderId);
```

A repository mirrors this behavior exactly. From the domain code's perspective, the repository is this collection: a conceptually unbounded set containing every aggregate of its type that has been created and not yet removed. The fact that this "collection" is actually backed by a PostgreSQL database, MongoDB cluster, or distributed cache is invisible.
This creates a powerful abstraction: domain code can pretend that all aggregates are always in memory, available for immediate access and modification. The repository maintains this illusion.
| Collection Operation | Repository Equivalent | Database Effect |
|---|---|---|
| collection.add(item) | repository.add(aggregate) | INSERT (or queue for INSERT) |
| collection.get(key) | repository.findById(id) | SELECT with WHERE clause |
| collection.remove(item) | repository.remove(aggregate) | DELETE (or queue for DELETE) |
| collection.filter(predicate) | repository.findByXxx(criteria) | SELECT with filtering |
| item.mutate() | Same: mutate the retrieved aggregate | UPDATE (tracked by Unit of Work) |
| collection.contains(item) | repository.exists(id) | SELECT EXISTS or COUNT |
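
The mapping in the table can be written down as a small generic interface. The sketch below is one possible shape, not a standard API: the names `Identifiable`, `CollectionLikeRepository`, and `MapBackedRepository` are hypothetical, and the methods are kept synchronous for brevity (a database-backed version would return promises).

```typescript
// Hypothetical generic shape; names are illustrative, not from a library.
interface Identifiable<Id> {
  readonly id: Id;
}

interface CollectionLikeRepository<T extends Identifiable<Id>, Id> {
  add(aggregate: T): void;     // collection.add(item)
  findById(id: Id): T | null;  // collection.get(key)
  remove(aggregate: T): void;  // collection.remove(item)
  exists(id: Id): boolean;     // collection.contains(item)
}

// Simplest possible backing store: a plain Map keyed by string id.
class MapBackedRepository<T extends Identifiable<string>>
  implements CollectionLikeRepository<T, string> {
  private readonly items = new Map<string, T>();

  add(aggregate: T): void {
    if (this.items.has(aggregate.id)) {
      throw new Error(`Aggregate ${aggregate.id} already exists`);
    }
    this.items.set(aggregate.id, aggregate);
  }

  findById(id: string): T | null {
    return this.items.get(id) ?? null;
  }

  remove(aggregate: T): void {
    this.items.delete(aggregate.id);
  }

  exists(id: string): boolean {
    return this.items.has(id);
  }
}

// Usage: mutate the retrieved aggregate, with no save() call anywhere.
type DemoOrder = { id: string; status: string };
const demoRepo = new MapBackedRepository<DemoOrder>();
const demoOrder: DemoOrder = { id: "order-1", status: "PENDING" };
demoRepo.add(demoOrder);
demoRepo.findById("order-1")!.status = "CANCELLED";
```

Note that there is no `update` member: the `item.mutate()` row of the table has no repository method at all.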
Notice that save() or update() is missing from the collection metaphor. You don't call list.save() after modifying an object in a list—the modification is immediate. Similarly, a pure repository pattern (with Unit of Work) doesn't require explicit save calls. Changes to retrieved aggregates are tracked and persisted automatically at the end of the unit of work.
A repository exposes a specific set of collection-like operations. Understanding each operation's semantics is essential for correct usage and implementation.
Add: Inserting a New Aggregate
The add() operation places a new aggregate into the collection. After this call, the aggregate is considered part of the repository's conceptual set.
Critical semantics:
```typescript
interface OrderRepository {
  add(order: Order): Promise<void>;
}

// Usage
class OrderService {
  constructor(private readonly orderRepository: OrderRepository) {}

  async placeOrder(customerId: CustomerId, items: OrderItem[]): Promise<Order> {
    // Create a new aggregate
    const order = Order.create(customerId, items);

    // Add to the collection - it's now part of the repository
    await this.orderRepository.add(order);

    // At this point, logically, findById(order.id) would return this order
    // (though actual persistence may be deferred)
    return order;
  }
}

// Implementation note: add() should NOT accept existing aggregates
// Trying to add an already-existing aggregate is an error
```

Find By Identity: The Primary Retrieval
The findById() operation is the most fundamental query. It retrieves an aggregate by its unique identity, returning the fully reconstituted aggregate or null if not found.
Critical semantics:
```typescript
interface OrderRepository {
  findById(orderId: OrderId): Promise<Order | null>;
}

// Usage
class OrderService {
  constructor(private readonly orderRepository: OrderRepository) {}

  async cancelOrder(orderId: OrderId): Promise<void> {
    // Retrieve the aggregate
    const order = await this.orderRepository.findById(orderId);
    if (!order) {
      throw new OrderNotFoundException(orderId);
    }

    // Modify the aggregate - changes are tracked
    order.cancel();

    // No need to call save() - the Unit of Work will
    // detect changes and persist them
  }
}

// Implementation note: identity map pattern ensures that
// multiple findById() calls in the same transaction return
// the same object instance, preserving reference equality
```

Remove: Deleting an Aggregate
The remove() operation removes an aggregate from the collection. After this call, the aggregate is no longer part of the repository's conceptual set.
Critical semantics:
```typescript
interface OrderRepository {
  remove(order: Order): Promise<void>;
}

// Usage
class OrderService {
  constructor(private readonly orderRepository: OrderRepository) {}

  async deleteOrder(orderId: OrderId): Promise<void> {
    // First retrieve the aggregate
    const order = await this.orderRepository.findById(orderId);
    if (!order) {
      throw new OrderNotFoundException(orderId);
    }

    // Verify business rules allow deletion
    if (!order.canBeDeleted()) {
      throw new OrderCannotBeDeletedError(orderId);
    }

    // Remove from the collection
    await this.orderRepository.remove(order);

    // At this point, the order is no longer part of the collection
    // The actual DELETE happens at transaction commit
  }
}

// Note: Some implementations use removeById(orderId) for convenience
// when you don't need the aggregate for business logic validation
```

Beyond basic add/find/remove, repositories provide domain-meaningful query operations. These operations filter the conceptual collection based on criteria that make sense in the domain.
The key principle: Query methods should express domain concepts, not database concepts.
Domain-focused names (prefer):
findPendingOrders()
findByCustomer(customerId)
findOrdersPlacedBetween(start, end)
findOverdueOrders()
findHighValueOrders(threshold)

Database-focused names (avoid):
findByStatusEquals('PENDING')
findByCustomerIdIn([...])
findByCreatedAtBetween(a, b)
findByDueDateBeforeAndStatusIn(...)
findByTotalGreaterThan(amount)

Frameworks like Spring Data JPA encourage method names that mirror the query structure (findByFieldNameOperator). While convenient, this leaks database concerns into the domain. Prefer explicit domain-named methods even if they're less 'magical.' The implementation can still use query derivation internally.
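
As a rough illustration of that last point, a repository can expose only domain-named finders while a generic, storage-flavoured query stays private. Everything in this sketch is hypothetical: the `OrderRecord` shape, the status values, the private `query` helper (a stand-in for query derivation or SQL), and the assumed rule that "overdue" means pending and past its due date.

```typescript
// Hypothetical record shape and status values, for illustration only.
type OrderStatusValue = "PENDING" | "SHIPPED" | "CANCELLED";

interface OrderRecord {
  id: string;
  customerId: string;
  status: OrderStatusValue;
  total: number;
  dueDate: Date;
}

// Generic criteria stay private to the implementation.
type OrderCriteria = (record: OrderRecord) => boolean;

class SketchOrderRepository {
  constructor(private readonly records: OrderRecord[]) {}

  // Domain-named finders: callers never see field names or operators.
  findPendingOrders(): OrderRecord[] {
    return this.query(r => r.status === "PENDING");
  }

  // Assumed rule: overdue = still pending and past the due date.
  findOverdueOrders(now: Date): OrderRecord[] {
    return this.query(
      r => r.status === "PENDING" && r.dueDate.getTime() < now.getTime()
    );
  }

  findHighValueOrders(threshold: number): OrderRecord[] {
    return this.query(r => r.total > threshold);
  }

  // Stand-in for query derivation, a query builder, or raw SQL.
  private query(criteria: OrderCriteria): OrderRecord[] {
    return this.records.filter(criteria);
  }
}

// Usage with three sample records
const sketchRepo = new SketchOrderRepository([
  { id: "o1", customerId: "c1", status: "PENDING", total: 50, dueDate: new Date("2024-01-01") },
  { id: "o2", customerId: "c1", status: "SHIPPED", total: 500, dueDate: new Date("2024-01-01") },
  { id: "o3", customerId: "c2", status: "PENDING", total: 900, dueDate: new Date("2024-03-01") },
]);
const overdueIds = sketchRepo.findOverdueOrders(new Date("2024-02-01")).map(r => r.id);
```

If the definition of "overdue" changes, only `findOverdueOrders` changes; callers keep using the same domain term.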
```typescript
interface OrderRepository {
  // Identity-based retrieval
  findById(orderId: OrderId): Promise<Order | null>;

  // Domain-meaningful queries
  findByCustomer(customerId: CustomerId): Promise<Order[]>;
  findPendingOrders(): Promise<Order[]>;
  findOrdersAwaitingShipment(): Promise<Order[]>;
  findOrdersPlacedBetween(start: Date, end: Date): Promise<Order[]>;

  // Existence checks
  exists(orderId: OrderId): Promise<boolean>;
  hasCustomerPlacedOrderBefore(customerId: CustomerId): Promise<boolean>;

  // Counting (for domain purposes, not pagination)
  countPendingOrders(): Promise<number>;

  // Collection operations
  add(order: Order): Promise<void>;
  remove(order: Order): Promise<void>;
}

// Note: Each finder method returns complete aggregates
// This ensures invariants are maintained when working with results
```

Pagination and Sorting Considerations
Pagination and sorting are often needed but create tension with the collection metaphor. A finite page of results doesn't feel like a collection.
Recommendations:
Keep repositories aggregate-focused — Repositories retrieve aggregates for command processing, not for display.
Use CQRS for read-heavy use cases — If you need paginated lists for UI display, create separate read models/projections.
If pagination is unavoidable, make it explicit with domain-meaningful semantics:
```typescript
// If you must have pagination in a repository, make it domain-meaningful
interface OrderRepository {
  // Returns oldest pending orders (for processing queue)
  findOldestPendingOrders(limit: number): Promise<Order[]>;

  // Returns recent orders for a customer (for recent activity)
  findRecentOrdersByCustomer(
    customerId: CustomerId,
    limit: number
  ): Promise<Order[]>;

  // For paging through large result sets (use sparingly)
  findAllOrdersPaged(page: PageRequest): Promise<PagedResult<Order>>;
}

// But consider: if you're paginating heavily, you might need CQRS
// with separate read models optimized for querying
```

For the collection metaphor to hold, retrieving the same aggregate twice within a unit of work must return the same object instance. This is the Identity Map pattern, and it's essential for repository correctness.
Why identity consistency matters:
Consider two service methods that both retrieve the same order:
```typescript
// Without identity map - BROKEN
class BrokenOrderService {
  constructor(private readonly orderRepository: OrderRepository) {}

  async processOrder(orderId: OrderId) {
    const order1 = await this.orderRepository.findById(orderId);
    order1.approve(); // Sets status to APPROVED

    // Some intermediate processing...
    await this.processPayment(orderId);
  }

  async processPayment(orderId: OrderId) {
    const order2 = await this.orderRepository.findById(orderId);
    // order2 is a DIFFERENT INSTANCE than order1!
    // order2.status is still PENDING, not APPROVED
    // Changes made to order1 are invisible to order2

    order2.markAsPaid(); // Now we have conflicting states

    // When we commit, which version wins? order1 or order2?
    // This leads to data loss or corruption
  }
}

// With identity map - CORRECT
class CorrectOrderService {
  constructor(private readonly orderRepository: OrderRepository) {}

  async processOrder(orderId: OrderId) {
    const order1 = await this.orderRepository.findById(orderId);
    order1.approve();

    await this.processPayment(orderId);
  }

  async processPayment(orderId: OrderId) {
    const order2 = await this.orderRepository.findById(orderId);
    // order2 IS THE SAME INSTANCE as order1!
    // order2.status is APPROVED
    // All changes are visible because it's the same object

    order2.markAsPaid(); // Consistent modification

    // At commit, there's one object with all changes
  }
}
```

Many ORMs (for example Entity Framework, Hibernate, and MikroORM) implement identity maps automatically within a database context/session. This is one reason repositories typically wrap ORM operations rather than raw SQL: the ORM handles identity map bookkeeping. Not every data-access library does this. Prisma, for instance, returns plain objects without identity tracking, so verify the behavior of your tool before relying on it.
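
The difference between the two services can be reproduced in a few lines. The following is a minimal, synchronous sketch under stated assumptions: `SketchOrder`, `loadOrderFromStore` (a stand-in for a database read that builds a fresh object on every call), and `IdentityMappedOrders` are hypothetical names invented for the demonstration.

```typescript
// Hypothetical aggregate, for illustration only.
class SketchOrder {
  status = "PENDING";
  constructor(readonly id: string) {}
  approve(): void {
    this.status = "APPROVED";
  }
}

// Simulates a database read: every call builds a fresh object.
let loadCount = 0;
function loadOrderFromStore(id: string): SketchOrder {
  loadCount += 1;
  return new SketchOrder(id);
}

class IdentityMappedOrders {
  private readonly loaded = new Map<string, SketchOrder>();

  findById(id: string): SketchOrder {
    const cached = this.loaded.get(id);
    if (cached) return cached; // same instance as before
    const order = loadOrderFromStore(id);
    this.loaded.set(id, order);
    return order;
  }
}

// Without an identity map: two loads, two unrelated instances.
const rawFirst = loadOrderFromStore("order-1");
rawFirst.approve();
const rawSecond = loadOrderFromStore("order-1");
const staleStatus = rawSecond.status; // the approval is invisible here

// With an identity map: one load, one shared instance.
const mappedOrders = new IdentityMappedOrders();
const mappedFirst = mappedOrders.findById("order-1");
mappedFirst.approve();
const mappedSecond = mappedOrders.findById("order-1");
const sharedStatus = mappedSecond.status; // sees the approval
```

Here `staleStatus` is still `"PENDING"` while `sharedStatus` is `"APPROVED"`, and the mapped repository hits the store only once for the two lookups.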
Identity Map Implementation
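An identity map is essentially a Map<Identity, Aggregate> that tracks all loaded aggregates. When findById() is called, the repository first checks the map and returns the existing instance if the identity is present. Otherwise it loads the data from the database, reconstitutes the aggregate, stores it in the map, and returns it.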
The identity map is typically scoped to a Unit of Work (transaction/request).
```typescript
class OrderRepositoryImpl implements OrderRepository {
  // Identity map - scoped to this unit of work
  private identityMap = new Map<string, Order>();

  // Note: this.database, this.mapper, and this.unitOfWork are assumed
  // to be injected collaborators; their wiring is omitted here

  async findById(orderId: OrderId): Promise<Order | null> {
    const key = orderId.value;

    // Check identity map first
    if (this.identityMap.has(key)) {
      return this.identityMap.get(key)!;
    }

    // Not cached - load from database
    const data = await this.database.query(
      'SELECT * FROM orders WHERE id = ?',
      [key]
    );

    if (!data) return null;

    // Reconstitute the aggregate
    const order = this.mapper.toDomain(data);

    // Store in identity map
    this.identityMap.set(key, order);

    return order;
  }

  async add(order: Order): Promise<void> {
    // Add to identity map immediately
    this.identityMap.set(order.id.value, order);

    // Mark for insertion in Unit of Work
    this.unitOfWork.registerNew(order);
  }

  // At unit of work commit, persist all tracked changes
}
```

One of the most counterintuitive aspects of pure repository design is the absence of a save() or update() method. This absence is intentional and flows directly from the collection metaphor.
Consider an in-memory collection:
```typescript
// In-memory collection behavior
const orders = new Map<string, Order>();

// Add an order
orders.set(orderId, order);

// Retrieve and modify
const retrieved = orders.get(orderId);
retrieved?.approve(); // Change is immediate and visible

// No "save" needed - the map already has the modified object
// orders.get(orderId).status === 'APPROVED'

// You don't call:
// orders.save(retrieved); // This method doesn't exist!
```

If the repository is a collection, the same should apply.
When you retrieve an aggregate, modify it, and the unit of work completes, changes are persisted. The repository tracks which aggregates have been retrieved (via identity map), and the Unit of Work pattern detects changes to those aggregates (via dirty tracking or explicit registration).
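
One way dirty tracking can work is snapshot comparison: remember what each aggregate looked like when it was handed out, and compare at commit time. The sketch below assumes that approach. `SnapshotUnitOfWork`, `TrackedOrder`, and `registerClean` are hypothetical names, and comparing `JSON.stringify` output is a simplification that only suits plain, serializable state.

```typescript
// Hypothetical aggregate shape; a real aggregate would carry behavior too.
interface TrackedOrder {
  id: string;
  status: string;
}

interface TrackedEntry {
  aggregate: TrackedOrder;
  snapshot: string; // serialized state at load time
}

class SnapshotUnitOfWork {
  private readonly tracked = new Map<string, TrackedEntry>();

  // Called by the repository whenever it hands out an aggregate.
  registerClean(aggregate: TrackedOrder): void {
    if (!this.tracked.has(aggregate.id)) {
      this.tracked.set(aggregate.id, {
        aggregate,
        snapshot: JSON.stringify(aggregate),
      });
    }
  }

  // Returns the aggregates whose state differs from their snapshot.
  // A real implementation would issue UPDATEs in a transaction here.
  commit(): TrackedOrder[] {
    const dirty: TrackedOrder[] = [];
    this.tracked.forEach(entry => {
      const current = JSON.stringify(entry.aggregate);
      if (current !== entry.snapshot) {
        dirty.push(entry.aggregate);
        entry.snapshot = current; // clean again after commit
      }
    });
    return dirty;
  }
}

// Usage: only the mutated aggregate is reported at commit.
const sketchUow = new SnapshotUnitOfWork();
const trackedA: TrackedOrder = { id: "order-1", status: "PENDING" };
const trackedB: TrackedOrder = { id: "order-2", status: "PENDING" };
sketchUow.registerClean(trackedA);
sketchUow.registerClean(trackedB);
trackedA.status = "APPROVED"; // plain mutation, no save() call
const toUpdate = sketchUow.commit();
```

The calling code never says which aggregate changed; the unit of work works that out by itself, which is what lets the repository interface omit save().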
Pragmatic considerations:
Many real-world codebases nonetheless include a save() method on repositories, most often to make persistence explicit and visible in the calling code:
```typescript
// Pragmatic approach: include save() for clarity
interface OrderRepository {
  findById(orderId: OrderId): Promise<Order | null>;
  add(order: Order): Promise<void>;
  save(order: Order): Promise<void>; // Explicit update
  remove(order: Order): Promise<void>;
}

// Usage
class OrderService {
  constructor(private readonly orderRepository: OrderRepository) {}

  async approveOrder(orderId: OrderId): Promise<void> {
    const order = await this.orderRepository.findById(orderId);
    if (!order) throw new OrderNotFoundException(orderId);

    order.approve();

    // Explicit save makes the persistence visible
    await this.orderRepository.save(order);
  }
}

// Note: The "pure" DDD approach would omit save() and use
// Unit of Work to detect and persist changes automatically.
// The pragmatic approach includes it for clarity.
```

Whether you include save() or use pure Unit of Work tracking, be consistent across your codebase. A mixed approach where some repositories require save() and others don't leads to confusion and bugs.
The collection abstraction shines brightest in testing. Because a repository behaves like a collection, you can implement test doubles using actual in-memory collections. No mocking frameworks and no database setup, just a Map.
```typescript
// In-memory implementation for testing
class InMemoryOrderRepository implements OrderRepository {
  private orders = new Map<string, Order>();

  async findById(orderId: OrderId): Promise<Order | null> {
    return this.orders.get(orderId.value) ?? null;
  }

  async findByCustomer(customerId: CustomerId): Promise<Order[]> {
    return Array.from(this.orders.values())
      .filter(o => o.customerId.equals(customerId));
  }

  async findPendingOrders(): Promise<Order[]> {
    return Array.from(this.orders.values())
      .filter(o => o.status === OrderStatus.PENDING);
  }

  async add(order: Order): Promise<void> {
    if (this.orders.has(order.id.value)) {
      throw new Error('Order already exists');
    }
    this.orders.set(order.id.value, order);
  }

  async remove(order: Order): Promise<void> {
    this.orders.delete(order.id.value);
  }

  // Test helper methods
  clear(): void {
    this.orders.clear();
  }

  count(): number {
    return this.orders.size;
  }

  getAll(): Order[] {
    return Array.from(this.orders.values());
  }
}
```

Now tests become trivial:
```typescript
describe('OrderService', () => {
  let orderRepository: InMemoryOrderRepository;
  let orderService: OrderService;

  beforeEach(() => {
    orderRepository = new InMemoryOrderRepository();
    orderService = new OrderService(orderRepository);
  });

  describe('placeOrder', () => {
    it('should create and store a new order', async () => {
      const customerId = CustomerId.create('customer-123');
      const items = [OrderItem.create('product-1', 2, Money.of(100))];

      const order = await orderService.placeOrder(customerId, items);

      // Verify the order exists in the repository
      const stored = await orderRepository.findById(order.id);
      expect(stored).not.toBeNull();
      expect(stored!.customerId).toEqual(customerId);
      expect(stored!.items).toHaveLength(1);
    });
  });

  describe('cancelOrder', () => {
    it('should mark an existing order as cancelled', async () => {
      // Arrange: set up an existing order
      const order = Order.create(
        CustomerId.create('customer-123'),
        [OrderItem.create('product-1', 1, Money.of(50))]
      );
      await orderRepository.add(order);

      // Act
      await orderService.cancelOrder(order.id);

      // Assert
      const cancelled = await orderRepository.findById(order.id);
      expect(cancelled!.status).toBe(OrderStatus.CANCELLED);
    });

    it('should throw when order does not exist', async () => {
      const nonExistentId = OrderId.create('does-not-exist');

      await expect(orderService.cancelOrder(nonExistentId))
        .rejects.toThrow(OrderNotFoundException);
    });
  });
});

// No database setup, no cleanup, no mocking frameworks
// Tests are fast, deterministic, and focused on domain logic
```

In-memory repository tests run in milliseconds, not seconds. They never fail due to database connection issues. They don't require Docker, migrations, or test data setup. This is the testing power of proper abstractions.
The collection metaphor is the conceptual foundation of the Repository pattern. Let's consolidate what we've learned:
Query methods express domain concepts: findPendingOrders(), not findByStatusEquals('PENDING').
What's next:
We've established how to think about repositories conceptually. The next page dives into the practical details of Repository Interface Design—how to craft repository interfaces that are expressive, clean, and aligned with domain needs.
You now understand the collection abstraction that underlies the Repository pattern. This mental model will guide every repository you design—think collection first, infrastructure second.