
Data Ownership in Microservices

Level: Advanced

Duration: 90 mins

Topic: Data Ownership


Query Across Services

The Distributed Join Problem

In a monolithic database, answering complex queries is straightforward. Need to find all orders from customers in California with products in a specific category? A single SQL JOIN across three tables returns the answer in milliseconds.

In microservices, this becomes fundamentally harder. The Customer Service owns customer data (including location). The Order Service owns orders. The Catalog Service owns product categories. There is no shared database to JOIN across.

This is the distributed join problem: how do you answer queries that require data from multiple autonomous services, each with its own database, without reintroducing the tight coupling that microservices aim to eliminate?

The answer involves trade-offs between latency, consistency, complexity, and coupling. There's no free lunch—but there are proven patterns that work at scale.

What You Will Learn

By the end of this page, you will understand the fundamental patterns for cross-service queries: API composition, materialized views, CQRS, and data mesh. You'll know when to apply each pattern and which trade-offs each one carries.

Understanding the Cross-Service Query Challenge

Before exploring solutions, let's deeply understand why cross-service queries are hard. The challenge isn't just technical—it's a fundamental tension between service autonomy and query flexibility.

Why Cross-Service Queries Are Difficult

• No Shared Database: Each service has its own database. You literally cannot write a JOIN across them at the database level.
• Network Boundaries: Fetching data from another service requires network calls, adding latency and failure modes.
• Different Data Models: Services model data for their own needs. Catalog's 'product' and Order's 'line item' aren't the same structure.
• Consistency Mismatch: Service A's view might be at time T1, Service B's at T2. Joining them gives inconsistent results.
• Scalability Asymmetry: Query patterns may stress one service more than others. If every Order query calls Catalog, Catalog's load is driven by Order traffic rather than by its own workload.
• Permission Boundaries: Service A might not be allowed to see all data in Service B. Cross-service queries complicate authorization.

The query taxonomy:

Not all cross-service queries are equal. Understanding the query type guides the solution:

Query Type	Example	Challenge Level
Point lookup	Get order with customer name	Low — single ID lookup
Filtered search	Find orders with product category	Medium — filter on foreign data
Aggregation	Count orders by customer region	High — aggregate across domains
Ad-hoc reporting	Complex business intelligence	Very High — arbitrary combinations

Query Complexity Drives Architecture

Simple point lookups (enriching an order with customer name) can use straightforward API calls. Complex analytical queries (revenue by customer segment by region by product line) require dedicated infrastructure. Don't over-engineer for simple queries or under-engineer for complex ones.

Pattern 1: API Composition

API Composition (also called Aggregator Pattern) is the most straightforward approach: a client or gateway calls multiple services and combines their responses.

How it works:

1. Client needs order + customer + product data
2. Client calls Order Service → gets order with customerId, productIds
3. Client calls Customer Service → gets customer by customerId
4. Client calls Catalog Service → gets products by productIds
5. Client combines responses into unified view

api-composition-example
TypeScript
// ===================================================
// API COMPOSITION PATTERN
// ===================================================
// An API Gateway or BFF (Backend for Frontend) 
// composes data from multiple services.
// ===================================================
 
interface EnrichedOrder {
  orderId: string;
  orderDate: Date;
  status: string;
  customer: {
    id: string;
    name: string;
    email: string;
  };
  items: Array<{
    productId: string;
    productName: string;
    category: string;
    quantity: number;
    unitPrice: number;
  }>;
  totalAmount: number;
}
 
class OrderCompositionService {
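  // Clients are injected through the constructor so the fields are always initialized
  constructor(
    private readonly orderClient: OrderServiceClient,
    private readonly customerClient: CustomerServiceClient,
    private readonly catalogClient: CatalogServiceClient,
  ) {}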
 
  async getEnrichedOrder(orderId: string): Promise<EnrichedOrder> {
    // Step 1: Get the core order data
    const order = await this.orderClient.getOrder(orderId);
    
    // Step 2: Fetch related data in parallel
    // These calls are independent and can execute concurrently
    const [customer, products] = await Promise.all([
      this.customerClient.getCustomer(order.customerId),
      this.catalogClient.getProducts(
        order.lineItems.map(item => item.productId)
      ),
    ]);
    
    // Step 3: Compose the enriched response
    const productMap = new Map(products.map(p => [p.id, p]));
    
    return {
      orderId: order.id,
      orderDate: order.createdAt,
      status: order.status,
      customer: {
        id: customer.id,
        name: customer.name,
        email: customer.email,
      },
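      items: order.lineItems.map(item => {
        // A product may be missing (e.g. removed from the catalog), so avoid
        // a non-null assertion and fall back to a placeholder instead
        const product = productMap.get(item.productId);
        return {
          productId: item.productId,
          productName: product?.name ?? 'Unknown',
          category: product?.category ?? 'Unknown',
          quantity: item.quantity,
          unitPrice: item.unitPrice,
        };
      }),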
      totalAmount: order.totalAmount,
    };
  }
 
  async getOrdersForCustomer(customerId: string): Promise<EnrichedOrder[]> {
    // The order list and the customer profile are independent of each other,
    // so fetch them in parallel instead of one after the other
    const [orders, customer] = await Promise.all([
      this.orderClient.getOrdersByCustomer(customerId),
      this.customerClient.getCustomer(customerId),
    ]);
    
    // Get all unique products across all orders
    const allProductIds = new Set<string>();
    orders.forEach(order => {
      order.lineItems.forEach(item => allProductIds.add(item.productId));
    });
    
    const products = await this.catalogClient.getProducts([...allProductIds]);
    const productMap = new Map(products.map(p => [p.id, p]));
    
    // Compose all orders
    return orders.map(order => ({
      orderId: order.id,
      orderDate: order.createdAt,
      status: order.status,
      customer: {
        id: customer.id,
        name: customer.name,
        email: customer.email,
      },
      items: order.lineItems.map(item => ({
        productId: item.productId,
        productName: productMap.get(item.productId)?.name ?? 'Unknown',
        category: productMap.get(item.productId)?.category ?? 'Unknown',
        quantity: item.quantity,
        unitPrice: item.unitPrice,
      })),
      totalAmount: order.totalAmount,
    }));
  }
}

API Composition Strengths

• ✓ Simple to understand and implement
• ✓ No additional infrastructure required
• ✓ Data is always current (fetched real-time)
• ✓ Services remain loosely coupled
• ✓ Works for any lookup the existing service APIs already expose

API Composition Weaknesses

• ✗ Latency is the sum of the sequential stages (calls made in parallel cost as much as the slowest one)
• ✗ Availability is the product of the individual service availabilities
• ✗ No cross-service filtering or sorting
• ✗ Expensive for large result sets
• ✗ Consistency issues (data from different times)
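
The latency and availability costs listed above can be estimated with simple arithmetic. The sketch below is illustrative only: the 99.9% availability figure and the millisecond latencies are assumed numbers, not measurements, and `compositeAvailability` and `compositeLatencyMs` are hypothetical helper names.

```typescript
// Availability of a composition that needs every downstream call to succeed,
// assuming failures are independent (an assumption; real outages often correlate).
function compositeAvailability(availabilities: number[]): number {
  return availabilities.reduce((acc, a) => acc * a, 1);
}

// Latency of a composition organised as sequential stages, where the calls
// inside one stage run in parallel: each stage costs as much as its slowest call.
// Every stage is expected to contain at least one call.
function compositeLatencyMs(stages: number[][]): number {
  return stages.reduce((total, stage) => total + Math.max(...stage), 0);
}

// Three services at 99.9% each (assumed figures).
const availability = compositeAvailability([0.999, 0.999, 0.999]);

// Stage 1: Order Service (40 ms).
// Stage 2: Customer Service (30 ms) and Catalog Service (55 ms) in parallel.
const latencyMs = compositeLatencyMs([[40], [30, 55]]);
```

With these assumed inputs the composite availability is roughly 99.7% (0.999 × 0.999 × 0.999 ≈ 0.997), and the latency is 95 ms rather than the 125 ms a fully sequential chain would cost.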

When to use API Composition:

• Point lookups and small result sets
• Latency tolerance of hundreds of milliseconds is acceptable
• No need to filter or sort by data from other services
• All required services have high availability

When to avoid:

• Queries returning thousands of records
• Need to filter by foreign service fields
• Latency-critical paths (<100ms requirement)
• Aggregation queries

Pattern 2: Materialized Views

Materialized Views pre-compute query results by combining data from multiple services into a queryable store. Instead of composing data at query time, you build and maintain a denormalized view optimized for specific queries.

How it works:

1. Define the query pattern you need to support
2. Create a store (database, search index) optimized for that pattern
3. Subscribe to events from all relevant services
4. Build and update the view as events arrive
5. Query the materialized view directly, with no cross-service calls

materialized-view-example
TypeScript
// ===================================================
// MATERIALIZED VIEW PATTERN
// ===================================================
// Build a pre-computed view that supports specific
// cross-service query patterns efficiently.
// ===================================================
 
// The materialized view model - optimized for common queries
interface OrderSearchView {
  // Order fields
  orderId: string;
  orderDate: Date;
  orderStatus: string;
  totalAmount: number;
  
  // Customer fields (denormalized from Customer Service)
  customerId: string;
  customerName: string;
  customerEmail: string;
  customerRegion: string;  // For regional queries
  
  // Product fields (denormalized from Catalog Service)
  productIds: string[];
  productNames: string[];
  categories: string[];  // For category filtering
  
  // Derived fields for search
  searchText: string;  // Combined searchable text
  lastUpdated: Date;
}
 
class OrderSearchViewBuilder {
  private viewStore: OrderSearchViewRepository;  // e.g., Elasticsearch
  private orderClient: OrderServiceClient;
  private customerClient: CustomerServiceClient;
  private catalogClient: CatalogServiceClient;
 
  // ==========================================
  // EVENT HANDLERS - Build view incrementally
  // ==========================================
 
  async handleOrderCreated(event: OrderCreatedEvent): Promise<void> {
    // Fetch related data to build the view
    const [customer, products] = await Promise.all([
      this.customerClient.getCustomer(event.data.customerId),
      this.catalogClient.getProducts(
        event.data.lineItems.map(i => i.productId)
      ),
    ]);
 
    const view: OrderSearchView = {
      orderId: event.aggregateId,
      orderDate: event.data.orderDate,
      orderStatus: event.data.status,
      totalAmount: event.data.totalAmount,
      
      customerId: customer.id,
      customerName: customer.name,
      customerEmail: customer.email,
      customerRegion: customer.region,
      
      productIds: products.map(p => p.id),
      productNames: products.map(p => p.name),
      categories: [...new Set(products.map(p => p.category))],
      
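      // The order ID lives on the event envelope (aggregateId), so pass it explicitly as `id`
      searchText: this.buildSearchText({ id: event.aggregateId }, customer, products),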
      lastUpdated: new Date(),
    };
 
    await this.viewStore.index(view);
  }
 
  async handleOrderStatusUpdated(event: OrderStatusUpdatedEvent): Promise<void> {
    // Partial update - only change the status field
    await this.viewStore.updatePartial(event.aggregateId, {
      orderStatus: event.data.newStatus,
      lastUpdated: new Date(),
    });
  }
 
  async handleCustomerUpdated(event: CustomerUpdatedEvent): Promise<void> {
    // Update all orders for this customer
    // This is the trade-off: fan-out updates when source data changes
    const ordersForCustomer = await this.viewStore.findByCustomerId(
      event.aggregateId
    );
 
    for (const order of ordersForCustomer) {
      await this.viewStore.updatePartial(order.orderId, {
        customerName: event.data.name,
        customerEmail: event.data.email,
        customerRegion: event.data.region ?? order.customerRegion,
        lastUpdated: new Date(),
      });
    }
  }
 
  async handleProductUpdated(event: ProductUpdatedEvent): Promise<void> {
    // Update all orders containing this product
    const ordersWithProduct = await this.viewStore.findByProductId(
      event.aggregateId
    );
 
    for (const order of ordersWithProduct) {
      // Refetch all products for this order to rebuild product fields
      const products = await this.catalogClient.getProducts(order.productIds);
      
      await this.viewStore.updatePartial(order.orderId, {
        productNames: products.map(p => p.name),
        categories: [...new Set(products.map(p => p.category))],
        lastUpdated: new Date(),
      });
    }
  }
 
  private buildSearchText(order: any, customer: any, products: any[]): string {
    return [
      order.id,
      customer.name,
      customer.email,
      ...products.map(p => p.name),
      ...products.map(p => p.category),
    ].join(' ').toLowerCase();
  }
}
 
// ==========================================
// QUERY SERVICE - Uses the materialized view
// ==========================================
 
class OrderSearchService {
  private viewStore: OrderSearchViewRepository;
 
  async searchOrders(query: OrderSearchQuery): Promise<OrderSearchResult> {
    // All filtering happens on the materialized view
    // No cross-service calls at query time!
    
    const filters: any = {};
    
    if (query.customerRegion) {
      filters.customerRegion = query.customerRegion;
    }
    
    if (query.categories?.length) {
      filters.categories = { $in: query.categories };
    }
    
    if (query.dateRange) {
      filters.orderDate = {
        $gte: query.dateRange.start,
        $lte: query.dateRange.end,
      };
    }
    
    if (query.searchText) {
      filters.searchText = { $contains: query.searchText.toLowerCase() };
    }
 
    const results = await this.viewStore.search({
      filters,
      sort: query.sortBy ?? 'orderDate',
      sortOrder: query.sortOrder ?? 'desc',
      limit: query.limit ?? 50,
      offset: query.offset ?? 0,
    });
 
    return {
      orders: results.hits,
      total: results.totalCount,
      hasMore: results.totalCount > (query.offset ?? 0) + results.hits.length,
    };
  }
 
  // Complex queries are now fast!
  async getRevenueByRegion(): Promise<Map<string, number>> {
    return await this.viewStore.aggregate({
      groupBy: 'customerRegion',
      sum: 'totalAmount',
    });
  }
}

Trade-offs of Materialized Views:

Aspect	Benefit	Cost
Query latency	Fast—single datastore	N/A
Query flexibility	Optimized for defined patterns	Can't support arbitrary queries
Data freshness	Eventual consistency	Staleness during sync
Write fan-out	N/A	One source change updates many views
Storage	N/A	Stores data redundantly
Operational	N/A	Additional system to maintain
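
The staleness cost in the table has a practical consequence: many message brokers deliver events at least once and not always in order, so a view builder can receive duplicates or older events after newer ones. One common safeguard is a version check before each update. The following is a minimal in-memory sketch, assuming every event carries a per-customer `version` that increases with each change; that field, the `CustomerRenamed` event, and the `CustomerProjection` class are illustrative assumptions, not part of the listing above.

```typescript
// Hypothetical event shape: `version` increases with every change to the same customer.
interface CustomerRenamed {
  customerId: string;
  version: number;
  name: string;
}

interface CustomerViewRow {
  customerId: string;
  name: string;
  appliedVersion: number;
}

class CustomerProjection {
  private rows = new Map<string, CustomerViewRow>();

  // Returns true if the event changed the view, false if it was ignored.
  apply(event: CustomerRenamed): boolean {
    const current = this.rows.get(event.customerId);
    // Skip duplicates and events older than what the view already reflects.
    if (current && event.version <= current.appliedVersion) {
      return false;
    }
    this.rows.set(event.customerId, {
      customerId: event.customerId,
      name: event.name,
      appliedVersion: event.version,
    });
    return true;
  }

  get(customerId: string): CustomerViewRow | undefined {
    return this.rows.get(customerId);
  }
}

// The newer event (version 2) arrives first; the late version 1 must not overwrite it.
const projection = new CustomerProjection();
projection.apply({ customerId: 'c1', version: 2, name: 'Ada L.' });
const staleApplied = projection.apply({ customerId: 'c1', version: 1, name: 'Ada' });
```

In a real view store the same comparison would typically be expressed as a conditional update, so that the check and the write happen atomically.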

Use Elasticsearch or Similar

Elasticsearch (or OpenSearch, Solr) is commonly used for materialized views because it supports full-text search, filtering, sorting, and aggregations—exactly what cross-service queries need. Consider it for any significant materialized view use case.

Pattern 3: CQRS (Command Query Responsibility Segregation)

CQRS takes materialized views to the architectural level. Instead of a single model for reads and writes, you maintain separate read and write models, each optimized for its purpose.

The core insight: Commands (writes) have different requirements than Queries (reads). Commands need strong consistency, validation, and business rules. Queries need speed, flexibility, and often span domains. Separating them allows optimization of each.

CQRS Architecture:

                    ┌─────────────────────────┐
                    │       API Gateway       │
                    └────────────┬────────────┘
                                 │
            ┌────────────────────┴────────────────────┐
            │                                         │
    ┌───────▼───────┐                         ┌───────▼───────┐
    │   Commands    │                         │    Queries    │
    └───────┬───────┘                         └───────┬───────┘
            │                                         │
  ┌─────────▼─────────┐                     ┌─────────▼─────────┐
  │    Write Model    │                     │    Read Model     │
  │ (Domain Services) │────── Events ──────▶│ (Query Services)  │
  │                   │                     │                   │
  │ ┌───────────────┐ │                     │ ┌───────────────┐ │
  │ │ Customer DB   │ │                     │ │ Order Search  │ │
  │ │ Order DB      │ │                     │ │ (Elastic)     │ │
  │ │ Catalog DB    │ │                     │ │               │ │
  │ └───────────────┘ │                     │ │ Analytics     │ │
  │                   │                     │ │ (ClickHouse)  │ │
  └───────────────────┘                     │ └───────────────┘ │
                                            └───────────────────┘

How CQRS solves cross-service queries:

• Write side remains service-oriented: Customer Service handles customer commands, Order Service handles order commands
• Read side is query-oriented: build views that span multiple domains, optimized for specific use cases
• Events connect them: write-side services publish events; read-side services consume and build views
• Multiple read models: you can have different read models for different query patterns (search, analytics, reporting)
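
The division of labour described above can be sketched in a few dozen lines. This is a deliberately simplified, in-memory illustration: `EventBus` stands in for a real broker, and the `OrderPlaced` event, `OrderCommandService`, and `RevenueByRegionReadModel` are hypothetical names chosen for the example. Because the bus here delivers synchronously, the sketch hides the propagation delay that makes real CQRS read models eventually consistent.

```typescript
// Hypothetical event published by the write side.
interface OrderPlaced {
  type: 'OrderPlaced';
  orderId: string;
  customerRegion: string;
  totalAmount: number;
}

type Handler = (event: OrderPlaced) => void;

// In-memory stand-in for a message broker (assumption for the sketch).
class EventBus {
  private handlers: Handler[] = [];
  subscribe(handler: Handler): void { this.handlers.push(handler); }
  publish(event: OrderPlaced): void { this.handlers.forEach(h => h(event)); }
}

// Write side: validates the command, stores the order, publishes an event.
class OrderCommandService {
  private orders = new Map<string, OrderPlaced>();
  constructor(private bus: EventBus) {}

  placeOrder(orderId: string, customerRegion: string, totalAmount: number): void {
    if (totalAmount <= 0) throw new Error('totalAmount must be positive');
    if (this.orders.has(orderId)) throw new Error(`duplicate order ${orderId}`);
    const event: OrderPlaced = { type: 'OrderPlaced', orderId, customerRegion, totalAmount };
    this.orders.set(orderId, event);
    this.bus.publish(event);
  }
}

// Read side: a query-optimised model that never touches the write store.
class RevenueByRegionReadModel {
  private revenue = new Map<string, number>();
  constructor(bus: EventBus) {
    bus.subscribe(e => {
      const soFar = this.revenue.get(e.customerRegion) ?? 0;
      this.revenue.set(e.customerRegion, soFar + e.totalAmount);
    });
  }
  revenueFor(region: string): number { return this.revenue.get(region) ?? 0; }
}

const bus = new EventBus();
const readModel = new RevenueByRegionReadModel(bus);
const commands = new OrderCommandService(bus);
commands.placeOrder('o1', 'CA', 120);
commands.placeOrder('o2', 'CA', 80);
commands.placeOrder('o3', 'NY', 50);
```

Answering "revenue for CA" is now a single map lookup on the read side, while validation rules stay on the write side. Adding a second read model means adding another subscriber, without changing the command service.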

CQRS Read Models for Different Use Cases
Read Model	Technology	Use Case	Data Sources
Order Search	Elasticsearch	Full-text search, filtering	Orders, Customers, Catalog
Customer 360	PostgreSQL	Customer profile with history	Customers, Orders, Support
Analytics Cube	ClickHouse	Business intelligence	All domain events
Real-time Dashboard	Redis	Live metrics	Aggregated events
Recommendation	Neo4j	Graph-based recommendations	Purchases, Products, Preferences

CQRS Adds Significant Complexity

CQRS is powerful but complex. You're maintaining multiple models, handling eventual consistency, and building event-synchronization infrastructure. Only adopt CQRS if your query requirements genuinely can't be met with simpler patterns.

When to use CQRS:

• Read and write patterns are significantly different
• High-volume reads that can't afford cross-service calls
• Complex queries spanning many domains
• Need for specialized query stores (search, analytics, graphs)
• Willingness to accept eventual consistency on reads

When to avoid:

• Simple CRUD applications
• Strong consistency requirements on reads
• Small team without experience running event-driven systems
• Low query volume that doesn't justify complexity

Pattern 4: Data Mesh for Analytics Queries

Data Mesh is an architectural paradigm that treats data as a product, owned by domain teams. It's particularly relevant for analytical and reporting queries that span the entire organization.

The Data Mesh principle: Instead of a central data team extracting data from all services into a monolithic warehouse, each domain team publishes curated data products that other teams can consume.

Key concepts:

• Domain Ownership: The Order team owns and publishes the 'Orders' data product
• Data as a Product: Published data has quality guarantees, documentation, SLAs
• Self-Serve Platform: Infrastructure enables teams to easily publish and consume
• Federated Governance: Standards for interoperability without central bottleneck

data-product-example
TypeScript
// ===================================================
// DATA MESH: Domain Data Products
// ===================================================
// Each domain publishes curated, documented data products
// for consumption by other teams and analytics systems.
// ===================================================
 
// Orders Domain publishes this data product
interface OrdersDataProduct {
  // Metadata (discoverable in data catalog)
  productName: 'orders-v2';
  owner: 'order-team';
  description: 'Complete order data for analytics and reporting';
  sla: {
    freshness: '< 5 minutes';
    availability: '99.9%';
  };
  schema: typeof OrderDataSchema;
  
  // Available interfaces
  interfaces: {
    // Streaming for real-time consumers
    stream: {
      type: 'kafka';
      topic: 'orders.events.v2';
      format: 'avro';
    };
    
    // Batch for analytics warehouses
    batch: {
      type: 's3';
      path: 's3://data-lake/orders/daily/';
      format: 'parquet';
      partitionBy: ['date', 'region'];
    };
    
    // Query for ad-hoc analysis
    query: {
      type: 'presto';
      catalog: 'orders';
      table: 'orders_v2';
    };
  };
}
 
const OrderDataSchema = {
  orderId: 'string',
  orderDate: 'timestamp',
  status: 'string',
  customerId: 'string',
  totalAmount: 'decimal(10,2)',
  currency: 'string',
  itemCount: 'int',
  primaryProductId: 'string',  // joined against the Products data product in reports
  region: 'string',
  channel: 'string',  // web, mobile, api
  // Note: No PII like customer name/email
  // Consumers JOIN with Customers data product if needed
};
 
// ==========================================
// CONSUMING DATA PRODUCTS
// ==========================================
 
// Analytics team builds cross-domain reports
class RevenueReportBuilder {
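  private ordersProduct: DataProductClient<OrdersDataProduct>;
  private customersProduct: DataProductClient<CustomersDataProduct>;
  private productsProduct: DataProductClient<ProductsDataProduct>;
  // Client for the federated query engine used below; QueryEngineClient is a placeholder type
  private queryEngine: QueryEngineClient;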
 
  async buildRevenueBySegmentReport(): Promise<Report> {
    // Query each data product
    // The data mesh platform handles the distributed query
    const query = `
      SELECT 
        c.segment,
        c.region,
        p.category,
        SUM(o.totalAmount) as revenue,
        COUNT(DISTINCT o.orderId) as orderCount,
        COUNT(DISTINCT o.customerId) as customerCount
      FROM orders.orders_v2 o
      JOIN customers.customers_v2 c ON o.customerId = c.customerId
      JOIN products.products_v2 p ON o.primaryProductId = p.productId
      WHERE o.orderDate >= CURRENT_DATE - INTERVAL '30' DAY
        AND o.status = 'completed'
      GROUP BY c.segment, c.region, p.category
      ORDER BY revenue DESC
    `;
    
    // The query engine (e.g., Presto/Trino) federates across products
    const results = await this.queryEngine.execute(query);
    
    return this.formatReport(results);
  }
}

When Data Mesh applies:

• Large organizations with many domains
• Significant analytical and BI requirements
• Data quality and governance are critical
• Teams have capacity to own data products

Trade-offs:

• Requires organizational buy-in and cultural change
• Each team needs data engineering capability
• Platform investment for self-serve infrastructure
• Governance without centralization is challenging

Data Mesh for Operational vs. Analytical

Data Mesh is primarily about analytical data—reports, BI, ML training. For operational queries (user-facing features), the other patterns (API composition, materialized views, CQRS) are usually more appropriate.

Choosing the Right Query Pattern

With multiple patterns available, how do you choose? The decision depends on query characteristics, consistency requirements, and system constraints.

Pattern Selection Matrix
Factor	API Composition	Materialized View	CQRS	Data Mesh
Query complexity	Low	Medium-High	High	Very High
Latency requirement	Tolerant (100ms+)	Strict (<50ms)	Strict	Tolerant
Consistency	Fresh per service, no cross-service snapshot	Eventual	Eventual	Eventual
Query volume	Low-Medium	High	Very High	Varies
Implementation cost	Low	Medium	High	Very High
Operational cost	Low	Medium	High	High
Use case	Enrichment	Search/Filter	Complex apps	Analytics

Decision flowchart:

Is this an analytical/BI query?
├── Yes → Consider Data Mesh or dedicated data warehouse
└── No → Continue...
        │
        ▼
Can you tolerate eventual consistency?
├── No → Must use API Composition (accept latency)
└── Yes → Continue...
        │
        ▼
Need to filter/sort by data from multiple services?
├── No → API Composition is sufficient
└── Yes → Continue...
        │
        ▼
How many different query patterns?
├── Few (1-3) → Materialized Views
└── Many → Consider CQRS
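
The flowchart above can also be written as a small function, which makes the order of the questions explicit. This is a sketch of the decision logic only; the `QueryProfile` fields and the `choosePattern` name are invented for the example, and "few" is read as three or fewer query patterns, following the "1-3" in the chart.

```typescript
type QueryPattern =
  | 'data-mesh-or-warehouse'
  | 'api-composition'
  | 'materialized-view'
  | 'cqrs';

// Hypothetical description of a query's requirements.
interface QueryProfile {
  analytical: boolean;                    // analytical/BI query?
  toleratesEventualConsistency: boolean;  // can reads be slightly stale?
  filtersOrSortsAcrossServices: boolean;  // filter/sort on another service's data?
  distinctQueryPatterns: number;          // how many different query shapes are needed
}

function choosePattern(q: QueryProfile): QueryPattern {
  if (q.analytical) return 'data-mesh-or-warehouse';
  // Without tolerance for staleness, data must be fetched from the owners at query time.
  if (!q.toleratesEventualConsistency) return 'api-composition';
  if (!q.filtersOrSortsAcrossServices) return 'api-composition';
  return q.distinctQueryPatterns <= 3 ? 'materialized-view' : 'cqrs';
}

// Example: an order search page that filters by product category.
const orderSearch = choosePattern({
  analytical: false,
  toleratesEventualConsistency: true,
  filtersOrSortsAcrossServices: true,
  distinctQueryPatterns: 2,
});
```

For the order search example the function returns `'materialized-view'`; raising `distinctQueryPatterns` above three shifts the answer to `'cqrs'`.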

Combining patterns:

Real systems often use multiple patterns:

• API Composition for simple enrichment (order detail page)
• Materialized View for search (order search with filters)
• CQRS for complex application (admin dashboard with multiple views)
• Data Mesh for analytics (monthly business reports)

Each pattern addresses different needs. The goal is matching the right pattern to each use case, not picking one pattern for everything.

Start Simple, Evolve as Needed

Begin with API Composition—it's the simplest. Add materialized views when specific queries become bottlenecks. Adopt CQRS only when you have clear need for complex read models. This incremental approach avoids premature optimization.

Implementation Considerations

Regardless of which pattern you choose, several implementation concerns apply across the board.

Cross-Cutting Implementation Concerns

• Authorization: Cross-service queries complicate authorization. If User A can see their orders but not customer demographics, how do you filter enriched results? Either push authorization into each service or apply it in the aggregation layer.
• Pagination: Composing data from multiple services complicates pagination. Paginating orders first and enriching them afterwards works only while every enrichment call succeeds; if Customer Service is down, a page of 100 orders cannot be fully enriched. Design pagination that tolerates partial data.
• Error Handling: What happens when one service in a composition fails? Options: partial response, cached fallback, or an error to the user. Define a degradation strategy per service and per use case.
• Caching: For API Composition, caching enriched responses can reduce load and latency. But cache invalidation is challenging when source data spans services. Consider short TTLs or event-based invalidation.
• Monitoring: Track query latency broken down by component. If Order Search is slow, is it the view store or event processing lag? Visibility into each piece is essential.
• Testing: Integration testing cross-service queries is difficult. Use contract tests to verify service interfaces; use synthetic tests against materialized views to verify query behavior.
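
As an illustration of the partial-response option under Error Handling, the sketch below degrades gracefully when the customer lookup fails instead of failing the whole composition. The `withFallback` helper, the `OrderView` shape, and the `fetchCustomer` parameter are assumptions made for the example; a real implementation would likely add timeouts, logging, and a per-service policy.

```typescript
interface Customer { id: string; name: string; }
interface OrderSummary { orderId: string; customerId: string; }

interface OrderView {
  orderId: string;
  customerName: string | null; // null when the customer lookup failed
  degraded: boolean;           // tells the caller that some data is missing
}

// Turns a rejected promise into a fallback value instead of propagating the error.
async function withFallback<T>(
  call: Promise<T>,
  fallback: T,
): Promise<{ value: T; failed: boolean }> {
  try {
    return { value: await call, failed: false };
  } catch {
    return { value: fallback, failed: true };
  }
}

// `fetchCustomer` stands in for a call to the Customer Service.
async function getOrderView(
  order: OrderSummary,
  fetchCustomer: (id: string) => Promise<Customer>,
): Promise<OrderView> {
  const customer = await withFallback<Customer | null>(
    fetchCustomer(order.customerId),
    null,
  );
  return {
    orderId: order.orderId,
    customerName: customer.value ? customer.value.name : null,
    degraded: customer.failed,
  };
}
```

Returning an explicit `degraded` flag lets the UI decide whether to show a placeholder or retry. Whether a partial answer is acceptable is a per-use-case decision: an order detail page can usually tolerate a missing name, while a billing export probably cannot.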

Beware N+1 Queries

API Composition can easily become N+1 if you're not careful. Fetching an order list, then one customer per order, then one product per line item turns a single query into dozens or hundreds of calls. Always batch requests: collect all customer IDs, then fetch all customers in one call.
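
A minimal sketch of the batching advice, assuming the Customer Service exposes a batch lookup. The `getCustomersByIds` method and the `CustomerClient` interface are hypothetical; the point is that the number of round trips stays at one regardless of how many orders are enriched.

```typescript
interface Customer { id: string; name: string; }
interface Order { id: string; customerId: string; }

// Hypothetical client; `getCustomersByIds` stands for a batch endpoint that
// the Customer Service would need to provide.
interface CustomerClient {
  getCustomersByIds(ids: string[]): Promise<Customer[]>;
}

async function attachCustomerNames(
  orders: Order[],
  client: CustomerClient,
): Promise<Array<{ orderId: string; customerName: string }>> {
  // Deduplicate IDs so each customer is requested only once.
  const uniqueIds = Array.from(new Set(orders.map(o => o.customerId)));

  // One round trip for all customers instead of one per order.
  const customers = await client.getCustomersByIds(uniqueIds);
  const byId = new Map<string, Customer>();
  customers.forEach(c => byId.set(c.id, c));

  return orders.map(o => ({
    orderId: o.id,
    customerName: byId.get(o.customerId)?.name ?? 'Unknown',
  }));
}
```

If the owning service offers no batch endpoint, adding one is usually cheaper than working around its absence with per-item calls on every request.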

Summary: Querying Across Service Boundaries

Cross-service queries are an inevitable challenge in microservices. There's no perfect solution—only trade-offs between latency, consistency, complexity, and coupling. Understanding the patterns and when to apply each is essential for designing queryable distributed systems.

Key Takeaways

• API Composition is the simplest pattern: call services, combine responses. Use for point lookups and simple enrichment. Accept latency from sequential calls.
• Materialized Views pre-compute query results: build denormalized stores updated via events. Use for high-volume search and filtering across domains.
• CQRS separates read and write models: write side stays service-oriented, read side holds query-optimized views. Use for complex applications with diverse query needs.
• Data Mesh treats data as products: domains publish curated data for cross-organizational consumption. Use for large-scale analytics and BI.
• Choose based on query characteristics: complexity, latency, consistency, and volume drive pattern selection. Most systems use multiple patterns.
• Implementation details matter: authorization, pagination, error handling, and caching all require careful design for cross-service queries.

What's next:

With query patterns established, the final piece is ensuring data consistency even when it's distributed. The next page explores data consistency patterns—how to maintain correctness across services when traditional ACID transactions don't apply.

Page Complete

You now understand the major patterns for querying across service boundaries: API Composition, Materialized Views, CQRS, and Data Mesh. Each addresses different trade-offs between simplicity, latency, consistency, and complexity. Start simple with API Composition and evolve to more sophisticated patterns as needs demand. Next, we'll tackle data consistency patterns for distributed systems.

4 / 5

Loading learning content...

System Design (HLD)Data Ownership

Data Ownership in Microservices

LevelAdvanced

Duration90 mins

TopicData Ownership

4 / 5

Query Across Services

The Distributed Join Problem

The answer involves trade-offs between latency, consistency, complexity, and coupling. There's no free lunch—but there are proven patterns that work at scale.

What You Will Learn

Understanding the Cross-Service Query Challenge

Before exploring solutions, let's deeply understand why cross-service queries are hard. The challenge isn't just technical—it's a fundamental tension between service autonomy and query flexibility.

Why Cross-Service Queries Are Difficult

•No Shared Database — Each service has its own database. You literally cannot write a JOIN across them at the database level.
•Network Boundaries — Fetching data from another service requires network calls, adding latency and failure modes.
•Different Data Models — Services model data for their own needs. Catalog's 'product' and Order's 'line item' aren't the same structure.
•Consistency Mismatch — Service A's view might be at time T1, Service B's at T2. Joining them gives inconsistent results.
•Scalability Asymmetry — Query patterns may stress one service more than others. Catalog can't scale independently if every Order query calls it.
•Permission Boundaries — Service A might not be allowed to see all data in Service B. Cross-service queries complicate authorization.

The query taxonomy:

Not all cross-service queries are equal. Understanding the query type guides the solution:

Query Type	Example	Challenge Level
Point lookup	Get order with customer name	Low — single ID lookup
Filtered search	Find orders with product category	Medium — filter on foreign data
Aggregation	Count orders by customer region	High — aggregate across domains
Ad-hoc reporting	Complex business intelligence	Very High — arbitrary combinations

Query Complexity Drives Architecture

Pattern 1: API Composition

API Composition (also called Aggregator Pattern) is the most straightforward approach: a client or gateway calls multiple services and combines their responses.

How it works:

Client needs order + customer + product data
Client calls Order Service → gets order with customerId, productIds
Client calls Customer Service → gets customer by customerId
Client calls Catalog Service → gets products by productIds
Client combines responses into unified view

api-composition-example
TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
// ===================================================
// API COMPOSITION PATTERN
// ===================================================
// An API Gateway or BFF (Backend for Frontend) 
// composes data from multiple services.
// ===================================================
 
interface EnrichedOrder {
  orderId: string;
  orderDate: Date;
  status: string;
  customer: {
    id: string;
    name: string;
    email: string;
  };
  items: Array<{
    productId: string;
    productName: string;
    category: string;
    quantity: number;
    unitPrice: number;
  }>;
  totalAmount: number;
}
 
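class OrderCompositionService {
  // Clients are injected so the fields are always initialized
  constructor(
    private readonly orderClient: OrderServiceClient,
    private readonly customerClient: CustomerServiceClient,
    private readonly catalogClient: CatalogServiceClient,
  ) {}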
 
  async getEnrichedOrder(orderId: string): Promise<EnrichedOrder> {
    // Step 1: Get the core order data
    const order = await this.orderClient.getOrder(orderId);
    
    // Step 2: Fetch related data in parallel
    // These calls are independent and can execute concurrently
    const [customer, products] = await Promise.all([
      this.customerClient.getCustomer(order.customerId),
      this.catalogClient.getProducts(
        order.lineItems.map(item => item.productId)
      ),
    ]);
    
    // Step 3: Compose the enriched response
    const productMap = new Map(products.map(p => [p.id, p]));
    
    return {
      orderId: order.id,
      orderDate: order.createdAt,
      status: order.status,
      customer: {
        id: customer.id,
        name: customer.name,
        email: customer.email,
      },
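      items: order.lineItems.map(item => {
        // The catalog may not return every product (e.g., a deleted item),
        // so fall back to a placeholder instead of throwing.
        const product = productMap.get(item.productId);
        return {
          productId: item.productId,
          productName: product?.name ?? 'Unknown',
          category: product?.category ?? 'Unknown',
          quantity: item.quantity,
          unitPrice: item.unitPrice,
        };
      }),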
      totalAmount: order.totalAmount,
    };
  }
 
  async getOrdersForCustomer(customerId: string): Promise<EnrichedOrder[]> {
    // Orders and customer info are independent, so fetch them in parallel
    const [orders, customer] = await Promise.all([
      this.orderClient.getOrdersByCustomer(customerId),
      this.customerClient.getCustomer(customerId),
    ]);
    
    // Get all unique products across all orders
    const allProductIds = new Set<string>();
    orders.forEach(order => {
      order.lineItems.forEach(item => allProductIds.add(item.productId));
    });
    
    const products = await this.catalogClient.getProducts([...allProductIds]);
    const productMap = new Map(products.map(p => [p.id, p]));
    
    // Compose all orders
    return orders.map(order => ({
      orderId: order.id,
      orderDate: order.createdAt,
      status: order.status,
      customer: {
        id: customer.id,
        name: customer.name,
        email: customer.email,
      },
      items: order.lineItems.map(item => ({
        productId: item.productId,
        productName: productMap.get(item.productId)?.name ?? 'Unknown',
        category: productMap.get(item.productId)?.category ?? 'Unknown',
        quantity: item.quantity,
        unitPrice: item.unitPrice,
      })),
      totalAmount: order.totalAmount,
    }));
  }
}

API Composition Strengths

•✓ Simple to understand and implement
•✓ No additional infrastructure required
•✓ Data is always current (fetched real-time)
•✓ Services remain loosely coupled
•✓ Works for any query that reduces to lookups by ID

API Composition Weaknesses

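•✗ Latency is the sum of all sequential calls (parallel calls cost as much as the slowest one)
•✗ Availability is the product of the services' availabilities (three services at 99.9% each give about 99.7%)
•✗ No efficient cross-service filtering or sorting (it has to happen in memory after fetching)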
•✗ Expensive for large result sets
•✗ Consistency issues (data from different times)
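
The latency and availability costs listed above can be made concrete with a little arithmetic. The sketch below models a composed request as sequential stages of parallel calls. The `ServiceCall` and `Stage` types and the latency and availability figures are illustrative assumptions, not measurements from any real system.

```typescript
// Hypothetical model: stages run one after another,
// and the calls inside a stage run in parallel.
interface ServiceCall {
  name: string;
  latencyMs: number;     // assumed typical latency of the call
  availability: number;  // fraction of requests that succeed, e.g. 0.999
}

type Stage = ServiceCall[];

// Latency: each stage waits for its slowest call, and stages add up.
function composedLatencyMs(stages: Stage[]): number {
  return stages.reduce(
    (total, stage) => total + Math.max(...stage.map(c => c.latencyMs)),
    0,
  );
}

// Availability: every call must succeed, so the fractions multiply
// (assuming failures are independent of each other).
function composedAvailability(stages: Stage[]): number {
  let product = 1;
  for (const stage of stages) {
    for (const call of stage) {
      product *= call.availability;
    }
  }
  return product;
}

// Mirrors getEnrichedOrder: fetch the order, then customer and catalog together.
const enrichedOrderPlan: Stage[] = [
  [{ name: 'order', latencyMs: 40, availability: 0.999 }],
  [
    { name: 'customer', latencyMs: 30, availability: 0.999 },
    { name: 'catalog', latencyMs: 60, availability: 0.999 },
  ],
];

console.log(composedLatencyMs(enrichedOrderPlan));               // 100
console.log(composedAvailability(enrichedOrderPlan).toFixed(4)); // "0.9970"
```

Under these assumed numbers the parallel second stage costs 60 ms rather than 90 ms, but every additional service still lowers the overall success rate.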

When to use API Composition:

Point lookups and small result sets
Latency tolerance of hundreds of milliseconds is acceptable
No need to filter or sort by data from other services
All required services have high availability

When to avoid:

Queries returning thousands of records
Need to filter by foreign service fields
Latency-critical paths (<100ms requirement)
Aggregation queries
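
To see why filtering by a foreign field is a poor fit, consider what a composition layer has to do to find orders in a given product category. The sketch below uses in-memory arrays as stand-ins for the Order and Catalog services. All names (`ordersDb`, `catalogDb`, `findOrdersByCategory`) are hypothetical, and the counter only illustrates how much data crosses the service boundary.

```typescript
// Hypothetical in-memory stand-ins for the Order and Catalog services.
interface Order { id: string; productIds: string[] }
interface Product { id: string; category: string }

const ordersDb: Order[] = [
  { id: 'o1', productIds: ['p1'] },
  { id: 'o2', productIds: ['p2'] },
  { id: 'o3', productIds: ['p1', 'p3'] },
];
const catalogDb: Product[] = [
  { id: 'p1', category: 'books' },
  { id: 'p2', category: 'toys' },
  { id: 'p3', category: 'games' },
];

let ordersTransferred = 0; // counts orders pulled across the service boundary

function listAllOrders(): Order[] {
  ordersTransferred += ordersDb.length;
  return ordersDb;
}

function getProducts(ids: string[]): Product[] {
  return catalogDb.filter(p => ids.includes(p.id));
}

// The Order Service cannot filter by category, so the composer fetches
// every order, resolves the products, and filters in memory.
function findOrdersByCategory(category: string): Order[] {
  const orders = listAllOrders();

  const productIds = new Set<string>();
  for (const order of orders) {
    for (const id of order.productIds) productIds.add(id);
  }

  const matchingProductIds = new Set(
    getProducts(Array.from(productIds))
      .filter(p => p.category === category)
      .map(p => p.id),
  );

  return orders.filter(o => o.productIds.some(id => matchingProductIds.has(id)));
}

const bookOrders = findOrdersByCategory('books');
console.log(bookOrders.map(o => o.id)); // [ 'o1', 'o3' ]
console.log(ordersTransferred);         // 3
```

All three orders were transferred to return two. With millions of orders the same approach transfers millions of rows, which is the case the next pattern addresses.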

Pattern 2: Materialized Views

A materialized view is a pre-computed, denormalized store, kept up to date from service events, that answers a specific cross-service query from a single place.

How it works:

Define the query pattern you need to support
Create a store (database, search index) optimized for that pattern
Subscribe to events from all relevant services
Build and update the view as events arrive
Query the materialized view directly—no cross-service calls

materialized-view-example
TypeScript
// ===================================================
// MATERIALIZED VIEW PATTERN
// ===================================================
// Build a pre-computed view that supports specific
// cross-service query patterns efficiently.
// ===================================================
 
// The materialized view model - optimized for common queries
interface OrderSearchView {
  // Order fields
  orderId: string;
  orderDate: Date;
  orderStatus: string;
  totalAmount: number;
  
  // Customer fields (denormalized from Customer Service)
  customerId: string;
  customerName: string;
  customerEmail: string;
  customerRegion: string;  // For regional queries
  
  // Product fields (denormalized from Catalog Service)
  productIds: string[];
  productNames: string[];
  categories: string[];  // For category filtering
  
  // Derived fields for search
  searchText: string;  // Combined searchable text
  lastUpdated: Date;
}
 
class OrderSearchViewBuilder {
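  // Dependencies are injected so the fields are always initialized
  constructor(
    private readonly viewStore: OrderSearchViewRepository,  // e.g., Elasticsearch
    private readonly orderClient: OrderServiceClient,
    private readonly customerClient: CustomerServiceClient,
    private readonly catalogClient: CatalogServiceClient,
  ) {}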
 
  // ==========================================
  // EVENT HANDLERS - Build view incrementally
  // ==========================================
 
  async handleOrderCreated(event: OrderCreatedEvent): Promise<void> {
    // Fetch related data to build the view
    const [customer, products] = await Promise.all([
      this.customerClient.getCustomer(event.data.customerId),
      this.catalogClient.getProducts(
        event.data.lineItems.map(i => i.productId)
      ),
    ]);
 
    const view: OrderSearchView = {
      orderId: event.aggregateId,
      orderDate: event.data.orderDate,
      orderStatus: event.data.status,
      totalAmount: event.data.totalAmount,
      
      customerId: customer.id,
      customerName: customer.name,
      customerEmail: customer.email,
      customerRegion: customer.region,
      
      productIds: products.map(p => p.id),
      productNames: products.map(p => p.name),
      categories: [...new Set(products.map(p => p.category))],
      
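      // buildSearchText reads order.id, which lives on the event envelope
      searchText: this.buildSearchText(
        { ...event.data, id: event.aggregateId },
        customer,
        products,
      ),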
      lastUpdated: new Date(),
    };
 
    await this.viewStore.index(view);
  }
 
  async handleOrderStatusUpdated(event: OrderStatusUpdatedEvent): Promise<void> {
    // Partial update - only change the status field
    await this.viewStore.updatePartial(event.aggregateId, {
      orderStatus: event.data.newStatus,
      lastUpdated: new Date(),
    });
  }
 
  async handleCustomerUpdated(event: CustomerUpdatedEvent): Promise<void> {
    // Update all orders for this customer
    // This is the trade-off: fan-out updates when source data changes
    const ordersForCustomer = await this.viewStore.findByCustomerId(
      event.aggregateId
    );
 
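    for (const order of ordersForCustomer) {
      // Fall back to the stored values so a partial event cannot blank a field.
      // Note: searchText also embeds the customer name and email and would
      // need rebuilding here to stay searchable by the new values.
      await this.viewStore.updatePartial(order.orderId, {
        customerName: event.data.name ?? order.customerName,
        customerEmail: event.data.email ?? order.customerEmail,
        customerRegion: event.data.region ?? order.customerRegion,
        lastUpdated: new Date(),
      });
    }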
  }
 
  async handleProductUpdated(event: ProductUpdatedEvent): Promise<void> {
    // Update all orders containing this product
    const ordersWithProduct = await this.viewStore.findByProductId(
      event.aggregateId
    );
 
    for (const order of ordersWithProduct) {
      // Refetch all products for this order to rebuild product fields
      const products = await this.catalogClient.getProducts(order.productIds);
      
      await this.viewStore.updatePartial(order.orderId, {
        productNames: products.map(p => p.name),
        categories: [...new Set(products.map(p => p.category))],
        lastUpdated: new Date(),
      });
    }
  }
 
  private buildSearchText(order: any, customer: any, products: any[]): string {
    return [
      order.id,
      customer.name,
      customer.email,
      ...products.map(p => p.name),
      ...products.map(p => p.category),
    ].join(' ').toLowerCase();
  }
}
 
// ==========================================
// QUERY SERVICE - Uses the materialized view
// ==========================================
 
class OrderSearchService {
  constructor(private readonly viewStore: OrderSearchViewRepository) {}
 
  async searchOrders(query: OrderSearchQuery): Promise<OrderSearchResult> {
    // All filtering happens on the materialized view
    // No cross-service calls at query time!
    
    const filters: any = {};
    
    if (query.customerRegion) {
      filters.customerRegion = query.customerRegion;
    }
    
    if (query.categories?.length) {
      filters.categories = { $in: query.categories };
    }
    
    if (query.dateRange) {
      filters.orderDate = {
        $gte: query.dateRange.start,
        $lte: query.dateRange.end,
      };
    }
    
    if (query.searchText) {
      filters.searchText = { $contains: query.searchText.toLowerCase() };
    }
 
    const results = await this.viewStore.search({
      filters,
      sort: query.sortBy ?? 'orderDate',
      sortOrder: query.sortOrder ?? 'desc',
      limit: query.limit ?? 50,
      offset: query.offset ?? 0,
    });
 
    return {
      orders: results.hits,
      total: results.totalCount,
      hasMore: results.totalCount > (query.offset ?? 0) + results.hits.length,
    };
  }
 
  // Complex queries are now fast!
  async getRevenueByRegion(): Promise<Map<string, number>> {
    return await this.viewStore.aggregate({
      groupBy: 'customerRegion',
      sum: 'totalAmount',
    });
  }
}

Trade-offs of Materialized Views:

Aspect	Benefit	Cost
Query latency	Fast—single datastore	N/A
Query flexibility	Optimized for defined patterns	Can't support arbitrary queries
Data freshness	Eventual consistency	Staleness during sync
Write fan-out	N/A	One source change updates many views
Storage	N/A	Stores data redundantly
Operational	N/A	Additional system to maintain
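
The "Data freshness" row hides one practical detail: many message brokers deliver events at least once and not always in order, so a view builder usually has to tolerate duplicates and late arrivals. A minimal sketch follows, assuming each event carries a per-order `version` number. That field, `VersionedEvent`, and `InMemoryOrderView` are illustrative names and not part of the example above.

```typescript
interface VersionedEvent {
  orderId: string;
  version: number; // assumed: increases by one with every change to the order
  status: string;
}

interface OrderViewRow {
  orderId: string;
  status: string;
  version: number;
}

class InMemoryOrderView {
  private rows = new Map<string, OrderViewRow>();

  // Applies the event only if it is newer than what the view already holds.
  // Returns true when the view changed.
  apply(event: VersionedEvent): boolean {
    const current = this.rows.get(event.orderId);
    if (current && current.version >= event.version) {
      return false; // duplicate or out-of-order event: keep the newer row
    }
    this.rows.set(event.orderId, {
      orderId: event.orderId,
      status: event.status,
      version: event.version,
    });
    return true;
  }

  get(orderId: string): OrderViewRow | undefined {
    return this.rows.get(orderId);
  }
}

const orderView = new InMemoryOrderView();
orderView.apply({ orderId: 'o1', version: 1, status: 'created' });
orderView.apply({ orderId: 'o1', version: 3, status: 'shipped' });

// Version 2 arrives late and must not overwrite version 3.
const lateApplied = orderView.apply({ orderId: 'o1', version: 2, status: 'paid' });

console.log(lateApplied);                  // false
console.log(orderView.get('o1')?.status);  // "shipped"
```

A real view store would need the same comparison done atomically, for example through a conditional update, which this in-memory sketch does not attempt.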

Use Elasticsearch or Similar

Pattern 3: CQRS (Command Query Responsibility Segregation)

CQRS takes materialized views to the architectural level. Instead of a single model for reads and writes, you maintain separate read and write models, each optimized for its purpose.

CQRS Architecture:

                     ┌─────────────────────────┐
                     │      API Gateway         │
                     └────────────┬─────────────┘
                                  │
            ┌─────────────────────┴─────────────────────┐
            │                                           │
    ┌───────▼───────┐                          ┌───────▼───────┐
    │   Commands     │                          │    Queries    │
    └───────┬────────┘                          └───────┬───────┘
            │                                           │
    ┌───────▼────────┐                         ┌───────▼───────┐
    │   Write Model   │──── Events ────────────▶│  Read Model   │
    │ (Domain Services)│                        │ (Query Services)│
    │                  │                        │                │
    │  ┌─────────────┐ │                        │ ┌─────────────┐ │
    │  │ Customer DB │ │                        │ │ Order Search│ │
    │  │ Order DB    │ │                        │ │ (Elastic)   │ │
    │  │ Catalog DB  │ │                        │ │             │ │
    │  └─────────────┘ │                        │ │ Analytics   │ │
    │                  │                        │ │ (ClickHouse)│ │
    └──────────────────┘                        │ └─────────────┘ │
                                                └─────────────────┘

How CQRS solves cross-service queries:

Write side remains service-oriented: Customer Service handles customer commands, Order Service handles order commands
Read side is query-oriented: build views that span multiple domains, optimized for specific use cases
Events connect them: write-side services publish events; read-side services consume and build views
Multiple read models: you can have different read models for different query patterns (search, analytics, reporting)
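
The four points above can be sketched in a few dozen lines. This is a deliberately small, in-process illustration: `EventBus` stands in for a real broker, the two services keep their data in maps, and every class and event name here is hypothetical.

```typescript
type DomainEvent =
  | { type: 'CustomerRegistered'; customerId: string; region: string }
  | { type: 'OrderPlaced'; orderId: string; customerId: string; totalAmount: number };

type Handler = (event: DomainEvent) => void;

// Hypothetical in-process bus standing in for a message broker.
class EventBus {
  private handlers: Handler[] = [];
  subscribe(handler: Handler): void {
    this.handlers.push(handler);
  }
  publish(event: DomainEvent): void {
    this.handlers.forEach(handler => handler(event));
  }
}

// Write side: each service owns its data and publishes what changed.
class CustomerService {
  private regions = new Map<string, string>();
  constructor(private readonly bus: EventBus) {}
  register(customerId: string, region: string): void {
    this.regions.set(customerId, region);
    this.bus.publish({ type: 'CustomerRegistered', customerId, region });
  }
}

class OrderService {
  private totals = new Map<string, number>();
  constructor(private readonly bus: EventBus) {}
  placeOrder(orderId: string, customerId: string, totalAmount: number): void {
    this.totals.set(orderId, totalAmount);
    this.bus.publish({ type: 'OrderPlaced', orderId, customerId, totalAmount });
  }
}

// Read side: one model spanning both domains, built only from events.
class RevenueByRegionReadModel {
  private regionByCustomer = new Map<string, string>();
  private revenue = new Map<string, number>();

  constructor(bus: EventBus) {
    bus.subscribe(event => this.on(event));
  }

  private on(event: DomainEvent): void {
    switch (event.type) {
      case 'CustomerRegistered':
        this.regionByCustomer.set(event.customerId, event.region);
        break;
      case 'OrderPlaced': {
        const region = this.regionByCustomer.get(event.customerId) ?? 'unknown';
        this.revenue.set(region, (this.revenue.get(region) ?? 0) + event.totalAmount);
        break;
      }
    }
  }

  revenueFor(region: string): number {
    return this.revenue.get(region) ?? 0;
  }
}

const bus = new EventBus();
const customers = new CustomerService(bus);
const orders = new OrderService(bus);
const revenue = new RevenueByRegionReadModel(bus);

customers.register('c1', 'CA');
customers.register('c2', 'NY');
orders.placeOrder('o1', 'c1', 120);
orders.placeOrder('o2', 'c1', 30);
orders.placeOrder('o3', 'c2', 50);

console.log(revenue.revenueFor('CA')); // 150
console.log(revenue.revenueFor('NY')); // 50
```

Neither service knows the read model exists, and the read model never calls a service. A second read model would simply be another subscriber on the same bus.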

CQRS Read Models for Different Use Cases
Read Model	Technology	Use Case	Data Sources
Order Search	Elasticsearch	Full-text search, filtering	Orders, Customers, Catalog
Customer 360	PostgreSQL	Customer profile with history	Customers, Orders, Support
Analytics Cube	ClickHouse	Business intelligence	All domain events
Real-time Dashboard	Redis	Live metrics	Aggregated events
Recommendation	Neo4j	Graph-based recommendations	Purchases, Products, Preferences

CQRS Adds Significant Complexity

When to use CQRS:

Read and write patterns are significantly different
High-volume reads that can't afford cross-service calls
Complex queries spanning many domains
Need for specialized query stores (search, analytics, graphs)
Willingness to accept eventual consistency on reads

When to avoid:

Simple CRUD applications
Strong consistency requirements on reads
Small team without experience operating event-driven systems
Low query volume that doesn't justify complexity

Pattern 4: Data Mesh for Analytics Queries

Data Mesh is an architectural paradigm that treats data as a product, owned by domain teams. It's particularly relevant for analytical and reporting queries that span the entire organization.

Key concepts:

Domain Ownership — The Order team owns and publishes the 'Orders' data product
Data as a Product — Published data has quality guarantees, documentation, SLAs
Self-Serve Platform — Infrastructure enables teams to easily publish and consume
Federated Governance — Standards for interoperability without central bottleneck

data-product-example
TypeScript
// ===================================================
// DATA MESH: Domain Data Products
// ===================================================
// Each domain publishes curated, documented data products
// for consumption by other teams and analytics systems.
// ===================================================
 
// Orders Domain publishes this data product
interface OrdersDataProduct {
  // Metadata (discoverable in data catalog)
  productName: 'orders-v2';
  owner: 'order-team';
  description: 'Complete order data for analytics and reporting';
  sla: {
    freshness: '< 5 minutes';
    availability: '99.9%';
  };
  schema: typeof OrderDataSchema;
  
  // Available interfaces
  interfaces: {
    // Streaming for real-time consumers
    stream: {
      type: 'kafka';
      topic: 'orders.events.v2';
      format: 'avro';
    };
    
    // Batch for analytics warehouses
    batch: {
      type: 's3';
      path: 's3://data-lake/orders/daily/';
      format: 'parquet';
      partitionBy: ['date', 'region'];
    };
    
    // Query for ad-hoc analysis
    query: {
      type: 'presto';
      catalog: 'orders';
      table: 'orders_v2';
    };
  };
}
 
const OrderDataSchema = {
  orderId: 'string',
  orderDate: 'timestamp',
  status: 'string',
  customerId: 'string',
  totalAmount: 'decimal(10,2)',
  currency: 'string',
  itemCount: 'int',
  primaryProductId: 'string',  // join key used by the revenue report query
  region: 'string',
  channel: 'string',  // web, mobile, api
  // Note: No PII like customer name/email
  // Consumers JOIN with Customers data product if needed
};
 
// ==========================================
// CONSUMING DATA PRODUCTS
// ==========================================
 
// Analytics team builds cross-domain reports
class RevenueReportBuilder {
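  constructor(
    private readonly ordersProduct: DataProductClient<OrdersDataProduct>,
    private readonly customersProduct: DataProductClient<CustomersDataProduct>,
    private readonly productsProduct: DataProductClient<ProductsDataProduct>,
    // Federated SQL engine client; QueryEngineClient is an assumed type
    private readonly queryEngine: QueryEngineClient,
  ) {}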
 
  async buildRevenueBySegmentReport(): Promise<Report> {
    // Query each data product
    // The data mesh platform handles the distributed query
    const query = `
      SELECT 
        c.segment,
        c.region,
        p.category,
        SUM(o.totalAmount) as revenue,
        COUNT(DISTINCT o.orderId) as orderCount,
        COUNT(DISTINCT o.customerId) as customerCount
      FROM orders.orders_v2 o
      JOIN customers.customers_v2 c ON o.customerId = c.customerId
      JOIN products.products_v2 p ON o.primaryProductId = p.productId
      WHERE o.orderDate >= CURRENT_DATE - INTERVAL '30' DAY
        AND o.status = 'completed'
      GROUP BY c.segment, c.region, p.category
      ORDER BY revenue DESC
    `;
    
    // The query engine (e.g., Presto/Trino) federates across products
    const results = await this.queryEngine.execute(query);
    
    return this.formatReport(results);
  }
}

When Data Mesh applies:

Large organizations with many domains
Significant analytical and BI requirements
Data quality and governance are critical
Teams have capacity to own data products

Trade-offs:

Requires organizational buy-in and cultural change
Each team needs data engineering capability
Platform investment for self-serve infrastructure
Governance without centralization is challenging

Data Mesh for Operational vs. Analytical

Choosing the Right Query Pattern

With multiple patterns available, how do you choose? The decision depends on query characteristics, consistency requirements, and system constraints.

Pattern Selection Matrix
Factor	API Composition	Materialized View	CQRS	Data Mesh
Query complexity	Low	Medium-High	High	Very High
Latency requirement	Tolerant (100ms+)	Strict (<50ms)	Strict	Tolerant
Consistency	Current per service, no cross-service snapshot	Eventual	Eventual	Eventual
Query volume	Low-Medium	High	Very High	Varies
Implementation cost	Low	Medium	High	Very High
Operational cost	Low	Medium	High	High
Use case	Enrichment	Search/Filter	Complex apps	Analytics

Decision flowchart:

Is this an analytical/BI query?
├── Yes → Consider Data Mesh or dedicated data warehouse
└── No → Continue...
        │
        ▼
Can you tolerate eventual consistency?
├── No → Must use API Composition (accept latency)
└── Yes → Continue...
        │
        ▼
Need to filter/sort by data from multiple services?
├── No → API Composition is sufficient
└── Yes → Continue...
        │
        ▼
How many different query patterns?
├── Few (1-3) → Materialized Views
└── Many → Consider CQRS
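
The flowchart above can also be read as a small function. The sketch below encodes the same four questions in the same order. `QueryProfile` and `choosePattern` are illustrative names, and the threshold of three query patterns comes straight from the "Few (1-3)" branch.

```typescript
type QueryPattern =
  | 'Data Mesh or data warehouse'
  | 'API Composition'
  | 'Materialized Views'
  | 'CQRS';

interface QueryProfile {
  analytical: boolean;                    // analytical/BI query?
  toleratesEventualConsistency: boolean;  // can results lag behind writes?
  filtersOrSortsAcrossServices: boolean;  // filter/sort on another service's data?
  distinctQueryPatterns: number;          // how many different query shapes?
}

function choosePattern(q: QueryProfile): QueryPattern {
  if (q.analytical) return 'Data Mesh or data warehouse';
  if (!q.toleratesEventualConsistency) return 'API Composition';
  if (!q.filtersOrSortsAcrossServices) return 'API Composition';
  return q.distinctQueryPatterns <= 3 ? 'Materialized Views' : 'CQRS';
}

// Order detail page: simple enrichment, no cross-service filter.
const orderDetail = choosePattern({
  analytical: false,
  toleratesEventualConsistency: true,
  filtersOrSortsAcrossServices: false,
  distinctQueryPatterns: 1,
});

// Order search: filters by customer region and product category.
const orderSearch = choosePattern({
  analytical: false,
  toleratesEventualConsistency: true,
  filtersOrSortsAcrossServices: true,
  distinctQueryPatterns: 2,
});

console.log(orderDetail); // "API Composition"
console.log(orderSearch); // "Materialized Views"
```

Treat the output as a starting point for discussion rather than a verdict. The flowchart itself says "consider", and real decisions also weigh team experience and operational cost.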

Combining patterns:

Real systems often use multiple patterns:

API Composition for simple enrichment (order detail page)
Materialized View for search (order search with filters)
CQRS for complex application (admin dashboard with multiple views)
Data Mesh for analytics (monthly business reports)

Each pattern addresses different needs. The goal is matching the right pattern to each use case, not picking one pattern for everything.

Start Simple, Evolve as Needed

Implementation Considerations

Regardless of which pattern you choose, several implementation concerns apply across the board.

Cross-Cutting Implementation Concerns

•Authorization — Cross-service queries complicate authorization. If User A can see their orders but not customer demographics, how do you filter enriched results? Push authorization into each service, or apply it in the aggregation layer.
•Pagination — When composing data from multiple services, pagination becomes complex. Paginating orders first and enriching afterwards only works when the filter and sort keys live in the Order Service. Even then, a page of 100 orders may be only partly enrichable if Customer Service is down, so design pagination that is resilient to partial data.
•Error Handling — What happens when one service in a composition fails? Options: partial response, cached fallback, error to user. Define degradation strategy per-service and per-use case.
•Caching — For API Composition, caching enriched responses can reduce load and latency. But cache invalidation is challenging when source data spans services. Consider short TTLs or event-based invalidation.
•Monitoring — Track query latency broken down by component. If Order Search is slow, is it the view store or event processing lag? Visibility into each piece is essential.
•Testing — Integration testing cross-service queries is difficult. Use contract tests to verify service interfaces; use synthetic tests against materialized views to verify query behavior.
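
As one illustration of the "partial response" option under Error Handling, the sketch below enriches a page of orders with customer names and degrades individual rows when a lookup fails, instead of failing the whole page. The types, the `degraded` flag, and the fake client are assumptions made for the example. A real composition layer would likely add timeouts and a batch lookup as well.

```typescript
interface Customer { id: string; name: string }
interface OrderSummary { orderId: string; customerId: string }

interface OrderRow {
  orderId: string;
  customerName: string | null; // null when the customer lookup failed
  degraded: boolean;           // lets the UI show a "partial data" hint
}

type GetCustomer = (id: string) => Promise<Customer>;

async function enrichWithFallback(
  orders: OrderSummary[],
  getCustomer: GetCustomer,
): Promise<OrderRow[]> {
  // Look up each distinct customer once.
  const ids = Array.from(new Set(orders.map(o => o.customerId)));

  // A failed lookup resolves to undefined instead of rejecting the whole batch.
  const lookups = await Promise.all(
    ids.map(async id => {
      const customer = await getCustomer(id).catch(() => undefined);
      return [id, customer] as const;
    }),
  );

  const byId = new Map<string, Customer>();
  for (const [id, customer] of lookups) {
    if (customer) byId.set(id, customer);
  }

  return orders.map(order => {
    const customer = byId.get(order.customerId);
    return {
      orderId: order.orderId,
      customerName: customer?.name ?? null,
      degraded: customer === undefined,
    };
  });
}

// Fake client: lookups for 'c2' always fail, simulating an outage.
const fakeGetCustomer: GetCustomer = async id => {
  if (id === 'c2') throw new Error('Customer Service timeout');
  return { id, name: `Customer ${id}` };
};

enrichWithFallback(
  [
    { orderId: 'o1', customerId: 'c1' },
    { orderId: 'o2', customerId: 'c2' },
  ],
  fakeGetCustomer,
).then(rows => console.log(rows));
```

Whether a degraded row is acceptable depends on the use case: an order list can usually show a missing name, while a billing export probably should fail outright.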

Beware N+1 Queries
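
In a composition layer, an N+1 query means one call to fetch a list plus one more call per item in it. The sketch below contrasts a per-item lookup with a single batched call, using a hypothetical `CountingCatalogClient` that only counts round trips. It assumes the catalog exposes a batch endpoint, as the `getProducts(ids)` call in the earlier examples suggests.

```typescript
interface Product { id: string; name: string }

// Hypothetical catalog client that counts round trips instead of doing I/O.
class CountingCatalogClient {
  calls = 0;
  private readonly products = new Map<string, Product>([
    ['p1', { id: 'p1', name: 'Keyboard' }],
    ['p2', { id: 'p2', name: 'Mouse' }],
    ['p3', { id: 'p3', name: 'Monitor' }],
  ]);

  async getProduct(id: string): Promise<Product | undefined> {
    this.calls++;
    return this.products.get(id);
  }

  async getProducts(ids: string[]): Promise<Product[]> {
    this.calls++;
    return ids
      .map(id => this.products.get(id))
      .filter((p): p is Product => p !== undefined);
  }
}

const lineItemProductIds = ['p1', 'p2', 'p1', 'p3', 'p2'];

// N+1 style: one round trip per line item.
async function namesOneByOne(
  client: CountingCatalogClient,
  ids: string[],
): Promise<string[]> {
  const names: string[] = [];
  for (const id of ids) {
    const product = await client.getProduct(id);
    names.push(product?.name ?? 'Unknown');
  }
  return names;
}

// Batched style: deduplicate the IDs and make a single round trip.
async function namesBatched(
  client: CountingCatalogClient,
  ids: string[],
): Promise<string[]> {
  const uniqueIds = Array.from(new Set(ids));
  const products = await client.getProducts(uniqueIds);
  const byId = new Map(products.map(p => [p.id, p] as const));
  return ids.map(id => byId.get(id)?.name ?? 'Unknown');
}

async function compare(): Promise<void> {
  const slow = new CountingCatalogClient();
  await namesOneByOne(slow, lineItemProductIds);

  const fast = new CountingCatalogClient();
  await namesBatched(fast, lineItemProductIds);

  console.log(slow.calls); // 5
  console.log(fast.calls); // 1
}

compare();
```

Both functions return the same names. Only the number of network round trips differs, and that difference grows with the size of the result set.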

Summary: Querying Across Service Boundaries

Key Takeaways

•API Composition is the simplest pattern — call services, combine responses. Use for point lookups and simple enrichment. Accept latency from sequential calls.
•Materialized Views pre-compute query results — build denormalized stores updated via events. Use for high-volume search and filtering across domains.
•CQRS separates read and write models — write side stays service-oriented, read side holds query-optimized views. Use for complex applications with diverse query needs.
•Data Mesh treats data as products — domains publish curated data for cross-organizational consumption. Use for large-scale analytics and BI.
•Choose based on query characteristics — complexity, latency, consistency, and volume drive pattern selection. Most systems use multiple patterns.
•Implementation details matter — authorization, pagination, error handling, and caching all require careful design for cross-service queries.
