Inter-Service Comm - Learning Module

Loading content...

0/273

Protocol Selection

Choosing the Right Wire Protocol

Once you've decided on synchronous vs asynchronous communication and defined your API contracts, you face the next architectural decision: which protocol should carry those contracts across the network?

This choice affects far more than you might initially expect. The protocol you choose influences:

Latency: Some protocols are inherently faster due to binary formats and connection reuse
Throughput: Different protocols handle high-volume scenarios differently
Streaming: Real-time data flows have vastly different support across protocols
Tooling: Developer experience varies dramatically—from debugging to testing to monitoring
Client compatibility: Not all protocols work in all environments (especially browsers)
Operational complexity: Different infrastructure requirements and failure modes

There's no universally superior protocol. Each has been designed for specific use cases, and understanding those origins helps predict where each excels.

What You Will Master

By the end of this page, you will understand the strengths, weaknesses, and ideal use cases for REST, gRPC, GraphQL, and message-based protocols. You'll develop a decision framework for selecting protocols based on specific requirements and learn how organizations successfully use multiple protocols for different communication patterns.

REST: The Universal Standard

REST (Representational State Transfer) is the dominant API paradigm on the web. Built on HTTP, REST uses standard HTTP methods (GET, POST, PUT, DELETE) to operate on resources identified by URLs.

Core REST Principles

Statelessness: Each request contains all information needed to process it. The server maintains no session state between requests.

Resource-Oriented Design: APIs are organized around resources (nouns), not actions (verbs). /orders/123 identifies an order; HTTP methods determine what to do with it.

Uniform Interface: Standard HTTP semantics apply universally. GET is safe and idempotent. PUT is idempotent. POST creates resources. DELETE removes them.

HATEOAS (Hypermedia): Responses include links to related resources, enabling client navigation without hardcoded URLs. Often aspirational rather than implemented.

REST Protocol Characteristics
Characteristic	Details
Serialization	Typically JSON (human-readable), occasionally XML
Transport	HTTP/1.1 or HTTP/2
Discovery	OpenAPI specs, documentation portals
Streaming	Limited—HTTP chunked encoding, Server-Sent Events
Browser Support	Native—fetch API, works everywhere
Tooling	Extensive—Postman, cURL, browser DevTools

REST Strengths

•Universal compatibility — Works in browsers, mobile, servers, IoT
•Human-readable — JSON payloads are debuggable without special tools
•Cacheable — HTTP caching semantics (ETags, Cache-Control) built in
•Mature tooling — Decades of ecosystem development
•No special infrastructure — Any HTTP server works
•Load balancer friendly — Standard HTTP routing

REST Weaknesses

•Verbose payloads — JSON overhead; field names repeated
•No built-in typing — Runtime validation required
•Over/under-fetching — Fixed response shapes
•HTTP/1.1 limitations — Connection overhead, head-of-line blocking
•Poor streaming support — Not designed for real-time
•Schema evolution challenges — Field naming, type changes

When to Choose REST

Ideal for:

Public APIs consumed by external developers
Web applications with browser clients
CRUD-heavy applications with simple data models
Situations where caching provides significant benefit
APIs that prioritize adoption over raw performance

Not ideal for:

High-frequency internal service communication (gRPC better)
Real-time bidirectional streaming (WebSockets/gRPC better)
Bandwidth-constrained environments (binary protocols better)
APIs with complex, nested data needs (GraphQL might help)

rest-example
TypeScript (Express)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
// RESTful Order API with proper HTTP semantics
import express, { Request, Response, NextFunction } from 'express';
 
const app = express();
 
// GET /orders - List orders (cacheable, safe, idempotent)
app.get('/orders', async (req: Request, res: Response) => {
  const { customerId, status, limit = 20, cursor } = req.query;
  
  const orders = await orderRepository.find({
    customerId: customerId as string,
    status: status as string,
    limit: Number(limit),
    after: cursor as string,
  });
  
  // Pagination using cursors (not offset)
  const nextCursor = orders.length === Number(limit)
    ? orders[orders.length - 1].id
    : null;
  
  res.json({
    data: orders,
    pagination: {
      nextCursor,
      hasMore: nextCursor !== null,
    },
    _links: {
      self: `/orders?limit=${limit}${cursor ? `&cursor=${cursor}` : ''}`,
      next: nextCursor ? `/orders?limit=${limit}&cursor=${nextCursor}` : null,
    },
  });
});
 
// GET /orders/:id - Single order (cacheable)
app.get('/orders/:id', async (req: Request, res: Response) => {
  const order = await orderRepository.findById(req.params.id);
  
  if (!order) {
    return res.status(404).json({
      error: 'ORDER_NOT_FOUND',
      message: `Order ${req.params.id} not found`,
    });
  }
  
  // Enable caching with ETag
  const etag = computeETag(order);
  res.set('ETag', etag);
  
  // Check if client has current version
  if (req.get('If-None-Match') === etag) {
    return res.status(304).end(); // Not Modified
  }
  
  res.json({
    data: order,
    _links: {
      self: `/orders/${order.id}`,
      customer: `/customers/${order.customerId}`,
      cancel: order.canCancel ? `/orders/${order.id}/cancel` : null,
    },
  });
});
 
// POST /orders - Create order (not idempotent by default)
app.post('/orders', async (req: Request, res: Response) => {
  // Idempotency via client-provided key
  const idempotencyKey = req.get('Idempotency-Key');
  
  if (idempotencyKey) {
    const existing = await orderRepository.findByIdempotencyKey(idempotencyKey);
    if (existing) {
      return res.status(200).json({ data: existing }); // Return cached result
    }
  }
  
  const order = await orderRepository.create({
    ...req.body,
    idempotencyKey,
  });
  
  res.status(201)
    .set('Location', `/orders/${order.id}`)
    .json({ data: order });
});
 
// PUT /orders/:id - Full replacement (idempotent)
app.put('/orders/:id', async (req: Request, res: Response) => {
  const order = await orderRepository.replace(req.params.id, req.body);
  res.json({ data: order });
});
 
// DELETE /orders/:id - Remove order (idempotent)
app.delete('/orders/:id', async (req: Request, res: Response) => {
  await orderRepository.delete(req.params.id);
  res.status(204).end(); // No Content
});

gRPC: High-Performance Internal Communication

gRPC (gRPC Remote Procedure Call) is Google's high-performance RPC framework. It uses Protocol Buffers for serialization and HTTP/2 for transport, designed specifically for efficient service-to-service communication.

Core gRPC Characteristics

Protocol Buffers: Binary serialization format that's 3-10x smaller than JSON and faster to parse. Strongly typed with generated code.

HTTP/2 Native: Multiplexing (multiple streams over single connection), header compression, server push. Eliminates HTTP/1.1's connection overhead.

Four Communication Patterns:

Unary RPC: Traditional request-response
Server Streaming: Server sends stream of responses to single request
Client Streaming: Client sends stream of requests, server returns single response
Bidirectional Streaming: Both client and server send streams independently

Code Generation: Define service in .proto files; generate type-safe client and server stubs for 10+ languages.

gRPC Protocol Characteristics
Characteristic	Details
Serialization	Protocol Buffers (binary, efficient)
Transport	HTTP/2 (required)
Discovery	Proto files, gRPC reflection
Streaming	Native—all four patterns supported
Browser Support	Limited—requires gRPC-Web proxy
Tooling	Growing—grpcurl, BloomRPC, built-in CLI

gRPC Strengths

•Performance — 3-10x smaller payloads, faster parsing
•Strong typing — Compile-time errors, generated clients
•Streaming — Native bidirectional streaming support
•HTTP/2 benefits — Multiplexing, reduced latency
•Code generation — Consistent clients across languages
•Deadlines — Built-in timeout propagation across services

gRPC Weaknesses

•Browser limitations — Requires gRPC-Web, extra proxy layer
•Not human-readable — Binary format needs tooling to inspect
•HTTP/2 requirement — Some infrastructure doesn't support it
•Load balancer complexity — Needs L7 balancers with gRPC support
•Learning curve — Proto syntax, build tooling setup
•No native caching — HTTP caching doesn't apply

When to Choose gRPC

Ideal for:

High-volume internal service-to-service communication
Real-time streaming applications (live data, chat, gaming)
Polyglot environments needing consistent clients
Performance-critical paths where latency matters
Mobile applications with bandwidth constraints

Not ideal for:

Public APIs for external developers (REST more accessible)
Browser-only applications (needs gRPC-Web proxy)
Simple CRUD with few operations (REST simpler)
Environments without HTTP/2 infrastructure support

grpc-example
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
// order_service.proto - gRPC service definition
syntax = "proto3";
 
package orders.v1;
 
import "google/protobuf/timestamp.proto";
import "google/protobuf/empty.proto";
 
// OrderService provides order management operations
service OrderService {
  // Unary RPC - simple request/response
  rpc CreateOrder(CreateOrderRequest) returns (Order);
  rpc GetOrder(GetOrderRequest) returns (Order);
  
  // Server streaming - get all orders for a customer as a stream
  // Useful when there are many orders to return
  rpc ListOrders(ListOrdersRequest) returns (stream Order);
  
  // Client streaming - batch create orders
  // Client sends multiple orders, server responds with summary
  rpc BatchCreateOrders(stream CreateOrderRequest) returns (BatchCreateResponse);
  
  // Bidirectional streaming - real-time order updates
  // Subscribe to changes, can also send acknowledgments
  rpc WatchOrders(stream WatchOrdersRequest) returns (stream OrderUpdate);
}
 
message CreateOrderRequest {
  string customer_id = 1;
  repeated OrderItem items = 2;
  string idempotency_key = 3;
}
 
message GetOrderRequest {
  string order_id = 1;
}
 
message ListOrdersRequest {
  string customer_id = 1;
  OrderStatus status_filter = 2;
  int32 limit = 3;
}
 
message Order {
  string id = 1;
  string customer_id = 2;
  repeated OrderItem items = 3;
  OrderStatus status = 4;
  int64 total_cents = 5;
  google.protobuf.Timestamp created_at = 6;
}
 
message OrderItem {
  string product_id = 1;
  int32 quantity = 2;
  int64 unit_price_cents = 3;
}
 
enum OrderStatus {
  ORDER_STATUS_UNSPECIFIED = 0;
  ORDER_STATUS_PENDING = 1;
  ORDER_STATUS_CONFIRMED = 2;
  ORDER_STATUS_SHIPPED = 3;
  ORDER_STATUS_DELIVERED = 4;
  ORDER_STATUS_CANCELLED = 5;
}
 
message WatchOrdersRequest {
  oneof request {
    SubscribeRequest subscribe = 1;
    AcknowledgeRequest acknowledge = 2;
  }
}
 
message SubscribeRequest {
  repeated string order_ids = 1;
}
 
message AcknowledgeRequest {
  string update_id = 1;
}
 
message OrderUpdate {
  string update_id = 1;
  Order order = 2;
  OrderStatus previous_status = 3;
  google.protobuf.Timestamp changed_at = 4;
}
 
message BatchCreateResponse {
  int32 successful = 1;
  int32 failed = 2;
  repeated string created_order_ids = 3;
}

GraphQL: Flexible Query Language

GraphQL is a query language and runtime that gives clients the power to request exactly the data they need. Developed by Facebook, it addresses REST's over-fetching and under-fetching problems.

Core GraphQL Concepts

Schema Definition: Strongly-typed schema defines available data and operations. Types, relationships, and available queries are explicit.

Client-Specified Queries: Clients define the exact shape of the response. No more fetching 50 fields when you need 3.

Single Endpoint: All operations go through one endpoint (typically /graphql). The request body specifies what to do.

Three Operation Types:

Query: Read data (like GET)
Mutation: Write data (like POST/PUT/DELETE)
Subscription: Real-time updates via WebSocket

GraphQL Characteristics
Characteristic	Details
Serialization	JSON (request and response)
Transport	HTTP (queries/mutations), WebSocket (subscriptions)
Discovery	Introspection queries, GraphQL Playground
Streaming	Subscriptions for real-time (WebSocket)
Browser Support	Native—standard fetch API
Tooling	Strong—Apollo, Relay, GraphQL Playground

GraphQL Strengths

•Precise data fetching — No over/under-fetching
•Strong typing — Schema validation, introspection
•Reduced round trips — Fetch related data in one request
•API evolution — Deprecate fields without breaking clients
•Developer experience — GraphQL Playground, auto-complete
•Federation — Combine multiple services into one graph

GraphQL Weaknesses

•Complexity cost — Resolver implementation, N+1 problems
•Caching challenges — No HTTP caching (POST requests)
•Query complexity attacks — Deep/wide queries can DoS
•Overhead — Schema stitching/federation is complex
•Learning curve — New paradigm for backend teams
•Monitoring — Single endpoint makes metrics harder

When to Choose GraphQL

Ideal for:

Client applications with diverse data needs (web, mobile, IoT)
APIs serving multiple client types with different requirements
Product teams wanting backend flexibility without backend changes
Data aggregation from multiple backend services
Rapid frontend iteration with evolving data requirements

Not ideal for:

Simple CRUD with predictable data shapes (REST simpler)
Internal service-to-service APIs (gRPC more efficient)
File uploads and downloads (REST better suited)
Operations requiring HTTP caching benefits

The N+1 Problem in GraphQL

GraphQL's flexibility creates the N+1 query problem. If you query 100 orders with their customers, a naive implementation makes 100 database queries for customers. Solutions include DataLoader for batching, join-monster for SQL optimization, or careful resolver design. This isn't magic—it requires engineering effort.

graphql-example
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
# Order API Schema
type Query {
  # Get single order by ID
  order(id: ID!): Order
  
  # List orders with filters
  orders(
    customerId: ID
    status: OrderStatus
    first: Int = 20
    after: String
  ): OrderConnection!
  
  # Get current user's orders (authenticated)
  myOrders(first: Int = 20, after: String): OrderConnection!
}
 
type Mutation {
  # Create a new order
  createOrder(input: CreateOrderInput!): CreateOrderPayload!
  
  # Cancel an existing order
  cancelOrder(id: ID!, reason: String): CancelOrderPayload!
}
 
type Subscription {
  # Subscribe to order status changes
  orderStatusChanged(orderId: ID!): OrderStatusUpdate!
}
 
# Relay-style connection for pagination
type OrderConnection {
  edges: [OrderEdge!]!
  pageInfo: PageInfo!
  totalCount: Int!
}
 
type OrderEdge {
  node: Order!
  cursor: String!
}
 
type PageInfo {
  hasNextPage: Boolean!
  hasPreviousPage: Boolean!
  startCursor: String
  endCursor: String
}
 
type Order {
  id: ID!
  status: OrderStatus!
  totalAmount: Money!
  createdAt: DateTime!
  updatedAt: DateTime!
  
  # Related objects - fetched only if requested
  customer: Customer!
  items: [OrderItem!]!
  shipments: [Shipment!]!
  payments: [Payment!]!
}
 
type OrderItem {
  id: ID!
  quantity: Int!
  unitPrice: Money!
  product: Product!  # Fetched from Product Service
}
 
type Customer {
  id: ID!
  name: String!
  email: String!
  orders(first: Int = 10): OrderConnection!  # Nested pagination
}
 
enum OrderStatus {
  PENDING
  CONFIRMED
  SHIPPED
  DELIVERED
  CANCELLED
}
 
input CreateOrderInput {
  items: [OrderItemInput!]!
  shippingAddressId: ID!
  idempotencyKey: String!
}
 
input OrderItemInput {
  productId: ID!
  quantity: Int!
}
 
type CreateOrderPayload {
  order: Order
  errors: [UserError!]!
}
 
type UserError {
  field: [String!]
  message: String!
  code: ErrorCode!
}

Message Queues: Asynchronous Protocols

For asynchronous communication, message queue protocols provide the foundation. These aren't request-response protocols like REST or gRPC—they're publish-subscribe and event-driven systems.

AMQP (Advanced Message Queuing Protocol)

Open standard for message-oriented middleware. RabbitMQ is the most popular implementation.

Exchange-based routing: Messages go to exchanges, which route to queues based on rules
Acknowledgment model: Explicit acks ensure reliable delivery
Queue semantics: FIFO within queues, consumer groups for scaling
Best for: Traditional enterprise messaging, work queues, RPC patterns

Kafka Protocol

Apache Kafka's proprietary binary protocol, designed for distributed commit logs.

Append-only log: Messages stored durably in order
Partition-based: Horizontal scaling via partitions
Consumer offsets: Consumers track their position; can replay
Retention: Messages kept for configured time, not just until consumed
Best for: Event sourcing, stream processing, high-throughput analytics

Cloud Provider Protocols

AWS SQS/SNS: Managed queuing with HTTP API
Google Pub/Sub: Global, ordered messaging with HTTP/gRPC
Azure Service Bus: Enterprise messaging with AMQP

Message Queue Protocol Comparison
Protocol/System	Throughput	Ordering	Retention	Best Use Case
AMQP (RabbitMQ)	Moderate (50K/s)	Per-queue FIFO	Until consumed	Work queues, traditional messaging
Kafka	Very High (1M+/s)	Per-partition	Time/size based	Event streaming, analytics
AWS SQS	High (managed)	Best-effort (FIFO available)	14 days max	Serverless integration
Google Pub/Sub	Very High (managed)	Per-partition	7 days (configurable)	Global event distribution

Kafka vs Traditional Queues

The key distinction: traditional queues delete messages after consumption; Kafka retains a log. This enables multiple consumers to read independently, replay from any point, and build materialised views. Kafka is not 'better' than queues—it's a different paradigm suited to event-driven architectures where events are facts worth preserving.

message-queue-examples
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
// Kafka producer and consumer with proper semantics
import { Kafka, logLevel } from 'kafkajs';
 
const kafka = new Kafka({
  clientId: 'order-service',
  brokers: ['kafka-1:9092', 'kafka-2:9092'],
  logLevel: logLevel.WARN,
});
 
// Producer: publishing order events
const producer = kafka.producer({
  // Exactly-once semantics (Kafka 0.11+)
  idempotent: true,
  maxInFlightRequests: 5,
});
 
async function publishOrderEvent(event: OrderEvent): Promise<void> {
  await producer.send({
    topic: 'order-events',
    messages: [{
      // Key determines partition - related events go to same partition
      key: event.orderId,
      value: JSON.stringify(event),
      headers: {
        eventType: event.type,
        correlationId: event.correlationId,
        timestamp: Date.now().toString(),
      },
    }],
  });
}
 
// Consumer: processing order events
const consumer = kafka.consumer({
  groupId: 'notification-service',  // Consumer group for scaling
});
 
async function startConsumer(): Promise<void> {
  await consumer.connect();
  await consumer.subscribe({
    topic: 'order-events',
    fromBeginning: false,  // Only new messages
  });
 
  await consumer.run({
    // Control concurrency
    partitionsConsumedConcurrently: 3,
    
    // Process each message
    eachMessage: async ({ topic, partition, message }) => {
      const event = JSON.parse(message.value!.toString());
      
      // Idempotency: check if already processed
      const messageId = `${partition}-${message.offset}`;
      if (await isProcessed(messageId)) {
        return;
      }
      
      try {
        await processOrderEvent(event);
        await markProcessed(messageId);
        // Offset auto-committed after successful processing
      } catch (error) {
        // Don't commit offset - message will be redelivered
        console.error('Failed to process:', error);
        throw error;
      }
    },
  });
}

Protocol Selection Decision Framework

With multiple viable protocols, how do you choose? Here's a systematic framework based on key requirements:

Decision Criteria

•Who are the consumers? External developers → REST or GraphQL. Internal services → gRPC. Background processors → Message queues.
•What's the latency requirement? Sub-10ms critical path → gRPC. 100ms acceptable → REST. Seconds/minutes acceptable → Async messaging.
•What's the data access pattern? Fixed CRUD → REST. Flexible, nested queries → GraphQL. High-frequency streaming → gRPC or Kafka.
•Is browser support required? Yes → REST, GraphQL, or gRPC-Web. No → gRPC is excellent for server-to-server.
•Is the operation synchronous or async? Request-response → REST/gRPC/GraphQL. Fire-and-forget → Message queue.
•What's the team's expertise? Build on existing skills unless performance demands otherwise.

Protocol Selection by Use Case
Use Case	Recommended Protocol	Rationale
Public developer API	REST	Universal compatibility, great docs/tooling
Mobile app to backend	GraphQL or REST	Precise data fetching, single round-trip
Service-to-service sync	gRPC	Performance, type safety, streaming
Real-time streaming	gRPC or WebSocket	Native bidirectional support
Event publishing	Kafka/RabbitMQ	Durability, multiple consumers, replay
Background job queue	SQS/RabbitMQ	Reliable delivery, worker scaling
API gateway aggregation	GraphQL	Federation, single entry point

Multi-Protocol Architecture

Most mature systems use multiple protocols. A common pattern: GraphQL or REST at the edge (API gateway), gRPC between internal services, and Kafka for event-driven communication. Each protocol plays to its strengths. Don't force one protocol to do everything.

Common Multi-Protocol Patterns

Pattern 1: Edge REST + Internal gRPC

Browser → REST API Gateway → gRPC → Service A
                           → gRPC → Service B

External REST for compatibility; internal gRPC for performance.

Pattern 2: Sync gRPC + Async Kafka

Service A → gRPC → Service B (query)
Service A → Kafka → Service B (events)
Service A → Kafka → Service C (events)

Queries via gRPC; events broadcast via Kafka.

Pattern 3: GraphQL Gateway + REST Services

Clients → GraphQL Gateway → REST → Service A
                          → REST → Service B
                          → REST → Service C

GraphQL aggregates multiple REST backends.

Pattern 4: BFF (Backend for Frontend)

Web App → REST BFF → gRPC → Services
Mobile  → GraphQL BFF → gRPC → Services

Client-specific APIs optimized for each platform.

Summary: Choosing Protocols Deliberately

Protocol selection is a consequential decision that affects performance, developer experience, operational complexity, and system evolution. There's no universal best answer—only the best answer for your specific context.

Let's consolidate the key insights:

Key Takeaways

•REST is the universal default—choose it for public APIs, browser clients, and when simplicity matters more than raw performance.
•gRPC excels for internal service communication—choose it for high-throughput, streaming, or polyglot environments where performance is critical.
•GraphQL solves data-fetching flexibility—choose it when clients have diverse needs and over/under-fetching is a significant problem.
•Message queues (Kafka, RabbitMQ) enable asynchronous patterns—choose them for event-driven architectures, work queues, and decoupled processing.
•Multi-protocol is normal—mature systems use different protocols for different communication patterns. Edge, internal, and async typically use different approaches.
•Match protocol to requirements—latency, browser support, streaming needs, team expertise, and consumer type should drive the decision.

What's next:

Now that we understand protocols, we face the reality that distributed communication fails. Network partitions, timeouts, cascading failures—these are not edge cases but everyday occurrences. The next page explores error handling strategies that make inter-service communication resilient.

Page Complete

You now understand the major protocols for inter-service communication and can make informed decisions about when to use each. You've learned REST, gRPC, GraphQL, and message queue protocols, along with frameworks for choosing between them. Next, we'll tackle the challenge of handling errors gracefully across service boundaries.

Protocol Selection

Choosing the Right Wire Protocol

This choice affects far more than you might initially expect. The protocol you choose influences:

Latency: Some protocols are inherently faster due to binary formats and connection reuse
Throughput: Different protocols handle high-volume scenarios differently
Streaming: Real-time data flows have vastly different support across protocols
Tooling: Developer experience varies dramatically—from debugging to testing to monitoring
Client compatibility: Not all protocols work in all environments (especially browsers)
Operational complexity: Different infrastructure requirements and failure modes

There's no universally superior protocol. Each has been designed for specific use cases, and understanding those origins helps predict where each excels.

What You Will Master

REST: The Universal Standard

REST (Representational State Transfer) is the dominant API paradigm on the web. Built on HTTP, REST uses standard HTTP methods (GET, POST, PUT, DELETE) to operate on resources identified by URLs.

Core REST Principles

Statelessness: Each request contains all information needed to process it. The server maintains no session state between requests.

Resource-Oriented Design: APIs are organized around resources (nouns), not actions (verbs). /orders/123 identifies an order; HTTP methods determine what to do with it.

Uniform Interface: Standard HTTP semantics apply universally. GET is safe and idempotent. PUT is idempotent. POST creates resources. DELETE removes them.

HATEOAS (Hypermedia): Responses include links to related resources, enabling client navigation without hardcoded URLs. Often aspirational rather than implemented.

REST Protocol Characteristics
Characteristic	Details
Serialization	Typically JSON (human-readable), occasionally XML
Transport	HTTP/1.1 or HTTP/2
Discovery	OpenAPI specs, documentation portals
Streaming	Limited—HTTP chunked encoding, Server-Sent Events
Browser Support	Native—fetch API, works everywhere
Tooling	Extensive—Postman, cURL, browser DevTools

REST Strengths

•Universal compatibility — Works in browsers, mobile, servers, IoT
•Human-readable — JSON payloads are debuggable without special tools
•Cacheable — HTTP caching semantics (ETags, Cache-Control) built in
•Mature tooling — Decades of ecosystem development
•No special infrastructure — Any HTTP server works
•Load balancer friendly — Standard HTTP routing

REST Weaknesses

•Verbose payloads — JSON overhead; field names repeated
•No built-in typing — Runtime validation required
•Over/under-fetching — Fixed response shapes
•HTTP/1.1 limitations — Connection overhead, head-of-line blocking
•Poor streaming support — Not designed for real-time
•Schema evolution challenges — Field naming, type changes

When to Choose REST

Ideal for:

Public APIs consumed by external developers
Web applications with browser clients
CRUD-heavy applications with simple data models
Situations where caching provides significant benefit
APIs that prioritize adoption over raw performance

Not ideal for:

High-frequency internal service communication (gRPC better)
Real-time bidirectional streaming (WebSockets/gRPC better)
Bandwidth-constrained environments (binary protocols better)
APIs with complex, nested data needs (GraphQL might help)

rest-example
TypeScript (Express)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
// RESTful Order API with proper HTTP semantics
import express, { Request, Response, NextFunction } from 'express';
 
const app = express();
 
// GET /orders - List orders (cacheable, safe, idempotent)
app.get('/orders', async (req: Request, res: Response) => {
  const { customerId, status, limit = 20, cursor } = req.query;
  
  const orders = await orderRepository.find({
    customerId: customerId as string,
    status: status as string,
    limit: Number(limit),
    after: cursor as string,
  });
  
  // Pagination using cursors (not offset)
  const nextCursor = orders.length === Number(limit)
    ? orders[orders.length - 1].id
    : null;
  
  res.json({
    data: orders,
    pagination: {
      nextCursor,
      hasMore: nextCursor !== null,
    },
    _links: {
      self: `/orders?limit=${limit}${cursor ? `&cursor=${cursor}` : ''}`,
      next: nextCursor ? `/orders?limit=${limit}&cursor=${nextCursor}` : null,
    },
  });
});
 
// GET /orders/:id - Single order (cacheable)
app.get('/orders/:id', async (req: Request, res: Response) => {
  const order = await orderRepository.findById(req.params.id);
  
  if (!order) {
    return res.status(404).json({
      error: 'ORDER_NOT_FOUND',
      message: `Order ${req.params.id} not found`,
    });
  }
  
  // Enable caching with ETag
  const etag = computeETag(order);
  res.set('ETag', etag);
  
  // Check if client has current version
  if (req.get('If-None-Match') === etag) {
    return res.status(304).end(); // Not Modified
  }
  
  res.json({
    data: order,
    _links: {
      self: `/orders/${order.id}`,
      customer: `/customers/${order.customerId}`,
      cancel: order.canCancel ? `/orders/${order.id}/cancel` : null,
    },
  });
});
 
// POST /orders - Create order (not idempotent by default)
app.post('/orders', async (req: Request, res: Response) => {
  // Idempotency via client-provided key
  const idempotencyKey = req.get('Idempotency-Key');
  
  if (idempotencyKey) {
    const existing = await orderRepository.findByIdempotencyKey(idempotencyKey);
    if (existing) {
      return res.status(200).json({ data: existing }); // Return cached result
    }
  }
  
  const order = await orderRepository.create({
    ...req.body,
    idempotencyKey,
  });
  
  res.status(201)
    .set('Location', `/orders/${order.id}`)
    .json({ data: order });
});
 
// PUT /orders/:id - Full replacement (idempotent)
app.put('/orders/:id', async (req: Request, res: Response) => {
  const order = await orderRepository.replace(req.params.id, req.body);
  res.json({ data: order });
});
 
// DELETE /orders/:id - Remove order (idempotent)
app.delete('/orders/:id', async (req: Request, res: Response) => {
  await orderRepository.delete(req.params.id);
  res.status(204).end(); // No Content
});

gRPC: High-Performance Internal Communication

Core gRPC Characteristics

Protocol Buffers: Binary serialization format that's 3-10x smaller than JSON and faster to parse. Strongly typed with generated code.

HTTP/2 Native: Multiplexing (multiple streams over single connection), header compression, server push. Eliminates HTTP/1.1's connection overhead.

Four Communication Patterns:

Unary RPC: Traditional request-response
Server Streaming: Server sends stream of responses to single request
Client Streaming: Client sends stream of requests, server returns single response
Bidirectional Streaming: Both client and server send streams independently

Code Generation: Define service in .proto files; generate type-safe client and server stubs for 10+ languages.

gRPC Protocol Characteristics
Characteristic	Details
Serialization	Protocol Buffers (binary, efficient)
Transport	HTTP/2 (required)
Discovery	Proto files, gRPC reflection
Streaming	Native—all four patterns supported
Browser Support	Limited—requires gRPC-Web proxy
Tooling	Growing—grpcurl, BloomRPC, built-in CLI

gRPC Strengths

•Performance — 3-10x smaller payloads, faster parsing
•Strong typing — Compile-time errors, generated clients
•Streaming — Native bidirectional streaming support
•HTTP/2 benefits — Multiplexing, reduced latency
•Code generation — Consistent clients across languages
•Deadlines — Built-in timeout propagation across services

gRPC Weaknesses

•Browser limitations — Requires gRPC-Web, extra proxy layer
•Not human-readable — Binary format needs tooling to inspect
•HTTP/2 requirement — Some infrastructure doesn't support it
•Load balancer complexity — Needs L7 balancers with gRPC support
•Learning curve — Proto syntax, build tooling setup
•No native caching — HTTP caching doesn't apply

When to Choose gRPC

Ideal for:

High-volume internal service-to-service communication
Real-time streaming applications (live data, chat, gaming)
Polyglot environments needing consistent clients
Performance-critical paths where latency matters
Mobile applications with bandwidth constraints

Not ideal for:

Public APIs for external developers (REST more accessible)
Browser-only applications (needs gRPC-Web proxy)
Simple CRUD with few operations (REST simpler)
Environments without HTTP/2 infrastructure support

grpc-example
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
// order_service.proto - gRPC service definition
syntax = "proto3";
 
package orders.v1;
 
import "google/protobuf/timestamp.proto";
import "google/protobuf/empty.proto";
 
// OrderService provides order management operations
service OrderService {
  // Unary RPC - simple request/response
  rpc CreateOrder(CreateOrderRequest) returns (Order);
  rpc GetOrder(GetOrderRequest) returns (Order);
  
  // Server streaming - get all orders for a customer as a stream
  // Useful when there are many orders to return
  rpc ListOrders(ListOrdersRequest) returns (stream Order);
  
  // Client streaming - batch create orders
  // Client sends multiple orders, server responds with summary
  rpc BatchCreateOrders(stream CreateOrderRequest) returns (BatchCreateResponse);
  
  // Bidirectional streaming - real-time order updates
  // Subscribe to changes, can also send acknowledgments
  rpc WatchOrders(stream WatchOrdersRequest) returns (stream OrderUpdate);
}
 
message CreateOrderRequest {
  string customer_id = 1;
  repeated OrderItem items = 2;
  string idempotency_key = 3;
}
 
message GetOrderRequest {
  string order_id = 1;
}
 
message ListOrdersRequest {
  string customer_id = 1;
  OrderStatus status_filter = 2;
  int32 limit = 3;
}
 
message Order {
  string id = 1;
  string customer_id = 2;
  repeated OrderItem items = 3;
  OrderStatus status = 4;
  int64 total_cents = 5;
  google.protobuf.Timestamp created_at = 6;
}
 
message OrderItem {
  string product_id = 1;
  int32 quantity = 2;
  int64 unit_price_cents = 3;
}
 
enum OrderStatus {
  ORDER_STATUS_UNSPECIFIED = 0;
  ORDER_STATUS_PENDING = 1;
  ORDER_STATUS_CONFIRMED = 2;
  ORDER_STATUS_SHIPPED = 3;
  ORDER_STATUS_DELIVERED = 4;
  ORDER_STATUS_CANCELLED = 5;
}
 
message WatchOrdersRequest {
  oneof request {
    SubscribeRequest subscribe = 1;
    AcknowledgeRequest acknowledge = 2;
  }
}
 
message SubscribeRequest {
  repeated string order_ids = 1;
}
 
message AcknowledgeRequest {
  string update_id = 1;
}
 
message OrderUpdate {
  string update_id = 1;
  Order order = 2;
  OrderStatus previous_status = 3;
  google.protobuf.Timestamp changed_at = 4;
}
 
message BatchCreateResponse {
  int32 successful = 1;
  int32 failed = 2;
  repeated string created_order_ids = 3;
}

GraphQL: Flexible Query Language

GraphQL is a query language and runtime that gives clients the power to request exactly the data they need. Developed by Facebook, it addresses REST's over-fetching and under-fetching problems.

Core GraphQL Concepts

Schema Definition: Strongly-typed schema defines available data and operations. Types, relationships, and available queries are explicit.

Client-Specified Queries: Clients define the exact shape of the response. No more fetching 50 fields when you need 3.

Single Endpoint: All operations go through one endpoint (typically /graphql). The request body specifies what to do.

Three Operation Types:

Query: Read data (like GET)
Mutation: Write data (like POST/PUT/DELETE)
Subscription: Real-time updates via WebSocket

GraphQL Characteristics
Characteristic	Details
Serialization	JSON (request and response)
Transport	HTTP (queries/mutations), WebSocket (subscriptions)
Discovery	Introspection queries, GraphQL Playground
Streaming	Subscriptions for real-time (WebSocket)
Browser Support	Native—standard fetch API
Tooling	Strong—Apollo, Relay, GraphQL Playground

GraphQL Strengths

•Precise data fetching — No over/under-fetching
•Strong typing — Schema validation, introspection
•Reduced round trips — Fetch related data in one request
•API evolution — Deprecate fields without breaking clients
•Developer experience — GraphQL Playground, auto-complete
•Federation — Combine multiple services into one graph

GraphQL Weaknesses

•Complexity cost — Resolver implementation, N+1 problems
•Caching challenges — No HTTP caching (POST requests)
•Query complexity attacks — Deep/wide queries can DoS
•Overhead — Schema stitching/federation is complex
•Learning curve — New paradigm for backend teams
•Monitoring — Single endpoint makes metrics harder

When to Choose GraphQL

Ideal for:

Client applications with diverse data needs (web, mobile, IoT)
APIs serving multiple client types with different requirements
Product teams wanting backend flexibility without backend changes
Data aggregation from multiple backend services
Rapid frontend iteration with evolving data requirements

Not ideal for:

Simple CRUD with predictable data shapes (REST simpler)
Internal service-to-service APIs (gRPC more efficient)
File uploads and downloads (REST better suited)
Operations requiring HTTP caching benefits

The N+1 Problem in GraphQL

graphql-example
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
# Order API Schema
type Query {
  # Get single order by ID
  order(id: ID!): Order
  
  # List orders with filters
  orders(
    customerId: ID
    status: OrderStatus
    first: Int = 20
    after: String
  ): OrderConnection!
  
  # Get current user's orders (authenticated)
  myOrders(first: Int = 20, after: String): OrderConnection!
}
 
type Mutation {
  # Create a new order
  createOrder(input: CreateOrderInput!): CreateOrderPayload!
  
  # Cancel an existing order
  cancelOrder(id: ID!, reason: String): CancelOrderPayload!
}
 
type Subscription {
  # Subscribe to order status changes
  orderStatusChanged(orderId: ID!): OrderStatusUpdate!
}
 
# Relay-style connection for pagination
type OrderConnection {
  edges: [OrderEdge!]!
  pageInfo: PageInfo!
  totalCount: Int!
}
 
type OrderEdge {
  node: Order!
  cursor: String!
}
 
type PageInfo {
  hasNextPage: Boolean!
  hasPreviousPage: Boolean!
  startCursor: String
  endCursor: String
}
 
type Order {
  id: ID!
  status: OrderStatus!
  totalAmount: Money!
  createdAt: DateTime!
  updatedAt: DateTime!
  
  # Related objects - fetched only if requested
  customer: Customer!
  items: [OrderItem!]!
  shipments: [Shipment!]!
  payments: [Payment!]!
}
 
type OrderItem {
  id: ID!
  quantity: Int!
  unitPrice: Money!
  product: Product!  # Fetched from Product Service
}
 
type Customer {
  id: ID!
  name: String!
  email: String!
  orders(first: Int = 10): OrderConnection!  # Nested pagination
}
 
enum OrderStatus {
  PENDING
  CONFIRMED
  SHIPPED
  DELIVERED
  CANCELLED
}
 
input CreateOrderInput {
  items: [OrderItemInput!]!
  shippingAddressId: ID!
  idempotencyKey: String!
}
 
input OrderItemInput {
  productId: ID!
  quantity: Int!
}
 
type CreateOrderPayload {
  order: Order
  errors: [UserError!]!
}
 
type UserError {
  field: [String!]
  message: String!
  code: ErrorCode!
}

Message Queues: Asynchronous Protocols

For asynchronous communication, message queue protocols provide the foundation. These aren't request-response protocols like REST or gRPC—they're publish-subscribe and event-driven systems.

AMQP (Advanced Message Queuing Protocol)

Open standard for message-oriented middleware. RabbitMQ is the most popular implementation.

Exchange-based routing: Messages go to exchanges, which route to queues based on rules
Acknowledgment model: Explicit acks ensure reliable delivery
Queue semantics: FIFO within queues, consumer groups for scaling
Best for: Traditional enterprise messaging, work queues, RPC patterns

Kafka Protocol

Apache Kafka's proprietary binary protocol, designed for distributed commit logs.

Append-only log: Messages stored durably in order
Partition-based: Horizontal scaling via partitions
Consumer offsets: Consumers track their position; can replay
Retention: Messages kept for configured time, not just until consumed
Best for: Event sourcing, stream processing, high-throughput analytics

Cloud Provider Protocols

AWS SQS/SNS: Managed queuing with HTTP API
Google Pub/Sub: Global, ordered messaging with HTTP/gRPC
Azure Service Bus: Enterprise messaging with AMQP

Message Queue Protocol Comparison
Protocol/System	Throughput	Ordering	Retention	Best Use Case
AMQP (RabbitMQ)	Moderate (50K/s)	Per-queue FIFO	Until consumed	Work queues, traditional messaging
Kafka	Very High (1M+/s)	Per-partition	Time/size based	Event streaming, analytics
AWS SQS	High (managed)	Best-effort (FIFO available)	14 days max	Serverless integration
Google Pub/Sub	Very High (managed)	Per-partition	7 days (configurable)	Global event distribution

Kafka vs Traditional Queues

message-queue-examples
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
// Kafka producer and consumer with proper semantics
import { Kafka, logLevel } from 'kafkajs';
 
const kafka = new Kafka({
  clientId: 'order-service',
  brokers: ['kafka-1:9092', 'kafka-2:9092'],
  logLevel: logLevel.WARN,
});
 
// Producer: publishing order events
const producer = kafka.producer({
  // Exactly-once semantics (Kafka 0.11+)
  idempotent: true,
  maxInFlightRequests: 5,
});
 
async function publishOrderEvent(event: OrderEvent): Promise<void> {
  await producer.send({
    topic: 'order-events',
    messages: [{
      // Key determines partition - related events go to same partition
      key: event.orderId,
      value: JSON.stringify(event),
      headers: {
        eventType: event.type,
        correlationId: event.correlationId,
        timestamp: Date.now().toString(),
      },
    }],
  });
}
 
// Consumer: processing order events
const consumer = kafka.consumer({
  groupId: 'notification-service',  // Consumer group for scaling
});
 
async function startConsumer(): Promise<void> {
  await consumer.connect();
  await consumer.subscribe({
    topic: 'order-events',
    fromBeginning: false,  // Only new messages
  });
 
  await consumer.run({
    // Control concurrency
    partitionsConsumedConcurrently: 3,
    
    // Process each message
    eachMessage: async ({ topic, partition, message }) => {
      const event = JSON.parse(message.value!.toString());
      
      // Idempotency: check if already processed
      const messageId = `${partition}-${message.offset}`;
      if (await isProcessed(messageId)) {
        return;
      }
      
      try {
        await processOrderEvent(event);
        await markProcessed(messageId);
        // Offset auto-committed after successful processing
      } catch (error) {
        // Don't commit offset - message will be redelivered
        console.error('Failed to process:', error);
        throw error;
      }
    },
  });
}

Protocol Selection Decision Framework

With multiple viable protocols, how do you choose? Here's a systematic framework based on key requirements:

Decision Criteria

•Who are the consumers? External developers → REST or GraphQL. Internal services → gRPC. Background processors → Message queues.
•What's the latency requirement? Sub-10ms critical path → gRPC. 100ms acceptable → REST. Seconds/minutes acceptable → Async messaging.
•What's the data access pattern? Fixed CRUD → REST. Flexible, nested queries → GraphQL. High-frequency streaming → gRPC or Kafka.
•Is browser support required? Yes → REST, GraphQL, or gRPC-Web. No → gRPC is excellent for server-to-server.
•Is the operation synchronous or async? Request-response → REST/gRPC/GraphQL. Fire-and-forget → Message queue.
•What's the team's expertise? Build on existing skills unless performance demands otherwise.

Protocol Selection by Use Case
Use Case	Recommended Protocol	Rationale
Public developer API	REST	Universal compatibility, great docs/tooling
Mobile app to backend	GraphQL or REST	Precise data fetching, single round-trip
Service-to-service sync	gRPC	Performance, type safety, streaming
Real-time streaming	gRPC or WebSocket	Native bidirectional support
Event publishing	Kafka/RabbitMQ	Durability, multiple consumers, replay
Background job queue	SQS/RabbitMQ	Reliable delivery, worker scaling
API gateway aggregation	GraphQL	Federation, single entry point

Multi-Protocol Architecture

Common Multi-Protocol Patterns

Pattern 1: Edge REST + Internal gRPC

Browser → REST API Gateway → gRPC → Service A
                           → gRPC → Service B

External REST for compatibility; internal gRPC for performance.

Pattern 2: Sync gRPC + Async Kafka

Service A → gRPC → Service B (query)
Service A → Kafka → Service B (events)
Service A → Kafka → Service C (events)

Queries via gRPC; events broadcast via Kafka.

Pattern 3: GraphQL Gateway + REST Services

Clients → GraphQL Gateway → REST → Service A
                          → REST → Service B
                          → REST → Service C

GraphQL aggregates multiple REST backends.

Pattern 4: BFF (Backend for Frontend)

Web App → REST BFF → gRPC → Services
Mobile  → GraphQL BFF → gRPC → Services

Client-specific APIs optimized for each platform.

Summary: Choosing Protocols Deliberately

Let's consolidate the key insights:

Key Takeaways

•REST is the universal default—choose it for public APIs, browser clients, and when simplicity matters more than raw performance.
•gRPC excels for internal service communication—choose it for high-throughput, streaming, or polyglot environments where performance is critical.
•GraphQL solves data-fetching flexibility—choose it when clients have diverse needs and over/under-fetching is a significant problem.
•Message queues (Kafka, RabbitMQ) enable asynchronous patterns—choose them for event-driven architectures, work queues, and decoupled processing.
•Multi-protocol is normal—mature systems use different protocols for different communication patterns. Edge, internal, and async typically use different approaches.
•Match protocol to requirements—latency, browser support, streaming needs, team expertise, and consumer type should drive the decision.

What's next:

Page Complete