In the landscape of messaging systems, NATS occupies a unique position: a connective tissue designed from the ground up for cloud-native, edge, and IoT environments. Where Kafka prioritizes durability and RabbitMQ prioritizes flexibility, NATS prioritizes simplicity, performance, and operational efficiency.
Created by Derek Collison (who previously built messaging systems at TIBCO and Apcera), NATS strips messaging to its essence. The core NATS server is a single 15MB binary with zero dependencies, capable of handling millions of messages per second with sub-millisecond latency. It's messaging for environments where complexity is the enemy—distributed systems at the edge, IoT device communication, microservices sidecars, and high-frequency trading systems.
With NATS JetStream, NATS has evolved to include persistence and streaming capabilities while maintaining its lightweight philosophy—making it a compelling alternative for teams who want Kafka-like features without Kafka-like operational overhead.
By the end of this page, you will understand NATS's core messaging model, the subject-based addressing system, request-reply patterns, queue groups for load balancing, NATS JetStream for persistence and streaming, and when NATS is the optimal choice for your architecture.
Core NATS (without JetStream) operates as a "fire-and-forget" messaging system with at-most-once delivery semantics. This design choice enables extreme performance by eliminating acknowledgment overhead.
Key characteristics:
- Messages are routed in memory and delivered only to subscribers connected at publish time; nothing is stored.
- No acknowledgments, retries, or redelivery; publishers never wait on consumers.
- If no subscriber matches a subject, the message is simply dropped.
This model is ideal for scenarios where message loss is acceptable (telemetry, metrics, heartbeats) or where the application layer handles reliability.
// Go client example - Publisher
nc, _ := nats.Connect("nats://localhost:4222")
nc.Publish("updates.weather.nyc", []byte("Sunny, 72°F"))
// Subscriber
nc.Subscribe("updates.weather.*", func(msg *nats.Msg) {
fmt.Printf("Received: %s\n", string(msg.Data))
})
NATS Core Message Flow (At-Most-Once)

+=======================================================+
|                     NATS Server                       |
|      Single binary (~15MB), in-memory routing         |
+=======================================================+
        ↑                                   ↓
+-------------+                      +-------------+
|  Publisher  |                      | Subscriber  |
|  Sends to:  |                      | Subscribed: |
|"orders.new" |                      |"orders.new" |
+-------------+                      +-------------+

Timeline:
T=0: Subscriber connects, subscribes to "orders.new"
T=1: Publisher sends to "orders.new" → Delivered ✓
T=2: Subscriber disconnects
T=3: Publisher sends to "orders.new" → Lost! (no subscribers)
T=4: Subscriber reconnects, subscribes again
T=5: Publisher sends to "orders.new" → Delivered ✓

Key insight: Messages sent while subscriber offline are NOT queued.
This is by design for maximum performance.

Why at-most-once can be the right choice:
The lack of persistence might seem like a limitation, but it's actually a deliberate design for specific use cases:
NATS uses a text-based protocol you can literally type into telnet: 'PUB subject 5\r\nhello\r\n' publishes 'hello'. This simplicity means any language can implement a NATS client in hours, not weeks. It's common to find NATS clients for obscure languages and embedded systems where complex protocols can't run.
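To make that concrete, here is a minimal sketch that speaks the wire protocol over a raw TCP socket, with no client library at all. The subject name greetings, the local server address, and the assumption of an auth-free server are illustrative only.

// Raw NATS protocol demo: dial the server, subscribe, publish, and print
// whatever comes back (possibly +OK acks, then the MSG header and payload).
package main

import (
	"bufio"
	"fmt"
	"net"
	"time"
)

func main() {
	conn, err := net.Dial("tcp", "localhost:4222")
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	conn.SetReadDeadline(time.Now().Add(2 * time.Second))
	r := bufio.NewReader(conn)

	info, _ := r.ReadString('\n') // server greets with an INFO line
	fmt.Print(info)

	// Subscribe to "greetings" as subscription id 1, then publish 5 bytes to it.
	fmt.Fprint(conn, "SUB greetings 1\r\n")
	fmt.Fprint(conn, "PUB greetings 5\r\nhello\r\n")

	// Print server responses until the read deadline hits.
	for {
		line, err := r.ReadString('\n')
		if err != nil {
			break
		}
		fmt.Print(line)
	}
}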
NATS uses a hierarchical, dot-separated subject addressing scheme that enables powerful topic filtering without broker configuration.
Subject anatomy:
Subjects are case-sensitive strings using tokens separated by dots:
- time.us.east — Three tokens
- orders.new — Two tokens
- metrics.host123.cpu — Three tokens

Wildcards enable flexible subscriptions:
Single-token wildcard (*): Matches exactly one token
- orders.*.shipped matches orders.123.shipped, not orders.shipped

Multi-token wildcard (>): Matches one or more tokens at the end

- orders.> matches orders.new, orders.123.shipped, orders.a.b.c.d

| Subscription | Matches | Does NOT Match |
|---|---|---|
| orders.* | orders.new, orders.cancelled | orders, orders.new.urgent |
| orders.> | orders.new, orders.us.east.new | orders (needs at least 1 more token) |
| *.*.east | orders.us.east, metrics.dc.east | orders.east, us.east |
| time.us.* | time.us.east, time.us.west | time.us, time.us.east.zone1 |
| > | Everything (catch-all) | Nothing excluded |
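As a small illustration (the subjects are hypothetical and nc is an established connection as in the earlier example), a single wildcard subscription also tells you which concrete subject each message arrived on:

// One subscription covers the whole "orders" hierarchy; the concrete
// subject of each delivered message is available on the message itself.
nc.Subscribe("orders.>", func(msg *nats.Msg) {
	// e.g. msg.Subject == "orders.us.east.created"
	fmt.Printf("subject=%s payload=%s\n", msg.Subject, string(msg.Data))
})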
Subject design patterns:
# Hierarchical naming convention
<domain>.<entity>.<event-type>
<system>.<component>.<metric>
<region>.<service>.<action>
# Examples
orders.customer123.created
metrics.webserver01.cpu.load
us-east.payments.transaction.completed
# Subscription patterns
orders.*.created # All order creations
metrics.*.>       # All metrics from any host (wildcards match whole tokens only)
us-east.payments.> # All payment events in US-East
No broker configuration required:
Unlike RabbitMQ where you must declare exchanges, bindings, and queues before use, NATS subjects are purely convention. Publishers send to any subject; subscribers listen on any subject. The server routes based on subscriptions without prior configuration. This enables true dynamic messaging where services can appear and disappear without broker coordination.
Design subjects from general to specific (left to right). Put stable identifiers before volatile ones. Consider how subscribers will filter. Good: 'orders.us.premium.created'. Avoid: 'created.premium.us.orders'. Document your subject namespace like an API contract.
Beyond pub-sub, NATS excels at request-reply patterns—synchronous-style communication built on asynchronous messaging. This enables RPC-like semantics without point-to-point connections.
How request-reply works:
1. The requester creates a unique inbox subject (e.g. _INBOX.abc123) and subscribes to it
2. The request is published with a Reply-To header pointing to that inbox
3. A responder processes the request and publishes its reply to the Reply-To subject
4. The requester receives the reply on its inbox, or gives up after a timeout

// Request - blocks waiting for reply (with timeout)
msg, err := nc.Request("service.users.get",
[]byte(`{"id": "123"}`),
2*time.Second)
if err != nil {
// Timeout or error
}
fmt.Printf("User: %s\n", msg.Data)
// Responder
nc.Subscribe("service.users.get", func(msg *nats.Msg) {
user := getUserById(msg.Data)
msg.Respond([]byte(user.ToJSON()))
})
NATS Request-Reply Flow

+------------------------------------------------------------+
| Client A                                                    |
|  1. Creates unique inbox: _INBOX.abc123                     |
|  2. Subscribes to _INBOX.abc123                             |
|  3. Publishes to: service.users.get                         |
|       Reply-To: _INBOX.abc123                               |
|       Body: {"id": "123"}                                   |
|  4. Waits with timeout...                                   |
+------------------------------------------------------------+
                              ↓
+------------------------------------------------------------+
| NATS Server                                                 |
|  Routes message to all service.users.get subscribers        |
+------------------------------------------------------------+
                              ↓
+------------------------------------------------------------+
| Service B (subscribed to service.users.get)                 |
|  1. Receives request                                        |
|  2. Processes: lookup user 123                              |
|  3. Publishes to: _INBOX.abc123 (Reply-To address)          |
|       Body: {"name": "Alice", "email": "..."}               |
+------------------------------------------------------------+
                              ↓
+------------------------------------------------------------+
| Client A                                                    |
|  Receives reply on _INBOX.abc123 before timeout             |
|  → Success!                                                 |
+------------------------------------------------------------+

Scatter-gather pattern:
Request-reply can collect responses from multiple responders:
// Send request, collect multiple replies
inbox := nats.NewInbox()
sub, _ := nc.SubscribeSync(inbox)
nc.PublishRequest("service.health.check", inbox, []byte(""))
// Collect responses with timeout
for {
msg, err := sub.NextMsg(100 * time.Millisecond)
if err == nats.ErrTimeout {
break // No more responses
}
fmt.Printf("Response from: %s\n", msg.Data)
}
Advantages over HTTP for microservices:
- No service discovery or load balancers: requesters address a subject, not a host
- Built-in load balancing and failover when responders form a queue group (next section)
- Timeouts are explicit in the client call rather than buried in connection settings
- All requests multiplex over the client's single NATS connection
Queue groups provide automatic load balancing across subscribers without additional configuration. When multiple subscribers join the same queue group, NATS distributes messages among them—only one member receives each message.
// Three service instances subscribed to same queue group
// Each receives ~1/3 of messages
// Instance 1
nc.QueueSubscribe("orders.process", "order-workers", handler)
// Instance 2
nc.QueueSubscribe("orders.process", "order-workers", handler)
// Instance 3
nc.QueueSubscribe("orders.process", "order-workers", handler)
Queue groups enable competing consumers:
This is NATS's equivalent of a work queue. Each message goes to exactly one subscriber in the group, distributing load automatically.
Queue Group Distribution

+=======================================================+
|                      Publisher                         |
|         Sends 6 messages to "orders.process"           |
+=======================================================+
                        ↓↓↓↓↓↓
+=======================================================+
|                     NATS Server                        |
|      Distributes to queue group "order-workers"        |
+=======================================================+
          /               |               \
         ↓                ↓                ↓
+------------+    +------------+    +------------+
|  Worker 1  |    |  Worker 2  |    |  Worker 3  |
|  Queue:    |    |  Queue:    |    |  Queue:    |
|  order-    |    |  order-    |    |  order-    |
|  workers   |    |  workers   |    |  workers   |
+------------+    +------------+    +------------+
|   Msg 1    |    |   Msg 2    |    |   Msg 3    |
|   Msg 4    |    |   Msg 5    |    |   Msg 6    |
+------------+    +------------+    +------------+

Each message delivered to exactly ONE worker.
Workers share load automatically.
If Worker 2 disconnects, msgs go to Workers 1 & 3.

Combining queue groups with wildcards:
// All order events, distributed across workers
nc.QueueSubscribe("orders.>", "order-processors", handler)
// Each worker processes a fraction of:
// - orders.created
// - orders.updated
// - orders.shipped
// - orders.cancelled
Queue groups + request-reply:
Combine queue groups with request-reply for load-balanced RPC:
// Multiple service instances form queue group
nc.QueueSubscribe("service.calculate", "calculators", func(msg *nats.Msg) {
result := expensiveCalculation(msg.Data)
msg.Respond(result)
})
// Client request goes to ONE calculator
msg, _ := nc.Request("service.calculate", data, timeout)
Unlike Kafka's partitions, NATS queue groups don't guarantee message ordering or key-based affinity. Messages are distributed round-robin. For ordered processing, use JetStream with consumers that preserve order, or design your application to handle out-of-order messages.
While core NATS provides at-most-once delivery, NATS JetStream adds:
- Persistence: messages stored in memory or on disk with configurable retention limits
- At-least-once delivery with acknowledgments and redelivery, plus exactly-once semantics via deduplication
- Replay: consumers can start from any point in a stream, by sequence number or timestamp
- Replication across servers via Raft consensus for durability
JetStream brings Kafka-like capabilities to NATS while maintaining its operational simplicity.
Creating streams and consumers:
js, _ := nc.JetStream()
// Create a stream (like a Kafka topic)
js.AddStream(&nats.StreamConfig{
Name: "ORDERS",
Subjects: []string{"orders.>"},
Storage: nats.FileStorage,
Replicas: 3,
Retention: nats.LimitsPolicy,
MaxMsgs: 1000000,
MaxAge: 24 * time.Hour,
})
// Publish with acknowledgment
ack, _ := js.Publish("orders.new", orderData)
fmt.Printf("Published to stream seq: %d\n", ack.Sequence)
| NATS JetStream | Kafka Equivalent | Description |
|---|---|---|
| Stream | Topic | Named message store with retention policy |
| Subject filter | Partition key | Messages routed by subject pattern |
| Consumer | Consumer group | Named message reader with position |
| Durable consumer | Committed offset | Position survives disconnection |
| Pull consumer | Consumer poll | Client requests messages |
| Push consumer | No direct equivalent | Server pushes to subscriber |
Consumer types:
// Pull consumer - client controls pace
// (pull consumers always use explicit acknowledgment)
sub, _ := js.PullSubscribe("orders.>", "order-processor") // durable name
msgs, _ := sub.Fetch(10) // pull a batch of up to 10 messages
for _, msg := range msgs {
	process(msg)
	msg.Ack() // explicit acknowledgment
}
// Push consumer - server delivers to the subscriber
js.Subscribe("orders.>", func(msg *nats.Msg) {
	process(msg)
	msg.Ack()
}, nats.Durable("order-handler"), nats.ManualAck())
Replay and temporal queries:
// Replay from beginning of stream
js.Subscribe("orders.>", handler,
nats.DeliverAll())
// Replay from specific time
js.Subscribe("orders.>", handler,
nats.StartTime(time.Now().Add(-1*time.Hour)))
// Replay from sequence number
js.Subscribe("orders.>", handler,
nats.StartSequence(1000))
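Related to the ordering caveat noted earlier for queue groups, the Go client also offers an ordered-consumer subscription option. A brief sketch (the subject is illustrative, js is the JetStream context from above):

// Ordered consumer: an ephemeral, flow-controlled consumer that delivers
// messages to this single subscriber strictly in stream sequence order.
js.Subscribe("orders.>", func(msg *nats.Msg) {
	process(msg) // no explicit Ack needed; ordered consumers don't use acks
}, nats.OrderedConsumer())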
Use Core NATS for: real-time ephemeral messaging, heartbeats, cache invalidation, pattern matching. Use JetStream for: durable message storage, exactly-once processing, stream replay, event sourcing. Both can coexist in the same NATS cluster.
NATS offers multiple topology options for different deployment scenarios, from single-node development to global multi-cluster production.
Full mesh clustering:
NATS servers form a full mesh where every server connects to every other server. Messages route automatically across the cluster.
# Server 1 config
cluster {
listen: 0.0.0.0:6222
routes: [
nats://server2:6222
nats://server3:6222
]
}
# Clients connect to any server; messages route automatically
nats://server1:4222 → Messages reach subscribers on any server
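On the client side, listing several cluster members gives automatic failover. A small sketch (hostnames are placeholders):

// The client tries the listed servers in turn, reconnects on failure, and
// also discovers additional cluster members from the servers' INFO updates.
nc, err := nats.Connect(
	"nats://server1:4222,nats://server2:4222,nats://server3:4222",
	nats.MaxReconnects(-1),            // never stop retrying
	nats.ReconnectWait(2*time.Second), // pause between attempts
)
if err != nil {
	log.Fatal(err)
}
defer nc.Close()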
Leaf nodes for edge and hybrid:
Leaf nodes connect to a central cluster without becoming full members. They are ideal for edge sites and IoT gateways that need local messaging even when the uplink is intermittent, extending the hub without joining its full mesh (see the config sketch and topology diagram below).
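A minimal leaf-node configuration sketch (the hub hostname is a placeholder; 7422 is the conventional leaf-node port). The edge server keeps serving its local clients even if the uplink to the hub drops:

# Hub cluster: accept leaf node connections
leafnodes {
  port: 7422
}

# Edge server: connect up to the hub as a leaf node
leafnodes {
  remotes: [
    { url: "nats-leaf://hub.example.com:7422" }
  ]
}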
NATS Deployment Topologies

1. SINGLE CLUSTER (Full Mesh)
+---------------------------------------+
| Data Center                           |
|  +------+   +------+   +------+       |
|  |NATS 1|←→|NATS 2|←→|NATS 3|         |
|  +------+   +------+   +------+       |
|     ↑          ↑          ↑           |
|   Clients connect to any server       |
+---------------------------------------+

2. LEAF NODES (Edge Extension)
+------------------+   +------------------+
| Edge Site A      |   | Edge Site B      |
|  +------------+  |   |  +------------+  |
|  | Leaf Node  |  |   |  | Leaf Node  |  |
|  +------------+  |   |  +------------+  |
|        ↑         |   |        ↑         |
|   IoT Devices    |   |   IoT Devices    |
+--------|---------+   +--------|---------+
         |                      |
         ↓   connect to hub     ↓
+---------------------------------------+
| Central Cluster                       |
| (Hub for all leaf nodes)              |
+---------------------------------------+

3. SUPERCLUSTERS (Global)
+------------------+   +------------------+
| US-EAST Cluster  |←→| EU-WEST Cluster   |
+------------------+   +------------------+
         ↑                      ↑
         ↓        gateway       ↓
+------------------+   +------------------+
| US-WEST Cluster  |←→| ASIA-PAC Cluster  |
+------------------+   +------------------+

JetStream replication:
For JetStream persistence, configure replication factor:
// Stream replicated across 3 servers
js.AddStream(&nats.StreamConfig{
Name: "ORDERS",
Replicas: 3, // Raft consensus for durability
Storage: nats.FileStorage,
})
Deployment options:
| Environment | Approach |
|---|---|
| Development | Single server binary |
| Kubernetes | Helm chart, NATS Operator |
| Cloud VMs | Clustered servers with monitoring |
| Edge/IoT | Leaf nodes connecting to central cluster |
| Synadia Cloud | Fully managed NATS service |
The NATS Kubernetes Operator manages NATS clusters declaratively. It handles cluster formation, certificate rotation, and JetStream configuration. Combined with NATS's small footprint, it's an excellent fit for service mesh sidecar patterns.
NATS's unique combination of simplicity and performance makes it ideal for specific architectural patterns.
Optimal use cases:
| Requirement | NATS | Alternative |
|---|---|---|
| Ultra-low latency | ✓ Best choice | Consider ZeroMQ if even lower needed |
| Simple operations | ✓ Best choice | SQS if managed preferred |
| Complex routing | Limited | RabbitMQ |
| Long-term replay | JetStream (good) | Kafka (more mature) |
| IoT/Edge | ✓ Best choice (leaf nodes) | MQTT in some cases |
| Enterprise integration | Limited | RabbitMQ, IBM MQ |
Choose NATS when you value simplicity, need sub-millisecond latency, operate in resource-constrained environments (edge/IoT), or want Kubernetes-native messaging without operational complexity. It's particularly compelling for teams building from scratch who can design around its subject-based model.
NATS represents a return to messaging fundamentals: simplicity, speed, and operational efficiency above all. With JetStream, it offers the best of both worlds—ephemeral and persistent messaging in one system.
You now understand NATS's lightweight, high-performance messaging model and its role in cloud-native architectures. Next, we'll synthesize everything learned about Kafka, RabbitMQ, SQS, and NATS to develop a framework for choosing the right messaging system for your needs.