At the core of every service discovery system lies a service registry—a distributed database that maintains the inventory of available service instances. The registry must answer a deceptively simple question: "Where are the instances of service X right now?"
But the simplicity is deceiving. The registry must:

- Stay current as instances are added, removed, scaled, and redeployed
- Detect and remove failed instances before clients are routed to them
- Remain available and correct even when some of its own nodes fail
- Serve a high volume of lookups with low latency
Three systems have emerged as the dominant production service registries: Consul, etcd, and Apache Zookeeper. Each has distinct design philosophies, trade-offs, and sweet spots.
By the end of this page, you will deeply understand the architecture of each registry, their consistency models and CAP theorem positioning, operational characteristics and failure modes, when to choose each for your specific requirements, and how to evaluate registries for new projects.
Apache Zookeeper was created at Yahoo in the mid-2000s to solve coordination challenges in their distributed systems. It became a foundational component of the Hadoop ecosystem and remains widely deployed, particularly in organizations using Apache Kafka, HBase, or Solr.
Design Philosophy
Zookeeper is fundamentally a distributed coordination service, not specifically a service registry. It provides low-level primitives from which higher-level coordination patterns (including service discovery) can be built:

- Znodes: a hierarchical namespace of small data nodes, similar to a filesystem
- Ephemeral nodes: znodes that are deleted automatically when the creating client's session ends
- Sequential nodes: znodes with monotonically increasing suffixes, useful for ordering and leader election
- Watches: notifications that fire when a znode or its children change
Service discovery is built on these primitives: services create ephemeral nodes under a path like /services/payment/instances/, and clients watch that path for changes.
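For a concrete feel of the pattern, here is a minimal registration-and-watch sketch in TypeScript, assuming the community node-zookeeper-client package (its exact API, and the pre-created parent znodes, are assumptions). The paths mirror the tree below:

```typescript
// Minimal sketch: register this instance as an ephemeral znode and watch the
// parent path for membership changes. Assumes the community
// "node-zookeeper-client" package and that the persistent parent znodes
// (/services/payment-service/instances) already exist.
import zookeeper from 'node-zookeeper-client';

const client = zookeeper.createClient('zk1:2181,zk2:2181,zk3:2181');

client.once('connected', () => {
  const instancePath = '/services/payment-service/instances/payment-service-001';
  const data = Buffer.from(JSON.stringify({ host: '172.31.1.10', port: 8080 }));

  // EPHEMERAL: the znode is removed automatically if this client's session ends.
  client.create(instancePath, data, zookeeper.CreateMode.EPHEMERAL, (err) => {
    if (err) throw err;
  });

  // Zookeeper watches are one-shot, so the watcher re-arms itself on every event.
  const watchInstances = () => {
    client.getChildren(
      '/services/payment-service/instances',
      () => watchInstances(), // fired on membership change; re-register the watch
      (err, children) => {
        if (!err) console.log('current instances:', children);
      },
    );
  };
  watchInstances();
});

client.connect();
```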
```
/services
  /payment-service
    /instances
      /payment-service-001 (ephemeral)
        {"host": "172.31.1.10", "port": 8080, "version": "2.3.1"}
      /payment-service-002 (ephemeral)
        {"host": "172.31.1.11", "port": 8080, "version": "2.3.1"}
  /inventory-service
    /instances
      /inventory-service-001 (ephemeral)
        {"host": "172.31.2.10", "port": 8081, "version": "1.5.0"}
  /catalog-service
    /instances
      /catalog-service-001 (ephemeral)
        {"host": "172.31.3.10", "port": 8082, "version": "3.1.0"}
      /catalog-service-002 (ephemeral)
        {"host": "172.31.3.11", "port": 8082, "version": "3.1.0"}
```

Architecture
┌─────────────────────────────────────────────────────────┐
│ Zookeeper Ensemble │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Leader │ │ Follower │ │ Follower │ │
│ │ (Writes) │◄─┤ (Reads) │◄─┤ (Reads) │ │
│ │ │ │ │ │ │ │
│ └──────▲──────┘ └──────▲──────┘ └──────▲──────┘ │
│ │ │ │ │
│ │ ZAB Protocol (Consensus) │ │
│ └────────────────┴────────────────┘ │
└─────────────────────────────────────────────────────────┘
▲ ▲ ▲
│ │ │
┌──────┴──────┐ ┌──────┴──────┐ ┌──────┴──────┐
│ Service │ │ Service │ │ Client │
│ Instance │ │ Instance │ │ │
└─────────────┘ └─────────────┘ └─────────────┘
Zookeeper runs as an ensemble of servers (typically 3 or 5) that elect a leader. The leader handles all write operations and broadcasts them to followers using the ZAB consensus protocol. Reads can be served by any server, though by default they may return slightly stale data (for performance). Clients can request linearizable reads from the leader if strict consistency is required.
Key Characteristics:

- Writes are linearizable via the ZAB protocol; reads are served locally and may be slightly stale unless the client issues a sync
- Ephemeral znodes tied to client sessions provide built-in liveness: if an instance's session expires, its registration disappears
- Watches are one-shot and must be re-registered after every notification
- JVM-based server with mature client libraries; most teams build on Apache Curator rather than the raw API
- No native health checking or DNS interface; those capabilities must be built on top
Apache Curator is a high-level library built on Zookeeper that provides recipes for common patterns: service discovery, leader election, distributed locks, etc. If you use Zookeeper for service discovery, you'll likely use Curator's Service Discovery recipe rather than building on raw Zookeeper primitives.
etcd (pronounced "et-see-dee") was created by CoreOS in 2013 as a distributed key-value store for their container infrastructure. When Kubernetes adopted etcd as its data store, etcd became one of the most critical components in the cloud-native ecosystem.
Design Philosophy
etcd is designed as a simple, reliable key-value store with strong consistency. Unlike Zookeeper's hierarchical focus, etcd emphasizes:

- A flat key space with efficient prefix queries instead of a tree of znodes
- A small, well-defined gRPC API (KV, Watch, Lease), plus an HTTP/JSON gateway
- Strong consistency via Raft, with MVCC revisions underpinning reliable watches
- Leases with TTLs as the building block for liveness and ephemeral registrations
- Operational simplicity: a single Go binary with first-class Prometheus metrics
```bash
# Register a service instance with a lease (TTL)

# Grant a lease with 30-second TTL
$ etcdctl lease grant 30
lease 694d7e04c3d10f01 granted with TTL(30s)

# Register service instance under the lease
$ etcdctl put /services/payment-service/instances/i-abc123 \
    '{"host":"172.31.1.10","port":8080,"version":"2.3.1"}' \
    --lease=694d7e04c3d10f01

# Keep the lease alive (run in background)
$ etcdctl lease keep-alive 694d7e04c3d10f01

# Discover all instances of a service (prefix query)
$ etcdctl get /services/payment-service/instances --prefix
/services/payment-service/instances/i-abc123
{"host":"172.31.1.10","port":8080,"version":"2.3.1"}
/services/payment-service/instances/i-def456
{"host":"172.31.1.11","port":8080,"version":"2.3.1"}

# Watch for changes (reactive discovery)
$ etcdctl watch /services/payment-service/instances --prefix
PUT
/services/payment-service/instances/i-ghi789
{"host":"172.31.1.12","port":8080,"version":"2.3.1"}
DELETE
/services/payment-service/instances/i-abc123
```

Architecture
┌─────────────────────────────────────────────────────────┐
│ etcd Cluster │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Leader │ │ Follower │ │ Follower │ │
│ │ │◄─┤ │◄─┤ │ │
│ │ bbolt store │ │ bbolt store │ │ bbolt store │ │
│ └──────▲──────┘ └──────▲──────┘ └──────▲──────┘ │
│ │ │ │ │
│ │ Raft Protocol (Consensus) │ │
│ └────────────────┴────────────────┘ │
└─────────────────────────────────────────────────────────┘
▲ ▲ ▲
gRPC │ gRPC │ gRPC │
┌──────┴──────┐ ┌──────┴──────┐ ┌──────┴──────┐
│ Service │ │ Service │ │ Client │
│ Instance │ │ Instance │ │ │
└─────────────┘ └─────────────┘ └─────────────┘
etcd uses the Raft consensus algorithm for leader election and log replication. All writes go through the leader, and a quorum of nodes must acknowledge writes before they're committed. Each node stores data in an embedded bbolt database, providing persistence.
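The same lease-based registration shown with etcdctl above can be driven from application code. A minimal sketch, assuming the community etcd3 Node client (method names are an assumption and may differ by version):

```typescript
// Lease-based registration and prefix discovery against etcd v3.
// Assumes the community "etcd3" npm client; endpoints and keys are examples.
import { Etcd3 } from 'etcd3';

const client = new Etcd3({ hosts: ['http://etcd-1:2379'] });

async function register(): Promise<void> {
  // A 30-second lease; the client library sends keep-alives while the process lives.
  const lease = client.lease(30);
  await lease
    .put('/services/payment-service/instances/i-abc123')
    .value(JSON.stringify({ host: '172.31.1.10', port: 8080, version: '2.3.1' }));
}

async function discover(): Promise<Record<string, string>> {
  // Prefix query returns every registered instance of the service.
  return client.getAll().prefix('/services/payment-service/instances/').strings();
}

async function watchInstances(): Promise<void> {
  // Streaming watch: react to instances joining (put) or expiring (delete).
  const watcher = await client
    .watch()
    .prefix('/services/payment-service/instances/')
    .create();
  watcher.on('put', (kv) => console.log('instance up:', kv.key.toString()));
  watcher.on('delete', (kv) => console.log('instance gone:', kv.key.toString()));
}
```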
Key Characteristics:

- Reads and writes are linearizable by default; serializable reads are available when lower latency matters more than freshness
- Leases replace Zookeeper-style ephemeral nodes: keys attached to an expired lease are deleted automatically
- Watches are streaming and resumable from a specific revision, so clients can catch up after a disconnect
- Distributed as a single static binary, with Prometheus metrics exposed out of the box
- Backed by a storage quota (2 GB by default, configurable), so it is intended for metadata, not bulk data
If you're running on Kubernetes, you're already running etcd (it stores all Kubernetes cluster state). However, using the same etcd cluster for both Kubernetes and application service discovery is risky—etcd issues would impact the Kubernetes control plane. Consider a separate etcd cluster for application use, or use Kubernetes-native discovery (Services).
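For contrast, "Kubernetes-native discovery" often amounts to calling a Service by its cluster DNS name; a minimal sketch (the payment-service Service and payments namespace are hypothetical):

```typescript
// Kubernetes Services give every service a stable DNS name resolved by CoreDNS;
// kube-proxy load-balances across the healthy Pods behind the Service.
// "payment-service" and the "payments" namespace are hypothetical names.
async function chargePayment(orderId: string): Promise<Response> {
  return fetch('http://payment-service.payments.svc.cluster.local:8080/charge', {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ orderId }),
  });
}
```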
HashiCorp Consul, released in 2014, took a different approach than Zookeeper and etcd. While those systems provide generic coordination primitives, Consul is a purpose-built service networking platform that includes service discovery as a first-class feature.
Design Philosophy
Consul is designed as a complete solution for service networking:

- A first-class service catalog rather than a generic KV store (though a KV store is included)
- Built-in health checking (HTTP, TCP, gRPC, script, and TTL checks) whose results drive discovery
- Both HTTP and DNS interfaces for querying services
- Multi-datacenter federation as a first-class feature
- Optional service mesh capabilities (Consul Connect) with mutual TLS and intentions
```json
{
  "service": {
    "name": "payment-service",
    "id": "payment-service-i-abc123",
    "address": "172.31.1.10",
    "port": 8080,
    "tags": ["production", "v2.3.1", "us-east-1a"],
    "meta": {
      "version": "2.3.1",
      "protocol": "grpc",
      "owner": "payments-team"
    },
    "checks": [
      {
        "id": "http-check",
        "name": "HTTP Health Check",
        "http": "http://172.31.1.10:8080/health",
        "interval": "10s",
        "timeout": "3s"
      },
      {
        "id": "tcp-check",
        "name": "TCP Port Check",
        "tcp": "172.31.1.10:8080",
        "interval": "5s",
        "timeout": "1s"
      }
    ],
    "weights": {
      "passing": 100,
      "warning": 50
    }
  }
}
```

Architecture
┌─────────────────────── Datacenter 1 ───────────────────────┐
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Consul Server Cluster │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ Leader │ │Follower │ │Follower │ │ │
│ │ │ │◄──►│ │◄──►│ │ │ │
│ │ └────▲────┘ └────▲────┘ └────▲────┘ │ │
│ │ │ Raft │ │ │ │
│ └───────┼───────────────┼──────────────┼───────────────┘ │
│ │ │ │ │
│ ┌───────┼───────────────┼──────────────┼───────────────┐ │
│ │ ▼ ▼ ▼ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ Consul │ │ Consul │ │ Consul │ │ │
│ │ │ Agent │ │ Agent │ │ Agent │ │ │
│ │ │(Client) │ │(Client) │ │(Client) │ │ │
│ │ └────┬─────┘ └────┬─────┘ └────┬─────┘ │ │
│ │ │ │ │ │ │
│ │ ┌────▼─────┐ ┌────▼─────┐ ┌────▼─────┐ │ │
│ │ │ Service │ │ Service │ │ Service │ │ │
│ │ │ Instance │ │ Instance │ │ Instance │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ │ │
│ └──────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────┘
▲
│ WAN Gossip
▼
┌─────────────────────── Datacenter 2 ───────────────────────┐
│ (Similar structure with own server cluster) │
└────────────────────────────────────────────────────────────┘
Consul has a unique two-tier architecture:

- Servers (typically 3-5 per datacenter) form a Raft cluster that stores the service catalog and KV data
- Client agents run on every node, register local services, execute their health checks, and forward queries to the servers

Within a datacenter, agents communicate via a gossip protocol (Serf). Across datacenters, only servers communicate via WAN gossip. This architecture enables massive scale—you can have thousands of client agents without overwhelming the server cluster.
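Because an agent runs on every node, registration is usually a call to the agent on localhost; the agent runs the health checks itself and syncs results to the servers. A minimal sketch against Consul's agent HTTP API (the default port 8500 and the example values are assumptions):

```typescript
// Register this instance with the local Consul agent. The agent executes the
// health check and reports the result to the server cluster.
async function registerWithLocalAgent(): Promise<void> {
  const res = await fetch('http://127.0.0.1:8500/v1/agent/service/register', {
    method: 'PUT',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({
      Name: 'payment-service',
      ID: 'payment-service-i-abc123',
      Address: '172.31.1.10',
      Port: 8080,
      Tags: ['production', 'v2.3.1'],
      Check: {
        HTTP: 'http://172.31.1.10:8080/health',
        Interval: '10s',
        Timeout: '3s',
        // Remove the instance if it stays critical, avoiding zombie entries.
        DeregisterCriticalServiceAfter: '1m',
      },
    }),
  });
  if (!res.ok) throw new Error(`register failed: ${res.status}`);
}
```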
```bash
# Query services via HTTP API
$ curl http://localhost:8500/v1/catalog/service/payment-service
[
  {
    "ID": "i-abc123",
    "Node": "node-1",
    "Address": "172.31.1.10",
    "ServiceName": "payment-service",
    "ServicePort": 8080,
    "ServiceTags": ["production", "v2.3.1"],
    "ServiceMeta": {"version": "2.3.1"}
  }
]

# Query only healthy instances
$ curl http://localhost:8500/v1/health/service/payment-service?passing=true

# DNS-based discovery (SRV records)
$ dig @localhost -p 8600 payment-service.service.consul SRV
;; ANSWER SECTION:
payment-service.service.consul. 0 IN SRV 1 1 8080 i-abc123.node.dc1.consul.

# DNS-based discovery (A records)
$ dig @localhost -p 8600 payment-service.service.consul
;; ANSWER SECTION:
payment-service.service.consul. 0 IN A 172.31.1.10
payment-service.service.consul. 0 IN A 172.31.1.11
```

Consul excels when you need service discovery beyond Kubernetes (multi-platform, VMs, bare metal, or multi-cloud), when built-in health checking is valuable, when DNS-based discovery simplifies integration, or when you're considering a service mesh but aren't on Kubernetes and don't want Istio's complexity.
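Application code can also consume the DNS interface directly; a minimal sketch using Node's built-in resolver pointed at the agent's DNS port (8600 is Consul's default):

```typescript
// Resolve healthy payment-service instances via Consul DNS (SRV records carry ports).
import { Resolver } from 'node:dns/promises';

const resolver = new Resolver();
resolver.setServers(['127.0.0.1:8600']); // local Consul agent's DNS endpoint

async function lookupPaymentService(): Promise<Array<{ host: string; port: number }>> {
  const records = await resolver.resolveSrv('payment-service.service.consul');
  // Each SRV record names a node; resolve the node name to an address before dialing.
  return Promise.all(
    records.map(async (srv) => {
      const [host] = await resolver.resolve4(srv.name);
      return { host, port: srv.port };
    }),
  );
}
```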
Let's systematically compare these three registries across the dimensions that matter for production deployments.
| Dimension | Zookeeper | etcd | Consul |
|---|---|---|---|
| Primary Purpose | Coordination primitives | Key-value store | Service networking platform |
| Consensus Protocol | ZAB | Raft | Raft |
| Consistency Model | Linearizable writes, sequential reads | Linearizable | Linearizable (default) |
| Data Model | Hierarchical znodes | Flat key-value | Service catalog + KV |
| Native Health Checking | No | No | Yes |
| DNS Interface | No | No | Yes |
| Multi-Datacenter | Limited | Limited | First-class |
| Service Mesh | No | No | Yes (Consul Connect) |
| Typical Cluster Size | 3-5 nodes | 3-5 nodes | 3-5 servers + many agents |
| Client Architecture | Direct connection | Direct connection | Local agent |
| Characteristic | Zookeeper | etcd | Consul |
|---|---|---|---|
| Operational Complexity | High | Medium | Medium-High |
| Resource Footprint | Medium | Low | Medium (with agents) |
| Upgrade Difficulty | Medium | Low | Medium |
| Monitoring/Observability | Good (many metrics) | Excellent (Prometheus) | Excellent (built-in UI) |
| Documentation Quality | Good | Excellent | Excellent |
| Community Activity | Active (Apache) | Very Active (CNCF) | Active (HashiCorp) |
| Commercial Support | Confluent, Cloudera | CNCF ecosystem | HashiCorp Enterprise |
Performance Considerations
Benchmark data varies by workload, but some general characteristics hold:

Read Performance:

- Zookeeper serves reads from any server by default (possibly slightly stale), giving very high read throughput
- etcd reads are linearizable by default, which adds a leader check; serializable reads trade freshness for lower latency
- Consul defaults to consistent reads through the leader but supports stale reads from any server, and client agents can cache results

Write Performance:

- All three commit writes only after a quorum acknowledges them, so write latency is dominated by disk fsync and network round trips
- Adding nodes does not increase write throughput; it only adds replication work

Watch/Notification:

- Zookeeper watches are one-shot and must be re-registered after each event
- etcd watches are streaming and can resume from a specific revision, so clients don't miss changes
- Consul uses blocking queries (long polling) on its HTTP API rather than a push-based watch
Published benchmarks should be viewed skeptically. Performance depends heavily on workload patterns, network topology, hardware, and configuration. The differences between these systems are usually smaller than the impact of proper tuning. Test with YOUR workload before making decisions based on benchmarks.
Selecting a service registry isn't primarily a performance decision—it's about fit with your ecosystem, team expertise, and requirements profile.
The Kubernetes Consideration
If you're running on Kubernetes, the calculus changes significantly:
Kubernetes already provides:

- A registry: the API server (backed by etcd) tracks every Pod and Service
- Service objects with stable cluster IPs and load balancing via kube-proxy
- DNS-based discovery through CoreDNS (service-name.namespace.svc.cluster.local)
- Readiness probes that automatically remove unhealthy Pods from Endpoints/EndpointSlices

When you might still need a registry:

- Workloads outside the cluster (VMs, bare metal, legacy systems) must be discoverable alongside Pods
- You need discovery across multiple clusters or a hybrid/multi-cloud estate
- You want richer health checks, metadata, or a KV store that Kubernetes Services don't provide
- You're adopting a service mesh (such as Consul) that spans platforms
For pure Kubernetes environments, the default answer is increasingly: just use Kubernetes native discovery. External registries add operational burden without proportional benefit.
If you're starting fresh with no existing registry expertise: On Kubernetes, use native Kubernetes Services. Off Kubernetes with service discovery needs, Consul is often the best fit due to its purpose-built features. If you just need a consistent KV store, etcd is simplest. Only choose Zookeeper if you're already in its ecosystem.
Running a service registry in production requires attention to several critical operational concerns.
1. Cluster Sizing
All three systems typically run 3 or 5 node clusters:

- 3 nodes: quorum of 2, tolerates 1 failure; sufficient for most deployments
- 5 nodes: quorum of 3, tolerates 2 simultaneous failures; use when stricter availability is required
- 7 or more: rarely worthwhile, since every additional voter adds consensus overhead
Even numbers don't help—quorum requirements mean 4 nodes and 3 nodes tolerate the same number of failures, but 4 nodes have higher coordination overhead.
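The arithmetic behind that rule, as a quick illustration:

```typescript
// Quorum math: a cluster of n voters needs floor(n/2) + 1 acknowledgments,
// so it tolerates n - quorum failures.
function faultTolerance(n: number): { quorum: number; tolerates: number } {
  const quorum = Math.floor(n / 2) + 1;
  return { quorum, tolerates: n - quorum };
}

for (const n of [3, 4, 5, 7]) {
  const { quorum, tolerates } = faultTolerance(n);
  console.log(`${n} nodes: quorum ${quorum}, tolerates ${tolerates} failure(s)`);
}
// 3 nodes: quorum 2, tolerates 1 failure(s)
// 4 nodes: quorum 3, tolerates 1 failure(s)  <- no better than 3 nodes
// 5 nodes: quorum 3, tolerates 2 failure(s)
// 7 nodes: quorum 4, tolerates 3 failure(s)
```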
2. Hardware Recommendations
Service registries are latency-sensitive: every write must be fsynced to disk and acknowledged by a quorum before it commits, so slow disks or high network latency inflate write latency and can destabilize leader elections. General guidance:
| Component | Zookeeper | etcd | Consul Server |
|---|---|---|---|
| CPU | 2-4 cores | 2-4 cores | 2-4 cores |
| Memory | 4-8 GB | 2-8 GB | 4-8 GB |
| Storage | SSD, 20-50 GB | SSD, 20-50 GB | SSD, 20-50 GB |
| Network | 1 Gbps, low latency | 1 Gbps, low latency | 1 Gbps, low latency |
| IOPS | 500+ | 500+ | 500+ |
3. Monitoring and Alerting
Critical metrics to track:

- Leader status and the rate of leader changes/elections (frequent elections signal instability)
- Quorum health: the number of healthy voting members
- Commit/apply latency and disk sync (WAL fsync) duration
- Failed or pending proposals and request error rates
- Client connections, watch counts, and memory usage
- Data size versus any configured quota (especially for etcd)

Alert on:

- Loss of quorum or sustained leader churn
- Write or disk-sync latency well above its normal baseline
- Any registry member down or unreachable
- Storage approaching its quota or the disk filling up
4. Backup and Recovery
All three systems require backup strategies:

- Zookeeper: copy snapshots and transaction logs from the data directory on a schedule
- etcd: take snapshots with etcdctl snapshot save, verify them, and ship them off-cluster
- Consul: use consul snapshot save, which captures the servers' Raft state (catalog, KV, ACLs)

The etcd workflow looks like this:
```bash
# Create snapshot backup
$ etcdctl snapshot save /backup/etcd-snapshot-$(date +%Y%m%d).db

# Verify snapshot
$ etcdctl snapshot status /backup/etcd-snapshot-20240115.db
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 3c9cd0d7 |  152843  |       1250 |     2.1 MB |
+----------+----------+------------+------------+

# Restore from snapshot (on new cluster)
$ etcdctl snapshot restore /backup/etcd-snapshot-20240115.db \
    --name node1 \
    --initial-cluster node1=https://node1:2380,node2=https://node2:2380,node3=https://node3:2380 \
    --initial-cluster-token etcd-cluster-1 \
    --initial-advertise-peer-urls https://node1:2380
```

Service registry failure can cascade into a system-wide outage. Treat your registry with the same care as your database: invest in monitoring, alerting, runbooks, and regular failure drills. When the registry is down, your services can't find each other.
Organizations sometimes need to migrate between registries as requirements evolve. This is a high-risk operation requiring careful planning.
Common Migration Scenarios:

- Zookeeper to etcd or Consul, when a legacy ensemble is kept alive only for service discovery
- A dedicated registry to Kubernetes-native discovery after workloads move into a cluster
- Consolidating multiple registries after acquisitions or platform reorganizations
Migration Strategy: Dual-Write/Dual-Read
The safest migration approach:
Phase 1: Dual-Write
Register every service in both the old and the new registry while clients continue to read only from the old one. Compare the two registries' views until they consistently match.

Phase 2: Dual-Read
Clients query the new registry first and fall back to the old one on errors or empty results. Watch the fallback rate; it should trend toward zero.

Phase 3: Cutover
Switch clients to read exclusively from the new registry while still dual-writing, so rolling back remains a configuration change rather than a migration.

Phase 4: Cleanup
Stop registering to the old registry, decommission it, and delete the dual-write and fallback code paths.
```typescript
class MigrationDiscoveryClient {
  private primaryRegistry: ServiceRegistry;
  private fallbackRegistry: ServiceRegistry;
  private migrationPhase: 'dual-write' | 'dual-read' | 'primary-only';
  // Metrics and logging dependencies, typed structurally from their usage below
  private metrics: { increment(name: string): void };
  private log: { warn(message: string, context?: unknown): void };

  async discoverService(serviceName: string): Promise<ServiceInstance[]> {
    switch (this.migrationPhase) {
      case 'dual-write':
        // Still reading from old (fallback) registry
        return this.fallbackRegistry.discover(serviceName);

      case 'dual-read':
        // Try new registry first, fall back if needed
        try {
          const instances = await this.primaryRegistry.discover(serviceName);
          if (instances.length > 0) {
            return instances;
          }
        } catch (error) {
          this.metrics.increment('discovery.primary.failures');
        }
        // Fallback to old registry
        return this.fallbackRegistry.discover(serviceName);

      case 'primary-only':
        return this.primaryRegistry.discover(serviceName);
    }
  }

  async registerService(service: ServiceDefinition): Promise<void> {
    // Always register to primary
    await this.primaryRegistry.register(service);

    // Also register to fallback during migration
    if (this.migrationPhase !== 'primary-only') {
      try {
        await this.fallbackRegistry.register(service);
      } catch (error) {
        // Don't fail if fallback registration fails
        this.log.warn('Fallback registration failed', { error });
      }
    }
  }
}
```

Registry migrations are high-risk operations. Plan for weeks of dual-running, extensive testing, and rollback capability. Never rush a registry migration—the blast radius of failure is your entire service mesh.
We've deeply examined the three major service registries that power production distributed systems. Let's consolidate the essential insights:

- Zookeeper offers low-level coordination primitives (ephemeral znodes, watches) from which discovery is built, usually via Apache Curator; choose it mainly when you're already in its ecosystem.
- etcd is a simple, strongly consistent key-value store with leases and streaming watches; it backs Kubernetes itself and fits best when you primarily need a reliable KV store.
- Consul is a purpose-built service networking platform: service catalog, native health checks, DNS interface, multi-datacenter federation, and an optional service mesh.
- Choose based on ecosystem fit, team expertise, and operational requirements rather than benchmarks; the performance differences are usually smaller than the impact of tuning.
- Operate the registry like a critical database: size and monitor it carefully, back it up, rehearse failures, and plan any migration as a dual-write/dual-read project.
What's Next:
Now that you understand dedicated service registries, we'll explore a ubiquitous alternative: DNS-based service discovery. DNS is the original distributed naming system, and modern systems use it creatively for service discovery. In the next page, we'll examine how DNS works for discovery, its limitations, and when to use DNS versus registry-based approaches.
You now have comprehensive knowledge of the major service registries—Zookeeper, etcd, and Consul. You understand their architectures, trade-offs, and when to choose each. This knowledge enables you to make informed decisions about service discovery infrastructure.