Elasticsearch is distributed by design, but distribution doesn't automatically equal scalability. A poorly-configured 100-node cluster can be slower than a well-tuned 5-node cluster. Scaling Elasticsearch effectively requires understanding the bottlenecks, applying appropriate configurations, and monitoring the right metrics.
The journey from development (single node, gigabytes of data) to production (multi-cluster, petabytes of data) involves navigating many decisions: hardware selection, shard sizing, JVM tuning, index lifecycle management, and operational automation. Each decision affects performance, reliability, and cost.
This page provides the comprehensive guide to scaling Elasticsearch, consolidating lessons from organizations running some of the world's largest Elasticsearch deployments.
By the end of this page, you will understand: capacity planning methodology; hardware selection and sizing; JVM and OS tuning; index lifecycle management; performance monitoring; and operational best practices for production Elasticsearch clusters.
Capacity planning is the foundation of reliable Elasticsearch deployments. Under-provisioning causes outages during traffic spikes; over-provisioning wastes money. The goal is right-sizing—having enough capacity for expected load plus headroom for growth and failures.
The capacity planning methodology:
1. Quantify the workload: daily ingestion volume, retention period, replica count, query rate, and peak concurrency.
2. Calculate storage requirements: raw data × retention × (1 + replicas), plus overhead for index structures and free-space headroom.
3. Determine memory requirements: JVM heap for cluster data structures plus OS file cache for Lucene segments on each node.
4. Calculate throughput capacity: indexing and search throughput per node, then the node count needed to absorb peak load with headroom.
| Resource | Guideline | Notes |
|---|---|---|
| Storage per node | 2-10TB | SSD strongly recommended |
| JVM heap | 50% of RAM, max 31GB | Above 31GB, pointer compression disabled |
| Shards per node | < 600, ideally < 200 | More shards = more overhead |
| Shard size | 10-50GB | Smaller = more parallelism, larger = less overhead |
| Shards per GB heap | < 100 | For stable cluster state |
| CPU cores | 8-32 per node | More for aggregation-heavy workloads |
| Network | 10Gbps+ | For fast recovery and rebalancing |
```
Workload:
- 500GB/day ingestion
- 90 days retention = 45TB total data
- 1 replica = 90TB storage (2 copies)
- 50% overhead for indexes = 135TB storage needed

Storage sizing:
- If using 8TB disks at 75% utilization (6TB usable)
- Need: 135TB / 6TB = 23 data nodes minimum

Memory sizing:
- 23 nodes × 64GB RAM = 1.5TB cluster memory
- 23 nodes × 30GB heap = 690GB heap
- 23 nodes × 34GB OS cache = 782GB for file cache

Shard sizing:
- 45TB data / 30GB per shard = 1500 primary shards
- With replica = 3000 total shards
- 3000 shards / 23 nodes = 130 shards per node ✓

Add headroom:
- 30% for growth/spikes = 30 data nodes
- 3 dedicated master nodes = 33 total nodes
```
Capacity calculations are estimates. Before production launch, run load tests with realistic data and query patterns. Tools like Rally (Elasticsearch's benchmarking tool) simulate realistic workloads and report throughput, latency, and resource utilization.
Different node roles have different hardware requirements. Optimizing hardware by role reduces cost while improving performance.
Master nodes:
Master nodes manage cluster state but don't store data. They need modest CPU and memory, minimal storage, and stable, low-latency network connectivity; reliability matters more than raw capacity.
Dedicated masters are essential for production. Use 3 (or 5 for larger clusters) to maintain quorum during failures.
Storage considerations:
SSD vs HDD: SSDs are strongly recommended for all Elasticsearch workloads. Lucene's segment-based architecture involves random I/O during searches and significant sequential writes during indexing. SSDs provide far higher random-read IOPS, faster segment merges, and quicker shard recovery.
HDDs may be acceptable only for cold/frozen tiers where access is infrequent.
Local vs Network storage: Local storage (direct-attached) is preferred for performance. Network-attached storage (NFS, EBS gp3) adds latency and introduces shared failure domains. If using cloud block storage, provision IOPS appropriately—default IOPS is often insufficient.
Cloud instance selection:
| Provider | Master Nodes | Hot Data Nodes | Warm/Cold Data Nodes |
|---|---|---|---|
| AWS | m6i.xlarge | i3.2xlarge, i3en.2xlarge | d3.2xlarge |
| GCP | e2-medium | n2-highmem-8 + local SSD | n2-highmem-8 + pd-balanced |
| Azure | Standard_D4s_v3 | Standard_L8s_v2 | Standard_E8s_v4 |
Network bandwidth is often overlooked. During recovery or rebalancing, nodes stream segment files at maximum speed. A 1Gbps network can take hours to recover a 500GB shard. Use 10Gbps+ for production clusters. In cloud environments, choose instances with high network bandwidth.
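If the network can sustain it, recovery bandwidth can be raised above the defaults with dynamic cluster settings; a minimal sketch (the values shown are illustrative, not recommendations):

```
PUT /_cluster/settings
{
  "persistent": {
    "indices.recovery.max_bytes_per_sec": "250mb",              // default is 40mb
    "cluster.routing.allocation.node_concurrent_recoveries": 4  // default is 2
  }
}
```

Raising these trades steady-state headroom for faster recovery, so they are worth testing under load before relying on them.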
Elasticsearch runs on the JVM, and proper JVM configuration is critical for stable performance. Additionally, operating system settings affect I/O performance and process limits.
JVM Heap sizing:
The most important JVM setting is heap size. The golden rules:
Rule 1: 50% of physical memory for heap, 50% for OS cache Elasticsearch uses both JVM heap (for data structures) and OS file cache (for Lucene segments). Both need adequate memory.
Rule 2: Never exceed 31GB heap Above 31GB, the JVM disables compressed ordinary object pointers (compressed OOPs), meaning object references use 8 bytes instead of 4. A 32GB heap actually has less usable space than 31GB.
Rule 3: Set Xms and Xmx to the same value This prevents heap resizing during operation, which can cause pauses.
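To confirm Rule 2 actually took effect, the node info API reports whether compressed object pointers are in use (Elasticsearch also logs this at startup); a quick check:

```
// Should report "true" on every node if the heap is under the threshold
GET /_nodes/jvm?filter_path=**.using_compressed_ordinary_object_pointers
```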
```
# Heap size - must be same for Xms and Xmx
-Xms30g
-Xmx30g

# G1GC is now the default (Elasticsearch 8+)
# No need to specify unless customizing

# Disable explicit GC calls (from application code)
-XX:+DisableExplicitGC

# Pre-touch memory pages at startup (faster startup)
-XX:+AlwaysPreTouch

# GC logging (useful for diagnosing issues)
-Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m
```
Operating system settings:
Elasticsearch requires several OS settings adjusted from defaults:
```
# /etc/security/limits.conf
elasticsearch - nofile 65535   # Open file descriptors
elasticsearch - nproc 4096     # Process/thread limit

# /etc/sysctl.conf
vm.max_map_count = 262144      # Memory mapping limit (required!)
vm.swappiness = 1              # Minimize swapping

# Disable swap entirely (or use mlockall)
sudo swapoff -a

# For Elasticsearch user, in /etc/security/limits.conf
elasticsearch - memlock unlimited

# In elasticsearch.yml, enable memory locking
bootstrap.memory_lock: true
```
Why disable swap?
Swapping JVM memory to disk causes severe performance degradation. Garbage collection operates on heap memory; if heap pages are swapped out, GC becomes I/O-bound. A GC pause that should take 100ms can take seconds.
bootstrap.memory_lock: true tells Elasticsearch to lock its memory in RAM, preventing the OS from swapping it. This requires the memlock ulimit to be set to unlimited.
If memory locking fails (due to insufficient ulimits), Elasticsearch may still start but without memory lock. Check node info API: GET /_nodes?filter_path=**.mlockall to verify memory is locked. Unlocked memory in production risks swap-induced freezes.
For time-series data (logs, metrics, events), data has different value and access patterns over time. Recent data is queried frequently and needs fast access. Older data is rarely accessed but may need retention for compliance.
Index Lifecycle Management (ILM) automates moving data through phases based on age or size, reducing costs while maintaining appropriate performance.
ILM phases:
Hot — Active indexing and frequent queries. Use fast SSDs, high-replica count.
Warm — No new indexing, occasional queries. Reduce replicas, move to slower storage.
Cold — Rare queries, archival access. Freeze indexes, searchable snapshots.
Frozen — Very rare access. Use searchable snapshots (data in object storage).
Delete — Remove data after retention period.
```
PUT /_ilm/policy/logs_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": { "max_size": "50gb", "max_age": "1d" },
          "set_priority": { "priority": 100 }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": { "number_of_shards": 1 },
          "forcemerge": { "max_num_segments": 1 },
          "allocate": {
            "require": { "tier": "warm" },
            "number_of_replicas": 0
          },
          "set_priority": { "priority": 50 }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "allocate": { "require": { "tier": "cold" } },
          "freeze": {},
          "set_priority": { "priority": 0 }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}
```
Key ILM actions:
rollover — Create a new index when conditions met (size, age, doc count). Essential for managing shard count over time.
shrink — Reduce primary shard count. Useful for warm phase where queries are less frequent.
forcemerge — Merge segments into fewer files. Optimizes storage and query performance for static indexes.
freeze — Mark index read-only and release memory. Frozen indexes have higher query latency but minimal resource usage.
allocate — Move index to different tier nodes using allocation filtering. The policy above filters on a custom node attribute; see the sketch after this list.
delete — Remove index entirely when retention expires.
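The "require": {"tier": "warm"} and {"tier": "cold"} clauses in the example policy assume a custom node attribute named tier, which each data node would have to declare; a minimal sketch of the corresponding elasticsearch.yml entries:

```
# On a warm-tier data node (attribute name must match the ILM policy)
node.attr.tier: warm

# On a cold-tier data node
node.attr.tier: cold
```

Newer Elasticsearch versions can instead use the built-in data tiers (node.roles such as data_warm and data_cold together with the index's _tier_preference), which removes the need for a custom attribute.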
```
// Create index template that uses ILM policy
PUT /_index_template/logs_template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "index.lifecycle.name": "logs_policy",
      "index.lifecycle.rollover_alias": "logs"
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "message": { "type": "text" }
      }
    }
  }
}

// Bootstrap the alias
PUT /logs-000001
{
  "aliases": {
    "logs": { "is_write_index": true }
  }
}

// Now write to 'logs' alias - ILM handles rollover automatically
```
Elasticsearch 7.9+ introduced Data Streams, which simplify time-series workflows. Data streams automatically handle rollover, naming, and alias management. For new deployments, prefer data streams over manual rollover alias patterns.
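For comparison, a data stream needs only a template with a data_stream block and no alias bootstrapping; a minimal sketch (the template name and index pattern are illustrative):

```
PUT /_index_template/logs_ds_template
{
  "index_patterns": ["logs-ds-*"],
  "data_stream": {},
  "template": {
    "settings": { "index.lifecycle.name": "logs_policy" },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "message": { "type": "text" }
      }
    }
  }
}

// The first write creates the data stream; rollover and backing-index naming are automatic
POST /logs-ds-app/_doc
{ "@timestamp": "2024-01-01T00:00:00Z", "message": "hello" }
```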
Effective monitoring is essential for maintaining healthy Elasticsearch clusters. The right metrics reveal problems before they cause outages, while wrong metrics create noise without insight.
Essential metrics to monitor:
```
// Cluster health
GET /_cluster/health

// Node statistics
GET /_nodes/stats/jvm,os,fs,indices

// Index statistics
GET /_stats/indexing,search,merge

// Thread pool status (queued/rejected)
GET /_cat/thread_pool?v&h=node_name,name,active,queue,rejected

// Hot threads (what's consuming CPU)
GET /_nodes/hot_threads

// Shard allocation explanation
GET /_cluster/allocation/explain
```
Query performance metrics:
Search latency — P50, P90, P99 query latencies. Track trends over time.
Query rate — Queries per second. Correlate with latency changes.
Slow queries — Enable slow log to capture queries exceeding thresholds:
```
PUT /products/_settings
{
  "index.search.slowlog.threshold.query.warn": "2s",
  "index.search.slowlog.threshold.query.info": "1s"
}
```
Search thread pool queues — If search queue grows, queries are waiting for resources. May need more nodes or query optimization.
Indexing performance metrics:
Indexing rate — Documents per second. Track against expected throughput.
Indexing latency — Time to index documents. Increases may indicate resource contention.
Bulk rejections — If bulk queue overflows, requests are rejected. Need slower ingestion or more capacity.
Merge time — Background merges are I/O intensive. Excessive merging indicates too many segments.
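To see whether merging is falling behind, per-shard segment counts and per-node merge stats are both visible through existing APIs; for example (the logs-* index pattern is illustrative):

```
// Segment count and size per shard
GET /_cat/segments/logs-*?v&h=index,shard,segment,size

// Cumulative merge activity per node
GET /_nodes/stats/indices/merge?filter_path=nodes.*.indices.merges
```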
Elastic provides free monitoring features in Kibana. Metricbeat can ship Elasticsearch metrics to a monitoring cluster. For production, always monitor Elasticsearch from a separate cluster—you don't want monitoring to fail when your primary cluster has issues.
Production Elasticsearch clusters encounter recurring performance patterns. Recognizing these patterns accelerates diagnosis and resolution.
Problem: Slow searches / high latency
Symptoms: Query latencies increasing, users complaining about slow results.
Common causes: heap pressure and long GC pauses; expensive queries (leading wildcards, deep pagination, heavy aggregations); searching too many shards per request; a cold or undersized filesystem cache.
Diagnosis:
```
// Check if heap is the problem
GET /_nodes/stats/jvm?filter_path=**.heap_used_percent

// Look for GC pressure
GET /_nodes/stats/jvm?filter_path=**.gc.collectors

// Enable slow query log
PUT /products/_settings
{
  "index.search.slowlog.threshold.query.warn": "2s",
  "index.search.slowlog.level": "info"
}

// Profile a specific query
POST /products/_search
{
  "profile": true,
  "query": { "match": { "description": "slow query" } }
}

// Check cache statistics
GET /_nodes/stats/indices/query_cache,request_cache
```
Problem: Indexing rejections / bulk failures
Symptoms: Bulk API returns 429 errors, documents not appearing.
Common causes: write thread pool saturation (the bulk queue overflows), bulk requests that are too large or too concurrent for the cluster, and disks that cannot keep up with segment merges.
Solutions: reduce bulk request size and client concurrency, add data nodes or faster storage, and spread writes across more shards or indexes. A quick way to confirm write-queue saturation is shown below.
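One way to check per-node write-queue pressure is the cat thread pool API, restricted to the write pool; a minimal check:

```
// Non-zero "rejected" counts confirm bulk requests are being turned away
GET /_cat/thread_pool/write?v&h=node_name,active,queue,rejected,completed
```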
Problem: Cluster state too large / master instability
Symptoms: Master nodes struggling, cluster state updates slow, frequent master elections.
Common causes: too many indexes and shards, very large mappings (field explosion from dynamic mapping), and frequent mapping or settings updates that force repeated cluster state publication.
Solutions: use dedicated master nodes, reduce index and shard counts (ILM rollover and shrink help here), keep mappings bounded, and batch settings changes. The request below is one way to gauge the scale of the problem.
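To gauge how many indexes and shards the cluster state is carrying, the cluster stats API gives a quick count:

```
GET /_cluster/stats?filter_path=indices.count,indices.shards
```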
| Symptom | Likely Cause | First Action |
|---|---|---|
| Yellow cluster | Missing replicas | Check node count, disk space |
| Red cluster | Missing primaries | Check node health, shard allocation |
| High GC time | Heap pressure | Reduce shards, add nodes |
| Slow queries | Large result sets | Profile queries, add filters |
| Bulk rejections | Thread pool saturation | Reduce bulk size, add nodes |
| Frequent master elections | Cluster state issues | Add dedicated masters, reduce indexes |
| High disk I/O | Merge storms | Reduce indexing rate, force merge old indexes |
Most Elasticsearch problems trace to insufficient resources (heap, disk, CPU) or excessive load (too many shards, too many indexes, too complex queries). Before investigating exotic causes, verify basics: is the cluster under-provisioned for its workload?
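A quick pass over per-node resources often answers that question; for example:

```
// Heap, memory, CPU, and load per node
GET /_cat/nodes?v&h=name,node.role,heap.percent,ram.percent,cpu,load_1m

// Disk usage and shard count per node
GET /_cat/allocation?v
```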
Beyond configuration nuances, operational practices determine long-term cluster reliability. These patterns emerge from running Elasticsearch at scale.
1. Always use dedicated master nodes
Master stability affects the entire cluster. Never run masters on nodes with heavy data operations. Use 3 dedicated master nodes for most clusters; 5 for larger deployments. Configure voting configuration to handle failures.
```
# Master-only node configuration
node.roles: [ master ]
node.name: master-${HOSTNAME}

# Initial master nodes for bootstrapping
cluster.initial_master_nodes:
  - master-1
  - master-2
  - master-3

# Network settings
network.host: _site_
discovery.seed_hosts:
  - master-1:9300
  - master-2:9300
  - master-3:9300
```
2. Implement backup and recovery
Snapshots are critical for disaster recovery. Configure automated snapshots to a repository (S3, GCS, Azure Blob, or shared filesystem).
```
// Register S3 repository
PUT /_snapshot/backup_repo
{
  "type": "s3",
  "settings": {
    "bucket": "elasticsearch-backups",
    "region": "us-east-1",
    "base_path": "production"
  }
}

// Create snapshot lifecycle policy
PUT /_slm/policy/daily-snapshots
{
  "schedule": "0 0 0 * * ?",   // Daily at midnight UTC
  "name": "<daily-snap-{now/d}>",
  "repository": "backup_repo",
  "config": {
    "indices": ["*"],
    "include_global_state": true
  },
  "retention": {
    "expire_after": "30d",
    "min_count": 5,
    "max_count": 50
  }
}
```
3. Have a rolling restart procedure
For maintenance (upgrades, configuration changes), restart nodes one at a time with proper allocation management:
```
# 1. Disable shard allocation (prevent unnecessary recovery)
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}'

# 2. Stop indexing if possible, then flush (the synced-flush API was removed in 8.x)
curl -X POST "localhost:9200/_flush"

# 3. Stop the node
sudo systemctl stop elasticsearch

# 4. Perform maintenance (upgrade, config change)

# 5. Start the node
sudo systemctl start elasticsearch

# 6. Wait for node to rejoin
curl "localhost:9200/_cat/nodes?v"

# 7. Re-enable allocation
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.enable": null
  }
}'

# 8. Wait for green status
curl "localhost:9200/_cluster/health?wait_for_status=green"

# 9. Repeat for next node
```
A backup you haven't tested is not a backup. Periodically restore to a test cluster and verify data integrity. Practice your disaster recovery runbook before you need it in a crisis.
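A sketch of that verification, reusing the repository name from above and a hypothetical snapshot name; restoring into renamed indexes avoids clobbering live data if the test cluster is shared:

```
// List available snapshots in the repository
GET /_snapshot/backup_repo/_all

// Restore one snapshot into renamed indexes (snapshot name is hypothetical)
POST /_snapshot/backup_repo/daily-snap-2024.01.01/_restore
{
  "indices": "logs-*",
  "rename_pattern": "logs-(.+)",
  "rename_replacement": "restored-logs-$1",
  "include_global_state": false
}

// Then spot-check document counts against the source cluster
GET /restored-logs-*/_count
```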
Scaling Elasticsearch is an ongoing process of capacity planning, configuration tuning, monitoring, and operational excellence. The key principles: plan capacity from the measured workload and validate with load tests; match hardware to node roles; follow the heap and OS tuning rules; automate data tiering and retention with ILM; monitor the metrics that predict failure, from a separate cluster; and rely on dedicated masters, tested snapshots, and rolling restart procedures for day-to-day operations.
Module Complete:
You now have a comprehensive understanding of Elasticsearch—from distributed architecture and sharding, through mapping and Query DSL, to production scaling. This knowledge enables you to design, implement, and operate Elasticsearch deployments that power search at scale.
Elasticsearch continues to evolve rapidly. Stay current with release notes, participate in the community, and continuously refine your deployments based on real-world performance data.
Congratulations! You've completed the Elasticsearch module, covering architecture, shards/replicas, mapping/analysis, Query DSL, and scaling. You're now equipped to design and operate production Elasticsearch clusters that deliver fast, relevant search at scale.