The Query DSL (Domain-Specific Language) is Elasticsearch's query language: a JSON-based syntax for expressing search queries, filters, and aggregations. Where SQL expresses a query as a text statement, Query DSL builds searches as nested JSON structures that describe what you're looking for and how to score it.
Mastering Query DSL is essential for building effective search experiences. It's the bridge between user intent ('find running shoes under $100') and the precise instructions Elasticsearch needs to execute that search efficiently.
Query DSL is both powerful and complex. It supports full-text search, exact matching, range queries, geo-spatial search, nested queries, compound boolean logic, custom scoring, and real-time aggregations. Understanding when and how to use each capability separates working searches from excellent ones.
By the end of this page, you will understand: the distinction between queries and filters; essential query types (match, term, bool, range, nested); how scoring affects result ordering; aggregations for faceted search and analytics; and query optimization patterns for production performance.
Every clause in a Query DSL request operates in one of two contexts, and understanding this distinction is crucial for both correctness and performance.
Query context: 'How well does this document match?'
In query context, clauses contribute to the relevance score—a floating-point number that determines result ordering. Higher scores mean better matches. Query context answers not just 'does it match?' but 'how well does it match?'
Filter context: 'Does this document match? Yes or no.'
In filter context, clauses produce binary inclusion/exclusion decisions. There's no scoring—documents either pass the filter or they don't. Filters are faster because they avoid scoring calculations and can be cached.
| Aspect | Query Context | Filter Context |
|---|---|---|
| Purpose | Relevance ranking | Binary inclusion |
| Output | Yes/no + score | Yes/no only |
| Caching | Not cached | Eligible for caching (bitsets) |
| Performance | Slower (scoring) | Faster (no scoring) |
| Use when | User search text | Exact filters (price, category) |
| DSL location | query: { ... } | filter: { ... } (in bool) |
```
GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        // QUERY CONTEXT: affects score
        { "match": { "description": "comfortable running shoes" } }
      ],
      "filter": [
        // FILTER CONTEXT: no score, cached
        { "term": { "category": "footwear" } },
        { "range": { "price": { "lte": 150 } } },
        { "term": { "in_stock": true } }
      ]
    }
  }
}

// Result: documents matching the filters AND the text query
// Ordered by: how well description matches "comfortable running shoes"
// The filters affect inclusion only, not ordering
```

The performance implication:
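Filter caching is a major performance advantage. When a filter such as category: footwear is used frequently, Elasticsearch can cache its result as a bitset: a compact, per-segment record of which documents match. Later queries that use the same filter read the cached bitset instead of evaluating the filter again.
Caching is opportunistic rather than guaranteed: Elasticsearch caches filters that are reused often, may skip clauses that are already cheap to evaluate, and keeps entries per segment, so newly written segments start uncached. For high-traffic applications with repeated filters, the cache can still cut query times substantially.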
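To make the bitset idea concrete, here is a toy model in Python. It is a sketch only, not how Elasticsearch or Lucene implement the cache (the real cache is per segment and decides on its own what to keep). `ToyFilterCache` and `matching_ids` are hypothetical names; a Python `int` stands in for the bitset, with bit *i* set when document *i* matches.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]


class ToyFilterCache:
    """Toy filter cache: stores one bitset (a Python int) per filter key."""

    def __init__(self, docs: List[Doc]) -> None:
        self.docs = docs
        self.cache: Dict[str, int] = {}
        self.evaluations = 0  # how many times a filter had to scan the documents

    def bitset(self, key: str, predicate: Callable[[Doc], bool]) -> int:
        if key not in self.cache:
            self.evaluations += 1
            bits = 0
            for doc_id, doc in enumerate(self.docs):
                if predicate(doc):
                    bits |= 1 << doc_id  # set bit doc_id when the doc matches
            self.cache[key] = bits
        return self.cache[key]


def matching_ids(bits: int) -> List[int]:
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


docs = [
    {"category": "footwear", "price": 120},
    {"category": "apparel", "price": 40},
    {"category": "footwear", "price": 180},
]
cache = ToyFilterCache(docs)
footwear = cache.bitset("category:footwear", lambda d: d["category"] == "footwear")
affordable = cache.bitset("price<=150", lambda d: d["price"] <= 150)
print(matching_ids(footwear & affordable))  # [0]

# Same filter again: answered from the cache, no new scan.
cache.bitset("category:footwear", lambda d: d["category"] == "footwear")
print(cache.evaluations)  # 2
```

Combining two cached filters is a single bitwise AND, which is why reused filters are cheap once cached.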
Best practice: Use query context only for the parts of the query that should affect ranking (typically user-entered search terms). Use filter context for all exact criteria (category, price range, date range, boolean flags).
Many developers put everything in query context by default. This is inefficient. Ask yourself: 'Does this clause need to affect result ordering?' If not, it belongs in filter context, which skips scoring and makes the clause eligible for caching.
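One way to enforce this split in application code is a small request-body builder. The sketch below assumes a hypothetical helper `build_search` and hypothetical field names (`description`, `price`); it only produces a Python dict and does not call Elasticsearch.

```python
from typing import Any, Dict, List, Optional


def build_search(text: Optional[str], exact: Dict[str, Any],
                 max_price: Optional[float] = None) -> Dict[str, Any]:
    """User text goes in `must` (scored); yes/no criteria go in `filter`."""
    must: List[Dict[str, Any]] = []
    filters: List[Dict[str, Any]] = []
    if text:
        must.append({"match": {"description": text}})  # affects ranking
    for field, value in exact.items():
        filters.append({"term": {field: value}})  # inclusion only
    if max_price is not None:
        filters.append({"range": {"price": {"lte": max_price}}})
    bool_query: Dict[str, Any] = {}
    if must:
        bool_query["must"] = must
    if filters:
        bool_query["filter"] = filters
    return {"query": {"bool": bool_query}}


body = build_search(
    "comfortable running shoes",
    {"category": "footwear", "in_stock": True},
    max_price=150,
)
```

With these arguments the builder yields the same structure as the footwear example earlier in this page; serializing it with `json.dumps` gives a request body any client can send.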
Query DSL provides dozens of query types. In practice, a handful cover most use cases. Master these before exploring more specialized alternatives.
match — Full-text search
The workhorse of text search. Analyzes the query string using the field's analyzer, then searches for those terms.
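The analyze-then-match behavior can be approximated in a few lines. This is a toy sketch: `analyze` is a hypothetical, much simplified stand-in for the standard analyzer (lowercase, split on non-alphanumerics, no stemming, no fuzziness, no scoring), and `toy_match` only answers "does it match?" for the `operator` and `minimum_should_match` options. Percentages are rounded down, as in the `"75%"` example.

```python
import re
from typing import List, Optional


def analyze(text: str) -> List[str]:
    # Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def required_matches(total: int, minimum_should_match: Optional[str]) -> int:
    if minimum_should_match is None:
        return 1  # plain OR: at least one term
    if minimum_should_match.endswith("%"):
        # Percentages round down: 75% of 4 terms -> 3.
        return max(1, total * int(minimum_should_match[:-1]) // 100)
    return int(minimum_should_match)


def toy_match(query: str, doc_text: str, operator: str = "or",
              minimum_should_match: Optional[str] = None) -> bool:
    query_terms = set(analyze(query))
    matched = len(query_terms & set(analyze(doc_text)))
    if operator == "and":
        return matched == len(query_terms)
    return matched >= required_matches(len(query_terms), minimum_should_match)


print(toy_match("comfortable running shoes", "Comfortable leather shoes"))  # True
print(toy_match("comfortable running shoes", "Comfortable leather shoes", operator="and"))  # False
```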
```
// Basic match: any term can match
{
  "match": { "description": "comfortable running shoes" }
}
// Matches: "comfortable leather shoes" (2/3 terms)
// Matches: "running outdoors" (1/3 terms)
// Higher score for more matching terms

// Match with operator: all terms must match
{
  "match": {
    "description": {
      "query": "comfortable running shoes",
      "operator": "and"
    }
  }
}
// Matches: only documents with ALL three terms

// Match with minimum_should_match: at least N terms
{
  "match": {
    "description": {
      "query": "comfortable running shoes lightweight",
      "minimum_should_match": "75%"  // 3 of 4 terms
    }
  }
}

// Match with fuzziness: typo tolerance
{
  "match": {
    "description": {
      "query": "runnung sheos",  // Typos!
      "fuzziness": "AUTO"
    }
  }
}
```

term — Exact matching (no analysis)
Searches for the exact value you provide. No tokenization, no lowercasing—the value must match exactly what's in the inverted index. Used with keyword fields.
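The mismatch between `term` and analyzed text fields is easier to see if you model what each field stores. In this sketch `standard_like` is a simplified, hypothetical stand-in for the standard analyzer, and `brand.keyword` is assumed to have no normalizer, so it stores the value verbatim.

```python
import re
from typing import Dict, List, Set


def standard_like(text: str) -> List[str]:
    # Simplified stand-in for the analyzer used by text fields.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def index_brand(value: str) -> Dict[str, Set[str]]:
    """Terms stored for a text field 'brand' and its keyword sub-field."""
    return {
        "brand": set(standard_like(value)),  # analyzed: "Nike" -> {"nike"}
        "brand.keyword": {value},            # stored verbatim
    }


def term_query(index: Dict[str, Set[str]], field: str, value: str) -> bool:
    return value in index[field]  # the query value is NOT analyzed


def match_query_on_text(index: Dict[str, Set[str]], field: str, value: str) -> bool:
    return any(t in index[field] for t in standard_like(value))  # query analyzed too


doc = index_brand("Nike")
print(term_query(doc, "brand", "Nike"))           # False
print(term_query(doc, "brand.keyword", "Nike"))   # True
print(match_query_on_text(doc, "brand", "Nike"))  # True
```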
```
// Exact keyword match
{
  "term": { "status": "published" }
}

// Multiple exact values (OR)
{
  "terms": { "category": ["electronics", "computers", "accessories"] }
}

// IMPORTANT: term queries on text fields usually fail!
// Text fields are analyzed, so "Nike" is stored as "nike"
// { "term": { "brand": "Nike" } } won't match
// Use: { "term": { "brand.keyword": "Nike" } }
```

match_phrase — Phrase search
Searches for terms in exact order and position. Essential for finding specific phrases.
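For in-order phrases, slop works out to the total number of extra positions allowed between consecutive phrase terms. The toy checker below is a sketch under that simplification: it splits on whitespace only and ignores the out-of-order matches that Lucene also accepts at a higher slop cost. `phrase_match` is a hypothetical name.

```python
from typing import Dict, List


def positions(tokens: List[str]) -> Dict[str, List[int]]:
    pos: Dict[str, List[int]] = {}
    for i, tok in enumerate(tokens):
        pos.setdefault(tok, []).append(i)
    return pos


def phrase_match(text: str, phrase: str, slop: int = 0) -> bool:
    """In-order phrase check: total skipped positions between terms must be <= slop."""
    pos = positions(text.lower().split())
    terms = phrase.lower().split()
    if not terms or any(t not in pos for t in terms):
        return False

    def extend(i: int, prev: int, used: int) -> bool:
        if i == len(terms):
            return True
        for p in pos[terms[i]]:
            gap = p - prev - 1  # words skipped between consecutive phrase terms
            if gap >= 0 and used + gap <= slop and extend(i + 1, p, used + gap):
                return True
        return False

    return any(extend(1, start, 0) for start in pos[terms[0]])


print(phrase_match("the quick brown fox", "quick fox"))          # False
print(phrase_match("the quick brown fox", "quick fox", slop=1))  # True
```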
```
// Exact phrase
{
  "match_phrase": { "title": "machine learning" }
}
// Matches: "Introduction to machine learning"
// Does NOT match: "machine translation and learning"

// With slop: allow words between
{
  "match_phrase": {
    "title": {
      "query": "quick fox",
      "slop": 1  // Allow 1 word between
    }
  }
}
// Matches: "the quick brown fox" (1 word between quick and fox)
```

range — Numeric and date ranges
Filters on ranges of values. Commonly used in filter context for performance.
```
// Numeric range
{
  "range": {
    "price": { "gte": 50, "lte": 200 }
  }
}

// Date range
{
  "range": {
    "created_at": {
      "gte": "2024-01-01",
      "lt": "2024-02-01",
      "format": "yyyy-MM-dd"
    }
  }
}

// Relative date (date math)
{
  "range": {
    "timestamp": {
      "gte": "now-7d/d",  // 7 days ago, rounded down to the start of that day
      "lt": "now/d"       // Before the start of today
    }
  }
}

// Range operators: gt (>), gte (>=), lt (<), lte (<=)
```

Elasticsearch's date math (now-7d, now/M, etc.) enables dynamic time ranges without application-side date calculation. Use /d to round to day boundaries, /M for months. Rounded bounds stay identical for every request made within the same day, which keeps queries consistent and cache-friendly.
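The rounding in the relative-date example can be mirrored in plain Python to see which window it selects. This sketch assumes naive datetimes interpreted as UTC (Elasticsearch applies date math in UTC unless a `time_zone` is given) and ignores millisecond-level details of how `lt` rounding is applied; `last_seven_days` is a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import Tuple

Window = Tuple[datetime, datetime]


def start_of_day(dt: datetime) -> datetime:
    return dt.replace(hour=0, minute=0, second=0, microsecond=0)


def last_seven_days(now: datetime) -> Window:
    # gte "now-7d/d": subtract 7 days, then round down to midnight.
    # lt  "now/d":    round now down to midnight, so today is excluded.
    return start_of_day(now - timedelta(days=7)), start_of_day(now)


def in_window(ts: datetime, window: Window) -> bool:
    gte, lt = window
    return gte <= ts < lt


morning = last_seven_days(datetime(2024, 3, 15, 9, 0))
afternoon = last_seven_days(datetime(2024, 3, 15, 17, 45))
print(morning == afternoon)  # True: the same bounds all day
```

Because the bounds only change at midnight, every request during a given day produces an identical filter, which is what makes rounded date math cache-friendly.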
The bool query is the cornerstone of complex search logic. It combines multiple clauses using boolean operators, with fine-grained control over which clauses affect scoring vs filtering.
Boolean clause types:
must — Clauses that MUST match. Contributes to score. (AND, in query context)
filter — Clauses that MUST match. Does NOT contribute to score. (AND, in filter context)
should — Clauses that SHOULD match. Matching improves score. (OR; at least one is required when the bool has no must or filter clauses, otherwise optional unless minimum_should_match is set)
must_not — Clauses that MUST NOT match. Does NOT contribute to score. (NOT, in filter context)
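The four clause types can be modeled as a toy evaluator over Python dicts. This is a sketch, not Elasticsearch's scoring: each `must`/`should` clause is a hypothetical function returning a made-up score (0 meaning "no match"), scores of matching clauses are summed, and `filters`/`must_not` are pure yes/no checks. The default for `minimum_should_match` follows the rule in the list above.

```python
from typing import Any, Callable, Dict, Optional, Sequence

Doc = Dict[str, Any]
Scorer = Callable[[Doc], float]      # > 0 means "matched"; the value is the score
Predicate = Callable[[Doc], bool]


def toy_bool(doc: Doc,
             must: Sequence[Scorer] = (),
             filters: Sequence[Predicate] = (),
             should: Sequence[Scorer] = (),
             must_not: Sequence[Predicate] = (),
             minimum_should_match: Optional[int] = None) -> Optional[float]:
    """Return the document's score if it matches, else None."""
    if not all(f(doc) for f in filters) or any(f(doc) for f in must_not):
        return None  # filter / must_not: yes-or-no, no score
    must_scores = [s(doc) for s in must]
    if any(score <= 0 for score in must_scores):
        return None
    should_scores = [score for score in (s(doc) for s in should) if score > 0]
    if minimum_should_match is None:
        # should is required only when there are no must/filter clauses
        minimum_should_match = 1 if should and not (must or filters) else 0
    if len(should_scores) < minimum_should_match:
        return None
    return sum(must_scores) + sum(should_scores)


laptop = {"name": "apple laptop", "brand": "apple", "price": 1800,
          "features": ["lightweight"], "refurbished": False}


def score_laptop(doc: Doc) -> Optional[float]:
    return toy_bool(
        doc,
        must=[lambda d: 1.0 if "laptop" in d["name"] else 0.0],
        filters=[lambda d: d["brand"] == "apple", lambda d: d["price"] <= 2000],
        should=[lambda d: 0.5 if "touchscreen" in d["features"] else 0.0,
                lambda d: 0.5 if "lightweight" in d["features"] else 0.0],
        must_not=[lambda d: d["refurbished"]],
        minimum_should_match=1,
    )


print(score_laptop(laptop))  # 1.5
```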
```
GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        // All of these MUST match (AND)
        // These affect the relevance score
        { "match": { "name": "laptop" } }
      ],
      "filter": [
        // All of these MUST match (AND)
        // These do NOT affect score, are cached
        { "term": { "brand": "apple" } },
        { "range": { "price": { "lte": 2000 } } }
      ],
      "should": [
        // Any of these SHOULD match (OR)
        // Matching improves score; one is required here only
        // because of minimum_should_match below
        { "term": { "features": "touchscreen" } },
        { "term": { "features": "lightweight" } }
      ],
      "must_not": [
        // None of these should match (NOT)
        // Excludes documents, no scoring
        { "term": { "refurbished": true } }
      ],
      "minimum_should_match": 1  // At least 1 should clause required
    }
  }
}
```

Nesting boolean queries:
Boolean queries can be nested to express complex logic. Each nested bool can have its own must/should/filter/must_not clauses.
```
// Goal: (laptop OR desktop) AND (apple OR microsoft) AND price < 1500
{
  "query": {
    "bool": {
      "must": [
        // First condition: type is laptop OR desktop
        {
          "bool": {
            "should": [
              { "term": { "type": "laptop" } },
              { "term": { "type": "desktop" } }
            ],
            "minimum_should_match": 1
          }
        },
        // Second condition: brand is apple OR microsoft
        {
          "bool": {
            "should": [
              { "term": { "brand": "apple" } },
              { "term": { "brand": "microsoft" } }
            ],
            "minimum_should_match": 1
          }
        }
      ],
      "filter": [
        // Third condition: price constraint (no scoring needed)
        { "range": { "price": { "lt": 1500 } } }
      ]
    }
  }
}
```

Boosting clauses:
You can weight clauses differently using boost. Higher boost values increase a clause's scoring contribution.
```
{
  "query": {
    "bool": {
      "should": [
        // Title matches worth more than description matches
        { "match": { "title": { "query": "elasticsearch guide", "boost": 3 } } },
        { "match": { "description": { "query": "elasticsearch guide", "boost": 1 } } },
        // Exact phrase in title is even more valuable
        { "match_phrase": { "title": { "query": "elasticsearch guide", "boost": 5 } } }
      ]
    }
  }
}
```

Boost values are relative multipliers. boost: 2 doesn't mean 'twice as important'; it multiplies that clause's score contribution by 2. Start with small differences (1, 2, 3) and tune based on result quality. Extreme boosts (100+) often cause unpredictable relevance.
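A bit of arithmetic shows why a boost scales a clause rather than a document. In this sketch the raw per-clause scores are invented numbers, and the document score is modeled as the sum of boosted scores of the matching `should` clauses, which is how a bool query combines them; `bool_should_score` is a hypothetical helper.

```python
from typing import List, Tuple


def bool_should_score(clauses: List[Tuple[float, float]]) -> float:
    """Sum of raw_score * boost over matching clauses (raw_score > 0)."""
    return sum(raw * boost for raw, boost in clauses if raw > 0)


# (raw score, boost) for: title match, description match, title phrase.
baseline = bool_should_score([(2.0, 3), (1.5, 1), (2.0, 5)])
title_doubled = bool_should_score([(2.0, 6), (1.5, 1), (2.0, 5)])
print(baseline, title_doubled)  # 17.5 23.5
```

Doubling the title boost doubles only that clause's contribution (6.0 to 12.0); the document's total rises from 17.5 to 23.5, not to 35.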
When users search, they typically don't know which field contains the information. A search for 'MacBook Pro' might match the product name, description, brand, or category. Multi-field queries search across multiple fields simultaneously.
multi_match — Search multiple fields
The most common approach for user-facing search boxes.
```
// Basic multi_match: search multiple fields
{
  "multi_match": {
    "query": "macbook pro",
    "fields": ["name", "description", "brand"]
  }
}

// With field boosting
{
  "multi_match": {
    "query": "macbook pro",
    "fields": ["name^3", "brand^2", "description"]
  }
}
// name matches worth 3x, brand 2x, description 1x

// Type: best_fields (default)
// Uses score from the single best matching field
{
  "multi_match": {
    "query": "macbook pro",
    "fields": ["name", "description"],
    "type": "best_fields",
    "tie_breaker": 0.3  // Add 30% of other matching fields
  }
}

// Type: most_fields
// Combines scores from all matching fields
{
  "multi_match": {
    "query": "macbook",
    "fields": ["name", "name.stemmed", "name.autocomplete"],
    "type": "most_fields"
  }
}

// Type: cross_fields
// Treats fields as one combined field (for names split across fields)
{
  "multi_match": {
    "query": "John Smith",
    "fields": ["first_name", "last_name"],
    "type": "cross_fields",
    "operator": "and"
  }
}
// Matches: first_name="John", last_name="Smith"
```

| Type | Behavior | Use Case |
|---|---|---|
| best_fields | Score from best field; tie_breaker adds others | Default, most search boxes |
| most_fields | Sum scores from all matching fields | Same content in multiple analyzers |
| cross_fields | Terms can match across different fields | Person names, addresses |
| phrase | Like match_phrase across fields | Exact phrase search |
| phrase_prefix | Like match_phrase_prefix across fields | Autocomplete |
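The difference between the first two types comes down to how per-field scores are combined. The sketch below takes hypothetical per-field scores as input and applies the two combination rules: `best_fields` takes the best field plus `tie_breaker` times the other matching fields, and `most_fields` sums them. It does not model `cross_fields`, which blends term statistics across fields.

```python
from typing import Dict


def best_fields(scores: Dict[str, float], tie_breaker: float = 0.0) -> float:
    # Best single field, plus tie_breaker times every other matching field.
    if not scores:
        return 0.0
    best = max(scores.values())
    return best + tie_breaker * (sum(scores.values()) - best)


def most_fields(scores: Dict[str, float]) -> float:
    # Every matching field contributes its full score.
    return sum(scores.values())


field_scores = {"name": 4.0, "brand": 2.0, "description": 1.0}  # invented values
print(best_fields(field_scores))                 # 4.0
print(round(best_fields(field_scores, 0.3), 6))  # 4.9
print(most_fields(field_scores))                 # 7.0
```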
query_string — Powerful but dangerous
The query_string query supports Lucene query syntax, allowing users to express complex queries directly. While powerful, it exposes your system to syntax errors and potential abuse.
```
// User can input Lucene syntax
{
  "query_string": {
    "query": "name:laptop AND brand:apple",
    "default_field": "description"
  }
}

// Supported syntax:
// AND, OR, NOT, +, -
// Phrases: "exact phrase"
// Wildcards: lap* (prefix), l?ptop (single char)
// Field targeting: name:value
// Ranges: price:[100 TO 500]
// Boosting: term^2

// simple_query_string: safer, doesn't throw on syntax errors
{
  "simple_query_string": {
    "query": "laptop +apple -refurbished",
    "fields": ["name", "description"],
    "default_operator": "and"
  }
}
```

query_string throws exceptions on syntax errors. If users input unbalanced quotes or invalid syntax, your search breaks. simple_query_string is more forgiving: it discards invalid parts of the input instead of failing. Always use simple_query_string for untrusted input.
Elasticsearch's document model is flat by default, but real data often has structure: products with reviews, orders with line items, users with multiple addresses. The nested type and nested query preserve object boundaries within arrays.
The problem with object arrays:
By default, Elasticsearch flattens object arrays. This loses the association between fields within each object.
```
// Document with object array
PUT /products/_doc/1
{
  "name": "Widget",
  "reviews": [
    { "user": "alice", "rating": 5 },
    { "user": "bob", "rating": 2 }
  ]
}

// What you'd expect to work:
// "Find products where alice gave rating >= 4"
// Query: reviews.user = alice AND reviews.rating >= 4

// But with default mapping, ES stores:
// reviews.user: ["alice", "bob"]
// reviews.rating: [5, 2]
// The association between alice→5 and bob→2 is LOST

// This query incorrectly matches:
// "Where bob gave rating >= 4"
// Because there exists reviews.user="bob" and reviews.rating=5
```

The solution: nested type and nested query
Mapping the field as nested preserves object boundaries. Each nested object is indexed as a hidden separate document, maintaining field associations.
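The false match from flattening, and how per-object evaluation avoids it, can be reproduced with plain Python data. This is a toy model of the two mappings, not Elasticsearch code; `object_match` and `nested_match` are hypothetical names.

```python
from typing import Any, Dict, List

Review = Dict[str, Any]
reviews: List[Review] = [{"user": "alice", "rating": 5}, {"user": "bob", "rating": 2}]


def flatten(objects: List[Review]) -> Dict[str, List[Any]]:
    """Default 'object' mapping: each field becomes one list of values."""
    flat: Dict[str, List[Any]] = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(key, []).append(value)
    return flat


def object_match(objects: List[Review], user: str, min_rating: int) -> bool:
    flat = flatten(objects)
    return user in flat.get("user", []) and any(r >= min_rating for r in flat.get("rating", []))


def nested_match(objects: List[Review], user: str, min_rating: int) -> bool:
    # 'nested' mapping: both conditions must hold inside the same review.
    return any(o["user"] == user and o["rating"] >= min_rating for o in objects)


print(object_match(reviews, "bob", 4))    # True (false positive)
print(nested_match(reviews, "bob", 4))    # False
print(nested_match(reviews, "alice", 4))  # True
```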
```
// Create index with nested mapping
// (use a new index: an existing object field can't be changed to nested in place)
PUT /products
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "reviews": {
        "type": "nested",
        "properties": {
          "user": { "type": "keyword" },
          "rating": { "type": "integer" },
          "comment": { "type": "text" }
        }
      }
    }
  }
}

// Query with nested query
GET /products/_search
{
  "query": {
    "nested": {
      "path": "reviews",
      "query": {
        "bool": {
          "must": [
            { "term": { "reviews.user": "alice" } },
            { "range": { "reviews.rating": { "gte": 4 } } }
          ]
        }
      }
    }
  }
}
// Now correctly matches only if alice's rating >= 4

// Control how matching nested scores roll up to the parent with score_mode
{
  "nested": {
    "path": "reviews",
    "query": { "range": { "reviews.rating": { "gte": 4 } } },
    "score_mode": "avg"  // avg (default), sum, max, min, none
  }
}
```

Inner hits: returning matching nested objects
By default, nested queries return the parent document without indicating which nested objects matched. Use inner_hits to retrieve the matching nested objects.
```
{
  "query": {
    "nested": {
      "path": "reviews",
      "query": { "range": { "reviews.rating": { "gte": 4 } } },
      "inner_hits": {
        "size": 3,  // Return up to 3 matching reviews
        "sort": [{ "reviews.rating": "desc" }],
        "_source": ["reviews.user", "reviews.rating"]
      }
    }
  }
}

// Response (abbreviated) includes which reviews matched:
{
  "hits": {
    "hits": [
      {
        "_source": { "name": "Widget", "reviews": [...] },
        "inner_hits": {
          "reviews": {
            "hits": {
              "hits": [
                { "_source": { "user": "alice", "rating": 5 } },
                { "_source": { "user": "carol", "rating": 4 } }
              ]
            }
          }
        }
      }
    ]
  }
}
```

Each nested object is indexed as a separate hidden document. A product with 100 reviews creates 101 documents internally. This multiplies indexing time, storage, and query complexity. Use nested sparingly; often, denormalization or separate indexes are better for high-cardinality arrays.
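The "100 reviews, 101 documents" arithmetic generalizes to one root document plus one hidden document per nested object. The helper below is a hypothetical sketch that counts top-level nested fields only; multi-level nesting (nested objects inside nested objects) would add more.

```python
from typing import Any, Dict, Iterable


def lucene_doc_count(source: Dict[str, Any], nested_fields: Iterable[str]) -> int:
    """Root document plus one hidden document per object in each top-level nested field."""
    count = 1
    for field in nested_fields:
        value = source.get(field)
        if isinstance(value, list):
            count += len(value)
        elif isinstance(value, dict):
            count += 1
    return count


product = {"name": "Widget", "reviews": [{"user": f"u{i}", "rating": 4} for i in range(100)]}
print(lucene_doc_count(product, ["reviews"]))  # 101
```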
Aggregations compute statistics and group your data—enabling faceted navigation, analytics dashboards, and real-time metrics. They run alongside queries in a single request, avoiding round trips.
Aggregation types:
Metric aggregations — Calculate values: sum, avg, min, max, cardinality, percentiles
Bucket aggregations — Group documents: terms, date_histogram, range, filters
Pipeline aggregations — Operate on other aggregations' output: derivative, moving_fn, cumulative_sum (moving_avg was removed in Elasticsearch 8.0 in favor of moving_fn)
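The three families compose in a fixed order: buckets group documents, metrics summarize each bucket, and pipelines read the metric outputs. The in-memory sketch below mimics a monthly histogram with a `sum` metric and a `cumulative_sum` pipeline; the `orders` data and `monthly_sales` helper are invented for illustration.

```python
from typing import Any, Dict, List

orders = [
    {"month": "2024-01", "amount": 100.0},
    {"month": "2024-02", "amount": 80.0},
    {"month": "2024-01", "amount": 50.0},
    {"month": "2024-03", "amount": 20.0},
]


def monthly_sales(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Bucket step: group documents by month (like a date_histogram).
    buckets: Dict[str, List[float]] = {}
    for row in sorted(rows, key=lambda r: r["month"]):
        buckets.setdefault(row["month"], []).append(row["amount"])

    results = []
    running = 0.0
    for month, amounts in buckets.items():
        total = sum(amounts)  # metric step: sum inside each bucket
        running += total      # pipeline step: cumulative_sum over bucket outputs
        results.append({"key": month, "sales": total, "cumulative_sales": running})
    return results


for bucket in monthly_sales(orders):
    print(bucket)
```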
```
GET /orders/_search
{
  "size": 0,  // Don't return documents, just aggregations
  "aggs": {
    "total_revenue": { "sum": { "field": "amount" } },
    "average_order": { "avg": { "field": "amount" } },
    "order_count": { "value_count": { "field": "order_id" } },
    "unique_customers": { "cardinality": { "field": "customer_id" } },
    "amount_stats": { "stats": { "field": "amount" } },  // min, max, avg, sum, count
    "amount_percentiles": {
      "percentiles": { "field": "amount", "percents": [50, 90, 95, 99] }
    }
  }
}

// Response:
{
  "aggregations": {
    "total_revenue": { "value": 1250000.00 },
    "average_order": { "value": 125.00 },
    "order_count": { "value": 10000 },
    "unique_customers": { "value": 3247 },
    "amount_stats": {
      "count": 10000, "min": 5.0, "max": 999.0, "avg": 125.0, "sum": 1250000.0
    }
  }
}
```

Bucket aggregations for faceted search:
Faceted search (the filter sidebar on e-commerce sites) is powered by bucket aggregations. Each bucket represents a category, and the count shows how many documents match.
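Range buckets follow a simple rule: `from` is inclusive and `to` is exclusive. The toy counter below applies that rule to an in-memory list of prices using the same four ranges as the request that follows; `range_buckets` is a hypothetical helper. Like the real aggregation, it lets a value count toward every range it falls into.

```python
from typing import Any, Dict, List

RANGES: List[Dict[str, Any]] = [
    {"key": "budget", "to": 50},
    {"key": "mid", "from": 50, "to": 100},
    {"key": "premium", "from": 100, "to": 200},
    {"key": "luxury", "from": 200},
]


def range_buckets(prices: List[float], ranges: List[Dict[str, Any]]) -> Dict[str, int]:
    """'from' is inclusive, 'to' is exclusive."""
    counts = {r["key"]: 0 for r in ranges}
    for p in prices:
        for r in ranges:
            if ("from" not in r or p >= r["from"]) and ("to" not in r or p < r["to"]):
                counts[r["key"]] += 1
    return counts


print(range_buckets([49.99, 50, 99, 100, 250], RANGES))
# {'budget': 1, 'mid': 2, 'premium': 1, 'luxury': 1}
```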
```
GET /products/_search
{
  "query": { "match": { "description": "running shoes" } },
  "aggs": {
    "by_brand": {
      "terms": { "field": "brand.keyword", "size": 10 }
    },
    "by_price_range": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 50, "key": "budget" },
          { "from": 50, "to": 100, "key": "mid" },
          { "from": 100, "to": 200, "key": "premium" },
          { "from": 200, "key": "luxury" }
        ]
      }
    },
    "by_rating": {
      "histogram": { "field": "rating", "interval": 1 }
    }
  }
}

// Response includes facet counts:
{
  "aggregations": {
    "by_brand": {
      "buckets": [
        { "key": "Nike", "doc_count": 45 },
        { "key": "Adidas", "doc_count": 38 },
        { "key": "Asics", "doc_count": 22 }
      ]
    },
    "by_price_range": {
      "buckets": [
        { "key": "budget", "doc_count": 15 },
        { "key": "mid", "doc_count": 52 },
        { "key": "premium", "doc_count": 33 }
      ]
    }
  }
}
```

Nested aggregations:
Aggregations can be nested—bucket aggregations contain sub-aggregations that run within each bucket.
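A sub-aggregation runs once per parent bucket. The sketch below mimics a `terms` bucket on one field with an `avg` sub-aggregation on another, over an invented in-memory product list; `terms_with_avg` is a hypothetical helper and ignores the terms aggregation's size limit and ordering.

```python
from typing import Any, Dict, List

products = [
    {"category": "shoes", "price": 100.0},
    {"category": "shoes", "price": 60.0},
    {"category": "bags", "price": 40.0},
]


def terms_with_avg(docs: List[Dict[str, Any]], group_field: str,
                   metric_field: str) -> Dict[str, Dict[str, Any]]:
    """One bucket per distinct value; the avg sub-aggregation runs inside each bucket."""
    groups: Dict[str, List[float]] = {}
    for doc in docs:
        groups.setdefault(doc[group_field], []).append(doc[metric_field])
    return {
        key: {"doc_count": len(values), "avg": sum(values) / len(values)}
        for key, values in groups.items()
    }


print(terms_with_avg(products, "category", "price"))
# {'shoes': {'doc_count': 2, 'avg': 80.0}, 'bags': {'doc_count': 1, 'avg': 40.0}}
```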
```
{
  "aggs": {
    "by_category": {
      "terms": { "field": "category.keyword" },
      "aggs": {
        "average_price": { "avg": { "field": "price" } },
        "by_brand": {
          "terms": { "field": "brand.keyword", "size": 3 },
          "aggs": {
            "avg_rating": { "avg": { "field": "rating" } }
          }
        }
      }
    }
  }
}
// Result: category → avg price + top brands → avg rating
```

When you only need aggregation results (analytics dashboards, facet counts), set size: 0 to skip document retrieval entirely. This significantly improves performance by avoiding the fetch phase.
Well-structured queries can be orders of magnitude faster than naive implementations. These patterns help you write efficient queries for production workloads.
1. Filter first, then query
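Use filters to narrow the candidate set, and reserve query context for the text that should be scored. Lucene intersects all required clauses, so a document is scored only if it also passes every filter; the order in which you write clauses in JSON doesn't matter. What matters is placing exact criteria in filter context, where they skip scoring and become eligible for caching.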
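A simplified model of that intersection: walk the shortest list of matching document IDs and check the others, then score only the survivors. This is a sketch inspired by how Lucene drives conjunctions from the cheapest clause; real Lucene advances iterators with skip data and cost estimates. `conjunction`, `search`, and the doc-ID lists are hypothetical.

```python
from bisect import bisect_left
from typing import Callable, Dict, List


def contains(sorted_ids: List[int], doc_id: int) -> bool:
    i = bisect_left(sorted_ids, doc_id)
    return i < len(sorted_ids) and sorted_ids[i] == doc_id


def conjunction(postings: List[List[int]]) -> List[int]:
    """Walk the shortest list and check the others; clause order doesn't matter."""
    lead, *others = sorted(postings, key=len)
    return [d for d in lead if all(contains(o, d) for o in others)]


def search(text_hits: List[int], filter_hits: List[List[int]],
           score: Callable[[int], float]) -> Dict[int, float]:
    # Only documents that satisfy the text clause AND every filter get scored.
    return {d: score(d) for d in conjunction([text_hits] + filter_hits)}


scored_docs: List[int] = []


def fake_score(doc_id: int) -> float:
    scored_docs.append(doc_id)
    return 1.0


text_hits = [1, 2, 3, 4, 5, 6, 7, 8]  # docs containing the query terms
active = [2, 4, 6, 8]                 # status: active
recent = [4, 8, 9]                    # created in the last 30 days
print(search(text_hits, [active, recent], fake_score))  # {4: 1.0, 8: 1.0}
print(scored_docs)                                      # [4, 8]
```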
```
// Efficient: exact criteria in filter context, text in query context
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "status": "active" } },
        { "range": { "created_at": { "gte": "now-30d" } } }
      ],
      "must": [
        // Scored only for documents that also pass both filters
        { "match": { "content": "complex search query" } }
      ]
    }
  }
}

// Less efficient: the exact status check sits in query context,
// so it is scored and not eligible for the filter cache
{
  "query": {
    "bool": {
      "must": [
        { "match": { "content": "complex search query" } },
        { "term": { "status": "active" } }
      ]
    }
  }
}
```

2. Avoid scripting in hot paths
Scripts (in queries, sorts, or scoring) are flexible but slow: they execute custom code for every matching document. Use them for offline analytics, not real-time search.
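Precomputing a value before indexing moves the per-document work out of the query path. The sketch below adds a `total` field on the application side; `with_total` and the sample documents are hypothetical, and an ingest pipeline could do the same job inside Elasticsearch.

```python
from typing import Any, Dict, List


def with_total(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Copy of the doc with 'total' precomputed before indexing."""
    enriched = dict(doc)
    enriched["total"] = doc["price"] * doc["quantity"]
    return enriched


raw_docs: List[Dict[str, Any]] = [
    {"sku": "A", "price": 20, "quantity": 3},
    {"sku": "B", "price": 100, "quantity": 1},
    {"sku": "C", "price": 5, "quantity": 4},
]
to_index = [with_total(d) for d in raw_docs]

# Once 'total' is indexed, sorting on it needs no per-document script.
print([d["sku"] for d in sorted(to_index, key=lambda d: d["total"], reverse=True)])
# ['B', 'A', 'C']
```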
```
// Slow: script-based sorting (runs for every matching document)
{
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "source": "doc['price'].value * doc['quantity'].value"
      },
      "order": "desc"
    }
  }
}

// Fast: precompute at index time
// Add a 'total' field during indexing: total = price * quantity
{
  "sort": { "total": "desc" }
}
```

3. Profile slow queries
The profile API reveals where time is spent within queries, helping identify bottlenecks.
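Profile output gets long quickly, so it helps to flatten it and rank nodes by time. The sketch below walks the `profile.shards[].searches[].query[]` tree in the shape used by the example response in this section (real responses carry more fields, such as `description` and `breakdown`). `query_nodes` and `slowest` are hypothetical helpers. Note that a parent's time includes its children, so parents naturally rank near the top.

```python
from typing import Any, Dict, Iterator, List, Tuple


def query_nodes(profile_response: Dict[str, Any]) -> Iterator[Tuple[str, int]]:
    """Yield (type, time_in_nanos) for every query node in every shard."""
    def walk(node: Dict[str, Any]) -> Iterator[Tuple[str, int]]:
        yield node["type"], node["time_in_nanos"]
        for child in node.get("children", []):
            yield from walk(child)

    for shard in profile_response["profile"]["shards"]:
        for search in shard["searches"]:
            for root in search["query"]:
                yield from walk(root)


def slowest(profile_response: Dict[str, Any], n: int = 3) -> List[Tuple[str, int]]:
    return sorted(query_nodes(profile_response), key=lambda t: t[1], reverse=True)[:n]


sample = {"profile": {"shards": [{"searches": [{"query": [{
    "type": "BooleanQuery", "time_in_nanos": 123456,
    "children": [
        {"type": "TermQuery", "time_in_nanos": 45000},
        {"type": "IndexOrDocValuesQuery", "time_in_nanos": 12000},
    ],
}]}]}]}}
print(slowest(sample, 2))  # [('BooleanQuery', 123456), ('TermQuery', 45000)]
```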
```
{
  "profile": true,
  "query": {
    "bool": {
      "must": [{ "match": { "title": "elasticsearch" } }],
      "filter": [{ "range": { "date": { "gte": "2024-01-01" } } }]
    }
  }
}

// Response includes timing breakdown (abbreviated):
{
  "profile": {
    "shards": [{
      "searches": [{
        "query": [{
          "type": "BooleanQuery",
          "time_in_nanos": 123456,
          "children": [
            { "type": "TermQuery", "time_in_nanos": 45000 },
            { "type": "IndexOrDocValuesQuery", "time_in_nanos": 12000 }
          ]
        }]
      }]
    }]
  }
}
```

Elasticsearch's shard request cache stores search results per shard; by default it caches only requests with size: 0, such as aggregation and count requests. Identical requests against unchanged shards hit this cache. Entries are invalidated when a refresh brings changes into the shard, so indexes that change rarely get better hit rates. Consider longer refresh intervals for analytics-heavy indexes.
Query DSL is vast, but the patterns covered on this page address the majority of production search needs.
What's next:
With Query DSL mastered, we'll explore Scaling Elasticsearch—the patterns for growing from single-node development to multi-cluster production deployments. You'll learn shard allocation strategies, capacity planning, performance tuning, and operational best practices.
You now have a comprehensive understanding of Elasticsearch Query DSL—the language for expressing search requests. From simple term queries to complex boolean aggregations, you can construct queries that find exactly what users need. Next, we examine scaling Elasticsearch for production workloads.