Search Relevance Tuning

Level: Advanced

Duration: 75 mins

Topic: Search Relevance Tuning

Boosting Fields and Queries: Controlling Relevance

The Art of Tuning Relevance

Understanding relevance factors is only the first step. The real power comes from boosting—the ability to adjust the relative importance of different signals to achieve your specific relevance goals.

Consider an e-commerce search: a match in the product title should count more than a match in the description. A match on an exact SKU should rank highest of all. Recent products might deserve a freshness boost. Products with high ratings might need an authority boost.

Boosting is how we translate these business requirements into ranking behavior. It's the primary tool search engineers use to tune relevance without retraining machine learning models or restructuring indexes.

This page provides a comprehensive guide to boosting strategies—from simple field weights to sophisticated function scores that incorporate arbitrary business logic.

What You Will Learn

By the end of this page, you will understand the mathematics of boosting, how to configure field-level and query-level boosts in search engines like Elasticsearch, the trade-offs between different boosting approaches, and best practices for implementing business logic through boost functions.

The Mathematics of Boosting

At its core, boosting is score multiplication (or addition). When we "boost" a field by a factor of 2, we're multiplying the contribution of that field to the final score by 2. When we "boost" a document based on recency, we're multiplying its base score by a freshness factor.

The basic boosting equation:

Final_Score = Base_Score × Boost_Factor

Or with multiple boosts:

Final_Score = Base_Score × Boost_1 × Boost_2 × ... × Boost_n

Why multiplication (usually)?

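Multiplicative boosts preserve the relative ordering of documents when the same factor applies to every document in the boosted set. If document A scores 10 and document B scores 5 before boosting, a 2x boost makes them 20 and 10, so A still ranks higher. Per-document factors (such as a popularity multiplier) can still reorder results, but only in proportion to each document's base score.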

Additive boosts can flip rankings, which is sometimes desirable but harder to control:

A: 10 + 5 (boost) = 15
B: 5 + 15 (boost) = 20  // B now ranks higher
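
The contrast between the two modes can be checked in a few lines of Python. This is a minimal sketch: the document IDs, the scores, and the helper name `rank` are made up for illustration.

```python
def rank(scores):
    """Return document ids ordered from highest to lowest score."""
    return [doc_id for doc_id, _ in sorted(scores.items(), key=lambda kv: -kv[1])]

base = {"A": 10.0, "B": 5.0}

# Same multiplicative factor for every document: ordering is unchanged.
uniform = {doc_id: score * 2.0 for doc_id, score in base.items()}

# Per-document additive boosts: B receives the larger boost and overtakes A.
additive_boosts = {"A": 5.0, "B": 15.0}
additive = {doc_id: score + additive_boosts[doc_id] for doc_id, score in base.items()}

# Per-document multiplicative factors can reorder results too.
per_doc_factors = {"A": 1.0, "B": 3.0}
per_doc = {doc_id: score * per_doc_factors[doc_id] for doc_id, score in base.items()}

for label, scores in [("base", base), ("uniform x2", uniform),
                      ("additive", additive), ("per-doc factors", per_doc)]:
    print(f"{label}: {rank(scores)}")
```

Only the uniform factor is guaranteed to leave the ranking alone; anything that varies per document is a ranking signal in its own right and needs calibration.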

The danger of unbounded boosts:

Naive boosting can destroy result quality. If you boost one factor by 1000x, it will dominate everything else, regardless of how irrelevant the document might be on other dimensions. Effective boosting requires normalization and careful calibration.

boost_mathematics.py
Python
import math
from dataclasses import dataclass
from typing import List, Dict, Optional
from enum import Enum
 
class BoostMode(Enum):
    """How to combine boost with base score."""
    MULTIPLY = "multiply"    # score * boost (most common)
    ADD = "add"              # score + boost
    REPLACE = "replace"      # boost (ignore base score)
    MAX = "max"              # max(score, boost)
    MIN = "min"              # min(score, boost)
    AVG = "avg"              # (score + boost) / 2
 
class ScoreMode(Enum):
    """How to combine multiple boosts."""
    MULTIPLY = "multiply"    # boost1 * boost2 * ...
    SUM = "sum"              # boost1 + boost2 + ...
    AVG = "avg"              # average of boosts
    MAX = "max"              # maximum boost
    MIN = "min"              # minimum boost
    FIRST = "first"          # first matching boost
 
@dataclass
class BoostConfig:
    """Configuration for a single boost factor."""
    name: str
    value: float
    mode: BoostMode = BoostMode.MULTIPLY
    
    # Normalization parameters
    min_value: Optional[float] = None  # Clamp boost to minimum
    max_value: Optional[float] = None  # Clamp boost to maximum
    
    def normalize(self, boost: float) -> float:
        """Apply normalization constraints to boost value."""
        if self.min_value is not None:
            boost = max(self.min_value, boost)
        if self.max_value is not None:
            boost = min(self.max_value, boost)
        return boost
 
class BoostCalculator:
    """
    Demonstrates different boosting strategies and their effects.
    """
    
    @staticmethod
    def multiplicative_boost(base_score: float, boost_factor: float) -> float:
        """
        Standard multiplicative boost.
        Preserves relative ordering within boosted set.
        """
        return base_score * boost_factor
    
    @staticmethod
    def additive_boost(base_score: float, boost_amount: float) -> float:
        """
        Additive boost. Can flip rankings.
        Useful when you want absolute priority for certain signals.
        """
        return base_score + boost_amount
    
    @staticmethod
    def logarithmic_boost(base_score: float, signal_value: float,
                          factor: float = 1.0,
                          log_base: float = math.e) -> float:
        """
        Log-dampened boost. Prevents extreme boosts from dominating.
        Useful for popularity signals that span orders of magnitude.
        
        Example: page views range from 10 to 10,000,000
                 Raw multiplicative boost would make high-view pages
                 dominate regardless of relevance.
                 Log dampening: ln(10) = 2.3, ln(10M) = 16.1
        
        signal_value is the raw, non-negative signal (e.g. page views);
        the logarithm is applied to it, not to base_score.
        """
        return base_score * (1 + factor * math.log(1 + signal_value, log_base))
    
    @staticmethod
    def saturation_boost(base_score: float, boost_value: float,
                        pivot: float = 1.0) -> float:
        """
        Sigmoid-like saturation boost.
        Provides diminishing returns for very large boost values.
        
        Formula: score * (boost / (boost + pivot))
        
        When boost >> pivot: factor approaches 1.0
        When boost = pivot: factor = 0.5
        When boost << pivot: factor approaches 0
        """
        boost_factor = boost_value / (boost_value + pivot)
        return base_score * boost_factor
 
 
def combine_multiple_boosts(base_score: float, 
                            boosts: List[BoostConfig],
                            score_mode: ScoreMode = ScoreMode.MULTIPLY) -> float:
    """
    Combine multiple boost factors according to score mode.
    
    Example use case: E-commerce search with multiple boosts
    - Field boost: title match = 2.0
    - Freshness boost: 0.8 - 1.2 based on age
    - Popularity boost: 0.9 - 1.5 based on sales
    - Quality boost: 0.5 - 1.0 based on rating
    """
    if not boosts:
        return base_score
    
    boost_values = [b.normalize(b.value) for b in boosts]
    
    if score_mode == ScoreMode.MULTIPLY:
        combined = 1.0
        for bv in boost_values:
            combined *= bv
            
    elif score_mode == ScoreMode.SUM:
        combined = sum(boost_values)
        
    elif score_mode == ScoreMode.AVG:
        combined = sum(boost_values) / len(boost_values)
        
    elif score_mode == ScoreMode.MAX:
        combined = max(boost_values)
        
    elif score_mode == ScoreMode.MIN:
        combined = min(boost_values)
        
    elif score_mode == ScoreMode.FIRST:
        combined = boost_values[0] if boost_values else 1.0
        
    else:
        combined = 1.0
    
    return base_score * combined
 
 
def boost_impact_demonstration():
    """
    Demonstrate how different boost strategies affect ranking.
    """
    # Scenario: 3 documents with different base scores
    documents = [
        {"id": "A", "base_score": 10.0, "popularity": 1000, "freshness": 0.9},
        {"id": "B", "base_score": 8.0, "popularity": 5000, "freshness": 0.6},
        {"id": "C", "base_score": 5.0, "popularity": 50000, "freshness": 1.0},
    ]
    
    print("Base ranking: A (10.0) > B (8.0) > C (5.0)")
    print()
    
    # Apply different boost strategies
    strategies = [
        ("Raw multiplicative popularity", lambda d: d["base_score"] * d["popularity"]),
        ("Log popularity", lambda d: d["base_score"] * math.log10(d["popularity"])),
        ("Sqrt popularity", lambda d: d["base_score"] * math.sqrt(d["popularity"])),
        ("Saturation popularity", lambda d: BoostCalculator.saturation_boost(
            d["base_score"], d["popularity"], pivot=10000)),
        ("Freshness only", lambda d: d["base_score"] * d["freshness"]),
        ("Combined (log pop × freshness)", lambda d: d["base_score"] * 
            math.log10(d["popularity"]) * d["freshness"]),
    ]
    
    for name, scorer in strategies:
        scored = [(d["id"], scorer(d)) for d in documents]
        scored.sort(key=lambda x: -x[1])
        ranking = " > ".join([f'{d[0]} ({d[1]:.1f})' for d in scored])
        print(f"{name}:")
        print(f"  {ranking}")
        print()

Field-Level Boosting

Documents typically have multiple fields: title, body, tags, categories, metadata. Not all fields are equally important for relevance. A match in the title is usually a stronger signal than a match somewhere in the body text.

Field boost fundamentals:

Field boosting assigns different importance weights to different fields. When computing relevance scores, matches in higher-weighted fields contribute more to the total score.

Common field boost patterns:

Typical Field Boost Values by Domain
Domain	Field	Typical Boost	Rationale
E-commerce	Product SKU	10-50x	Exact match overwhelmingly relevant
E-commerce	Product Name	3-5x	Primary identifier, high signal
E-commerce	Brand	2-3x	Strong navigational intent
E-commerce	Description	1x (baseline)	Useful but noisy
E-commerce	Reviews	0.5x	Tangentially relevant content
Blog/CMS	Title	5-10x	Summary of content
Blog/CMS	H1/H2 Headings	2-3x	Section summaries
Blog/CMS	Body	1x	Main content
Blog/CMS	Tags/Categories	3-4x	Explicit classification
Documentation	Page Title	5x	Navigational anchor
Documentation	Code Examples	2-3x	High-value technical content
Documentation	Body Text	1x	Explanatory content

field_boosting_elasticsearch.json
{
  "query": {
    "multi_match": {
      "query": "wireless headphones",
      "type": "best_fields",
      "fields": [
        "sku^50",
        "product_name^5",
        "brand^3",
        "category^2",
        "description^1",
        "reviews^0.5"
      ],
      "tie_breaker": 0.3
    }
  }
}
 
// Explanation of parameters:
//
// "type": "best_fields"
//   - Uses the MAXIMUM score from any single field
//   - Good when you expect the match to be in ONE relevant field
//   - Other types:
//     - "most_fields": SUM scores from all matching fields
//     - "cross_fields": Treats fields as one big field
//     - "phrase": Runs a phrase match on each field, uses the best field's score
//     - "phrase_prefix": Phrase-prefix match on each field (typeahead-style)
//
// Field boost syntax: "field_name^boost_value"
//   - sku^50: SKU matches are 50x more important than description
//   - reviews^0.5: Review matches are LESS important than description
//
// "tie_breaker": 0.3
//   - When using best_fields, secondary field matches contribute
//     30% of their score (prevents ignoring partial matches)
//   - 0.0 = only best field counts
//   - 1.0 = sum all field scores (defeats purpose of best_fields)
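
The effect of `tie_breaker` is easier to see with numbers. The sketch below reproduces the arithmetic described in the comments above (best field plus `tie_breaker` times the remaining fields) in plain Python. The per-field scores are hypothetical and assumed to already include their field boosts; the function names are illustrative, not part of any client library.

```python
def best_fields_score(field_scores, tie_breaker=0.0):
    """Best field's score plus tie_breaker times the other matching fields."""
    matching = [score for score in field_scores.values() if score > 0]
    if not matching:
        return 0.0
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)

def most_fields_score(field_scores):
    """Sum of the scores from every matching field."""
    return sum(score for score in field_scores.values() if score > 0)

# Hypothetical per-field scores for one document (boosts already applied).
scores = {"product_name": 6.0, "brand": 2.0, "description": 1.0}

for tb in (0.0, 0.3, 1.0):
    print(f"tie_breaker={tb}: {best_fields_score(scores, tb):.2f}")
print(f"most_fields: {most_fields_score(scores):.2f}")
```

With `tie_breaker` at 1.0 the result equals the `most_fields` sum, which is why values between roughly 0.1 and 0.4 are the usual range to experiment in.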

The N-Gram Boost Trap

When using n-gram or edge-n-gram fields for partial matching, be careful with boosting. These fields match many more documents than exact match fields. A 10x boost on an n-gram field can flood results with poor partial matches. Consider using separate boost values for your ngram subfields, typically lower than the parent field.
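
One way to keep n-gram boosts in check is to hold all field weights in a single mapping and render the `field^boost` strings from it. The sketch below assumes a hypothetical `product_name.ngram` edge-n-gram subfield; the weights are placeholders to tune, not recommendations.

```python
def build_fields(field_boosts):
    """Render {"field": boost} pairs into multi_match "field^boost" strings."""
    return [f"{name}^{boost}" for name, boost in field_boosts.items()]

# Hypothetical mapping: product_name has an edge n-gram subfield called
# product_name.ngram. The subfield is boosted well below its parent so that
# partial matches help recall without outranking exact matches.
field_boosts = {
    "product_name": 5,
    "product_name.ngram": 0.5,
    "description": 1,
}

query = {
    "query": {
        "multi_match": {
            "query": "wirel",
            "type": "most_fields",
            "fields": build_fields(field_boosts),
        }
    }
}

print(query["query"]["multi_match"]["fields"])
```

Keeping the weights in one dictionary also makes them easy to review and version alongside the rest of the query template.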

Query-Time Boosting

Static field boosts are usually fixed in the query template, so they apply uniformly to every search. Query-time boosts are dynamic: they can vary based on the query, user, time, or other runtime context.

Use cases for query-time boosting:

•Merchandising: Boost specific products during promotions
•Personalization: Boost brands a user has previously purchased
•Seasonal relevance: Boost winter gear in December
•Inventory management: Boost in-stock items over out-of-stock
•Business rules: Boost higher-margin products

The boosting query in Elasticsearch:

query_time_boosting.json
{
  "query": {
    "boosting": {
      "positive": {
        "multi_match": {
          "query": "winter jacket",
          "fields": ["product_name^3", "description"]
        }
      },
      "negative": {
        "term": {
          "out_of_stock": true
        }
      },
      "negative_boost": 0.2
    }
  }
}
 
// The boosting query has two parts:
// 1. "positive": Documents matching this get normal scores
// 2. "negative": Documents ALSO matching this get penalized
//
// "negative_boost": 0.2 means:
//   - Out-of-stock items have their scores multiplied by 0.2
//   - They're not excluded, just demoted
//   - Good for soft filtering (prefer in-stock, show out-of-stock last)
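
To see what "demoted, not excluded" means in practice, the scoring rule above can be simulated offline. This is a sketch with invented product IDs and scores; `boosting_score` is an illustrative helper, not an Elasticsearch API.

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Multiply the score by negative_boost only if the negative clause matches."""
    return positive_score * negative_boost if matches_negative else positive_score

# Hypothetical results for "winter jacket" with their positive-query scores.
docs = [
    {"id": "parka", "score": 9.0, "out_of_stock": True},
    {"id": "puffer", "score": 6.0, "out_of_stock": False},
    {"id": "shell", "score": 1.5, "out_of_stock": False},
]

ranked = sorted(
    docs,
    key=lambda d: boosting_score(d["score"], d["out_of_stock"], negative_boost=0.2),
    reverse=True,
)
print([d["id"] for d in ranked])
```

The out-of-stock parka drops below the puffer but still outranks the weakly matching shell, so a strong text match can survive the penalty. Lowering `negative_boost` further pushes demoted items closer to the bottom.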

Building dynamic boost clauses programmatically:

In production, boost clauses are often constructed dynamically based on runtime context:

dynamic_boost_builder.py
Python
from dataclasses import dataclass, field
from typing import List, Dict, Any, Optional
from datetime import datetime
import json
 
@dataclass
class UserContext:
    """User context for personalized boosting."""
    user_id: Optional[str] = None
    location: Optional[str] = None
    preferred_brands: List[str] = field(default_factory=list)
    purchase_history_categories: List[str] = field(default_factory=list)
    price_sensitivity: str = "medium"  # low, medium, high
    is_premium_member: bool = False
 
@dataclass
class SearchContext:
    """Search-time context for dynamic boosting."""
    current_time: datetime = field(default_factory=datetime.now)
    is_holiday_season: bool = False
    active_promotions: List[str] = field(default_factory=list)
    inventory_priority: str = "balanced"  # in_stock_first, balanced, include_all
 
class DynamicBoostBuilder:
    """
    Constructs dynamic query-time boosts based on context.
    
    This pattern separates boost logic from query construction,
    making it testable and configurable.
    """
    
    def __init__(self, user_ctx: UserContext, search_ctx: SearchContext):
        self.user = user_ctx
        self.search = search_ctx
        self.should_clauses: List[Dict] = []
    
    def add_brand_affinity_boosts(self) -> 'DynamicBoostBuilder':
        """Boost brands the user has shown preference for."""
        for brand in self.user.preferred_brands:
            self.should_clauses.append({
                "term": {
                    "brand.keyword": {
                        "value": brand,
                        "boost": 1.8  # Significant but not overwhelming
                    }
                }
            })
        return self
    
    def add_category_affinity_boosts(self) -> 'DynamicBoostBuilder':
        """Boost categories from user's purchase history."""
        for category in self.user.purchase_history_categories[:5]:  # Limit
            self.should_clauses.append({
                "term": {
                    "category": {
                        "value": category,
                        "boost": 1.3
                    }
                }
            })
        return self
    
    def add_promotion_boosts(self) -> 'DynamicBoostBuilder':
        """Boost actively promoted products."""
        if self.search.active_promotions:
            self.should_clauses.append({
                "terms": {
                    "promotion_id": self.search.active_promotions,
                    "boost": 2.0
                }
            })
        return self
    
    def add_inventory_boosts(self) -> 'DynamicBoostBuilder':
        """Boost/penalize based on inventory status."""
        if self.search.inventory_priority == "in_stock_first":
            self.should_clauses.append({
                "term": {
                    "in_stock": {
                        "value": True,
                        "boost": 2.0
                    }
                }
            })
            # Optionally add negative boost for out of stock
        return self
    
    def add_freshness_boost(self, field: str = "created_at", 
                           scale: str = "30d",
                           boost: float = 1.5) -> 'DynamicBoostBuilder':
        """Add time-decay boost for recent items."""
        self.should_clauses.append({
            "function_score": {
                "functions": [{
                    "gauss": {
                        field: {
                            "origin": "now",
                            "scale": scale,
                            "decay": 0.5
                        }
                    },
                    "weight": boost
                }],
                "boost_mode": "multiply"
            }
        })
        return self
    
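    def add_price_sensitivity_boost(self, max_price: float = 1000.0) -> 'DynamicBoostBuilder':
        """Boost based on user's price sensitivity."""
        if self.user.price_sensitivity == "high":
            # Price-sensitive users: boost lower prices.
            # The price is capped at max_price so the factor stays in
            # [1.0, 1.5]; without the cap, prices above 3x max_price would
            # produce a negative score, which Elasticsearch rejects.
            # max_price is passed via params so the script compiles once.
            self.should_clauses.append({
                "function_score": {
                    "functions": [{
                        "script_score": {
                            "script": {
                                "source": """
                                    double maxPrice = params.max_price;
                                    double price = doc['price'].value;
                                    if (price > maxPrice) { price = maxPrice; }
                                    return 1 + ((maxPrice - price) / maxPrice) * 0.5;
                                """,
                                "params": {"max_price": max_price}
                            }
                        }
                    }],
                    "boost_mode": "multiply"
                }
            })
        return self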
    
    def add_premium_member_boosts(self) -> 'DynamicBoostBuilder':
        """Premium members see premium products boosted."""
        if self.user.is_premium_member:
            self.should_clauses.append({
                "term": {
                    "is_premium_product": {
                        "value": True,
                        "boost": 1.4
                    }
                }
            })
        return self
    
    def build(self) -> List[Dict]:
        """Return the constructed boost clauses."""
        return self.should_clauses
    
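    def build_complete_query(self, base_query: Dict) -> Dict:
        """
        Wrap base query with dynamic boosts.

        The base query goes in "must", so it decides which documents match.
        Each matching "should" clause then adds its boosted score to the
        total, which makes these boosts additive rather than multiplicative.
        """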
        if not self.should_clauses:
            return base_query
        
        return {
            "bool": {
                "must": base_query,
                "should": self.should_clauses
            }
        }
 
 
# Example usage
def create_personalized_search_query(
    query_text: str,
    user: UserContext,
    search: SearchContext
) -> Dict:
    """
    Build a fully personalized search query.
    """
    # Base text query
    base_query = {
        "multi_match": {
            "query": query_text,
            "fields": ["product_name^5", "brand^3", "description"],
            "type": "best_fields",
            "tie_breaker": 0.3
        }
    }
    
    # Build dynamic boosts
    boost_builder = DynamicBoostBuilder(user, search)
    complete_query = (
        boost_builder
        .add_brand_affinity_boosts()
        .add_category_affinity_boosts()
        .add_promotion_boosts()
        .add_inventory_boosts()
        .add_premium_member_boosts()
        .build_complete_query(base_query)
    )
    
    return {"query": complete_query}
 
 
# Demonstration
if __name__ == "__main__":
    user = UserContext(
        user_id="user_123",
        preferred_brands=["Sony", "Bose"],
        purchase_history_categories=["Electronics", "Headphones"],
        is_premium_member=True
    )
    
    search = SearchContext(
        active_promotions=["HOLIDAY2024"],
        inventory_priority="in_stock_first"
    )
    
    query = create_personalized_search_query("wireless headphones", user, search)
    print(json.dumps(query, indent=2))
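
Because `build_complete_query` places the boosts in `should` clauses, their effect is additive: a bool query sums the scores of its matching `must` and `should` clauses. The sketch below mimics that arithmetic in plain Python. The clause scores are invented numbers, and real term-query scores also depend on index statistics, not only on the `boost` value.

```python
def bool_query_score(must_score, should_scores):
    """Sum the must score with the scores of the should clauses that matched."""
    return must_score + sum(should_scores)

# Hypothetical scores for two products on the query "wireless headphones".
# Product X: strong text match, no boost clause matched.
strong_text_match = bool_query_score(must_score=4.0, should_scores=[])

# Product Y: weaker text match, but it hits the brand-affinity and
# promotion clauses (assumed to contribute 1.8 and 2.0 respectively).
weak_but_boosted = bool_query_score(must_score=2.5, should_scores=[1.8, 2.0])

print(f"X: {strong_text_match:.1f}  Y: {weak_but_boosted:.1f}")
```

Additive boosts matter most when text scores are small, so it is worth checking typical `must` score ranges before picking boost values for the `should` clauses.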

Function Score Queries

Simple term-based boosts are limited—they're binary (match or no match) with fixed boost values. Function score queries provide much richer boosting based on numeric field values, geographic distance, time decay, and custom scripts.

Available function types in Elasticsearch:

Function Score Types

•weight — Static multiplier, simplest function
•field_value_factor — Boost based on a numeric field value (popularity, rating, price)
•decay functions (gauss, exp, linear) — Distance-based decay for dates, geo points, or numbers
•random_score — Consistent random score for result variation
•script_score — Custom scoring logic via Painless scripts
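
The three decay shapes differ mainly in how quickly they fall off. The sketch below follows the formulas given in the Elasticsearch function_score documentation as best I recall them, with `distance` standing for the absolute difference from the origin; treat it as an approximation for building intuition, not a reference implementation.

```python
import math

def gauss_decay(distance, scale, decay=0.5, offset=0.0):
    """Bell-shaped decay: slow at first, then steep, then a long tail."""
    d = max(0.0, abs(distance) - offset)
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    return math.exp(-(d ** 2) / (2.0 * sigma_sq))

def exp_decay(distance, scale, decay=0.5, offset=0.0):
    """Exponential decay: steepest right next to the origin."""
    d = max(0.0, abs(distance) - offset)
    return math.exp(math.log(decay) / scale * d)

def linear_decay(distance, scale, decay=0.5, offset=0.0):
    """Straight-line decay that reaches exactly 0 at scale / (1 - decay)."""
    d = max(0.0, abs(distance) - offset)
    s = scale / (1.0 - decay)
    return max((s - d) / s, 0.0)

# Age in days with scale = 30 days and decay = 0.5.
for age in (0, 15, 30, 60, 90):
    print(age,
          round(gauss_decay(age, 30), 3),
          round(exp_decay(age, 30), 3),
          round(linear_decay(age, 30), 3))
```

All three return 1.0 at the origin and the configured `decay` value at a distance of `scale`; they only disagree before and after that point.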

function_score_examples.json
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "running shoes",
          "fields": ["product_name^3", "description"]
        }
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "popularity_score",
            "factor": 1.2,
            "modifier": "log1p",
            "missing": 1
          }
        }
      ],
      "boost_mode": "multiply",
      "score_mode": "multiply"
    }
  }
}
 
// field_value_factor explained:
//
// "field": "popularity_score"
//   - The numeric field to use for boosting
//
// "factor": 1.2
//   - Multiplied with the field value before applying modifier
//
// "modifier": "log1p"
//   - How to transform the field value:
//   - none: Use raw value (dangerous for large values)
//   - log: log10(value); results are negative for values between 0 and 1,
//     which causes an error, so prefer log1p
//   - log1p: log10(1 + value) - handles 0 gracefully
//   - log2p: log10(2 + value)
//   - ln: natural log
//   - ln1p: ln(1 + value)
//   - ln2p: ln(2 + value)
//   - square: value^2
//   - sqrt: sqrt(value)
//   - reciprocal: 1/value
//
// "missing": 1
//   - Value to use if field is missing (prevents scoring errors)
//
// For popularity_score=100:
// boost = log1p(1.2 * 100) = log10(1 + 120) = log10(121) ≈ 2.08
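
The modifiers are easy to mix up, so it can help to replay them offline before touching a live query. The sketch below assumes the documented Elasticsearch behavior that the `log*` modifiers use the common (base-10) logarithm, the `ln*` modifiers use the natural logarithm, and `factor` is applied before the modifier. The function and table names are illustrative.

```python
import math

MODIFIERS = {
    "none": lambda x: x,
    "log": lambda x: math.log10(x),
    "log1p": lambda x: math.log10(1 + x),
    "log2p": lambda x: math.log10(2 + x),
    "ln": lambda x: math.log(x),
    "ln1p": lambda x: math.log1p(x),
    "ln2p": lambda x: math.log(2 + x),
    "square": lambda x: x * x,
    "sqrt": lambda x: math.sqrt(x),
    "reciprocal": lambda x: 1.0 / x,
}

def field_value_factor(value, factor=1.0, modifier="none", missing=None):
    """Approximate field_value_factor: modifier(factor * value)."""
    if value is None:
        if missing is None:
            raise ValueError("field is missing and no 'missing' default was given")
        value = missing  # factor and modifier still apply to the default
    return MODIFIERS[modifier](factor * value)

for popularity in (None, 0, 100, 1_000_000):
    boost = field_value_factor(popularity, factor=1.2, modifier="log1p", missing=1)
    print(popularity, round(boost, 3))
```

Note that `log1p` returns exactly 0 for a field value of 0, which zeroes the whole score when the result is multiplied in; `log2p` or a `sum` combination avoids that.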

Script Performance

Script scores are evaluated for every matching document. For high-traffic queries, they can be a performance bottleneck. Consider: (1) Using params to avoid recompilation, (2) Keeping scripts simple, (3) Caching expensive computations in indexed fields, (4) Using stored scripts for production.
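
Point (1) deserves a concrete illustration. When a value is interpolated into the script source, every distinct value yields a new script to compile; passing it through `params` keeps the source constant. The sketch below builds such a function entry as a Python dictionary. The `price` field, the formula, and the helper name are assumptions carried over from the earlier price-sensitivity example.

```python
# Constant script source: values that change per request arrive via params.
PRICE_AFFINITY_SOURCE = (
    "double maxPrice = params.max_price; "
    "double price = doc['price'].value; "
    "if (price > maxPrice) { price = maxPrice; } "
    "return 1 + ((maxPrice - price) / maxPrice) * params.strength;"
)

def price_affinity_function(max_price, strength):
    """Build a script_score function entry whose source never changes."""
    return {
        "script_score": {
            "script": {
                "source": PRICE_AFFINITY_SOURCE,
                "params": {"max_price": float(max_price), "strength": float(strength)},
            }
        }
    }

budget = price_affinity_function(max_price=200, strength=0.5)
luxury = price_affinity_function(max_price=5000, strength=0.1)

# Same source text, different parameters.
print(budget["script_score"]["script"]["source"]
      == luxury["script_score"]["script"]["source"])
```

The same structure carries over to stored scripts, where the request references a script id instead of inlining `source`.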

Combining Boosts: Score and Boost Modes

When using multiple scoring functions, two key parameters control how they combine: score_mode (how multiple functions combine with each other) and boost_mode (how the combined function score combines with the query score).

Score modes:

Score Mode Options
Mode	Formula	When to Use
multiply	f1 × f2 × ... × fn	When all factors should contribute proportionally
sum	f1 + f2 + ... + fn	When factors are additive (different signals)
avg	(f1 + f2 + ... + fn) / n	When you want balanced contribution
first	f1	When only the first matching function matters
max	max(f1, f2, ..., fn)	When you want the strongest signal only
min	min(f1, f2, ..., fn)	When you want the weakest signal (conservative)

Boost modes:

Boost Mode Options
Mode	Formula	When to Use
multiply	query_score × function_score	Standard; functions modify relevance
replace	function_score	Ignore text relevance; rank by function only
sum	query_score + function_score	When function adds absolute importance
avg	(query_score + function_score) / 2	Balance relevance with function
max	max(query_score, function_score)	Use whichever is higher
min	min(query_score, function_score)	Conservative scoring
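
The two tables compose in a fixed order: function results are merged with `score_mode`, then the merged value meets the query score via `boost_mode`. The sketch below walks through that pipeline. It assumes `max_boost` caps the merged function value before `boost_mode` is applied and ignores per-function weights in `avg`, so treat it as a simplified model rather than the engine's exact behavior.

```python
import math
from functools import reduce

def combine_functions(values, score_mode="multiply"):
    """Merge individual function results according to score_mode."""
    if not values:
        return 1.0  # no function applied: neutral factor
    combiners = {
        "multiply": lambda v: reduce(lambda a, b: a * b, v),
        "sum": sum,
        "avg": lambda v: sum(v) / len(v),
        "first": lambda v: v[0],
        "max": max,
        "min": min,
    }
    return combiners[score_mode](values)

def apply_boost_mode(query_score, function_score, boost_mode="multiply"):
    """Merge the combined function score with the query score."""
    modes = {
        "multiply": lambda q, f: q * f,
        "replace": lambda q, f: f,
        "sum": lambda q, f: q + f,
        "avg": lambda q, f: (q + f) / 2,
        "max": max,
        "min": min,
    }
    return modes[boost_mode](query_score, function_score)

def final_score(query_score, values, score_mode="multiply",
                boost_mode="multiply", max_boost=math.inf):
    capped = min(combine_functions(values, score_mode), max_boost)
    return apply_boost_mode(query_score, capped, boost_mode)

functions = [2.0, 1.5, 3.0]  # hypothetical function results for one document
print(final_score(4.0, functions))                        # multiply / multiply
print(final_score(4.0, functions, max_boost=5.0))         # capped at 5.0
print(final_score(4.0, functions, "sum", "sum"))          # additive pipeline
print(final_score(4.0, functions, "max", "replace"))      # strongest signal only
```

Running the same document through several mode pairs like this makes it obvious how quickly multiplicative stacking grows compared with the additive variants.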

coordinate_boosting.py
Python
from enum import Enum
from typing import List, Dict, Any
from dataclasses import dataclass
 
class ScoreMode(Enum):
    MULTIPLY = "multiply"
    SUM = "sum"
    AVG = "avg"
    MAX = "max"
    MIN = "min"
    FIRST = "first"
 
class BoostMode(Enum):
    MULTIPLY = "multiply"
    REPLACE = "replace"
    SUM = "sum"
    AVG = "avg"
    MAX = "max"
    MIN = "min"
 
@dataclass
class ScoringConfig:
    """Configuration for how to combine scores."""
    score_mode: ScoreMode
    boost_mode: BoostMode
    max_boost: float = 10.0  # Prevent runaway scores
 
def choose_scoring_strategy(use_case: str) -> ScoringConfig:
    """
    Select appropriate score/boost modes for common use cases.
    """
    strategies = {
        # E-commerce: All factors contribute, text relevance matters
        "ecommerce_default": ScoringConfig(
            score_mode=ScoreMode.MULTIPLY,
            boost_mode=BoostMode.MULTIPLY,
            max_boost=5.0
        ),
        
        # Recommendations: Signals are independent, add them up
        "recommendations": ScoringConfig(
            score_mode=ScoreMode.SUM,
            boost_mode=BoostMode.MULTIPLY,
            max_boost=10.0
        ),
        
        # Location-based: Proximity can dominate
        "local_search": ScoringConfig(
            score_mode=ScoreMode.MULTIPLY,
            boost_mode=BoostMode.MULTIPLY,
            max_boost=3.0  # Lower to prevent far-but-relevant being buried
        ),
        
        # News/Time-sensitive: Freshness is critical
        "news_search": ScoringConfig(
            score_mode=ScoreMode.MULTIPLY,
            boost_mode=BoostMode.MULTIPLY,
            max_boost=5.0
        ),
        
        # Trending: Popularity/velocity can override relevance
        "trending": ScoringConfig(
            score_mode=ScoreMode.MAX,
            boost_mode=BoostMode.REPLACE,
            max_boost=100.0  # Allow strong trending signals
        ),
        
        # Balanced personalization: Don't let personalization dominate
        "personalized": ScoringConfig(
            score_mode=ScoreMode.AVG,
            boost_mode=BoostMode.MULTIPLY,
            max_boost=2.0  # Conservative to maintain relevance
        ),
    }
    
    return strategies.get(use_case, strategies["ecommerce_default"])
 
 
def build_function_score_query(
    base_query: Dict,
    functions: List[Dict],
    config: ScoringConfig
) -> Dict:
    """
    Build a properly configured function_score query.
    """
    return {
        "function_score": {
            "query": base_query,
            "functions": functions,
            "score_mode": config.score_mode.value,
            "boost_mode": config.boost_mode.value,
            "max_boost": config.max_boost
        }
    }
 
 
# Example: E-commerce query with multiple boost functions
def ecommerce_search_example():
    base_query = {
        "multi_match": {
            "query": "wireless headphones",
            "fields": ["product_name^3", "brand^2", "description"],
            "type": "best_fields"
        }
    }
    
    functions = [
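        # Popularity boost (log dampened)
        # log2p = log10(2 + factor * value) never returns 0. With log1p, a
        # product with num_sold = 0 would get log10(1) = 0 and, under
        # score_mode "multiply", a final score of 0.
        {
            "field_value_factor": {
                "field": "num_sold",
                "modifier": "log2p",
                "factor": 0.5,
                "missing": 1
            },
            "weight": 1
        },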
        # Rating boost
        {
            "field_value_factor": {
                "field": "avg_rating",
                "modifier": "none",
                "missing": 3
            },
            "weight": 0.5
        },
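        # Freshness boost (new arrivals). Note: under score_mode "multiply"
        # this factor lies in (0, 0.3], so it scales every score down and
        # pushes old items toward 0; tune weight and scale with that in mind.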
        {
            "gauss": {
                "created_at": {
                    "origin": "now",
                    "scale": "30d",
                    "decay": 0.5
                }
            },
            "weight": 0.3
        },
        # In-stock boost
        {
            "filter": {
                "term": {"in_stock": True}
            },
            "weight": 1.2
        },
        # Promotion boost (current sales)
        {
            "filter": {
                "term": {"is_on_sale": True}
            },
            "weight": 1.3
        }
    ]
    
    config = choose_scoring_strategy("ecommerce_default")
    return build_function_score_query(base_query, functions, config)

Boosting Best Practices

Boosting is powerful but dangerous. Poorly calibrated boosts can destroy search quality. Here are battle-tested best practices from production search systems.

Boosting Best Practices

•Start small, measure constantly — Begin with boost factors close to 1.0 (e.g., 1.1, 1.2). Measure impact with offline metrics (NDCG) and online metrics (CTR) before increasing.
•Normalize before boosting — Raw field values often need normalization. A popularity of 1,000,000 vs 100 will break multiplicative boosting. Use log, sqrt, or min-max normalization.
•Cap boost effects — Always set max_boost in function_score queries. Unbounded boosts will eventually cause problems when edge cases appear in production.
•Test with edge cases — What happens when fields are missing? When values are 0? When values are extremely large? Verify boost behavior for all conditions.
•Document your boost rationale — Future engineers (including yourself) need to understand why title has 3x boost. Write it down.
•Use version control for queries — Treat query templates as code. Review boost changes like you'd review code changes.
•A/B test significant changes — Never deploy major boost changes to all users at once. Test with a small percentage first.
•Monitor boost impact over time — User behavior changes, inventory changes, content changes. Boosts that worked last month may not work today.
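
For the offline measurement mentioned in the first practice, NDCG is straightforward to compute from graded relevance judgments. The sketch below uses the linear-gain variant (gain equals the grade) and derives the ideal ordering from the same judged list; the grades are invented for illustration.

```python
import math

def dcg(relevances, k):
    """Discounted cumulative gain over the top k results (linear gain)."""
    return sum(rel / math.log2(position + 2)
               for position, rel in enumerate(relevances[:k]))

def ndcg(relevances, k):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical relevance grades (0-3) of the top 5 results for one query,
# listed in ranked order, before and after a boost change.
baseline = [3, 2, 0, 1, 0]
boosted = [0, 3, 2, 1, 0]   # a promoted but irrelevant item took the top slot

print(f"baseline NDCG@5: {ndcg(baseline, 5):.3f}")
print(f"boosted  NDCG@5: {ndcg(boosted, 5):.3f}")
```

Averaging this over a fixed query set before and after each boost change gives a regression check that can run long before any A/B test.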

Common Boosting Mistakes

•Boosting without measuring baseline performance
•Using raw field values without normalization
•No max_boost limits on function scores
•Multiplicative stacking without consideration
•Ignoring edge cases and missing values
•Optimizing for business metrics at expense of relevance
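
Several of these mistakes (raw values, no caps, missing fields) can be avoided with one small helper that maps a raw signal into a bounded boost range. The sketch below uses min-max normalization; the bounds, the boost range, and the function name are illustrative assumptions.

```python
def min_max_boost(value, lo, hi, min_boost=1.0, max_boost=1.5):
    """Map a raw signal into [min_boost, max_boost] with clamping.

    Missing values (None) and degenerate ranges fall back to min_boost,
    so a document is never penalized for lacking the signal.
    """
    if value is None or hi <= lo:
        return min_boost
    clamped = min(max(value, lo), hi)        # handle outliers on both sides
    fraction = (clamped - lo) / (hi - lo)    # 0.0 .. 1.0
    return min_boost + fraction * (max_boost - min_boost)

# Hypothetical popularity values spanning several orders of magnitude.
for popularity in (None, 0, 100, 500_000, 1_000_000, 50_000_000):
    print(popularity, round(min_max_boost(popularity, lo=100, hi=1_000_000), 3))
```

For signals with a heavy-tailed distribution, applying a logarithm before the min-max step usually spreads the boosts more evenly across documents.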

Signs of Good Boosting

•Relevance metrics (NDCG, MRR) remain stable or improve
•Boost factors are documented with rationale
•Edge cases are explicitly handled
•Changes are A/B tested before full rollout
•Monitoring alerts on score distribution changes
•Regular review and pruning of obsolete boosts

The Business vs. Relevance Tension

Business stakeholders often request boosts for commercial reasons: 'Boost our private label products,' 'Promote high-margin items,' 'Show featured vendors first.' These requests are valid but can conflict with user relevance expectations. Always quantify the relevance impact and make the trade-off explicit. A 20% CTR decrease to boost margin by 5% may or may not be worthwhile.

Summary: Mastering Boost Control

Boosting is the primary lever for tuning search relevance without retraining ML models or restructuring data. Understanding its mechanics—from simple field weights to sophisticated function scores—enables you to translate business and user requirements into ranking behavior.

Key Takeaways

•Boosts are score multipliers (usually) — Multiplicative boosts preserve relative ordering within boosted sets; additive boosts can flip rankings.
•Field boosts set baseline importance — Title matches should count more than body matches. Configure this in your query structure.
•Query-time boosts add dynamics — Promotions, personalization, inventory status, and seasonal factors can all be handled with dynamic boost clauses.
•Function scores enable sophisticated logic — Field value factors, decay functions, and scripts allow complex boosting based on numeric values, distances, and custom formulas.
•Score and boost modes matter — How multiple boosts combine (multiply, sum, avg, max) significantly affects final ranking. Choose deliberately.
•Normalization prevents disasters — Raw numeric values will break your scoring. Use log, sqrt, or custom normalization for fields with wide value ranges.
•Always cap and measure — Set max_boost limits, measure relevance impact, and A/B test changes before full deployment.

What's next:

Most of the boosts on this page are global: the same field weights and function scores apply to every user, apart from simple context-driven query-time clauses. The next page explores personalization in depth, where we tailor relevance to individual users based on their history, preferences, and context.

Page Complete

You now understand how to control search relevance through boosting—from simple field weights to complex function scores. This is the hands-on tuning layer that translates relevance factors into practical ranking behavior. Next, we'll explore how to make relevance personal for each user.


The Mathematics of Boosting

The basic boosting equation:

Final_Score = Base_Score × Boost_Factor

Or with multiple boosts:

Final_Score = Base_Score × Boost_1 × Boost_2 × ... × Boost_n

Why multiplication (usually)?

Additive boosts can flip rankings, which is sometimes desirable but harder to control:

A: 10 + 5 (boost) = 15
B: 5 + 15 (boost) = 20  // B now ranks higher
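
The contrast can be checked with a short script. This sketch reuses the scores from the examples above; the `rank` helper is illustrative.

```python
from typing import Dict, List


def rank(scores: Dict[str, float]) -> List[str]:
    """Return document ids ordered from highest to lowest score."""
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


base = {"A": 10.0, "B": 5.0}

# The same multiplicative factor for every document keeps the order.
multiplied = {doc_id: score * 2.0 for doc_id, score in base.items()}

# Per-document additive boosts (values from the example above) can flip it.
additive_boosts = {"A": 5.0, "B": 15.0}
added = {doc_id: score + additive_boosts[doc_id]
         for doc_id, score in base.items()}

print(rank(base))        # ['A', 'B']
print(rank(multiplied))  # ['A', 'B']
print(rank(added))       # ['B', 'A']
```

Note that the order-preserving property only holds when every document in the set receives the same factor. A multiplicative boost that differs per document (for example, one driven by popularity) can reorder results just as an additive boost can.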

The danger of unbounded boosts:

boost_mathematics.py
Python
import math
from dataclasses import dataclass
from typing import List, Dict, Optional
from enum import Enum
 
class BoostMode(Enum):
    """How to combine boost with base score."""
    MULTIPLY = "multiply"    # score * boost (most common)
    ADD = "add"              # score + boost
    REPLACE = "replace"      # boost (ignore base score)
    MAX = "max"              # max(score, boost)
    MIN = "min"              # min(score, boost)
    AVG = "avg"              # (score + boost) / 2
 
class ScoreMode(Enum):
    """How to combine multiple boosts."""
    MULTIPLY = "multiply"    # boost1 * boost2 * ...
    SUM = "sum"              # boost1 + boost2 + ...
    AVG = "avg"              # average of boosts
    MAX = "max"              # maximum boost
    MIN = "min"              # minimum boost
    FIRST = "first"          # first matching boost
 
@dataclass
class BoostConfig:
    """Configuration for a single boost factor."""
    name: str
    value: float
    mode: BoostMode = BoostMode.MULTIPLY
    
    # Normalization parameters
    min_value: Optional[float] = None  # Clamp boost to minimum
    max_value: Optional[float] = None  # Clamp boost to maximum
    
    def normalize(self, boost: float) -> float:
        """Apply normalization constraints to boost value."""
        if self.min_value is not None:
            boost = max(self.min_value, boost)
        if self.max_value is not None:
            boost = min(self.max_value, boost)
        return boost
 
class BoostCalculator:
    """
    Demonstrates different boosting strategies and their effects.
    """
    
    @staticmethod
    def multiplicative_boost(base_score: float, boost_factor: float) -> float:
        """
        Standard multiplicative boost.
        Preserves relative ordering within boosted set.
        """
        return base_score * boost_factor
    
    @staticmethod
    def additive_boost(base_score: float, boost_amount: float) -> float:
        """
        Additive boost. Can flip rankings.
        Useful when you want absolute priority for certain signals.
        """
        return base_score + boost_amount
    
    @staticmethod
    def logarithmic_boost(base_score: float, signal_value: float,
                          weight: float = 1.0,
                          log_base: float = math.e) -> float:
        """
        Log-dampened boost. Prevents extreme boosts from dominating.
        Useful for popularity signals that span orders of magnitude.
        
        Example: page views range from 10 to 10,000,000
                 Raw multiplicative boost would make high-view pages
                 dominate regardless of relevance.
                 Log dampening: ln(10) ≈ 2.3, ln(10M) ≈ 16.1
        
        The log is taken of the signal (e.g. page views), not of the
        base score. signal_value must be >= 0.
        """
        return base_score * (1 + weight * math.log(1 + signal_value, log_base))
    
    @staticmethod
    def saturation_boost(base_score: float, boost_value: float,
                        pivot: float = 1.0) -> float:
        """
        Sigmoid-like saturation boost.
        Provides diminishing returns for very large boost values.
        
        Formula: score * (boost / (boost + pivot))
        
        When boost >> pivot: factor approaches 1.0
        When boost = pivot: factor = 0.5
        When boost << pivot: factor approaches 0
        """
        boost_factor = boost_value / (boost_value + pivot)
        return base_score * boost_factor
 
 
def combine_multiple_boosts(base_score: float, 
                            boosts: List[BoostConfig],
                            score_mode: ScoreMode = ScoreMode.MULTIPLY) -> float:
    """
    Combine multiple boost factors according to score mode.
    
    Example use case: E-commerce search with multiple boosts
    - Field boost: title match = 2.0
    - Freshness boost: 0.8 - 1.2 based on age
    - Popularity boost: 0.9 - 1.5 based on sales
    - Quality boost: 0.5 - 1.0 based on rating
    """
    if not boosts:
        return base_score
    
    boost_values = [b.normalize(b.value) for b in boosts]
    
    if score_mode == ScoreMode.MULTIPLY:
        combined = 1.0
        for bv in boost_values:
            combined *= bv
            
    elif score_mode == ScoreMode.SUM:
        combined = sum(boost_values)
        
    elif score_mode == ScoreMode.AVG:
        combined = sum(boost_values) / len(boost_values)
        
    elif score_mode == ScoreMode.MAX:
        combined = max(boost_values)
        
    elif score_mode == ScoreMode.MIN:
        combined = min(boost_values)
        
    elif score_mode == ScoreMode.FIRST:
        combined = boost_values[0] if boost_values else 1.0
        
    else:
        combined = 1.0
    
    return base_score * combined
 
 
def boost_impact_demonstration():
    """
    Demonstrate how different boost strategies affect ranking.
    """
    # Scenario: 3 documents with different base scores
    documents = [
        {"id": "A", "base_score": 10.0, "popularity": 1000, "freshness": 0.9},
        {"id": "B", "base_score": 8.0, "popularity": 5000, "freshness": 0.6},
        {"id": "C", "base_score": 5.0, "popularity": 50000, "freshness": 1.0},
    ]
    
    print("Base ranking: A (10.0) > B (8.0) > C (5.0)")
    print()
    
    # Apply different boost strategies
    strategies = [
        ("Raw multiplicative popularity", lambda d: d["base_score"] * d["popularity"]),
        ("Log popularity", lambda d: d["base_score"] * math.log10(d["popularity"])),
        ("Sqrt popularity", lambda d: d["base_score"] * math.sqrt(d["popularity"])),
        ("Saturation popularity", lambda d: BoostCalculator.saturation_boost(
            d["base_score"], d["popularity"], pivot=10000)),
        ("Freshness only", lambda d: d["base_score"] * d["freshness"]),
        ("Combined (log pop × freshness)", lambda d: d["base_score"] * 
            math.log10(d["popularity"]) * d["freshness"]),
    ]
    
    for name, scorer in strategies:
        scored = [(d["id"], scorer(d)) for d in documents]
        scored.sort(key=lambda x: -x[1])
        ranking = " > ".join([f'{d[0]} ({d[1]:.1f})' for d in scored])
        print(f"{name}:")
        print(f"  {ranking}")
        print()

Field-Level Boosting

Field boost fundamentals:

Field boosting assigns different importance weights to different fields. When computing relevance scores, matches in higher-weighted fields contribute more to the total score.

Common field boost patterns:

Typical Field Boost Values by Domain
Domain	Field	Typical Boost	Rationale
E-commerce	Product SKU	10-50x	Exact match overwhelmingly relevant
E-commerce	Product Name	3-5x	Primary identifier, high signal
E-commerce	Brand	2-3x	Strong navigational intent
E-commerce	Description	1x (baseline)	Useful but noisy
E-commerce	Reviews	0.5x	Tangentially relevant content
Blog/CMS	Title	5-10x	Summary of content
Blog/CMS	H1/H2 Headings	2-3x	Section summaries
Blog/CMS	Body	1x	Main content
Blog/CMS	Tags/Categories	3-4x	Explicit classification
Documentation	Page Title	5x	Navigational anchor
Documentation	Code Examples	2-3x	High-value technical content
Documentation	Body Text	1x	Explanatory content

field_boosting_elasticsearch.json
{
  "query": {
    "multi_match": {
      "query": "wireless headphones",
      "type": "best_fields",
      "fields": [
        "sku^50",
        "product_name^5",
        "brand^3",
        "category^2",
        "description^1",
        "reviews^0.5"
      ],
      "tie_breaker": 0.3
    }
  }
}
 
// Explanation of parameters:
//
// "type": "best_fields"
//   - Uses the MAXIMUM score from any single field
//   - Good when you expect the match to be in ONE relevant field
//   - Other types:
//     - "most_fields": SUM scores from all matching fields
//     - "cross_fields": Treats fields as one big field
//     - "phrase": Runs a match_phrase query on each field, uses the best field's score
//     - "phrase_prefix": Runs match_phrase_prefix on each field (typeahead-style matching)
//
// Field boost syntax: "field_name^boost_value"
//   - sku^50: SKU matches are 50x more important than description
//   - reviews^0.5: Review matches are LESS important than description
//
// "tie_breaker": 0.3
//   - When using best_fields, secondary field matches contribute
//     30% of their score (prevents ignoring partial matches)
//   - 0.0 = only best field counts
//   - 1.0 = sum all field scores (defeats purpose of best_fields)
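
The effect of tie_breaker described above can be sketched in Python. This is an illustration of the combination rule only (best score plus tie_breaker times the sum of the others), not the engine's implementation; the per-field scores are made-up numbers assumed to already include their field boosts.

```python
from typing import Dict


def best_fields_score(field_scores: Dict[str, float],
                      tie_breaker: float = 0.0) -> float:
    """Best field score plus tie_breaker times the sum of the other scores."""
    if not field_scores:
        return 0.0
    scores = sorted(field_scores.values(), reverse=True)
    return scores[0] + tie_breaker * sum(scores[1:])


# Hypothetical per-field scores for one document.
example = {"product_name": 6.0, "description": 2.0, "reviews": 1.0}

print(f"{best_fields_score(example, 0.0):.2f}")  # 6.00, only the best field
print(f"{best_fields_score(example, 0.3):.2f}")  # 6.90, 6.0 + 0.3 * (2.0 + 1.0)
print(f"{best_fields_score(example, 1.0):.2f}")  # 9.00, plain sum of all fields
```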

The N-Gram Boost Trap

Query-Time Boosting

Field boosts are usually fixed in the query template and apply uniformly to every search. Query-time boosts are dynamic: they can vary based on the query, user, time, or other runtime context.

Use cases for query-time boosting:

•Merchandising: Boost specific products during promotions
•Personalization: Boost brands a user has previously purchased
•Seasonal relevance: Boost winter gear in December
•Inventory management: Boost in-stock items over out-of-stock
•Business rules: Boost higher-margin products

The boosting query in Elasticsearch:

query_time_boosting.json
{
  "query": {
    "boosting": {
      "positive": {
        "multi_match": {
          "query": "winter jacket",
          "fields": ["product_name^3", "description"]
        }
      },
      "negative": {
        "term": {
          "out_of_stock": true
        }
      },
      "negative_boost": 0.2
    }
  }
}
 
// The boosting query has two parts:
// 1. "positive": Documents matching this get normal scores
// 2. "negative": Documents ALSO matching this get penalized
//
// "negative_boost": 0.2 means:
//   - Out-of-stock items have their scores multiplied by 0.2
//   - They're not excluded, just demoted
//   - Good for soft filtering (prefer in-stock, show out-of-stock last)
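
To see how a negative_boost demotes without excluding, the following sketch applies the same multiplication to a handful of made-up scores. The document ids and scores are hypothetical; only the 0.2 factor comes from the query above.

```python
from typing import Dict, List, Tuple


def apply_negative_boost(docs: List[Dict],
                         negative_boost: float = 0.2) -> List[Tuple[str, float]]:
    """Multiply the score of out-of-stock docs by negative_boost, then rank."""
    rescored = []
    for doc in docs:
        factor = negative_boost if doc.get("out_of_stock", False) else 1.0
        rescored.append((doc["id"], doc["score"] * factor))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


docs = [
    {"id": "jacket-1", "score": 9.0, "out_of_stock": True},
    {"id": "jacket-2", "score": 6.0, "out_of_stock": False},
    {"id": "jacket-3", "score": 2.0, "out_of_stock": False},
]

ranked = apply_negative_boost(docs)
print([doc_id for doc_id, _ in ranked])  # ['jacket-2', 'jacket-3', 'jacket-1']
```

The best text match (jacket-1) is still returned, but it drops below both in-stock items because its score falls from 9.0 to about 1.8.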

Building dynamic boost clauses programmatically:

In production, boost clauses are often constructed dynamically based on runtime context:

dynamic_boost_builder.py
Python
from dataclasses import dataclass, field
from typing import List, Dict, Any, Optional
from datetime import datetime
import json
 
@dataclass
class UserContext:
    """User context for personalized boosting."""
    user_id: Optional[str] = None
    location: Optional[str] = None
    preferred_brands: List[str] = field(default_factory=list)
    purchase_history_categories: List[str] = field(default_factory=list)
    price_sensitivity: str = "medium"  # low, medium, high
    is_premium_member: bool = False
 
@dataclass
class SearchContext:
    """Search-time context for dynamic boosting."""
    current_time: datetime = field(default_factory=datetime.now)
    is_holiday_season: bool = False
    active_promotions: List[str] = field(default_factory=list)
    inventory_priority: str = "balanced"  # in_stock_first, balanced, include_all
 
class DynamicBoostBuilder:
    """
    Constructs dynamic query-time boosts based on context.
    
    This pattern separates boost logic from query construction,
    making it testable and configurable.
    """
    
    def __init__(self, user_ctx: UserContext, search_ctx: SearchContext):
        self.user = user_ctx
        self.search = search_ctx
        self.should_clauses: List[Dict] = []
    
    def add_brand_affinity_boosts(self) -> 'DynamicBoostBuilder':
        """Boost brands the user has shown preference for."""
        for brand in self.user.preferred_brands:
            self.should_clauses.append({
                "term": {
                    "brand.keyword": {
                        "value": brand,
                        "boost": 1.8  # Significant but not overwhelming
                    }
                }
            })
        return self
    
    def add_category_affinity_boosts(self) -> 'DynamicBoostBuilder':
        """Boost categories from user's purchase history."""
        for category in self.user.purchase_history_categories[:5]:  # Limit
            self.should_clauses.append({
                "term": {
                    "category": {
                        "value": category,
                        "boost": 1.3
                    }
                }
            })
        return self
    
    def add_promotion_boosts(self) -> 'DynamicBoostBuilder':
        """Boost actively promoted products."""
        if self.search.active_promotions:
            self.should_clauses.append({
                "terms": {
                    "promotion_id": self.search.active_promotions,
                    "boost": 2.0
                }
            })
        return self
    
    def add_inventory_boosts(self) -> 'DynamicBoostBuilder':
        """Boost/penalize based on inventory status."""
        if self.search.inventory_priority == "in_stock_first":
            self.should_clauses.append({
                "term": {
                    "in_stock": {
                        "value": True,
                        "boost": 2.0
                    }
                }
            })
            # Optionally add negative boost for out of stock
        return self
    
    def add_freshness_boost(self, field_name: str = "created_at", 
                           scale: str = "30d",
                           boost: float = 1.5) -> 'DynamicBoostBuilder':
        """Add time-decay boost for recent items."""
        # Named field_name so the parameter does not shadow dataclasses.field
        self.should_clauses.append({
            "function_score": {
                "functions": [{
                    "gauss": {
                        field_name: {
                            "origin": "now",
                            "scale": scale,
                            "decay": 0.5
                        }
                    },
                    "weight": boost
                }],
                "boost_mode": "multiply"
            }
        })
        return self
    
    def add_price_sensitivity_boost(self) -> 'DynamicBoostBuilder':
        """Boost based on user's price sensitivity."""
        if self.user.price_sensitivity == "high":
            # Price-sensitive users: boost lower prices
            self.should_clauses.append({
                "function_score": {
                    "functions": [{
                        "script_score": {
                            "script": {
                                "source": """
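                                    // Neutral factor when the price field is missing
                                    if (doc['price'].size() == 0) { return 1.0; }
                                    double price = doc['price'].value;
                                    double maxPrice = 1000;
                                    // Clamp so prices above maxPrice never produce a factor below 1 (or a negative score)
                                    return 1 + (Math.max(0.0, maxPrice - price) / maxPrice) * 0.5;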
                                """
                            }
                        }
                    }],
                    "boost_mode": "multiply"
                }
            })
        return self
    
    def add_premium_member_boosts(self) -> 'DynamicBoostBuilder':
        """Premium members see premium products boosted."""
        if self.user.is_premium_member:
            self.should_clauses.append({
                "term": {
                    "is_premium_product": {
                        "value": True,
                        "boost": 1.4
                    }
                }
            })
        return self
    
    def build(self) -> List[Dict]:
        """Return the constructed boost clauses."""
        return self.should_clauses
    
    def build_complete_query(self, base_query: Dict) -> Dict:
        """Wrap base query with dynamic boosts."""
        if not self.should_clauses:
            return base_query
        
        return {
            "bool": {
                "must": base_query,
                "should": self.should_clauses
            }
        }
 
 
# Example usage
def create_personalized_search_query(
    query_text: str,
    user: UserContext,
    search: SearchContext
) -> Dict:
    """
    Build a fully personalized search query.
    """
    # Base text query
    base_query = {
        "multi_match": {
            "query": query_text,
            "fields": ["product_name^5", "brand^3", "description"],
            "type": "best_fields",
            "tie_breaker": 0.3
        }
    }
    
    # Build dynamic boosts
    boost_builder = DynamicBoostBuilder(user, search)
    complete_query = (
        boost_builder
        .add_brand_affinity_boosts()
        .add_category_affinity_boosts()
        .add_promotion_boosts()
        .add_inventory_boosts()
        .add_premium_member_boosts()
        .build_complete_query(base_query)
    )
    
    return {"query": complete_query}
 
 
# Demonstration
if __name__ == "__main__":
    user = UserContext(
        user_id="user_123",
        preferred_brands=["Sony", "Bose"],
        purchase_history_categories=["Electronics", "Headphones"],
        is_premium_member=True
    )
    
    search = SearchContext(
        active_promotions=["HOLIDAY2024"],
        inventory_priority="in_stock_first"
    )
    
    query = create_personalized_search_query("wireless headphones", user, search)
    print(json.dumps(query, indent=2))

Function Score Queries

Available function types in Elasticsearch:

Function Score Types

•weight — Static multiplier, simplest function
•field_value_factor — Boost based on a numeric field value (popularity, rating, price)
•decay functions (gauss, exp, linear) — Distance-based decay for dates, geo points, or numbers
•random_score — Consistent random score for result variation
•script_score — Custom scoring logic via Painless scripts
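
As an illustration of the decay functions listed above, here is a sketch of the Gaussian decay curve as described in the Elasticsearch documentation: the score is 1.0 at the origin (and within `offset`) and falls to `decay` at a distance of `offset + scale`. The function name and the use of plain numbers for distance (for example, age in days) are simplifications for the example.

```python
import math


def gauss_decay(distance: float, scale: float,
                decay: float = 0.5, offset: float = 0.0) -> float:
    """Gaussian decay: 1.0 within `offset`, `decay` at offset + scale."""
    # Variance is chosen so that the curve passes through `decay` at `scale`.
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    effective = max(0.0, abs(distance) - offset)
    return math.exp(-(effective ** 2) / (2.0 * sigma_sq))


if __name__ == "__main__":
    # Age in days with scale=30 and decay=0.5
    for age_days in (0, 30, 60):
        print(age_days, f"{gauss_decay(age_days, scale=30):.4f}")
```

With scale=30 and decay=0.5, an item 30 days old keeps half of the freshness score and one 60 days old keeps about 6%, which shows how quickly a Gaussian curve drops beyond the scale.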

function_score_examples.json
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "running shoes",
          "fields": ["product_name^3", "description"]
        }
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "popularity_score",
            "factor": 1.2,
            "modifier": "log1p",
            "missing": 1
          }
        }
      ],
      "boost_mode": "multiply",
      "score_mode": "multiply"
    }
  }
}
 
// field_value_factor explained:
//
// "field": "popularity_score"
//   - The numeric field to use for boosting
//
// "factor": 1.2
//   - Multiplied with the field value before applying modifier
//
// "modifier": "log1p"
//   - How to transform the field value:
//   - none: Use raw value (dangerous for large values)
//   - log: log10(value); avoid for values between 0 and 1 (negative result)
//   - log1p: log10(1 + value) - handles 0 gracefully
//   - log2p: log10(2 + value)
//   - ln: natural log
//   - ln1p: ln(1 + value)
//   - ln2p: ln(2 + value)
//   - square: value^2
//   - sqrt: sqrt(value)
//   - reciprocal: 1/value
//
// "missing": 1
//   - Value to use if field is missing (prevents scoring errors)
//
// For popularity_score=100 (log1p uses the base-10 logarithm):
// boost = log10(1 + 1.2 * 100) = log10(121) ≈ 2.08
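
The arithmetic behind field_value_factor can be reproduced offline, which is handy when choosing a factor and modifier. This sketch mirrors the documented behavior (the `log` family is base 10, the `ln` family is natural log, and `missing` is substituted for the field value before factor and modifier are applied); the Python function itself is illustrative and not part of any client library.

```python
import math
from typing import Optional

# Modifier names follow the list above.
MODIFIERS = {
    "none": lambda v: v,
    "log": math.log10,
    "log1p": lambda v: math.log10(1 + v),
    "log2p": lambda v: math.log10(2 + v),
    "ln": math.log,
    "ln1p": lambda v: math.log(1 + v),
    "ln2p": lambda v: math.log(2 + v),
    "square": lambda v: v * v,
    "sqrt": math.sqrt,
    "reciprocal": lambda v: 1.0 / v,
}


def field_value_factor(value: Optional[float], factor: float = 1.0,
                       modifier: str = "none", missing: float = 1.0) -> float:
    """Compute modifier(factor * value), substituting `missing` for None."""
    raw = missing if value is None else value
    return MODIFIERS[modifier](factor * raw)


if __name__ == "__main__":
    print(f"{field_value_factor(100, 1.2, 'log1p'):.2f}")   # 2.08
    print(f"{field_value_factor(None, 1.2, 'log1p'):.2f}")  # 0.34
```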

Script Performance

Combining Boosts: Score and Boost Modes

Score modes:

Score Mode Options
Mode	Formula	When to Use
multiply	f1 × f2 × ... × fn	When all factors should contribute proportionally
sum	f1 + f2 + ... + fn	When factors are additive (different signals)
avg	(f1 + f2 + ... + fn) / n	When you want balanced contribution
first	f1	When only the first matching function matters
max	max(f1, f2, ..., fn)	When you want the strongest signal only
min	min(f1, f2, ..., fn)	When you want the weakest signal (conservative)

Boost modes:

Boost Mode Options
Mode	Formula	When to Use
multiply	query_score × function_score	Standard; functions modify relevance
replace	function_score	Ignore text relevance; rank by function only
sum	query_score + function_score	When function adds absolute importance
avg	(query_score + function_score) / 2	Balance relevance with function
max	max(query_score, function_score)	Use whichever is higher
min	min(query_score, function_score)	Conservative scoring
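
The two tables can be read as a three-step pipeline: combine the function values with score_mode, cap the result with max_boost, then merge it with the query score using boost_mode. The sketch below follows that order. It is a simplification (for instance, it uses a plain average and ignores per-function weights), and the function names are hypothetical.

```python
import math
from typing import List


def combine_functions(values: List[float], score_mode: str = "multiply") -> float:
    """Step 1: reduce the per-function values to a single function score."""
    if not values:
        return 1.0  # neutral when no function applies
    reducers = {
        "multiply": math.prod,
        "sum": sum,
        "avg": lambda vs: sum(vs) / len(vs),
        "first": lambda vs: vs[0],
        "max": max,
        "min": min,
    }
    return reducers[score_mode](values)


def final_score(query_score: float, values: List[float],
                score_mode: str = "multiply", boost_mode: str = "multiply",
                max_boost: float = float("inf")) -> float:
    """Steps 2 and 3: cap the function score, then merge with the query score."""
    func = min(combine_functions(values, score_mode), max_boost)
    mergers = {
        "multiply": lambda q, f: q * f,
        "replace": lambda q, f: f,
        "sum": lambda q, f: q + f,
        "avg": lambda q, f: (q + f) / 2.0,
        "max": max,
        "min": min,
    }
    return mergers[boost_mode](query_score, func)


if __name__ == "__main__":
    boosts = [1.5, 2.0, 3.0]
    print(final_score(4.0, boosts))                 # 4.0 * 9.0, uncapped
    print(final_score(4.0, boosts, max_boost=5.0))  # 4.0 * 5.0, capped
```

Running the same inputs through different mode pairs is a quick way to see how much the choice matters before committing to one in a query template.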

coordinate_boosting.py
Python
from enum import Enum
from typing import List, Dict, Any
from dataclasses import dataclass
 
class ScoreMode(Enum):
    MULTIPLY = "multiply"
    SUM = "sum"
    AVG = "avg"
    MAX = "max"
    MIN = "min"
    FIRST = "first"
 
class BoostMode(Enum):
    MULTIPLY = "multiply"
    REPLACE = "replace"
    SUM = "sum"
    AVG = "avg"
    MAX = "max"
    MIN = "min"
 
@dataclass
class ScoringConfig:
    """Configuration for how to combine scores."""
    score_mode: ScoreMode
    boost_mode: BoostMode
    max_boost: float = 10.0  # Prevent runaway scores
 
def choose_scoring_strategy(use_case: str) -> ScoringConfig:
    """
    Select appropriate score/boost modes for common use cases.
    """
    strategies = {
        # E-commerce: All factors contribute, text relevance matters
        "ecommerce_default": ScoringConfig(
            score_mode=ScoreMode.MULTIPLY,
            boost_mode=BoostMode.MULTIPLY,
            max_boost=5.0
        ),
        
        # Recommendations: Signals are independent, add them up
        "recommendations": ScoringConfig(
            score_mode=ScoreMode.SUM,
            boost_mode=BoostMode.MULTIPLY,
            max_boost=10.0
        ),
        
        # Location-based: Proximity can dominate
        "local_search": ScoringConfig(
            score_mode=ScoreMode.MULTIPLY,
            boost_mode=BoostMode.MULTIPLY,
            max_boost=3.0  # Lower to prevent far-but-relevant being buried
        ),
        
        # News/Time-sensitive: Freshness is critical
        "news_search": ScoringConfig(
            score_mode=ScoreMode.MULTIPLY,
            boost_mode=BoostMode.MULTIPLY,
            max_boost=5.0
        ),
        
        # Trending: Popularity/velocity can override relevance
        "trending": ScoringConfig(
            score_mode=ScoreMode.MAX,
            boost_mode=BoostMode.REPLACE,
            max_boost=100.0  # Allow strong trending signals
        ),
        
        # Balanced personalization: Don't let personalization dominate
        "personalized": ScoringConfig(
            score_mode=ScoreMode.AVG,
            boost_mode=BoostMode.MULTIPLY,
            max_boost=2.0  # Conservative to maintain relevance
        ),
    }
    
    return strategies.get(use_case, strategies["ecommerce_default"])
 
 
def build_function_score_query(
    base_query: Dict,
    functions: List[Dict],
    config: ScoringConfig
) -> Dict:
    """
    Build a properly configured function_score query.
    """
    return {
        "function_score": {
            "query": base_query,
            "functions": functions,
            "score_mode": config.score_mode.value,
            "boost_mode": config.boost_mode.value,
            "max_boost": config.max_boost
        }
    }
 
 
# Example: E-commerce query with multiple boost functions
def ecommerce_search_example():
    base_query = {
        "multi_match": {
            "query": "wireless headphones",
            "fields": ["product_name^3", "brand^2", "description"],
            "type": "best_fields"
        }
    }
    
    functions = [
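        # Popularity boost (log dampened).
        # log2p rather than log1p: with score_mode "multiply", log1p of
        # zero sales is 0 and would zero out the whole score.
        {
            "field_value_factor": {
                "field": "num_sold",
                "modifier": "log2p",
                "factor": 0.5,
                "missing": 1
            },
            "weight": 1
        },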
        # Rating boost
        {
            "field_value_factor": {
                "field": "avg_rating",
                "modifier": "none",
                "missing": 3
            },
            "weight": 0.5
        },
        # Freshness boost (new arrivals)
        {
            "gauss": {
                "created_at": {
                    "origin": "now",
                    "scale": "30d",
                    "decay": 0.5
                }
            },
            "weight": 0.3
        },
        # In-stock boost
        {
            "filter": {
                "term": {"in_stock": True}
            },
            "weight": 1.2
        },
        # Promotion boost (current sales)
        {
            "filter": {
                "term": {"is_on_sale": True}
            },
            "weight": 1.3
        }
    ]
    
    config = choose_scoring_strategy("ecommerce_default")
    return build_function_score_query(base_query, functions, config)
