Database Management SystemsRanking Functions

SQL Ranking Functions

LevelIntermediate

Duration60 mins

TopicRanking Functions

4 / 5

NTILE() Function

Dividing Data into Equal Groups

The previous ranking functions—ROW_NUMBER, RANK, and DENSE_RANK—assign positions based on ordering. But sometimes you don't need positions; you need categories. You want to divide your data into groups of roughly equal size: quartiles, deciles, percentiles, or any custom bucket count.

NTILE() serves this purpose. Given N buckets, it distributes rows as evenly as possible across those buckets, assigning each row a bucket number from 1 to N.

This capability is fundamental for:

Statistical analysis — Quartiles (4), quintiles (5), deciles (10), percentiles (100)
Load balancing — Distributing work items across N worker processes
Segmentation — Dividing customers into equal-sized behavioral groups
Fair distribution — Splitting data for testing, sampling, or parallel processing

What You Will Master

By the end of this page, you will deeply understand NTILE()'s bucket distribution mechanics, how it handles rows that don't divide evenly, and how to apply it for percentile calculations, load balancing, stratified sampling, and other equal-division scenarios.

Fundamental Definition and Syntax

NTILE(n) is a window function that divides an ordered partition into n approximately equal groups (tiles), assigning each row an integer from 1 to n indicating its group membership.

Formal Definition:

NTILE(n) partitions the rows in the ordered partition into n groups of approximately equal size, assigning tile numbers 1 through n. If the partition has more rows than tiles, each tile receives at least one row. If the partition has fewer rows than tiles, some tiles remain empty (the maximum tile number equals the row count).

The Distribution Rule:

For N rows distributed into T tiles:
- Each tile receives at least ⌊N/T⌋ rows
- The first (N mod T) tiles receive one extra row

NTILE() Syntax
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
-- Basic Syntax
NTILE(number_of_tiles) OVER (
    [PARTITION BY partition_expression, ...]
    ORDER BY sort_expression [ASC | DESC], ...
)
 
-- Split into 4 groups (quartiles)
NTILE(4) OVER (ORDER BY score DESC)
 
-- Split each department into 3 tiers
NTILE(3) OVER (
    PARTITION BY department
    ORDER BY performance_score DESC
)
 
-- Split into 100 groups (percentiles)
NTILE(100) OVER (ORDER BY metric_value)
 
-- Key difference from other ranking functions:
-- NTILE takes an argument: the number of tiles (buckets)

Key Properties of NTILE()

•Argument Required — Unlike other ranking functions, NTILE requires the tile count as an argument
•Return Range — Returns integers 1 to n (where n is the argument, or row count if fewer rows)
•Equal Distribution — Tiles have sizes differing by at most 1 row
•Order Dependent — Which rows go to which tile depends on ORDER BY
•Partition Scoped — Distribution is independent within each partition

Distribution Mechanics: How Rows Are Assigned

Understanding NTILE's row assignment algorithm is essential for predictable results.

The Algorithm:

Count rows N in the partition
Calculate base size: ⌊N/T⌋ (integer division)
Calculate remainder: N mod T
First R tiles get (base + 1) rows; remaining tiles get (base) rows

Example: 10 Rows into 4 Tiles

N = 10, T = 4
Base = ⌊10/4⌋ = 2
Remainder = 10 mod 4 = 2
Distribution: First 2 tiles get 3 rows; last 2 tiles get 2 rows
Result: 3, 3, 2, 2 = 10 total ✓

Distribution Example
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
-- 10 salespeople divided into 4 performance quartiles
CREATE TABLE salespeople (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    total_sales DECIMAL(12,2)
);
 
INSERT INTO salespeople VALUES
(1, 'Alice', 100000),
(2, 'Bob', 90000),
(3, 'Carol', 85000),
(4, 'David', 80000),
(5, 'Emma', 75000),
(6, 'Frank', 70000),
(7, 'Grace', 65000),
(8, 'Henry', 60000),
(9, 'Ivy', 55000),
(10, 'Jack', 50000);
 
-- Assign to quartiles (4 groups)
SELECT 
    name,
    total_sales,
    NTILE(4) OVER (ORDER BY total_sales DESC) AS quartile
FROM salespeople;

10 Rows Divided into 4 Tiles
name	total_sales	quartile
Alice	100000	1
Bob	90000	1
Carol	85000	1
David	80000	2
Emma	75000	2
Frank	70000	2
Grace	65000	3
Henry	60000	3
Ivy	55000	4
Jack	50000	4

Analysis:

10 rows ÷ 4 tiles = 2 remainder 2
Tiles 1 and 2 get 3 rows each (the extra rows)
Tiles 3 and 4 get 2 rows each
Total: 3 + 3 + 2 + 2 = 10 ✓

Why ORDER BY Matters:

The ORDER BY determines which rows go into which tiles:

ORDER BY sales DESC: Tile 1 = top performers, Tile 4 = bottom performers
ORDER BY sales ASC: Tile 1 = bottom performers, Tile 4 = top performers
Without ORDER BY: Assignment is arbitrary

Always Specify ORDER BY

Unlike other ranking functions where ORDER BY affects the rank value, with NTILE it also determines group membership. Without ORDER BY, rows are assigned to tiles arbitrarily—usually useless. Always include ORDER BY for meaningful results.

Edge Cases: Uneven Divisions and Boundaries

Real-world data rarely divides evenly. Understanding NTILE's boundary conditions prevents surprises.

Edge Case 1: Fewer Rows Than Tiles

If you request 10 tiles but have only 7 rows:

Fewer Rows Than Tiles
1
2
3
4
5
6
7
8
-- 7 rows into 10 tiles
WITH sample AS (
    SELECT n FROM generate_series(1, 7) AS n
)
SELECT 
    n AS value,
    NTILE(10) OVER (ORDER BY n) AS tile
FROM sample;

7 Rows into 10 Tiles
value	tile
1	1
2	2
3	3
4	4
5	5
6	6
7	7

Result: Each row gets its own tile (1-7). Tiles 8, 9, 10 have zero rows—they simply don't appear in the output.

Edge Case 2: One Row Per Tile

When rows exactly equal tiles:

Equal Rows and Tiles
1
2
3
4
5
6
7
8
9
10
-- 5 rows into 5 tiles: perfect division
WITH sample AS (
    SELECT n FROM generate_series(1, 5) AS n
)
SELECT 
    n AS value,
    NTILE(5) OVER (ORDER BY n) AS tile
FROM sample;
 
-- Result: Each row gets exactly one unique tile (1, 2, 3, 4, 5)

Edge Case 3: All Rows in One Tile

NTILE(1) is valid—all rows go to tile 1:

Single Tile
1
2
3
4
5
6
-- All rows in tile 1
SELECT 
    name,
    NTILE(1) OVER (ORDER BY total_sales) AS tile
FROM salespeople;
-- Every row returns tile = 1

Edge Case 4: Ties in ORDER BY

NTILE must assign each row to exactly one tile, even if ORDER BY values are tied. The assignment among tied rows is arbitrary:

Handling Ties
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
-- Ties in ORDER BY: arbitrary assignment within ties
CREATE TABLE scores (id INT, score INT);
INSERT INTO scores VALUES
(1, 100), (2, 100), (3, 100), (4, 90), (5, 90), (6, 80);
 
SELECT 
    id,
    score,
    NTILE(2) OVER (ORDER BY score DESC) AS tile
FROM scores;
 
-- Possible result (tile assignment for tied scores is arbitrary):
-- 100: tiles 1, 1, 1 (or 1, 1, 2 - depends on engine)
-- 90: tiles 1, 2
-- 80: tile 2
 
-- Add secondary order for determinism:
SELECT 
    id,
    score,
    NTILE(2) OVER (ORDER BY score DESC, id) AS tile
FROM scores;

Deterministic Tile Assignment

For reproducible results, include enough ORDER BY columns to break all ties. Adding a unique key (like ID) as the final sort column ensures deterministic tile assignment.

Statistical Applications: Percentiles and Quartiles

NTILE's primary use case is computing statistical divisions: quartiles, quintiles, deciles, and percentiles.

Common Statistical Divisions:

Division	NTILE Arg	Description
Halves	2	Median split
Tertiles	3	Three equal groups
Quartiles	4	25% each (Q1, Q2, Q3, Q4)
Quintiles	5	20% each
Deciles	10	10% each
Percentiles	100	1% each

Application 1: Quartile Analysis

Quartile Analysis
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
-- Analyze customer spending by quartile
WITH customer_quartiles AS (
    SELECT 
        customer_id,
        customer_name,
        total_lifetime_value,
        NTILE(4) OVER (ORDER BY total_lifetime_value DESC) AS value_quartile
    FROM customers
)
SELECT 
    value_quartile,
    CASE value_quartile
        WHEN 1 THEN 'Top 25% (VIP)'
        WHEN 2 THEN 'Upper Mid (Growth)'
        WHEN 3 THEN 'Lower Mid (Develop)'
        WHEN 4 THEN 'Bottom 25% (Nurture)'
    END AS segment_name,
    COUNT(*) AS customer_count,
    AVG(total_lifetime_value) AS avg_ltv,
    MIN(total_lifetime_value) AS min_ltv,
    MAX(total_lifetime_value) AS max_ltv
FROM customer_quartiles
GROUP BY value_quartile
ORDER BY value_quartile;

Application 2: Percentile Rank

Computing each row's percentile position:

Percentile Calculation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
-- Calculate percentile rank for each employee's salary
WITH percentiles AS (
    SELECT 
        employee_id,
        employee_name,
        department,
        salary,
        NTILE(100) OVER (ORDER BY salary) AS percentile,
        NTILE(100) OVER (
            PARTITION BY department 
            ORDER BY salary
        ) AS dept_percentile
    FROM employees
)
SELECT 
    employee_name,
    department,
    salary,
    percentile AS global_percentile,
    dept_percentile AS department_percentile,
    CASE 
        WHEN percentile >= 90 THEN 'Top 10%'
        WHEN percentile >= 75 THEN 'Top 25%'
        WHEN percentile >= 50 THEN 'Above Median'
        ELSE 'Below Median'
    END AS performance_tier
FROM percentiles
ORDER BY global_percentile DESC;
 
-- Note: NTILE(100) assigns 1 to lowest 1%, 100 to highest 1%
-- For conventional percentile (100 = top), ORDER BY ascending

NTILE vs PERCENT_RANK

NTILE(100) gives integer percentiles 1-100 with equal row counts per tile. PERCENT_RANK() gives continuous percentile values 0-1 based on relative position. Use NTILE when you need discrete buckets; use PERCENT_RANK for precise percentile values.

Application 3: Decile Analysis for Performance Evaluation

Dividing employees into 10 performance buckets:

Decile Performance Evaluation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
-- Decile-based performance evaluation
WITH performance_deciles AS (
    SELECT 
        employee_id,
        employee_name,
        performance_score,
        NTILE(10) OVER (ORDER BY performance_score DESC) AS decile
    FROM employee_reviews
    WHERE review_year = 2024
)
SELECT 
    decile,
    CONCAT('Decile ', decile, ' (', (decile - 1) * 10, '-', decile * 10, ' percentile)') AS description,
    COUNT(*) AS employee_count,
    ROUND(AVG(performance_score), 2) AS avg_score,
    CASE 
        WHEN decile <= 2 THEN 'Exceptional - Promote'
        WHEN decile <= 4 THEN 'Strong - Bonus'
        WHEN decile <= 7 THEN 'Meeting Expectations'
        WHEN decile <= 9 THEN 'Needs Improvement'
        ELSE 'Performance Plan'
    END AS action_item
FROM performance_deciles
GROUP BY decile
ORDER BY decile;

Practical Applications Beyond Statistics

NTILE's equal distribution capability extends beyond statistical analysis to operational use cases.

Application 1: Load Balancing Across Workers

Distributing tasks evenly across N processing nodes:

Load Balancing
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
-- Distribute orders to 5 fulfillment warehouses
-- Each warehouse gets roughly equal order count
WITH order_assignment AS (
    SELECT 
        order_id,
        order_date,
        total_amount,
        NTILE(5) OVER (ORDER BY order_id) AS warehouse_id
    FROM pending_orders
)
SELECT 
    warehouse_id,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_value
FROM order_assignment
GROUP BY warehouse_id
ORDER BY warehouse_id;
 
-- Alternative: Balance by value, not count
WITH value_balanced AS (
    SELECT 
        order_id,
        total_amount,
        SUM(total_amount) OVER (ORDER BY total_amount DESC) AS running_total,
        NTILE(5) OVER (ORDER BY total_amount DESC) AS warehouse_id
    FROM pending_orders
)
SELECT * FROM value_balanced;

Application 2: Stratified Sampling

Selecting representative samples from each tier:

Stratified Sampling
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
-- Select 2 samples from each quartile of customer spending
WITH quartiled AS (
    SELECT 
        customer_id,
        total_spending,
        NTILE(4) OVER (ORDER BY total_spending) AS spending_quartile,
        ROW_NUMBER() OVER (
            PARTITION BY NTILE(4) OVER (ORDER BY total_spending)
            ORDER BY RANDOM()
        ) AS random_rank
    FROM customers
)
-- Workaround: can't nest window functions directly
, rerank AS (
    SELECT 
        customer_id,
        total_spending,
        spending_quartile,
        ROW_NUMBER() OVER (
            PARTITION BY spending_quartile 
            ORDER BY RANDOM()
        ) AS rand_within_quartile
    FROM (
        SELECT 
            customer_id,
            total_spending,
            NTILE(4) OVER (ORDER BY total_spending) AS spending_quartile
        FROM customers
    ) t
)
SELECT 
    customer_id,
    total_spending,
    spending_quartile
FROM rerank
WHERE rand_within_quartile <= 2
ORDER BY spending_quartile, total_spending;

Application 3: Batch Processing

Dividing large datasets into processable chunks:

Batch Processing
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
-- Create 10 batches for nightly processing
-- Process one batch per hour overnight
CREATE VIEW processing_batches AS
SELECT 
    record_id,
    data_payload,
    NTILE(10) OVER (ORDER BY record_id) AS batch_number,
    CURRENT_DATE AS scheduled_date
FROM large_processing_queue
WHERE status = 'pending';
 
-- Query for specific batch
SELECT * FROM processing_batches WHERE batch_number = 3;
 
-- Verify batch sizes are balanced
SELECT 
    batch_number,
    COUNT(*) AS record_count,
    MIN(record_id) AS first_id,
    MAX(record_id) AS last_id
FROM processing_batches
GROUP BY batch_number
ORDER BY batch_number;

Batch Size Control

To create batches of specific size rather than specific count, divide total rows by desired batch size: NTILE(CEIL(COUNT(*) / 1000.0)) for ~1000-row batches. Compute the count separately and use it as the NTILE argument in a CTE.

PARTITION BY: Independent Distributions

NTILE operates independently within each partition, creating separate equal distributions per group.

Example: Department-Level Quintiles

Partitioned NTILE
1
2
3
4
5
6
7
8
9
10
11
12
13
14
-- Assign performance quintiles within each department
SELECT 
    department,
    employee_name,
    performance_score,
    NTILE(5) OVER (
        PARTITION BY department 
        ORDER BY performance_score DESC
    ) AS dept_quintile,
    NTILE(5) OVER (
        ORDER BY performance_score DESC
    ) AS global_quintile
FROM employees
ORDER BY department, dept_quintile;

Department vs Global Quintiles
department	employee_name	performance_score	dept_quintile	global_quintile
Engineering	Alice	95	1	1
Engineering	Bob	92	1	1
Engineering	Carol	88	2	2
Engineering	David	85	2	2
Engineering	Emma	80	3	3
Sales	Frank	78	1	3
Sales	Grace	75	1	4
Sales	Henry	72	2	4
Sales	Ivy	70	2	5
Sales	Jack	65	3	5

Key Observation:

Frank has dept_quintile 1 (top 20% of Sales) but global_quintile 3 (middle globally)
Emma has dept_quintile 3 (middle of Engineering) but global_quintile 3 (similar globally)

This captures the nuance: being top performer in a lower-performing department vs. middle performer in a high-performing department. Both quintile perspectives are valuable:

Department quintile: Fair comparison within peer group
Global quintile: Absolute position in organization

Unequal Partition Sizes

When partitions have different row counts, NTILE independently distributes within each. A 5-row partition with NTILE(4) gives tiles of size 2, 1, 1, 1. A 100-row partition gives 25-row tiles. The tile sizes aren't comparable across partitions.

Advanced NTILE Patterns

NTILE enables sophisticated analytical patterns for segmentation and distribution analysis.

Pattern 1: Dynamic Tile Count

Using a variable or calculation for the tile count:

Dynamic Tile Count
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
-- Create tiles based on data characteristics
-- Example: One tile per 100 rows (approximately)
WITH metadata AS (
    SELECT 
        COUNT(*) AS total_rows,
        GREATEST(1, CEIL(COUNT(*) / 100.0)) AS tile_count
    FROM large_table
)
SELECT 
    t.*,
    NTILE(m.tile_count::int) OVER (ORDER BY t.sort_key) AS tile_num
FROM large_table t
CROSS JOIN metadata m;
 
-- Note: Some databases don't allow NTILE with a non-literal argument
-- Alternative using ROW_NUMBER:
WITH numbered AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY sort_key) AS rn, COUNT(*) OVER () AS total
    FROM large_table
)
SELECT 
    *,
    CEIL(rn * 10.0 / total) AS manual_decile
FROM numbered;

Pattern 2: Cross-Tile Analysis

Comparing metrics across tile boundaries:

Cross-Tile Analysis
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
-- Analyze behavior differences between customer quartiles
WITH customer_tiles AS (
    SELECT 
        customer_id,
        total_spending,
        order_count,
        avg_order_value,
        NTILE(4) OVER (ORDER BY total_spending DESC) AS spending_quartile
    FROM customer_metrics
)
SELECT 
    spending_quartile,
    COUNT(*) AS customer_count,
    AVG(total_spending) AS avg_spending,
    AVG(order_count) AS avg_order_count,
    AVG(avg_order_value) AS avg_order_value,
    -- Compare to previous quartile
    LAG(AVG(total_spending)) OVER (ORDER BY spending_quartile) AS prev_quartile_spending,
    AVG(total_spending) / NULLIF(LAG(AVG(total_spending)) OVER (ORDER BY spending_quartile), 0) AS spending_ratio
FROM customer_tiles
GROUP BY spending_quartile
ORDER BY spending_quartile;
 
-- This shows: How much more do Q1 customers spend vs Q2? Q2 vs Q3?

Pattern 3: Tile-Based Cohort Creation

Creating balanced cohorts for A/B testing or experimentation:

Cohort Creation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
-- Create balanced test/control cohorts
-- Stratified by spending level to ensure fair comparison
WITH stratified AS (
    SELECT 
        customer_id,
        spending_level,
        -- First, stratify by spending
        NTILE(10) OVER (ORDER BY total_spending) AS spending_stratum
    FROM customers
),
cohort_assigned AS (
    SELECT 
        customer_id,
        spending_stratum,
        -- Within each stratum, randomly assign to test/control
        NTILE(2) OVER (
            PARTITION BY spending_stratum 
            ORDER BY RANDOM()
        ) AS cohort
    FROM stratified
)
SELECT 
    customer_id,
    CASE cohort WHEN 1 THEN 'test' ELSE 'control' END AS cohort_name,
    spending_stratum
FROM cohort_assigned;
 
-- Verify balance: each stratum should have ~50% test, ~50% control
SELECT 
    spending_stratum,
    SUM(CASE cohort WHEN 1 THEN 1 ELSE 0 END) AS test_count,
    SUM(CASE cohort WHEN 2 THEN 1 ELSE 0 END) AS control_count
FROM cohort_assigned
GROUP BY spending_stratum
ORDER BY spending_stratum;

Stratified Experiments

Using NTILE with PARTITION BY enables stratified randomization—ensuring test/control groups have similar distributions across important dimensions. This reduces confounding and improves experiment validity.

NTILE vs Alternative Approaches

Understanding when NTILE is the right tool—and when alternatives work better—improves solution quality.

NTILE vs Alternatives
Approach	Best For	Limitation
NTILE(n)	Equal-count buckets	Doesn't consider value distributions
Width-based bucketing	Equal-width value ranges	May produce very unequal counts
PERCENT_RANK()	Continuous percentile values	No discrete bucket assignment
CUME_DIST()	Cumulative distribution	No direct bucket assignment
CASE with thresholds	Custom bucket boundaries	Requires knowing thresholds in advance

NTILE vs Width-Based Buckets
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Compare NTILE (equal count) vs width-based (equal range)
 
-- NTILE: Equal number of rows per bucket
SELECT 
    value,
    NTILE(4) OVER (ORDER BY value) AS ntile_bucket
FROM (VALUES (10), (20), (30), (35), (37), (38), (39), (100)) AS t(value);
-- Each bucket gets 2 rows regardless of value distribution
 
-- Width-based: Equal value range per bucket
WITH bounds AS (
    SELECT MIN(value) AS min_val, MAX(value) AS max_val FROM sample_data
),
width_bucketed AS (
    SELECT 
        value,
        FLOOR((value - min_val) / ((max_val - min_val) / 4.0)) + 1 AS width_bucket
    FROM sample_data, bounds
)
SELECT * FROM width_bucketed;
-- Buckets may have very different row counts
 
-- NTILE with skewed data:
-- Values: 1, 2, 3, 4, 5, 100, 200, 300
-- NTILE(4): (1,2), (3,4), (5,100), (200,300) - 2 rows each
-- Width(4): (1-75), (75-150), (150-225), (225-300) - very unequal counts

When to Use NTILE:

You need groups of approximately equal count
You're computing percentiles, quartiles, or deciles
You want to divide work evenly across N processors
The value distribution doesn't require equal-width buckets

When to Use Alternatives:

You need equal value ranges (use width-based formulas)
You have specific threshold values (use CASE statements)
You need continuous percentile values (use PERCENT_RANK or CUME_DIST)
You need value-aware grouping (use clustering approaches)

NTILE Is Count-Based, Not Value-Based

NTILE divides rows equally, not values. A dataset with values 1-10 and 1000 gets even row distribution even though the value 1000 is an extreme outlier. For outlier-sensitive bucketing, consider percentile-based approaches or preprocessing.

Summary: Mastering NTILE()

NTILE() provides equal-count distribution into N buckets—essential for percentile analysis, load balancing, and stratified operations. Let's consolidate the key knowledge:

Key Takeaways

•Definition — NTILE(n) divides ordered rows into n approximately equal groups, assigning tile numbers 1 to n
•Distribution Rule — If N rows into T tiles: first (N mod T) tiles get one extra row; sizes differ by at most 1
•ORDER BY Required — Determines which rows go to which tiles; without it, assignment is arbitrary
•Edge Cases — Fewer rows than tiles = some tiles empty; ties = arbitrary assignment within ties
•Statistical Use — NTILE(4) for quartiles, NTILE(10) for deciles, NTILE(100) for percentiles
•Practical Use — Load balancing, batch processing, cohort creation, stratified sampling
•Count-Based — Distributes by row count, not value ranges; each tile has similar row counts

Complete Ranking Function Summary
Function	Assigns	Ties	Range	Primary Use
ROW_NUMBER()	Unique sequence	Arbitrary	1 to N	Pagination, deduplication
RANK()	Position with gaps	Same rank	1 to N	Competition rankings
DENSE_RANK()	Position no gaps	Same rank	1 to D	Nth value queries
NTILE(n)	Bucket assignment	Arbitrary within	1 to n	Percentiles, load balancing

What's Next:

With all four ranking functions mastered, we'll explore practical ranking use cases in the final page of this module. We'll see how to combine these functions for complex analytical scenarios, handle real-world edge cases, and apply ranking patterns to solve business problems.

Page Complete

You now possess comprehensive knowledge of NTILE()—from distribution mechanics through advanced patterns. With all four ranking functions understood, you're equipped to handle percentile analysis, load balancing, stratified sampling, and any equal-distribution scenario.

4 / 5

Loading learning content...

Database Management SystemsRanking Functions

SQL Ranking Functions

LevelIntermediate

Duration60 mins

TopicRanking Functions

4 / 5

NTILE() Function

Dividing Data into Equal Groups

NTILE() serves this purpose. Given N buckets, it distributes rows as evenly as possible across those buckets, assigning each row a bucket number from 1 to N.

This capability is fundamental for:

Statistical analysis — Quartiles (4), quintiles (5), deciles (10), percentiles (100)
Load balancing — Distributing work items across N worker processes
Segmentation — Dividing customers into equal-sized behavioral groups
Fair distribution — Splitting data for testing, sampling, or parallel processing

What You Will Master

Fundamental Definition and Syntax

NTILE(n) is a window function that divides an ordered partition into n approximately equal groups (tiles), assigning each row an integer from 1 to n indicating its group membership.

Formal Definition:

NTILE(n) partitions the rows in the ordered partition into n groups of approximately equal size, assigning tile numbers 1 through n. If the partition has more rows than tiles, each tile receives at least one row. If the partition has fewer rows than tiles, some tiles remain empty (the maximum tile number equals the row count).

The Distribution Rule:

For N rows distributed into T tiles:
- Each tile receives at least ⌊N/T⌋ rows
- The first (N mod T) tiles receive one extra row

NTILE() Syntax
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
-- Basic Syntax
NTILE(number_of_tiles) OVER (
    [PARTITION BY partition_expression, ...]
    ORDER BY sort_expression [ASC | DESC], ...
)
 
-- Split into 4 groups (quartiles)
NTILE(4) OVER (ORDER BY score DESC)
 
-- Split each department into 3 tiers
NTILE(3) OVER (
    PARTITION BY department
    ORDER BY performance_score DESC
)
 
-- Split into 100 groups (percentiles)
NTILE(100) OVER (ORDER BY metric_value)
 
-- Key difference from other ranking functions:
-- NTILE takes an argument: the number of tiles (buckets)

Key Properties of NTILE()

•Argument Required — Unlike other ranking functions, NTILE requires the tile count as an argument
•Return Range — Returns integers 1 to n (where n is the argument, or row count if fewer rows)
•Equal Distribution — Tiles have sizes differing by at most 1 row
•Order Dependent — Which rows go to which tile depends on ORDER BY
•Partition Scoped — Distribution is independent within each partition

Distribution Mechanics: How Rows Are Assigned

Understanding NTILE's row assignment algorithm is essential for predictable results.

The Algorithm:

Count rows N in the partition
Calculate base size: ⌊N/T⌋ (integer division)
Calculate remainder: N mod T
First R tiles get (base + 1) rows; remaining tiles get (base) rows

Example: 10 Rows into 4 Tiles

N = 10, T = 4
Base = ⌊10/4⌋ = 2
Remainder = 10 mod 4 = 2
Distribution: First 2 tiles get 3 rows; last 2 tiles get 2 rows
Result: 3, 3, 2, 2 = 10 total ✓

Distribution Example
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
-- 10 salespeople divided into 4 performance quartiles
CREATE TABLE salespeople (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    total_sales DECIMAL(12,2)
);
 
INSERT INTO salespeople VALUES
(1, 'Alice', 100000),
(2, 'Bob', 90000),
(3, 'Carol', 85000),
(4, 'David', 80000),
(5, 'Emma', 75000),
(6, 'Frank', 70000),
(7, 'Grace', 65000),
(8, 'Henry', 60000),
(9, 'Ivy', 55000),
(10, 'Jack', 50000);
 
-- Assign to quartiles (4 groups)
SELECT 
    name,
    total_sales,
    NTILE(4) OVER (ORDER BY total_sales DESC) AS quartile
FROM salespeople;

10 Rows Divided into 4 Tiles
name	total_sales	quartile
Alice	100000	1
Bob	90000	1
Carol	85000	1
David	80000	2
Emma	75000	2
Frank	70000	2
Grace	65000	3
Henry	60000	3
Ivy	55000	4
Jack	50000	4

Analysis:

10 rows ÷ 4 tiles = 2 remainder 2
Tiles 1 and 2 get 3 rows each (the extra rows)
Tiles 3 and 4 get 2 rows each
Total: 3 + 3 + 2 + 2 = 10 ✓

Why ORDER BY Matters:

The ORDER BY determines which rows go into which tiles:

ORDER BY sales DESC: Tile 1 = top performers, Tile 4 = bottom performers
ORDER BY sales ASC: Tile 1 = bottom performers, Tile 4 = top performers
Without ORDER BY: Assignment is arbitrary

Always Specify ORDER BY

Edge Cases: Uneven Divisions and Boundaries

Real-world data rarely divides evenly. Understanding NTILE's boundary conditions prevents surprises.

Edge Case 1: Fewer Rows Than Tiles

If you request 10 tiles but have only 7 rows:

Fewer Rows Than Tiles
1
2
3
4
5
6
7
8
-- 7 rows into 10 tiles
WITH sample AS (
    SELECT n FROM generate_series(1, 7) AS n
)
SELECT 
    n AS value,
    NTILE(10) OVER (ORDER BY n) AS tile
FROM sample;

7 Rows into 10 Tiles
value	tile
1	1
2	2
3	3
4	4
5	5
6	6
7	7

Result: Each row gets its own tile (1-7). Tiles 8, 9, 10 have zero rows—they simply don't appear in the output.

Edge Case 2: One Row Per Tile

When rows exactly equal tiles:

Equal Rows and Tiles
1
2
3
4
5
6
7
8
9
10
-- 5 rows into 5 tiles: perfect division
WITH sample AS (
    SELECT n FROM generate_series(1, 5) AS n
)
SELECT 
    n AS value,
    NTILE(5) OVER (ORDER BY n) AS tile
FROM sample;
 
-- Result: Each row gets exactly one unique tile (1, 2, 3, 4, 5)

Edge Case 3: All Rows in One Tile

NTILE(1) is valid—all rows go to tile 1:

Single Tile
1
2
3
4
5
6
-- All rows in tile 1
SELECT 
    name,
    NTILE(1) OVER (ORDER BY total_sales) AS tile
FROM salespeople;
-- Every row returns tile = 1

Edge Case 4: Ties in ORDER BY

NTILE must assign each row to exactly one tile, even if ORDER BY values are tied. The assignment among tied rows is arbitrary:

Handling Ties
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
-- Ties in ORDER BY: arbitrary assignment within ties
CREATE TABLE scores (id INT, score INT);
INSERT INTO scores VALUES
(1, 100), (2, 100), (3, 100), (4, 90), (5, 90), (6, 80);
 
SELECT 
    id,
    score,
    NTILE(2) OVER (ORDER BY score DESC) AS tile
FROM scores;
 
-- Possible result (tile assignment for tied scores is arbitrary):
-- 100: tiles 1, 1, 1 (or 1, 1, 2 - depends on engine)
-- 90: tiles 1, 2
-- 80: tile 2
 
-- Add secondary order for determinism:
SELECT 
    id,
    score,
    NTILE(2) OVER (ORDER BY score DESC, id) AS tile
FROM scores;

Deterministic Tile Assignment

For reproducible results, include enough ORDER BY columns to break all ties. Adding a unique key (like ID) as the final sort column ensures deterministic tile assignment.

Statistical Applications: Percentiles and Quartiles

NTILE's primary use case is computing statistical divisions: quartiles, quintiles, deciles, and percentiles.

Common Statistical Divisions:

Division	NTILE Arg	Description
Halves	2	Median split
Tertiles	3	Three equal groups
Quartiles	4	25% each (Q1, Q2, Q3, Q4)
Quintiles	5	20% each
Deciles	10	10% each
Percentiles	100	1% each

Application 1: Quartile Analysis

Quartile Analysis
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
-- Analyze customer spending by quartile
WITH customer_quartiles AS (
    SELECT 
        customer_id,
        customer_name,
        total_lifetime_value,
        NTILE(4) OVER (ORDER BY total_lifetime_value DESC) AS value_quartile
    FROM customers
)
SELECT 
    value_quartile,
    CASE value_quartile
        WHEN 1 THEN 'Top 25% (VIP)'
        WHEN 2 THEN 'Upper Mid (Growth)'
        WHEN 3 THEN 'Lower Mid (Develop)'
        WHEN 4 THEN 'Bottom 25% (Nurture)'
    END AS segment_name,
    COUNT(*) AS customer_count,
    AVG(total_lifetime_value) AS avg_ltv,
    MIN(total_lifetime_value) AS min_ltv,
    MAX(total_lifetime_value) AS max_ltv
FROM customer_quartiles
GROUP BY value_quartile
ORDER BY value_quartile;

Application 2: Percentile Rank

Computing each row's percentile position:

Percentile Calculation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
-- Calculate percentile rank for each employee's salary
WITH percentiles AS (
    SELECT 
        employee_id,
        employee_name,
        department,
        salary,
        NTILE(100) OVER (ORDER BY salary) AS percentile,
        NTILE(100) OVER (
            PARTITION BY department 
            ORDER BY salary
        ) AS dept_percentile
    FROM employees
)
SELECT 
    employee_name,
    department,
    salary,
    percentile AS global_percentile,
    dept_percentile AS department_percentile,
    CASE 
        WHEN percentile >= 90 THEN 'Top 10%'
        WHEN percentile >= 75 THEN 'Top 25%'
        WHEN percentile >= 50 THEN 'Above Median'
        ELSE 'Below Median'
    END AS performance_tier
FROM percentiles
ORDER BY global_percentile DESC;
 
-- Note: NTILE(100) assigns 1 to lowest 1%, 100 to highest 1%
-- For conventional percentile (100 = top), ORDER BY ascending

NTILE vs PERCENT_RANK

Application 3: Decile Analysis for Performance Evaluation

Dividing employees into 10 performance buckets:

Decile Performance Evaluation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
-- Decile-based performance evaluation
WITH performance_deciles AS (
    SELECT 
        employee_id,
        employee_name,
        performance_score,
        NTILE(10) OVER (ORDER BY performance_score DESC) AS decile
    FROM employee_reviews
    WHERE review_year = 2024
)
SELECT 
    decile,
    CONCAT('Decile ', decile, ' (', (decile - 1) * 10, '-', decile * 10, ' percentile)') AS description,
    COUNT(*) AS employee_count,
    ROUND(AVG(performance_score), 2) AS avg_score,
    CASE 
        WHEN decile <= 2 THEN 'Exceptional - Promote'
        WHEN decile <= 4 THEN 'Strong - Bonus'
        WHEN decile <= 7 THEN 'Meeting Expectations'
        WHEN decile <= 9 THEN 'Needs Improvement'
        ELSE 'Performance Plan'
    END AS action_item
FROM performance_deciles
GROUP BY decile
ORDER BY decile;

Practical Applications Beyond Statistics

NTILE's equal distribution capability extends beyond statistical analysis to operational use cases.

Application 1: Load Balancing Across Workers

Distributing tasks evenly across N processing nodes:

Load Balancing
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
-- Distribute orders to 5 fulfillment warehouses
-- Each warehouse gets roughly equal order count
WITH order_assignment AS (
    SELECT 
        order_id,
        order_date,
        total_amount,
        NTILE(5) OVER (ORDER BY order_id) AS warehouse_id
    FROM pending_orders
)
SELECT 
    warehouse_id,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_value
FROM order_assignment
GROUP BY warehouse_id
ORDER BY warehouse_id;
 
-- Alternative: Balance by value, not count
WITH value_balanced AS (
    SELECT 
        order_id,
        total_amount,
        SUM(total_amount) OVER (ORDER BY total_amount DESC) AS running_total,
        NTILE(5) OVER (ORDER BY total_amount DESC) AS warehouse_id
    FROM pending_orders
)
SELECT * FROM value_balanced;

Application 2: Stratified Sampling

Selecting representative samples from each tier:

Stratified Sampling
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
-- Select 2 samples from each quartile of customer spending
WITH quartiled AS (
    SELECT 
        customer_id,
        total_spending,
        NTILE(4) OVER (ORDER BY total_spending) AS spending_quartile,
        ROW_NUMBER() OVER (
            PARTITION BY NTILE(4) OVER (ORDER BY total_spending)
            ORDER BY RANDOM()
        ) AS random_rank
    FROM customers
)
-- Workaround: can't nest window functions directly
, rerank AS (
    SELECT 
        customer_id,
        total_spending,
        spending_quartile,
        ROW_NUMBER() OVER (
            PARTITION BY spending_quartile 
            ORDER BY RANDOM()
        ) AS rand_within_quartile
    FROM (
        SELECT 
            customer_id,
            total_spending,
            NTILE(4) OVER (ORDER BY total_spending) AS spending_quartile
        FROM customers
    ) t
)
SELECT 
    customer_id,
    total_spending,
    spending_quartile
FROM rerank
WHERE rand_within_quartile <= 2
ORDER BY spending_quartile, total_spending;

Application 3: Batch Processing

Dividing large datasets into processable chunks:

Batch Processing
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
-- Create 10 batches for nightly processing
-- Process one batch per hour overnight
CREATE VIEW processing_batches AS
SELECT 
    record_id,
    data_payload,
    NTILE(10) OVER (ORDER BY record_id) AS batch_number,
    CURRENT_DATE AS scheduled_date
FROM large_processing_queue
WHERE status = 'pending';
 
-- Query for specific batch
SELECT * FROM processing_batches WHERE batch_number = 3;
 
-- Verify batch sizes are balanced
SELECT 
    batch_number,
    COUNT(*) AS record_count,
    MIN(record_id) AS first_id,
    MAX(record_id) AS last_id
FROM processing_batches
GROUP BY batch_number
ORDER BY batch_number;

Batch Size Control

PARTITION BY: Independent Distributions

NTILE operates independently within each partition, creating separate equal distributions per group.

Example: Department-Level Quintiles

Partitioned NTILE
1
2
3
4
5
6
7
8
9
10
11
12
13
14
-- Assign performance quintiles within each department
SELECT 
    department,
    employee_name,
    performance_score,
    NTILE(5) OVER (
        PARTITION BY department 
        ORDER BY performance_score DESC
    ) AS dept_quintile,
    NTILE(5) OVER (
        ORDER BY performance_score DESC
    ) AS global_quintile
FROM employees
ORDER BY department, dept_quintile;

Department vs Global Quintiles
department	employee_name	performance_score	dept_quintile	global_quintile
Engineering	Alice	95	1	1
Engineering	Bob	92	1	1
Engineering	Carol	88	2	2
Engineering	David	85	2	2
Engineering	Emma	80	3	3
Sales	Frank	78	1	3
Sales	Grace	75	1	4
Sales	Henry	72	2	4
Sales	Ivy	70	2	5
Sales	Jack	65	3	5

Key Observation:

Frank has dept_quintile 1 (top 20% of Sales) but global_quintile 3 (middle globally)
Emma has dept_quintile 3 (middle of Engineering) but global_quintile 3 (similar globally)

This captures the nuance: being top performer in a lower-performing department vs. middle performer in a high-performing department. Both quintile perspectives are valuable:

Department quintile: Fair comparison within peer group
Global quintile: Absolute position in organization

Unequal Partition Sizes

Advanced NTILE Patterns

NTILE enables sophisticated analytical patterns for segmentation and distribution analysis.

Pattern 1: Dynamic Tile Count

Using a variable or calculation for the tile count:

Dynamic Tile Count
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
-- Create tiles based on data characteristics
-- Example: One tile per 100 rows (approximately)
WITH metadata AS (
    SELECT 
        COUNT(*) AS total_rows,
        GREATEST(1, CEIL(COUNT(*) / 100.0)) AS tile_count
    FROM large_table
)
SELECT 
    t.*,
    NTILE(m.tile_count::int) OVER (ORDER BY t.sort_key) AS tile_num
FROM large_table t
CROSS JOIN metadata m;
 
-- Note: Some databases don't allow NTILE with a non-literal argument
-- Alternative using ROW_NUMBER:
WITH numbered AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY sort_key) AS rn, COUNT(*) OVER () AS total
    FROM large_table
)
SELECT 
    *,
    CEIL(rn * 10.0 / total) AS manual_decile
FROM numbered;

Pattern 2: Cross-Tile Analysis

Comparing metrics across tile boundaries:

Cross-Tile Analysis
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
-- Analyze behavior differences between customer quartiles
WITH customer_tiles AS (
    SELECT 
        customer_id,
        total_spending,
        order_count,
        avg_order_value,
        NTILE(4) OVER (ORDER BY total_spending DESC) AS spending_quartile
    FROM customer_metrics
)
SELECT 
    spending_quartile,
    COUNT(*) AS customer_count,
    AVG(total_spending) AS avg_spending,
    AVG(order_count) AS avg_order_count,
    AVG(avg_order_value) AS avg_order_value,
    -- Compare to previous quartile
    LAG(AVG(total_spending)) OVER (ORDER BY spending_quartile) AS prev_quartile_spending,
    AVG(total_spending) / NULLIF(LAG(AVG(total_spending)) OVER (ORDER BY spending_quartile), 0) AS spending_ratio
FROM customer_tiles
GROUP BY spending_quartile
ORDER BY spending_quartile;
 
-- This shows: How much more do Q1 customers spend vs Q2? Q2 vs Q3?

Pattern 3: Tile-Based Cohort Creation

Creating balanced cohorts for A/B testing or experimentation:

Cohort Creation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
-- Create balanced test/control cohorts
-- Stratified by spending level to ensure fair comparison
WITH stratified AS (
    SELECT 
        customer_id,
        spending_level,
        -- First, stratify by spending
        NTILE(10) OVER (ORDER BY total_spending) AS spending_stratum
    FROM customers
),
cohort_assigned AS (
    SELECT 
        customer_id,
        spending_stratum,
        -- Within each stratum, randomly assign to test/control
        NTILE(2) OVER (
            PARTITION BY spending_stratum 
            ORDER BY RANDOM()
        ) AS cohort
    FROM stratified
)
SELECT 
    customer_id,
    CASE cohort WHEN 1 THEN 'test' ELSE 'control' END AS cohort_name,
    spending_stratum
FROM cohort_assigned;
 
-- Verify balance: each stratum should have ~50% test, ~50% control
SELECT 
    spending_stratum,
    SUM(CASE cohort WHEN 1 THEN 1 ELSE 0 END) AS test_count,
    SUM(CASE cohort WHEN 2 THEN 1 ELSE 0 END) AS control_count
FROM cohort_assigned
GROUP BY spending_stratum
ORDER BY spending_stratum;

Stratified Experiments

NTILE vs Alternative Approaches

Understanding when NTILE is the right tool—and when alternatives work better—improves solution quality.

NTILE vs Alternatives
Approach	Best For	Limitation
NTILE(n)	Equal-count buckets	Doesn't consider value distributions
Width-based bucketing	Equal-width value ranges	May produce very unequal counts
PERCENT_RANK()	Continuous percentile values	No discrete bucket assignment
CUME_DIST()	Cumulative distribution	No direct bucket assignment
CASE with thresholds	Custom bucket boundaries	Requires knowing thresholds in advance

NTILE vs Width-Based Buckets
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Compare NTILE (equal count) vs width-based (equal range)
 
-- NTILE: Equal number of rows per bucket
SELECT 
    value,
    NTILE(4) OVER (ORDER BY value) AS ntile_bucket
FROM (VALUES (10), (20), (30), (35), (37), (38), (39), (100)) AS t(value);
-- Each bucket gets 2 rows regardless of value distribution
 
-- Width-based: Equal value range per bucket
WITH bounds AS (
    SELECT MIN(value) AS min_val, MAX(value) AS max_val FROM sample_data
),
width_bucketed AS (
    SELECT 
        value,
        FLOOR((value - min_val) / ((max_val - min_val) / 4.0)) + 1 AS width_bucket
    FROM sample_data, bounds
)
SELECT * FROM width_bucketed;
-- Buckets may have very different row counts
 
-- NTILE with skewed data:
-- Values: 1, 2, 3, 4, 5, 100, 200, 300
-- NTILE(4): (1,2), (3,4), (5,100), (200,300) - 2 rows each
-- Width(4): (1-75), (75-150), (150-225), (225-300) - very unequal counts

When to Use NTILE:

You need groups of approximately equal count
You're computing percentiles, quartiles, or deciles
You want to divide work evenly across N processors
The value distribution doesn't require equal-width buckets

When to Use Alternatives:

You need equal value ranges (use width-based formulas)
You have specific threshold values (use CASE statements)
You need continuous percentile values (use PERCENT_RANK or CUME_DIST)
You need value-aware grouping (use clustering approaches)

NTILE Is Count-Based, Not Value-Based

Summary: Mastering NTILE()

NTILE() provides equal-count distribution into N buckets—essential for percentile analysis, load balancing, and stratified operations. Let's consolidate the key knowledge:

Key Takeaways

•Definition — NTILE(n) divides ordered rows into n approximately equal groups, assigning tile numbers 1 to n
•Distribution Rule — If N rows into T tiles: first (N mod T) tiles get one extra row; sizes differ by at most 1
•ORDER BY Required — Determines which rows go to which tiles; without it, assignment is arbitrary
•Edge Cases — Fewer rows than tiles = some tiles empty; ties = arbitrary assignment within ties
•Statistical Use — NTILE(4) for quartiles, NTILE(10) for deciles, NTILE(100) for percentiles
•Practical Use — Load balancing, batch processing, cohort creation, stratified sampling
•Count-Based — Distributes by row count, not value ranges; each tile has similar row counts

Complete Ranking Function Summary
Function	Assigns	Ties	Range	Primary Use
ROW_NUMBER()	Unique sequence	Arbitrary	1 to N	Pagination, deduplication
RANK()	Position with gaps	Same rank	1 to N	Competition rankings
DENSE_RANK()	Position no gaps	Same rank	1 to D	Nth value queries
NTILE(n)	Bucket assignment	Arbitrary within	1 to n	Percentiles, load balancing

What's Next:

Page Complete

4 / 5