Loading learning content...
Hierarchies are everywhere in data: organizational charts, file systems, product category trees, geographic subdivisions, biological taxonomies, and document outlines. When a 1:N unary relationship is applied to an entity type, the resulting structure is a hierarchy or tree.
The hierarchical structure is perhaps the most important application of self-referential relationships in practical database design. Understanding how to model, implement, and query hierarchies efficiently is essential for any database professional.
This page dives deep into hierarchical structures—their formal properties, common implementation patterns, query techniques, and performance considerations. Mastering hierarchies transforms your ability to model complex organizational and categorical structures that appear in virtually every business domain.
By the end of this page, you will understand the formal properties of tree structures, multiple implementation strategies for hierarchies in relational databases (adjacency list, nested sets, path enumeration, closure tables), their tradeoffs, and common query patterns. You'll be equipped to choose the right approach for your specific requirements.
A hierarchy formed by a 1:N unary relationship is formally a rooted tree. Understanding tree properties is essential for designing correct hierarchical models.
Formal Definition:
A rooted tree T = (V, E, r) consists of:
Such that:
| Term | Definition | Example (Org Chart) |
|---|---|---|
| Root | Node with no parent | CEO |
| Leaf | Node with no children | Entry-level employee |
| Internal Node | Node with at least one child | Any manager |
| Parent | Immediate ancestor of a node | Direct supervisor |
| Child | Immediate descendant of a node | Direct report |
| Ancestor | Any node on path from root to current node | Any higher-level supervisor |
| Descendant | Any node reachable by following child links | Anyone in an org subtree |
| Sibling | Nodes sharing the same parent | Employees with same supervisor |
| Depth | Distance from root (root = 0) | Organizational level |
| Height | Maximum depth of any leaf | Deepest org level |
| Subtree | A node and all its descendants | A department and its staff |
Tree Structure: Organizational Hierarchy Depth 0 (Root): ┌─────────────┐ │ CEO │ ← Root node │ (Alice) │ └──────┬──────┘ │ ┌─────────────────┼─────────────────┐ │ │ │Depth 1: ┌────┴────┐ ┌────┴────┐ ┌────┴────┐ │ VP Eng │ │VP Sales │ │ VP Ops │ │ (Bob) │ │ (Carol) │ │ (Dave) │ └────┬────┘ └────┬────┘ └─────────┘ │ │ ↑ ┌────┴────┐ │ Leaf node │ │ │Depth 2: ┌────┐ ┌────┐ ┌────┐ │Mgr1│ │Mgr2│ │Mgr3│ ← Internal nodes │(Ed)│ │(Fay)│ │(Grace)│ └─┬──┘ └──┬─┘ └──┬──┘ │ │ │Depth 3: ┌─┴─┐ ┌─┴─┐ ┌──┴──┐ │Dev│ │Dev│ │ Rep │ ← Leaf nodes └───┘ └───┘ └─────┘ Properties of this tree:• Root: CEO (Alice)• Height: 3• CEO's children: VP Eng, VP Sales, VP Ops• VP Eng's ancestors: CEO• Mgr1's descendants: Dev (one level)• Ed and Fay are siblings (same parent: Bob)A strict tree has ONE root. Some business domains have multiple roots (multiple top-level categories, multiple CEOs in subsidiaries). In such cases, you can model as a 'forest' (multiple trees) or add a virtual root node that serves as parent to all top-level nodes.
The adjacency list model is the most intuitive and commonly used approach for implementing hierarchies. Each row stores a reference to its parent (or NULL for roots).
Implementation:
1234567891011121314151617181920212223242526272829303132333435
-- Adjacency List: Each node stores its parent's IDCREATE TABLE Category ( category_id INT PRIMARY KEY, name VARCHAR(100) NOT NULL, parent_id INT, -- NULL = root category CONSTRAINT fk_parent FOREIGN KEY (parent_id) REFERENCES Category(category_id) ON DELETE CASCADE); -- Sample data: Product category hierarchyINSERT INTO Category VALUES (1, 'Electronics', NULL); -- RootINSERT INTO Category VALUES (2, 'Computers', 1); INSERT INTO Category VALUES (3, 'Phones', 1);INSERT INTO Category VALUES (4, 'Laptops', 2);INSERT INTO Category VALUES (5, 'Desktops', 2);INSERT INTO Category VALUES (6, 'Smartphones', 3);INSERT INTO Category VALUES (7, 'Tablets', 3);INSERT INTO Category VALUES (8, 'Gaming Laptops', 4);INSERT INTO Category VALUES (9, 'Business Laptops', 4); /*Resulting hierarchy:Electronics (1)├── Computers (2)│ ├── Laptops (4)│ │ ├── Gaming Laptops (8)│ │ └── Business Laptops (9)│ └── Desktops (5)└── Phones (3) ├── Smartphones (6) └── Tablets (7)*/Query Patterns for Adjacency List:
123456789101112131415161718192021222324252627282930313233343536373839404142434445
-- Query 1: Get immediate children of a categorySELECT * FROM Category WHERE parent_id = 2; -- Children of 'Computers' -- Query 2: Get parent of a categorySELECT p.* FROM Category cJOIN Category p ON c.parent_id = p.category_idWHERE c.category_id = 4; -- Parent of 'Laptops' -- Query 3: Get root categoriesSELECT * FROM Category WHERE parent_id IS NULL; -- Query 4: Get leaf categories (no children)SELECT c.* FROM Category cLEFT JOIN Category child ON child.parent_id = c.category_idWHERE child.category_id IS NULL; -- Query 5: Get ALL descendants (using recursive CTE)WITH RECURSIVE Descendants AS ( SELECT category_id, name, parent_id, 0 AS depth FROM Category WHERE category_id = 1 -- Start from Electronics UNION ALL SELECT c.category_id, c.name, c.parent_id, d.depth + 1 FROM Category c JOIN Descendants d ON c.parent_id = d.category_id)SELECT * FROM Descendants ORDER BY depth, name; -- Query 6: Get ALL ancestors (path to root)WITH RECURSIVE Ancestors AS ( SELECT category_id, name, parent_id, 0 AS dist FROM Category WHERE category_id = 8 -- Start from Gaming Laptops UNION ALL SELECT c.category_id, c.name, c.parent_id, a.dist + 1 FROM Category c JOIN Ancestors a ON a.parent_id = c.category_id)SELECT * FROM Ancestors ORDER BY dist DESC; -- Root firstUse adjacency list when: hierarchy changes frequently (adds, deletes, moves); you mostly query parent-child relationships; depth is shallow (< 10 levels); or your database supports recursive CTEs. It's the best default choice for most applications.
Path enumeration stores the full path from root to each node as a delimited string. This denormalizes ancestor information for faster queries.
Implementation:
1234567891011121314151617181920212223242526272829
-- Path Enumeration: Store full path to each nodeCREATE TABLE Category ( category_id INT PRIMARY KEY, name VARCHAR(100) NOT NULL, path VARCHAR(1000) NOT NULL, -- e.g., '/1/2/4/8/' CHECK (path LIKE '/%/') -- Must start and end with delimiter); -- Sample data with materialized pathsINSERT INTO Category VALUES (1, 'Electronics', '/1/');INSERT INTO Category VALUES (2, 'Computers', '/1/2/');INSERT INTO Category VALUES (3, 'Phones', '/1/3/');INSERT INTO Category VALUES (4, 'Laptops', '/1/2/4/');INSERT INTO Category VALUES (5, 'Desktops', '/1/2/5/');INSERT INTO Category VALUES (6, 'Smartphones', '/1/3/6/');INSERT INTO Category VALUES (7, 'Tablets', '/1/3/7/');INSERT INTO Category VALUES (8, 'Gaming Laptops', '/1/2/4/8/');INSERT INTO Category VALUES (9, 'Business Laptops', '/1/2/4/9/'); /*Path interpretation:- '/1/' = Electronics (root)- '/1/2/' = Electronics > Computers- '/1/2/4/' = Electronics > Computers > Laptops- '/1/2/4/8/' = Electronics > Computers > Laptops > Gaming Laptops The path encodes the full ancestry chain!*/Query Patterns for Path Enumeration:
Path enumeration enables powerful queries using string pattern matching:
1234567891011121314151617181920212223242526272829303132
-- Query 1: Get ALL descendants of Computers (id=2)-- All descendants have paths STARTING WITH '/1/2/'SELECT * FROM Category WHERE path LIKE '/1/2/%'; -- Query 2: Get ALL ancestors of Gaming Laptops (id=8)-- Ancestors are entries whose path is a PREFIX of '/1/2/4/8/'SELECT * FROM Category WHERE '/1/2/4/8/' LIKE path || '%'ORDER BY LENGTH(path); -- Query 3: Get depth of a nodeSELECT (LENGTH(path) - LENGTH(REPLACE(path, '/', ''))) - 1 AS depthFROM Category WHERE category_id = 8; -- Returns 3 -- Query 4: Find immediate parent-- For path '/1/2/4/8/', parent is last ID before final segmentSELECT * FROM CategoryWHERE path = '/1/2/4/' -- Manually extracted parent path -- OR dynamically:SELECT * FROM Category pWHERE (SELECT path FROM Category WHERE category_id = 8) LIKE p.path || '%' AND LENGTH(p.path) = LENGTH((SELECT path FROM Category WHERE category_id = 8)) - 2 - LENGTH(CAST(8 AS VARCHAR)); -- Complex calculation -- Query 5: Get root (path = just this node's ID)SELECT * FROM Category WHERE path = '/' || CAST(category_id AS VARCHAR) || '/'; -- Query 6: Order siblings by path (natural tree order)SELECT * FROM Category ORDER BY path;Use path enumeration when: you frequently query entire subtrees; hierarchy is relatively stable; you need tree ordering often; your database lacks recursive CTEs; or you need to display breadcrumb navigation. Avoid if nodes move frequently.
The nested sets model assigns each node a left and right number such that descendants are contained within the parent's range. This encodes the tree structure through numeric intervals.
The Key Insight:
If we number nodes by traversing the tree depth-first, assigning 'left' on enter and 'right' on exit, then:
Implementation:
123456789101112131415161718192021222324252627282930313233343536373839404142434445
-- Nested Sets: Each node has left-right intervalCREATE TABLE Category ( category_id INT PRIMARY KEY, name VARCHAR(100) NOT NULL, lft INT NOT NULL, -- 'left' is reserved keyword rgt INT NOT NULL, -- 'right' is reserved keyword CHECK (lft < rgt)); -- Sample data with nested set numberingINSERT INTO Category VALUES (1, 'Electronics', 1, 18);INSERT INTO Category VALUES (2, 'Computers', 2, 11);INSERT INTO Category VALUES (3, 'Phones', 12, 17);INSERT INTO Category VALUES (4, 'Laptops', 3, 8);INSERT INTO Category VALUES (5, 'Desktops', 9, 10);INSERT INTO Category VALUES (6, 'Smartphones', 13, 14);INSERT INTO Category VALUES (7, 'Tablets', 15, 16);INSERT INTO Category VALUES (8, 'Gaming Laptops', 4, 5);INSERT INTO Category VALUES (9, 'Business Laptops', 6, 7); /*Nested set numbering via depth-first traversal: Electronics (1,18)├── Computers (2,11)│ ├── Laptops (3,8)│ │ ├── Gaming Laptops (4,5)│ │ └── Business Laptops (6,7)│ └── Desktops (9,10)└── Phones (12,17) ├── Smartphones (13,14) └── Tablets (15,16) Visual representation of intervals:1 [Electronics -------------------------------------------- 18] 2 [Computers ----------------------------- 11] 3 [Laptops -------------- 8] 4 [Gaming] 5 6 [Business] 7 9 [Desktops] 10 12 [Phones ----------------------- 17] 13 [Smart] 14 15 [Tablets] 16 Key insight: Children's intervals are NESTED within parent's interval!*/Query Patterns for Nested Sets:
Nested sets enable elegant single-query solutions for hierarchy operations:
12345678910111213141516171819202122232425262728293031323334
-- Query 1: Get ALL descendants of Computers (lft=2, rgt=11)-- Descendants have lft BETWEEN parent's lft and rgtSELECT * FROM Category WHERE lft > 2 AND rgt < 11ORDER BY lft; -- Returns in tree order! -- Query 2: Get ALL ancestors of Gaming Laptops (lft=4)-- Ancestors' intervals CONTAIN this node's lftSELECT * FROM CategoryWHERE lft < 4 AND rgt > 5ORDER BY lft; -- Returns root first! -- Query 3: Check if node A is ancestor of node BSELECT CASE WHEN (SELECT lft FROM Category WHERE category_id = 2) < (SELECT lft FROM Category WHERE category_id = 8) AND (SELECT rgt FROM Category WHERE category_id = 2) > (SELECT rgt FROM Category WHERE category_id = 8) THEN 'Yes' ELSE 'No'END AS is_ancestor; -- Query 4: Get depth of each nodeSELECT c.name, (SELECT COUNT(*) FROM Category p WHERE p.lft < c.lft AND p.rgt > c.rgt) AS depthFROM Category cORDER BY c.lft; -- Query 5: Get leaf nodes (no one is between their lft and rgt)SELECT * FROM Category WHERE rgt = lft + 1; -- Query 6: Count descendantsSELECT name, (rgt - lft - 1) / 2 AS descendant_countFROM Category;Inserting a node in nested sets requires updating lft/rgt values for ALL nodes to the right. In a tree with 10,000 nodes, inserting a single node might update 5,000+ rows. Use nested sets only for hierarchies that rarely change.
The closure table model stores ALL ancestor-descendant relationships explicitly in a separate table. While it uses more storage, it provides fast queries and reasonable modification performance.
Implementation:
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647
-- Closure Table: Store ALL paths between ancestors and descendantsCREATE TABLE Category ( category_id INT PRIMARY KEY, name VARCHAR(100) NOT NULL); CREATE TABLE Category_Closure ( ancestor_id INT NOT NULL, descendant_id INT NOT NULL, depth INT NOT NULL, -- Distance between nodes PRIMARY KEY (ancestor_id, descendant_id), FOREIGN KEY (ancestor_id) REFERENCES Category(category_id), FOREIGN KEY (descendant_id) REFERENCES Category(category_id)); -- Main table dataINSERT INTO Category VALUES (1, 'Electronics');INSERT INTO Category VALUES (2, 'Computers');INSERT INTO Category VALUES (4, 'Laptops');INSERT INTO Category VALUES (8, 'Gaming Laptops'); -- Closure table: ALL ancestor-descendant pairs-- Including self-references (depth = 0) -- Electronics relationshipsINSERT INTO Category_Closure VALUES (1, 1, 0); -- SelfINSERT INTO Category_Closure VALUES (1, 2, 1); -- Electronics > ComputersINSERT INTO Category_Closure VALUES (1, 4, 2); -- Electronics > LaptopsINSERT INTO Category_Closure VALUES (1, 8, 3); -- Electronics > Gaming Laptops -- Computers relationshipsINSERT INTO Category_Closure VALUES (2, 2, 0); -- SelfINSERT INTO Category_Closure VALUES (2, 4, 1); -- Computers > LaptopsINSERT INTO Category_Closure VALUES (2, 8, 2); -- Computers > Gaming Laptops -- Laptops relationshipsINSERT INTO Category_Closure VALUES (4, 4, 0); -- SelfINSERT INTO Category_Closure VALUES (4, 8, 1); -- Laptops > Gaming Laptops -- Gaming Laptops (leaf)INSERT INTO Category_Closure VALUES (8, 8, 0); -- Self only /*The closure table pre-computes all paths!For N nodes with average depth D, storage is O(N * D)*/Query Patterns for Closure Table:
Closure tables enable extremely simple and fast queries:
123456789101112131415161718192021222324252627282930313233343536373839404142
-- Query 1: Get ALL descendants of Computers (id=2)SELECT c.* FROM Category cJOIN Category_Closure cc ON c.category_id = cc.descendant_idWHERE cc.ancestor_id = 2 AND cc.depth > 0; -- Query 2: Get ALL ancestors of Gaming Laptops (id=8)SELECT c.* FROM Category cJOIN Category_Closure cc ON c.category_id = cc.ancestor_idWHERE cc.descendant_id = 8 AND cc.depth > 0ORDER BY cc.depth DESC; -- Root first -- Query 3: Get immediate children (depth = 1 only)SELECT c.* FROM Category cJOIN Category_Closure cc ON c.category_id = cc.descendant_idWHERE cc.ancestor_id = 2 AND cc.depth = 1; -- Query 4: Get immediate parentSELECT c.* FROM Category cJOIN Category_Closure cc ON c.category_id = cc.ancestor_idWHERE cc.descendant_id = 8 AND cc.depth = 1; -- Query 5: Check if A is ancestor of BSELECT COUNT(*) > 0 AS is_ancestorFROM Category_ClosureWHERE ancestor_id = 2 AND descendant_id = 8; -- Query 6: Get subtree sizeSELECT COUNT(*) - 1 AS descendant_count -- Exclude selfFROM Category_ClosureWHERE ancestor_id = 2; -- Insert new node (Gaming Laptops child: "RTX 4090 Laptops", id=10)-- 1. Insert to main tableINSERT INTO Category VALUES (10, 'RTX 4090 Laptops'); -- 2. Copy all ancestors' closure entries + add selfINSERT INTO Category_Closure (ancestor_id, descendant_id, depth)SELECT cc.ancestor_id, 10, cc.depth + 1FROM Category_Closure ccWHERE cc.descendant_id = 8 -- Parent nodeUNION ALLSELECT 10, 10, 0; -- Self-referenceUse closure table when: you need fast reads AND reasonable writes; hierarchy depth varies significantly; you query ancestors and descendants equally often; you want explicit control over what's stored. It's often the best balanced choice.
Each hierarchy model has distinct tradeoffs. The right choice depends on your specific read/write patterns, hierarchy characteristics, and database capabilities.
Complexity Analysis:
| Operation | Adjacency List | Path Enum | Nested Sets | Closure Table |
|---|---|---|---|---|
| Get children | O(1) | O(n) | O(n) | O(1) |
| Get parent | O(1) | O(n)* | O(log n)** | O(1) |
| Get all descendants | O(n)* | O(n) | O(n) | O(descendants) |
| Get all ancestors | O(depth)* | O(n) | O(log n) | O(depth) |
| Insert leaf | O(1) | O(1) | O(n) | O(depth) |
| Delete leaf | O(1) | O(1) | O(n) | O(depth) |
| Move subtree | O(1) | O(subtree) | O(n) | O(subtree * depth) |
| Storage | O(n) | O(n * depth) | O(n) | O(n * depth) |
*Requires recursive CTE **With index on lft column
Decision Guide:
Many production systems combine approaches: adjacency list for integrity + path column for queries; or adjacency list + closure table for complex queries. Choose based on actual query patterns, and don't be afraid to combine techniques.
Beyond the core models, several practical considerations affect hierarchy implementation:
Handling Multiple Roots:
Some hierarchies have multiple top-level nodes (multiple product lines, geographic regions). Options:
1234567891011121314
-- Virtual root approachINSERT INTO Category VALUES (0, 'Root', NULL); -- Virtual rootUPDATE Category SET parent_id = 0 WHERE parent_id IS NULL; -- Advantage: Single tree, uniform queries-- Disadvantage: Extra node in all results, must filter out -- Forest approach (multiple real roots)SELECT * FROM Category WHERE parent_id IS NULL; -- Get all roots-- Each root defines an independent subtree -- Flag approachALTER TABLE Category ADD COLUMN is_root BOOLEAN DEFAULT FALSE;UPDATE Category SET is_root = TRUE WHERE parent_id IS NULL;Enforcing Tree Constraints:
Relational databases don't natively enforce that a self-referencing structure is a tree (acyclic, single-rooted). Additional safeguards:
Indexing Strategies:
Proper indexing is critical for hierarchy performance:
123456789101112131415
-- Adjacency list: Index parent_id for child lookupsCREATE INDEX idx_employee_supervisor ON Employee(supervisor_id); -- Path enumeration: Index for LIKE prefix queries-- Note: Standard B-tree works for prefix matches (LIKE 'value%')CREATE INDEX idx_category_path ON Category(path); -- Nested sets: Index both lft and rgt for range queriesCREATE INDEX idx_category_lft ON Category(lft);CREATE INDEX idx_category_rgt ON Category(rgt);-- Or composite: (lft, rgt) for covering index on subtree queries -- Closure table: Index for both query directionsCREATE INDEX idx_closure_ancestor ON Category_Closure(ancestor_id, depth);CREATE INDEX idx_closure_descendant ON Category_Closure(descendant_id, depth);Some databases limit recursive CTE depth (e.g., PostgreSQL default is 1000, SQL Server is 100). For very deep hierarchies, you may need to adjust these limits or use non-recursive approaches like path enumeration.
Hierarchical data structures are fundamental to database design. Let's consolidate the key insights from this comprehensive treatment:
What's Next:
The final page of this module presents Examples (Employee-Manager)—a complete, worked example of modeling, implementing, and querying an organizational hierarchy. You'll see all the concepts from this module applied to a realistic scenario, cementing your understanding through practical application.
You now understand hierarchical structures in depth—from formal tree properties through four distinct implementation models. You can analyze tradeoffs and choose the appropriate model for any hierarchical data requirement. The final page will bring these concepts together in a complete practical example.