Loading content...
Inserting elements into a BST one by one can produce a terribly unbalanced tree. Insert [1, 2, 3, 4, 5] into an empty BST, and you get a linked list with O(n) height. But what if we start with sorted data? Can we construct a balanced BST directly, guaranteeing O(log n) height?
The answer is yes—and the algorithm is elegant. This is the sorted array to balanced BST problem, a classic example of divide-and-conquer applied to tree construction.
By the end of this page, you will understand why sorted data enables efficient balanced BST construction, master the divide-and-conquer algorithm for building height-balanced BSTs, analyze why choosing the middle element as root guarantees balance, and extend the pattern to sorted linked lists and multiple balancing objectives.
To appreciate the solution, let's first understand the problem. If we insert sorted elements into a BST sequentially:
123456789101112131415161718192021222324252627282930313233343536
def insert(root, val): """Standard BST insertion.""" if root is None: return TreeNode(val) if val < root.val: root.left = insert(root.left, val) else: root.right = insert(root.right, val) return root # Naive construction from sorted arraydef build_bst_naive(sorted_arr): """ Insert elements one by one. PROBLEM: For sorted input, this creates a degenerate tree! """ root = None for val in sorted_arr: root = insert(root, val) return root # Example: build_bst_naive([1, 2, 3, 4, 5])# Creates:# 1# \# 2# \# 3# \# 4# \# 5# Height = 5 (should be 3 for balanced)Why This Happens:
When elements are already sorted, each new element is greater than all previous elements. It always goes to the right of every existing node, creating a right-skewed chain.
The Consequences:
We need a smarter construction strategy.
The critical observation is this:
In a sorted array, the middle element is the ideal root for a balanced BST.
Why?
BST property is maintained: All elements to the left of the middle are smaller; all to the right are larger. This is exactly what BST requires.
Balance is achieved: By choosing the middle, roughly half the elements go to the left subtree and half to the right. This minimizes height.
Recursion applies perfectly: The left half is itself a sorted array—apply the same logic to build the left subtree. Same for the right half.
Visualizing the Construction:
For sorted array [1, 2, 3, 4, 5, 6, 7]:
The result is a perfectly balanced BST with height 3 instead of height 7. Every level is completely filled (for n = 2^k - 1 elements).
This is the divide-and-conquer paradigm in action: solve smaller subproblems (left half, right half), combine solutions (attach as subtrees of middle element).
This algorithm mirrors how binary search works: pick the middle, decide which half to explore. The resulting BST is also the tree that binary search would implicitly traverse. The array's binary search structure becomes explicit in the tree structure.
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495
class TreeNode: def __init__(self, val=0, left=None, right=None): self.val = val self.left = left self.right = right def sorted_array_to_bst(nums): """ Convert sorted array to height-balanced BST. A height-balanced BST is defined as a binary tree where the depth of the two subtrees of every node never differs by more than 1. Time Complexity: O(n) - visit each element exactly once Space Complexity: O(log n) - recursion depth for balanced tree (O(n) if counting the output tree) Args: nums: Sorted array of integers (ascending order) Returns: Root of the height-balanced BST """ def build(left, right): """Build BST from nums[left:right+1].""" # Base case: empty range if left > right: return None # Choose middle element as root # For even-length arrays, this chooses the left-middle mid = (left + right) // 2 # Create root node root = TreeNode(nums[mid]) # Recursively build left subtree from elements before mid root.left = build(left, mid - 1) # Recursively build right subtree from elements after mid root.right = build(mid + 1, right) return root if not nums: return None return build(0, len(nums) - 1) # Alternative: Using array slicing (less efficient but cleaner)def sorted_array_to_bst_slicing(nums): """ Version using array slices. Easier to understand but O(n log n) time due to slicing overhead. Use index-based version for production. """ if not nums: return None mid = len(nums) // 2 root = TreeNode(nums[mid]) root.left = sorted_array_to_bst_slicing(nums[:mid]) root.right = sorted_array_to_bst_slicing(nums[mid + 1:]) return root # Variant: Choosing right-middle for even-length arraysdef sorted_array_to_bst_right_middle(nums): """ For arrays with even length, choose the right-middle. Produces a different valid balanced BST. """ def build(left, right): if left > right: return None # Choose right-middle for even lengths mid = (left + right + 1) // 2 # Rounds up instead of down root = TreeNode(nums[mid]) root.left = build(left, mid - 1) root.right = build(mid + 1, right) return root if not nums: return None return build(0, len(nums) - 1)Time Complexity: O(n)
Each element in the array is visited exactly once:
Total work: O(n)
Note: The slicing version is O(n log n) due to the O(k) cost of creating slices at each level of recursion. The index-based version avoids this overhead.
Space Complexity: O(log n) + O(n)
Two components:
Recursion stack: O(log n) — The tree we're building is balanced, so the maximum recursion depth is O(log n).
Output tree: O(n) — The tree itself contains n nodes.
If we count only auxiliary space (excluding output), space is O(log n). If we count total space, it's O(n).
| Aspect | Naive (Sequential Insert) | Optimal (Middle Element) |
|---|---|---|
| Time (construction) | O(n²) worst case | O(n) |
| Space (recursion) | O(n) | O(log n) |
| Resulting height | O(n) | O(log n) |
| Tree balance | Degenerate (linked list) | Perfectly balanced |
| Subsequent operations | O(n) | O(log n) |
The time complexity follows the recurrence T(n) = 2T(n/2) + O(1), which solves to O(n) by the Master Theorem (a=2, b=2, k=0). Compare with merge sort's T(n) = 2T(n/2) + O(n) = O(n log n)—the difference is O(1) vs O(n) merge step.
Let's convince ourselves that this construction produces a height-balanced tree.
Definition Recap: A binary tree is height-balanced (AVL condition) if for every node, the heights of its left and right subtrees differ by at most 1.
Proof by Induction:
Base Case: A single element produces a tree of height 1 (single node). This is trivially balanced.
Inductive Step: Assume the construction produces balanced trees for all arrays of size < n. For an array of size n:
We choose the middle element, splitting the array into:
The sizes differ by at most 1 (for n even, they're equal; for n odd, one side has one more).
By the inductive hypothesis, both subtrees are balanced.
Since the subtree sizes differ by at most 1, their heights differ by at most 1.
Therefore, the root node has balanced subtrees, and the overall tree is balanced. ∎
| Array Size n | Tree Height h | Left Subtree Size | Right Subtree Size |
|---|---|---|---|
| 1 | 1 | 0 | 0 |
| 3 | 2 | 1 | 1 |
| 7 | 3 | 3 | 3 |
| 15 | 4 | 7 | 7 |
| n = 2^k - 1 | k | (n-1)/2 | (n-1)/2 |
| n (general) | ⌊log₂(n)⌋ + 1 | ≈n/2 | ≈n/2 |
For n = 2^k - 1 (perfect binary tree sizes), the result is a perfect binary tree—every level is completely full.
For other values of n, the result is height-balanced but not perfect. The deepest level may not be completely filled, and the height is ⌊log₂(n)⌋ + 1.
For arrays with an odd number of elements, there's a unique middle element. But for even-length arrays, we have two choices:
mid = (left + right) // 2 (rounds down)mid = (left + right + 1) // 2 (rounds up)Both produce valid, height-balanced BSTs, but with different structures.
Example: [1, 2, 3, 4]
Left-middle approach:
Right-middle approach:
Left-Middle BST:
2
/ \
1 3
\
4
Left subtree: 1 node Right subtree: 2 nodes
Right-Middle BST:
3
/ \
2 4
/
1
Left subtree: 2 nodes Right subtree: 1 node
Both trees have height 3 and are balanced. The choice is typically arbitrary (left-middle is more common as it matches integer division semantics in most languages).
Interview Note: If asked, either choice is correct. You might mention that LeetCode problems often accept both answers.
A common variant is converting a sorted linked list to a balanced BST. The challenge: linked lists don't support O(1) random access, so finding the middle takes O(n) time.
Approach 1: Convert to Array First
Simplest solution: traverse the linked list once to build an array, then apply the array algorithm.
Approach 2: Inorder Simulation
Clever approach that avoids the extra array by simulating inorder traversal while building the tree.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108
class ListNode: def __init__(self, val=0, next=None): self.val = val self.next = next def sorted_list_to_bst_inorder(head): """ Convert sorted linked list to balanced BST using inorder simulation. Key Insight: Inorder traversal of a BST visits nodes in sorted order. We can build the tree while "consuming" the linked list in order. Time Complexity: O(n) Space Complexity: O(log n) for recursion stack """ # First, count the number of nodes def count_nodes(node): count = 0 while node: count += 1 node = node.next return count n = count_nodes(head) # Use a mutable reference to track current list position current = [head] def build(left, right): """Build balanced BST for range [left, right].""" if left > right: return None mid = (left + right) // 2 # Build left subtree first (inorder: left -> root -> right) left_subtree = build(left, mid - 1) # Now current[0] points to the node that should be the root root = TreeNode(current[0].val) current[0] = current[0].next # Advance to next list node # Attach left subtree root.left = left_subtree # Build right subtree root.right = build(mid + 1, right) return root return build(0, n - 1) def sorted_list_to_bst_with_array(head): """ Simple approach: convert to array first. Time: O(n) Space: O(n) for array """ # Convert linked list to array nums = [] current = head while current: nums.append(current.val) current = current.next # Use the array algorithm return sorted_array_to_bst(nums) # Using slow/fast pointer to find middledef sorted_list_to_bst_slow_fast(head): """ Find middle using slow/fast pointers, then recurse. Time: O(n log n) - finding middle takes O(n) at each level Space: O(log n) for recursion """ if head is None: return None if head.next is None: return TreeNode(head.val) # Find the middle node slow, fast = head, head prev = None while fast and fast.next: prev = slow slow = slow.next fast = fast.next.next # slow is now the middle node # Cut the list into two halves if prev: prev.next = None # slow becomes the root root = TreeNode(slow.val) # Recursively build subtrees if slow != head: # Avoid infinite recursion root.left = sorted_list_to_bst_slow_fast(head) root.right = sorted_list_to_bst_slow_fast(slow.next) return rootThe inorder simulation is elegant: instead of finding the middle and grabbing its value, we build the left subtree first (which 'consumes' all smaller values from the list), then take the current list head as the root, then build the right subtree. The list naturally advances in sorted order, matching BST inorder.
The sorted-to-BST pattern appears in several variations:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687
def bst_to_sorted_doubly_linked_list(root): """ Convert BST to sorted circular doubly linked list. Uses inorder traversal to visit nodes in sorted order, connecting them into a doubly linked list. Time: O(n) Space: O(h) for recursion """ if not root: return None first = [None] # Head of the list last = [None] # Tail/previous node during traversal def inorder(node): if not node: return inorder(node.left) # Process current node if last[0]: # Link previous node to current last[0].right = node node.left = last[0] else: # First node in inorder first[0] = node last[0] = node inorder(node.right) inorder(root) # Make it circular first[0].left = last[0] last[0].right = first[0] return first[0] def merge_two_bsts(root1, root2): """ Merge two BSTs into a single balanced BST. Strategy: 1. Convert both BSTs to sorted arrays (inorder) 2. Merge the two sorted arrays 3. Build balanced BST from merged array Time: O(m + n) Space: O(m + n) """ def inorder(node, arr): if not node: return inorder(node.left, arr) arr.append(node.val) inorder(node.right, arr) def merge_sorted(arr1, arr2): result = [] i, j = 0, 0 while i < len(arr1) and j < len(arr2): if arr1[i] <= arr2[j]: result.append(arr1[i]) i += 1 else: result.append(arr2[j]) j += 1 result.extend(arr1[i:]) result.extend(arr2[j:]) return result # Get sorted arrays from both trees arr1, arr2 = [], [] inorder(root1, arr1) inorder(root2, arr2) # Merge sorted arrays merged = merge_sorted(arr1, arr2) # Build balanced BST return sorted_array_to_bst(merged)Building balanced BSTs from sorted data has practical applications:
Example: PostgreSQL B-Tree Index Creation
When you run CREATE INDEX on a PostgreSQL column, the database:
This bulk loading is much faster than n individual insertions and produces a perfectly packed index with optimal height.
Converting sorted arrays to balanced BSTs is a fundamental construction pattern. Let's consolidate the key insights:
Looking Ahead:
We've now covered constructing balanced BSTs from scratch. But what about existing, unbalanced BSTs? Can we restore balance without rebuilding entirely? The next page explores balancing an existing BST—the final piece of the BST patterns puzzle.
You now understand how to construct height-balanced BSTs from sorted arrays using divide-and-conquer. You can apply this to arrays, linked lists, and merged data sources, achieving optimal O(log n) height. Next, we'll explore rebalancing existing BSTs.