Parentheses Matching - Learning Module

Loading content...

0/276

Valid Bracket Sequences

Beyond Validation: The Structure of Possibility

In the previous page, we learned how to check whether a bracket sequence is valid. But there's a deeper question: What are all the possible valid sequences?

This isn't just an academic curiosity. Generating all valid bracket sequences appears in:

Code editors that need to auto-complete bracket pairs intelligently
Expression generators for testing and fuzzing
Combinatorial problems that map to parenthesization
Dynamic programming where subproblems have the same structure

The world of valid bracket sequences is remarkably rich. It connects to famous mathematical objects (Catalan numbers), bijections between seemingly unrelated problems, and elegant recursive structures that reveal deep patterns in computation.

What You Will Learn

By the end of this page, you will understand how to generate all valid bracket sequences systematically, know the Catalan number formula and why it counts these sequences, master the recursive structure that enables generation, and see how this problem connects to other combinatorial objects.

Defining Valid Bracket Sequences Precisely

A valid bracket sequence (also called well-formed parentheses or Dyck word) of length 2n is a sequence of n opening brackets and n closing brackets that is balanced.

Formal Recursive Definition: The set of valid bracket sequences S can be defined recursively:

Base case: The empty string ε is a valid bracket sequence
Recursive case 1: If A is a valid sequence, then (A) is a valid sequence
Recursive case 2: If A and B are valid sequences, then AB is a valid sequence

This definition generates all and only the valid sequences.

Constraint-Based Definition: Alternatively, a string s over {(, )} is valid if and only if:

s has equal counts of ( and )
In every prefix of s, the count of ( is ≥ the count of )

Enumeration for Small n:

Let's list all valid sequences for small values of n (number of pairs):

All Valid Bracket Sequences for n = 0 to 4
n	Count	All Valid Sequences
0	1	(empty string)
1	1	()
2	2	(()), ()()
3	5	((())), (()()), (())(), ()(()), ()()()
4	14	(((()))) ((()())) ... (14 total)

The Counting Sequence

Notice the counts: 1, 1, 2, 5, 14, 42, 132, ... This is the famous Catalan number sequence! It appears in over 200 different combinatorial problems, all of which are secretly the same problem in disguise.

Catalan Numbers: The Mathematics of Nesting

The Catalan numbers are one of the most remarkable sequences in mathematics. The n-th Catalan number Cₙ counts the number of valid bracket sequences with n pairs.

The Formula:

Cₙ = (2n)! / ((n+1)! × n!)

Alternatively using binomial coefficients:

Cₙ = (2n choose n) / (n+1)

Or with the binomial coefficient notation:

Cₙ = C(2n, n) - C(2n, n-1)

The Sequence:

n	Cₙ	Computation
0	1	Base case
1	1	2!/(2!×1!) = 1
2	2	4!/(3!×2!) = 2
3	5	6!/(4!×3!) = 5
4	14	8!/(5!×4!) = 14
5	42	10!/(6!×5!) = 42
6	132	12!/(7!×6!) = 132
7	429	14!/(8!×7!) = 429
8	1430	16!/(9!×8!) = 1430
9	4862	18!/(10!×9!) = 4862
10	16796	20!/(11!×10!) = 16796

Recurrence Relation:

Catalan numbers satisfy the recurrence:

C₀ = 1
Cₙ₊₁ = Σᵢ₌₀ⁿ (Cᵢ × Cₙ₋ᵢ)   for n ≥ 0

This recurrence has a beautiful interpretation: To form a valid sequence of (n+1) pairs:

The first character must be (
Some matching ) appears at position 2i+2 (after i pairs inside)
The i pairs inside the outer match can be arranged in Cᵢ ways
The remaining (n-i) pairs after the outer match can be arranged in Cₙ₋ᵢ ways
Sum over all possible positions i for the closing bracket

This decomposition is the key to the recursive generation algorithm.

catalan.py
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
def catalan(n: int) -> int:
    """
    Calculate the n-th Catalan number.
    
    Methods shown: formula, recurrence, and dynamic programming.
    """
    
    # Method 1: Direct formula (may overflow for large n)
    def catalan_formula(n):
        from math import factorial
        return factorial(2*n) // (factorial(n+1) * factorial(n))
    
    # Method 2: Recurrence with memoization
    def catalan_recursive(n, memo={}):
        if n <= 1:
            return 1
        if n in memo:
            return memo[n]
        result = 0
        for i in range(n):
            result += catalan_recursive(i, memo) * catalan_recursive(n-1-i, memo)
        memo[n] = result
        return result
    
    # Method 3: Dynamic Programming (most practical)
    def catalan_dp(n):
        if n <= 1:
            return 1
        catalan = [0] * (n + 1)
        catalan[0] = catalan[1] = 1
        for i in range(2, n + 1):
            for j in range(i):
                catalan[i] += catalan[j] * catalan[i - 1 - j]
        return catalan[n]
    
    return catalan_dp(n)
 
# Print first 15 Catalan numbers
for i in range(15):
    print(f"C_{i} = {catalan(i)}")

Bijections: Same Problem, Different Appearance

Catalan numbers count many things that are secretly the same: binary trees with n internal nodes, ways to triangulate an (n+2)-gon, paths in a grid that don't cross the diagonal, non-crossing partitions, and mountain ranges with n up-strokes. Each can be converted to a bracket sequence and vice versa!

Generating All Valid Bracket Sequences

Now let's solve the classic problem: Generate all valid bracket sequences of n pairs.

The key insight is the constraint-based approach:

We build the sequence character by character
At each step, we track how many opening brackets we've used (open) and how many closing brackets we've used (close)
We can add an opening bracket if open < n
We can add a closing bracket if close < open (never more closers than openers seen)

This backtracking approach generates exactly the Catalan(n) valid sequences with no invalid sequences considered.

generate_parentheses.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
from typing import List
 
def generate_parentheses(n: int) -> List[str]:
    """
    Generate all valid bracket sequences with n pairs.
    
    This is a classic backtracking problem where we build valid
    sequences character by character, using constraints to prune
    invalid branches early.
    
    Time Complexity: O(4^n / sqrt(n)) - Catalan number growth
    Space Complexity: O(n) for recursion depth, O(Catalan(n) * 2n) for output
    
    Args:
        n: Number of bracket pairs
        
    Returns:
        List of all valid bracket sequences
    """
    result = []
    
    def backtrack(current: str, open_count: int, close_count: int):
        """
        Build valid sequences using backtracking.
        
        Invariants maintained:
        - open_count >= close_count always (valid prefix)
        - open_count <= n (can't exceed n openers)
        - close_count <= n (can't exceed n closers)
        
        Args:
            current: The sequence built so far
            open_count: Number of '(' in current
            close_count: Number of ')' in current
        """
        # Base case: complete sequence
        if len(current) == 2 * n:
            result.append(current)
            return
        
        # Choice 1: Add opening bracket
        # Allowed if we haven't used all n openers
        if open_count < n:
            backtrack(current + '(', open_count + 1, close_count)
        
        # Choice 2: Add closing bracket
        # Allowed if there's an unmatched opener (close_count < open_count)
        if close_count < open_count:
            backtrack(current + ')', open_count, close_count + 1)
    
    backtrack("", 0, 0)
    return result
 
 
# Example: Generate all sequences for n = 3
if __name__ == "__main__":
    for n in range(5):
        sequences = generate_parentheses(n)
        print(f"n={n}: {len(sequences)} sequences")
        if n <= 3:
            print(f"  {sequences}")

The Power of Pruning

The naive approach would generate all 2^(2n) binary strings and filter for valid ones. But our constraint-based backtracking never generates invalid prefixes. For n=10, we generate 16,796 sequences (Catalan(10)) instead of checking over a million (2^20 = 1,048,576). This is exponentially more efficient!

Understanding the Recursion Tree

Let's visualize the backtracking process for n=2 (generating all 2-pair sequences):

                     "" (open=0, close=0)
                      |
                     "(" (open=1, close=0)
                    /   \
            "(("          "()"
          (o=2,c=0)     (o=1,c=1)
              |            |
           "(()"         "()("
          (o=2,c=1)     (o=2,c=1)
              |            |
          "(())" ✓       "()()"
          (o=2,c=2)     (o=2,c=2) ✓

Key observations:

Every path from root to leaf is a valid sequence — The constraints ensure this
No node violates close > open — We'd never be able to reach such a state
The branching factor varies — Some nodes have 2 children, some have 1
The tree has exactly Catalan(n) leaves — No more, no less

Detailed Trace for n=2:

Backtracking Trace for n=2
Step	Current	Open	Close	Can Add (	Can Add )	Action
1	""	0	0	Yes	No	Add (
2	"("	1	0	Yes	Yes	Try (
3	"(("	2	0	No	Yes	Add )
4	"(()"	2	1	No	Yes	Add )
5	"(())"	2	2			✓ VALID
6	Back to (					Try )
7	"()"	1	1	Yes	No	Add (
8	"()("	2	1	No	Yes	Add )
9	"()()"	2	2			✓ VALID

Lexicographic Order

The backtracking algorithm generates sequences in lexicographic order because we always try '(' before ')'. The first sequence is always '(((...))', all nested. The last is always '()()...()', all sequential.

Iterative Generation Approach

While recursion is natural and elegant, some contexts require an iterative solution (e.g., avoiding stack overflow for large n, or language constraints). We can use an explicit stack to simulate the recursion.

The key is to store the state (current string, open count, close count) and process states iteratively:

generate_iterative.py
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
from typing import List, Tuple
 
def generate_parentheses_iterative(n: int) -> List[str]:
    """
    Generate all valid bracket sequences iteratively using an explicit stack.
    
    This avoids recursion overhead and potential stack overflow for large n.
    
    Time Complexity: O(4^n / sqrt(n)) - same as recursive
    Space Complexity: O(n * Catalan(n)) - stack size times sequence length
    """
    if n == 0:
        return [""]
    
    result = []
    
    # Stack holds tuples: (current_string, open_count, close_count)
    stack: List[Tuple[str, int, int]] = [("", 0, 0)]
    
    while stack:
        current, open_count, close_count = stack.pop()
        
        # Check if sequence is complete
        if len(current) == 2 * n:
            result.append(current)
            continue
        
        # Note: We push ')' first so '(' is processed first (LIFO)
        # This maintains lexicographic order
        
        # Option 2: Add closing bracket (pushed first, processed second)
        if close_count < open_count:
            stack.append((current + ')', open_count, close_count + 1))
        
        # Option 1: Add opening bracket (pushed second, processed first)
        if open_count < n:
            stack.append((current + '(', open_count + 1, close_count))
    
    return result
 
 
# Alternative: BFS-style using a queue (generates in different order)
from collections import deque
 
def generate_parentheses_bfs(n: int) -> List[str]:
    """
    Generate using BFS - explores by sequence length level.
    
    This generates sequences level by level (by length), which can be
    useful if you want to stream partial results.
    """
    if n == 0:
        return [""]
    
    result = []
    queue = deque([("", 0, 0)])
    
    while queue:
        current, open_count, close_count = queue.popleft()
        
        if len(current) == 2 * n:
            result.append(current)
            continue
        
        if open_count < n:
            queue.append((current + '(', open_count + 1, close_count))
        
        if close_count < open_count:
            queue.append((current + ')', open_count, close_count + 1))
    
    return result
 
 
# Test both approaches
sequences_dfs = generate_parentheses_iterative(3)
sequences_bfs = generate_parentheses_bfs(3)
 
print("DFS order:", sequences_dfs)
print("BFS order:", sequences_bfs)
print("Same set:", set(sequences_dfs) == set(sequences_bfs))

Order Matters for LIFO

When using a stack (DFS), the order of pushing matters! To get lexicographic order (trying '(' before ')'), we must push ')' first and '(' second. The stack's LIFO behavior then processes '(' first. This subtle detail is easy to get wrong.

Finding the Next Valid Sequence

A more advanced problem: Given a valid bracket sequence, find the next one in lexicographic order (without generating all sequences).

This problem appears in:

Iterating through combinations without storing them all
Implementing bracket sequence iterators
Algorithmic problem-solving challenges

The Algorithm:

Scan from right to left, find the rightmost position where we can change ')' to '('
For this to be valid, after the change there must be enough ')' remaining to close all '('
After the change, fill the remainder with the lexicographically smallest valid suffix

The key insight: we're looking for a ')' that has enough 'budget' to become '(' while the suffix can still be made valid.

next_sequence.py
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
def next_valid_sequence(s: str) -> str | None:
    """
    Find the next valid bracket sequence in lexicographic order.
    
    Returns None if s is the last sequence (all sequential).
    
    Time Complexity: O(n) - single pass
    Space Complexity: O(n) - to build the result
    
    Example: "(())" -> "()()" -> None (last sequence for n=2)
    """
    n = len(s)
    if n == 0:
        return None
    
    # Convert to list for mutation
    chars = list(s)
    
    # Track balance (open brackets minus close brackets)
    # Scan from right to left
    balance = 0
    
    for i in range(n - 1, -1, -1):
        if chars[i] == ')':
            balance += 1
        else:  # chars[i] == '('
            balance -= 1
            
            # Can we change this '(' to ')' and still make a valid sequence?
            # After changing, balance increases by 2 (we removed one '(', added one ')')
            # We need enough ')' remaining to close all '(' before position i
            # and to close the new ')' we're considering
            
            # If balance > 0, we have excess ')' in the suffix
            # We can change this '(' to ')' if we can rearrange the suffix
            if balance > 0:
                # Change '(' to ')'
                chars[i] = ')'
                
                # Now fill the suffix (positions i+1 to n-1) with the smallest valid string
                # After position i, we have:
                # - Some number of '(' we still need
                # - The remaining positions to fill
                
                remaining_length = n - 1 - i
                # After changing chars[i] to ')', our balance at i is balance + 1
                # (balance was computed assuming chars[i] = '(')
                new_balance = balance + 1
                
                # To complete the sequence:
                # - We need (new_balance) more '(' in the suffix (to match the excess ')')
                # - Then fill the rest with ')'
                
                # Wait, let me reconsider...
                # Actually, remaining_length = n - 1 - i
                # After position i, we need to fill a valid suffix
                # Current "excess closers" from the right = balance (before change)
                # After changing chars[i] from ( to ), we gain 2 in closer count
                
                # Simpler approach: 
                # Fill suffix with maximum '(' first, then ')' to balance
                opens_in_suffix = (remaining_length - balance + 1) // 2
                closes_in_suffix = remaining_length - opens_in_suffix
                
                suffix = '(' * min(opens_in_suffix, remaining_length) + ')' * closes_in_suffix
                
                # Hmm, this needs careful derivation. Let me use a cleaner approach:
                # After position i, just greedily fill to make smallest valid suffix
                
                result = chars[:i+1]
                suffix_len = n - i - 1
                
                # Count how many '(' we have in result
                open_count = result.count('(')
                close_count = result.count(')')
                
                # Fill suffix: add '(' as much as possible, then ')'
                for _ in range(suffix_len):
                    if open_count < n // 2:
                        result.append('(')
                        open_count += 1
                    else:
                        result.append(')')
                        close_count += 1
                
                return ''.join(result)
    
    # If we get here, s is the last sequence (all '()' pairs)
    return None
 
 
# Test the function
sequences = ["(())", "()()", "((()))", "(()())", "(())()", "()(())", "()()()"]
for seq in sequences:
    next_seq = next_valid_sequence(seq)
    print(f"{seq} -> {next_seq}")

Note: The implementation above needs careful attention to the balance calculation. A cleaner approach is to:

Find the rightmost '(' that can become ')'
Change it to ')'
Count remaining opens needed
Fill suffix with that many '(' followed by ')'

This problem is tricky to get right and is a great exercise in careful index management!

The Closure Property and Structural Recursion

Valid bracket sequences have a beautiful closure property: Every valid sequence can be decomposed into smaller valid sequences.

Prime Sequences: A valid sequence is prime if it cannot be written as AB where both A and B are non-empty valid sequences. Examples:

() — prime
(()) — prime (it's (A) where A = ())
()() — NOT prime (it's () + ())
((())) — prime
()(()) — NOT prime

Unique Factorization: Every valid sequence has a unique factorization into prime sequences:

()()() = () × () × ()
(())() = (()) × ()
(()()) = (()()) (prime, so itself)

Why This Matters: This structure enables divide-and-conquer algorithms on bracket sequences. If you can solve a problem for prime sequences and combine solutions, you've solved it for all sequences.

decomposition.py
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
from typing import List
 
def decompose(s: str) -> List[str]:
    """
    Decompose a valid bracket sequence into its prime factors.
    
    A prime bracket sequence cannot be split into two non-empty
    valid sequences. This function finds all prime factors.
    
    Example: "()(())" -> ["()", "(())"]
    Example: "(()())" -> ["(()())"]  (already prime)
    """
    if not s:
        return []
    
    factors = []
    balance = 0
    start = 0
    
    for i, char in enumerate(s):
        if char == '(':
            balance += 1
        else:
            balance -= 1
        
        # When balance reaches 0, we've found a complete valid sequence
        if balance == 0:
            factors.append(s[start:i+1])
            start = i + 1
    
    return factors
 
 
def is_prime_sequence(s: str) -> bool:
    """
    Check if a bracket sequence is prime (cannot be factored).
    
    A sequence is prime if it only balances at the very end.
    """
    if not s or s == "":
        return False
    
    balance = 0
    for i, char in enumerate(s[:-1]):  # Check all except last
        if char == '(':
            balance += 1
        else:
            balance -= 1
        
        if balance == 0:
            return False  # Balanced before the end
    
    return True
 
 
# Test decomposition
test_cases = [
    "()",
    "(())",
    "()()",
    "()(())",
    "(()())",
    "()()()",
    "((()))(())",
]
 
for s in test_cases:
    factors = decompose(s)
    is_prime = is_prime_sequence(s)
    print(f"{s:15} -> {factors}  {'(prime)' if is_prime else ''}")

Connection to Trees

Prime bracket sequences correspond to rooted ordered trees. The factorization into primes corresponds to a forest of trees. This bijection is why Catalan numbers count both bracket sequences AND binary trees with n internal nodes.

Applications of Valid Bracket Sequence Theory

The theory of valid bracket sequences has far-reaching applications:

Practical Applications

•Expression Tree Enumeration — Every way to parenthesize a mathematical expression corresponds to a valid bracket sequence. For n operands, there are Catalan(n-1) ways to parenthesize.
•Binary Tree Enumeration — Generating all binary tree shapes for testing or exhaustive search. Each Catalan(n) sequence maps to a unique binary tree structure.
•Matrix Chain Multiplication — The number of ways to parenthesize n matrices for multiplication is Catalan(n-1). Dynamic programming finds the optimal parenthesization.
•Code Generation — Automated test generation for parsers needs valid and invalid bracket sequences. Valid sequences test happy paths; strategic mutations test error handling.
•Combinatorial Counting — Many counting problems reduce to Catalan numbers. Understanding bracket sequences gives tools for these problems.
•Polygon Triangulation — Catalan(n-2) counts ways to triangulate an n-gon. Each triangulation maps to a bracket sequence.

Catalan Objects and Their Bracket Representations
Object	n	Catalan(n)	Bracket Interpretation
Binary trees	Internal nodes	Tree shapes	In-order traversal with parentheses
Full binary trees	Leaves - 1	Tree shapes	Leaf sequence structure
Grid paths	Steps right/up	Non-crossing paths	'(' = right, ')' = up
Triangulations	Poly sides - 2	Ways to cut	Cut sequence encoding
Non-crossing partitions	Elements	Partition count	Nesting structure

Summary: The Rich World of Valid Sequences

Key Takeaways

•Valid bracket sequences are counted by Catalan numbers — The n-th Catalan number Cₙ = (2n)! / ((n+1)! × n!) counts sequences with n pairs.
•Backtracking with constraints generates all sequences efficiently — Track open and close counts; only add '(' if open < n, only add ')' if close < open.
•The recursive structure enables elegant algorithms — Catalan's recurrence Cₙ₊₁ = Σ(Cᵢ × Cₙ₋ᵢ) reflects the decomposition of sequences.
•Every sequence factors uniquely into primes — A prime sequence can't be split; factorization finds all prime components.
•Bijections connect bracket sequences to trees, paths, and partitions — The same Catalan count appears across combinatorics.
•Both recursive and iterative generation are possible — Explicit stacks or queues simulate the recursion for iterative contexts.

What's Next:

We've mastered generating and understanding valid bracket sequences. Now we'll apply this knowledge to a practical problem: evaluating postfix expressions. This classic stack application shows how the nesting structure of expressions maps directly to stack-based computation.

Page Complete

You now understand the deep structure of valid bracket sequences—how they're counted, generated, and decomposed. This knowledge connects to binary trees, dynamic programming, and countless combinatorial problems. The Catalan numbers you've met here will appear again and again throughout your algorithmic journey.