Parentheses Matching - Learning Module

Loading content...

0/276

Converting Infix to Postfix (Shunting-Yard Intuition)

Bridging Human and Machine Expression

We've learned that computers love postfix notation—it's unambiguous and evaluates beautifully with a stack. But humans don't write 3 4 2 * +; we write 3 + 4 * 2. The real-world challenge is conversion: taking human-friendly infix expressions and transforming them into computer-friendly postfix.

This problem was elegantly solved in 1961 by Edsger Dijkstra, one of computing's greatest minds. His solution, known as the Shunting-Yard Algorithm, is named for its resemblance to a railroad shunting yard, where train cars are rerouted onto different tracks.

The algorithm uses a single stack and a clever set of rules to handle:

Operator precedence — Multiplication before addition
Associativity — Left-to-right or right-to-left
Parentheses — Overriding default precedence

The result is one of the most beautiful algorithms in computer science—compact, elegant, and practically essential.

What You Will Learn

By the end of this page, you will understand the intuition behind the Shunting-Yard algorithm, master operator precedence and associativity handling, trace through conversions step-by-step, implement the algorithm correctly, and appreciate its role in compilers and calculators.

The Challenge: Why Infix Is Hard

Infix notation is natural for humans but problematic for computers:

Problem 1: Precedence

3 + 4 * 2 equals 3 + (4 * 2) = 11, not (3 + 4) * 2 = 14.

The computer must somehow know that * 'binds tighter' than +.

Problem 2: Associativity

10 - 5 - 2 equals (10 - 5) - 2 = 3, not 10 - (5 - 2) = 7.

Most operators are left-associative: group from left to right. But exponentiation is typically right-associative: 2 ^ 3 ^ 2 = 2 ^ (3 ^ 2) = 2 ^ 9 = 512, not (2 ^ 3) ^ 2 = 64.

Problem 3: Parentheses

(3 + 4) * 2 = 14 overrides the default precedence. We must detect and honor explicit grouping.

Standard Operator Precedence (Lower = Tighter Binding)
Precedence	Operators	Associativity	Example
1 (highest)	^	Right	2^3^2 = 2^(3^2)
2	/	Left	8/4/2 = (8/4)/2
3 (lowest)		Left	5-3-1 = (5-3)-1

The Goal:

We want an algorithm that reads an infix expression and produces an equivalent postfix expression, correctly handling precedence, associativity, and parentheses.

Infix	Postfix
`3 + 4`	`3 4 +`
`3 + 4 * 2`	`3 4 2 * +`
`(3 + 4) * 2`	`3 4 + 2 *`
`2 ^ 3 ^ 2`	`2 3 2 ^ ^`
`10 - 5 - 2`	`10 5 - 2 -`

The Shunting-Yard Intuition

Imagine a railroad yard where trains come in on one track (the input), and we need to reroute them to another track (the output). There's a side track (the stack) where we can temporarily park train cars.

The key insight: Operators must 'wait' until we know whether a higher-precedence operator follows. If it does, that operator goes first. If not, the waiting operator can proceed.

Visual Model:

Input (left to right):
    3 + 4 * 2
        ↓
    ┌─────────┐
    │  Stack  │  (holding area for operators)
    └─────────┘
        ↓
    Output (postfix)

The Rules:

Operands — Go directly to output (no waiting needed)
Operators — Must consider what's already waiting on the stack:
- If the stack top has higher (or equal for left-assoc) precedence, pop it to output first
- Then push the new operator to stack
Left Parenthesis — Push to stack (marker for grouping)
Right Parenthesis — Pop operators to output until matching left parenthesis
End of input — Pop all remaining operators to output

The Core Principle

An operator can only go to the output when we're certain no higher-precedence operator will 'steal' its operands. The stack holds operators in suspense. Higher precedence operators cut in line; lower precedence operators wait their turn.

The Shunting-Yard Algorithm

Algorithm: Infix to Postfix Conversion

Initialize:
    output = empty queue (or list)
    operator_stack = empty stack

For each token in the input (left to right):

    If token is an OPERAND:
        Add token to output
    
    If token is a LEFT PARENTHESIS '(':
        Push token onto operator_stack
    
    If token is a RIGHT PARENTHESIS ')':
        While stack is not empty AND top of stack is not '(':
            Pop operator from stack and add to output
        If stack is empty:
            Error: mismatched parentheses
        Pop the '(' from stack (discard it)
    
    If token is an OPERATOR:
        While stack is not empty
              AND top of stack is not '('
              AND (top has higher precedence than token
                   OR (same precedence AND token is left-associative)):
            Pop operator from stack and add to output
        Push token onto operator_stack

After all tokens processed:
    While stack is not empty:
        If top of stack is '(':
            Error: mismatched parentheses
        Pop operator from stack and add to output

Return output

The Subtle Part: Associativity

For left-associative operators (like + - * /), we pop when precedence is equal or greater. For right-associative operators (like ^), we only pop when precedence is strictly greater. This ensures 2^3^2 becomes 2 3 2 ^ ^ (right-assoc) while 5-3-2 becomes 5 3 - 2 - (left-assoc).

Step-by-Step Trace Examples

Let's trace through several examples to build deep intuition.

Example 1: 3 + 4 * 2

Expected postfix: 3 4 2 * +

(Multiplication before addition because * has higher precedence)

Trace of 3 + 4 * 2
Token	Action	Operator Stack	Output
3	Operand → output	[]	3
	Push (stack empty)	[+]	3
4	Operand → output	[+]	3 4
	: just push	[+, *]	3 4
2	Operand → output	[+, *]	3 4 2
END	Pop all from stack	[]	3 4 2 * +

Example 2: (3 + 4) * 2

Expected postfix: 3 4 + 2 *

(Parentheses override precedence)

Trace of (3 + 4) * 2
Token	Action	Operator Stack	Output
(	Push left paren	[(]
3	Operand → output	[(]	3
	Push (stack top is '(')	[(, +]	3
4	Operand → output	[(, +]	3 4
)	Pop until '('	[]	3 4 +
	Push (stack empty)	[*]	3 4 +
2	Operand → output	[*]	3 4 + 2
END	Pop all	[]	3 4 + 2 *

Example 3: 10 - 5 - 2 (Left associativity)

Expected postfix: 10 5 - 2 -

(Left-to-right: (10-5)-2)

Trace of 10 - 5 - 2
Token	Action	Operator Stack	Output
10	Operand → output	[]	10
	Push (stack empty)	[-]	10
5	Operand → output	[-]	10 5
	Same prec, left-assoc → pop, then push	[-]	10 5 -
2	Operand → output	[-]	10 5 - 2
END	Pop all	[]	10 5 - 2 -

Example 4: 2 ^ 3 ^ 2 (Right associativity)

Expected postfix: 2 3 2 ^ ^

(Right-to-left: 2^(3^2) = 2^9 = 512)

Trace of 2 ^ 3 ^ 2
Token	Action	Operator Stack	Output
2	Operand → output	[]	2
^	Push (stack empty)	[^]	2
3	Operand → output	[^]	2 3
^	Same prec, right-assoc → just push	[^, ^]	2 3
2	Operand → output	[^, ^]	2 3 2
END	Pop all	[]	2 3 2 ^ ^

Associativity Difference

Compare Examples 3 and 4. With left-associative -, we popped when we saw the second -. With right-associative ^, we pushed without popping when we saw the second ^. This single difference makes 10-5-2 evaluate as (10-5)-2 and 2^3^2 evaluate as 2^(3^2).

Implementation

Let's implement the complete Shunting-Yard algorithm with full handling of precedence and associativity:

shunting_yard.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
from typing import List, Tuple, Dict
 
def infix_to_postfix(expression: str) -> str:
    """
    Convert infix expression to postfix using the Shunting-Yard algorithm.
    
    Supports: +, -, *, /, ^ (right-associative exponentiation)
    Handles: parentheses, operator precedence, associativity
    
    Time Complexity: O(n) where n is the number of tokens
    Space Complexity: O(n) for the operator stack
    
    Args:
        expression: Space-separated infix expression
        
    Returns:
        Space-separated postfix expression
        
    Raises:
        ValueError: If parentheses are mismatched
    """
    # Operator properties: (precedence, is_left_associative)
    # Higher precedence number = binds tighter
    operators: Dict[str, Tuple[int, bool]] = {
        '+': (1, True),   # Precedence 1, left-associative
        '-': (1, True),   # Precedence 1, left-associative
        '*': (2, True),   # Precedence 2, left-associative
        '/': (2, True),   # Precedence 2, left-associative
        '^': (3, False),  # Precedence 3, RIGHT-associative
    }
    
    output: List[str] = []
    operator_stack: List[str] = []
    
    tokens = expression.split()
    
    for token in tokens:
        # Case 1: Operand (number) - goes directly to output
        if token not in operators and token not in '()':
            output.append(token)
        
        # Case 2: Left parenthesis - push to stack
        elif token == '(':
            operator_stack.append(token)
        
        # Case 3: Right parenthesis - pop until matching '('
        elif token == ')':
            while operator_stack and operator_stack[-1] != '(':
                output.append(operator_stack.pop())
            
            if not operator_stack:
                raise ValueError("Mismatched parentheses: ')' without matching '('")
            
            # Pop the '(' but don't add to output
            operator_stack.pop()
        
        # Case 4: Operator - handle precedence and associativity
        else:
            current_prec, current_left_assoc = operators[token]
            
            while operator_stack and operator_stack[-1] != '(':
                top = operator_stack[-1]
                top_prec, _ = operators.get(top, (0, True))
                
                # Pop if:
                # - Top has higher precedence, OR
                # - Same precedence AND current operator is left-associative
                should_pop = (
                    top_prec > current_prec or
                    (top_prec == current_prec and current_left_assoc)
                )
                
                if should_pop:
                    output.append(operator_stack.pop())
                else:
                    break
            
            operator_stack.append(token)
    
    # Pop all remaining operators
    while operator_stack:
        top = operator_stack.pop()
        if top == '(':
            raise ValueError("Mismatched parentheses: '(' without matching ')'")
        output.append(top)
    
    return ' '.join(output)
 
 
# Test cases
test_cases = [
    ("3 + 4", "3 4 +"),
    ("3 + 4 * 2", "3 4 2 * +"),
    ("( 3 + 4 ) * 2", "3 4 + 2 *"),
    ("3 * ( 4 + 2 )", "3 4 2 + *"),
    ("10 - 5 - 2", "10 5 - 2 -"),
    ("2 ^ 3 ^ 2", "2 3 2 ^ ^"),
    ("1 + 2 * 3 - 4 / 2", "1 2 3 * + 4 2 / -"),
    ("( 1 + 2 ) * ( 3 + 4 )", "1 2 + 3 4 + *"),
    ("3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3", "3 4 2 * 1 5 - 2 3 ^ ^ / +"),
]
 
print("Infix to Postfix Conversion Tests:")
print("-" * 60)
for infix, expected in test_cases:
    result = infix_to_postfix(infix)
    status = "✓" if result == expected else "✗"
    print(f"{status} '{infix}'")
    print(f"   → '{result}'")
    if result != expected:
        print(f"   Expected: '{expected}'")
    print()

Complete Expression Evaluation Pipeline

Now we can combine the Shunting-Yard algorithm with postfix evaluation for a complete infix expression evaluator:

Pipeline: Infix → Postfix → Result

"3 + 4 * 2"  →  Shunting-Yard  →  "3 4 2 * +"  →  Postfix Eval  →  11

This is exactly how calculators and expression parsers work!

expression_evaluator.py
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
def evaluate_infix(expression: str) -> float:
    """
    Evaluate an infix expression by converting to postfix and evaluating.
    
    Pipeline: Infix → Postfix → Result
    
    Supports: +, -, *, /, ^ with proper precedence and associativity.
    
    Args:
        expression: Space-separated infix expression
        
    Returns:
        The numerical result
    """
    # Step 1: Convert infix to postfix
    postfix = infix_to_postfix(expression)
    
    # Step 2: Evaluate postfix
    return evaluate_postfix(postfix)
 
 
def evaluate_postfix(expression: str) -> float:
    """Evaluate postfix expression (from previous page)."""
    tokens = expression.split()
    stack = []
    
    operators = {
        '+': lambda a, b: a + b,
        '-': lambda a, b: a - b,
        '*': lambda a, b: a * b,
        '/': lambda a, b: a / b,
        '^': lambda a, b: a ** b,
    }
    
    for token in tokens:
        if token in operators:
            b = stack.pop()
            a = stack.pop()
            stack.append(operators[token](a, b))
        else:
            stack.append(float(token))
    
    return stack[0]
 
 
# Complete test: Infix → Result
test_expressions = [
    ("3 + 4", 7),
    ("3 + 4 * 2", 11),
    ("( 3 + 4 ) * 2", 14),
    ("10 - 5 - 2", 3),
    ("2 ^ 3 ^ 2", 512),
    ("2 ^ 3", 8),
    ("( 1 + 2 ) * ( 3 + 4 )", 21),
    ("3 + 4 * 2 / ( 1 - 5 )", 3 + 8/(-4)),  # 3 + (-2) = 1
    ("10 / 2 + 3 * 4 - 5", 5 + 12 - 5),     # 12
]
 
print("Complete Expression Evaluation:")
print("-" * 60)
for expr, expected in test_expressions:
    postfix = infix_to_postfix(expr)
    result = evaluate_infix(expr)
    status = "✓" if abs(result - expected) < 0.0001 else "✗"
    print(f"{status} {expr}")
    print(f"   Postfix: {postfix}")
    print(f"   Result: {result} (expected {expected})")
    print()

Handling Unary Operators

The basic Shunting-Yard algorithm handles binary operators. But what about unary operators like negation? Consider:

-3 + 4 — The - here is unary (negation), not subtraction.

The Challenge: How do we distinguish - meaning 'subtract' from - meaning 'negate'?

Solution: Context-Based Differentiation

A - (or +) is unary if it appears:

At the start of the expression
After an opening parenthesis
After another operator

In all other cases, it's binary.

shunting_yard_unary.py
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
def infix_to_postfix_with_unary(expression: str) -> str:
    """
    Extended Shunting-Yard with unary operator support.
    
    Handles:
    - Unary minus: -3, 3 + -4, (-5)
    - Unary plus: +3-4
    
    Strategy: Detect unary operators by context and convert them
    to a special symbol (e.g., '~' for unary minus).
    """
    operators = {
        '+': (1, True),
        '-': (1, True),
        '*': (2, True),
        '/': (2, True),
        '^': (3, False),
        '~': (4, False),  # Unary minus (highest precedence, right-assoc)
    }
    
    def is_operator(token: str) -> bool:
        return token in operators
    
    def is_unary_context(prev_token: str | None) -> bool:
        """Determine if current position expects a unary operator."""
        if prev_token is None:
            return True  # Start of expression
        if prev_token == '(':
            return True  # After opening paren
        if is_operator(prev_token):
            return True  # After another operator
        return False
    
    tokens = expression.split()
    output = []
    op_stack = []
    prev_token = None
    
    for token in tokens:
        # Handle potential unary +/-
        if token in ['+', '-'] and is_unary_context(prev_token):
            if token == '-':
                token = '~'  # Convert to unary minus symbol
            else:
                continue  # Unary + is a no-op, skip it
        
        if token not in operators and token not in '()':
            # Operand
            output.append(token)
        elif token == '(':
            op_stack.append(token)
        elif token == ')':
            while op_stack and op_stack[-1] != '(':
                output.append(op_stack.pop())
            op_stack.pop()  # Remove '('
        else:
            # Operator
            prec, left_assoc = operators[token]
            while (op_stack and 
                   op_stack[-1] != '(' and 
                   op_stack[-1] in operators):
                top_prec, _ = operators[op_stack[-1]]
                if top_prec > prec or (top_prec == prec and left_assoc):
                    output.append(op_stack.pop())
                else:
                    break
            op_stack.append(token)
        
        prev_token = token
    
    while op_stack:
        output.append(op_stack.pop())
    
    return ' '.join(output)
 
 
def evaluate_postfix_with_unary(expression: str) -> float:
    """Evaluate postfix with unary minus support."""
    tokens = expression.split()
    stack = []
    
    binary_ops = {
        '+': lambda a, b: a + b,
        '-': lambda a, b: a - b,
        '*': lambda a, b: a * b,
        '/': lambda a, b: a / b,
        '^': lambda a, b: a ** b,
    }
    
    for token in tokens:
        if token == '~':
            # Unary minus: pop one, negate, push
            stack.append(-stack.pop())
        elif token in binary_ops:
            b = stack.pop()
            a = stack.pop()
            stack.append(binary_ops[token](a, b))
        else:
            stack.append(float(token))
    
    return stack[0]
 
 
# Test unary operators
unary_tests = [
    ("-3 + 4", 1),           # Unary minus at start
    ("3 + -4", -1),          # Unary minus after operator
    ("3 * ( -4 + 2 )", -6),  # Unary minus after paren
    ("-3 * -4", 12),         # Two unary minuses
    ("- ( 3 + 4 )", -7),     # Unary minus before paren
]
 
print("Unary Operator Tests:")
for expr, expected in unary_tests:
    postfix = infix_to_postfix_with_unary(expr)
    result = evaluate_postfix_with_unary(postfix)
    status = "✓" if abs(result - expected) < 0.0001 else "✗"
    print(f"{status} '{expr}' → '{postfix}' = {result}")

Real-World Applications

The Shunting-Yard algorithm and its derivatives are foundational in real-world systems:

Industrial Applications

•Expression Compilers — Converting user-written formulas to executable code in spreadsheets (Excel, Google Sheets)
•Calculators — Both scientific calculators and programming language REPLs use this algorithm or variants
•Programming Language Parsers — Expression parsing in compilers for C, Java, Python's parser
•Query Languages — SQL WHERE clauses, search query parsing, filter expressions
•Mathematical Typesetting — LaTeX and MathML parsers for rendering equations
•Game Engines — Scripting systems parse user formulas for damage calculations, physics, etc.
•Configuration Files — Expression evaluation in Terraform, Ansible, and similar tools

Historical Note:

Edsger Dijkstra developed this algorithm in 1961 while working on the first ALGOL 60 compiler. It was one of the earliest examples of a stack-based parsing technique and influenced decades of compiler design.

Dijkstra's contributions to computer science are enormous—he also developed Dijkstra's shortest path algorithm (yes, the same Dijkstra!), structured programming concepts, and the dining philosophers problem. The Shunting-Yard algorithm remains one of his most elegant and practical contributions.

Beyond Expressions

The Shunting-Yard algorithm can be extended to build expression trees instead of postfix output. Simply build tree nodes instead of outputting operators. This is how many modern expression evaluators work, enabling optimizations and symbolic manipulation.

Complexity Analysis

Time Complexity: O(n)

Where n is the number of tokens:

Each token is pushed to stack or output exactly once: O(1) per token
Each token is popped from stack at most once: O(1) per token
Total: O(n)

Space Complexity: O(n)

Operator stack: In the worst case (all operators, deeply nested), holds O(n) operators
Output queue: Holds all tokens, O(n)
Total: O(n)

Is this optimal?

Yes. Any algorithm must read all n tokens at least once, so Ω(n) is a lower bound. Shunting-Yard achieves this.

Practical Performance

In practice, the algorithm is extremely fast. Expression strings are typically short (hundreds of tokens at most), and the constant factors are small. The algorithm is often memory-bound rather than compute-bound, making cache-friendly implementations valuable.

Summary: The Art of Expression Conversion

Key Takeaways

•Shunting-Yard converts infix to postfix — Handling precedence, associativity, and parentheses elegantly
•The stack holds operators in suspense — Higher precedence operators 'cut in line'; lower ones wait
•Left vs right associativity differs in one comparison — Pop on equal precedence for left-assoc, only on greater for right-assoc
•Parentheses are markers, not operators — Push '(' and pop until '(' when seeing ')'
•Unary operators need context detection — - at start or after operator/paren is negation, not subtraction
•Combined with postfix evaluation, yields complete expression calculator — The full pipeline that powers real calculators

Module Complete!

We've now mastered the complete module on parentheses matching and expression problems:

Balanced Parentheses — Stack-based validation of bracket nesting
Valid Bracket Sequences — Generation, counting, and Catalan numbers
Postfix Evaluation — Stack-based expression evaluation
Infix to Postfix — The Shunting-Yard algorithm

These techniques form the foundation of expression handling in virtually all programming languages, calculators, and compilers. The stack data structure proves its power again and again—from simple bracket matching to sophisticated expression parsing.

Module Complete

Congratulations! You've mastered the classic suite of stack-based expression problems. These patterns—bracket matching, sequence generation, postfix evaluation, and the Shunting-Yard algorithm—appear throughout computer science, from compiler design to calculator apps. You now have the tools to build expression parsers and understand how programming languages handle operator precedence.