Common Patterns - Learning Module

Loading content...

0/276

Minimum Window Substring

Finding the Needle in the Haystack—Precisely

Imagine you're building a search feature that must highlight the smallest portion of a document containing all query terms. Or a DNA sequence analyzer that needs to find the shortest fragment containing all required nucleotides. Or a network packet inspector that must identify the minimal contiguous data span containing all signatures of an attack.

All of these are instances of the Minimum Window Substring problem—one of the most elegant and frequently asked problems in technical interviews. It's a litmus test for understanding sliding window algorithms, and solving it efficiently requires a precise dance between expanding and contracting your search window.

The Problem (LeetCode 76):

Given strings s (the haystack) and t (the pattern), find the minimum window substring of s that contains all characters of t (including duplicates). If no such window exists, return an empty string.

What You Will Learn

By the end of this page, you will deeply understand the sliding window technique for substring problems, implement the minimum window substring algorithm in O(n) time, and recognize how this pattern extends to numerous related problems. You'll understand not just the algorithm, but the intuition behind every design decision.

Problem Formalization

Let's establish precise definitions before diving into the algorithm.

Formal Problem Statement:

Given:

String s of length n (the text/haystack)
String t of length m (the pattern)

Find: The shortest substring w of s such that every character in t (with repetition) appears in w, or return empty string if no such window exists.

Key Clarifications:

Character frequencies matter: If t = "AA", the window must contain at least two 'A's.
Order doesn't matter: We're not looking for t as a subsequence or substring—just a window containing all characters.
Minimum length: Among all valid windows, we want the shortest one.
First occurrence: If multiple minimum windows exist, return the one starting earliest.

Example:

s = "ADOBECODEBANC"
t = "ABC"

Valid windows: "ADOBEC", "CODEBA", "BECODEBA", "BANC", ...
Minimum window: "BANC" (length 4)

Why This Problem is Hard:

The naive approach—check every possible substring of s against t—has O(n² · m) time complexity. For s with 100,000 characters and t with 1,000 characters, this means 10 trillion operations. We need something dramatically better.

The Key Insight:

Instead of checking every substring independently, we can process the string in a single pass using two pointers (called the sliding window or caterpillar technique). The right pointer expands the window to include new characters; the left pointer contracts it to find the minimum. This reduces time complexity to O(n + m).

The Sliding Window Pattern

The sliding window is one of the most powerful patterns in string and array algorithms. It applies when you're looking for a contiguous subarray/substring that satisfies some property. The core idea: maintain a window with two pointers and slide them through the data, adjusting as you go.

Algorithm Intuition: Expand and Contract

Think of the algorithm as a caterpillar inching along the string s. The caterpillar's body spans from index left to index right.

Phase 1: Expansion (Move Right Pointer)

Expand the window by moving right forward until the window contains all characters of t. Each expansion adds a new character to our "collection."

Phase 2: Contraction (Move Left Pointer)

Once we have a valid window, try to shrink it by moving left forward. This might remove enough characters to make the window invalid, at which point we stop contracting and resume expansion.

Phase 3: Record Minimum

Every time we have a valid window, compare its length to our current minimum. Track the start index of the best window we've found.

The Invariant:

We never need to move left backward or right backward. Each pointer moves only forward, giving us O(n) total movements despite the nested loops.

Visual Walkthrough:

s = "ADOBECODEBANC", t = "ABC"

Step 1: Expand right until we have A, B, C
[ADOBEC]ODEBANC  ← Valid! Length 6, save it

Step 2: Contract left while still valid
A[DOBEC]ODEBANC  ← Missing A, stop contracting

Step 3: Expand right again
A[DOBECO]DEBANC  ← Still missing A, keep going
A[DOBECODE]BANC  ← Still missing A
A[DOBECODEB]ANC  ← Still missing A
A[DOBECODEBA]NC  ← Valid! But longer than saved

Step 4: Contract left
AD[OBECODEBA]NC  ← Valid! Length 9
ADO[BECODEBA]NC  ← Valid! Length 8
ADOB[ECODEBA]NC  ← Valid! Length 7
ADOBE[CODEBA]NC  ← Valid! Length 6
ADOBEC[ODEBA]NC  ← Missing C, stop

Step 5: Expand right
ADOBEC[ODEBAN]C  ← Missing C
ADOBEC[ODEBANC]  ← Valid! Length 7

Step 6: Contract left
ADOBECO[DEBANC]  ← Valid! Length 6
ADOBECOD[EBANC]  ← Valid! Length 5
ADOBECODE[BANC]  ← Valid! Length 4 ← NEW MINIMUM!
ADOBECODEB[ANC]  ← Missing B, stop

End of string. Best window: "BANC" at position 9

Why O(n) Not O(n²)?

Each pointer moves at most n times total (never backwards). The outer loop moves right from 0 to n. The inner loop moves left from 0 to n (across all iterations of the outer loop). Total operations: O(n + n) = O(n).

Detailed Implementation

The implementation requires careful tracking of character frequencies and a "satisfaction" counter to know when our window is valid.

Data Structures:

need: Frequency map of characters in t. What we need to find.
have: Frequency map of characters in our current window from s.
formed: Count of characters in t that are fully satisfied (have >= need).
required: Number of unique characters in t that must be satisfied.

When Is Window Valid?

The window is valid when formed == required, meaning every unique character in t has sufficient count in our window.

minimum_window_substring.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
def min_window(s: str, t: str) -> str:
    """
    Find the minimum window substring containing all characters of t.
    
    Time Complexity: O(|s| + |t|)
    Space Complexity: O(|s| + |t|) in worst case (all unique characters)
    
    Args:
        s: The string to search within (haystack)
        t: The pattern string (needle)
    
    Returns:
        The minimum window substring, or empty string if none exists
    """
    if not t or not s or len(s) < len(t):
        return ""
    
    from collections import Counter
    
    # Step 1: Build frequency map of characters we need
    need = Counter(t)
    required = len(need)  # Number of unique characters in t
    
    # Step 2: Initialize window tracking
    left = 0
    formed = 0  # Number of unique chars with desired frequency
    have = {}   # Current window character counts
    
    # Step 3: Track the best window found
    # (window_length, left_index, right_index)
    ans = (float('inf'), None, None)
    
    # Step 4: Sliding window
    for right in range(len(s)):
        # Add character from the right to our window
        char = s[right]
        have[char] = have.get(char, 0) + 1
        
        # Check if this character's frequency matches what we need
        if char in need and have[char] == need[char]:
            formed += 1
        
        # Try to contract the window while it's valid
        while left <= right and formed == required:
            # Update answer if this window is smaller
            if right - left + 1 < ans[0]:
                ans = (right - left + 1, left, right)
            
            # Remove character from the left
            left_char = s[left]
            have[left_char] -= 1
            
            # Check if removal breaks a required frequency
            if left_char in need and have[left_char] < need[left_char]:
                formed -= 1
            
            left += 1
    
    # Return the minimum window or empty string
    return "" if ans[0] == float('inf') else s[ans[1]:ans[2] + 1]
 
 
def min_window_optimized(s: str, t: str) -> str:
    """
    Optimized version using a filtered list of relevant characters.
    
    When s is much longer than t and has many irrelevant characters,
    this version can be faster by only considering positions of
    characters that exist in t.
    
    Time Complexity: O(|s| + |t|) - same worst case, better best case
    Space Complexity: O(|s| + |t|)
    """
    if not t or not s or len(s) < len(t):
        return ""
    
    from collections import Counter
    
    need = Counter(t)
    required = len(need)
    
    # Create filtered list of (index, char) for chars in s that exist in t
    filtered_s = [(i, char) for i, char in enumerate(s) if char in need]
    
    if not filtered_s:
        return ""
    
    left = 0
    formed = 0
    have = {}
    ans = (float('inf'), None, None)
    
    for right in range(len(filtered_s)):
        # Add character at filtered position
        idx, char = filtered_s[right]
        have[char] = have.get(char, 0) + 1
        
        if have[char] == need[char]:
            formed += 1
        
        # Contract window while valid
        while left <= right and formed == required:
            left_idx, left_char = filtered_s[left]
            
            # Window is from left_idx to idx (in original string)
            if idx - left_idx + 1 < ans[0]:
                ans = (idx - left_idx + 1, left_idx, idx)
            
            have[left_char] -= 1
            if have[left_char] < need[left_char]:
                formed -= 1
            
            left += 1
    
    return "" if ans[0] == float('inf') else s[ans[1]:ans[2] + 1]
 
 
# Test cases
if __name__ == "__main__":
    test_cases = [
        ("ADOBECODEBANC", "ABC", "BANC"),
        ("a", "a", "a"),
        ("a", "aa", ""),  # Impossible
        ("aa", "aa", "aa"),
        ("cabwefgewcwaefgcf", "cae", "cwae"),
        ("aaaaaaaaaaaabbbbbcdd", "abcdd", "abbbbbcdd"),
    ]
    
    for s, t, expected in test_cases:
        result = min_window(s, t)
        status = "✓" if result == expected else "✗"
        print(f"{status} s='{s}', t='{t}' -> '{result}' (expected '{expected}')")

Common Implementation Bugs

Off-by-one errors: Window length is right - left + 1, not right - left. 2. Forgetting the equality check: We need have[char] == need[char], not have[char] >= need[char] for counting formed. The latter would increment formed multiple times. 3. Not handling empty strings: Always check for empty t or s first.

Complexity Analysis Deep Dive

Let's rigorously analyze why this algorithm achieves O(n + m) time complexity.

Time Complexity: O(|s| + |t|) = O(n + m)

Building the need map: We iterate through t once → O(m)
Main sliding window:
- The outer loop runs exactly n times (right pointer moves from 0 to n-1)
- The inner while loop moves left pointer forward
- Key insight: Left pointer moves at most n times total across all iterations
- Each character is added to window once (when right reaches it)
- Each character is removed from window at most once (when left passes it)
Total pointer movements: O(n) for right + O(n) for left = O(n)
Per-character operations: O(1) hash map operations per movement

Combined: O(m) + O(n) = O(n + m)

Space Complexity: O(|s| + |t|)

Worst case: O(n + m) when all characters are unique.

Breakdown:

need map: O(m) entries (characters in t)
have map: O(n) entries worst case (all unique chars in s)
filtered_s in optimized version: O(n) entries
Answer tracking: O(1)

In practice, for ASCII strings, this is O(128) = O(1) for the maps.

The Optimized Version:

The filtered version helps when s contains many characters not in t. Instead of processing n characters, we only process the positions where relevant characters appear. This doesn't improve worst-case complexity but significantly improves average case for sparse matches.

s = "XXXXXXXXXXXABCXXXXXXXXXXXX" (many Xs)
t = "ABC"

Standard: Process 26 characters
Optimized: Process 3 positions (where A, B, C appear)

Complexity Comparison: Naive vs. Sliding Window
Approach	Time	Space	Key Operation
Naive (all substrings)	O(n² · m)	O(m)	Check each substring
Sliding Window	O(n + m)	O(n + m)	Single pass with 2 pointers
Optimized Sliding Window	O(n + m)	O(n + m)	Filtered positions

Related Problems and Variations

The minimum window substring pattern extends to numerous related problems. Understanding the core pattern allows you to adapt it quickly.

1. Longest Substring with At Most K Distinct Characters (LeetCode 340)

Instead of finding minimum window containing target characters, find the longest window with at most K unique characters.

Key difference: Contract when distinct count exceeds K; record max length during valid states.

2. Longest Substring Without Repeating Characters (LeetCode 3)

Find the longest substring where all characters are unique.

Key difference: Contract when any character count exceeds 1.

3. Find All Anagrams in a String (LeetCode 438)

Find all starting positions where anagrams of pattern appear.

Key difference: Fixed window size (equal to pattern length); use the match-count technique from our anagram discussion.

4. Permutation in String (LeetCode 567)

Check if any permutation of pattern exists as a substring.

Key difference: Return true/false instead of the substring; fixed window size.

5. Substring with Concatenation of All Words (LeetCode 30)

Find starting indices where concatenation of given words appears.

Key difference: Work with word-sized blocks instead of characters; multiple passes for different starting offsets.

sliding_window_variations.py
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
def longest_with_k_distinct(s: str, k: int) -> int:
    """
    Find length of longest substring with at most k distinct characters.
    
    LeetCode 340: Longest Substring with At Most K Distinct Characters
    
    Time: O(n)
    Space: O(k)
    """
    if k == 0:
        return 0
    
    left = 0
    max_length = 0
    char_count = {}
    
    for right in range(len(s)):
        # Expand: add right character
        char = s[right]
        char_count[char] = char_count.get(char, 0) + 1
        
        # Contract: if too many distinct characters
        while len(char_count) > k:
            left_char = s[left]
            char_count[left_char] -= 1
            if char_count[left_char] == 0:
                del char_count[left_char]
            left += 1
        
        # Record: window is valid, check max
        max_length = max(max_length, right - left + 1)
    
    return max_length
 
 
def longest_without_repeating(s: str) -> int:
    """
    Find length of longest substring without repeating characters.
    
    LeetCode 3: Longest Substring Without Repeating Characters
    
    Time: O(n)
    Space: O(min(n, alphabet_size))
    """
    left = 0
    max_length = 0
    char_index = {}  # Most recent index of each character
    
    for right in range(len(s)):
        char = s[right]
        
        # If char seen and is in current window, move left past it
        if char in char_index and char_index[char] >= left:
            left = char_index[char] + 1
        
        char_index[char] = right
        max_length = max(max_length, right - left + 1)
    
    return max_length
 
 
def permutation_in_string(s1: str, s2: str) -> bool:
    """
    Check if any permutation of s1 exists as substring of s2.
    
    LeetCode 567: Permutation in String
    
    This is minimum window with fixed window size = len(s1)
    
    Time: O(n)
    Space: O(26) = O(1)
    """
    from collections import Counter
    
    if len(s1) > len(s2):
        return False
    
    s1_count = Counter(s1)
    window_count = Counter(s2[:len(s1)])
    
    if window_count == s1_count:
        return True
    
    for i in range(len(s1), len(s2)):
        # Add new char, remove old char
        window_count[s2[i]] += 1
        
        left_char = s2[i - len(s1)]
        window_count[left_char] -= 1
        if window_count[left_char] == 0:
            del window_count[left_char]
        
        if window_count == s1_count:
            return True
    
    return False
 
 
# The pattern template for sliding window problems
def sliding_window_template(s: str, condition_checker):
    """
    General template for sliding window problems.
    
    The key variation points are:
    1. When to expand (usually always)
    2. When to contract (problem-specific condition)
    3. When to record answer (during valid state)
    4. What to record (min/max length, count, actual substring)
    """
    left = 0
    result = None  # or 0 for length, or [] for list of indices
    state = {}     # Window state (frequency map, count, etc.)
    
    for right in range(len(s)):
        # EXPAND: Update state with s[right]
        update_state_add(state, s[right])
        
        # CONTRACT: While condition violated
        while should_contract(state, condition):
            update_state_remove(state, s[left])
            left += 1
        
        # RECORD: Update answer based on current valid window
        result = update_result(result, left, right)
    
    return result

The Sliding Window Template

Once you internalize the expand-contract-record pattern, most sliding window problems become straightforward. The key is identifying: (1) What state do I track in the window? (2) What makes a window valid/invalid? (3) Do I want minimum, maximum, or all valid windows?

Interview Tips and Edge Cases

The minimum window substring is a frequent interview question. Here's how to approach it professionally.

Clarifying Questions to Ask:

"What if no valid window exists?" → Return empty string
"What about duplicate characters in t?" → Must match with multiplicity
"Are there multiple minimum windows?" → Return the first (leftmost)
"Character set?" → Usually ASCII, but clarify
"Empty inputs?" → Handle gracefully

Edge Cases to Consider:

t longer than s → Impossible, return ""
t equals s → Return s if all chars match
Single character strings → Works, but mention you considered it
t has characters not in s → Impossible, return ""
Multiple valid minimum windows → Return first one

What Interviewers Look For

•Recognize this as a sliding window problem
•Articulate the expand-contract strategy
•Handle character frequencies correctly
•Explain complexity (O(n) time, O(1) space for fixed alphabet)
•Consider edge cases proactively
•Write clean, bug-free code

Common Mistakes

•Forgetting to handle duplicates in t
•Off-by-one in window length calculation
•Not tracking when formed count changes
•Modifying answer inside contract loop (only check, don't break early)
•Ignoring empty string cases
•Using >= instead of == for formed increment

Optimization Discussion:

If the interviewer asks about optimizations:

Filtered characters: Mention the optimized version that only processes positions where characters from t appear. This helps when s has many irrelevant characters.
Early termination: If we find a window of length len(t), it can't get shorter. We could return immediately.
Space for fixed alphabet: Emphasize that for ASCII, space is O(128) = O(1), not O(n).
Preprocessing t: Building the frequency map for t is O(m) and only done once.

Most interviewers will be satisfied with the standard O(n + m) solution. The optimizations demonstrate depth of understanding.

Summary and Key Insights

The Minimum Window Substring problem is a cornerstone of string algorithm interviews, testing your mastery of the sliding window technique.

Key Takeaways

•The sliding window achieves O(n) by ensuring each pointer moves only forward — Total movements ≤ 2n.
•Track "formed" count to know when window is valid — Increment when a character's count first matches need; decrement when it drops below.
•Contract while valid, not until invalid — This finds the minimum, not just any valid window.
•The pattern generalizes to many problems — Longest substring with K distinct, no repeating, permutation search, etc.
•Edge cases matter in interviews — Empty strings, impossible targets, duplicates in pattern.
•Optimize thoughtfully — Filtered positions for sparse matches; early termination for perfect-length matches.

Page Complete

You now have deep mastery of the Minimum Window Substring problem and the sliding window pattern. This is one of the most powerful tools in your string algorithm toolkit. Next, we'll explore Distinct Subsequences—a completely different paradigm using dynamic programming.