Loading content...
Imagine you're building a search feature that must highlight the smallest portion of a document containing all query terms. Or a DNA sequence analyzer that needs to find the shortest fragment containing all required nucleotides. Or a network packet inspector that must identify the minimal contiguous data span containing all signatures of an attack.
All of these are instances of the Minimum Window Substring problem—one of the most elegant and frequently asked problems in technical interviews. It's a litmus test for understanding sliding window algorithms, and solving it efficiently requires a precise dance between expanding and contracting your search window.
The Problem (LeetCode 76):
Given strings s (the haystack) and t (the pattern), find the minimum window substring of s that contains all characters of t (including duplicates). If no such window exists, return an empty string.
By the end of this page, you will deeply understand the sliding window technique for substring problems, implement the minimum window substring algorithm in O(n) time, and recognize how this pattern extends to numerous related problems. You'll understand not just the algorithm, but the intuition behind every design decision.
Let's establish precise definitions before diving into the algorithm.
Formal Problem Statement:
Given:
s of length n (the text/haystack)t of length m (the pattern)Find: The shortest substring w of s such that every character in t (with repetition) appears in w, or return empty string if no such window exists.
Key Clarifications:
Character frequencies matter: If t = "AA", the window must contain at least two 'A's.
Order doesn't matter: We're not looking for t as a subsequence or substring—just a window containing all characters.
Minimum length: Among all valid windows, we want the shortest one.
First occurrence: If multiple minimum windows exist, return the one starting earliest.
Example:
s = "ADOBECODEBANC"
t = "ABC"
Valid windows: "ADOBEC", "CODEBA", "BECODEBA", "BANC", ...
Minimum window: "BANC" (length 4)
Why This Problem is Hard:
The naive approach—check every possible substring of s against t—has O(n² · m) time complexity. For s with 100,000 characters and t with 1,000 characters, this means 10 trillion operations. We need something dramatically better.
The Key Insight:
Instead of checking every substring independently, we can process the string in a single pass using two pointers (called the sliding window or caterpillar technique). The right pointer expands the window to include new characters; the left pointer contracts it to find the minimum. This reduces time complexity to O(n + m).
The sliding window is one of the most powerful patterns in string and array algorithms. It applies when you're looking for a contiguous subarray/substring that satisfies some property. The core idea: maintain a window with two pointers and slide them through the data, adjusting as you go.
Think of the algorithm as a caterpillar inching along the string s. The caterpillar's body spans from index left to index right.
Phase 1: Expansion (Move Right Pointer)
Expand the window by moving right forward until the window contains all characters of t. Each expansion adds a new character to our "collection."
Phase 2: Contraction (Move Left Pointer)
Once we have a valid window, try to shrink it by moving left forward. This might remove enough characters to make the window invalid, at which point we stop contracting and resume expansion.
Phase 3: Record Minimum
Every time we have a valid window, compare its length to our current minimum. Track the start index of the best window we've found.
The Invariant:
We never need to move left backward or right backward. Each pointer moves only forward, giving us O(n) total movements despite the nested loops.
Visual Walkthrough:
s = "ADOBECODEBANC", t = "ABC"
Step 1: Expand right until we have A, B, C
[ADOBEC]ODEBANC ← Valid! Length 6, save it
Step 2: Contract left while still valid
A[DOBEC]ODEBANC ← Missing A, stop contracting
Step 3: Expand right again
A[DOBECO]DEBANC ← Still missing A, keep going
A[DOBECODE]BANC ← Still missing A
A[DOBECODEB]ANC ← Still missing A
A[DOBECODEBA]NC ← Valid! But longer than saved
Step 4: Contract left
AD[OBECODEBA]NC ← Valid! Length 9
ADO[BECODEBA]NC ← Valid! Length 8
ADOB[ECODEBA]NC ← Valid! Length 7
ADOBE[CODEBA]NC ← Valid! Length 6
ADOBEC[ODEBA]NC ← Missing C, stop
Step 5: Expand right
ADOBEC[ODEBAN]C ← Missing C
ADOBEC[ODEBANC] ← Valid! Length 7
Step 6: Contract left
ADOBECO[DEBANC] ← Valid! Length 6
ADOBECOD[EBANC] ← Valid! Length 5
ADOBECODE[BANC] ← Valid! Length 4 ← NEW MINIMUM!
ADOBECODEB[ANC] ← Missing B, stop
End of string. Best window: "BANC" at position 9
Each pointer moves at most n times total (never backwards). The outer loop moves right from 0 to n. The inner loop moves left from 0 to n (across all iterations of the outer loop). Total operations: O(n + n) = O(n).
The implementation requires careful tracking of character frequencies and a "satisfaction" counter to know when our window is valid.
Data Structures:
need: Frequency map of characters in t. What we need to find.have: Frequency map of characters in our current window from s.formed: Count of characters in t that are fully satisfied (have >= need).required: Number of unique characters in t that must be satisfied.When Is Window Valid?
The window is valid when formed == required, meaning every unique character in t has sufficient count in our window.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132
def min_window(s: str, t: str) -> str: """ Find the minimum window substring containing all characters of t. Time Complexity: O(|s| + |t|) Space Complexity: O(|s| + |t|) in worst case (all unique characters) Args: s: The string to search within (haystack) t: The pattern string (needle) Returns: The minimum window substring, or empty string if none exists """ if not t or not s or len(s) < len(t): return "" from collections import Counter # Step 1: Build frequency map of characters we need need = Counter(t) required = len(need) # Number of unique characters in t # Step 2: Initialize window tracking left = 0 formed = 0 # Number of unique chars with desired frequency have = {} # Current window character counts # Step 3: Track the best window found # (window_length, left_index, right_index) ans = (float('inf'), None, None) # Step 4: Sliding window for right in range(len(s)): # Add character from the right to our window char = s[right] have[char] = have.get(char, 0) + 1 # Check if this character's frequency matches what we need if char in need and have[char] == need[char]: formed += 1 # Try to contract the window while it's valid while left <= right and formed == required: # Update answer if this window is smaller if right - left + 1 < ans[0]: ans = (right - left + 1, left, right) # Remove character from the left left_char = s[left] have[left_char] -= 1 # Check if removal breaks a required frequency if left_char in need and have[left_char] < need[left_char]: formed -= 1 left += 1 # Return the minimum window or empty string return "" if ans[0] == float('inf') else s[ans[1]:ans[2] + 1] def min_window_optimized(s: str, t: str) -> str: """ Optimized version using a filtered list of relevant characters. When s is much longer than t and has many irrelevant characters, this version can be faster by only considering positions of characters that exist in t. Time Complexity: O(|s| + |t|) - same worst case, better best case Space Complexity: O(|s| + |t|) """ if not t or not s or len(s) < len(t): return "" from collections import Counter need = Counter(t) required = len(need) # Create filtered list of (index, char) for chars in s that exist in t filtered_s = [(i, char) for i, char in enumerate(s) if char in need] if not filtered_s: return "" left = 0 formed = 0 have = {} ans = (float('inf'), None, None) for right in range(len(filtered_s)): # Add character at filtered position idx, char = filtered_s[right] have[char] = have.get(char, 0) + 1 if have[char] == need[char]: formed += 1 # Contract window while valid while left <= right and formed == required: left_idx, left_char = filtered_s[left] # Window is from left_idx to idx (in original string) if idx - left_idx + 1 < ans[0]: ans = (idx - left_idx + 1, left_idx, idx) have[left_char] -= 1 if have[left_char] < need[left_char]: formed -= 1 left += 1 return "" if ans[0] == float('inf') else s[ans[1]:ans[2] + 1] # Test casesif __name__ == "__main__": test_cases = [ ("ADOBECODEBANC", "ABC", "BANC"), ("a", "a", "a"), ("a", "aa", ""), # Impossible ("aa", "aa", "aa"), ("cabwefgewcwaefgcf", "cae", "cwae"), ("aaaaaaaaaaaabbbbbcdd", "abcdd", "abbbbbcdd"), ] for s, t, expected in test_cases: result = min_window(s, t) status = "✓" if result == expected else "✗" print(f"{status} s='{s}', t='{t}' -> '{result}' (expected '{expected}')")right - left + 1, not right - left. 2. Forgetting the equality check: We need have[char] == need[char], not have[char] >= need[char] for counting formed. The latter would increment formed multiple times. 3. Not handling empty strings: Always check for empty t or s first.Let's rigorously analyze why this algorithm achieves O(n + m) time complexity.
Time Complexity: O(|s| + |t|) = O(n + m)
Building the need map: We iterate through t once → O(m)
Main sliding window:
Total pointer movements: O(n) for right + O(n) for left = O(n)
Per-character operations: O(1) hash map operations per movement
Combined: O(m) + O(n) = O(n + m)
Space Complexity: O(|s| + |t|)
Worst case: O(n + m) when all characters are unique.
Breakdown:
need map: O(m) entries (characters in t)have map: O(n) entries worst case (all unique chars in s)filtered_s in optimized version: O(n) entriesIn practice, for ASCII strings, this is O(128) = O(1) for the maps.
The Optimized Version:
The filtered version helps when s contains many characters not in t. Instead of processing n characters, we only process the positions where relevant characters appear. This doesn't improve worst-case complexity but significantly improves average case for sparse matches.
s = "XXXXXXXXXXXABCXXXXXXXXXXXX" (many Xs)
t = "ABC"
Standard: Process 26 characters
Optimized: Process 3 positions (where A, B, C appear)
| Approach | Time | Space | Key Operation |
|---|---|---|---|
| Naive (all substrings) | O(n² · m) | O(m) | Check each substring |
| Sliding Window | O(n + m) | O(n + m) | Single pass with 2 pointers |
| Optimized Sliding Window | O(n + m) | O(n + m) | Filtered positions |
The minimum window substring pattern extends to numerous related problems. Understanding the core pattern allows you to adapt it quickly.
1. Longest Substring with At Most K Distinct Characters (LeetCode 340)
Instead of finding minimum window containing target characters, find the longest window with at most K unique characters.
Key difference: Contract when distinct count exceeds K; record max length during valid states.
2. Longest Substring Without Repeating Characters (LeetCode 3)
Find the longest substring where all characters are unique.
Key difference: Contract when any character count exceeds 1.
3. Find All Anagrams in a String (LeetCode 438)
Find all starting positions where anagrams of pattern appear.
Key difference: Fixed window size (equal to pattern length); use the match-count technique from our anagram discussion.
4. Permutation in String (LeetCode 567)
Check if any permutation of pattern exists as a substring.
Key difference: Return true/false instead of the substring; fixed window size.
5. Substring with Concatenation of All Words (LeetCode 30)
Find starting indices where concatenation of given words appears.
Key difference: Work with word-sized blocks instead of characters; multiple passes for different starting offsets.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126
def longest_with_k_distinct(s: str, k: int) -> int: """ Find length of longest substring with at most k distinct characters. LeetCode 340: Longest Substring with At Most K Distinct Characters Time: O(n) Space: O(k) """ if k == 0: return 0 left = 0 max_length = 0 char_count = {} for right in range(len(s)): # Expand: add right character char = s[right] char_count[char] = char_count.get(char, 0) + 1 # Contract: if too many distinct characters while len(char_count) > k: left_char = s[left] char_count[left_char] -= 1 if char_count[left_char] == 0: del char_count[left_char] left += 1 # Record: window is valid, check max max_length = max(max_length, right - left + 1) return max_length def longest_without_repeating(s: str) -> int: """ Find length of longest substring without repeating characters. LeetCode 3: Longest Substring Without Repeating Characters Time: O(n) Space: O(min(n, alphabet_size)) """ left = 0 max_length = 0 char_index = {} # Most recent index of each character for right in range(len(s)): char = s[right] # If char seen and is in current window, move left past it if char in char_index and char_index[char] >= left: left = char_index[char] + 1 char_index[char] = right max_length = max(max_length, right - left + 1) return max_length def permutation_in_string(s1: str, s2: str) -> bool: """ Check if any permutation of s1 exists as substring of s2. LeetCode 567: Permutation in String This is minimum window with fixed window size = len(s1) Time: O(n) Space: O(26) = O(1) """ from collections import Counter if len(s1) > len(s2): return False s1_count = Counter(s1) window_count = Counter(s2[:len(s1)]) if window_count == s1_count: return True for i in range(len(s1), len(s2)): # Add new char, remove old char window_count[s2[i]] += 1 left_char = s2[i - len(s1)] window_count[left_char] -= 1 if window_count[left_char] == 0: del window_count[left_char] if window_count == s1_count: return True return False # The pattern template for sliding window problemsdef sliding_window_template(s: str, condition_checker): """ General template for sliding window problems. The key variation points are: 1. When to expand (usually always) 2. When to contract (problem-specific condition) 3. When to record answer (during valid state) 4. What to record (min/max length, count, actual substring) """ left = 0 result = None # or 0 for length, or [] for list of indices state = {} # Window state (frequency map, count, etc.) for right in range(len(s)): # EXPAND: Update state with s[right] update_state_add(state, s[right]) # CONTRACT: While condition violated while should_contract(state, condition): update_state_remove(state, s[left]) left += 1 # RECORD: Update answer based on current valid window result = update_result(result, left, right) return resultOnce you internalize the expand-contract-record pattern, most sliding window problems become straightforward. The key is identifying: (1) What state do I track in the window? (2) What makes a window valid/invalid? (3) Do I want minimum, maximum, or all valid windows?
The minimum window substring is a frequent interview question. Here's how to approach it professionally.
Clarifying Questions to Ask:
Edge Cases to Consider:
t longer than s → Impossible, return ""t equals s → Return s if all chars matcht has characters not in s → Impossible, return ""Optimization Discussion:
If the interviewer asks about optimizations:
Filtered characters: Mention the optimized version that only processes positions where characters from t appear. This helps when s has many irrelevant characters.
Early termination: If we find a window of length len(t), it can't get shorter. We could return immediately.
Space for fixed alphabet: Emphasize that for ASCII, space is O(128) = O(1), not O(n).
Preprocessing t: Building the frequency map for t is O(m) and only done once.
Most interviewers will be satisfied with the standard O(n + m) solution. The optimizations demonstrate depth of understanding.
The Minimum Window Substring problem is a cornerstone of string algorithm interviews, testing your mastery of the sliding window technique.
You now have deep mastery of the Minimum Window Substring problem and the sliding window pattern. This is one of the most powerful tools in your string algorithm toolkit. Next, we'll explore Distinct Subsequences—a completely different paradigm using dynamic programming.