Substring extraction is the operation of obtaining a contiguous portion of a string: a subset of its characters starting at one position and ending at another. This operation is fundamental to parsing, searching, validation, and text processing of all kinds.
Whether you call it slicing, substring, or substr, the core concept is the same: given a string and a range, produce a new string containing just that range. But the details—inclusive vs exclusive bounds, copy vs view semantics, encoding considerations—can make the difference between correct and buggy code.
By the end of this page, you will understand: how substring extraction works conceptually and physically, the difference between copy and view semantics, inclusive vs exclusive range conventions, efficient substring patterns, and the time/space cost model.
A substring is a contiguous sequence of characters within a string. Substring extraction creates a new string containing only those characters.
Definition: Given a string S and indices i and j, the substring S[i:j] contains the characters from position i up to (but not including) position j.
Example:
String S = "Hello World"
S[0:5] = "Hello" (characters at indices 0, 1, 2, 3, 4)
S[6:11] = "World" (characters at indices 6, 7, 8, 9, 10)
S[3:8] = "lo Wo" (characters at indices 3, 4, 5, 6, 7)
| Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Char | H | e | l | l | o | (space) | W | o | r | l | d |
Key properties of substring extraction:

- The result is contiguous: every character between the chosen positions appears, in order.
- With the half-open convention S[i:j], the length of the result is j - i.
- The result is itself a string, so it supports all the usual string operations.

Special cases:

- S[i:i] is the empty string "" (a zero-length range).
- S[0:len(S)] yields the entire string.
- When i is 0 the result is a prefix of S; when j is len(S) it is a suffix.
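As a quick sanity check, here is a minimal Python sketch (assuming the half-open convention described above) that verifies the example slices, the length property, and the special cases:

```python
S = "Hello World"

# The example slices from the table above
assert S[0:5] == "Hello"   # indices 0..4
assert S[6:11] == "World"  # indices 6..10
assert S[3:8] == "lo Wo"   # indices 3..7

# Length of a half-open slice is end - start
assert len(S[3:8]) == 8 - 3

# Special cases
assert S[3:3] == ""         # empty range
assert S[0:len(S)] == S     # the whole string
assert S[:5] + S[5:] == S   # split and rejoin at any point
```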
Different languages use different terminology: Python uses 'slice', Java uses 'substring', JavaScript has both 'substring' and 'slice' (with slightly different behaviors). The core concept is the same; pay attention to the specific API semantics in your language.
One of the most confusing aspects of substring extraction is understanding range conventions. Different languages and functions use different rules for whether the end index is included or excluded.
Half-open intervals [start, end):
Most modern languages use half-open intervals (also called "exclusive end"): the character at start is included, the character at end is not, and the length of the result is end - start.
This is the convention in Python, Java, JavaScript, Go, Rust, and most modern languages.
s = "0123456789" # [start, end) - end is EXCLUDEDs[0:5] # "01234" - indices 0, 1, 2, 3, 4 (length = 5 - 0 = 5)s[3:7] # "3456" - indices 3, 4, 5, 6 (length = 7 - 3 = 4)s[5:10] # "56789" - indices 5, 6, 7, 8, 9 (length = 10 - 5 = 5) # Empty substring when start == ends[3:3] # "" - no characters # Convenient properties:# s[0:k] gives first k characters# s[k:] gives everything after first k characters# s[0:k] + s[k:] == s (split and rejoin at any point)Why half-open intervals?
The exclusive end convention offers several mathematical advantages:

- The length of the result is simply end - start, with no +1 correction.
- Adjacent ranges compose cleanly: [a, b) followed by [b, c) covers [a, c) with no overlap and no gap.
- Splitting at any index k is lossless: s[0:k] + s[k:] == s.
- Empty ranges are easy to express: [i, i) is empty without needing a special case.
Some APIs use inclusive end (particularly older APIs or domain-specific functions). Always check documentation! The difference between exclusive and inclusive end is the most common source of off-by-one errors in substring operations.
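If you ever need to work against an inclusive-end specification (say, a format that describes ranges as "from i to j inclusive"), a small adapter makes the conversion explicit. This is only a sketch; the helper name is illustrative:

```python
def substring_inclusive(s: str, start: int, end_inclusive: int) -> str:
    """Extract s[start..end_inclusive], both bounds included.

    Converts to Python's exclusive-end slicing by adding 1 to the end
    index, which is exactly where off-by-one bugs usually creep in.
    """
    return s[start:end_inclusive + 1]


assert substring_inclusive("Hello World", 0, 4) == "Hello"   # indices 0..4
assert substring_inclusive("Hello World", 6, 10) == "World"  # indices 6..10
```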
Certain substring extraction patterns appear repeatedly in programming. Mastering these patterns makes string manipulation fluent:
Pattern 1: Get first N characters (prefix)
s = "Hello World" # First N charactersfirst_5 = s[:5] # "Hello" (0:5 is implied)first_1 = s[:1] # "H"first_0 = s[:0] # "" (empty) # Practical use: truncating stringsdef truncate(s: str, max_length: int) -> str: """Truncate string to max_length characters.""" return s[:max_length] # Example: truncate("Hello World", 5) → "Hello"Pattern 2: Get last N characters (suffix)
s = "Hello World" # Last N characters using negative indexlast_5 = s[-5:] # "World"last_1 = s[-1:] # "d"last_3 = s[-3:] # "rld" # Alternative: calculate start positionn = 5last_n = s[len(s) - n:] # "World" # Practical use: checking file extensionsdef get_extension(filename: str) -> str: """Get last 4 characters (e.g., '.txt').""" return filename[-4:] if len(filename) >= 4 else filenamePattern 3: Remove first/last N characters
s = "Hello World" # Remove first N characters (get suffix)without_first_6 = s[6:] # "World" # Remove last N characters (get prefix)without_last_6 = s[:-6] # "Hello" # Remove from both endstrimmed = s[1:-1] # "ello Worl" (remove first and last) # Practical use: removing quotesquoted = '"Hello"'unquoted = quoted[1:-1] # 'Hello' # Practical use: removing prefix if presentdef remove_prefix(s: str, prefix: str) -> str: if s.startswith(prefix): return s[len(prefix):] return sPattern 4: Split at position
s = "Hello World" # Split at position kk = 5left = s[:k] # "Hello"right = s[k:] # " World" # Verify: left + right == sassert left + right == s # Practical use: parsing fixed-width fieldsrecord = "John 30NYC"# ^^^^^^^^ ^^^^^^# name(10) age(2)rest name = record[:10].strip() # "John"age = record[10:12] # "30"city = record[12:] # "NYC"Python's negative indices (s[-1] is last character, s[-2] is second-to-last) work in slicing too. s[-3:] gets last 3 characters, s[:-3] gets everything except last 3. This makes many patterns more readable, but be aware: not all languages support this.
When you extract a substring, does the language create a copy of the characters, or does it create a view (reference) into the original string? This distinction has significant performance and memory implications.
Copy semantics (most common):
The substring operation allocates new memory and copies the characters. The result is completely independent of the original.
```
Original string S = "Hello World" (stored in memory block A)

Substring operation: T = S[6:11]

Result with COPY semantics:
- New memory block B is allocated (5 characters)
- Characters "World" are copied from A to B
- T points to memory block B
- S and T are completely independent

Memory layout:
Block A: [H][e][l][l][o][ ][W][o][r][l][d]   ← S points here
Block B: [W][o][r][l][d]                     ← T points here

If S is garbage collected, T is unaffected.
Time:  O(k) where k = length of substring
Space: O(k) for the new string
```

View semantics (some languages/cases):
The substring operation returns a reference to a portion of the original string. No characters are copied.
```
Original string S = "Hello World" (stored in memory block A)

Substring operation: T = view of S[6:11]

Result with VIEW semantics:
- No new memory is allocated
- T is a reference to block A, with offset=6 and length=5
- S and T share the same underlying memory

Memory layout:
Block A: [H][e][l][l][o][ ][W][o][r][l][d]
          ↑ S points here (offset 0)
                            ↑ T points here (offset 6, length 5)

If S is still in use, the entire block A stays in memory.
Time:  O(1) - just create the reference
Space: O(1) - no character copying

DANGER: If S is referenced only by T, the ENTIRE original string
stays in memory, not just the substring. This can cause memory leaks!
```

Which languages use which semantics?
| Language | Default Semantics | Notes |
|---|---|---|
| Python | Copy | Slicing always creates new string objects |
| JavaScript | Copy | All string operations return new strings |
| Java (pre-7u6) | View | Shared char[] could cause memory leaks |
| Java (7u6+) | Copy | Changed to copy to prevent memory issues |
| Go | View | Slicing a string shares the underlying bytes; no characters are copied |
| Rust | View (&str) | String slices are borrowed references |
Early Java versions used view semantics for substrings. A famous problem: read a 1MB string, extract a 10-character substring, discard the original. The 1MB char array stayed in memory because the tiny substring referenced it! Java 7 update 6 changed to copy semantics to fix this. This is a lesson in the hidden costs of optimization.
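To make the copy/view distinction concrete without leaving Python: slicing a str always copies, while a memoryview over a bytes object behaves like a view onto the same buffer. A minimal sketch (the variable names are just for illustration):

```python
data = b"Hello World"

# Copy semantics: slicing bytes (or str) allocates a new object
copied = data[6:11]           # b"World", a new 5-byte object
assert copied == b"World"

# View semantics: a memoryview slice references the original buffer
view = memoryview(data)[6:11]
assert view.tobytes() == b"World"
assert view.obj is data       # the view keeps the whole buffer alive

# Converting the view back to bytes forces a copy when you need one
independent = bytes(view)
assert independent == b"World"
```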
Understanding the cost of substring extraction is essential for writing efficient code:
Time complexity: O(k) with copy semantics, where k is the length of the extracted substring (every character must be copied); O(1) with view semantics (only a reference is created).

Space complexity: O(k) with copy semantics for the new string; O(1) with view semantics, although the entire original string remains reachable.
Important insight:
With copy semantics, extracting a substring from a very long string is still efficient if the substring is short:
```python
# Copy semantics (Python): Cost depends on SUBSTRING length, not source
import time

huge_string = "x" * 10_000_000  # 10 million character string

# This is O(k) where k = 5, regardless of huge_string length
start = time.time()
tiny = huge_string[:5]  # "xxxxx"
print(f"Extract 5 chars: {time.time() - start:.6f}s")   # ~0.000001s

# This is O(k) where k = 1_000_000
start = time.time()
million = huge_string[:1_000_000]
print(f"Extract 1M chars: {time.time() - start:.6f}s")  # ~0.01s

# The source string length doesn't affect performance
# Only the extracted length matters
```

When does substring extraction become expensive?

Extraction becomes a performance concern when the extracted pieces are long, when you extract many substrings in a loop (n extractions of length k cost O(n × k) time and space), or when extraction sits inside a hot path where the repeated allocations add up.
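A classic example of the expensive case is repeatedly shaving characters off the front of a string, which copies the remaining tail on every iteration. The sketch below contrasts that quadratic pattern with an index-based loop; the function names are just for illustration:

```python
def consume_by_slicing(s: str) -> int:
    """Process one character at a time by re-slicing the string.

    Each s[1:] copies the remaining tail, so the total work is
    roughly n + (n-1) + ... + 1 = O(n^2) characters copied.
    """
    count = 0
    while s:
        count += 1
        s = s[1:]  # copies len(s) - 1 characters every time
    return count


def consume_by_index(s: str) -> int:
    """Process one character at a time by advancing an index: O(n), no copies."""
    count = 0
    i = 0
    while i < len(s):
        count += 1
        i += 1
    return count


assert consume_by_slicing("hello") == consume_by_index("hello") == 5
```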
Optimization: Avoid extraction when possible
Sometimes you can work with indices instead of creating substrings:
```python
def compare_substrings_naive(s1: str, i1: int, s2: str, i2: int, length: int) -> bool:
    """
    Compare substrings by extracting them first.

    Time: O(length) for extraction + O(length) for comparison = O(length)
    Space: O(length) for two temporary substrings
    """
    sub1 = s1[i1:i1+length]  # Creates new string
    sub2 = s2[i2:i2+length]  # Creates new string
    return sub1 == sub2


def compare_substrings_efficient(s1: str, i1: int, s2: str, i2: int, length: int) -> bool:
    """
    Compare substrings by comparing characters directly.

    Time: O(length) for comparison
    Space: O(1) - no substring allocation
    """
    for k in range(length):
        if s1[i1 + k] != s2[i2 + k]:
            return False
    return True

# Both are O(length) time, but the second uses O(1) space
# and avoids memory allocation overhead
```

In performance-critical code, consider whether you can avoid creating substrings entirely. If you just need to compare, search within, or iterate over a portion, working with index ranges is often faster and uses less memory.
Substring extraction has several edge cases that can cause bugs or errors. Understanding these cases is essential for robust code:
Edge case 1: Out-of-bounds indices
Different languages handle invalid indices differently:
s = "hello" # length 5, valid indices 0-4 # Python SILENTLY CLAMPS out-of-bounds slice indicess[0:100] # "hello" (end clamped to 5)s[-100:3] # "hel" (start clamped to 0)s[10:20] # "" (both beyond end, empty result)s[3:1] # "" (start > end, empty result) # This is convenient but can hide bugs!# You might get "" when you expected data # Note: Direct indexing DOES raise error# s[10] # IndexError: string index out of rangeEdge case 2: Empty string source
s = "" # empty string # Any extraction from empty string gives empty strings[0:0] # ""s[0:1] # "" (bounds clamped, nothing to extract)s[:5] # "" # This is usually fine, but watch for assumptions:# If you expect s[:1] to give you the first character,# you'll get "" instead of raising an error.Edge case 3: Single character extractions
s = "hello" # Character access vs single-char slicechar = s[0] # 'h' (a character/string of length 1)slice = s[0:1] # 'h' (explicitly a string of length 1) # In Python, these are equivalent (both return str)type(s[0]) # <class 'str'>type(s[0:1]) # <class 'str'>s[0] == s[0:1] # True # But semantically:# - s[0] means "the character at position 0"# - s[0:1] means "the substring from 0 to 1 (exclusive)"Python and JavaScript 'helpfully' return empty strings for invalid ranges. This can hide bugs where you expected data but got nothing. Java throws exceptions, forcing you to handle errors explicitly. Know your language's behavior and write defensive code accordingly.
Substring extraction is a core operation in many algorithms. Understanding when to use it (and when not to) is key to writing efficient code.
Example 1: Finding all substrings of length k
```python
def get_all_substrings_of_length_k(s: str, k: int) -> list[str]:
    """
    Extract all contiguous substrings of length k.

    For s = "abcde" and k = 3:
    Returns: ["abc", "bcd", "cde"]

    Time: O((n-k+1) * k) = O(n*k) - each extraction is O(k)
    Space: O((n-k+1) * k) for storing all substrings
    """
    result = []
    for i in range(len(s) - k + 1):
        result.append(s[i:i+k])
    return result


def count_unique_substrings_of_length_k(s: str, k: int) -> int:
    """
    Count unique substrings of length k using a set.

    Time: O(n*k) - extracting substrings
    Space: O(n*k) worst case for the set
    """
    seen = set()
    for i in range(len(s) - k + 1):
        seen.add(s[i:i+k])
    return len(seen)

# Example: count_unique_substrings_of_length_k("abcabc", 3)
# Substrings: "abc", "bca", "cab", "abc"
# Unique: {"abc", "bca", "cab"} → 3
```

Example 2: Checking if string contains a substring (naive approach)
```python
def contains_substring_naive(text: str, pattern: str) -> bool:
    """
    Check if text contains pattern using substring extraction.

    Time: O((n-m+1) * m) where n = len(text), m = len(pattern)
    This is O(n*m) in the worst case.

    Note: Python's 'in' operator is much faster (uses optimized algorithms)
    """
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):
        if text[i:i+m] == pattern:  # O(m) extraction + O(m) comparison
            return True
    return False


# Better: use the built-in 'in' operator
def contains_substring_better(text: str, pattern: str) -> bool:
    return pattern in text  # Uses Boyer-Moore or similar internally

# Understanding the naive approach helps appreciate
# why advanced algorithms (KMP, Rabin-Karp) exist
```

Example 3: Longest common prefix of two strings
```python
def longest_common_prefix(s1: str, s2: str) -> str:
    """
    Find the longest common prefix of two strings.

    Approach: Find the common length, then extract once
    Time: O(min(n, m)) for comparison + O(k) for extraction
    Space: O(k) where k = length of common prefix
    """
    min_len = min(len(s1), len(s2))
    common_length = 0
    for i in range(min_len):
        if s1[i] == s2[i]:
            common_length += 1
        else:
            break
    return s1[:common_length]  # Extract once at the end

# Example:
# longest_common_prefix("flower", "flow") → "flow"
# longest_common_prefix("abc", "xyz") → ""
```

In algorithms that search through strings, prefer to find the boundaries first (using index comparisons), then extract the substring once at the end. This avoids creating many temporary substring objects during the search phase.
Checking whether a string starts with or ends with a specific sequence is so common that most languages provide dedicated methods. These are optimized alternatives to manual substring extraction and comparison.
Prefix checking (starts with):
s = "Hello World" # Built-in method (preferred)s.startswith("Hello") # Trues.startswith("World") # Falses.startswith("") # True (empty is a prefix of everything) # Manual approach (slower, less readable)prefix = "Hello"s[:len(prefix)] == prefix # True # Multiple prefixess.startswith(("Hello", "Goodbye")) # True (tuple of options) # Practical use: URL scheme detectionurl = "https://example.com"is_secure = url.startswith("https://")Suffix checking (ends with):
s = "document.pdf" # Built-in method (preferred)s.endswith(".pdf") # Trues.endswith(".txt") # Falses.endswith("") # True # Multiple suffixess.endswith((".pdf", ".doc", ".txt")) # True # Practical use: file type detectiondef is_image(filename: str) -> bool: return filename.lower().endswith((".jpg", ".jpeg", ".png", ".gif"))Always use startswith()/endswith() when available. They're more readable than manual slicing, less error-prone (handle edge cases correctly), and often optimized. The manual approaches exist if you need custom behavior, but built-ins should be your default.
Let's consolidate the cost model for substring extraction:
| Operation | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Extract substring (copy) | O(k) | O(k) | k = length of extracted substring |
| Extract substring (view) | O(1) | O(1)* | *Original string memory is retained |
| startswith/endswith | O(m) | O(1) | m = length of prefix/suffix checked |
| Multiple extractions (n of length k) | O(n × k) | O(n × k) | Builds n new strings |
| Compare substrings via indices | O(k) | O(1) | No substring allocation |
What's next:
With substring extraction mastered, we're ready for the final operation in this module: string comparison. Comparing strings involves not just equality checks but also ordering, case sensitivity, and locale considerations—essential knowledge for searching, sorting, and validating text.
You now understand substring extraction comprehensively—from range conventions to copy/view semantics, from common patterns to performance optimization. This operation is central to text processing, and you're now equipped to use it correctly and efficiently.