How Strings Are Represented (Conceptual View)

Level: Beginner

Duration: 45 mins

Topic: How Strings Are Represented

Length, Indexing, and Immutability vs Mutability

The Three Pillars of String Access

Having established what strings are logically (ordered sequences of characters) and how they're stored physically (contiguously in memory), we now explore the practical concepts that determine how you actually work with strings in code.

Three concepts form the foundation of all string manipulation:

Length — How many characters does this string contain?
Indexing — How do I access the character at a specific position?
Mutability — Can I change this string after it's created, or must I create a new one?

These aren't just implementation details—they're fundamental properties that affect how you design algorithms, predict performance, and avoid bugs. Master these concepts, and you'll write better string code in any language.

What You Will Learn

By the end of this page, you'll deeply understand string length semantics, master the nuances of indexing (including boundary conditions), and grasp the profound implications of immutability vs mutability for performance and correctness.

Understanding String Length

The length of a string is the number of characters it contains. This seems utterly simple—and for ASCII strings, it is. But as we'll see, 'length' has surprising depth.

Basic properties of length:

The empty string "" has length 0
A single-character string "A" has length 1
Length is always a non-negative integer
Length is a property of the string itself—it doesn't change unless the string changes

The fundamental length equation:

For a string with indices 0 through n-1, the length is n. Equivalently:

length(s) = highest_valid_index(s) + 1

Or:

valid_indices(s) = {0, 1, 2, ..., length(s) - 1}

This relationship between length and valid indices is critical for loop bounds and boundary checking.

Length and Valid Indices
String	Length	Valid Indices	Highest Index
"" (empty)	0	None	N/A
"A"	1	0	0
"Hi"	2	0, 1	1
"Hello"	5	0, 1, 2, 3, 4	4
"Algorithm"	9	0 through 8	8

The Off-by-One Pattern

The last valid index is always length - 1, not length. Accessing index = length is an out-of-bounds error. A very common string bug is looping for i = 0 to length (inclusive) when the loop should run for i = 0 to length - 1 (inclusive), or equivalently for i = 0 to length (exclusive).
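The loop-bound rule can be sketched in Python. This is a minimal illustration, and the helper name chars_with_indices is hypothetical. It relies on range(len(s)) producing exactly the valid indices 0 through len(s) - 1.

```python
def chars_with_indices(s):
    """Return (index, character) pairs using the half-open range [0, len(s))."""
    pairs = []
    for i in range(len(s)):  # stops at len(s) - 1, the last valid index
        pairs.append((i, s[i]))
    return pairs


word = "Hello"
pairs = chars_with_indices(word)
last_index = len(word) - 1

# Index len(word) is one past the end, so Python raises IndexError.
try:
    word[len(word)]
    out_of_bounds_raised = False
except IndexError:
    out_of_bounds_raised = True
```

In a language without bounds checking, the same mistake would read past the end of the string instead of raising an error.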

When Length Gets Complicated

For ASCII strings, length is straightforward: one byte = one character = one unit of length. But modern strings introduce complexity.

The three meanings of 'length':

Byte length: How many bytes does the string occupy in memory?
Code unit length: How many encoding units (e.g., UTF-16 code units) does it contain?
Character/grapheme length: How many visible characters does a human perceive?

These can all be different numbers for the same string:

String: "héllo👋"

Byte length (UTF-8):       10 bytes
  h(1) + é(2) + l(1) + l(1) + o(1) + 👋(4) = 10

Code unit length (UTF-16): 7 units
  h(1) + é(1) + l(1) + l(1) + o(1) + 👋(2) = 7

Grapheme/visual length:    6 characters
  h + é + l + l + o + 👋 = 6 visible things

The combining character problem:

Some 'characters' are actually composed of multiple code points. The letter 'é' can be represented as:

One code point: U+00E9 (LATIN SMALL LETTER E WITH ACUTE), which counts as 1
Two code points: U+0065 (e) + U+0301 (combining acute accent), which count as 2

Both render identically, but a code-point count gives 1 for the first form and 2 for the second. A grapheme count gives 1 for both.
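The two encodings of 'é' can be compared directly in Python. This sketch uses the standard library's unicodedata.normalize; the variable names are illustrative only.

```python
import unicodedata

composed = "\u00e9"      # single code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by a combining acute accent

composed_length = len(composed)      # Python counts code points
decomposed_length = len(decomposed)

# The two forms look the same on screen but compare unequal as raw strings.
equal_without_normalizing = composed == decomposed

# Normalizing to NFC composes the pair into the single code point.
equal_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
```

Normalizing both sides to the same form before comparing or measuring is one common way to avoid this class of mismatch.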

The emoji problem:

Modern emoji can be composed of multiple code points:

👨‍👩‍👧‍👦 (family emoji) is actually 7 code points: 4 person emoji joined by 3 invisible zero-width joiner (U+200D) characters
Different length functions report it as 1 (grapheme clusters), 7 (code points), or 11 (UTF-16 code units)

This isn't pedantry—these differences cause real bugs:

Truncating a message at 'character 10' might break an emoji
Splitting a string might split a character into invalid fragments
Character-by-character iteration might visit 'partial' characters

Know What Length You're Measuring

When working with Unicode strings, always clarify what 'length' means in your context. JavaScript's .length gives UTF-16 code units. Python 3's len() gives code points. Neither gives grapheme clusters (visible characters) by default. For user-facing operations (character limits, truncation), you often need grapheme-aware libraries.
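As a rough illustration, the different counts can be computed in Python with the standard library alone. The function name length_report is hypothetical. Grapheme counting is left out because it needs a third-party library.

```python
def length_report(s):
    """Measure one string three ways: UTF-8 bytes, UTF-16 code units, code points."""
    return {
        "utf8_bytes": len(s.encode("utf-8")),
        # Each UTF-16 code unit is 2 bytes; "utf-16-le" adds no byte-order mark.
        "utf16_code_units": len(s.encode("utf-16-le")) // 2,
        "code_points": len(s),
    }


# "héllo" followed by the waving hand emoji (U+1F44B), with é as one code point.
greeting = length_report("h\u00e9llo\U0001F44B")

# Family emoji: four person emoji joined by three zero-width joiners (U+200D).
family = length_report("\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466")
```

The UTF-16 figure matches what JavaScript's .length would report for the same text, and the code point figure is what Python's len() returns.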

Indexing: Accessing Characters by Position

Indexing is the operation of accessing a specific character by its position number. Given a string s and an index k, the expression s[k] retrieves the character at position k.

Index semantics:

String:   "ALGORITHM"
Indices:   0 1 2 3 4 5 6 7 8
           A L G O R I T H M

s[0] = 'A'  (first character)
s[4] = 'R'  (fifth character, zero-indexed)
s[8] = 'M'  (last character)

The valid index range:

For a string of length n:

Valid indices: 0, 1, 2, ..., n-1
Invalid indices: anything < 0 or >= n

Accessing an invalid index is an error. Different languages handle this differently:

Python/Java/C#: Throw an exception (IndexError, StringIndexOutOfBoundsException, IndexOutOfRangeException)
JavaScript: Return undefined (silent failure—dangerous!)
C/C++: Undefined behavior (anything can happen—very dangerous!)

Index Access: What Different Languages Do
Language	Valid Access	Out-of-Bounds Access
Python	s[k] → character	IndexError exception
Java	s.charAt(k) → char	StringIndexOutOfBoundsException
JavaScript	s[k] → string	undefined (silent!)
C	s[k] → char	Undefined behavior (dangerous!)
Rust	s.chars().nth(k)	None (explicit Option type)
Go	s[k] → byte	panic (runtime error)

Always Validate Indices

Never assume an index is valid. Before accessing s[k], confirm that 0 <= k < length(s). This is especially critical when indices come from user input, calculations, or other strings' lengths.

Negative Indexing: Counting from the End

Some languages (Python, Ruby, Perl) support negative indexing: a convenient way to access characters relative to the end of the string.

String:   "ALGORITHM"
Character:  A   L   G   O   R   I   T   H   M
Positive:   0   1   2   3   4   5   6   7   8
Negative:  -9  -8  -7  -6  -5  -4  -3  -2  -1

s[-1] = 'M'  (last character)
s[-2] = 'H'  (second-to-last)
s[-9] = 'A'  (first character, same as s[0])

The conversion formula:

For index i where i < 0:
  actual_index = length + i

Example: s[-1] where length = 9
  actual_index = 9 + (-1) = 8
  s[-1] ≡ s[8]

Why negative indexing is useful:

Access the last character: s[-1] instead of s[len(s)-1]
Traverse from end: s[-1], s[-2], s[-3], ...
Slice from end: s[-3:] gets the last 3 characters

Advantages of Negative Indexing

•Cleaner syntax for end-relative access
•Avoids repeated length calculations
•Reduces off-by-one error risk for 'last element'
•More readable intent: -1 clearly means 'last'

Cautions with Negative Indexing

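•Bracket syntax is not available in all languages (Java, JavaScript, C); JavaScript offers s.at(-1) instead
•Can mask bugs: a computed index that accidentally goes negative wraps around to the end instead of raising an error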
•Mixing positive and negative indices can confuse
•Empty string has no valid negative indices

Negative Indexing Bounds

For a string of length n, valid negative indices are -1 through -n. Index -n is the first character; index -(n+1) is out of bounds. The empty string has length 0, so no negative index is valid for it.
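The conversion formula and its bounds can be sketched as a small Python function. The name resolve_index is hypothetical; Python performs the equivalent check internally when you write s[i].

```python
def resolve_index(i, length):
    """Map a possibly negative index to a non-negative one, or raise IndexError."""
    actual = length + i if i < 0 else i
    if not 0 <= actual < length:
        raise IndexError(f"index {i} out of range for length {length}")
    return actual


s = "ALGORITHM"
last = s[resolve_index(-1, len(s))]    # same character as s[-1]
first = s[resolve_index(-9, len(s))]   # same character as s[0]

# -(n + 1) is one step too far, and the empty string accepts no index at all.
rejected = []
for index, length in [(-10, 9), (9, 9), (-1, 0), (0, 0)]:
    try:
        resolve_index(index, length)
    except IndexError:
        rejected.append((index, length))
```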

Slicing: Accessing a Range of Characters

Beyond single-character access, most languages support slicing: extracting a contiguous subsequence (substring) by specifying a range.

Common slice notation:

s[start:end]  → Characters from position start to end-1 (inclusive)
                (end is exclusive—the character at 'end' is NOT included)

String:   "ALGORITHM"
Indices:   0 1 2 3 4 5 6 7 8
           A L G O R I T H M

s[0:4]  = "ALGO"   (positions 0, 1, 2, 3)
s[2:6]  = "GORI"   (positions 2, 3, 4, 5)
s[4:9]  = "RITHM"  (positions 4, 5, 6, 7, 8)
s[3:3]  = ""       (empty range)

Why exclusive end?

The half-open interval [start, end) has elegant properties:

Length formula: length(s[start:end]) = end - start. No +1 or -1.
Non-overlapping splits: s[0:k] + s[k:n] = s[0:n]. No gaps, no overlaps.
Empty range: s[k:k] naturally yields empty string.
Full string: s[0:len(s)] gives the whole string.
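These four properties can be checked mechanically. The following Python sketch tests them for every valid start and end position of one sample string; the variable names are illustrative.

```python
s = "ALGORITHM"
n = len(s)

# Length formula: a slice [start, end) holds exactly end - start characters.
length_rule = all(
    len(s[start:end]) == end - start
    for start in range(n + 1)
    for end in range(start, n + 1)
)

# Non-overlapping splits: cutting at any k and rejoining restores the string.
split_rule = all(s[0:k] + s[k:n] == s for k in range(n + 1))

# Empty range: equal bounds always give the empty string.
empty_rule = all(s[k:k] == "" for k in range(n + 1))

# Full string: the bounds 0 and len(s) cover everything.
full_rule = s[0:len(s)] == s
```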

Slice Semantics Across Languages
Language	Syntax	End Inclusive?	Supports Negative?
Python	s[start:end]	No (exclusive)	Yes
JavaScript	s.slice(start, end)	No (exclusive)	Yes
Java	s.substring(start, end)	No (exclusive)	No
Ruby	s[start, length]	N/A (length-based)	Yes
Go	s[start:end]	No (exclusive)	No
C#	s.Substring(start, length)	N/A (length-based)	No

Slice with Omitted Bounds

Languages with slice syntax, such as Python and Go, allow omitting bounds: s[:k] = s[0:k] (first k chars), s[k:] = s[k:len(s)] (from k to end), s[:] = s[0:len(s)] (the entire string). This makes common operations concise.

Immutability: Strings That Cannot Change

One of the most important properties of strings in many languages is immutability: once a string is created, it cannot be modified.

What immutability means:

You cannot change individual characters: s[3] = 'X' is an error
You cannot insert, delete, or replace characters in-place
Every 'modification' operation creates a new string with the changes

Immutable by default:

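Java, Python, JavaScript, C#, Go

Mutable by default:

C (char arrays), C++ (std::string), Ruby (unless the string is frozen), Rust (String, when bound with mut)

Swift strings are value types: a string declared with let cannot change, while one declared with var can be modified in place.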

Example in Python:

s = "hello"
s[0] = 'H'    # TypeError: 'str' object does not support item assignment

# Instead, create a new string:
s = 'H' + s[1:]   # s is now "Hello", but this is a NEW string object

The Variable vs. The Value

Immutability refers to the string VALUE, not the variable. You can reassign a variable to point to a different string. What you can't do is change the characters within the string the variable currently points to. The string "hello" itself never becomes "Hello"—you create a new string "Hello" and point your variable there.

Visualizing immutability:

Before:
variable s ──────→ [Memory block: "hello"]

After s = 'H' + s[1:]:
variable s ──────→ [New memory block: "Hello"]
                   [Old memory block: "hello"] ← orphaned, will be garbage collected

The original string "hello" still exists (briefly) in memory—it wasn't modified. The variable s now points to a completely new string "Hello".
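A short Python sketch makes the rebinding visible by keeping a second name for the original string. The name alias is illustrative.

```python
s = "hello"
alias = s            # a second name for the same string object

s = "H" + s[1:]      # builds a new string and rebinds s to it

new_value = s
old_value = alias    # the original string was never modified
rebinding_created_new_object = s is not alias
```

Because alias still refers to the original object, the old string is not garbage collected in this example; it would be once no name refers to it.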

Why Languages Choose Immutability

Immutability might seem restrictive—why prevent modification? The reasons are compelling and explain why most modern languages default to immutable strings.

1. Thread Safety

If a string can't change, multiple threads can read it simultaneously without locks. No race conditions, no data corruption, no synchronization overhead. In concurrent programs, immutable strings are inherently safe.

2. Security

Sensitive strings (passwords, file paths, SQL queries) can't be modified maliciously after creation. A function receiving a string knows it won't change unexpectedly.

3. Caching and Hashing

Immutable strings can be safely cached. Their hash codes can be computed once and stored. HashMap keys work correctly because the key can't change after insertion.

4. Sharing Without Copying

Multiple variables can safely reference the same string—no need to copy defensively. a = "hello"; b = a doesn't copy the string; both point to the same memory.

5. Reasoning and Debugging

When strings can't change, you can reason about code more easily. If you see a string created at line 10, you know its content at line 100, provided the variable has not been reassigned in between.

Benefits of Immutable Strings

•Thread-safe by design — No locks needed for concurrent reads
•String interning is safe — Identical strings can share memory
•Reliable hash codes — Cache once, use forever
•Safe as dictionary keys — The key never changes after insertion
•No defensive copying needed — Pass strings freely without cloning
•Simpler mental model — A string is a fixed value, not a mutable container

The Cost of Immutability

Immutability isn't free. Every 'modification' creates a new string, which has performance implications.

The concatenation trap:

Consider building a string character by character:

result = ""
for char in some_list:  # Assume 1000 characters
    result = result + char  # Creates a new string each time!

What actually happens:

Iteration 1: Create new string of length 1
Iteration 2: Create new string of length 2 (copy 1, add 1)
Iteration 3: Create new string of length 3 (copy 2, add 1)
...
Iteration 1000: Create new string of length 1000 (copy 999, add 1)

Total characters copied: 1 + 2 + 3 + ... + 1000 = 500,500

This is O(n²) complexity for what should be an O(n) operation!
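The arithmetic above can be reproduced with a small counting model. This Python sketch does not measure a real runtime; it assumes that every concatenation writes the entire new string into fresh memory, and the function name is hypothetical.

```python
def chars_written_by_repeated_concat(n):
    """Count characters written when a string is rebuilt n times, growing by one each time."""
    total = 0
    length = 0
    for _ in range(n):
        length += 1       # the new string is one character longer
        total += length   # all of it is written into a new block
    return total


thousand = chars_written_by_repeated_concat(1000)
two_thousand = chars_written_by_repeated_concat(2000)
```

Doubling the input from 1000 to 2000 characters roughly quadruples the work in this model, which is the signature of quadratic growth.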

Never Build Strings with += in a Loop

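String concatenation in a loop is one of the most common performance anti-patterns. For n concatenations, you get O(n²) copying in the worst case. Some runtimes optimize specific cases (CPython can sometimes extend a string in place when no other reference to it exists), but that behavior is not guaranteed. Use a StringBuilder (Java), list + join (Python), or a similar buffered approach instead.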

The solution: String builders

Languages provide mutable string-like objects specifically for construction:

Language	Mutable Builder	Usage
Java	StringBuilder	sb.append(x) repeatedly, then sb.toString()
C#	StringBuilder	sb.Append(x) repeatedly, then sb.ToString()
Python	list + join	chars.append(x) repeatedly, then ''.join(chars)
JavaScript	Array + join	arr.push(x) repeatedly, then arr.join('')
Go	strings.Builder	builder.WriteString(x), then builder.String()

The builder accumulates content in a mutable buffer, then produces an immutable string at the end. This gives O(n) complexity.
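In Python, the builder idea can be sketched in two ways: with a list and join, or with io.StringIO from the standard library as an in-memory text buffer. Both function names below are hypothetical.

```python
import io


def build_with_join(pieces):
    """Collect pieces in a mutable list, then join them once at the end."""
    parts = []
    for piece in pieces:
        parts.append(piece)
    return "".join(parts)


def build_with_buffer(pieces):
    """Write pieces into an in-memory text buffer, then read the result once."""
    buffer = io.StringIO()
    for piece in pieces:
        buffer.write(piece)
    return buffer.getvalue()


pieces = ["al", "go", "rithm"]
joined = build_with_join(pieces)
buffered = build_with_buffer(pieces)
```

Either way, the final immutable string is produced once, after all the pieces have been accumulated.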

The lesson:

Immutability is a design trade-off, not a free lunch. It provides safety and simplicity at the cost of requiring explicit patterns for construction. Know the pattern for your language.

Mutable Strings: When Modification Is Allowed

Some languages and contexts use mutable strings that can be changed in place.

C-style strings (char arrays):

char s[] = "hello";  // s is a mutable array of characters
s[0] = 'H';          // Valid! s is now "Hello"

C++ std::string:

std::string s = "hello";
s[0] = 'H';           // Valid! s is now "Hello"
s.append(" world");   // Valid! s is now "Hello world"

Rust's String type:

let mut s = String::from("hello");
s.push_str(" world");  // Valid! s is now "hello world"

With mutable strings, modifications happen in place—no new memory allocation needed (unless the string grows beyond its current capacity).

Advantages of Mutability

•Direct modification without copying
•In-place algorithms use less memory
•No O(n²) concatenation trap
•Fine-grained control over memory

Risks of Mutability

•Thread safety requires explicit locks
•Aliasing bugs: two variables, one string, unexpected changes
•Cannot safely use as hash keys
•Defensive copying may be needed
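Python's str is immutable, so this sketch uses bytearray, a mutable byte sequence from the standard library, as a stand-in for a mutable string. It illustrates aliasing, defensive copying, and the hash-key restriction.

```python
original = bytearray(b"hello")
alias = original                    # no copy: both names refer to one buffer
alias[0] = ord("H")                 # the change is visible through both names

independent = bytearray(original)   # defensive copy with its own buffer
independent[0] = ord("J")           # does not affect the original

# Mutable sequences in Python refuse to be hashed, so they cannot be dict keys.
try:
    hash(original)
    hashable = True
except TypeError:
    hashable = False
```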

Choosing Between Models

Use immutable strings for data that shouldn't change (constants, identifiers, keys). Use mutable builders for constructing strings piece by piece. In languages that offer both (Rust: &str vs String), use immutable references for reading and mutable values for modification.

Practical Implications for Problem-Solving

Understanding length, indexing, and mutability directly impacts how you approach string problems. Here are patterns that emerge from this understanding:

Pattern 1: Boundary Checking

Always validate indices before access:

def safe_access(s, i):
    if 0 <= i < len(s):
        return s[i]
    return None  # or raise an error, or return default

Pattern 2: Building Strings Efficiently

# Bad: O(n²)
result = ""
for x in items:
    result += str(x)

# Good: O(n)
parts = [str(x) for x in items]
result = ''.join(parts)

Pattern 3: Using Indices for Two-Pointer Techniques

def is_palindrome(s):
    left, right = 0, len(s) - 1
    while left < right:
        if s[left] != s[right]:
            return False
        left += 1
        right -= 1
    return True

Operation Costs Based on String Properties
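Operation	Immutable String	Mutable String
Get length	O(1) when the length is stored	O(1) when stored; O(n) for C's null-terminated strings (strlen)
Access character at index k	O(1) by code unit	O(1) by code unit
Modify character at index k	O(n) - create new string	O(1) - in place
Concatenate n strings	O(n × total_length) with repeated +; O(total_length) with join or a builder	O(total_length) amortized
Sharing one string between variables	Safe (contents cannot change)	Risky (changes are visible through every alias)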

Algorithm Design Implications

In-place string algorithms (reverse, remove characters, etc.) naturally apply to mutable strings. With immutable strings, you often convert to a mutable form (list/array), perform the algorithm, then convert back. Factor this overhead into your complexity analysis.
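The convert, modify, convert-back pattern might look like this in Python. The function name reverse_string is hypothetical, and the two-pointer swap is used only to show an in-place algorithm; s[::-1] is the idiomatic shortcut.

```python
def reverse_string(s):
    """Reverse a string by swapping characters inside a mutable list."""
    chars = list(s)                   # O(n) copy into a mutable form
    left, right = 0, len(chars) - 1
    while left < right:
        chars[left], chars[right] = chars[right], chars[left]
        left += 1
        right -= 1
    return "".join(chars)             # O(n) copy back to an immutable string


reversed_word = reverse_string("ALGORITHM")
```

The swaps themselves need no extra memory, but the two conversions each cost O(n) time and space, so the overall algorithm is O(n) rather than O(1) in space.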

Summary: Mastering String Mechanics

We've explored the three pillars of working with strings: length, indexing, and mutability. Let's consolidate the key insights:

Key Takeaways

•Length is not always straightforward — Bytes, code units, and graphemes can all differ. Know what you're counting.
•Valid indices range from 0 to length-1 — Accessing index = length is out of bounds. Always check boundaries.
•Negative indexing (where available) counts from the end — s[-1] is the last character, s[-k] is the k-th from end.
•Slicing uses half-open intervals — s[start:end] includes start, excludes end. Length = end - start.
•Most modern languages have immutable strings — 'Modification' creates new strings; the original is unchanged.
•Immutability provides safety at a cost — Thread safety, caching, sharing are free; building strings requires patterns.
•The concatenation trap is O(n²) — Use string builders or list+join for efficient construction.
•Understanding these concepts enables algorithm design — Two-pointer techniques, in-place algorithms, and efficient builders all depend on this knowledge.

Module complete:

You now understand how strings are represented from three perspectives: the logical view (ordered sequence), the physical view (contiguous memory), and the practical view (length, indexing, mutability). This comprehensive understanding prepares you to work with strings confidently in any language and to analyze the complexity of string algorithms systematically.

Module Complete

You've mastered the conceptual representation of strings. You understand not just what strings are, but how they work at logical, physical, and practical levels. This foundation will support everything from basic string manipulation to advanced pattern matching algorithms.
