Having established what strings are logically (ordered sequences of characters) and how they're stored physically (contiguously in memory), we now explore the practical concepts that determine how you actually work with strings in code.
Three concepts form the foundation of all string manipulation: length, indexing, and mutability.
These aren't just implementation details—they're fundamental properties that affect how you design algorithms, predict performance, and avoid bugs. Master these concepts, and you'll write better string code in any language.
By the end of this page, you'll deeply understand string length semantics, master the nuances of indexing (including boundary conditions), and grasp the profound implications of immutability vs mutability for performance and correctness.
The length of a string is the number of characters it contains. This seems utterly simple—and for ASCII strings, it is. But as we'll see, 'length' has surprising depth.
Basic properties of length:
- "" has length 0
- "A" has length 1
The fundamental length equation:
For a string with indices 0 through n-1, the length is n. Equivalently:
length(s) = highest_valid_index(s) + 1
Or:
valid_indices(s) = {0, 1, 2, ..., length(s) - 1}
This relationship between length and valid indices is critical for loop bounds and boundary checking.
| String | Length | Valid Indices | Highest Index |
|---|---|---|---|
| "" (empty) | 0 | None | N/A |
| "A" | 1 | 0 | 0 |
| "Hi" | 2 | 0, 1 | 1 |
| "Hello" | 5 | 0, 1, 2, 3, 4 | 4 |
| "Algorithm" | 9 | 0 through 8 | 8 |
The last valid index is always length - 1, not length. Accessing index = length is an out-of-bounds error. This is the most common string bug: for i = 0 to length (inclusive) instead of for i = 0 to length - 1 (inclusive) or for i = 0 to length (exclusive).
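As a minimal sketch of the loop bounds described above (Python is assumed here, and the names `visited`, `last_char`, and `out_of_bounds_raised` are illustrative only), a half-open range visits every valid index and stops before `len(s)`:

```python
s = "Hello"

# range(len(s)) yields 0, 1, ..., len(s) - 1: exactly the valid indices.
visited = list(range(len(s)))

# The highest valid index is len(s) - 1.
last_char = s[len(s) - 1]

# Index len(s) is one past the end; Python raises IndexError.
try:
    s[len(s)]
    out_of_bounds_raised = False
except IndexError:
    out_of_bounds_raised = True
```

Other languages report the same mistake differently, but the bound itself (stop before `length`) is the same everywhere indices start at 0.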
For ASCII strings, length is straightforward: one byte = one character = one unit of length. But modern strings introduce complexity.
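The three meanings of 'length':
- Byte length: the number of bytes the string occupies in a given encoding (such as UTF-8)
- Code unit length: the number of fixed-size units in the encoding (such as 16-bit units in UTF-16)
- Grapheme length: the number of characters a reader actually sees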
These can all be different numbers for the same string:
String: "héllo👋"
Byte length (UTF-8): 10 bytes
h(1) + é(2) + l(1) + l(1) + o(1) + 👋(4) = 10
Code unit length (UTF-16): 7 units
h(1) + é(1) + l(1) + l(1) + o(1) + 👋(2) = 7
Grapheme/visual length: 6 characters
h + é + l + l + o + 👋 = 6 visible things
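The counts above can be reproduced with a short sketch, assuming Python 3 and a precomposed é (U+00E9). The variable names are illustrative. Python's standard library has no grapheme counter, so the last value is a code point count, which happens to equal the visible count here because every visible character in this string is a single code point.

```python
s = "h\u00e9llo\U0001F44B"  # "héllo👋" with a precomposed é

# Byte length: encode to UTF-8 and count bytes.
byte_length = len(s.encode("utf-8"))

# Code unit length: UTF-16 uses 2-byte units ("-le" avoids a byte-order mark).
code_unit_length = len(s.encode("utf-16-le")) // 2

# Code point length: what len() reports for a Python 3 str.
code_point_length = len(s)
```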
The combining character problem:
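Some 'characters' are actually composed of multiple code points. The letter 'é' can be represented as:
- A single precomposed code point: U+00E9 (LATIN SMALL LETTER E WITH ACUTE)
- Two code points: U+0065 (LATIN SMALL LETTER E) followed by U+0301 (COMBINING ACUTE ACCENT)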
Both render identically, but one has 'length' 1 and the other has 'length' 2 depending on how you count.
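A small sketch of this effect, assuming Python 3 and assuming the two representations in question are the precomposed U+00E9 and 'e' followed by the combining accent U+0301. Normalizing with the standard `unicodedata` module is one common way to make the two forms comparable; the variable names are illustrative.

```python
import unicodedata

composed = "\u00e9"      # é as one code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

# The strings look the same on screen but are not equal as sequences.
equal_as_is = composed == decomposed

# NFC composes where possible; NFD decomposes where possible.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)
```

Normalizing both sides to the same form before comparing or measuring is usually safer than assuming which form the input uses.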
The emoji problem:
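Modern emoji can be composed of multiple code points:
- 👨‍👩‍👧 (family) is three emoji joined by two zero-width joiners (U+200D): 5 code points
- 👍🏽 is a thumbs-up (U+1F44D) followed by a skin tone modifier (U+1F3FD): 2 code points
- Flag emoji such as 🇺🇸 are pairs of regional indicator symbols: 2 code points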
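To make this concrete, here is a hedged sketch in Python 3 that measures two composed emoji. The particular sequences (a family joined with zero-width joiners and a thumbs-up with a skin tone modifier) are chosen for illustration. Each one renders as a single visible symbol.

```python
# Man + ZWJ + woman + ZWJ + girl: renders as one family emoji.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

# Thumbs-up + medium skin tone modifier: renders as one emoji.
thumbs = "\U0001F44D\U0001F3FD"

family_code_points = len(family)
family_utf16_units = len(family.encode("utf-16-le")) // 2
family_utf8_bytes = len(family.encode("utf-8"))
thumbs_code_points = len(thumbs)
```

One visible symbol therefore reports a length of 5, 8, or 18 for the family emoji depending on which unit is counted.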
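This isn't pedantry. These differences cause real bugs, for example when code enforces a character limit or truncates text and ends up miscounting or splitting a visible character.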
When working with Unicode strings, always clarify what 'length' means in your context. JavaScript's .length gives UTF-16 code units. Python 3's len() gives code points. Neither gives grapheme clusters (visible characters) by default. For user-facing operations (character limits, truncation), you often need grapheme-aware libraries.
Indexing is the operation of accessing a specific character by its position number. Given a string s and an index k, the expression s[k] retrieves the character at position k.
Index semantics:
String:  "ALGORITHM"
Indices: 0 1 2 3 4 5 6 7 8
         A L G O R I T H M
s[0] = 'A' (first character)
s[4] = 'R' (fifth character, zero-indexed)
s[8] = 'M' (last character)
The valid index range:
For a string of length n:
- Valid indices: 0 through n-1
- Invalid indices: any k >= n, and any k < 0 in languages without negative indexing
Accessing an invalid index is an error. Different languages handle this differently:
| Language | Valid Access | Out-of-Bounds Access |
|---|---|---|
| Python | s[k] → character | IndexError exception |
| Java | s.charAt(k) → char | StringIndexOutOfBoundsException |
| JavaScript | s[k] → string | undefined (silent!) |
| C | s[k] → char | Undefined behavior (dangerous!) |
| Rust | s.chars().nth(k) | None (explicit Option type) |
| Go | s[k] → byte | panic (runtime error) |
Never assume an index is valid. Before accessing s[k], confirm that 0 <= k < length(s). This is especially critical when indices come from user input, calculations, or other strings' lengths.
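One way a computed index goes wrong silently, sketched in Python: `str.find` returns -1 when nothing matches, and -1 is a valid (negative) index in Python, so no error is raised. The helper `char_after` below is a hypothetical function written for this illustration, not a library call.

```python
s = "key=value"

# find() returns -1 for "not found"; s[-1] is then the LAST character.
silent_result = s[s.find(":")]

def char_after(text, marker):
    """Return the character right after the first `marker`, or None."""
    pos = text.find(marker)
    if pos == -1:
        return None                      # marker absent: no valid index
    k = pos + len(marker)
    if not 0 <= k < len(text):
        return None                      # marker is at the very end
    return text[k]
```

Checking both the sentinel and the range keeps the function from returning an unrelated character.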
Some languages (Python, Ruby, Perl) support negative indexing: a convenient way to access characters relative to the end of the string.
String:   "ALGORITHM"
Positive:  0  1  2  3  4  5  6  7  8
Negative: -9 -8 -7 -6 -5 -4 -3 -2 -1
           A  L  G  O  R  I  T  H  M
s[-1] = 'M' (last character)
s[-2] = 'H' (second-to-last)
s[-9] = 'A' (first character, same as s[0])
The conversion formula:
For index i where i < 0:
actual_index = length + i
Example: s[-1] where length = 9
actual_index = 9 + (-1) = 8
s[-1] ≡ s[8]
Why negative indexing is useful:
- Accessing the last character: s[-1] instead of s[len(s)-1]
- Walking backward from the end: s[-1], s[-2], s[-3], ...
- Slicing from the end: s[-3:] gets the last 3 characters
For a string of length n, valid negative indices are -1 through -n. Index -n is the first character; index -(n+1) is out of bounds. The empty string has length 0, so no negative index is valid for it.
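The conversion formula and the valid range can be combined in a small sketch. `resolve_index` is a hypothetical helper written for this illustration (Python performs this conversion internally); it maps a possibly negative index to a position in 0..n-1 or raises.

```python
def resolve_index(i, n):
    """Map index i (possibly negative) to a position in 0..n-1 for length n."""
    actual = n + i if i < 0 else i       # actual_index = length + i when i < 0
    if not 0 <= actual < n:
        raise IndexError(f"index {i} out of range for length {n}")
    return actual
```

For example, `resolve_index(-1, 9)` gives 8 and `resolve_index(-9, 9)` gives 0, while `resolve_index(-10, 9)` and any index into an empty string raise `IndexError`.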
Beyond single-character access, most languages support slicing: extracting a contiguous subsequence (substring) by specifying a range.
Common slice notation:
s[start:end] → Characters from position start to end-1 (inclusive)
(end is exclusive—the character at 'end' is NOT included)
String:  "ALGORITHM"
Indices: 0 1 2 3 4 5 6 7 8
         A L G O R I T H M
s[0:4] = "ALGO" (positions 0, 1, 2, 3)
s[2:6] = "GORI" (positions 2, 3, 4, 5)
s[4:9] = "RITHM" (positions 4, 5, 6, 7, 8)
s[3:3] = "" (empty range)
Why exclusive end?
The half-open interval [start, end) has elegant properties:
- length(s[start:end]) = end - start. No +1 or -1.
- s[0:k] + s[k:n] = s[0:n]. No gaps, no overlaps.
- s[k:k] naturally yields the empty string.
- s[0:len(s)] gives the whole string.
| Language | Syntax | End Inclusive? | Supports Negative? |
|---|---|---|---|
| Python | s[start:end] | No (exclusive) | Yes |
| JavaScript | s.slice(start, end) | No (exclusive) | Yes |
| Java | s.substring(start, end) | No (exclusive) | No |
| Ruby | s[start, length] | N/A (length-based) | Yes |
| Go | s[start:end] | No (exclusive) | No |
| C# | s.Substring(start, length) | N/A (length-based) | No |
Many languages allow omitting bounds: s[:k] = s[0:k] (first k chars), s[k:] = s[k:len(s)] (from k to end), s[:] = s[0:len(s)] (copy of entire string). This makes common operations concise.
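The half-open interval properties can be checked mechanically. The following is a sketch in Python using the same "ALGORITHM" string; the variable names are illustrative.

```python
s = "ALGORITHM"
n = len(s)

# Property 1: the length of a slice is end - start.
lengths_match = all(
    len(s[start:end]) == end - start
    for start in range(n + 1)
    for end in range(start, n + 1)
)

# Property 2: splitting at any k and rejoining loses nothing.
splits_rejoin = all(s[:k] + s[k:] == s for k in range(n + 1))

# Omitted bounds default to the start and end of the string.
first_four = s[:4]    # same as s[0:4]
from_four = s[4:]     # same as s[4:len(s)]
whole = s[:]          # same as s[0:len(s)]
```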
One of the most important properties of strings in many languages is immutability: once a string is created, it cannot be modified.
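What immutability means:
- In-place modification such as s[3] = 'X' is an error
- Operations that appear to modify a string actually produce a new string
Immutable by default (among the languages discussed on this page): Python, Java, JavaScript, C#, Go
Mutable by default (among the languages discussed on this page): C (char arrays), C++ (std::string); Rust's String is mutable when declared with mut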
Example in Python:
s = "hello"
s[0] = 'H' # TypeError: 'str' object does not support item assignment
# Instead, create a new string:
s = 'H' + s[1:] # s is now "Hello", but this is a NEW string object
Immutability refers to the string VALUE, not the variable. You can reassign a variable to point to a different string. What you can't do is change the characters within the string the variable currently points to. The string "hello" itself never becomes "Hello"—you create a new string "Hello" and point your variable there.
Visualizing immutability:
Before:
variable s ──────→ [Memory block: "hello"]
After s = 'H' + s[1:]:
variable s ──────→ [New memory block: "Hello"]
[Old memory block: "hello"] ← orphaned, will be garbage collected
The original string "hello" still exists (briefly) in memory—it wasn't modified. The variable s now points to a completely new string "Hello".
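The picture above can be observed directly in Python by keeping a second reference to the original string. This is a sketch; `alias`, `assignment_failed`, and `same_object` are illustrative names.

```python
s = "hello"
alias = s                    # a second reference to the same string object

# In-place modification is rejected.
try:
    s[0] = "H"
    assignment_failed = False
except TypeError:
    assignment_failed = True

# "Modifying" means building a new string and rebinding the variable.
s = "H" + s[1:]

# alias still refers to the original, unchanged string.
same_object = s is alias
```

Because `alias` keeps the old string alive, it is not collected here; it would become collectable only once no reference to it remains.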
Immutability might seem restrictive—why prevent modification? The reasons are compelling and explain why most modern languages default to immutable strings.
1. Thread Safety
If a string can't change, multiple threads can read it simultaneously without locks. No race conditions, no data corruption, no synchronization overhead. In concurrent programs, immutable strings are inherently safe.
2. Security
Sensitive strings (passwords, file paths, SQL queries) can't be modified maliciously after creation. A function receiving a string knows it won't change unexpectedly.
3. Caching and Hashing
Immutable strings can be safely cached. Their hash codes can be computed once and stored. HashMap keys work correctly because the key can't change after insertion.
4. Sharing Without Copying
Multiple variables can safely reference the same string—no need to copy defensively. a = "hello"; b = a doesn't copy the string; both point to the same memory.
5. Reasoning and Debugging
When strings can't change, you can reason about code more easily. If a string is created at line 10, its content is guaranteed to be the same at line 100, as long as the variable has not been reassigned to a different string.
Immutability isn't free. Every 'modification' creates a new string, which has performance implications.
The concatenation trap:
Consider building a string character by character:
result = ""
for char in some_list:  # Assume 1000 characters
    result = result + char  # Creates a new string each time!
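What actually happens:
- Iteration 1: allocate a new string and copy 1 character
- Iteration 2: allocate a new string and copy 2 characters
- Iteration k: allocate a new string and copy k characters
- Iteration 1000: allocate a new string and copy 1000 characters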
Total characters copied: 1 + 2 + 3 + ... + 1000 = 500,500
This is O(n²) complexity for what should be an O(n) operation!
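The arithmetic can be checked with a simple cost model. The sketch below assumes every concatenation copies the entire new string, which is the model used above; real runtimes may optimize some cases, so treat this as a count under that assumption rather than a measurement. The function name is hypothetical.

```python
def chars_copied_by_repeated_concat(n):
    """Characters copied when appending one character n times,
    assuming each step copies the whole resulting string."""
    total = 0
    length = 0
    for _ in range(n):
        length += 1        # the new string is one character longer
        total += length    # and all of it is written to fresh memory
    return total

copied_1000 = chars_copied_by_repeated_concat(1000)
```

The result matches the closed form n(n + 1) / 2, which grows quadratically in n.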
String concatenation in a loop is one of the most common performance anti-patterns. For n concatenations, you get O(n²) copying. Use a StringBuilder (Java), list + join (Python), or similar buffered approach instead.
The solution: String builders
Languages provide mutable string-like objects specifically for construction:
| Language | Mutable Builder | Usage |
|---|---|---|
| Java | StringBuilder | sb.append(x) repeatedly, then sb.toString() |
| C# | StringBuilder | Same as Java |
| Python | list + join | chars.append(x) repeatedly, then ''.join(chars) |
| JavaScript | Array + join | arr.push(x) repeatedly, then arr.join('') |
| Go | strings.Builder | builder.WriteString(x), then builder.String() |
The builder accumulates content in a mutable buffer, then produces an immutable string at the end. This gives O(n) complexity.
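In Python, besides list + join, the standard library's `io.StringIO` can play the role of a mutable text buffer. The sketch below is one possible way to use it; `build_with_buffer` is an illustrative name, not a standard function.

```python
import io

def build_with_buffer(pieces):
    """Accumulate pieces in a mutable in-memory buffer, then produce one str."""
    buf = io.StringIO()
    for piece in pieces:
        buf.write(piece)       # appends to the buffer, no new str per step
    return buf.getvalue()      # one immutable string at the end

greeting = build_with_buffer(["Hello", ", ", "world"])
```

Either approach follows the same shape: mutate a buffer during construction, then convert to an immutable string once.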
The lesson:
Immutability is a design trade-off, not a free lunch. It provides safety and simplicity at the cost of requiring explicit patterns for construction. Know the pattern for your language.
Some languages and contexts use mutable strings that can be changed in place.
C-style strings (char arrays):
char s[] = "hello"; // s is a mutable array of characters
s[0] = 'H'; // Valid! s is now "Hello"
C++ std::string:
std::string s = "hello";
s[0] = 'H'; // Valid! s is now "Hello"
s.append(" world"); // Valid! s is now "Hello world"
Rust's String type:
let mut s = String::from("hello");
s.push_str(" world"); // Valid! s is now "hello world"
With mutable strings, modifications happen in place—no new memory allocation needed (unless the string grows beyond its current capacity).
Use immutable strings for data that shouldn't change (constants, identifiers, keys). Use mutable builders for constructing strings piece by piece. In languages that offer both (Rust: &str vs String), use immutable references for reading and mutable values for modification.
Understanding length, indexing, and mutability directly impacts how you approach string problems. Here are patterns that emerge from this understanding:
Pattern 1: Boundary Checking
Always validate indices before access:
def safe_access(s, i):
    if 0 <= i < len(s):
        return s[i]
    return None  # or raise an error, or return default
Pattern 2: Building Strings Efficiently
# Bad: O(n²)
result = ""
for x in items:
    result += str(x)
# Good: O(n)
parts = [str(x) for x in items]
result = ''.join(parts)
Pattern 3: Using Indices for Two-Pointer Techniques
def is_palindrome(s):
    left, right = 0, len(s) - 1
    while left < right:
        if s[left] != s[right]:
            return False
        left += 1
        right -= 1
    return True
| Operation | Immutable String | Mutable String |
|---|---|---|
| Get length | O(1) | O(1) when the length is stored; O(n) for null-terminated C strings |
| Access character at index k | O(1) | O(1) |
| Modify character at index k | O(n) - create new string | O(1) - in place |
| Concatenate n strings | O(n² × avg_length) with repeated +; O(total_length) with a builder or join | O(total_length) with in-place append (amortized) |
| Check if two variables share data | Often yes (safe) | Must check carefully |
In-place string algorithms (reverse, remove characters, etc.) naturally apply to mutable strings. With immutable strings, you often convert to a mutable form (list/array), perform the algorithm, then convert back. Factor this overhead into your complexity analysis.
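As a sketch of that convert, mutate, convert-back pattern in Python (the function name `reverse_string` is illustrative), an in-place two-pointer reversal runs on a list and the result is joined back into a string:

```python
def reverse_string(s):
    """Reverse s by swapping in a mutable list, then rebuilding a str."""
    chars = list(s)                      # O(n): copy into a mutable list
    left, right = 0, len(chars) - 1
    while left < right:                  # O(n): in-place swaps
        chars[left], chars[right] = chars[right], chars[left]
        left += 1
        right -= 1
    return "".join(chars)                # O(n): convert back to a str
```

The two conversions add O(n) time and O(n) extra space, so the algorithm is in-place only with respect to the intermediate list, not the original string.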
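We've explored the three pillars of working with strings: length, indexing, and mutability. Let's consolidate the key insights:
- Length: valid indices run from 0 to length - 1, and for Unicode text 'length' may mean bytes, code units, code points, or grapheme clusters.
- Indexing: s[k] requires 0 <= k < length(s), out-of-bounds behavior varies by language, and slices use half-open ranges [start, end).
- Mutability: immutable strings are safe to share and hash, but repeated concatenation costs O(n²), so use a builder or join when constructing strings.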
Module complete:
You now understand how strings are represented from three perspectives: the logical view (ordered sequence), the physical view (contiguous memory), and the practical view (length, indexing, mutability). This comprehensive understanding prepares you to work with strings confidently in any language and to analyze the complexity of string algorithms systematically.
You've mastered the conceptual representation of strings. You understand not just what strings are, but how they work at logical, physical, and practical levels. This foundation will support everything from basic string manipulation to advanced pattern matching algorithms.