Data Structures & AlgorithmsHow Strings Are Represented

How Strings Are Represented (Conceptual View)

LevelBeginner

Duration45 mins

TopicHow Strings Are Represented

1 / 3

Logical View of a String as an Ordered Sequence

Strings: More Than Just Text

When you type your name, send a message, or search for something online, you're working with strings. But what exactly is a string? Most developers immediately think of strings as just 'text'—a sequence of letters displayed on a screen. While this intuition captures part of the truth, it misses the profound mathematical structure that makes strings one of the most important and ubiquitous data structures in all of computer science.

In this page, we'll develop a rigorous conceptual model of strings that transcends any particular programming language. By understanding strings as ordered sequences of characters, you'll gain insight into why certain operations are possible, why others are expensive, and how to reason about string-based problems systematically.

What You Will Learn

By the end of this page, you will understand the mathematical definition of a string, the concept of ordering and position, the empty string as a boundary case, and how this logical view enables us to think about string operations before ever writing code.

The Mathematical Definition of a String

Before we can reason precisely about strings, we need a formal definition that captures their essential nature. This definition must be independent of any programming language, memory model, or hardware architecture.

Definition: A string is a finite, ordered sequence of symbols drawn from a fixed set called an alphabet.

Let's unpack each component of this definition:

Finite: A string has a definite, countable length. It begins somewhere and ends somewhere. There's no such thing as an infinite string in practical computing.
Ordered: The position of each symbol matters. The string "abc" is fundamentally different from "cab" or "bac", even though they contain the same symbols.
Sequence: Symbols appear one after another in a line. Unlike a set (which has no order) or a bag (which allows duplicates but has no order), a sequence preserves both duplication and position.
Symbols from an alphabet: Every symbol in a string comes from a predefined collection. In computing, this is typically the set of characters a system can represent—ASCII characters, Unicode code points, or some subset thereof.

Why Formal Definitions Matter

This mathematical formalism isn't academic pedantry. It's the foundation that allows us to prove properties about string algorithms, understand their complexity, and transfer knowledge across languages. When you know what a string is, you can reason about what operations should be possible.

Example:

Consider the string "hello". Under our formal definition:

It is finite: exactly 5 symbols long
It is ordered: 'h' comes before 'e', which comes before 'l', and so on
It is a sequence: the symbol 'l' appears twice, at positions 3 and 4
Alphabet: each symbol belongs to the English lowercase alphabet subset

This same formal structure applies whether you're dealing with "hello" in Python, "hello" in Java, or "hello" in Haskell. The underlying abstraction is universal.

Order: The Defining Property of Sequences

The property of order is what fundamentally distinguishes sequences (and hence strings) from other collections. This ordering has profound implications for both the meaning of strings and the operations we can perform on them.

Order creates identity:

In a set, {a, b, c} and {c, b, a} are identical—sets don't care about order. But the strings "abc" and "cba" are completely different entities. They may contain the same characters, but they represent different sequences of symbols.

This matters because text has inherent linear structure. The word "stressed" is meaningfully different from "desserts" (its reverse), even though they're anagrams. The order of characters carries semantic information.

Sets vs Sequences: A Fundamental Distinction
Property	Set	Sequence (String)
Order matters	No	Yes
Duplicates allowed	No	Yes
{a, a, b} vs {a, b}	Identical	Different ("aab" ≠ "ab")
{a, b, c} vs {c, b, a}	Identical	Different ("abc" ≠ "cba")
Position concept	Not applicable	Each element has a position
Access pattern	Membership test	Index-based retrieval

Order enables position-based operations:

Because strings are ordered, every character occupies a specific position (also called an index). This unlocks a rich set of operations:

Access the character at position k
Find the position where a character first appears
Extract a contiguous subsequence (substring) from position i to j
Traverse characters in sequence from start to end (or reverse)

Without the concept of order, none of these operations would be meaningful. You couldn't ask "what's the third character?" in a set—there is no 'third' in a set.

Thinking in Positions

When you think about a string, visualize it as a row of boxes, each containing one character and labeled with a position number. This mental image—sometimes called the "array mental model"—is extraordinarily useful even before you learn about arrays as a data structure.

Visualizing Strings: The Box Model

The most powerful mental model for understanding strings is to visualize them as a linear sequence of boxes, where each box contains exactly one character and is labeled with a position number.

Consider the string "ALGORITHM":

Position:  0    1    2    3    4    5    6    7    8
         ┌────┬────┬────┬────┬────┬────┬────┬────┬────┐
         │ A  │ L  │ G  │ O  │ R  │ I  │ T  │ H  │ M  │
         └────┴────┴────┴────┴────┴────┴────┴────┴────┘

This visualization immediately reveals several important properties:

Properties Visible in the Box Model

•Contiguity: Boxes are adjacent with no gaps. Character at position 3 is immediately before character at position 4.
•Bounded extent: The sequence starts at position 0 and ends at position 8. There's no position 9 or position -1.
•Unique positions: Each position holds exactly one character. No position is empty, and no position holds two characters.
•Zero-based indexing: Following computer science convention, we number positions starting from 0, not 1. (Some languages use 1-based indexing, but 0-based is more common.)
•Length relationship: If positions run from 0 to n-1, the string has length n. Here, positions 0-8 means length 9.

Why zero-based indexing?

Zero-based indexing might seem unnatural initially—we don't count "zeroth" in everyday life. But it has deep mathematical and computational advantages:

Offset interpretation: Position k represents how far you've moved from the start. Position 0 = no movement (you're at the start). Position 5 = you've moved past 5 characters.
Simpler arithmetic: With zero-based indexing, the formula for finding the character at position k is simply: address = start + k × size. No need to subtract 1.
Consistent with memory addressing: Computer memory naturally works this way—the address of an element is its offset from a base address.
Half-open intervals: Ranges like [0, n) naturally describe a sequence of length n without any "+1" or "-1" corrections.

Off-by-One Errors: The String Trap

The most common bug category in string manipulation is the off-by-one error. It stems from confusion about whether indices start at 0 or 1, whether end indices are inclusive or exclusive, and what happens at boundaries. Master the box model now to avoid these bugs later.

The Empty String: A Critical Edge Case

One of the most important—and often overlooked—concepts in string theory is the empty string: a string containing zero characters.

Definition: The empty string, denoted as "" (two quotation marks with nothing between them), or sometimes as ε (epsilon) in formal language theory, is the unique string of length 0.

The empty string isn't merely an edge case to handle grudgingly—it's a fundamental mathematical object with important properties:

Identity for concatenation:

Just as 0 is the identity for addition (n + 0 = n) and 1 is the identity for multiplication (n × 1 = n), the empty string is the identity for string concatenation:

"hello" + "" = "hello"
"" + "world" = "world"
"" + "" = ""

Concatenating the empty string doesn't change anything.

Properties of the Empty String

•Length is 0: By definition, length("") = 0. This is the only string with this property.
•No valid indices: You cannot access position 0, 1, or any index—there are no characters to access.
•Contained in every string: The empty string is a substring of every string, including itself.
•Unique: There's exactly one empty string, just as there's exactly one empty set.
•Not the same as null/nil: The empty string is a valid string that happens to have no characters. Null/nil typically means 'no string exists at all'—a crucial distinction.

Empty String ≠ Null

One of the most common bugs in string handling is confusing the empty string with null/nil/None. The empty string "" is a legitimate string with zero length. Null typically indicates the absence of any string value. Many runtime errors occur when code expects a string but receives null, or treats null as equivalent to "".

Why the empty string matters in practice:

Input validation: User input might be empty. If your code assumes at least one character exists, it will fail.
Split operations: Splitting a string can produce empty strings. For example, splitting "a,,b" by comma gives ["a", "", "b"].
Recursive algorithms: Many string algorithms have the empty string as their base case. It's what remains when you've consumed all characters.
Concatenation building: When building a string piece by piece, you typically start with the empty string and append to it.

Professional engineers always consider the empty string when designing string-handling code. It's not an afterthought—it's part of the design.

Strings vs. Characters: One vs. Many

A recurring source of confusion is the relationship between strings and characters. Let's clarify this distinction precisely.

A character is a single, atomic symbol from an alphabet. It cannot be broken down further (at the logical level). Examples: 'A', '7', '$', '😀'.

A string is a sequence of zero or more characters. It's a composite entity that may contain any number of characters.

Critically: A single-character string is not the same as a character.

The string "A" (a sequence containing one character) and the character 'A' (the character itself) are conceptually different:

"A" has type string and length 1
'A' has type character and the concept of length doesn't apply

Many programming languages enforce this distinction syntactically (using different quote marks), while others blur it for convenience.

Character vs String: A Detailed Comparison
Aspect	Character	String
Definition	Single atomic symbol	Sequence of symbols
Length	Not applicable (atomic)	0, 1, 2, ... n characters
Example	'A', 'b', '3', '!'	"hello", "A", ""
Can be empty?	No—a character is a unit	Yes—the empty string ""
Indexing	Not applicable	Yes—access character at position k
Typical syntax	Single quotes: 'x'	Double quotes: "x"
Relationship	Building block	Collection of building blocks

The containment relationship:

Characters compose strings. A string is made of characters in the same way that a word is made of letters. You can:

Extract a character from a string (given a position)
Construct a string from characters (given a sequence of characters)
Test whether a character appears within a string

This relationship is fundamental to understanding string operations. When we 'traverse' a string, we're visiting each character in sequence. When we 'compare' strings, we're comparing characters position by position.

Mental Model: The Letter Analogy

Think of characters as individual tiles in a word game (like Scrabble). Each tile has a single letter. A string is like a word on the board—a sequence of tiles placed in order. The tiles (characters) are the building blocks; the word (string) is the structured collection.

Substrings and Subsequences: Parts of a Whole

Because strings are ordered sequences, we can meaningfully extract portions of them. Two related but distinct concepts are substrings and subsequences.

Substring:

A substring is a contiguous portion of a string. Every character in a substring appears consecutively in the original string.

For the string "ALGORITHM":

"ALGO" is a substring (positions 0-3)
"RITH" is a substring (positions 4-7)
"ALGORITHM" is a substring of itself (the whole string)
"" (empty string) is a substring (trivially)
"ARM" is NOT a substring (characters aren't contiguous)

Subsequence:

A subsequence is obtained by deleting some (possibly zero) characters from a string without changing the order of remaining characters. Characters need not be contiguous.

For the string "ALGORITHM":

"ARM" is a subsequence (A at 0, R at 4, M at 8)
"AGT" is a subsequence (A at 0, G at 2, T at 6)
"ALGORITHM" is a subsequence of itself
"ANY" is NOT a subsequence (Y doesn't appear in the string)
"HMT" is NOT a subsequence (this ordering doesn't match the original)

Substring Properties

•Must be contiguous characters
•Defined by start and end positions
•A string of length n has n(n+1)/2 + 1 substrings
•Every substring is also a subsequence
•Easy to locate: just specify the range

Subsequence Properties

•Characters need not be adjacent
•Defined by selecting a subset of positions
•A string of length n has 2ⁿ subsequences
•Not every subsequence is a substring
•Harder to search: exponentially many options

Why This Distinction Matters

Many problems distinguish between substrings and subsequences. Finding the longest common substring (contiguous match) is fundamentally different from finding the longest common subsequence (matching characters that may be spread out). The algorithmic approaches, complexities, and applications differ significantly.

String Equality: When Are Two Strings the Same?

Two strings are equal if and only if they contain exactly the same characters in exactly the same positions.

Formal definition: Strings A and B are equal iff:

length(A) = length(B), AND
For every position i from 0 to length-1: A[i] = B[i]

This definition has several important implications:

Length must match first: If lengths differ, strings cannot be equal—no need to compare characters.
Position-by-position comparison: Characters at corresponding positions must match.
Character equality depends on the character set: Does 'A' equal 'a'? That depends on whether comparison is case-sensitive.
Order of comparison matters for efficiency: Comparing from the start lets us fail fast on early mismatches.

Examples of string equality:

String A	String B	Equal?	Reason
"hello"	"hello"	Yes	Identical characters at all positions
"hello"	"Hello"	Depends	Case-sensitive: No. Case-insensitive: Yes
"hello"	"hello "	No	Different lengths (trailing space)
"hello"	"helo"	No	Different lengths
""	""	Yes	Both are the empty string
"hello"	"olleh"	No	Same characters, different positions

Hidden Characters Can Break Equality

Strings that look identical may not be equal due to invisible characters: trailing spaces, different Unicode representations of the 'same' character (e.g., é as a single code point vs. e + combining accent), or different line ending conventions (\r\n vs \n). Always consider what equality truly means for your domain.

Lexicographical (Dictionary) Ordering:

Beyond equality, we often want to compare which of two strings comes 'first' in a sorted order. This is called lexicographical ordering—essentially, dictionary order.

The algorithm:

Compare characters position by position, starting from position 0
At the first position where characters differ, the string with the 'smaller' character comes first
If all compared characters match but lengths differ, the shorter string comes first

Examples:

"apple" < "banana" (because 'a' < 'b' at position 0)
"apple" < "application" (all match up to "appl", but "apple" is shorter)
"Zebra" < "apple" in ASCII (because 'Z' = 90 < 'a' = 97)

This ordering is fundamental to sorting strings and searching in sorted collections.

Operations Enabled by the Logical View

Our logical view of strings—as ordered sequences of characters—immediately suggests a set of natural operations. These operations are meaningful purely because of the mathematical structure we've defined, independent of any implementation.

Core operations on strings:

Fundamental String Operations
Operation	Description	Example
Length	Count the number of characters	length("hello") = 5
Access	Retrieve character at position k	"hello"[1] = 'e'
Concatenation	Join two strings end-to-end	"hel" + "lo" = "hello"
Substring	Extract contiguous portion	substring("hello", 1, 3) = "ell"
Search	Find position of a substring	find("hello", "ll") = 2
Comparison	Determine equality or ordering	"abc" < "abd"
Traversal	Visit each character in order	for c in "hello": print(c)
Reverse	Produce characters in opposite order	reverse("hello") = "olleh"

Why define operations abstractly?

At this point, we haven't discussed how these operations are implemented—we've only described what they do. This separation is intentional and powerful:

Reason about correctness independent of implementation: We can prove that reverse(reverse(s)) = s purely from the logical definition.
Understand operations before code: You know what concatenation means without seeing any code.
Recognize trade-offs later: When we discuss implementation, you'll understand why some operations are fast and others slow.

This is the essence of abstract thinking in computer science: understanding the what before the how.

Operations Are Not Free

Each operation has a cost. Accessing a character might be fast; searching for a substring might be slow. The logical view tells us what operations are possible, but the physical implementation determines their efficiency. We'll explore this in subsequent pages.

Summary: The Logical Foundation

Let's consolidate what we've learned about the logical view of strings:

Key Takeaways

•A string is a finite, ordered sequence of characters — This mathematical definition captures the essential nature of strings across all languages and systems.
•Order is the defining property — Position matters; "abc" ≠ "cba". This distinguishes sequences from sets.
•The box model visualizes strings — Picture a row of numbered boxes, each containing one character. This mental model prevents off-by-one errors.
•The empty string is fundamental — It's the identity for concatenation and must be handled in all string algorithms.
•Characters compose strings — A string is built from characters, but a single-character string is not the same as a character.
•Substrings are contiguous; subsequences are not — This distinction underlies many important algorithms.
•Equality requires same characters at same positions — Length and position-by-position comparison determine equality.
•Abstract operations derive from the model — Length, access, concatenation, search, and more are all meaningful because of the ordered sequence structure.

Looking ahead:

We've established what a string is at the logical level—an abstract mathematical object. But strings must actually exist somewhere in a computer's memory. In the next page, we'll develop intuition about physical storage: how the abstract concept of an ordered sequence translates into actual bytes in memory, and what this means for the cost of operations.

Page Complete

You now have a precise mental model of strings as ordered sequences. This logical foundation will support everything that follows—whether you're implementing string algorithms, analyzing their complexity, or debugging string-related bugs. Next, we explore how this abstract structure manifests in physical memory.

1 / 3

Loading learning content...

Data Structures & AlgorithmsHow Strings Are Represented

How Strings Are Represented (Conceptual View)

LevelBeginner

Duration45 mins

TopicHow Strings Are Represented

1 / 3

Logical View of a String as an Ordered Sequence

Strings: More Than Just Text

What You Will Learn

The Mathematical Definition of a String

Definition: A string is a finite, ordered sequence of symbols drawn from a fixed set called an alphabet.

Let's unpack each component of this definition:

Finite: A string has a definite, countable length. It begins somewhere and ends somewhere. There's no such thing as an infinite string in practical computing.
Ordered: The position of each symbol matters. The string "abc" is fundamentally different from "cab" or "bac", even though they contain the same symbols.
Sequence: Symbols appear one after another in a line. Unlike a set (which has no order) or a bag (which allows duplicates but has no order), a sequence preserves both duplication and position.
Symbols from an alphabet: Every symbol in a string comes from a predefined collection. In computing, this is typically the set of characters a system can represent—ASCII characters, Unicode code points, or some subset thereof.

Why Formal Definitions Matter

Example:

Consider the string "hello". Under our formal definition:

It is finite: exactly 5 symbols long
It is ordered: 'h' comes before 'e', which comes before 'l', and so on
It is a sequence: the symbol 'l' appears twice, at positions 3 and 4
Alphabet: each symbol belongs to the English lowercase alphabet subset

This same formal structure applies whether you're dealing with "hello" in Python, "hello" in Java, or "hello" in Haskell. The underlying abstraction is universal.

Order: The Defining Property of Sequences

Order creates identity:

Sets vs Sequences: A Fundamental Distinction
Property	Set	Sequence (String)
Order matters	No	Yes
Duplicates allowed	No	Yes
{a, a, b} vs {a, b}	Identical	Different ("aab" ≠ "ab")
{a, b, c} vs {c, b, a}	Identical	Different ("abc" ≠ "cba")
Position concept	Not applicable	Each element has a position
Access pattern	Membership test	Index-based retrieval

Order enables position-based operations:

Because strings are ordered, every character occupies a specific position (also called an index). This unlocks a rich set of operations:

Access the character at position k
Find the position where a character first appears
Extract a contiguous subsequence (substring) from position i to j
Traverse characters in sequence from start to end (or reverse)

Without the concept of order, none of these operations would be meaningful. You couldn't ask "what's the third character?" in a set—there is no 'third' in a set.

Thinking in Positions

Visualizing Strings: The Box Model

The most powerful mental model for understanding strings is to visualize them as a linear sequence of boxes, where each box contains exactly one character and is labeled with a position number.

Consider the string "ALGORITHM":

Position:  0    1    2    3    4    5    6    7    8
         ┌────┬────┬────┬────┬────┬────┬────┬────┬────┐
         │ A  │ L  │ G  │ O  │ R  │ I  │ T  │ H  │ M  │
         └────┴────┴────┴────┴────┴────┴────┴────┴────┘

This visualization immediately reveals several important properties:

Properties Visible in the Box Model

•Contiguity: Boxes are adjacent with no gaps. Character at position 3 is immediately before character at position 4.
•Bounded extent: The sequence starts at position 0 and ends at position 8. There's no position 9 or position -1.
•Unique positions: Each position holds exactly one character. No position is empty, and no position holds two characters.
•Zero-based indexing: Following computer science convention, we number positions starting from 0, not 1. (Some languages use 1-based indexing, but 0-based is more common.)
•Length relationship: If positions run from 0 to n-1, the string has length n. Here, positions 0-8 means length 9.

Why zero-based indexing?

Zero-based indexing might seem unnatural initially—we don't count "zeroth" in everyday life. But it has deep mathematical and computational advantages:

Offset interpretation: Position k represents how far you've moved from the start. Position 0 = no movement (you're at the start). Position 5 = you've moved past 5 characters.
Simpler arithmetic: With zero-based indexing, the formula for finding the character at position k is simply: address = start + k × size. No need to subtract 1.
Consistent with memory addressing: Computer memory naturally works this way—the address of an element is its offset from a base address.
Half-open intervals: Ranges like [0, n) naturally describe a sequence of length n without any "+1" or "-1" corrections.

Off-by-One Errors: The String Trap

The Empty String: A Critical Edge Case

One of the most important—and often overlooked—concepts in string theory is the empty string: a string containing zero characters.

Definition: The empty string, denoted as "" (two quotation marks with nothing between them), or sometimes as ε (epsilon) in formal language theory, is the unique string of length 0.

The empty string isn't merely an edge case to handle grudgingly—it's a fundamental mathematical object with important properties:

Identity for concatenation:

Just as 0 is the identity for addition (n + 0 = n) and 1 is the identity for multiplication (n × 1 = n), the empty string is the identity for string concatenation:

"hello" + "" = "hello"
"" + "world" = "world"
"" + "" = ""

Concatenating the empty string doesn't change anything.

Properties of the Empty String

•Length is 0: By definition, length("") = 0. This is the only string with this property.
•No valid indices: You cannot access position 0, 1, or any index—there are no characters to access.
•Contained in every string: The empty string is a substring of every string, including itself.
•Unique: There's exactly one empty string, just as there's exactly one empty set.
•Not the same as null/nil: The empty string is a valid string that happens to have no characters. Null/nil typically means 'no string exists at all'—a crucial distinction.

Empty String ≠ Null

Why the empty string matters in practice:

Input validation: User input might be empty. If your code assumes at least one character exists, it will fail.
Split operations: Splitting a string can produce empty strings. For example, splitting "a,,b" by comma gives ["a", "", "b"].
Recursive algorithms: Many string algorithms have the empty string as their base case. It's what remains when you've consumed all characters.
Concatenation building: When building a string piece by piece, you typically start with the empty string and append to it.

Professional engineers always consider the empty string when designing string-handling code. It's not an afterthought—it's part of the design.

Strings vs. Characters: One vs. Many

A recurring source of confusion is the relationship between strings and characters. Let's clarify this distinction precisely.

A character is a single, atomic symbol from an alphabet. It cannot be broken down further (at the logical level). Examples: 'A', '7', '$', '😀'.

A string is a sequence of zero or more characters. It's a composite entity that may contain any number of characters.

Critically: A single-character string is not the same as a character.

The string "A" (a sequence containing one character) and the character 'A' (the character itself) are conceptually different:

"A" has type string and length 1
'A' has type character and the concept of length doesn't apply

Many programming languages enforce this distinction syntactically (using different quote marks), while others blur it for convenience.

Character vs String: A Detailed Comparison
Aspect	Character	String
Definition	Single atomic symbol	Sequence of symbols
Length	Not applicable (atomic)	0, 1, 2, ... n characters
Example	'A', 'b', '3', '!'	"hello", "A", ""
Can be empty?	No—a character is a unit	Yes—the empty string ""
Indexing	Not applicable	Yes—access character at position k
Typical syntax	Single quotes: 'x'	Double quotes: "x"
Relationship	Building block	Collection of building blocks

The containment relationship:

Characters compose strings. A string is made of characters in the same way that a word is made of letters. You can:

Extract a character from a string (given a position)
Construct a string from characters (given a sequence of characters)
Test whether a character appears within a string

Mental Model: The Letter Analogy

Substrings and Subsequences: Parts of a Whole

Because strings are ordered sequences, we can meaningfully extract portions of them. Two related but distinct concepts are substrings and subsequences.

Substring:

A substring is a contiguous portion of a string. Every character in a substring appears consecutively in the original string.

For the string "ALGORITHM":

"ALGO" is a substring (positions 0-3)
"RITH" is a substring (positions 4-7)
"ALGORITHM" is a substring of itself (the whole string)
"" (empty string) is a substring (trivially)
"ARM" is NOT a substring (characters aren't contiguous)

Subsequence:

A subsequence is obtained by deleting some (possibly zero) characters from a string without changing the order of remaining characters. Characters need not be contiguous.

For the string "ALGORITHM":

"ARM" is a subsequence (A at 0, R at 4, M at 8)
"AGT" is a subsequence (A at 0, G at 2, T at 6)
"ALGORITHM" is a subsequence of itself
"ANY" is NOT a subsequence (Y doesn't appear in the string)
"HMT" is NOT a subsequence (this ordering doesn't match the original)

Substring Properties

•Must be contiguous characters
•Defined by start and end positions
•A string of length n has n(n+1)/2 + 1 substrings
•Every substring is also a subsequence
•Easy to locate: just specify the range

Subsequence Properties

•Characters need not be adjacent
•Defined by selecting a subset of positions
•A string of length n has 2ⁿ subsequences
•Not every subsequence is a substring
•Harder to search: exponentially many options

Why This Distinction Matters

String Equality: When Are Two Strings the Same?

Two strings are equal if and only if they contain exactly the same characters in exactly the same positions.

Formal definition: Strings A and B are equal iff:

length(A) = length(B), AND
For every position i from 0 to length-1: A[i] = B[i]

This definition has several important implications:

Length must match first: If lengths differ, strings cannot be equal—no need to compare characters.
Position-by-position comparison: Characters at corresponding positions must match.
Character equality depends on the character set: Does 'A' equal 'a'? That depends on whether comparison is case-sensitive.
Order of comparison matters for efficiency: Comparing from the start lets us fail fast on early mismatches.

Examples of string equality:

String A	String B	Equal?	Reason
"hello"	"hello"	Yes	Identical characters at all positions
"hello"	"Hello"	Depends	Case-sensitive: No. Case-insensitive: Yes
"hello"	"hello "	No	Different lengths (trailing space)
"hello"	"helo"	No	Different lengths
""	""	Yes	Both are the empty string
"hello"	"olleh"	No	Same characters, different positions

Hidden Characters Can Break Equality

Lexicographical (Dictionary) Ordering:

Beyond equality, we often want to compare which of two strings comes 'first' in a sorted order. This is called lexicographical ordering—essentially, dictionary order.

The algorithm:

Compare characters position by position, starting from position 0
At the first position where characters differ, the string with the 'smaller' character comes first
If all compared characters match but lengths differ, the shorter string comes first

Examples:

"apple" < "banana" (because 'a' < 'b' at position 0)
"apple" < "application" (all match up to "appl", but "apple" is shorter)
"Zebra" < "apple" in ASCII (because 'Z' = 90 < 'a' = 97)

This ordering is fundamental to sorting strings and searching in sorted collections.

Operations Enabled by the Logical View

Core operations on strings:

Fundamental String Operations
Operation	Description	Example
Length	Count the number of characters	length("hello") = 5
Access	Retrieve character at position k	"hello"[1] = 'e'
Concatenation	Join two strings end-to-end	"hel" + "lo" = "hello"
Substring	Extract contiguous portion	substring("hello", 1, 3) = "ell"
Search	Find position of a substring	find("hello", "ll") = 2
Comparison	Determine equality or ordering	"abc" < "abd"
Traversal	Visit each character in order	for c in "hello": print(c)
Reverse	Produce characters in opposite order	reverse("hello") = "olleh"

Why define operations abstractly?

At this point, we haven't discussed how these operations are implemented—we've only described what they do. This separation is intentional and powerful:

Reason about correctness independent of implementation: We can prove that reverse(reverse(s)) = s purely from the logical definition.
Understand operations before code: You know what concatenation means without seeing any code.
Recognize trade-offs later: When we discuss implementation, you'll understand why some operations are fast and others slow.

This is the essence of abstract thinking in computer science: understanding the what before the how.

Operations Are Not Free

Summary: The Logical Foundation

Let's consolidate what we've learned about the logical view of strings:

Key Takeaways

•A string is a finite, ordered sequence of characters — This mathematical definition captures the essential nature of strings across all languages and systems.
•Order is the defining property — Position matters; "abc" ≠ "cba". This distinguishes sequences from sets.
•The box model visualizes strings — Picture a row of numbered boxes, each containing one character. This mental model prevents off-by-one errors.
•The empty string is fundamental — It's the identity for concatenation and must be handled in all string algorithms.
•Characters compose strings — A string is built from characters, but a single-character string is not the same as a character.
•Substrings are contiguous; subsequences are not — This distinction underlies many important algorithms.
•Equality requires same characters at same positions — Length and position-by-position comparison determine equality.
•Abstract operations derive from the model — Length, access, concatenation, search, and more are all meaningful because of the ordered sequence structure.

Looking ahead:

Page Complete

1 / 3