Loading learning content...
String comparison is one of the most frequently performed operations in computing. Every time you:
...you're comparing strings. Yet comparison is more nuanced than it first appears. Is "Apple" equal to "apple"? Which comes first: "10" or "9"? Is "café" the same as "cafe"?
Understanding comparison deeply—equality vs ordering, case sensitivity, locale effects, and performance characteristics—is essential for writing correct, efficient string-handling code.
By the end of this page, you will understand: how string equality works at the character level, lexicographical ordering and its implications, case-sensitive vs case-insensitive comparison, the performance cost model for comparison, and common pitfalls in string comparison.
Two strings are equal if and only if they have the same length and exactly the same character at each position.
The equality algorithm:
123456789101112131415161718192021222324
def strings_equal(s1: str, s2: str) -> bool: """ Check if two strings are exactly equal. Time: O(n) where n = min(len(s1), len(s2)), O(1) if lengths differ Space: O(1) """ # Quick length check - O(1) if len(s1) != len(s2): return False # Character-by-character comparison - O(n) worst case for i in range(len(s1)): if s1[i] != s2[i]: return False # Early termination on first mismatch return True # In practice, use the == operator"hello" == "hello" # True"hello" == "Hello" # False (case matters)"hello" == "hello " # False (length differs)"" == "" # True (both empty)Key properties of string equality:
Why the length check matters:
The length check is O(1) and eliminates many comparisons immediately. If you're comparing a million strings, most pairs will differ in length. This optimization is crucial for performance.
Some languages distinguish between equality (same value) and identity (same object in memory). In Python, == tests equality while 'is' tests identity. For strings, equality is almost always what you want. Identity can give surprising results due to string interning.
Beyond equality, we often need to order strings—for sorting, binary search, or finding the minimum/maximum. Strings are ordered lexicographically, which is a generalization of alphabetical ordering to all characters.
The lexicographical ordering algorithm:
To compare strings s1 and s2:
12345678910111213141516171819202122232425262728293031323334
def compare_strings(s1: str, s2: str) -> int: """ Compare two strings lexicographically. Returns: -1 if s1 < s2 0 if s1 == s2 1 if s1 > s2 Time: O(min(len(s1), len(s2))) worst case """ min_len = min(len(s1), len(s2)) for i in range(min_len): if s1[i] < s2[i]: return -1 elif s1[i] > s2[i]: return 1 # All compared characters are equal # Shorter string is "less than" if len(s1) < len(s2): return -1 elif len(s1) > len(s2): return 1 else: return 0 # Examples using Python's built-in comparison"apple" < "banana" # True (a < b)"apple" < "apply" # True (e < y at position 4)"app" < "apple" # True (prefix is smaller)"Apple" < "apple" # True (A=65 < a=97 in ASCII)| Comparison | Result | Reason |
|---|---|---|
| "abc" vs "abd" | abc < abd | c < d at position 2 |
| "abc" vs "abc" | abc == abc | All characters match |
| "ab" vs "abc" | ab < abc | ab is a prefix of abc |
| "abc" vs "ab" | abc > ab | abc contains ab |
| "10" vs "9" | "10" < "9" | Character '1' (49) < '9' (57) |
| "A" vs "a" | A < a | ASCII: 'A'=65, 'a'=97 |
Lexicographical ordering compares character-by-character. "10" < "9" because '1' comes before '9' in character code. To sort strings as numbers, you must convert to integers first. This is a common source of bugs when sorting user input or file names.
Character comparison is based on numeric character codes. In ASCII (and extended to Unicode), each character has a numeric value, and these values determine ordering.
Key ASCII ranges:
Important ordering relationships:
123456789101112131415161718192021222324
# Getting character codesord('A') # 65ord('Z') # 90ord('a') # 97ord('z') # 122ord('0') # 48ord('9') # 57ord(' ') # 32 (space) # Converting backchr(65) # 'A'chr(97) # 'a' # Why uppercase < lowercase'A' < 'a' # True (65 < 97)'Z' < 'a' # True (90 < 97) # Mixed-case sorting consequenceswords = ["apple", "Banana", "cherry", "Apricot"]sorted(words) # ['Apricot', 'Banana', 'apple', 'cherry']# All uppercase comes before all lowercase! # This is rarely what humans expect# Use case-insensitive sorting for "natural" orderUnicode considerations:
Unicode extends far beyond ASCII. Different scripts, accented characters, and emoji all have code points that affect ordering:
For locale-aware sorting (where 'é' sorts near 'e'), you need locale-specific comparison functions.
For technical comparisons (exact match, hash keys), standard comparison is correct and fast. For human-facing sorting (names, titles), consider locale-aware comparison. Don't overcomplicate when you don't need to, but don't assume ASCII ordering is 'natural' for users.
By default, string comparison is case-sensitive—'A' and 'a' are different characters with different codes. But many applications need case-insensitive comparison:
Naive approach: Convert to same case
1234567891011121314151617
# Naive: Convert both to lowercase (or uppercase)def equals_ignore_case(s1: str, s2: str) -> bool: """Case-insensitive equality check.""" return s1.lower() == s2.lower() equals_ignore_case("Hello", "hello") # Trueequals_ignore_case("PYTHON", "Python") # True # Case-insensitive sortingwords = ["apple", "Banana", "cherry", "Apricot"]sorted(words, key=str.lower) # ['apple', 'Apricot', 'Banana', 'cherry'] # Case-insensitive searchdef contains_ignore_case(haystack: str, needle: str) -> bool: return needle.lower() in haystack.lower() contains_ignore_case("Hello World", "WORLD") # TrueThe hidden complexity: Unicode case folding
For ASCII (A-Z, a-z), case conversion is trivial: add or subtract 32. But Unicode has many more complexities:
For correct internationalized code, use proper Unicode-aware case folding.
1234567891011121314151617
# Simple case: ASCII letters"ABC".lower() # "abc""abc".upper() # "ABC" # German sharp S (ß)"ß".upper() # "SS" (one char becomes two!)"SS".lower() # "ss" (two chars stay two)"ß".lower() # "ß" (already lowercase) # casefold() is more aggressive for comparison"ß".casefold() # "ss" (better for matching)"SS".casefold() # "ss""ß".casefold() == "SS".casefold() # True # Recommendation: Use casefold() for case-insensitive comparisondef equals_unicode_safe(s1: str, s2: str) -> bool: return s1.casefold() == s2.casefold()Converting to lowercase creates a new string, which is O(n) time and space. If comparing many strings repeatedly, consider normalizing once and storing the lowercase version, rather than converting on every comparison.
Understanding the performance of string comparison is crucial for writing efficient algorithms:
Equality comparison (s1 == s2):
| Case | Time Complexity | Reason |
|---|---|---|
| Different lengths | O(1) | Length check fails immediately |
| Equal strings | O(n) | Must check all n characters |
| Differ at position k | O(k) | Early termination at first mismatch |
| Average case (random strings) | O(1) | Most random strings differ early |
Why is the average case O(1)?
For random strings, the probability of matching at position 0 is 1/alphabet_size. For a 26-letter alphabet:
The expected number of comparisons is a geometric series that converges to a constant. For English text, the "average" comparison examines only a few characters before finding a difference.
Ordering comparison (s1 < s2):
Same cost model as equality—we examine characters until we find a difference or exhaust one string.
123456789101112131415161718192021222324
# Sorting n strings of average length L# Sorting algorithm: O(n log n) comparisons# Each comparison: O(L) worst case, but often O(1) average def sort_strings(strings: list[str]) -> list[str]: """ Sort a list of strings. Time complexity analysis: - Number of comparisons: O(n log n) - Each comparison: O(L) worst, O(1) average - Total: O(n log n × L) worst case O(n log n) average case for random strings """ return sorted(strings) # Performance tip for repeated comparisons:# If comparing the same strings many times (e.g., in a set),# caching the hash can speed things up # Sets use hash for O(1) average lookup, not O(n) comparisonwords_set = set(["apple", "banana", "cherry"])"apple" in words_set # O(1) average, uses hash firstString comparison benefits hugely from early termination. Different lengths are caught in O(1). Different first characters are caught in O(1). Only strings that are very similar require examining many characters. Design your data to take advantage of this: putting distinguishing information early in strings can speed up comparisons.
Languages provide various functions and methods for string comparison. Here's a reference for common operations:
12345678910111213141516171819202122232425
s1, s2 = "Hello", "hello" # Equalitys1 == s2 # False (case-sensitive)s1 != s2 # True # Orderings1 < s2 # True (H=72 < h=104)s1 <= s2 # Trues1 > s2 # Falses1 >= s2 # False # Case-insensitive equalitys1.lower() == s2.lower() # Trues1.casefold() == s2.casefold() # True (better for Unicode) # Prefix/suffix checkings1.startswith("Hel") # Trues1.endswith("lo") # True # Containment"ell" in s1 # True # Identity (same object, not usually what you want)s1 is s2 # Could be True or False (interning)In Java, == compares object references, not string contents. Two strings with the same characters may be different objects and compare as unequal with ==. Always use .equals() for string comparison. This is one of the most common Java bugs.
For user-facing sorting and comparison, locale-aware comparison follows the sorting rules of a specific language or culture. This is different from simple character-code comparison.
Examples where locale matters:
Using locale-aware comparison:
123456789101112131415161718192021222324
import locale # Set locale for comparisonlocale.setlocale(locale.LC_ALL, 'en_US.UTF-8') # Use locale.strcoll for locale-aware comparisonwords = ['apple', 'Apricot', 'banana', 'élan'] # Standard sort (ASCII order)sorted(words)# ['Apricot', 'apple', 'banana', 'élan'] # Locale-aware sortsorted(words, key=locale.strxfrm)# ['apple', 'Apricot', 'banana', 'élan']# (depends on locale settings) # For more control, use the icu library# pip install PyICUfrom icu import Collator, Locale collator = Collator.createInstance(Locale('sv_SE')) # Swedish# Now 'ä' will sort after 'z' as in SwedishUse locale-aware comparison when displaying sorted data to users or when sorting should follow cultural expectations. For technical operations (hash tables, binary search in code), standard comparison is faster and more predictable. Know your use case.
String comparison seems simple, but it hides many common bugs. Here are the pitfalls to avoid:
1234567891011121314151617181920212223242526
# Pitfall 1: Numeric string sortingversions = ["1.0", "10.0", "2.0", "9.0"]sorted(versions) # ['1.0', '10.0', '2.0', '9.0'] - WRONG for versions# Fix: Parse version numbers properlyfrom packaging import version # or custom parsing # Pitfall 2: Whitespace"hello" == "hello " # False# Fix: Strip before comparison if neededs1.strip() == s2.strip() # Pitfall 3: Unicode normalizationimport unicodedata s1 = "é" # Single character (precomposed)s2 = "e\u0301" # e + combining acute accent (decomposed)s1 == s2 # False (different character sequences!) # Fix: Normalize bothunicodedata.normalize('NFC', s1) == unicodedata.normalize('NFC', s2) # True # Pitfall 4: None handlingdef safe_compare(s1, s2): if s1 is None or s2 is None: return s1 is s2 # Both None = equal, one None = not equal return s1 == s2When writing comparison code, ask: What are the edge cases? Empty strings? None/null? Leading/trailing whitespace? Mixed case? Non-ASCII characters? Test with these cases explicitly, and document your assumptions.
Let's consolidate the cost model for string comparison:
| Operation | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Equality (==) | O(n) worst, O(1) average | O(1) | Length check enables early exit |
| Ordering (<, >) | O(n) worst, O(1) average | O(1) | Exits at first difference |
| Case-insensitive | O(n) | O(n) | Requires lowercase copy |
| Locale-aware | O(n log n) or more | O(n) | Complex collation rules |
| startswith/endswith | O(k) | O(1) | k = length of pattern |
Module Complete:
You have now mastered the five fundamental string operations:
These operations form the foundation for all string algorithms. Understanding their individual costs lets you analyze and optimize string-processing code with confidence.
You now understand string comparison comprehensively—from equality semantics to lexicographical ordering, from case sensitivity to locale awareness, from performance optimization to common pitfalls. Combined with the previous pages, you have a complete foundation in basic string operations and their cost models.