We've explored binary representation and number systems in the abstract. Now it's time to see how this knowledge manifests in the physical reality of computer memory.
Every variable you declare — every integer, character, boolean, or floating-point number — exists as a specific pattern of bits at a specific location in memory. Understanding this isn't just theoretical knowledge; it explains why certain bugs occur, why some operations are fast and others slow, and how data structures actually work at the lowest level.
This page completes our journey through binary representation by showing you exactly how the primitive data types you use every day are encoded as bit patterns and stored in physical memory. With this knowledge, the "magic" of programming becomes explainable mechanism.
By the end of this page, you will understand how memory is organized, how different data types are encoded as bits, and why certain values and behaviors emerge from these representations. This is the foundation on which everything else in data structures and algorithms rests at the lowest level.
At the most fundamental level, computer memory is a gigantic linear array of bytes. Each byte has a unique numeric address, starting from 0 and going up to the maximum addressable memory.
The Memory Model:
Address Contents (8 bits each)
0x0000 [01001101]
0x0001 [11100010]
0x0002 [00000000]
0x0003 [11111111]
... ...
0xFFFFFFFF [10101010] (4 GB boundary in 32-bit systems)
Every piece of data in your program — every variable, every object, every function, every string — occupies some portion of this linear byte array. When you access data, you're really asking the hardware: "Retrieve the bytes starting at address X."
Key Properties of Memory:
- Byte-addressable: the byte is the smallest unit with its own address; individual bits cannot be addressed directly.
- Addresses are integers: they can be stored, compared, and computed with ordinary arithmetic.
- Random access: reading any address takes roughly the same time as any other (caching effects aside).
- Bits have no inherent meaning: the same byte can be an integer, a character, or part of a pointer; interpretation is entirely up to the program.
Modern systems use virtual memory, where each program sees its own private address space starting at 0. The OS and hardware translate these virtual addresses to physical locations. This abstraction isolates programs and enables memory to exceed physical RAM (using swap/paging). But the fundamental model remains: a linear array of bytes.
Integers are the simplest numeric data type, but their storage reveals important principles about binary encoding.
Unsigned Integers:
An unsigned integer simply stores the binary representation of a non-negative number. An n-bit unsigned integer can represent values from 0 to 2^n − 1.
Example: 8-bit unsigned integer storing 200
Decimal: 200
Binary: 11001000
Byte in memory: [11001000] = 0xC8
Signed Integers: Two's Complement
Signed integers need to represent negative numbers. The universal modern approach is two's complement:
Example: Storing -5 as an 8-bit signed integer
Start with 5: 00000101
Invert all bits: 11111010
Add 1: 11111011
-5 stored as: [11111011] = 0xFB
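A minimal C sketch that verifies this: reinterpreting the bits of −5 as an unsigned byte exposes the two's-complement pattern (0xFB = 251 = 256 − 5).

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    int8_t x = -5;
    // Reinterpret the same 8 bits as unsigned to see the raw pattern.
    uint8_t bits = (uint8_t)x;
    printf("-5 as int8_t stores 0x%02X\n", (unsigned)bits); // prints 0xFB
    printf("same bits as unsigned: %u\n", bits);            // prints 251 (= 256 - 5)
    return 0;
}
```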
| Type | Bits | Unsigned Range | Signed Range |
|---|---|---|---|
| byte | 8 | 0 to 255 | -128 to 127 |
| short | 16 | 0 to 65,535 | -32,768 to 32,767 |
| int | 32 | 0 to 4,294,967,295 | -2,147,483,648 to 2,147,483,647 |
| long | 64 | 0 to 18.4 quintillion | -9.2 to +9.2 quintillion |
Why Two's Complement?
Two's complement has elegant properties:
- A single representation of zero (sign-magnitude encoding wastes a pattern on −0).
- The same addition and subtraction circuits work for signed and unsigned values.
- The most significant bit doubles as the sign bit: 1 means negative.
- Negation is mechanical: invert every bit, then add 1.
This is why two's complement became universal — it simplifies hardware design dramatically.
When arithmetic exceeds the representable range, the value 'wraps around.' For a signed 8-bit integer: 127 + 1 = -128 (not 128!). This is because 01111111 + 1 = 10000000, which is the bit pattern for -128. This silent wraparound causes subtle bugs in countless programs.
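A short C sketch of this wraparound. Note that overflowing a plain int is undefined behavior in C, so the example uses explicit 8-bit types and conversions; converting an out-of-range value to a signed narrow type is implementation-defined before C23, but wraps in two's complement on essentially all modern compilers.

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    int8_t a = 127;              // INT8_MAX: 01111111
    // The sum is computed as int (128), then converted back to int8_t.
    // On two's-complement machines that conversion yields 10000000 = -128.
    int8_t b = (int8_t)(a + 1);
    printf("127 + 1 as int8_t  = %d\n", b);   // prints -128

    uint8_t u = 255;             // UINT8_MAX: 11111111
    u = (uint8_t)(u + 1);        // unsigned wraparound is fully defined
    printf("255 + 1 as uint8_t = %u\n", u);   // prints 0
    return 0;
}
```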
Characters represent text, but at the bit level, they're just integers mapped to symbols through an encoding scheme.
ASCII: The Original Encoding
ASCII (American Standard Code for Information Interchange) uses 7 bits to represent 128 characters: letters, digits, punctuation, and control characters. The 8th bit (in a byte) was historically used for parity checking.
Selected ASCII Values:
'A' = 65 = 0x41 = 01000001
'Z' = 90 = 0x5A = 01011010
'a' = 97 = 0x61 = 01100001
'z' = 122 = 0x7A = 01111010
'0' = 48 = 0x30 = 00110000
'9' = 57 = 0x39 = 00111001
Space = 32 = 0x20 = 00100000
Newline = 10 = 0x0A = 00001010
Key ASCII Patterns:
- Uppercase and lowercase letters differ by a single bit, bit 5 (value 0x20): 'A' is 01000001, 'a' is 01100001.
- Digits '0' through '9' are consecutive, so c − '0' converts a digit character to its numeric value.
- Codes 0 through 31 are non-printing control characters (newline, tab, carriage return, etc.).
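These patterns enable simple character tricks. A small C sketch, using only facts from the table above:

```c
#include <stdio.h>

int main(void) {
    // Upper- and lowercase letters differ only in bit 5 (0x20).
    char upper = 'A';                   // 01000001
    char lower = (char)(upper | 0x20);  // set bit 5   -> 01100001 = 'a'
    char back  = (char)(lower & ~0x20); // clear bit 5 -> 'A'
    printf("%c %c %c\n", upper, lower, back);

    // A digit character's numeric value is its code minus '0' (0x30).
    char d = '7';
    printf("'7' - '0' = %d\n", d - '0'); // prints 7
    return 0;
}
```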
Unicode: The Global Standard
ASCII only covers English. Unicode assigns a unique code point to every character in every language — plus symbols, emoji, and historical scripts. Over 140,000 characters are defined.
Unicode code points are written as U+XXXX (hex). Examples: U+0041 is 'A' (matching its ASCII value of 65), U+00E9 is 'é', U+4E2D is '中', and U+1F600 is the 😀 emoji.
Unicode Encodings:
Unicode defines code points, but how those are stored in memory varies:
- UTF-32: every code point in a fixed 4 bytes (simple, but wasteful for typical text).
- UTF-16: 2 bytes for common characters, 4 bytes (a "surrogate pair") for the rest.
- UTF-8: 1 to 4 bytes per character, variable-length, and the dominant encoding on the web.
UTF-8 Encoding Rules:
U+0000 to U+007F: 1 byte (0xxxxxxx) — ASCII compatible
U+0080 to U+07FF: 2 bytes (110xxxxx 10xxxxxx)
U+0800 to U+FFFF: 3 bytes (1110xxxx 10xxxxxx 10xxxxxx)
U+10000 to U+10FFFF: 4 bytes (11110xxx 10xxxxxx 10xxxxxx 10xxxxxx)
UTF-8's genius is backward compatibility: all valid ASCII is valid UTF-8. English text uses 1 byte per character (same as ASCII), so existing ASCII systems worked unchanged. For other languages, 2-4 bytes are used. This balance of efficiency and compatibility made UTF-8 the dominant encoding online.
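To make the encoding table concrete, here is a sketch of a UTF-8 encoder in C. The function name `utf8_encode` is ours, chosen for illustration, and a production encoder would also reject surrogate code points (U+D800 to U+DFFF); otherwise it follows the rules above exactly.

```c
#include <stdio.h>
#include <stdint.h>

// Encode one Unicode code point into UTF-8. Returns the byte count (0 if out of range).
static int utf8_encode(uint32_t cp, uint8_t out[4]) {
    if (cp <= 0x7F) {                    // 1 byte: 0xxxxxxx (ASCII)
        out[0] = (uint8_t)cp;
        return 1;
    } else if (cp <= 0x7FF) {            // 2 bytes: 110xxxxx 10xxxxxx
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    } else if (cp <= 0xFFFF) {           // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    } else if (cp <= 0x10FFFF) {         // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

int main(void) {
    uint8_t buf[4];
    int n = utf8_encode(0x20AC, buf);          // U+20AC, the euro sign
    for (int i = 0; i < n; i++) printf("%02X ", buf[i]); // prints: E2 82 AC
    printf("\n");
    return 0;
}
```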
A boolean represents TRUE or FALSE — logically, just 1 bit of information. So how much memory does a boolean actually use?
The Surprising Answer: Usually 1 Byte (or More)
Most programming languages store a boolean in a full byte (8 bits), not a single bit. Some use 4 bytes (32 bits). Why the waste?
Reasons for Byte-Sized Booleans:
- The byte is the smallest addressable unit; a lone bit has no address of its own.
- Writing a single bit would require a read-modify-write of its containing byte: extra instructions on every update.
- A byte-sized boolean can be updated without disturbing its neighbors, which also avoids data races between threads sharing a packed byte.
- Some platforms favor word-sized (4-byte) values for alignment and speed, which is why certain languages use 32 bits.
Boolean Representation:
Typically, 0 represents false and 1 (or any nonzero value) represents true; the other 7 bits of the byte simply go unused.
When Bit Packing Matters:
In specific scenarios, we do pack booleans into bits (see the sketch after the note below):
- Bit vectors and bitsets representing membership in large sets.
- Flag fields in file formats and network protocols, where every bit is specified.
- Language facilities like C bit-fields.
For these use cases, we sacrifice access simplicity for space efficiency — a classic tradeoff.
For a few booleans, wasting 7 bits per boolean is irrelevant. For millions of booleans (like a bit vector for a massive set), the difference between 1 bit and 1 byte is 8x memory usage — suddenly very significant. Choose representation based on your scale.
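As a concrete illustration, here is a minimal bit-vector sketch in C (names like `set_bit` and `NBITS` are ours): it packs 8 booleans per byte using divide-by-8 to find the byte and modulo-8 to find the bit.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define NBITS 1000

static uint8_t bits[(NBITS + 7) / 8];   // 125 bytes instead of 1000

static void set_bit(size_t i)   { bits[i / 8] |=  (uint8_t)(1u << (i % 8)); }
static void clear_bit(size_t i) { bits[i / 8] &= (uint8_t)~(1u << (i % 8)); }
static int  get_bit(size_t i)   { return (bits[i / 8] >> (i % 8)) & 1; }

int main(void) {
    memset(bits, 0, sizeof bits);
    set_bit(42);
    printf("bit 42 = %d, bit 43 = %d\n", get_bit(42), get_bit(43)); // 1, 0
    clear_bit(42);
    printf("bit 42 = %d\n", get_bit(42));                            // 0
    return 0;
}
```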
Floating-point numbers (like 3.14159 or -0.001) pose a challenge: how do you represent both very large and very small values with fractional parts?
The IEEE 754 Standard:
Virtually all modern systems use IEEE 754 floating-point representation. A floating-point number is stored as three components:
- Sign: 1 bit (0 for positive, 1 for negative).
- Exponent: stored with a bias (127 for floats, 1023 for doubles) so negative exponents need no sign bit of their own.
- Mantissa (fraction): the significant digits, with an implicit leading 1 that is not stored.
The value is: (-1)^sign × (1 + mantissa) × 2^(exponent - bias)
| Format | Total Bits | Sign | Exponent | Mantissa | Approx. Precision |
|---|---|---|---|---|---|
| Single (float) | 32 | 1 | 8 | 23 | ~7 decimal digits |
| Double | 64 | 1 | 11 | 52 | ~15 decimal digits |
| Half | 16 | 1 | 5 | 10 | ~3 decimal digits |
| Quad | 128 | 1 | 15 | 112 | ~34 decimal digits |
Example: Storing 3.14159 as a float
The exact bit pattern (in hex): 0x40490FD0
Sign: 0 (positive)
Exponent: 10000000 (128, meaning 2^(128-127) = 2^1)
Mantissa: 10010010000111111010000 (= 0x490FD0)
Value: +1 × 1.57079... × 2 ≈ 3.14159
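You can inspect these fields yourself. A small C sketch, using `memcpy` to view the float's bits (the portable way to reinterpret them):

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    float f = 3.14159f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  // copy the 4 bytes, reinterpret as an integer

    printf("hex pattern: 0x%08X\n", bits);           // 0x40490FD0
    printf("sign:        %u\n", bits >> 31);         // 0 (positive)
    printf("exponent:    %u\n", (bits >> 23) & 0xFF);// 128, meaning 2^(128-127)
    printf("mantissa:    0x%06X\n", bits & 0x7FFFFF);// 0x490FD0
    return 0;
}
```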
Special Values:
IEEE 754 reserves certain bit patterns for special values:
- ±0: exponent and mantissa all zero (the sign bit distinguishes +0 from −0).
- ±Infinity: exponent all ones, mantissa zero; produced by overflow and division by zero.
- NaN ("Not a Number"): exponent all ones, mantissa nonzero; produced by invalid operations like 0/0.
- Subnormals: exponent all zeros, mantissa nonzero; tiny values near zero, with reduced precision.
Most decimal fractions CANNOT be exactly represented in binary. Just as 1/3 = 0.333... in decimal, 1/10 = 0.0001100110011... repeating in binary. This is why 0.1 + 0.2 ≠ 0.3 — it's not a bug, it's the fundamental limitation of binary representation.
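A two-line C demonstration of this; printing with 17 significant digits exposes the stored values:

```c
#include <stdio.h>

int main(void) {
    double a = 0.1, b = 0.2;
    printf("0.1 + 0.2 = %.17f\n", a + b);                        // 0.30000000000000004
    printf("equal to 0.3? %s\n", (a + b == 0.3) ? "yes" : "no"); // no
    return 0;
}
```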
When multiple values are stored in memory (arrays, structures, objects), their layout follows rules that optimize performance.
Memory Alignment:
Alignment means that a data item is stored at a memory address that's a multiple of its size: a 2-byte short at an even address, a 4-byte int at a multiple of 4, an 8-byte double at a multiple of 8.
Why Alignment Matters:
- CPUs fetch memory in aligned words; an aligned value arrives in a single access, while a misaligned one may straddle two.
- Some architectures fault outright on misaligned access (a "bus error"); others merely slow down.
- Atomic and SIMD instructions generally require properly aligned operands.
Struct Padding Example:
Consider this C structure:
struct Example {
char a; // 1 byte
int b; // 4 bytes
char c; // 1 byte
};
Naive expectation: 1 + 4 + 1 = 6 bytes
Actual size: 12 bytes (on most systems)
Why? The compiler inserts padding for alignment:
Offset 0: a (char, 1 byte)
Offset 1-3: <padding> (3 bytes, so b starts at multiple of 4)
Offset 4-7: b (int, 4 bytes)
Offset 8: c (char, 1 byte)
Offset 9-11: <padding> (3 bytes, so struct size is multiple of 4)
Optimizing Layout:
Reordering fields can reduce padding:
struct BetterExample {
int b; // 4 bytes (offset 0)
char a; // 1 byte (offset 4)
char c; // 1 byte (offset 5)
// 2 bytes padding (to make total a multiple of 4)
}; // Total: 8 bytes
Order struct members from largest to smallest. This minimizes internal padding and produces the smallest overall size. The difference matters when you have millions of these structures!
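You can verify sizes and offsets directly with `sizeof` and `offsetof`. A small C sketch; the exact numbers are compiler- and ABI-dependent, and the comments show typical 64-bit results:

```c
#include <stdio.h>
#include <stddef.h>   // offsetof

struct Example       { char a; int b; char c; };
struct BetterExample { int b; char a; char c; };

int main(void) {
    printf("Example:       size=%zu a@%zu b@%zu c@%zu\n",
           sizeof(struct Example),
           offsetof(struct Example, a),
           offsetof(struct Example, b),
           offsetof(struct Example, c));        // size=12 a@0 b@4 c@8
    printf("BetterExample: size=%zu b@%zu a@%zu c@%zu\n",
           sizeof(struct BetterExample),
           offsetof(struct BetterExample, b),
           offsetof(struct BetterExample, a),
           offsetof(struct BetterExample, c));  // size=8  b@0 a@4 c@5
    return 0;
}
```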
Arrays are among the simplest data structures: they store elements contiguously — one right after another with no gaps.
Memory Layout of an Array:
An array of n elements of size s bytes occupies n × s contiguous bytes.
Example: int array[4] = {10, 20, 30, 40}
Assuming 4-byte integers and starting at address 0x1000:
Address Contents (hex) Value
0x1000 0A 00 00 00 10 (little-endian)
0x1004 14 00 00 00 20
0x1008 1E 00 00 00 30
0x100C 28 00 00 00 40
Index-to-Address Calculation:
To access array[i]:
address = base_address + (index × element_size)
For array[2]: 0x1000 + (2 × 4) = 0x1008
This calculation is O(1) — constant time regardless of array size. This is why array access is so fast.
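A short C sketch makes the formula tangible: `&array[i]` is exactly the base address plus `i × sizeof(int)`.

```c
#include <stdio.h>

int main(void) {
    int array[4] = {10, 20, 30, 40};

    // The compiler emits this same multiply-and-add for every indexed access.
    for (int i = 0; i < 4; i++) {
        printf("array[%d] = %2d at %p (base + %zu)\n",
               i, array[i], (void *)&array[i],
               (size_t)((char *)&array[i] - (char *)array)); // i * sizeof(int)
    }
    return 0;
}
```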
When you access array[0], the CPU loads a cache line (typically 64 bytes) containing array[0] through array[15] (for 4-byte integers). Subsequent accesses to array[1], array[2], and so on are then 'free': the data is already in cache. This is why sequential array traversal is blazing fast compared to random access patterns.
A pointer is a variable that stores a memory address. Understanding pointer storage demystifies how data structures like linked lists, trees, and graphs work at the lowest level.
Pointer Size:
A pointer must be able to hold any address in the program's address space, so its size matches the machine's address width: 8 bytes on a 64-bit system, 4 bytes on a 32-bit system, regardless of the type it points to.
Example: A Pointer Variable
int value = 42; // Stored at address 0x1000
int *ptr = &value; // ptr stores 0x1000
In memory (64-bit system):
Address Contents Interpretation
0x1000 2A 00 00 00 value = 42 (4 bytes, integer)
0x1008 00 10 00 00 ptr = 0x1000 (8 bytes, pointer)
00 00 00 00 (little-endian, so low bytes first)
Dereferencing a Pointer:
When you write *ptr, the CPU:
1. Reads the contents of ptr, which is an address (here, 0x1000).
2. Issues a second memory access to fetch the bytes at that address.
3. Interprets those bytes according to the pointer's type (here, a 4-byte int).
This is why pointer dereferencing adds a memory access — an extra step compared to direct variable access.
Null Pointers:
A null pointer is a special value (typically 0) indicating "points to nothing." Accessing memory at address 0 is typically protected by the OS, causing a crash ("null pointer exception," "segmentation fault").
null pointer in memory: 00 00 00 00 00 00 00 00
Multiple Indirection:
Pointers can point to other pointers:
int value = 42;
int *ptr = &value;
int **ptrptr = &ptr; // Pointer to pointer
Each level adds 8 bytes of storage (on 64-bit) and one extra memory access to dereference.
Pointers are why data structures like linked lists can have non-contiguous elements. Instead of storing data next to each other, we store pointers that tell us where the next element lives. This trades cache locality for flexibility in insertion/deletion.
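As a preview, here is a minimal linked-list node in C; the `struct Node` shown is a sketch, not a library type. Each node records where the next one lives, wherever the allocator happened to place it:

```c
#include <stdio.h>
#include <stdlib.h>

// A singly linked list node: the data plus the address of the next node.
struct Node {
    int value;
    struct Node *next;   // 8 bytes on a 64-bit system; NULL marks the end
};

int main(void) {
    struct Node *head   = malloc(sizeof *head);
    struct Node *second = malloc(sizeof *second);
    head->value = 10;   head->next = second;
    second->value = 20; second->next = NULL;

    // Traversal follows pointers: one extra memory access per hop.
    for (struct Node *p = head; p != NULL; p = p->next)
        printf("%d at %p\n", p->value, (void *)p);

    free(second);
    free(head);
    return 0;
}
```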
Everything we've learned about binary representation and memory storage directly informs how data structures work. Let's preview these connections:
Arrays and Contiguity: indexing is base-plus-offset address arithmetic, and contiguous layout is what makes sequential traversal cache-friendly.
Linked Lists and Pointers: each node stores the address of the next; traversal is a chain of dereferences, each one a separate memory access.
Hash Tables and Integer Math: hash functions are bit-level integer operations (shifts, XORs, modular arithmetic) that map keys onto array indices.
This knowledge of bits, bytes, and memory storage is the foundation upon which all data structures and algorithms rest. When you study arrays, trees, or hash tables in future chapters, remember: everything reduces to bit patterns stored in linear memory. There is no magic — only well-organized bytes.
We've completed our journey through binary representation — from the fundamental reasons for binary, through bits and bytes, number systems, and now physical data storage. Let's consolidate the key insights:
Module Complete:
This module on Binary Representation & Number Systems provides the essential low-level foundation for everything in computer science. You now understand:
- Why computers use binary, and how bits and bytes are organized into addressable memory.
- How to read and convert between binary, decimal, and hexadecimal.
- How integers (two's complement), characters (ASCII and Unicode), booleans, and floating-point numbers (IEEE 754) are encoded as bit patterns.
- How alignment, padding, contiguity, and pointers determine the physical layout of data.
With this foundation, you're prepared to understand any data structure at its deepest level — not as abstract concepts, but as specific arrangements of bits in physical memory.
Congratulations! You've mastered the essential knowledge of binary representation and number systems. This deep understanding of how data is physically stored will inform every data structure and algorithm you study. The 'magic' of computing is no longer magic — it's organized patterns of ones and zeros in physical memory.