Having established what primitive data structures are and their defining characteristics, we now examine the specific types that form the foundation of data representation in virtually every programming language.
Across the incredible diversity of programming languages—from low-level assembly to high-level scripting languages—four categories of primitive data types appear almost universally:

- Integers — whole numbers for discrete counts
- Floating-point numbers — approximations of continuous measurements
- Characters — numeric codes for textual symbols
- Booleans — true/false logical values
These four types are not arbitrary choices. They reflect fundamental categories of information that computation must handle: discrete counts, continuous measurements, symbolic text, and logical conditions. Understanding each type deeply—not just how to use them, but how they work—is essential for writing correct, efficient code.
By the end of this page, you will understand the purpose, representation, and key considerations for each of the four primary primitive types. You will learn why integers have overflow, why floating-point numbers have precision limits, how characters map to numbers, and why booleans are conceptually simple but implementation-complex.
Integers are the most fundamental numeric primitive type. They represent whole numbers—values without fractional parts—and serve as the workhorse of computation.
What Integers Represent:
Integers model discrete quantities—things that come in whole units:

- Counts: number of users, items in a cart, retry attempts
- Positions: array indices, loop counters, line numbers
- Identifiers: user IDs, port numbers, error codes
Whenever you're dealing with countable, discrete values—things that don't have meaningful fractional parts—integers are your primitive of choice.
Why Not Use Floating-Point for Everything?
You might wonder: why have integers when floating-point can represent whole numbers too? Three critical reasons:
Exactness: Integers represent whole numbers exactly. The integer 100 is precisely 100. In floating-point, large numbers may lose precision.
Performance: Integer arithmetic is faster. Addition, subtraction, and comparison are single CPU instructions with no rounding considerations.
Semantics: Integers communicate intent. When you use an integer for an array index, you're saying "this is a discrete position"—not a measurement that might be 3.5.
| Type Name | Size (bits) | Range (Signed) | Range (Unsigned) |
|---|---|---|---|
| int8 / byte | 8 | -128 to 127 | 0 to 255 |
| int16 / short | 16 | -32,768 to 32,767 | 0 to 65,535 |
| int32 / int | 32 | -2.1B to 2.1B | 0 to 4.3B |
| int64 / long | 64 | -9.2 × 10¹⁸ to 9.2 × 10¹⁸ | 0 to 1.8 × 10¹⁹ |
Signed vs. Unsigned:
Integers come in two flavors:
Signed integers can represent negative values. The most common representation is two's complement, where the high-order bit indicates sign. A 32-bit signed int ranges from -2,147,483,648 to 2,147,483,647.
Unsigned integers represent only non-negative values. Without needing a sign bit, all bits contribute to magnitude. A 32-bit unsigned int ranges from 0 to 4,294,967,295.
Choosing Between Them:

- Default to signed integers; most arithmetic and most language idioms assume them.
- Use unsigned integers for bit manipulation and for quantities defined as non-negative, such as sizes and protocol fields.
- Avoid mixing signed and unsigned values in one expression—implicit conversions cause subtle bugs (see the gotchas later in this page).
The Critical Issue: Overflow
Integers have fixed sizes, which means they have fixed ranges. Attempting to represent values outside that range causes overflow:
int32 MAX_INT = 2,147,483,647
int32 result = MAX_INT + 1
// Expected: 2,147,483,648
// Actual: -2,147,483,648 (wrapped around!)
This is not a bug—it's defined behavior in most languages. The bits simply wrap around like an odometer. This has caused real-world disasters:

- The Ariane 5 rocket (1996) self-destructed after a conversion overflowed a 16-bit integer in its guidance software.
- YouTube's 32-bit view counter overflowed when "Gangnam Style" passed 2,147,483,647 views (2014), forcing a move to 64-bit counters.
Most languages do not raise errors on integer overflow—the result simply wraps around. This silent failure makes overflow bugs particularly dangerous. Always consider: can my values exceed the integer range? If so, use larger integer types or add explicit bounds checking.
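To make the wraparound and the bounds check concrete, here is a minimal C sketch. It relies on the GCC/Clang builtin `__builtin_add_overflow`, which is compiler-specific rather than standard C—an assumption worth noting:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    int32_t max = INT32_MAX;   // 2,147,483,647
    int32_t sum;

    // The builtin stores the wrapped result and returns true on overflow,
    // avoiding the undefined behavior of a raw signed addition in C.
    if (__builtin_add_overflow(max, 1, &sum)) {
        printf("overflow! wrapped value: %d\n", sum);   // -2147483648
    } else {
        printf("sum: %d\n", sum);
    }
    return 0;
}
```

Languages without such builtins require an explicit pre-check (e.g., `a > MAX - b` before adding) or a wider intermediate type.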
Integers in DSA Context:
Integers are central to data structures and algorithms:

- Array indices and loop counters
- Sizes, capacities, and element counts maintained by data structures
- Hash codes and modular arithmetic in hash tables
- Node and edge identifiers in graphs
When analyzing algorithm complexity, we typically assume integer operations are O(1)—a single CPU instruction. This assumption holds for fixed-size integers but breaks for arbitrary-precision integers (Python's int can grow without bound, but large-number operations become O(n) in the number of digits).
While integers handle discrete values perfectly, the real world includes measurements: temperatures, distances, velocities, probabilities. These continuous quantities are represented by floating-point numbers—values with fractional parts and the ability to represent very large or very small numbers.
What Floating-Point Represents:
Floating-point numbers model measurements and continuous quantities:

- Physical measurements: temperature, distance, mass, velocity
- Ratios and probabilities: a 0.75 conversion rate, a 0.001 error probability
- Scientific values spanning huge ranges, from subatomic scales to astronomical ones
The IEEE 754 Standard:
Nearly all modern computers use the IEEE 754 standard for floating-point representation. The key insight is scientific notation in binary:
value = sign × mantissa × 2^exponent
Example: the number 5.75
5.75 = 1.4375 × 2²
     = +1 × 1.0111 (binary mantissa) × 2^(10) (binary exponent: 10₂ = 2)
The floating-point number is divided into three parts:

- Sign: 1 bit (0 = positive, 1 = negative)
- Exponent: stored with a bias so that negative exponents need no sign bit of their own
- Mantissa (significand): the fraction bits, with an implicit leading 1 for normalized values
| Type | Total Bits | Exponent | Mantissa | Decimal Precision |
|---|---|---|---|---|
| float16 (half) | 16 | 5 bits | 10 bits | ~3 digits |
| float32 (single) | 32 | 8 bits | 23 bits | ~7 digits |
| float64 (double) | 64 | 11 bits | 52 bits | ~15-16 digits |
| float128 (quad) | 128 | 15 bits | 112 bits | ~34 digits |
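To make the three parts concrete, here is a small C sketch (an illustration added here, not part of the standard's text) that extracts the sign, exponent, and mantissa fields of a float64 with bit masks:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    double x = 5.75;
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);                 // reinterpret the 64 bits safely

    uint64_t sign     = bits >> 63;                 // 1 sign bit
    uint64_t exponent = (bits >> 52) & 0x7FF;       // 11 exponent bits, biased by 1023
    uint64_t mantissa = bits & 0xFFFFFFFFFFFFFULL;  // 52 fraction bits (implicit leading 1)

    // For 5.75: sign=0, exponent=1025 (unbiased: 2), mantissa=0x7000000000000 (0111 then zeros)
    printf("sign=%llu exponent=%llu (unbiased %lld) mantissa=0x%llx\n",
           (unsigned long long)sign, (unsigned long long)exponent,
           (long long)exponent - 1023, (unsigned long long)mantissa);
    return 0;
}
```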
The Critical Issue: Precision Errors
Unlike integers, floating-point numbers are approximations. Most decimal fractions cannot be exactly represented in binary floating-point. This leads to famous surprises:
0.1 + 0.2 = 0.30000000000000004 // Not exactly 0.3!
Why does this happen?
0.1 in decimal is a repeating fraction in binary: 0.1₁₀ = 0.0001100110011001100... (repeating forever)
Since the mantissa has finite bits (52 for double), it must truncate. The representation is close to 0.1, but not exactly 0.1. When you add two such approximations, errors accumulate.
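You can observe the truncation directly by printing more digits than the default. A minimal C sketch (the exact trailing digits shown in the comments reflect the stored binary values on an IEEE 754 system):

```c
#include <stdio.h>

int main(void) {
    printf("%.20f\n", 0.1);        // 0.10000000000000000555
    printf("%.20f\n", 0.1 + 0.2);  // 0.30000000000000004441
    printf("%.20f\n", 0.3);        // 0.29999999999999998890
    return 0;
}
```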
Practical Implications:
Never compare floats with ==:
if (x == 0.3) // DANGEROUS
if (Math.abs(x - 0.3) < 0.0001) // BETTER
Accumulation errors grow: Summing millions of small floats can produce significant drift from the true total.
Order matters: (a + b) + c may not equal a + (b + c) due to rounding at each step.
Don't use floats for money: Use integers (cents), fixed-point arithmetic, or decimal types.
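As a sketch of tolerance-based comparison, one common scheme combines an absolute threshold (for values near zero) with a relative one (for large magnitudes). The function name and thresholds here are illustrative choices, not universal constants:

```c
#include <math.h>
#include <stdbool.h>

// Combined tolerance: absolute for values near zero, relative for
// large magnitudes. The thresholds are problem-specific choices.
bool nearly_equal(double a, double b, double abs_eps, double rel_eps) {
    double diff = fabs(a - b);
    if (diff <= abs_eps) return true;                 // near-zero case
    return diff <= rel_eps * fmax(fabs(a), fabs(b));  // scales with magnitude
}

// Usage: nearly_equal(0.1 + 0.2, 0.3, 1e-12, 1e-9) returns true.
```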
If someone claims their language doesn't have floating-point issues, ask them to print 0.1 + 0.2. Nearly every language shows something other than 0.3 (Python: 0.30000000000000004, JavaScript: same). This isn't a bug—it's fundamental to binary floating-point. Decimal types or arbitrary-precision libraries solve this when exactness matters.
Special Values:
IEEE 754 defines special floating-point values:
- NaN (Not a Number): result of invalid operations such as 0/0 or √-1
- Infinities (+Inf, -Inf): result of overflow or division by zero
- Signed zeros (+0, -0): yes, there are two zeros (mostly equivalent)

These special values propagate: NaN + anything = NaN, Inf - Inf = NaN. This allows computations to continue without exceptions, but checking for NaN is important.
Floating-Point in DSA Context:

- Geometric algorithms: distances, angles, intersection tests
- Graph algorithms with real-valued edge weights
- Probabilistic structures and randomized algorithms
- Numerical stability: because rounding occurs at each step, the order of summation can change results, so accumulation strategy matters for large datasets
Characters are the primitive representation of individual textual symbols. While humans think of characters as letters, digits, and punctuation, computers represent them as numeric codes—integer values that map to specific symbols according to encoding standards.
What Characters Represent:
A character represents a single symbolic unit:

- Letters: 'A', 'z', 'é', '中'
- Digits: '0' through '9'
- Punctuation and symbols: '!', '@', '€'
- Whitespace and control characters: ' ', '\n', '\t'
The Key Insight: Characters Are Numbers
Under the hood, a character is just an integer with a specific interpretation:
'A' = 65 (in ASCII/UTF-8)
'a' = 97
'0' = 48 (not zero!)
' ' = 32
This numeric nature enables operations:
- 'A' + 1 = 'B' (move to next letter)
- '7' - '0' = 7 (convert digit character to integer)
- 'a' < 'z' (alphabetical comparison is numeric comparison)

| Encoding | Size | Coverage | Key Points |
|---|---|---|---|
| ASCII | 7 bits (1 byte) | 128 characters | English letters, digits, punctuation only |
| Extended ASCII | 8 bits (1 byte) | 256 characters | Various regional extensions |
| UTF-8 | 1-4 bytes | All Unicode (~150K chars) | Variable width, ASCII-compatible, dominant on web |
| UTF-16 | 2-4 bytes | All Unicode | 2 bytes for BMP chars, surrogate pairs (4 bytes) for the rest; used by Java/Windows |
| UTF-32 | 4 bytes | All Unicode | Fixed width, simple but space-inefficient |
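As a concrete illustration of the character-as-integer arithmetic listed above, a minimal C sketch (assuming an ASCII-compatible encoding):

```c
#include <stdio.h>

int main(void) {
    char next     = 'A' + 1;    // 'B': code 65 + 1 = 66
    int  digit    = '7' - '0';  // 7: code 55 - code 48
    int  in_order = 'a' < 'z';  // 1 (true): 97 < 122

    printf("%c %d %d\n", next, digit, in_order);  // prints: B 7 1
    return 0;
}
```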
Unicode: The Universal Standard
Historically, character encoding was chaotic—different encodings for different languages, incompatible representations. Unicode solved this by assigning a unique code point to every character in every writing system:
U+0041 = 'A' (Latin capital A)
U+03B1 = 'α' (Greek small alpha)
U+4E2D = '中' (Chinese: middle)
U+1F600 = '😀' (grinning face emoji)
Unicode defines what the code points are. Encodings (UTF-8, UTF-16) define how those code points are stored as bytes.
UTF-8: The Dominant Encoding
UTF-8 is the most common encoding on the web and in modern systems:

- ASCII-compatible: every ASCII file is already valid UTF-8
- Variable width: 1 byte for ASCII, up to 4 bytes for rare characters and emoji
- Self-synchronizing: a decoder can find the start of the next character from any byte
- No byte-order ambiguity, unlike UTF-16/UTF-32
Critical Issue: Variable Width
In a UTF-8 string, characters have different sizes:

- ASCII characters (A, 7, !): 1 byte
- Most Latin accented, Greek, Cyrillic, Hebrew, and Arabic characters: 2 bytes
- Most CJK (Chinese, Japanese, Korean) characters: 3 bytes
- Emoji and historic scripts: 4 bytes
This means:

- Byte length ≠ character count
- Finding "the nth character" requires scanning from the start (O(n))
- Slicing at an arbitrary byte offset can split a character and corrupt the string
What a human perceives as one character may be multiple code points. The flag 🇺🇸 is two code points (regional indicators U and S). The emoji 👨👩👧👦 (family) is 7 code points! This is why 'character count' is surprisingly complex in modern text processing.
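A sketch of one consequence: counting code points in a UTF-8 string means skipping continuation bytes, which always have the bit pattern 10xxxxxx. The helper below is illustrative; it assumes well-formed UTF-8 and counts code points, not the grapheme clusters discussed above:

```c
#include <stddef.h>
#include <stdio.h>

// Count code points by skipping continuation bytes (10xxxxxx, i.e. 0x80-0xBF).
size_t utf8_codepoints(const char *s) {
    size_t count = 0;
    for (; *s; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) count++;  // lead byte or ASCII
    }
    return count;
}

int main(void) {
    const char *word = "héllo";   // 'é' takes 2 bytes (assumes a UTF-8 source file)
    printf("%zu code points\n", utf8_codepoints(word));   // prints: 5 code points
    return 0;
}
```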
Character Primitives in Different Languages:
- C/C++: char is 1 byte. For Unicode, use wchar_t or UTF-8 string libraries.
- Java: char is 2 bytes (UTF-16 code unit). Characters outside the BMP (like emoji) require surrogate pairs.
- Rust: char is a 4-byte Unicode scalar value (any code point).
- Go: rune is an alias for int32, representing a Unicode code point.

Characters in DSA Context:
For most algorithmic problems, we can treat characters as small integers (0-127 for ASCII problems), which simplifies analysis. But production code must handle the full complexity of Unicode.
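A sketch of this idea: the classic frequency-count pattern indexes an array directly by character code (ASCII input assumed; the variable names are illustrative):

```c
#include <stdio.h>

// Frequency counting: the character code is the array index.
int main(void) {
    const char *text = "hello world";
    int freq[128] = {0};   // one slot per ASCII code

    for (const char *p = text; *p; p++)
        freq[(unsigned char)*p]++;         // the char is used directly as an integer

    for (int c = 'a'; c <= 'z'; c++)
        if (freq[c]) printf("%c: %d\n", c, freq[c]);
    return 0;
}
```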
Booleans are the simplest primitive type conceptually: they can hold exactly two values, true or false. Yet their role in computation is enormous—booleans are the foundation of all control flow, conditional execution, and logical reasoning.
What Booleans Represent:
Booleans represent logical states and conditions:
- State flags: isEmpty, hasError, exists
- Comparison results: x > y, name == "Alice"
- Status indicators: isActive, isDone, shouldRetry
- Permissions: canEdit, isAdmin
- Connection states: connected, authenticated

Any question that has a yes/no answer is naturally represented as a boolean.
The Boolean Origin:
Named after George Boole (1815-1864), who formalized algebraic logic. Boolean algebra—the mathematics of true/false values—forms the theoretical foundation of digital computing. Every digital circuit operates on boolean signals (high voltage = 1 = true, low voltage = 0 = false).
Boolean Operations:
Booleans support three fundamental logical operations:
AND (&&): True only if both operands are true
true AND true = true
true AND false = false
false AND anything = false
OR (||): True if at least one operand is true
true OR anything = true
false OR true = true
false OR false = false
NOT (!): Inverts the value
NOT true = false
NOT false = true
The Representation Paradox:
A boolean requires only 1 bit of information (two states = one bit). Yet in most systems, booleans occupy a full byte (8 bits) or even more:
- C/C++: bool is typically 1 byte
- Java: boolean is 1 byte in arrays (single values may occupy 4 bytes on the JVM stack)
- Python: bool is a full object with reference overhead

Why not use single bits?
Memory Addressing: Most architectures address memory by bytes, not bits. Accessing a single bit requires fetching a byte and masking.
Alignment: CPU operations are fastest on aligned data. Single bits don't align to natural boundaries.
Atomicity: Languages need atomic read/write for thread safety. Bit-level operations on shared memory are complex.
Simplicity: Using a byte simplifies compiler and runtime implementation.
When boolean packing is critical (huge arrays of flags), languages offer bit vectors or bitsets—data structures that pack 8 booleans per byte. But the primitive type itself uses a full byte.
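A minimal bit-vector sketch in C (illustrative; N and the helper names are invented for this example): 1000 flags fit in 125 bytes, at the cost of a shift and mask per access.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define N 1000
static uint8_t flags[(N + 7) / 8];   // 125 bytes instead of 1000

static void set_bit(size_t i)   { flags[i / 8] |= (uint8_t)(1u << (i % 8)); }
static void clear_bit(size_t i) { flags[i / 8] &= (uint8_t)~(1u << (i % 8)); }
static bool get_bit(size_t i)   { return (flags[i / 8] >> (i % 8)) & 1u; }
```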
Truthy and Falsy:
Many languages extend boolean logic to non-boolean types:
// JavaScript
if (0) // falsy (0 is false)
if ("") // falsy (empty string is false)
if (null) // falsy
if (undefined) // falsy
if ([]) // truthy in JS (careful!)
# Python
if 0:   # falsy
if []:  # falsy (empty list)
if {}:  # falsy (empty dict)
This convenience can cause bugs. Explicit boolean comparisons are safer when clarity matters.
Most languages use short-circuit evaluation for boolean expressions: a && b doesn't evaluate b if a is false; a || b doesn't evaluate b if a is true. This is not just optimization—it enables idioms like if (obj != null && obj.isValid()) where the second check would crash if obj were null.
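A minimal C sketch of that null-check idiom (the Object type here is invented for illustration):

```c
#include <stdbool.h>
#include <stdio.h>

typedef struct { bool valid; } Object;   // stand-in type for illustration

bool is_valid(const Object *obj) {
    // obj->valid is evaluated only when obj != NULL,
    // because && stops at the first false operand.
    return obj != NULL && obj->valid;
}

int main(void) {
    Object o = { true };
    printf("%d %d\n", is_valid(&o), is_valid(NULL));  // prints: 1 0
    return 0;
}
```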
Booleans in DSA Context:
- Visited arrays: visited[i] tracks whether node i has been processed (O(1) lookup)
- Flags: found, done, swapped control algorithm behavior
- Predicates: isEven(n), isEmpty(list) return boolean results
- Comparisons: a < b produces a boolean, determining sort order

Example: Using Booleans Efficiently
#include <stdbool.h>

// Checking if a number is prime
bool isPrime(int n) {
if (n < 2) return false; // Boolean result
for (int i = 2; i * i <= n; i++) {
if (n % i == 0) return false; // Early exit
}
return true;
}
// Using boolean result
bool primes[1000]; // 1000 bytes (or 125 bytes if packed)
for (int i = 0; i < 1000; i++) {
primes[i] = isPrime(i); // Store boolean result
}
Booleans enable clean, readable code. The isPrime function's return type tells you exactly what kind of answer to expect—not an integer code, not a string, but a true/false answer.
With all four primitive types examined, let's consolidate by comparing their characteristics, trade-offs, and appropriate use cases.
| Aspect | Integer | Floating-Point | Character | Boolean |
|---|---|---|---|---|
| Purpose | Discrete counts, indices | Continuous measurements | Textual symbols | Logical conditions |
| Typical Size | 4-8 bytes | 4-8 bytes | 1-4 bytes | 1 byte |
| Exactness | Exact within range | Approximate | Exact (for given encoding) | Exact |
| Primary Risk | Overflow | Precision loss | Encoding issues | Type coercion |
| Comparison | Exact equality safe | Use epsilon tolerance | Depends on encoding/locale | Exact equality safe |
| Zero Value | 0 | 0.0 (and -0.0) | '\0' (null character) | false |
| Special Values | None standard | NaN, ±Infinity | Invalid code points | None |
Choosing the Right Primitive:
Use Integers When:

- Counting discrete items (elements, iterations, occurrences)
- Indexing arrays and other positional structures
- Exact arithmetic is required within a known range
- Representing IDs, codes, or flags combined via bit operations

Use Floating-Point When:

- Representing measurements and continuous quantities
- Working with very large or very small magnitudes
- Small, controlled approximation errors are acceptable

Use Characters When:

- Processing individual textual symbols
- Building or parsing strings
- Mapping symbols to codes (and back) in text algorithms

Use Booleans When:

- Expressing yes/no conditions and state flags
- Returning the result of a predicate or comparison
- Controlling branches and loops
Avoid These Mistakes:

- Using floating-point for money or anything requiring exact decimal arithmetic
- Comparing floats with ==
- Ignoring possible overflow when values approach type limits
- Assuming one character equals one byte
- Relying on implicit truthiness when an explicit boolean is clearer
Selecting the right primitive is about matching the data's nature to the type's semantics. Ask: Is this discrete or continuous? Countable or measurable? Textual or numeric? Binary or multi-valued? The answer points to the appropriate primitive.
Each primitive type has edge cases that trip up even experienced programmers. Knowing these in advance prevents painful debugging sessions.
Integer Gotchas:
- Midpoint overflow: (a + b) / 2 can overflow even if the result fits. Use a + (b - a) / 2 instead.
- Signed/unsigned mixing: -1 > 1u is true in C because -1 becomes a large unsigned value.
- Truncated division: -7 / 2 = -3, not -4. This affects algorithms assuming floor division.
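A sketch of the safe-midpoint fix in C, with values chosen so the naive formula would overflow (this bug famously lurked in binary-search implementations):

```c
#include <stdio.h>

int main(void) {
    int lo = 2000000000, hi = 2100000000;   // both fit in a 32-bit int

    // Naive (lo + hi) / 2 would compute lo + hi = 4,100,000,000 first,
    // which exceeds INT_MAX and is undefined behavior in C.
    int mid = lo + (hi - lo) / 2;           // difference first: no overflow

    printf("%d\n", mid);                    // prints: 2050000000
    return 0;
}
```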
Floating-Point Gotchas:

- Equality: x == 0.3 fails when x = 0.1 + 0.2. Always use epsilon tolerance for equality.
- Large-number absorption: 1e16 + 1 == 1e16 is true in double precision.
- NaN weirdness: NaN != NaN is true. Any operation with NaN produces NaN. Check explicitly with isNaN().
- Signed zero: -0.0 == 0.0 is true, but 1 / -0.0 = -Infinity while 1 / 0.0 = +Infinity.
Character Gotchas:

- Byte length vs. character count: strlen("你好") returns 6 (bytes), not 2 (characters).
- Character vs. integer arithmetic: '9' - '0' gives 9, but '9' - 0 gives 57 (the ASCII code of '9').
Boolean Gotchas:

- Truthiness surprises: in JavaScript, [] == false but if ([]) is truthy. Avoid implicit coercion.
- Assignment vs. comparison: if (x = true) is assignment, always true. Use if (x == true) or just if (x).
- Short-circuit side effects: in a || b(), b() is only called if a is false. Don't put required side effects in short-circuited expressions.

The best defense is awareness plus testing. Write unit tests for edge cases: MIN_INT, MAX_INT, NaN, empty strings, boundary values. Static analysis tools catch many of these issues at compile time. Code review catches what tools miss.
Understanding primitives isn't just academic—it directly impacts how real-world systems are designed and how problems are solved in production.
Case Study 1: Database Column Types
Database systems carefully choose primitive storage:
CREATE TABLE users (
id INT PRIMARY KEY, -- Integer: unique identifier
name VARCHAR(100), -- Characters: variable-length text
balance DECIMAL(10, 2), -- Fixed-point: exact monetary values
temperature FLOAT, -- Floating-point: sensor data (approximate ok)
is_active BOOLEAN, -- Boolean: status flag
created_at TIMESTAMP -- Integer internally: seconds since epoch
);
Wrong type choices cause problems:
- FLOAT for balance causes rounding errors in financial records
- INT for temperature loses precision
- VARCHAR for is_active wastes space and requires string comparison

Case Study 2: Network Protocol Design
Network protocols define exact primitive representations:
IP Header (IPv4):
- Version: 4 bits
- Header Length: 4 bits
- Total Length: 16-bit unsigned integer
- TTL: 8-bit unsigned integer
- Source IP: 32-bit unsigned integer
- Checksum: 16-bit unsigned integer
Every bit is specified because interoperability requires exact agreement on primitive representations.
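A sketch of how such a specification translates to code: extracting a few IPv4 fields from a raw buffer with explicit shifts, which sidesteps implementation-defined bitfield layout and handles network byte order (the helper function is illustrative):

```c
#include <stdint.h>
#include <stdio.h>

// Illustrative: pull a few IPv4 header fields out of a raw packet buffer.
// Multi-byte fields arrive big-endian (network byte order).
void print_ipv4_fields(const uint8_t *h) {
    uint8_t  version   = h[0] >> 4;                      // top 4 bits of byte 0
    uint8_t  ihl       = h[0] & 0x0F;                    // header length in 32-bit words
    uint16_t total_len = (uint16_t)((h[2] << 8) | h[3]); // 16-bit unsigned
    uint8_t  ttl       = h[8];                           // 8-bit unsigned
    uint32_t src_ip    = ((uint32_t)h[12] << 24) | ((uint32_t)h[13] << 16)
                       | ((uint32_t)h[14] << 8)  |  (uint32_t)h[15];

    printf("v%u ihl=%u len=%u ttl=%u src=0x%08X\n",
           (unsigned)version, (unsigned)ihl, (unsigned)total_len,
           (unsigned)ttl, (unsigned)src_ip);
}
```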
Case Study 3: Performance-Critical Code
High-performance systems make deliberate primitive choices:
// Video game physics engine
typedef float real_t; // Use 32-bit float for performance (vs. 64-bit double)
struct Vector3 {
real_t x, y, z; // 12 bytes, fits in SIMD registers
};
// Process thousands of vectors per frame
for (int i = 0; i < count; i++) {
positions[i].x += velocities[i].x * dt;
// ... SIMD-vectorizable because primitives have fixed size
}
Using float instead of double halves memory bandwidth and enables better SIMD optimization—crucial for 60fps performance.
Case Study 4: Cryptography
Cryptographic code is extremely sensitive to primitive behavior:
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t byte;  // unsigned 8-bit integer

// Constant-time comparison to prevent timing attacks
bool constant_time_compare(const byte* a, const byte* b, size_t len) {
byte result = 0;
for (size_t i = 0; i < len; i++) {
result |= a[i] ^ b[i]; // XOR bytes, OR accumulates any difference
}
return result == 0; // True only if all bytes matched
}
This code uses byte (unsigned 8-bit integer) primitives carefully:

- XOR yields a nonzero byte exactly where a[i] and b[i] differ
- OR-ing into result accumulates any difference without branching
- The loop always runs the full length, so execution time doesn't reveal where the first mismatch occurred
Case Study 5: Memory-Constrained Systems
Embedded systems pack primitives tightly:
// Bit-packed sensor reading (4 bytes instead of 12)
typedef struct {
unsigned temp : 10; // 0-1023, enough for -50 to 150°C at 0.2° resolution
unsigned humidity : 7; // 0-100%
unsigned battery : 5; // 0-31, enough for battery level
unsigned reserved : 10;
} SensorReading;
Understanding that primitives are ultimately bit patterns enables space-efficient designs for IoT and embedded applications.
From databases to networks, from games to cryptography, from embedded systems to cloud services—primitive type decisions affect correctness, performance, and security. This isn't low-level esoterica; it's foundational engineering knowledge.
We have examined each of the four primary primitive types in depth. Let's consolidate the essential knowledge:

- Integers are exact within a fixed range; the danger is silent overflow.
- Floating-point trades exactness for range; compare with tolerance, never ==.
- Characters are integers interpreted through an encoding; Unicode and UTF-8 make "one character" a subtler concept than it appears.
- Booleans carry one bit of information but typically occupy a byte; they drive all control flow.
What's Next:
Now that we've surveyed the specific primitive types, we'll examine their fundamental limitations. The next page explores why primitives alone are insufficient for complex problems—setting the stage for understanding why non-primitive data structures exist and what gaps they fill.
You now have a comprehensive understanding of integers, floating-point numbers, characters, and booleans—their purposes, representations, and critical considerations. Next, we'll explore the limitations that drive the need for more complex data structures.