Imagine you're designing a system to represent non-integer numbers. You have a fixed budget of 32 bits—no more, no less. How do you allocate those bits to represent values like 3.14159, 0.0001, or 1,234,567.89?
This seemingly simple question has two fundamentally different answers, each with profound implications for what you can represent, how precisely you can represent it, and which applications your system serves well.
Fixed-point representation says: "Let's agree that the decimal point always sits in the same position. Some bits represent the integer part, other bits represent the fractional part, and this never changes."
Floating-point representation says: "Let's allow the decimal point to move. We'll use some bits to describe where the point sits, and the remaining bits to describe the significant digits."
This page explores these two philosophies in depth, developing the intuition that will serve you throughout your career as an engineer.
By the end of this page, you will understand the core difference between fixed-point and floating-point representation, be able to identify when each approach is appropriate, develop visual intuition for how the 'floating' radix point works, and recognize the engineering tradeoffs that led to floating-point dominance in general-purpose computing.
Fixed-point representation is the simpler and more intuitive approach. The core idea is straightforward: decide in advance where the radix point (decimal point) sits, and keep it there forever.
A Concrete Example: Q16.16 Format
Consider a 32-bit fixed-point format called "Q16.16":
With 16 bits for the integer part, we can represent integers from 0 to 65,535 (unsigned) or −32,768 to 32,767 (signed).
With 16 bits for the fractional part, our smallest representable fraction is 1/65,536 ≈ 0.0000153—about 4.5 decimal places of precision.
How Values Are Encoded
In Q16.16, to store the value 3.5, we multiply by the scale factor 2¹⁶ = 65,536 and keep the result as an integer: 3.5 × 65,536 = 229,376. The upper 16 bits hold the integer part (3) and the lower 16 bits hold the fraction (0.5 × 65,536 = 32,768).
Notice the key insight: fixed-point numbers are stored as integers, with an implicit scale factor. To convert a fixed-point value back to a real number, you divide by the scale factor (65,536 for Q16.16).
| Real Value | Integer Part | Fractional Part (in units of 1/65,536) | Stored Integer |
|---|---|---|---|
| 0.0 | 0 | 0 | 0 |
| 1.0 | 1 | 0 | 65,536 |
| 3.5 | 3 | 32,768 | 229,376 |
| −1.25 | −2 | 49,152 | −81,920 |
| 255.99998 | 255 | 65,535 | 16,777,215 |
| 0.00001526 | 0 | 1 | 1 |
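As a sanity check, here is a minimal Python sketch (an illustration for this page, not a library API) that encodes and decodes Q16.16 values using round-to-nearest, reproducing the stored integers in the table above.

```python
FRAC_BITS = 16
SCALE = 1 << FRAC_BITS  # 65,536 for Q16.16

def to_q16_16(x: float) -> int:
    """Encode a real number as a Q16.16 stored integer (round to nearest)."""
    return round(x * SCALE)

def from_q16_16(n: int) -> float:
    """Decode a Q16.16 stored integer back to a real number."""
    return n / SCALE

for value in (0.0, 1.0, 3.5, -1.25, 255.99998, 0.00001526):
    stored = to_q16_16(value)
    print(f"{value:>12} -> stored {stored:>10} -> decoded {from_q16_16(stored)}")
```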
Arithmetic in Fixed-Point
One of fixed-point's greatest strengths is that arithmetic uses ordinary integer operations:
Addition/Subtraction: Directly add or subtract the stored integers. If A = 3.5 (stored as 229,376) and B = 1.25 (stored as 81,920), then A + B = 229,376 + 81,920 = 311,296, which represents 4.75.
Multiplication: Multiply the integers, then shift right by the number of fractional bits (the raw product carries the scale factor twice, so one copy must be divided back out). If A × B = 229,376 × 81,920 = 18,790,481,920, then shifting right by 16 bits gives 18,790,481,920 >> 16 = 286,720, which represents 4.375 (= 3.5 × 1.25).
This integer-based arithmetic is why fixed-point thrived in early computing and remains popular in resource-constrained environments.
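A minimal sketch of these two operations in Python, assuming the same Q16.16 convention used above (a real implementation in C would do the same thing with 32- and 64-bit integers):

```python
FRAC_BITS = 16
SCALE = 1 << FRAC_BITS

def q_add(a: int, b: int) -> int:
    """Q16.16 addition is ordinary integer addition."""
    return a + b

def q_mul(a: int, b: int) -> int:
    """Q16.16 multiplication: the raw product is scaled by SCALE twice,
    so shift one factor of SCALE back out."""
    return (a * b) >> FRAC_BITS

A = 229_376  # 3.5  in Q16.16
B = 81_920   # 1.25 in Q16.16
print(q_add(A, B), q_add(A, B) / SCALE)  # 311296  4.75
print(q_mul(A, B), q_mul(A, B) / SCALE)  # 286720  4.375
```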
Fixed-point is often described as 'integers with an implicit decimal point.' This is accurate but incomplete. The more useful mental model is: 'integers with a hidden multiplication by 1/scale.' Every fixed-point value is conceptually multiplied by 2^(−fractional_bits) to yield its real value.
Fixed-point representation works beautifully when your values occupy a narrow, predictable range. But the real world often presents values spanning many orders of magnitude—and here, fixed-point struggles.
The Dynamic Range Problem
Consider our Q16.16 format again: the entire signed range is roughly −32,768 to +32,767, and the finest representable step is about 0.0000153.
This might seem adequate until you consider actual scientific and engineering requirements: astronomical distances run to around 10²⁶ meters, atomic radii sit near 10⁻¹⁰ meters, and neither extreme comes anywhere close to fitting in a ±32,768 range with fixed 0.0000153 steps.
The Format Proliferation Problem
Faced with varied requirements, fixed-point users must choose between:
Use different formats for different applications — Q8.24 for audio (more fractional precision), Q24.8 for graphics (more integer range), Q1.31 for normalized values, etc.
Accept precision loss — Use a compromise format that's suboptimal for all use cases.
Use wider representations — 64-bit or 128-bit fixed-point, sacrificing memory and performance.
None of these is satisfying. Different formats mean conversion overhead and complexity. Compromise means mediocrity. Wider representations mean cost.
The Precision Distribution Problem
Fixed-point allocates precision uniformly across its range. Near 65,535, the gap between adjacent values is 0.0000153—the smallest fractional step. Near 0.001, that same gap still applies.
But consider: do we really need 0.0000153 precision when representing 50,000? That's a relative precision of about three parts in ten billion (roughly 0.00000003%)—overkill for most applications.
Conversely, when representing 0.0001, a step of 0.0000153 is 15% of the value—far too coarse for many purposes.
This mismatch reveals fixed-point's fundamental limitation: it distributes precision uniformly in absolute terms, but real-world requirements are often relative to magnitude.
In fixed-point, once you choose the format, your range-precision tradeoff is fixed forever. Need more fractional precision? You lose integer range. Need larger integers? You sacrifice fractional bits. Floating-point breaks this constraint by making the tradeoff dynamic.
Floating-point representation solves fixed-point's limitations through a revolutionary idea: let the radix point move.
Instead of committing bits permanently to integer vs. fractional parts, floating-point stores three things: a sign, a set of significant digits (the significand), and an exponent that records where the radix point sits relative to those digits.
The exponent effectively says, "Take these significant digits and shift them left or right by this many positions." This "floating" of the radix point gives the representation its name.
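You can see this decomposition on real doubles with Python's math.frexp, which splits a float into a significand and a power-of-two exponent. (Its significand is normalized to [0.5, 1) rather than IEEE 754's [1, 2), a minor difference in convention; the "shift by the exponent" idea is the same.)

```python
import math

for x in (0.0001234, 1.234, 1234.0, 1.234e20):
    sig, exp = math.frexp(x)  # x == sig * 2**exp, with 0.5 <= sig < 1
    print(f"{x:>12g} = {sig:.10f} * 2**{exp}")
```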
How Floating-Point Adapts to Magnitude
Consider representing these values in a hypothetical 32-bit format:
| Value | Significand (approx) | Exponent | Precision Available |
|---|---|---|---|
| 0.000001234 | 1.234 | −6 (×10⁻⁶) | 7 significant digits |
| 1.234 | 1.234 | 0 (×10⁰) | 7 significant digits |
| 1234.0 | 1.234 | +3 (×10³) | 7 significant digits |
| 1234000.0 | 1.234 | +6 (×10⁶) | 7 significant digits |
| 1.234 × 10²⁰ | 1.234 | +20 (×10²⁰) | 7 significant digits |
The Key Insight: Constant Relative Precision
Notice that regardless of magnitude, we maintain ~7 significant digits of precision. The precision is relative to the value's magnitude, not absolute.
For 0.000001234: seven significant digits means adjacent representable values differ by roughly 10⁻¹², an absolute precision far finer than a general-purpose fixed-point format could afford.
For 1,234,000: the same seven significant digits means adjacent representable values differ by roughly 1, which is coarse in absolute terms but just as fine relative to the value.
This relative precision model matches how physical measurements and scientific quantities actually work. When measuring distances, ±1 meter error is acceptable for kilometer scales but catastrophic for millimeter scales. Relative precision naturally captures this.
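One way to observe constant relative precision on real hardware is to round values of different magnitudes to 32-bit floats and inspect the error: the absolute error grows with magnitude, while the relative error stays near 10⁻⁷. A rough sketch using only the standard library:

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float and back."""
    return struct.unpack('f', struct.pack('f', x))[0]

for x in (0.0000012345678, 1.2345678, 1234.5678, 1234567.8):
    y = to_float32(x)
    abs_err = abs(y - x)
    print(f"{x:>16g}: absolute error {abs_err:.3g}, relative error {abs_err / x:.3g}")
```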
Visualizing the Floating Radix Point
Imagine a sliding window that reveals 24 binary digits (the significand in IEEE 754 single precision). The exponent controls where this window sits on the number line: a large positive exponent slides the window out toward huge magnitudes, while a large negative exponent slides it down toward tiny fractions.
This sliding window metaphor explains both the power and the limitations of floating-point. You always get the same precision in significant digits, but absolute precision varies with magnitude.
Floating-point can be understood as representing numbers on a logarithmic scale. The exponent captures the order of magnitude (like the logarithm), while the significand provides precision within that order of magnitude. This logarithmic nature is why floating-point spans such enormous ranges efficiently.
The difference between fixed-point and floating-point becomes strikingly clear when we visualize how they distribute representable values across the number line.
Fixed-Point: Uniform Distribution
In fixed-point representation, representable values are uniformly spaced along the number line. Using our Q16.16 example:
|          |          |          |          |          ...          |             |             |
0    0.0000153   0.0000305   0.0000458   0.0000610   ...    65535.99995   65535.99997   65535.99998
Every adjacent pair of values is separated by exactly 1/65,536 ≈ 0.0000153. This uniform spacing means absolute precision is identical everywhere in the range: generous near the top, but far too coarse, relative to the value, near zero.
Floating-Point: Logarithmic Distribution
In floating-point representation, representable values become denser near zero and sparser as magnitude increases:
||||||||||||| | | | | | | |
0 0.001 0.01 0.1 1 10 100 1000
(dense) (moderate) (sparse)
In IEEE 754 single precision, there are exactly 2²³ ≈ 8.4 million representable values between 1 and 2. Between 1,000,000 and 2,000,000 there are again exactly 2²³ representable values—but now spread over a million units instead of one.
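You can check this for 32-bit floats by reinterpreting their bit patterns as integers: adjacent representable floats have adjacent bit patterns, so subtracting the patterns counts the values between two endpoints. A sketch, assuming IEEE 754 single precision:

```python
import struct

def float32_bits(x: float) -> int:
    """Bit pattern of x rounded to a 32-bit float, viewed as an unsigned integer."""
    return struct.unpack('<I', struct.pack('<f', x))[0]

def count_between(lo: float, hi: float) -> int:
    """Number of representable float32 steps from lo up to hi (both positive)."""
    return float32_bits(hi) - float32_bits(lo)

print(count_between(1.0, 2.0))                  # 8388608 (2**23)
print(count_between(1_000_000.0, 2_000_000.0))  # 8388608 again
```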
The Density Around Zero
A fascinating consequence of floating-point's design: approximately half of all representable floating-point numbers lie between −1 and +1. The other half covers the rest of the range—from ±1 to ±10³⁰⁸.
This makes intuitive sense: values near zero need the finest gradations (to distinguish 0.0001 from 0.0002 is as important as distinguishing 1,000 from 2,000 in relative terms). By concentrating representable values near zero, floating-point provides this fine gradation where it matters most.
For IEEE 754 double precision, the gap between adjacent representable values near any number x is approximately x × 2⁻⁵² ≈ x × 2.2 × 10⁻¹⁶. This is the 'machine epsilon' relative to x. Near 1.0, the gap is about 2.2 × 10⁻¹⁶. Near 10¹⁵, the gap is 0.125—still on the order of 10⁻¹⁶ relative to the value.
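Python 3.9+ exposes this gap directly as math.ulp, so the figures above are easy to verify:

```python
import math

for x in (1.0, 1000.0, 1e15):
    gap = math.ulp(x)  # distance to the next representable double above x
    print(f"gap near {x:g}: {gap:.3g}  (relative: {gap / x:.3g})")
# gap near 1 is about 2.2e-16; near 1e15 it is 0.125, still ~1e-16 relative
```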
Despite floating-point's dominance in general-purpose computing, fixed-point remains the right choice for several important domains. Understanding when to use each representation is a mark of engineering maturity.
Financial Computing
Money is universally counted in fixed units. A dollar has exactly 100 cents. An Indian rupee has exactly 100 paise. Financial regulations often mandate exact, reproducible arithmetic—no rounding surprises allowed.
Fixed-point (often implemented as integers representing the smallest unit) is standard in financial systems: balances held as integer cents or paise, prices quoted in integer ticks, and interest, tax, and currency calculations performed with explicitly specified rounding rules.
Floating-point's rounding errors are unacceptable when auditors expect accounts to balance to the penny.
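The classic illustration: adding ten 10-cent amounts in binary floating-point does not land exactly on one dollar, while integer cents (fixed-point with a scale factor of 100) are exact. A small sketch:

```python
# Binary floating-point: 0.10 has no exact binary representation.
total_float = sum([0.10] * 10)
print(total_float)          # 0.9999999999999999
print(total_float == 1.0)   # False

# Fixed-point: store amounts as integer cents (scale factor 100).
total_cents = sum([10] * 10)
print(total_cents == 100)   # True: exactly one dollar
```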
Embedded Systems and DSP
Many microcontrollers and digital signal processors lack floating-point hardware. Operations that take 1 cycle with floating-point hardware might take 100+ cycles in software emulation.
Fixed-point enables: fast arithmetic on integer-only hardware, deterministic execution timing for real-time loops, lower power consumption, and bit-exact results that reproduce across platforms.
The Key Question: Is Your Range Predictable?
The fundamental criterion for choosing fixed-point is whether your values occupy a predictable, bounded range with uniform precision requirements.
If you can predict your range and precision needs at design time, fixed-point offers simplicity and exactness. If requirements are variable or unpredictable, floating-point's flexibility is essential.
Many languages provide decimal types (Python's decimal.Decimal, Java's BigDecimal, C#'s decimal) that offer exact decimal arithmetic for financial applications. Under the hood these store a scaled decimal integer (a coefficient plus a power-of-ten scale), combining fixed-point-style exactness for decimal fractions with flexible, configurable precision.
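For instance, with Python's decimal module the ten-cent sum from earlier is exact, and the working precision is configurable:

```python
from decimal import Decimal, getcontext

getcontext().prec = 28              # significant digits for this context (28 is the default)
total = sum([Decimal("0.10")] * 10)
print(total)                        # 1.00
print(total == Decimal("1.00"))     # True: balances to the penny
```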
Floating-point is the default choice for general-purpose computing, and for good reason. Its flexibility makes it suitable for an enormous range of applications that would be impractical with fixed-point.
Scientific and Engineering Computing
Science operates across scales that fixed-point cannot practically represent: the Planck length is about 1.6 × 10⁻³⁵ meters, the observable universe spans roughly 10²⁶ meters, and a single computation may need to combine quantities from wildly different points on that scale.
Floating-point handles all these scales with consistent relative precision, using the same 64-bit representation.
Machine Learning and Statistics
ML computation involves: gradients that can shrink toward 10⁻⁸ or vanish entirely, weights and activations spanning several orders of magnitude, probabilities crowded near 0 and 1, and loss values that start large and decay toward zero during training.
No single fixed-point format could reasonably handle this range.
Hardware Support Changes the Calculus
Modern CPUs have pipelined, vectorized floating-point units that can complete one or more floating-point operations per clock cycle. GPUs perform trillions of floating-point operations per second. This hardware support makes floating-point effectively "free" from a performance standpoint for most applications.
The performance advantage that once favored fixed-point has largely evaporated on general-purpose hardware. Only in resource-constrained embedded systems or specialized hardware (FPGAs, custom ASICs) does fixed-point offer significant performance benefits.
The Default Choice
When in doubt, use floating-point. It's supported by dedicated hardware on every mainstream CPU and GPU, standardized by IEEE 754 across languages and platforms, flexible enough to cover wildly different magnitudes, and precise enough (15–16 significant decimal digits in double precision) for the vast majority of applications.
Only reach for fixed-point when you have specific requirements (exact decimal arithmetic, no floating-point hardware, deterministic cross-platform behavior) that floating-point cannot satisfy.
The overwhelming majority of non-integer numerical computing uses floating-point. The remaining uses are dominated by financial systems, embedded real-time applications, and specialized domains. For a typical software engineer, floating-point is the representation you'll encounter daily.
As computing has evolved, the binary choice between fixed-point and floating-point has been supplemented by specialized formats that combine features of both or optimize for specific workloads.
Posit Numbers
A recent alternative called "posit" (or "Type III unum") aims to provide better precision-per-bit than IEEE 754 floating-point. Posits use a variable-length exponent encoding that provides: tapered precision (more accuracy for values near 1, gradually less toward the extremes), a wide dynamic range for a given bit width, and a single exception value (NaR) in place of IEEE 754's many NaN bit patterns.
Posits remain experimental but show promise for specialized applications.
Block Floating-Point
Used in some DSP and machine learning applications, block floating-point shares a single exponent across a group of values, with each value storing only a mantissa: the block gets floating-point-like range at nearly fixed-point storage cost, but precision within the block is dictated by its largest element (see the toy sketch below).
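A toy sketch of the idea in Python (not any particular hardware format): choose one power-of-two exponent for the whole block based on its largest element, then store each element as a small integer mantissa.

```python
import math

def block_quantize(values, mantissa_bits=8):
    """Share one power-of-two exponent across the block; store each value as a small integer."""
    max_mag = max(abs(v) for v in values)
    # Pick the exponent so the largest magnitude fits within the mantissa range.
    exponent = math.ceil(math.log2(max_mag)) - (mantissa_bits - 1)
    scale = 2.0 ** exponent
    return exponent, [round(v / scale) for v in values]

def block_dequantize(exponent, mantissas):
    return [m * (2.0 ** exponent) for m in mantissas]

exp, mants = block_quantize([0.5, 3.25, -1.75, 100.0])
print(exp, mants)                   # 0 [0, 3, -2, 100]
print(block_dequantize(exp, mants)) # small values lose precision; the large one survives
```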
Logarithmic Number Systems
These systems store the logarithm of the value directly: multiplication and division reduce to addition and subtraction of the stored logarithms, while ordinary addition and subtraction become the expensive operations (illustrated in the toy example below).
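A toy illustration of that trade in Python: with values stored as base-2 logarithms, multiplication is a single addition of logs, but addition forces a round trip out of the log domain.

```python
import math

# Store (positive) values as their base-2 logarithms; sign handling is omitted in this toy.
a, b = 12.5, 3.0
la, lb = math.log2(a), math.log2(b)

print(2.0 ** (la + lb))                     # multiplication: just add the logs -> ~37.5

# Addition has no shortcut: convert back, add, then take the log again.
log_sum = math.log2(2.0 ** la + 2.0 ** lb)
print(2.0 ** log_sum)                       # ~15.5
```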
| Representation | Strengths | Weaknesses | Best For |
|---|---|---|---|
| Fixed-Point | Exact, predictable, integer-like | Limited range, inflexible | Finance, embedded, DSP |
| Floating-Point | Huge range, relative precision | Rounding errors, complex | Science, graphics, ML |
| Posit | Better precision/bit | Not standardized, new | Research, specialized HW |
| Block Float | Compact, shared exponent | Group-level precision only | ML inference, compression |
| Logarithmic | Fast multiplication | Slow addition | Multiplication-heavy DSP |
Machine Learning Specialized Formats
The AI revolution has driven development of formats optimized for neural networks: half precision (FP16), bfloat16 (which keeps float32's 8-bit exponent range but only 7 bits of mantissa), emerging 8-bit floating-point variants, and integer quantization schemes such as INT8.
These formats sacrifice precision for throughput, recognizing that neural networks are typically robust to quantization noise. The reduced precision enables more operations per second on specialized hardware.
The proliferation of numerical formats reflects a mature understanding that different applications have different needs. A graphics shader, a financial ledger, and a climate model all process numbers—but their requirements for range, precision, speed, and exactness differ dramatically. Skilled engineers choose (or design) representations matched to their specific requirements.
We've developed a deep intuition for the fundamental difference between fixed-point and floating-point representation—a distinction that underlies all numerical computing.
What's Next
Now that we understand why floating-point exists and how it differs from fixed-point alternatives, we're ready to examine the actual standard that governs its implementation: IEEE 754.
The next page provides a high-level overview of IEEE 754—how bits are organized, what the special values mean, and how rounding works—without diving into bit-level arithmetic. This conceptual understanding will prepare you to reason about floating-point behavior in your own code.
You now have deep intuition for the fixed-point versus floating-point tradeoff. Fixed-point offers simplicity and exactness within bounded ranges; floating-point offers flexibility and relative precision across enormous ranges. Understanding when to use each is a fundamental engineering skill. Next, we'll explore IEEE 754—the standard that made floating-point universal.