You've learned that static polymorphism resolves at compile time with zero runtime overhead, while dynamic polymorphism incurs vtable lookups and blocks inlining. But when does this actually matter? How much overhead are we talking about—nanoseconds or milliseconds? When should you sacrifice design elegance for speed?
These questions don't have universal answers. The impact of polymorphism overhead depends on your specific context: how hot the code path is, how small the methods are, what the alternative would be, and what your performance requirements are.
This page equips you to make informed decisions by understanding the actual costs, measuring them in your context, and applying strategies to optimize without abandoning good design.
By the end of this page, you will understand the quantitative costs of polymorphism, how to measure dispatch overhead in your systems, when polymorphism costs matter (and when they don't), and specific strategies for optimizing hot polymorphic code paths while maintaining design quality.
Let's put concrete numbers on the costs we've discussed. These figures are approximate and vary by CPU architecture, cache state, and compiler, but they provide a framework for reasoning.
Component costs of a virtual method call:
| Operation | Cost (CPU cycles) | Notes |
|---|---|---|
| Direct function call | 1-2 cycles | Baseline: call instruction with known target |
| Load vptr from object | 0-4 cycles (L1 hit) / 12+ cycles (L2) / 200+ (RAM) | Depends on cache state |
| Load function pointer from vtable | 0-4 cycles (L1 hit) / 12+ cycles (L2) | Often in cache if class frequently used |
| Indirect call (branch) | 1-5 cycles | Plus potential branch misprediction penalty |
| Branch misprediction | 10-20 cycles | If CPU predicted wrong target |
| Lost inlining opportunity | Varies greatly | Prevents further optimizations |
Best case scenario (everything cached, correctly predicted): ~3-7 cycles total — only marginally slower than a direct call.

Worst case scenario (cold cache, branch mispredicted): 200+ cycles, dominated by the vptr load missing all the way to RAM, plus a 10-20 cycle misprediction penalty.

Real-world typical case: ~5-15 cycles, roughly 2-3x the cost of a direct call — low single-digit nanoseconds on a modern CPU.
The 2-3x overhead for vtable lookup understates the impact. The real cost is often the lost optimization opportunities. An inlined method enables constant propagation, dead code elimination, loop unrolling, and many other optimizations. A virtual call is an optimization barrier that prevents these cascading improvements.
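To make the inlining argument concrete, here is a small illustrative sketch (the class and method names are ours, not from any library): the same computation written as a directly bound static call, which the JIT can inline and fold into the loop, and behind an interface reference, where each iteration is a dispatch site that may stay opaque to the optimizer.

```java
// Illustrative sketch: the dispatch cost itself is small; the bigger loss
// is that a virtual call can hide the callee from the optimizer, blocking
// inlining and the constant propagation / loop optimizations that follow.
public class InliningDemo {

    interface Scaler { int scale(int x); }

    // Direct, statically bound path: trivially inlinable.
    // After inlining, the JIT sees `sum += i * 2` and can optimize the loop.
    static long directSum(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += i * 2;
        return sum;
    }

    // Same computation behind an interface reference: unless the JIT
    // devirtualizes the call, the loop body stays opaque to optimization.
    static long virtualSum(int n, Scaler s) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += s.scale(i);
        return sum;
    }

    public static void main(String[] args) {
        Scaler doubler = x -> x * 2;
        // Identical results; very different optimization potential.
        System.out.println(directSum(1_000));
        System.out.println(virtualSum(1_000, doubler));
    }
}
```

Both methods return the same value; the difference is what the compiler is allowed to see and transform.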
Memory overhead:
| Item | Size (64-bit) | When Incurred |
|---|---|---|
| vptr per object | 8 bytes | Every polymorphic object |
| vtable per class | 8 bytes × virtual methods | Once per class |
| Type information (RTTI) | Varies (~20-100 bytes) | Once per class (if used) |
For objects with many instances (millions of particles, graph nodes, etc.), the 8-byte vptr overhead multiplies significantly. For typical business objects with few instances, it's negligible.
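A quick back-of-envelope calculation makes the scale effect visible (the object counts below are arbitrary examples; the 8-byte figure is the per-object pointer cost from the table above):

```java
// Back-of-envelope: per-object pointer overhead at scale.
// 8 bytes per polymorphic object becomes substantial once
// object counts reach the millions.
public class VptrOverhead {

    static long overheadBytes(long objectCount) {
        final long POINTER_BYTES = 8; // 64-bit vptr/class pointer per object
        return objectCount * POINTER_BYTES;
    }

    public static void main(String[] args) {
        long particles = 10_000_000L; // e.g., a particle system
        System.out.println(overheadBytes(particles) / (1024 * 1024)
                + " MiB of pure dispatch metadata");

        long orders = 10_000L; // a typical business-object count
        System.out.println(overheadBytes(orders) / 1024
                + " KiB - negligible");
    }
}
```

Ten million objects carry roughly 76 MiB of pointer overhead before any useful data; ten thousand carry under 80 KiB.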
Not all code is equal. The impact of polymorphism overhead ranges from "completely irrelevant" to "critical bottleneck" depending on the context.
The 90/10 rule applies:
In most applications, 90% of execution time is spent in 10% of the code (often less). Optimizing that 10% matters enormously; optimizing the other 90% yields negligible benefit. Polymorphism overhead only matters if it's in that critical 10%.
A simple heuristic:
If the method being called does more than ~100 CPU operations of actual work, the dispatch overhead is noise. If the method is trivial (returns a field, does simple arithmetic) and is called millions of times, the overhead may dominate.
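The arithmetic behind that heuristic can be sketched directly (the cycle figures are the rough estimates from the table above, not measurements):

```java
// Rough ratio arithmetic behind the ~100-operations heuristic:
// ~10 cycles of dispatch overhead is enormous relative to a 2-cycle
// method body, and noise relative to hundreds of cycles of real work.
public class DispatchRatio {

    static double overheadPercent(double dispatchCycles, double workCycles) {
        return 100.0 * dispatchCycles / (dispatchCycles + workCycles);
    }

    public static void main(String[] args) {
        double dispatch = 10.0; // typical virtual-call cost in cycles

        // Trivial method: returns a field (~2 cycles of work)
        System.out.printf("trivial method: %.0f%% overhead%n",
                overheadPercent(dispatch, 2));

        // Substantial method: ~1000 cycles of real work
        System.out.printf("substantial method: %.1f%% overhead%n",
                overheadPercent(dispatch, 1000));
    }
}
```

For the trivial method, dispatch is over 80% of total time; for the substantial one, it is under 1%.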
Never guess about performance. Profile your actual application with real workloads. If polymorphic dispatch appears in your profiler's hot spots, optimize it. If it doesn't appear, leave it alone. Premature optimization destroys code quality for zero benefit.
To make informed decisions, you need to measure polymorphism overhead in your specific context. Here's how to benchmark dispatch costs accurately.
Microbenchmarking pitfalls:
Measuring individual method calls is tricky because:

- A single call takes low single-digit nanoseconds, far below timer resolution, so calls must be measured in aggregate.
- JIT compilers need warmup before steady-state performance is reached, and may eliminate benchmark code entirely as dead code if results are unused.
- A call site that is monomorphic in a benchmark may be devirtualized and inlined, while the same site in production is truly polymorphic.
- Tight benchmark loops keep caches and branch predictors unrealistically warm compared to real workloads.
Proper benchmarking approach:
```java
// Using JMH (Java Microbenchmark Harness) for accurate measurements
// JMH handles warmup, fork isolation, and statistical analysis

import org.openjdk.jmh.annotations.*;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
@Fork(2) // Run in fresh JVMs
@Warmup(iterations = 5, time = 1)
@Measurement(iterations = 10, time = 1)
public class DispatchBenchmark {

    // Interface for polymorphic dispatch
    interface Operation {
        int execute(int value);
    }

    // Concrete implementation
    static class Doubler implements Operation {
        @Override
        public int execute(int value) { return value * 2; }
    }

    // Final class - helps devirtualization
    static final class FinalDoubler implements Operation {
        @Override
        public int execute(int value) { return value * 2; }
    }

    // Direct method (baseline)
    static int directDouble(int value) { return value * 2; }

    private Operation virtualOp;
    private Operation finalOp;
    private int input;

    @Setup
    public void setup() {
        virtualOp = new Doubler();
        finalOp = new FinalDoubler();
        input = 42;
    }

    @Benchmark
    public int directCall() {
        return directDouble(input); // Direct static call
    }

    @Benchmark
    public int virtualCall() {
        return virtualOp.execute(input); // Virtual dispatch
    }

    @Benchmark
    public int finalVirtualCall() {
        return finalOp.execute(input); // Final class, may devirtualize
    }

    // Measure polymorphic site (alternating types)
    private Operation[] mixedOps;
    private int index;

    @Setup(Level.Invocation)
    public void setupMixed() {
        mixedOps = new Operation[] {
            new Doubler(), new FinalDoubler(), new Doubler()
        };
        index = (index + 1) % mixedOps.length;
    }

    @Benchmark
    public int polymorphicCall() {
        return mixedOps[index].execute(input); // True polymorphic dispatch
    }
}

// Expected results (typical modern JVM):
// directCall:       ~1-2 ns   (baseline)
// finalVirtualCall: ~1-3 ns   (likely devirtualized)
// virtualCall:      ~2-5 ns   (monomorphic, inline cached)
// polymorphicCall:  ~5-15 ns  (megamorphic, vtable dispatch)
```

The absolute numbers matter less than the ratios. If virtual dispatch is 3x slower than direct calls in your benchmark, that 3x ratio will roughly hold in real code. But 3x of 2 nanoseconds is 6 nanoseconds—still trivial compared to a network call taking 1 million nanoseconds.
When profiling reveals that polymorphic dispatch is a genuine bottleneck, several strategies can help without abandoning object-oriented design entirely.
Optimization strategies (apply only after profiling):

- **Devirtualization hints** — `final` classes and methods can be inlined aggressively.
- **Concrete types in hot paths** — declare `Circle circle` instead of `Shape shape` where the type is known, so the compiler can eliminate virtual dispatch.
- **Homogeneous collections** — instead of one `List<Shape>` with mixed types, maintain separate `List<Circle>`, `List<Rectangle>`, etc. Process each type optimally.
- **Batch processing** — amortize dispatch with `processAll(List<Item>)` instead of `process(Item)` × N.
```java
// BEFORE: Virtual dispatch in tight loop
void processShapes(List<Shape> shapes) {
    for (Shape shape : shapes) {
        shape.draw(); // Virtual call per shape
    }
}

// OPTIMIZATION 1: Homogeneous collections
record ShapeCollections(
    List<Circle> circles,
    List<Rectangle> rectangles,
    List<Triangle> triangles) {}

void processShapesOptimized(ShapeCollections shapes) {
    // Each loop is monomorphic - JIT can devirtualize and inline
    for (Circle c : shapes.circles()) {
        c.draw(); // Concrete type, may inline
    }
    for (Rectangle r : shapes.rectangles()) {
        r.draw();
    }
    for (Triangle t : shapes.triangles()) {
        t.draw();
    }
}

// OPTIMIZATION 2: Batch processing
interface ShapeProcessor {
    void processAll(List<? extends Shape> shapes); // Batch method
}

class CircleProcessor implements ShapeProcessor {
    @Override
    public void processAll(List<? extends Shape> shapes) {
        // Process all at once, virtual call happens once
        for (Shape s : shapes) {
            // Implementation knows all are Circles
            Circle c = (Circle) s;
            // ... optimized circle processing
        }
    }
}

// OPTIMIZATION 3: Manual inline caching
void processWithInlineCache(List<Shape> shapes) {
    for (Shape shape : shapes) {
        Class<?> currentType = shape.getClass();
        // Fast path: known concrete types get direct calls
        if (currentType == Circle.class) {
            ((Circle) shape).drawOptimized(); // Direct call
        } else if (currentType == Rectangle.class) {
            ((Rectangle) shape).drawOptimized(); // Direct call
        } else {
            shape.draw(); // Fallback virtual dispatch
        }
    }
}
```

Every optimization technique trades something: homogeneous collections lose flexibility, CRTP loses heterogeneous containers, manual caching adds complexity. Only optimize when profiling proves it's necessary. Premature optimization damages maintainability for no measurable gain.
Beyond dispatch overhead, polymorphism affects memory layout and cache performance in ways that can dominate raw call costs.
The cache locality problem:
Polymorphic collections often suffer from poor cache locality:
Scattered vtables — Objects of different types have vtables at different memory locations. Iterating mixed collections causes vtable cache misses.
Pointer chasing — Polymorphic references are pointers to heap objects. Following pointers defeats CPU prefetchers and spatial locality.
Object size variation — Different derived types have different sizes. Collections become arrays of pointers, not contiguous data.
RTTI overhead — Runtime type checking requires accessing type information structures, adding more cache pressure.
| Pattern | Cache Behavior | Performance |
|---|---|---|
| struct array (values) | Sequential, predictable | Excellent (prefetch effective) |
| homogeneous object array | Pointer chase, but same vtable | Good (vtable cached) |
| heterogeneous object array | Pointer chase, different vtables | Poor (vtable thrashing) |
| Virtual calls per element | N vtable lookups + N function pointers | Poor for small methods |
```cpp
// Data-oriented design: Cache-friendly alternatives

// ANTI-PATTERN: Array of pointers to polymorphic objects
std::vector<Shape*> shapes; // Scattered memory, vtable thrashing

// BETTER: Separate homogeneous arrays (Structure of Arrays)
struct ShapeData {
    std::vector<Circle> circles;
    std::vector<Rectangle> rectangles;
    std::vector<Triangle> triangles;
};

void processAllShapes(ShapeData& data) {
    // Process circles - all Circle vtables in cache
    for (auto& c : data.circles) {
        c.draw(); // Same vtable for all
    }
    // Process rectangles - now Rectangle vtable in cache
    for (auto& r : data.rectangles) {
        r.draw();
    }
    // etc.
}

// EVEN BETTER: Data-oriented, no polymorphism at all
struct CircleData {
    std::vector<double> x; // Positions
    std::vector<double> y;
    std::vector<double> radius; // Circle-specific
};

void drawAllCircles(const CircleData& data, size_t count) {
    // Pure data iteration - maximum cache efficiency
    for (size_t i = 0; i < count; ++i) {
        drawCircle(data.x[i], data.y[i], data.radius[i]);
    }
    // Can vectorize with SIMD
}

// Trade-off: Lose OOP elegance, gain performance
// Use when processing millions of entities per frame (games, simulations)
```

In extremely performance-critical contexts (game engines, scientific computing), Data-Oriented Design (DOD) replaces OOP hierarchies with flat data structures optimized for cache access. This is the extreme end of the spectrum—maximum performance, minimum abstraction. Most applications don't need this, but knowing it exists helps understand the tradeoff space.
Given everything we've covered, how do you decide when to use static vs dynamic polymorphism? Here's a practical decision framework.
| Situation | Recommendation | Reasoning |
|---|---|---|
| Plugin architecture | Dynamic (interfaces) | Types not known at compile time |
| Mathematical operations on known types | Static (generics/templates) | Full optimization, type safety |
| Business logic handlers | Dynamic (strategy pattern) | Flexibility trumps micro-performance |
| High-frequency trading loop | Static or no polymorphism | Every nanosecond counts |
| GUI event handling | Dynamic (observer) | Type of handler varies at runtime |
| Container library internals | Static (templates) | Performance critical, types known |
| Unit test mocking | Dynamic (interfaces) | Runtime injection of test doubles |
Major systems and teams have navigated these tradeoffs in a remarkably consistent way: use dynamic polymorphism by default, measure actual performance, and optimize the specific hot spots that matter using techniques appropriate to the context. No one-size-fits-all solution exists; context determines the right tradeoff.
We've covered the performance landscape of polymorphism comprehensively. Here are the key insights:

- Virtual dispatch itself costs low single-digit nanoseconds per call; the larger cost is usually lost inlining and the cascading optimizations it blocks.
- Overhead matters only in hot paths where trivial methods are called millions of times — profile before optimizing.
- When profiling justifies it, techniques like `final` types, concrete types in hot paths, homogeneous collections, and batch processing reduce dispatch cost without abandoning object-oriented design.
- Memory layout and cache locality often dominate raw dispatch costs; data-oriented design is the extreme option for entity-heavy workloads.
- Prefer static polymorphism when types are known at compile time and performance is critical; prefer dynamic polymorphism when runtime flexibility matters.
Module complete:
You now have a comprehensive understanding of compile-time and runtime polymorphism—from the mechanisms and resolution processes to the performance implications that guide real-world design decisions. This knowledge enables you to write flexible, maintainable code that performs well, choosing the right polymorphism approach for each specific context.
Congratulations! You've mastered compile-time vs runtime polymorphism. You understand static dispatch mechanics, dynamic dispatch via vtables, how compilers and runtimes resolve calls, and how to make performance-informed decisions. Apply this knowledge to write systems that are both elegant and efficient.