In the early days of computing, programmers were responsible for every byte of memory their programs used. Allocate too little, and the program crashes. Allocate and forget to free, and memory leaks drain the system. Free too early, and you get dangling pointers—one of the most insidious bugs in software.
Garbage collection changed everything. It's an automatic memory management technique that identifies objects no longer in use and reclaims their memory—without any explicit action from the programmer. Languages like Java, C#, Python, JavaScript, Go, and many others rely on garbage collection to handle the complexity of heap memory management.
But garbage collection isn't magic. Understanding how it works is essential for writing performant code, diagnosing memory issues, and making informed design decisions. What seems 'automatic' is actually a sophisticated system with its own behaviors, tradeoffs, and failure modes.
By the end of this page, you will understand how object references work, how garbage collectors determine object reachability, the major GC algorithms and their tradeoffs, how GC affects application performance, and how to write GC-friendly code that minimizes memory pressure.
Before we can understand garbage collection, we must understand object references—the mechanism by which programs access heap-allocated objects.
What is a Reference?
A reference is a value that refers to an object in memory. Depending on the language and implementation, a reference may be a direct pointer (a raw memory address), a handle (an index into a table the runtime can relocate), or a tagged value that distinguishes heap pointers from inline primitives.
In garbage-collected languages, references are typically opaque—you cannot meaningfully examine their numeric value or perform pointer arithmetic.
```typescript
// PRIMITIVE VALUES - Copied by value, compared by value
let a: number = 42;
let b: number = a; // b gets a COPY of 42
b = 100;           // Changing b doesn't affect a
console.log(a);    // 42 - unchanged

let str1: string = "hello";
let str2: string = "hello";
console.log(str1 === str2); // true - strings compared by value in JS

// OBJECT REFERENCES - Copied by reference, compared by identity
let obj1 = { name: "Alice" };
let obj2 = obj1; // obj2 gets a COPY OF THE REFERENCE
                 // Both point to the SAME object on the heap

obj2.name = "Bob";      // Modifying through obj2...
console.log(obj1.name); // "Bob" - obj1 sees the change!

// Reference comparison checks identity, not contents
let obj3 = { name: "Bob" };
console.log(obj1 === obj2); // true - same object
console.log(obj1 === obj3); // false - different objects, same contents

// Creating the reference graph
class Node {
  value: number;
  children: Node[] = [];
  constructor(value: number) {
    this.value = value;
  }
}

const root = new Node(1);   // Object A on heap
const child1 = new Node(2); // Object B on heap
const child2 = new Node(3); // Object C on heap

root.children.push(child1);   // A references B
root.children.push(child2);   // A references C
child1.children.push(child2); // B also references C

// Reference graph:
//        root (A)
//        /      \
//  child1 (B)  child2 (C)
//        \      /
//         \----/
//
// C has two incoming references (from A and B)
// If we clear root.children, both B and C become unreachable
```

The Object Graph:
Objects and their references form a directed graph. Each object is a node, and each reference is an edge. This graph is the data structure that garbage collectors analyze to determine which objects are still in use.
Key Terminology:
- GC roots: references the runtime always treats as live—local variables on thread stacks, static/global fields, and active thread-local values.
- Reachable: an object with at least one path of references leading from a root.
- Unreachable (garbage): an object with no path from any root; it can never be used again and is eligible for collection.
```
GC ROOTS                     HEAP OBJECTS
═══════════                  ════════════

┌─────────────┐              ┌──────────┐
│ Stack Frame │              │ Object A │
│  localVar ──┼─────────────►│  field1 ─┼──────┐
└─────────────┘              │  field2 ─┼──┐   │
                             └──────────┘  │   │
┌─────────────┐              ┌──────────┐  │   │   ┌──────────┐
│ Static Field│              │ Object B │◄─┘   └──►│ Object C │
│  classVar ──┼─────────────►│  child ──┼─────────►│          │
└─────────────┘              └──────────┘          └──────────┘

┌─────────────┐              ┌──────────┐
│ Thread Local│              │ Object D │ ← NO PATH FROM ROOTS!
│   (none)    │              │          │   This is GARBAGE
└─────────────┘              └──────────┘

                             ┌──────────┐     ┌──────────┐
                             │ Object E │────►│ Object F │
                             │          │◄────│          │
                             └──────────┘     └──────────┘
                               ↑ Circular reference but NO root path
                                 Both E and F are GARBAGE
```

A common misconception is that circular references prevent garbage collection. Modern GCs use reachability analysis, not reference counting. If a circular group of objects has no path from any root, the entire group is garbage and will be collected. Only pure reference counting (without cycle detection) struggles with cycles.
Garbage collection operates on a simple principle: if an object cannot be reached, it cannot be used, and its memory can be reclaimed. The GC's job is to find all unreachable objects and free their memory—without disrupting the running program more than necessary.
The GC Contract:
Garbage collectors provide several guarantees:
- Safety: a reachable object is never collected.
- Completeness: every unreachable object is eventually collected, given enough GC cycles.
- Transparency: collection does not change program semantics—the application cannot observe objects being moved or freed through ordinary references.
Note: The GC does not guarantee when collection occurs or that all unreachable memory is immediately freed.
| Metric | Definition | Importance |
|---|---|---|
| Throughput | Percentage of time spent in application code (vs GC) | High throughput = more productive work |
| Latency (Pause Time) | Duration of GC-induced application pauses | Low latency = responsive applications |
| Footprint | Total memory used by application + GC overhead | Low footprint = efficient resource use |
| Promptness | How quickly dead objects are collected | Faster = more predictable memory use |
| Allocation Rate | How fast the application creates new objects | Higher rate = more frequent GC |
| Survival Rate | Fraction of objects surviving each GC cycle | Higher rate = more promotion work |
The Fundamental Tradeoff:
Garbage collectors face an inherent tension between throughput and latency. A GC that runs infrequently (high throughput) must do more work when it does run (higher latency). A GC that runs frequently (low latency) interrupts application work more often (lower throughput).
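To make the tension concrete, here is a small back-of-the-envelope calculation (the pause profiles are illustrative numbers, not measurements from any particular runtime): two collectors can deliver identical throughput while having wildly different worst-case pauses.

```typescript
// Hypothetical GC pause profiles; all numbers are illustrative only.
interface GcProfile {
  pauseMs: number;  // duration of each GC pause
  periodMs: number; // how often a pause occurs
}

// Fraction of wall-clock time left for application code.
function throughput(p: GcProfile): number {
  return 1 - p.pauseMs / p.periodMs;
}

// A GC that pauses rarely but for a long time...
const batchStyle: GcProfile = { pauseMs: 200, periodMs: 10_000 };
// ...and one that pauses often but briefly.
const latencyStyle: GcProfile = { pauseMs: 2, periodMs: 100 };

console.log(throughput(batchStyle));   // 0.98 - 2% in GC, but 200ms stalls
console.log(throughput(latencyStyle)); // 0.98 - same 2%, worst stall is 2ms
```

Both profiles spend 2% of time in GC, yet only the second is usable for a 60 FPS game: throughput alone does not capture latency.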
Different applications have different priorities: a batch job cares mainly about total completion time (throughput), a web service cares about tail latency, and a game must never pause longer than its frame budget.
Over decades of research and engineering, several garbage collection algorithms have emerged. Understanding their characteristics helps you choose appropriate GC strategies and understand your runtime's behavior.
1. Reference Counting
The simplest approach: each object maintains a count of references pointing to it. When the count drops to zero, the object is immediately freed.
Advantages: Immediate reclamation, predictable behavior, no full-heap traversal.
Disadvantages: Cannot handle cycles without additional mechanisms, counter updates add overhead, poor cache locality during deallocation chains.
```typescript
// Conceptual reference counting (languages implement this internally)

class RefCountedObject {
  private refCount: number = 0;
  private data: any;

  // Called when a new reference is created
  addRef(): void {
    this.refCount++;
  }

  // Called when a reference is removed
  release(): void {
    this.refCount--;
    if (this.refCount === 0) {
      this.cleanup(); // Immediate deallocation
    }
  }

  private cleanup(): void {
    // Release any objects this object references
    if (this.data && this.data.release) {
      this.data.release();
    }
    // Free this object's memory
    console.log("Object deallocated");
  }
}

// THE CYCLE PROBLEM:
class CyclicNode {
  refCount = 1;
  partner: CyclicNode | null = null;
}

const a = new CyclicNode(); // refCount = 1
const b = new CyclicNode(); // refCount = 1

a.partner = b; // b.refCount = 2
b.partner = a; // a.refCount = 2

// Now clear the original references
// a = null; // a.refCount = 1 (still referenced by b)
// b = null; // b.refCount = 1 (still referenced by a)

// Neither can reach 0! Memory leak without cycle detection.
```

2. Mark-and-Sweep
The foundational tracing collector. Periodically pauses the application, marks all reachable objects starting from roots, then sweeps through memory freeing unmarked objects.
Advantages: Handles cycles naturally, no runtime overhead during normal execution.
Disadvantages: Stop-the-world pauses, heap fragmentation over time.
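The two phases can be sketched over a toy heap in a few lines of TypeScript (a minimal model of the algorithm, not how a real collector represents memory):

```typescript
// Toy heap object: a mark bit plus outgoing references.
class HeapObject {
  marked = false;
  refs: HeapObject[] = [];
}

// MARK: traverse from the roots, setting the mark bit.
function mark(roots: HeapObject[]): void {
  const stack = [...roots];
  while (stack.length > 0) {
    const obj = stack.pop()!;
    if (obj.marked) continue; // already visited (this handles cycles)
    obj.marked = true;
    stack.push(...obj.refs);
  }
}

// SWEEP: keep marked objects (clearing the bit for the next cycle),
// drop unmarked ones.
function sweep(heap: HeapObject[]): HeapObject[] {
  return heap.filter(obj => {
    const live = obj.marked;
    obj.marked = false;
    return live;
  });
}

// A reachable chain and an unreachable cycle.
const a = new HeapObject();
const b = new HeapObject();
a.refs.push(b);
const e = new HeapObject();
const f = new HeapObject();
e.refs.push(f);
f.refs.push(e); // cycle with no root path

let heap = [a, b, e, f];
mark([a]); // only 'a' is a root
heap = sweep(heap);
console.log(heap.length); // 2 - the e<->f cycle was collected
```

Note that the cycle falls out naturally: marking simply never reaches `e` or `f`, so the sweep frees both.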
```
MARK PHASE:
═══════════
Starting from roots, traverse and mark reachable objects

  Root ──► [A ✓] ──► [B ✓] ──► [C ✓]
             │
             └──► [D ✓]

  [E ]  [F ]   (not reachable, not marked)
   ↑____↓
   cycle

SWEEP PHASE:
════════════
Scan heap, free unmarked objects

BEFORE: [A ✓][E  ][B ✓][F  ][C ✓][D ✓]
              ↓         ↓
AFTER:  [A ✓][FREE][B ✓][FREE][C ✓][D ✓]

Clear marks for next cycle:
        [A  ][FREE][B  ][FREE][C  ][D  ]
```

3. Mark-and-Compact
An evolution of mark-and-sweep that adds a compaction phase. After marking, live objects are moved to one end of the heap, eliminating fragmentation.
Advantages: No fragmentation, better cache locality, simple bump-pointer allocation.
Disadvantages: Additional work to move objects and update references, longer pauses.
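The compaction step can be modeled with a forwarding table: each live object is assigned its new slot, then every stored reference is rewritten. This is a toy array-based sketch of the idea, not a real collector's moving machinery:

```typescript
// Toy heap: an array of slots; each live object stores the slot
// indices it references. null = dead/free slot.
interface Obj { refs: number[] }
type Heap = (Obj | null)[];

// COMPACT: slide live objects to the low end, then fix up references
// using a forwarding table (old index -> new index).
function compact(heap: Heap): Obj[] {
  const forwarding = new Map<number, number>();
  const compacted: Obj[] = [];
  heap.forEach((obj, oldIndex) => {
    if (obj !== null) {
      forwarding.set(oldIndex, compacted.length);
      compacted.push(obj);
    }
  });
  // Every stored reference must be rewritten to the new location.
  for (const obj of compacted) {
    obj.refs = obj.refs.map(r => forwarding.get(r)!);
  }
  return compacted;
}

// Fragmented heap: live objects at slots 0, 3, 5; slot 0 references slot 5.
const fragmented: Heap = [
  { refs: [5] }, null, null, { refs: [] }, null, { refs: [0] },
];
const result = compact(fragmented);
console.log(result.length);  // 3 - no free gaps remain
console.log(result[0].refs); // [2] - reference updated to 5's new slot
```

The reference fix-up pass is exactly the "additional work" the disadvantage above refers to: moving an object is cheap, but every pointer to it must be found and rewritten.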
4. Copying (Semispace) Collector
Divides heap into two equal halves: 'from-space' and 'to-space'. Allocations happen in from-space. During GC, live objects are copied to to-space, and the spaces swap roles.
Advantages: Implicit compaction, allocation is simple bump-pointer, only touches live objects.
Disadvantages: Requires 2x memory, copying cost proportional to live data.
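The space-swapping mechanic can be sketched as follows. This is a deliberately simplified model: liveness is a flag here, whereas a real copying collector determines it by tracing from roots and leaves forwarding pointers behind.

```typescript
// Minimal semispace model: objects live in fromSpace; survivors are
// copied to toSpace during collection, and the spaces swap roles.
class Semispace<T> {
  private fromSpace: T[] = [];
  private toSpace: T[] = [];

  // Bump-pointer allocation: just append at the end of from-space.
  allocate(obj: T): T {
    this.fromSpace.push(obj);
    return obj;
  }

  // Copying collection: move only the live objects, then swap.
  collect(isLive: (obj: T) => boolean): void {
    for (const obj of this.fromSpace) {
      if (isLive(obj)) this.toSpace.push(obj); // copy live object
    }
    this.fromSpace = this.toSpace; // to-space becomes the new from-space
    this.toSpace = [];             // old from-space is now entirely free
  }

  size(): number {
    return this.fromSpace.length;
  }
}

const space = new Semispace<{ live: boolean }>();
space.allocate({ live: true });
space.allocate({ live: false });
space.allocate({ live: false });
space.allocate({ live: true });

space.collect(o => o.live);
console.log(space.size()); // 2 - dead objects left behind, never touched
```

The key property shows up in `collect`: dead objects are never examined or freed individually, which is why the cost is proportional to live data only.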
```
BEFORE GC:
══════════
┌─────────── FROM-SPACE ───────────┐  ┌─────────── TO-SPACE ────────────┐
│ [A][B][dead][C][dead][dead][D]   │  │            (empty)              │
└──────────────────────────────────┘  └─────────────────────────────────┘
  allocation ──────►

DURING GC (Copy live objects):
══════════════════════════════
┌─────────── FROM-SPACE ───────────┐  ┌─────────── TO-SPACE ────────────┐
│ [A][B][dead][C][dead][dead][D]   │  │ [A'][B'][C'][D']                │
└──────────────────────────────────┘  └─────────────────────────────────┘
         │                                      │
         └────── Copy live objects ────────────┘

AFTER GC (Swap spaces):
═══════════════════════
┌─────────── TO-SPACE ─────────────┐  ┌─────────── FROM-SPACE ──────────┐
│ (empty, available)               │  │ [A][B][C][D]                    │
└──────────────────────────────────┘  └─────────────────────────────────┘
                                        allocation ──────►
```

Most modern garbage collectors use a generational approach based on the weak generational hypothesis: most objects die young. By dividing the heap into generations and collecting young objects frequently while collecting old objects rarely, GCs optimize for the common case.
The Weak Generational Hypothesis:
Empirical studies across many applications consistently show that the vast majority of objects become unreachable shortly after allocation, while objects that survive their first few collections tend to live for a long time. References from old objects to young objects are comparatively rare.
| Generation | Also Called | Purpose | GC Frequency |
|---|---|---|---|
| Young Generation | Nursery, Eden + Survivors | New object allocation | Very frequent (Minor GC) |
| Old Generation | Tenured, Mature | Long-lived objects | Infrequent (Major GC) |
| Metaspace/PermGen | Method Area (older JVMs) | Class metadata, interned strings | Rarely collected |
```
YOUNG GENERATION                               OLD GENERATION
════════════════                               ══════════════

┌────────────────────────────────────────┐    ┌──────────────────────┐
│ EDEN SPACE         │   S0   │   S1     │    │                      │
│                    │        │          │    │ Long-lived objects   │
│ New allocations    │Survivor│Survivor  │───►│ promoted after N     │
│ happen here        │Space 0 │Space 1   │    │ GC cycles            │
│                    │        │          │    │                      │
└────────────────────────────────────────┘    └──────────────────────┘

MINOR GC (Young Generation Only):
════════════════════════════════
1. Eden fills up → triggers Minor GC
2. Scan roots and Eden for live objects
3. Copy live objects to empty survivor space (S0 or S1)
4. Objects surviving N minor GCs → promote to Old Gen
5. Swap survivor spaces
6. Clear Eden

MAJOR GC (Full Heap):
═════════════════════
1. Old Generation fills up → triggers Major/Full GC
2. Collect both Young and Old generations
3. May compact Old Generation
4. Significantly more expensive than Minor GC
```

Objects promoted to the Old Generation stay there until a Major GC, which is expensive. If many medium-lived objects get promoted, your Old Generation fills up faster, triggering more frequent Major GCs. This 'mid-life crisis' pattern can severely impact performance.
Write Barriers:
Generational GC introduces a challenge: what if an Old Generation object references a Young Generation object? The Minor GC cannot collect the young object (it's reachable) but doesn't scan the entire Old Generation (too slow).
The solution is write barriers—small code snippets executed whenever a reference is written. If an Old→Young reference is created, it's recorded in a 'remembered set' so Minor GC knows to treat those Old objects as roots.
Write barriers add a small overhead to every reference assignment, but the benefit of avoiding full-heap scans vastly outweighs this cost.
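A write barrier and remembered set can be modeled in a few lines of TypeScript. This is a conceptual sketch: in real runtimes the barrier is a couple of machine instructions emitted by the compiler at every reference store, and the names here are illustrative.

```typescript
// Toy model of a write barrier maintaining a remembered set.
type Gen = "young" | "old";

interface GcObject {
  gen: Gen;
  fields: Map<string, GcObject>;
}

// Old objects holding young references; Minor GC treats these as
// additional roots instead of scanning the whole old generation.
const rememberedSet = new Set<GcObject>();

// Every reference store goes through the barrier, not a plain write.
function writeRef(source: GcObject, field: string, target: GcObject): void {
  source.fields.set(field, target);
  // The interesting case: an old-to-young pointer was just created.
  if (source.gen === "old" && target.gen === "young") {
    rememberedSet.add(source);
  }
}

const oldObj: GcObject = { gen: "old", fields: new Map() };
const youngObj: GcObject = { gen: "young", fields: new Map() };
const anotherOld: GcObject = { gen: "old", fields: new Map() };

writeRef(oldObj, "child", youngObj);  // recorded: old -> young
writeRef(oldObj, "peer", anotherOld); // not recorded: old -> old

console.log(rememberedSet.has(oldObj)); // true
console.log(rememberedSet.size);        // 1
```

Only the old-to-young case pays the extra bookkeeping; old-to-old and young-to-anything stores fall through the barrier untouched, which keeps the per-write cost small.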
Traditional stop-the-world collectors pause all application threads during GC. For applications requiring low latency, this is unacceptable. Modern collectors employ concurrent and incremental techniques to minimize or eliminate pauses.
Concurrent GC:
GC work runs simultaneously with application threads. Most marking and some sweeping can happen without pausing the application. Only brief 'safepoints' require stopping threads (typically for root scanning and handover phases).
Incremental GC:
GC work is broken into small chunks, interleaved with application execution. Instead of one long pause, many short pauses spread the work over time.
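The chunking idea can be sketched with a work budget: each marking step processes at most a fixed number of objects before yielding back to the application. A toy sketch, assuming a simple mark bit per object:

```typescript
// Toy incremental marker: the work list is drained in small slices
// so application code can run between slices.
interface INode { refs: INode[]; marked: boolean }

class IncrementalMarker {
  private workList: INode[];

  constructor(roots: INode[]) {
    this.workList = [...roots];
  }

  // Do at most `budget` units of marking work; true when finished.
  step(budget: number): boolean {
    let done = 0;
    while (this.workList.length > 0 && done < budget) {
      const obj = this.workList.pop()!;
      if (obj.marked) continue;
      obj.marked = true;
      this.workList.push(...obj.refs);
      done++;
    }
    return this.workList.length === 0;
  }
}

// A chain of 5 objects; a budget of 2 spreads marking over 3 slices.
const ee: INode = { refs: [], marked: false };
const dd: INode = { refs: [ee], marked: false };
const cc: INode = { refs: [dd], marked: false };
const bb: INode = { refs: [cc], marked: false };
const aa: INode = { refs: [bb], marked: false };

const marker = new IncrementalMarker([aa]);
let slices = 0;
while (!marker.step(2)) slices++; // application work runs between slices
slices++;                         // count the final slice

console.log(slices);    // 3 short slices instead of one long pause
console.log(ee.marked); // true - the whole chain still gets marked
```

The total work is unchanged; it is merely spread out, which is exactly the bounded-pause behavior listed in the comparison below.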
| Aspect | Stop-the-World | Concurrent | Incremental |
|---|---|---|---|
| Application pauses | Long, complete stop | Very short (safepoints) | Many short pauses |
| Complexity | Simple | High (synchronization) | Medium (progress tracking) |
| Overhead | Zero during normal execution | Barriers, synchronization | Progress checks |
| Throughput | High (concentrated work) | Medium (coordination overhead) | Medium |
| Latency | Poor (long pauses) | Excellent | Good (bounded pauses) |
The Tri-Color Abstraction:
Concurrent collectors commonly use the tri-color marking abstraction to safely mark objects while the application mutates the object graph:
- White: not yet visited; presumed garbage until proven reachable.
- Gray: visited, but its outgoing references have not all been scanned yet.
- Black: visited, and all of its outgoing references have been scanned.
The invariant: No black object points directly to a white object. If it did, the white object might be collected even though it's reachable. Write barriers maintain this invariant by marking targets of writes or re-examining sources.
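The three-color discipline just described can be sketched directly: the gray set is the work list, and an object turns black only once all of its children have been shaded. This sequential sketch omits the concurrent mutator and its write barrier; it shows the marking state machine itself.

```typescript
// Tri-color marking over a toy object graph.
type Color = "white" | "gray" | "black";

interface TriObject { refs: TriObject[] }

function triColorMark(
  all: TriObject[],
  roots: TriObject[],
): Map<TriObject, Color> {
  const color = new Map<TriObject, Color>();
  for (const obj of all) color.set(obj, "white"); // everything starts white

  const grayStack: TriObject[] = [];
  for (const root of roots) {
    color.set(root, "gray"); // roots are discovered first
    grayStack.push(root);
  }

  // Process one gray object at a time; a concurrent collector would
  // interleave these steps with application execution.
  while (grayStack.length > 0) {
    const obj = grayStack.pop()!;
    for (const ref of obj.refs) {
      if (color.get(ref) === "white") {
        color.set(ref, "gray"); // shade white targets gray
        grayStack.push(ref);
      }
    }
    color.set(obj, "black"); // children scanned: never points at white
  }
  return color; // any object still white is garbage
}

const leaf: TriObject = { refs: [] };
const mid: TriObject = { refs: [leaf] };
const rootObj: TriObject = { refs: [mid] };
const orphan: TriObject = { refs: [] };

const colors = triColorMark([rootObj, mid, leaf, orphan], [rootObj]);
console.log(colors.get(leaf));   // "black" - reachable via rootObj -> mid
console.log(colors.get(orphan)); // "white" - unreachable, will be collected
```

An object is shaded gray before its parent turns black, so the invariant holds at every step of this loop; the concurrent write barrier's job is to preserve that same property while the application rewires references mid-scan.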
```
INITIAL STATE:               AFTER MARKING ROOT:
══════════════               ═══════════════════
   [Root]                       [Root]
     │                            │
     ▼                            ▼
    [A] ─────► [B]               [A] ─────► [B]
     │                            │ gray     │ white
     ▼                            ▼
    [C] ─────► [D]               [C]        [D]
                                  white      white

All objects start WHITE      Root marked BLACK, A marked GRAY

AFTER PROCESSING A:          AFTER PROCESSING B & C:
═══════════════════          ═══════════════════════
   [Root]                       [Root]
     │                            │
     ▼                            ▼
    [A] ─────► [B]               [A] ─────► [B]
     │ black    │ gray            │ black    │ black
     ▼                            ▼
    [C]        [D]               [C] ───► [D]
     gray       white             black    gray

FINAL STATE:
════════════
   [Root]            All reachable objects marked BLACK
     │               Any remaining WHITE objects are garbage
     ▼
    [A] ─────► [B]   If [E] exists and is WHITE, it's collected
     │ black    │ black
     ▼
    [C] ───► [D]
     black    black
```

While the GC marks objects, the application keeps running and modifying references. Write barriers intercept reference updates to ensure the invariant holds. This 'GC paces with allocator' dynamic is complex to implement correctly but enables true concurrent collection.
Garbage collection, despite being 'automatic', has significant performance implications that every developer should understand. The key is recognizing that allocation is not free—every object you create will eventually be collected.
Direct Costs:
- Pause time: stop-the-world phases halt every application thread.
- CPU overhead: marking, copying, and write barriers consume cycles that could run application code.
- Memory overhead: object headers, mark bits, remembered sets, and reserved spaces inflate the footprint.
| Application Type | Tolerable Pause | GC Strategy | Risks |
|---|---|---|---|
| Batch processing | Seconds | Maximize throughput | Total GC time |
| Web services (p99) | 50-100ms | Balanced | Tail latency spikes |
| Real-time trading | < 1ms | Ultra-low latency | Any pause = lost money |
| Games (60 FPS) | < 16ms | Incremental | Visible stutter |
| Interactive UI | < 100ms | Concurrent | User frustration |
| Hard real-time | 0ms | Manual or GC-free | System failure |
```typescript
interface EventData { id: string; value: number }
interface Summary { total: number }

// ❌ HIGH GC PRESSURE: Creating objects in hot loops

function processEventsHeavy(events: EventData[]): Summary {
  let totalValue = 0;
  for (const event of events) {
    // New object created EVERY iteration
    const processed = {
      id: event.id,
      value: event.value * 1.1,
      timestamp: new Date(),
    };
    totalValue += processed.value;
    // Each 'processed' becomes garbage immediately after use
  }
  return { total: totalValue };
}
// For 10M events: creates 10M objects that immediately become garbage

// ✅ LOW GC PRESSURE: Reuse objects, avoid allocations in loops

interface ProcessedEvent {
  id: string;
  value: number;
  timestamp: Date;
}

function processEventsLight(events: EventData[]): Summary {
  let totalValue = 0;
  // Reuse a single object
  const processed: ProcessedEvent = { id: '', value: 0, timestamp: new Date() };
  for (const event of events) {
    processed.id = event.id;
    processed.value = event.value * 1.1;
    // Don't create new Date - reuse or skip if not needed
    totalValue += processed.value;
  }
  return { total: totalValue };
}
// For 10M events: creates only 2-3 objects total

// ✅ EVEN BETTER: Use primitives when possible

function processEventsPrimitive(events: EventData[]): number {
  let totalValue = 0;
  for (const event of events) {
    // Primitives don't create heap allocations
    totalValue += event.value * 1.1;
  }
  return totalValue;
}
// Zero new heap allocations in the hot loop
```

Don't prematurely optimize for GC. Modern GCs are highly sophisticated. Profile your application with tools like VisualVM (Java), dotMemory (.NET), or Chrome DevTools (JavaScript) to identify actual GC bottlenecks before changing code.
Most applications work well with default GC settings, but high-performance systems often benefit from tuning. The key is understanding what you're trading off and measuring the results.
General Tuning Guidelines:
- Set an explicit goal first (throughput, pause time, or footprint); you cannot maximize all three at once.
- Size the heap so the live set fits comfortably—an undersized heap causes constant collection, an oversized one wastes memory and lengthens full GCs.
- Change one parameter at a time and measure under realistic load.
- Enable GC logging before tuning so you can compare behavior before and after.
```shell
# Throughput-oriented (for batch processing)
java -XX:+UseParallelGC \
  -Xms8g -Xmx8g \
  -XX:NewRatio=2 \
  -XX:ParallelGCThreads=8 \
  -jar batch-processor.jar

# Low-latency (for web services)
java -XX:+UseG1GC \
  -Xms4g -Xmx4g \
  -XX:MaxGCPauseMillis=50 \
  -XX:G1HeapRegionSize=16m \
  -jar web-service.jar

# Ultra-low-latency (for trading systems)
java -XX:+UseZGC \
  -Xms16g -Xmx16g \
  -XX:SoftMaxHeapSize=14g \
  -jar trading-engine.jar

# Enable GC logging for analysis
java -Xlog:gc*:file=gc.log:time,uptime:filecount=5,filesize=10m \
  -jar application.jar
```
```shell
# Increase old space size (for large data applications)
node --max-old-space-size=4096 app.js

# Expose GC for manual triggering (debugging only)
node --expose-gc app.js

# Enable detailed GC tracing
node --trace-gc app.js

# Tune for server workloads
node --optimize-for-size --max-old-space-size=2048 server.js
```

GC tuning that works in development may not work in production. Different heap sizes, allocation rates, and object lifetimes produce different behavior. Always test tuning changes under production-like load, and consider A/B testing configurations.
We've explored the mechanisms by which managed runtimes track and reclaim memory. Let's consolidate the key insights:
- Reachability, not reference counts, determines what is garbage: anything without a path from a GC root can be collected, cycles included.
- The core algorithms—reference counting, mark-and-sweep, mark-and-compact, and copying—trade off pause time, throughput, fragmentation, and memory overhead differently.
- Generational collection exploits the fact that most objects die young, making frequent, cheap Minor GCs possible; write barriers and remembered sets keep them correct.
- Concurrent and incremental techniques, built on tri-color marking, shrink pauses at the cost of complexity and some throughput.
- Allocation is not free: reducing garbage in hot paths and tuning based on measurement, not guesswork, keeps GC costs under control.
What's Next:
Even with garbage collection, memory management isn't entirely automatic. The next page examines Memory Leaks in Managed Languages—how applications can still waste memory despite having a GC, common leak patterns, and techniques for detection and prevention.
You now understand how garbage collectors work, from fundamental concepts to modern algorithms. This knowledge enables you to write GC-friendly code, diagnose memory issues, and make informed decisions about runtime configuration and application design.