Memory is the silent resource—taken for granted when available, catastrophic when exhausted. Understanding memory constraints is essential for any engineer working on production systems.
Memory bottlenecks are also insidious. Unlike CPU saturation (visible in top) or disk I/O (visible in iostat), memory issues often manifest indirectly, as seemingly unrelated problems: latency spikes, climbing GC overhead, or sudden OOM kills.
By the end of this page, you will understand how memory is used in applications, the impact of garbage collection on performance, techniques for diagnosing memory issues, strategies for right-sizing memory allocations, and how to identify and fix memory leaks.
Before diagnosing memory problems, we need to understand how applications use memory. Different languages and runtimes have different memory models, but common patterns emerge:
Memory Regions in a Typical Application:
| Region | Contains | Allocation | Lifecycle | Common Issues |
|---|---|---|---|---|
| Stack | Function call frames, local variables | Automatic (LIFO) | Function scope | Stack overflow from deep recursion |
| Heap | Dynamic allocations (objects, arrays) | Manual or GC-managed | Until freed or GC'd | Leaks, fragmentation, GC pauses |
| Code/Text | Compiled code, instructions | At load time | Process lifetime | Rarely a problem |
| Static/Global | Global variables, constants | At load time | Process lifetime | Large static allocations |
| Thread-Local | Per-thread storage | Per-thread allocation | Thread lifetime | Memory growth with thread count |
The Heap: Where Most Memory Lives
In most applications, the heap dominates memory usage. It's where objects, data structures, caches, and buffers live. Heap management is the primary focus of memory optimization:
Most memory bottlenecks occur in the heap—too much allocated, too little available, or too much time spent managing it.
```bash
#!/bin/bash
# =====================================================
# UNDERSTANDING MEMORY USAGE ON LINUX
# =====================================================

# Process memory breakdown
cat /proc/<pid>/status | grep -E "Vm|Rss"

# VmPeak: Maximum virtual memory used
# VmSize: Current virtual memory size
# VmRSS:  Resident Set Size - actual RAM used
# VmData: Heap size
# VmStk:  Stack size

# More detailed: pmap
pmap -x <pid> | head -50

# Example output:
# Address           Kbytes     RSS   Dirty Mode  Mapping
# 0000000000400000    2048    1024       0 r-x-- java
# 00000000006f4000     128     128     128 rw--- java (data segment)
# 00007f0000000000  262144  200000  200000 rw--- [heap]
#
# The [heap] line shows your dynamic memory

# =====================================================
# JVM-SPECIFIC MEMORY ANALYSIS
# =====================================================

# JVM native memory breakdown
# (requires the JVM to be started with -XX:NativeMemoryTracking=summary)
jcmd <pid> VM.native_memory summary

# Output shows:
# - Java Heap: where objects live
# - Class: loaded class metadata
# - Thread: per-thread stack memory
# - Code: JIT compiled code
# - GC: garbage collector data structures
# - Internal: JVM internal data
# - Symbol: interned strings

# =====================================================
# KEY METRICS TO MONITOR
# =====================================================

# 1. RSS (Resident Set Size): Actual RAM used
#    - If approaching system memory: risk of OOM

# 2. Heap usage vs heap size:
#    - High usage (>80%): frequent GC, risk of OOM
#    - Low usage (<30%): heap oversized, wasting RAM

# 3. GC time percentage:
#    - >5% is concerning
#    - >10% is a serious problem

# 4. Memory growth over time:
#    - Constant growth without plateau = memory leak
```

Garbage collection (GC) is the automatic memory management scheme used by most modern languages. While it frees developers from manual memory management, it introduces performance overhead that must be understood and managed.
How Garbage Collection Works (Simplified):
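In a tracing collector, the runtime periodically marks every object reachable from the roots (thread stacks, globals) and reclaims everything else; stop-the-world pauses come from doing this while application threads are halted. Below is a minimal mark-and-sweep sketch of that idea—the `Obj` and `Heap` classes are purely illustrative, not any real runtime's internals:

```python
# Toy tracing collector: mark everything reachable from the roots,
# then sweep (free) whatever was never marked. Real collectors add
# generations, compaction, and concurrency, but the core idea is the same.

class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []      # outgoing references to other objects
        self.marked = False

class Heap:
    def __init__(self):
        self.objects = []   # every allocated object
        self.roots = []     # "stack/global" references in this toy model

    def alloc(self, name):
        obj = Obj(name)
        self.objects.append(obj)
        return obj

    def collect(self):
        # Mark phase: walk the object graph starting from the roots
        stack = list(self.roots)
        while stack:
            obj = stack.pop()
            if not obj.marked:
                obj.marked = True
                stack.extend(obj.refs)
        # Sweep phase: anything unmarked is garbage
        live = [o for o in self.objects if o.marked]
        freed = [o for o in self.objects if not o.marked]
        for o in live:
            o.marked = False   # reset marks for the next cycle
        self.objects = live
        return [o.name for o in freed]

heap = Heap()
a = heap.alloc("session")        # reachable via a root
b = heap.alloc("request_buf")    # reachable only through 'a'
c = heap.alloc("temp")           # never attached to a root -> garbage in this model
a.refs.append(b)
heap.roots.append(a)
print(heap.collect())            # ['temp'] is reclaimed; the other two survive
```

Generational collectors such as G1 refine this by splitting the heap into young and old regions, so most cycles only scan recently allocated objects—which is why young-generation pauses are usually short.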
The GC Pause Problem:
During GC, application threads may be paused ("stop-the-world"). A 100ms GC pause means 100ms where no user requests are processed. For latency-sensitive services (gaming, trading, real-time), GC pauses are unacceptable.
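A quick back-of-envelope calculation shows why even infrequent pauses dominate tail latency. The numbers below are hypothetical, not measurements from any particular service:

```python
# Rough estimate of how many requests a stop-the-world pause touches.
# Assumption: pauses arrive roughly periodically; a request is affected
# if any part of it overlaps a pause window.

pause_ms = 100            # stop-the-world pause duration (hypothetical)
pause_every_ms = 10_000   # one pause every 10 seconds
request_ms = 5            # typical request duration

# A request overlaps a pause if it starts within (pause + request) of one
affected_fraction = (pause_ms + request_ms) / pause_every_ms
print(f"Roughly {affected_fraction:.2%} of requests see up to {pause_ms}ms of added latency")
# ~1% of requests - enough to pull the p99 of a 5ms service up by an order of magnitude
```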
| Collector | Pause Characteristics | Throughput | Best For | Heap Size |
|---|---|---|---|---|
| G1 (default) | 10-200ms pauses | High | General purpose, balanced | 4GB-64GB |
| ZGC | < 10ms pauses (usually < 1ms) | Slightly lower | Latency-sensitive | Any, including TB scale |
| Shenandoah | < 10ms pauses | Moderate | Latency-sensitive (OpenJDK) | Medium to large |
| Parallel GC | Can be 1s+ for large heaps | Highest | Throughput, offline processing | Medium |
| Serial GC | Long pauses | Low | Small heaps, client apps | < 100MB |
GC Tuning Principles: measure before tuning (GC logs, pause times, and the percentage of time spent in GC), reduce allocation in application code before reaching for collector flags, choose a collector that matches your latency and throughput goals, and change one setting at a time so you can attribute its effect. See the worked overhead check below and the analysis commands that follow.
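The GC-time percentage mentioned above is easy to compute from the `jstat` GCT column and the process uptime; a tiny worked example with made-up numbers:

```python
# GC overhead as a percentage of wall-clock time.
# GCT comes from `jstat -gc`; uptime from `ps -o etimes=` (values are made up).

gc_time_s = 42.7     # jstat GCT column: cumulative seconds spent in GC
uptime_s = 3600.0    # process uptime in seconds

overhead = gc_time_s / uptime_s
print(f"GC overhead: {overhead:.1%}")   # ~1.2% here - fine; >5% warrants investigation
```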
```bash
#!/bin/bash
# =====================================================
# JVM GC ANALYSIS
# =====================================================

# Enable GC logging (Java 11+)
java -Xlog:gc*:file=gc.log:time,uptimemillis:filecount=5,filesize=100m -jar app.jar

# Key patterns to look for in GC logs:
#
# [0.234s] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 23M->8M(256M) 4.234ms
#          ^^^^^                      ^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^ ^^^^^^^
#          GC ID                      Reason                Heap change   Pause time
#
# Pause time > 100ms: Investigate
# Frequent GC (every few seconds): Allocation rate too high
# Heap not shrinking after GC: Possible leak, or heap is simply right-sized

# Analyze GC logs with tools:
# - GCViewer: visual analysis
# - GCEasy: online analysis
# - JClarity Censum: detailed reports

# =====================================================
# DETECTING GC IMPACT ON LATENCY
# =====================================================

# Pattern: Request latency spikes correlate with GC pauses
#
# Application logs: [10:45:32.100] Request completed in 5ms
#                   [10:45:32.200] Request completed in 312ms   <- spike
#                   [10:45:32.500] Request completed in 4ms
#
# GC logs:          [10:45:32.100] GC(42) Pause Full 280.5ms
#
# Correlation: The 312ms request included a 280ms GC pause

# JVM: Enable pause-time logging
# -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps   (legacy Java 8 flags;
# on Java 11+ the -Xlog:gc* configuration above already adds timestamps)

# Application: Log with nanosecond timestamps
# Compare slow request timestamps with GC pause timestamps

# =====================================================
# REAL-TIME GC MONITORING
# =====================================================

# jstat: GC statistics
jstat -gc <pid> 1000   # Every 1 second

# Columns to watch:
# EC   - Eden capacity
# EU   - Eden used
# OC   - Old gen capacity
# OU   - Old gen used
# GCT  - Total GC time
# FGCT - Full GC time

# If FGCT grows continuously: Full GCs happening (bad)
# If EU oscillates near EC: Young gen might be undersized

# =====================================================
# GC TUNING FOR LOW LATENCY
# =====================================================

# Switch to ZGC (Java 15+)
java -XX:+UseZGC -Xmx8g -jar app.jar

# ZGC characteristics:
# - Pause times < 10ms regardless of heap size
# - Concurrent marking, relocation, and reference processing
# - Slight throughput overhead (~5-15%)

# Or Shenandoah (OpenJDK)
java -XX:+UseShenandoahGC -Xmx8g -jar app.jar
```

Don't start with GC tuning. In most cases, excessive allocation in application code is the root cause. Profile allocation hotspots first. Reducing allocations is almost always more effective than tuning the GC to handle more garbage.
A memory leak occurs when memory is allocated but never freed, causing memory usage to grow unboundedly over time. In garbage-collected languages, leaks typically occur when objects remain reachable (referenced) even though they're no longer needed.
How Memory Leaks Manifest: memory usage climbs steadily instead of plateauing, garbage collection runs more often and reclaims less each cycle, latency degrades, and the process eventually fails with an out-of-memory error or is OOMKilled.
Common Causes of Memory Leaks:
| Pattern | Description | Example | Detection |
|---|---|---|---|
| Uncleared collections | Objects added to collections, never removed | Global cache without eviction | Collection size grows unboundedly |
| Event listener leaks | Listeners registered, never unregistered | DOM event handlers in SPAs | Object count grows per interaction |
| Closure captures | Closures retain references to outer scope | Lambda capturing large context | Heap shows unexpected retained objects |
| Thread-local leaks | Thread-local data not cleaned up | ThreadLocal in thread pools | Per-thread memory grows |
| Native resource leaks | Native resources (DB connections, files) not closed | Unclosed streams, handles | ulimit hit, file descriptors exhausted |
| ClassLoader leaks | ClassLoaders retained, preventing class unloading | Repeated hot deployments | Metaspace/PermGen grows |
```python
# =====================================================
# COMMON MEMORY LEAK PATTERNS
# =====================================================

# PATTERN 1: Unbounded cache without eviction
class LeakingCache:
    def __init__(self):
        self._cache = {}  # Grows forever!

    def get(self, key, compute_fn):
        if key not in self._cache:
            self._cache[key] = compute_fn()  # Never evicted
        return self._cache[key]

# FIX: Use LRU cache with size limit
from functools import lru_cache

@lru_cache(maxsize=1000)  # Max 1000 entries
def compute_cached(key):
    return expensive_computation(key)

# PATTERN 2: Global registries that aren't cleared
class EventBus:
    _listeners = []  # Class-level list - never cleared!

    @classmethod
    def register(cls, listener):
        cls._listeners.append(listener)  # Listener never removed
    # Missing: unregister method

# FIX: Weak references for optional retention
import weakref

class SafeEventBus:
    _listeners = []

    @classmethod
    def register(cls, listener):
        # Weak reference - GC can collect if no other refs
        cls._listeners.append(weakref.ref(listener))

    @classmethod
    def emit(cls, event):
        # Clean up dead references during emit
        cls._listeners = [l for l in cls._listeners if l() is not None]
        for listener_ref in cls._listeners:
            listener = listener_ref()
            if listener:
                listener.handle(event)

# PATTERN 3: Closure capturing large context
def create_handlers_bad(large_objects):
    handlers = []
    for obj in large_objects:
        # Each lambda captures entire 'obj' - prevents GC
        handlers.append(lambda: print(obj.id))
    return handlers

# FIX: Capture only needed data
def create_handlers_good(large_objects):
    handlers = []
    for obj in large_objects:
        obj_id = obj.id  # Extract only needed value
        handlers.append(lambda id=obj_id: print(id))
    return handlers

# =====================================================
# DETECTING MEMORY LEAKS IN PYTHON
# =====================================================

import tracemalloc
import gc

# Enable tracing
tracemalloc.start()

# ... run application ...

# Take snapshots and compare
snapshot1 = tracemalloc.take_snapshot()

# ... more application work ...

snapshot2 = tracemalloc.take_snapshot()

# Compare snapshots - shows what grew
top_stats = snapshot2.compare_to(snapshot1, 'lineno')
print("[ Top memory increases ]")
for stat in top_stats[:10]:
    print(stat)

# Force GC and check unreachable objects
gc.collect()
print(f"Unreachable objects: {len(gc.garbage)}")
```
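The table above also lists thread-local leaks, which the snippets here don't cover. A minimal sketch of how that happens in a long-lived worker pool—the per-request "cache" and the sizes are made up for illustration:

```python
# Thread-local leak sketch: worker threads in a pool live for the whole
# process lifetime, so anything stashed in threading.local() is retained
# until the thread dies - effectively forever in a long-running service.

import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()

def handle_request_leaky(request_id):
    # Accumulates in the worker thread's local storage and is never cleared
    if not hasattr(_local, "cache"):
        _local.cache = []
    _local.cache.append(b"x" * 1_000_000)   # ~1MB retained per request

def handle_request_safe(request_id):
    _local.cache = [b"x" * 1_000_000]       # per-request working data
    try:
        pass                                # ... do the actual work ...
    finally:
        del _local.cache                    # clear before the worker is reused

with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(handle_request_leaky, range(100)))
    # RSS grows by roughly 100MB and stays there as long as the pool's threads live
```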
```bash
#!/bin/bash
# =====================================================
# JAVA/JVM MEMORY LEAK DETECTION
# =====================================================

# Step 1: Take heap dump when memory is high
jmap -dump:format=b,file=heap.hprof <pid>

# Or enable automatic dump on OOM
java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/dumps -jar app.jar

# Step 2: Analyze heap dump
# Tools: Eclipse MAT (Memory Analyzer Tool), VisualVM, JProfiler

# In Eclipse MAT:
# 1. Open heap dump
# 2. Run "Leak Suspects" report - automated detection
# 3. Check "Dominator Tree" - who retains most memory
# 4. Look for unexpected object counts
#
# Common findings:
# - HashMap with 10 million entries (should be 1000)
# - String[] taking 2GB (log accumulator?)
# - Many instances of same class (object pool not releasing)

# =====================================================
# LIVE OBJECT MONITORING
# =====================================================

# Class histogram - live object counts by class
jcmd <pid> GC.class_histogram | head -30

# Output:
#  num     #instances         #bytes  class name
# -----------------------------------------------
#    1:       4234567      169382680  java.lang.String
#    2:       1234567       39506144  java.util.HashMap$Node
#    3:        567890       27258720  [C (char arrays)

# If #instances keeps growing: potential leak in that class

# =====================================================
# COMPARING HEAP DUMPS OVER TIME
# =====================================================

# Take dumps at different times during memory growth
jmap -dump:format=b,file=heap_t1.hprof <pid>
# Wait...
jmap -dump:format=b,file=heap_t2.hprof <pid>

# Compare in MAT:
# File -> Compare Heap Dumps
# Shows objects that grew between snapshots
# Focus investigation on growing classes
```

Memory leaks often take hours or days to become apparent. Run load tests for extended periods (24+ hours) with memory monitoring. A healthy application's memory should plateau; a leaking one shows continuous growth.
Incorrect memory sizing is a subtle performance killer. Too little memory causes thrashing (constant GC or swapping). Too much wastes resources and can actually hurt performance (larger GC scans, less efficient CPU cache usage).
Memory Sizing Principles: size from measured usage rather than guesses, leave headroom between the application's heap and the container or host limit, prefer fixed and predictable sizes over elastic growth, and alert well before limits are reached.
```yaml
# =====================================================
# KUBERNETES MEMORY CONFIGURATION
# =====================================================

# Common mistake: Set JVM heap = container limit
# Result: JVM uses native memory beyond heap, OOMKilled

# WRONG:
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: app
    resources:
      limits:
        memory: "4Gi"        # Container limit
    env:
    - name: JAVA_OPTS
      value: "-Xmx4g"        # Heap = container limit = OOMKill!

# CORRECT: Leave headroom for non-heap memory
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: app
    resources:
      requests:
        memory: "4Gi"        # Guaranteed memory
      limits:
        memory: "4Gi"        # Max memory (set equal for predictability)
    env:
    - name: JAVA_OPTS
      # Heap = 75% of container, leaves room for:
      # - Metaspace (~100-300MB)
      # - Native memory (thread stacks, JNI)
      # - Code cache
      # - GC overhead
      value: "-Xmx3g -Xms3g -XX:MaxMetaspaceSize=256m"

# =====================================================
# MEMORY SIZING CHECKLIST
# =====================================================

# For a JVM application:
# 1. Container memory = Heap + Metaspace + Native + Buffer
#    Example: 3GB heap + 256MB meta + 512MB native + 256MB buffer = 4GB
#
# 2. Set -Xms = -Xmx for predictable behavior
#    Avoids heap resizing overhead
#
# 3. Monitor these metrics:
#    - jvm_memory_used_bytes{area="heap"}
#    - jvm_memory_used_bytes{area="nonheap"}
#    - container_memory_working_set_bytes
#    - container_memory_rss
#
# 4. Alert if:
#    - Heap usage > 80% sustained
#    - Container memory approaching limit
#    - OOMKill events

# =====================================================
# NON-JVM APPLICATIONS
# =====================================================

# Python/Ruby/Node: No fixed heap, memory grows as needed
# Key: Monitor RSS and set memory limits to prevent runaway

# Python example (gunicorn):
# - Each worker uses ~100-500MB baseline
# - 8 workers = 800MB - 4GB
# - Set container limit with headroom

# Node.js:
# - Default heap limit ~1.5GB (varies by version)
# - Override with --max-old-space-size=4096 (MB)
# - Container should be larger than max heap
```

Cache Sizing Strategy:
Caches consume significant memory. Sizing caches correctly is crucial—too small defeats the purpose, too large wastes RAM and causes GC pressure.
```python
# =====================================================
# CACHE SIZING STRATEGY
# =====================================================

# Key question: What's your target hit rate?
#
# Zipf's Law: Most accesses hit a small set of items
# - 20% of items often account for 80%+ of accesses
# - Caching just the "hot" items provides most benefit

# Example: 1 million unique items, want 90% hit rate
# If access follows Zipf distribution:
# - Caching top 10% (100K items) might give 70% hit rate
# - Caching top 30% (300K items) might give 90% hit rate

from functools import lru_cache
import sys

# Measure cache entry size
sample_entry = {"id": 12345, "name": "Sample Product", "price": 99.99}
entry_size = sys.getsizeof(sample_entry)  # Approximate

# Calculate max entries for target memory
target_memory_mb = 512
max_entries = (target_memory_mb * 1024 * 1024) // entry_size
print(f"Max cache entries for {target_memory_mb}MB: {max_entries}")

# =====================================================
# MONITORING CACHE EFFECTIVENESS
# =====================================================

class MonitoredCache:
    def __init__(self, maxsize: int):
        self._cache = {}
        self._maxsize = maxsize
        self._hits = 0
        self._misses = 0

    def get(self, key):
        if key in self._cache:
            self._hits += 1
            return self._cache[key]
        self._misses += 1
        return None

    def put(self, key, value):
        if len(self._cache) >= self._maxsize:
            # Evict oldest (simplified - use proper LRU)
            oldest = next(iter(self._cache))
            del self._cache[oldest]
        self._cache[key] = value

    @property
    def hit_rate(self):
        total = self._hits + self._misses
        return self._hits / total if total > 0 else 0

    @property
    def size(self):
        return len(self._cache)

# Regular monitoring:
# - If hit_rate < 50%: cache too small or wrong data
# - If hit_rate < 20% and cache full: reconsider caching strategy
# - If hit_rate > 95% with room left: might be oversized

# =====================================================
# CACHE SIZING FORMULA
# =====================================================

def calculate_cache_size(
    unique_items: int,
    target_hit_rate: float,
    access_pattern: str = "zipf"  # or "uniform"
) -> int:
    """
    Estimate cache size needed for target hit rate.
    Assumes Zipf distribution by default (typical for web).
    """
    if access_pattern == "uniform":
        # Uniform access: need to cache hit_rate % of items
        return int(unique_items * target_hit_rate)
    else:
        # Zipf distribution: power law relationship
        # This is an approximation - real data varies
        zipf_exponent = 1.0
        # For 90% hit rate, typically need ~30% of items
        # For 80% hit rate, typically need ~15% of items
        # For 99% hit rate, might need ~50-70% of items
        size_multiplier = target_hit_rate ** (1 / zipf_exponent)
        return int(unique_items * size_multiplier * 0.5)

# Example:
# 1 million products, 90% hit rate target
needed_size = calculate_cache_size(1_000_000, 0.9)
# Result: 450,000 entries with these default parameters (a rough estimate)
```

When you suspect memory issues, follow this systematic diagnostic approach:
Step 1: Observe Memory Metrics
```bash
#!/bin/bash
# =====================================================
# STEP 1: CURRENT MEMORY STATE
# =====================================================

# System-wide memory
free -h
#               total   used   free  shared  buff/cache  available
# Mem:           16Gi    8Gi    2Gi   100Mi         5Gi        7Gi
#
# Key: available (not free) - includes reclaimable cache

# Process-specific
ps aux --sort=-%mem | head -10   # Top memory consumers

# Detailed process memory
cat /proc/<pid>/status | grep -E "VmRSS|VmSize|VmPeak"

# =====================================================
# STEP 2: MEMORY OVER TIME
# =====================================================

# Watch memory growth
watch -n 1 'ps -p <pid> -o rss,vsz,pmem'

# Or plot over time (requires logging)
while true; do
  echo "$(date +%s),$(ps -p <pid> -o rss=)"
  sleep 60
done > memory_log.csv

# =====================================================
# STEP 3: CHECK FOR OOM HISTORY
# =====================================================

# Kernel OOM killer logs
dmesg | grep -iE "oom|killed"

# Kubernetes OOMKilled events
kubectl describe pod <pod-name> | grep OOM

# =====================================================
# STEP 4: GC ANALYSIS (IF APPLICABLE)
# =====================================================

# JVM: GC activity
jstat -gc <pid> 1000 5   # 5 samples, 1 second apart

# Check for:
# - FGC (Full GC) count increasing
# - FGCT (Full GC Time) growing
# - OU (Old gen Used) near OC (Old gen Capacity)

# =====================================================
# STEP 5: HEAP DUMP FOR DETAILED ANALYSIS
# =====================================================

# If memory is high and you need to understand why:
jmap -dump:format=b,file=heap.hprof <pid>

# Or for Python:
# guppy3 / tracemalloc / memory_profiler

# Analyze with appropriate tool for your stack
```

| Symptom | Likely Cause | Investigation | Solution |
|---|---|---|---|
| RSS grows continuously | Memory leak | Compare heap dumps | Find and fix leak source |
| Frequent GC, long pauses | Heap too small for workload | Check heap utilization | Increase heap or reduce allocation |
| High RSS, low heap usage | Native memory leak | Check native memory tracking | Fix native resource handling |
| OOMKilled despite small heap | Container limit too low | Compare heap + non-heap to limit | Increase container memory |
| Latency spikes correlate with GC | Stop-the-world GC pauses | Check GC logs for pause times | Tune GC or use low-latency GC |
| Memory spike overnight | Batch job allocations | Profile batch job memory | Stream processing, chunk data |
Memory issues are often temporal. They may only appear under sustained load, after many hours of operation, or during specific business processes (end-of-month reports, etc.). Monitoring must be continuous and historical analysis is essential for diagnosis.
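One simple form of historical analysis is fitting a trend to the RSS log captured in Step 2 above. A rough sketch—the memory_log.csv format matches that sampling loop, and the 1 MB/hour threshold is an arbitrary example, not a standard:

```python
# Rough leak check over a long-running RSS log (timestamp,rss_kb per line),
# such as the memory_log.csv produced by the sampling loop in Step 2.
# A healthy service plateaus; a sustained positive slope suggests a leak.

import csv

def rss_growth_kb_per_hour(path):
    samples = []
    with open(path) as f:
        for ts, rss in csv.reader(f):
            samples.append((float(ts), float(rss)))
    if len(samples) < 2:
        return 0.0
    # Least-squares slope of RSS over time (kB per second -> kB per hour)
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    num = sum((t - mean_t) * (r - mean_r) for t, r in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return (num / den) * 3600 if den else 0.0

growth = rss_growth_kb_per_hour("memory_log.csv")
if growth > 1024:   # growing by more than ~1 MB per hour, sustained
    print(f"Possible leak: RSS trending up at {growth:.0f} kB/hour")
else:
    print(f"RSS roughly flat ({growth:.0f} kB/hour)")
```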
When memory is a constraint, these strategies help reduce usage without sacrificing functionality:
```python
# =====================================================
# STREAMING INSTEAD OF LOADING
# =====================================================

# ANTI-PATTERN: Load entire file into memory
def process_log_bad(filepath):
    with open(filepath) as f:
        lines = f.readlines()  # Loads entire file!
    for line in lines:
        process(line)
# 10GB log file = 10GB+ memory usage

# PATTERN: Stream line by line
def process_log_good(filepath):
    with open(filepath) as f:
        for line in f:  # Streams, constant memory
            process(line)
# 10GB log file = ~1MB memory usage

# =====================================================
# OBJECT POOLING
# =====================================================

from queue import Queue
from contextlib import contextmanager

class ObjectPool:
    """Reuse expensive objects instead of creating new ones."""

    def __init__(self, factory, size=10):
        self._factory = factory
        self._pool = Queue()
        for _ in range(size):
            self._pool.put(factory())

    @contextmanager
    def acquire(self):
        obj = self._pool.get()
        try:
            yield obj
        finally:
            self._pool.put(obj)  # Return to pool

# Usage:
buffer_pool = ObjectPool(lambda: bytearray(1024 * 1024), size=10)

def process_data(data):
    with buffer_pool.acquire() as buffer:
        # Reuse buffer instead of allocating new each time
        buffer[:len(data)] = data
        process(buffer)

# =====================================================
# GENERATORS FOR LAZY EVALUATION
# =====================================================

# ANTI-PATTERN: Materialize entire result
def get_all_users_bad():
    users = []
    for row in db.query("SELECT * FROM users"):
        users.append(User(**row))
    return users  # 1 million users = 1 million objects in memory

# PATTERN: Generator - one at a time
def get_all_users_good():
    for row in db.query("SELECT * FROM users"):
        yield User(**row)  # Yields one at a time

# Consumer still uses same code:
for user in get_all_users_good():
    process(user)
# But only one User object in memory at a time

# =====================================================
# PRIMITIVE COLLECTIONS (Java-specific, but concept applies)
# =====================================================

# Java: ArrayList<Integer> vs IntArrayList (primitive)
# ArrayList<Integer>: Each int wrapped in Integer object (~16 bytes overhead)
# IntArrayList (e.g., Eclipse Collections): Raw ints (~4 bytes each)
#
# 1 million integers:
# - ArrayList<Integer>: ~20MB
# - IntArrayList: ~4MB

# Python equivalent: array module vs list
import array

numbers_list = [1.0] * 1_000_000
# ~8 bytes per element pointer, plus per-object overhead for distinct floats

numbers_array = array.array('d', [1.0] * 1_000_000)  # ~8MB total

# NumPy arrays even more efficient:
import numpy as np
numbers_numpy = np.ones(1_000_000, dtype=np.float64)  # ~8MB, highly optimized
```

The most effective memory optimization is often using less data. Do you need all 50 columns? Can you paginate results? Can you aggregate server-side? Before optimizing how you store data in memory, question whether you need all that data in memory.
Memory bottlenecks are silent killers—hard to detect, expensive to diagnose, and capable of causing cascading failures. Understanding memory management is essential for any engineer working on production systems.
What's Next:
With memory constraints covered, we'll examine disk I/O limitations in the next page. You'll learn about storage performance characteristics, I/O patterns that kill performance, and strategies for optimizing disk-bound workloads.
You now understand how memory affects application performance, can diagnose memory issues systematically, and know strategies for optimizing memory usage. The silent killer is no longer silent—you can measure it, understand it, and control it.