Memory is the silent resource—taken for granted when available, catastrophic when exhausted. Understanding memory constraints is essential for any engineer working on production systems.
Memory bottlenecks are also insidious. Unlike CPU saturation (visible in top) or disk I/O (visible in iostat), memory issues often manifest indirectly, as seemingly unrelated problems: latency spikes, climbing GC overhead, or sudden OOM kills.
By the end of this page, you will understand how memory is used in applications, the impact of garbage collection on performance, techniques for diagnosing memory issues, strategies for right-sizing memory allocations, and how to identify and fix memory leaks.
Before diagnosing memory problems, we need to understand how applications use memory. Different languages and runtimes have different memory models, but common patterns emerge:
Memory Regions in a Typical Application:
| Region | Contains | Allocation | Lifecycle | Common Issues |
|---|---|---|---|---|
| Stack | Function call frames, local variables | Automatic (LIFO) | Function scope | Stack overflow from deep recursion |
| Heap | Dynamic allocations (objects, arrays) | Manual or GC-managed | Until freed or GC'd | Leaks, fragmentation, GC pauses |
| Code/Text | Compiled code, instructions | At load time | Process lifetime | Rarely a problem |
| Static/Global | Global variables, constants | At load time | Process lifetime | Large static allocations |
| Thread-Local | Per-thread storage | Per-thread allocation | Thread lifetime | Memory growth with thread count |
The Heap: Where Most Memory Lives
In most applications, the heap dominates memory usage. It's where objects, data structures, caches, and buffers live. Heap management is the primary focus of memory optimization:
Most memory bottlenecks occur in the heap—too much allocated, too little available, or too much time spent managing it.
```bash
#!/bin/bash
# =====================================================
# UNDERSTANDING MEMORY USAGE ON LINUX
# =====================================================

# Process memory breakdown
cat /proc/<pid>/status | grep -E "Vm|Rss"

# VmPeak: Maximum virtual memory used
# VmSize: Current virtual memory size
# VmRSS:  Resident Set Size - actual RAM used
# VmData: Heap size
# VmStk:  Stack size

# More detailed: pmap
pmap -x <pid> | head -50

# Example output:
# Address           Kbytes     RSS   Dirty Mode  Mapping
# 0000000000400000    2048    1024       0 r-x-- java
# 00000000006f4000     128     128     128 rw--- java (data segment)
# 00007f0000000000  262144  200000  200000 rw--- [heap]
#
# The [heap] line shows your dynamic memory

# =====================================================
# JVM-SPECIFIC MEMORY ANALYSIS
# =====================================================

# JVM native memory breakdown
# (requires the JVM to be started with -XX:NativeMemoryTracking=summary)
jcmd <pid> VM.native_memory summary

# Output shows:
# - Java Heap: where objects live
# - Class: loaded class metadata
# - Thread: per-thread stack memory
# - Code: JIT compiled code
# - GC: garbage collector data structures
# - Internal: JVM internal data
# - Symbol: interned strings

# =====================================================
# KEY METRICS TO MONITOR
# =====================================================

# 1. RSS (Resident Set Size): Actual RAM used
#    - If approaching system memory: risk of OOM

# 2. Heap usage vs heap size:
#    - High usage (>80%): frequent GC, risk of OOM
#    - Low usage (<30%): heap oversized, wasting RAM

# 3. GC time percentage:
#    - >5% is concerning
#    - >10% is a serious problem

# 4. Memory growth over time:
#    - Constant growth without plateau = memory leak
```

Garbage collection (GC) is the automatic memory management scheme used by most modern languages. While it frees developers from manual memory management, it introduces performance overhead that must be understood and managed.
How Garbage Collection Works (Simplified):
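In a tracing collector, the runtime periodically marks every object reachable from the roots (thread stacks, globals) and reclaims everything else; stop-the-world pauses come from doing this while application threads are halted. Below is a minimal mark-and-sweep sketch of that idea—the `Obj` and `Heap` classes are purely illustrative, not any real runtime's internals:

```python
# Toy tracing collector: mark everything reachable from the roots,
# then sweep (free) whatever was never marked. Real collectors add
# generations, compaction, and concurrency, but the core idea is the same.

class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []      # outgoing references to other objects
        self.marked = False

class Heap:
    def __init__(self):
        self.objects = []   # every allocated object
        self.roots = []     # "stack/global" references in this toy model

    def alloc(self, name):
        obj = Obj(name)
        self.objects.append(obj)
        return obj

    def collect(self):
        # Mark phase: walk the object graph starting from the roots
        stack = list(self.roots)
        while stack:
            obj = stack.pop()
            if not obj.marked:
                obj.marked = True
                stack.extend(obj.refs)
        # Sweep phase: anything unmarked is garbage
        live = [o for o in self.objects if o.marked]
        freed = [o for o in self.objects if not o.marked]
        for o in live:
            o.marked = False   # reset marks for the next cycle
        self.objects = live
        return [o.name for o in freed]

heap = Heap()
a = heap.alloc("session")        # reachable via a root
b = heap.alloc("request_buf")    # reachable only through 'a'
c = heap.alloc("temp")           # never attached to a root -> garbage in this model
a.refs.append(b)
heap.roots.append(a)
print(heap.collect())            # ['temp'] is reclaimed; the other two survive
```

Generational collectors such as G1 refine this by splitting the heap into young and old regions, so most cycles only scan recently allocated objects—which is why young-generation pauses are usually short.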
The GC Pause Problem:
During GC, application threads may be paused ("stop-the-world"). A 100ms GC pause means 100ms where no user requests are processed. For latency-sensitive services (gaming, trading, real-time), GC pauses are unacceptable.
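A quick back-of-envelope calculation shows why even infrequent pauses dominate tail latency. The numbers below are hypothetical, not measurements from any particular service:

```python
# Rough estimate of how many requests a stop-the-world pause touches.
# Assumption: pauses arrive roughly periodically; a request is affected
# if any part of it overlaps a pause window.

pause_ms = 100            # stop-the-world pause duration (hypothetical)
pause_every_ms = 10_000   # one pause every 10 seconds
request_ms = 5            # typical request duration

# A request overlaps a pause if it starts within (pause + request) of one
affected_fraction = (pause_ms + request_ms) / pause_every_ms
print(f"Roughly {affected_fraction:.2%} of requests see up to {pause_ms}ms of added latency")
# ~1% of requests - enough to pull the p99 of a 5ms service up by an order of magnitude
```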
| Collector | Pause Characteristics | Throughput | Best For | Heap Size |
|---|---|---|---|---|
| G1 (default) | 10-200ms pauses | High | General purpose, balanced | 4GB-64GB |
| ZGC | < 10ms pauses (usually < 1ms) | Slightly lower | Latency-sensitive | Any, including TB scale |
| Shenandoah | < 10ms pauses | Moderate | Latency-sensitive (OpenJDK) | Medium to large |
| Parallel GC | Can be 1s+ for large heaps | Highest | Throughput, offline processing | Medium |
| Serial GC | Long pauses | Low | Small heaps, client apps | < 100MB |
GC Tuning Principles: measure before tuning (GC logs, pause times, and the percentage of time spent in GC), reduce allocation in application code before reaching for collector flags, choose a collector that matches your latency and throughput goals, and change one setting at a time so you can attribute its effect. See the worked overhead check below and the analysis commands that follow.
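The GC-time percentage mentioned above is easy to compute from the `jstat` GCT column and the process uptime; a tiny worked example with made-up numbers:

```python
# GC overhead as a percentage of wall-clock time.
# GCT comes from `jstat -gc`; uptime from `ps -o etimes=` (values are made up).

gc_time_s = 42.7     # jstat GCT column: cumulative seconds spent in GC
uptime_s = 3600.0    # process uptime in seconds

overhead = gc_time_s / uptime_s
print(f"GC overhead: {overhead:.1%}")   # ~1.2% here - fine; >5% warrants investigation
```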
```bash
#!/bin/bash
# =====================================================
# JVM GC ANALYSIS
# =====================================================

# Enable GC logging (Java 11+)
java -Xlog:gc*:file=gc.log:time,uptimemillis:filecount=5,filesize=100m -jar app.jar

# Key patterns to look for in GC logs:
#
# [0.234s] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 23M->8M(256M) 4.234ms
#          ^^^^^                      ^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^ ^^^^^^^
#          GC ID                      Reason                Heap change   Pause time
#
# Pause time > 100ms: Investigate
# Frequent GC (every few seconds): Allocation rate too high
# Heap not shrinking after GC: Possible leak, or heap is simply right-sized

# Analyze GC logs with tools:
# - GCViewer: visual analysis
# - GCEasy: online analysis
# - JClarity Censum: detailed reports

# =====================================================
# DETECTING GC IMPACT ON LATENCY
# =====================================================

# Pattern: Request latency spikes correlate with GC pauses
#
# Application logs: [10:45:32.100] Request completed in 5ms
#                   [10:45:32.200] Request completed in 312ms   <- spike
#                   [10:45:32.500] Request completed in 4ms
#
# GC logs:          [10:45:32.100] GC(42) Pause Full 280.5ms
#
# Correlation: The 312ms request included a 280ms GC pause

# JVM: Enable pause-time logging
# -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps   (legacy Java 8 flags;
# on Java 11+ the -Xlog:gc* configuration above already adds timestamps)

# Application: Log with nanosecond timestamps
# Compare slow request timestamps with GC pause timestamps

# =====================================================
# REAL-TIME GC MONITORING
# =====================================================

# jstat: GC statistics
jstat -gc <pid> 1000   # Every 1 second

# Columns to watch:
# EC   - Eden capacity
# EU   - Eden used
# OC   - Old gen capacity
# OU   - Old gen used
# GCT  - Total GC time
# FGCT - Full GC time

# If FGCT grows continuously: Full GCs happening (bad)
# If EU oscillates near EC: Young gen might be undersized

# =====================================================
# GC TUNING FOR LOW LATENCY
# =====================================================

# Switch to ZGC (Java 15+)
java -XX:+UseZGC -Xmx8g -jar app.jar

# ZGC characteristics:
# - Pause times < 10ms regardless of heap size
# - Concurrent marking, relocation, and reference processing
# - Slight throughput overhead (~5-15%)

# Or Shenandoah (OpenJDK)
java -XX:+UseShenandoahGC -Xmx8g -jar app.jar
```

Don't start with GC tuning. In most cases, excessive allocation in application code is the root cause. Profile allocation hotspots first. Reducing allocations is almost always more effective than tuning the GC to handle more garbage.
A memory leak occurs when memory is allocated but never freed, causing memory usage to grow unboundedly over time. In garbage-collected languages, leaks typically occur when objects remain reachable (referenced) even though they're no longer needed.
How Memory Leaks Manifest: memory usage climbs steadily instead of plateauing, garbage collection runs more often and reclaims less each cycle, latency degrades, and the process eventually fails with an out-of-memory error or is OOMKilled.
Common Causes of Memory Leaks:
| Pattern | Description | Example | Detection |
|---|---|---|---|
| Uncleared collections | Objects added to collections, never removed | Global cache without eviction | Collection size grows unboundedly |
| Event listener leaks | Listeners registered, never unregistered | DOM event handlers in SPAs | Object count grows per interaction |
| Closure captures | Closures retain references to outer scope | Lambda capturing large context | Heap shows unexpected retained objects |
| Thread-local leaks | Thread-local data not cleaned up | ThreadLocal in thread pools | Per-thread memory grows |
| Native resource leaks | Native resources (DB connections, files) not closed | Unclosed streams, handles | ulimit hit, file descriptors exhausted |
| ClassLoader leaks | ClassLoaders retained, preventing class unloading | Repeated hot deployments | Metaspace/PermGen grows |
```python
# =====================================================
# COMMON MEMORY LEAK PATTERNS
# =====================================================

# PATTERN 1: Unbounded cache without eviction
class LeakingCache:
    def __init__(self):
        self._cache = {}  # Grows forever!

    def get(self, key, compute_fn):
        if key not in self._cache:
            self._cache[key] = compute_fn()  # Never evicted
        return self._cache[key]

# FIX: Use LRU cache with size limit
from functools import lru_cache

@lru_cache(maxsize=1000)  # Max 1000 entries
def compute_cached(key):
    return expensive_computation(key)

# PATTERN 2: Global registries that aren't cleared
class EventBus:
    _listeners = []  # Class-level list - never cleared!

    @classmethod
    def register(cls, listener):
        cls._listeners.append(listener)  # Listener never removed
    # Missing: unregister method

# FIX: Weak references for optional retention
import weakref

class SafeEventBus:
    _listeners = []

    @classmethod
    def register(cls, listener):
        # Weak reference - GC can collect if no other refs
        cls._listeners.append(weakref.ref(listener))

    @classmethod
    def emit(cls, event):
        # Clean up dead references during emit
        cls._listeners = [l for l in cls._listeners if l() is not None]
        for listener_ref in cls._listeners:
            listener = listener_ref()
            if listener:
                listener.handle(event)

# PATTERN 3: Closure capturing large context
def create_handlers_bad(large_objects):
    handlers = []
    for obj in large_objects:
        # Each lambda captures entire 'obj' - prevents GC
        handlers.append(lambda: print(obj.id))
    return handlers

# FIX: Capture only needed data
def create_handlers_good(large_objects):
    handlers = []
    for obj in large_objects:
        obj_id = obj.id  # Extract only needed value
        handlers.append(lambda id=obj_id: print(id))
    return handlers

# =====================================================
# DETECTING MEMORY LEAKS IN PYTHON
# =====================================================

import tracemalloc
import gc

# Enable tracing
tracemalloc.start()

# ... run application ...

# Take snapshots and compare
snapshot1 = tracemalloc.take_snapshot()

# ... more application work ...

snapshot2 = tracemalloc.take_snapshot()

# Compare snapshots - shows what grew
top_stats = snapshot2.compare_to(snapshot1, 'lineno')
print("[ Top memory increases ]")
for stat in top_stats[:10]:
    print(stat)

# Force GC and check unreachable objects
gc.collect()
print(f"Unreachable objects: {len(gc.garbage)}")
```
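The table above also lists thread-local leaks, which the snippets here don't cover. A minimal sketch of how that happens in a long-lived worker pool—the per-request "cache" and the sizes are made up for illustration:

```python
# Thread-local leak sketch: worker threads in a pool live for the whole
# process lifetime, so anything stashed in threading.local() is retained
# until the thread dies - effectively forever in a long-running service.

import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()

def handle_request_leaky(request_id):
    # Accumulates in the worker thread's local storage and is never cleared
    if not hasattr(_local, "cache"):
        _local.cache = []
    _local.cache.append(b"x" * 1_000_000)   # ~1MB retained per request

def handle_request_safe(request_id):
    _local.cache = [b"x" * 1_000_000]       # per-request working data
    try:
        pass                                # ... do the actual work ...
    finally:
        del _local.cache                    # clear before the worker is reused

with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(handle_request_leaky, range(100)))
    # RSS grows by roughly 100MB and stays there as long as the pool's threads live
```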
```bash
#!/bin/bash
# =====================================================
# JAVA/JVM MEMORY LEAK DETECTION
# =====================================================

# Step 1: Take heap dump when memory is high
jmap -dump:format=b,file=heap.hprof <pid>

# Or enable automatic dump on OOM
java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/dumps -jar app.jar

# Step 2: Analyze heap dump
# Tools: Eclipse MAT (Memory Analyzer Tool), VisualVM, JProfiler

# In Eclipse MAT:
# 1. Open heap dump
# 2. Run "Leak Suspects" report - automated detection
# 3. Check "Dominator Tree" - who retains most memory
# 4. Look for unexpected object counts
#
# Common findings:
# - HashMap with 10 million entries (should be 1000)
# - String[] taking 2GB (log accumulator?)
# - Many instances of same class (object pool not releasing)

# =====================================================
# LIVE OBJECT MONITORING
# =====================================================

# Class histogram - live object counts by class
jcmd <pid> GC.class_histogram | head -30

# Output:
#  num     #instances         #bytes  class name
# -----------------------------------------------
#    1:       4234567      169382680  java.lang.String
#    2:       1234567       39506144  java.util.HashMap$Node
#    3:        567890       27258720  [C (char arrays)

# If #instances keeps growing: potential leak in that class

# =====================================================
# COMPARING HEAP DUMPS OVER TIME
# =====================================================

# Take dumps at different times during memory growth
jmap -dump:format=b,file=heap_t1.hprof <pid>
# Wait...
jmap -dump:format=b,file=heap_t2.hprof <pid>

# Compare in MAT:
# File -> Compare Heap Dumps
# Shows objects that grew between snapshots
# Focus investigation on growing classes
```

Memory leaks often take hours or days to become apparent. Run load tests for extended periods (24+ hours) with memory monitoring. A healthy application's memory should plateau; a leaking one shows continuous growth.
Incorrect memory sizing is a subtle performance killer. Too little memory causes thrashing (constant GC or swapping). Too much wastes resources and can actually hurt performance (larger GC scans, less efficient CPU cache usage).
Memory Sizing Principles: size from measured usage rather than guesses, leave headroom between the application's heap and the container or host limit, prefer fixed and predictable sizes over elastic growth, and alert well before limits are reached.
```yaml
# =====================================================
# KUBERNETES MEMORY CONFIGURATION
# =====================================================

# Common mistake: Set JVM heap = container limit
# Result: JVM uses native memory beyond heap, OOMKilled

# WRONG:
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: app
    resources:
      limits:
        memory: "4Gi"        # Container limit
    env:
    - name: JAVA_OPTS
      value: "-Xmx4g"        # Heap = container limit = OOMKill!

# CORRECT: Leave headroom for non-heap memory
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: app
    resources:
      requests:
        memory: "4Gi"        # Guaranteed memory
      limits:
        memory: "4Gi"        # Max memory (set equal for predictability)
    env:
    - name: JAVA_OPTS
      # Heap = 75% of container, leaves room for:
      # - Metaspace (~100-300MB)
      # - Native memory (thread stacks, JNI)
      # - Code cache
      # - GC overhead
      value: "-Xmx3g -Xms3g -XX:MaxMetaspaceSize=256m"

# =====================================================
# MEMORY SIZING CHECKLIST
# =====================================================

# For a JVM application:
# 1. Container memory = Heap + Metaspace + Native + Buffer
#    Example: 3GB heap + 256MB meta + 512MB native + 256MB buffer = 4GB
#
# 2. Set -Xms = -Xmx for predictable behavior
#    Avoids heap resizing overhead
#
# 3. Monitor these metrics:
#    - jvm_memory_used_bytes{area="heap"}
#    - jvm_memory_used_bytes{area="nonheap"}
#    - container_memory_working_set_bytes
#    - container_memory_rss
#
# 4. Alert if:
#    - Heap usage > 80% sustained
#    - Container memory approaching limit
#    - OOMKill events

# =====================================================
# NON-JVM APPLICATIONS
# =====================================================

# Python/Ruby/Node: No fixed heap, memory grows as needed
# Key: Monitor RSS and set memory limits to prevent runaway

# Python example (gunicorn):
# - Each worker uses ~100-500MB baseline
# - 8 workers = 800MB - 4GB
# - Set container limit with headroom

# Node.js:
# - Default heap limit ~1.5GB (varies by version)
# - Override with --max-old-space-size=4096 (MB)
# - Container should be larger than max heap
```

Cache Sizing Strategy:
Caches consume significant memory. Sizing caches correctly is crucial—too small defeats the purpose, too large wastes RAM and causes GC pressure.
```python
# =====================================================
# CACHE SIZING STRATEGY
# =====================================================

# Key question: What's your target hit rate?
#
# Zipf's Law: Most accesses hit a small set of items
# - 20% of items often account for 80%+ of accesses
# - Caching just the "hot" items provides most benefit

# Example: 1 million unique items, want 90% hit rate
# If access follows Zipf distribution:
# - Caching top 10% (100K items) might give 70% hit rate
# - Caching top 30% (300K items) might give 90% hit rate

from functools import lru_cache
import sys

# Measure cache entry size
sample_entry = {"id": 12345, "name": "Sample Product", "price": 99.99}
entry_size = sys.getsizeof(sample_entry)  # Approximate

# Calculate max entries for target memory
target_memory_mb = 512
max_entries = (target_memory_mb * 1024 * 1024) // entry_size
print(f"Max cache entries for {target_memory_mb}MB: {max_entries}")

# =====================================================
# MONITORING CACHE EFFECTIVENESS
# =====================================================

class MonitoredCache:
    def __init__(self, maxsize: int):
        self._cache = {}
        self._maxsize = maxsize
        self._hits = 0
        self._misses = 0

    def get(self, key):
        if key in self._cache:
            self._hits += 1
            return self._cache[key]
        self._misses += 1
        return None

    def put(self, key, value):
        if len(self._cache) >= self._maxsize:
            # Evict oldest (simplified - use proper LRU)
            oldest = next(iter(self._cache))
            del self._cache[oldest]
        self._cache[key] = value

    @property
    def hit_rate(self):
        total = self._hits + self._misses
        return self._hits / total if total > 0 else 0

    @property
    def size(self):
        return len(self._cache)

# Regular monitoring:
# - If hit_rate < 50%: cache too small or wrong data
# - If hit_rate < 20% and cache full: reconsider caching strategy
# - If hit_rate > 95% with room left: might be oversized

# =====================================================
# CACHE SIZING FORMULA
# =====================================================

def calculate_cache_size(
    unique_items: int,
    target_hit_rate: float,
    access_pattern: str = "zipf"  # or "uniform"
) -> int:
    """
    Estimate cache size needed for target hit rate.
    Assumes Zipf distribution by default (typical for web).
    """
    if access_pattern == "uniform":
        # Uniform access: need to cache hit_rate % of items
        return int(unique_items * target_hit_rate)
    else:
        # Zipf distribution: power law relationship
        # This is an approximation - real data varies
        zipf_exponent = 1.0
        # For 90% hit rate, typically need ~30% of items
        # For 80% hit rate, typically need ~15% of items
        # For 99% hit rate, might need ~50-70% of items
        size_multiplier = target_hit_rate ** (1 / zipf_exponent)
        return int(unique_items * size_multiplier * 0.5)

# Example:
# 1 million products, 90% hit rate target
needed_size = calculate_cache_size(1_000_000, 0.9)
# Result: 450,000 entries with these default parameters (a rough estimate)
```

When you suspect memory issues, follow this systematic diagnostic approach:
Step 1: Observe Memory Metrics
```bash
#!/bin/bash
# =====================================================
# STEP 1: CURRENT MEMORY STATE
# =====================================================

# System-wide memory
free -h
#               total   used   free  shared  buff/cache  available
# Mem:           16Gi    8Gi    2Gi   100Mi         5Gi        7Gi
#
# Key: available (not free) - includes reclaimable cache

# Process-specific
ps aux --sort=-%mem | head -10   # Top memory consumers

# Detailed process memory
cat /proc/<pid>/status | grep -E "VmRSS|VmSize|VmPeak"

# =====================================================
# STEP 2: MEMORY OVER TIME
# =====================================================

# Watch memory growth
watch -n 1 'ps -p <pid> -o rss,vsz,pmem'

# Or plot over time (requires logging)
while true; do
  echo "$(date +%s),$(ps -p <pid> -o rss=)"
  sleep 60
done > memory_log.csv

# =====================================================
# STEP 3: CHECK FOR OOM HISTORY
# =====================================================

# Kernel OOM killer logs
dmesg | grep -iE "oom|killed"

# Kubernetes OOMKilled events
kubectl describe pod <pod-name> | grep OOM

# =====================================================
# STEP 4: GC ANALYSIS (IF APPLICABLE)
# =====================================================

# JVM: GC activity
jstat -gc <pid> 1000 5   # 5 samples, 1 second apart

# Check for:
# - FGC (Full GC) count increasing
# - FGCT (Full GC Time) growing
# - OU (Old gen Used) near OC (Old gen Capacity)

# =====================================================
# STEP 5: HEAP DUMP FOR DETAILED ANALYSIS
# =====================================================

# If memory is high and you need to understand why:
jmap -dump:format=b,file=heap.hprof <pid>

# Or for Python:
# guppy3 / tracemalloc / memory_profiler

# Analyze with appropriate tool for your stack
```

| Symptom | Likely Cause | Investigation | Solution |
|---|---|---|---|
| RSS grows continuously | Memory leak | Compare heap dumps | Find and fix leak source |
| Frequent GC, long pauses | Heap too small for workload | Check heap utilization | Increase heap or reduce allocation |
| High RSS, low heap usage | Native memory leak | Check native memory tracking | Fix native resource handling |
| OOMKilled despite small heap | Container limit too low | Compare heap + non-heap to limit | Increase container memory |
| Latency spikes correlate with GC | Stop-the-world GC pauses | Check GC logs for pause times | Tune GC or use low-latency GC |
| Memory spike overnight | Batch job allocations | Profile batch job memory | Stream processing, chunk data |
Memory issues are often temporal. They may only appear under sustained load, after many hours of operation, or during specific business processes (end-of-month reports, etc.). Monitoring must be continuous and historical analysis is essential for diagnosis.
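One simple form of historical analysis is fitting a trend to the RSS log captured in Step 2 above. A rough sketch—the memory_log.csv format matches that sampling loop, and the 1 MB/hour threshold is an arbitrary example, not a standard:

```python
# Rough leak check over a long-running RSS log (timestamp,rss_kb per line),
# such as the memory_log.csv produced by the sampling loop in Step 2.
# A healthy service plateaus; a sustained positive slope suggests a leak.

import csv

def rss_growth_kb_per_hour(path):
    samples = []
    with open(path) as f:
        for ts, rss in csv.reader(f):
            samples.append((float(ts), float(rss)))
    if len(samples) < 2:
        return 0.0
    # Least-squares slope of RSS over time (kB per second -> kB per hour)
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    num = sum((t - mean_t) * (r - mean_r) for t, r in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return (num / den) * 3600 if den else 0.0

growth = rss_growth_kb_per_hour("memory_log.csv")
if growth > 1024:   # growing by more than ~1 MB per hour, sustained
    print(f"Possible leak: RSS trending up at {growth:.0f} kB/hour")
else:
    print(f"RSS roughly flat ({growth:.0f} kB/hour)")
```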
When memory is a constraint, these strategies help reduce usage without sacrificing functionality:
```python
# =====================================================
# STREAMING INSTEAD OF LOADING
# =====================================================

# ANTI-PATTERN: Load entire file into memory
def process_log_bad(filepath):
    with open(filepath) as f:
        lines = f.readlines()  # Loads entire file!
    for line in lines:
        process(line)
# 10GB log file = 10GB+ memory usage

# PATTERN: Stream line by line
def process_log_good(filepath):
    with open(filepath) as f:
        for line in f:  # Streams, constant memory
            process(line)
# 10GB log file = ~1MB memory usage

# =====================================================
# OBJECT POOLING
# =====================================================

from queue import Queue
from contextlib import contextmanager

class ObjectPool:
    """Reuse expensive objects instead of creating new ones."""

    def __init__(self, factory, size=10):
        self._factory = factory
        self._pool = Queue()
        for _ in range(size):
            self._pool.put(factory())

    @contextmanager
    def acquire(self):
        obj = self._pool.get()
        try:
            yield obj
        finally:
            self._pool.put(obj)  # Return to pool

# Usage:
buffer_pool = ObjectPool(lambda: bytearray(1024 * 1024), size=10)

def process_data(data):
    with buffer_pool.acquire() as buffer:
        # Reuse buffer instead of allocating new each time
        buffer[:len(data)] = data
        process(buffer)

# =====================================================
# GENERATORS FOR LAZY EVALUATION
# =====================================================

# ANTI-PATTERN: Materialize entire result
def get_all_users_bad():
    users = []
    for row in db.query("SELECT * FROM users"):
        users.append(User(**row))
    return users  # 1 million users = 1 million objects in memory

# PATTERN: Generator - one at a time
def get_all_users_good():
    for row in db.query("SELECT * FROM users"):
        yield User(**row)  # Yields one at a time

# Consumer still uses same code:
for user in get_all_users_good():
    process(user)
# But only one User object in memory at a time

# =====================================================
# PRIMITIVE COLLECTIONS (Java-specific, but concept applies)
# =====================================================

# Java: ArrayList<Integer> vs IntArrayList (primitive)
# ArrayList<Integer>: Each int wrapped in Integer object (~16 bytes overhead)
# IntArrayList (e.g., Eclipse Collections): Raw ints (~4 bytes each)
#
# 1 million integers:
# - ArrayList<Integer>: ~20MB
# - IntArrayList: ~4MB

# Python equivalent: array module vs list
import array

numbers_list = [1.0] * 1_000_000
# ~8 bytes per element pointer, plus per-object overhead for distinct floats

numbers_array = array.array('d', [1.0] * 1_000_000)  # ~8MB total

# NumPy arrays even more efficient:
import numpy as np
numbers_numpy = np.ones(1_000_000, dtype=np.float64)  # ~8MB, highly optimized
```

The most effective memory optimization is often using less data. Do you need all 50 columns? Can you paginate results? Can you aggregate server-side? Before optimizing how you store data in memory, question whether you need all that data in memory.
Memory bottlenecks are silent killers—hard to detect, expensive to diagnose, and capable of causing cascading failures. Understanding memory management is essential for any engineer working on production systems.
What's Next:
With memory constraints covered, we'll examine disk I/O limitations in the next page. You'll learn about storage performance characteristics, I/O patterns that kill performance, and strategies for optimizing disk-bound workloads.
You now understand how memory affects application performance, can diagnose memory issues systematically, and know strategies for optimizing memory usage. The silent killer is no longer silent—you can measure it, understand it, and control it.