When you write x = 42 and then y = x, what exactly happens? Is y a copy of the value 42, or does it reference the same 42 that x points to? This question—trivial on the surface—reveals one of the most fundamental distinctions in programming language design: value semantics versus reference semantics.
For primitive types, this distinction has profound implications for memory usage, performance, and how you reason about assignment and mutation.
This page explores how different languages handle primitive assignment and passing, and the hidden overhead costs that abstraction layers introduce.
By the end of this page, you will understand value semantics for primitives across languages, how object overhead affects memory and performance, and how to reason about what happens when you assign or pass primitive values in any language.
Value semantics means that when you assign a variable, you create an independent copy of the value. Changes to one variable don't affect the other. This is the intuitive model most people have when learning programming.
With pure value semantics:
```
x = 42
y = x    // y gets its own copy of 42
y = 100  // x is still 42
```
This seems obvious, but it's actually not how all languages work for all types. The key question is: what gets copied—the value itself, or a reference (pointer) to the value?
C/C++ has true value semantics for primitive types. Assignment copies the bits:
```cpp
#include <iostream>

int main() {
    // Value semantics: assignment copies the value
    int x = 42;
    int y = x;   // y gets its own copy of "42"
    y = 100;     // Modifying y doesn't affect x

    std::cout << "x: " << x << ", y: " << y << "\n";  // x: 42, y: 100

    // The memory layout:
    // x: [ 0x2A 0x00 0x00 0x00 ]  (4 bytes containing 42)
    // y: [ 0x64 0x00 0x00 0x00 ]  (4 bytes containing 100)
    // Completely separate memory locations

    std::cout << "Address of x: " << &x << "\n";
    std::cout << "Address of y: " << &y << "\n";
    // Different addresses - independent storage

    // Function parameters are also copies (by default)
    auto increment = [](int n) {
        n = n + 1;
        return n;
    };

    int z = 10;
    int result = increment(z);
    std::cout << "z after increment(z): " << z << "\n";  // Still 10!
    std::cout << "result: " << result << "\n";           // 11
    // z was not modified because 'n' was a copy
    // This is "pass by value"

    return 0;
}
```

Java also has value semantics for its eight primitive types—they are NOT objects:
```java
public class ValueSemanticsJava {
    public static void main(String[] args) {
        // Primitive: pure value semantics
        int x = 42;
        int y = x;   // y gets a copy of the value 42
        y = 100;
        System.out.println("x: " + x + ", y: " + y);  // x: 42, y: 100

        // Passing primitives to methods: always by value
        int z = 10;
        incrementPrimitive(z);
        System.out.println("z after increment: " + z);  // Still 10!

        // CONTRAST: Wrapper objects have reference semantics
        Integer objX = Integer.valueOf(42);
        Integer objY = objX;  // objY references the SAME object

        // But Integer is immutable, so you can't mutate through either
        // objY = 100 creates a NEW Integer, doesn't modify the original
        objY = Integer.valueOf(100);
        System.out.println("objX: " + objX + ", objY: " + objY);
        // objX: 42, objY: 100 (appears same, but different mechanism)

        // The key difference shows with mutable objects (not primitives)
        int[] arrX = {1, 2, 3};
        int[] arrY = arrX;  // arrY references the SAME array
        arrY[0] = 999;
        System.out.println("arrX[0]: " + arrX[0]);  // 999! Mutation visible
    }

    static void incrementPrimitive(int n) {
        n = n + 1;  // Modifies local copy, original unchanged
    }
}
```

Java has value semantics for primitives (int, double, etc.) but reference semantics for objects (Integer, arrays, custom objects). This split is a common source of confusion. Primitives are copied; objects are referenced.
Reference semantics means that assignment copies a reference (or pointer) to the value, not the value itself. Both variables then point to the same underlying data. This is efficient for large objects but changes the programming model significantly.
```python
# In Python, ALL assignment creates references
# Variables are names that point to objects

x = 42
y = x  # y and x reference the SAME integer object

# For small integers, Python interns them (optimization)
print(f"x is y: {x is y}")                # True - same object!
print(f"id(x): {id(x)}, id(y): {id(y)}")  # Same id

# But wait - why doesn't modifying y affect x?
y = 100
print(f"x: {x}, y: {y}")  # x: 42, y: 100

# Answer: IMMUTABILITY
# y = 100 doesn't mutate the integer 42
# It creates a NEW integer 100 and makes y point to it
# x still points to the original 42

print(f"After y = 100:")
print(f"id(x): {id(x)}")  # Still the old id
print(f"id(y): {id(y)}")  # New id!

# The key insight: Python has reference semantics,
# but IMMUTABLE types (int, float, str, tuple) behave
# AS IF they had value semantics because they can't be mutated

# CONTRAST with mutable types:
list_x = [1, 2, 3]
list_y = list_x   # list_y references the SAME list

list_y.append(4)  # Mutates the shared list
print(f"list_x: {list_x}")  # [1, 2, 3, 4] - Changed!

# This is NOT the case for integers because you can't mutate them
# There's no "42.set_value(100)" method

# Function parameters also receive references
def attempt_modify(n):
    print(f"n before: {n}, id: {id(n)}")
    n = n + 100  # Creates new int, rebinds local 'n'
    print(f"n after: {n}, id: {id(n)}")

value = 42
attempt_modify(value)
print(f"value after call: {value}")  # Still 42!
```

JavaScript has an unusual model: primitives have actual value semantics (not reference semantics with immutability):
```javascript
// JavaScript primitives have TRUE value semantics
let x = 42;
let y = x;  // y gets a COPY of the value

y = 100;
console.log(`x: ${x}, y: ${y}`);  // x: 42, y: 100

// Primitives are NOT objects (mostly)
// There's no way to get "is same object" for primitives
// They're simply copied on assignment

// Functions receive copies of primitive values
function attemptModify(n) {
  n = n + 100;
  console.log(`n inside: ${n}`);
}

let value = 42;
attemptModify(value);
console.log(`value after: ${value}`);  // 42 - unchanged

// CONTRAST: Objects have reference semantics
let objX = { num: 42 };
let objY = objX;  // objY references the SAME object

objY.num = 100;
console.log(`objX.num: ${objX.num}`);  // 100 - changed!

// Arrays too
let arrX = [1, 2, 3];
let arrY = arrX;  // Reference
arrY.push(4);
console.log(`arrX: ${arrX}`);  // [1, 2, 3, 4]

// The "wrapper object" confusion
let strPrimitive = "hello";
let strObject = new String("hello");

console.log(typeof strPrimitive);  // "string" - primitive
console.log(typeof strObject);     // "object" - wrapper

// Autoboxing: calling methods on primitives creates temp wrappers
console.log(strPrimitive.toUpperCase());  // Works via autoboxing
// But strPrimitive itself is NOT an object
```

Whether a language uses value or reference semantics for primitives, the end result is similar because primitives are immutable. You can't modify an integer in place. Assignment always creates a new binding. This is why Python's reference-based integers "feel" like value types.
| Language | Primitive Assignment | Object/Array Assignment | Why Primitives Behave as Expected |
|---|---|---|---|
| C/C++ | Copies bits (true value semantics) | Copies struct/object (by default) | Value is literally copied |
| Java | Copies value (stack allocated) | Copies reference (heap allocated) | Primitive is not on heap |
| Python | Copies reference to object | Copies reference to object | Immutability prevents shared-state bugs |
| JavaScript | Copies primitive value | Copies reference to object | Primitives are special non-objects |
Every layer of abstraction adds overhead. For primitives, this overhead manifests primarily in memory consumption. Let's quantify the actual costs.
In object-oriented runtimes, every object carries metadata: a pointer to its type or class, garbage-collection bookkeeping (reference counts or mark bits), and often an identity hash code or lock state.
This metadata exists in an object header that precedes the actual data.
| Language/Type | Actual Value | Overhead | Total Memory |
|---|---|---|---|
| C int | 4 bytes | 0 bytes | 4 bytes |
| C++ int | 4 bytes | 0 bytes | 4 bytes |
| Java int (primitive) | 4 bytes | 0 bytes | 4 bytes |
| Java Integer (wrapper) | 4 bytes | 12-16 bytes header + padding | 16-24 bytes |
| Python int (small) | ~4 bytes value | ~24 bytes header | 28 bytes |
| JavaScript Number | 8 bytes (double) | Engine-dependent | 8-24 bytes |
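You can check the Python row directly. Here is a minimal sketch; the exact sizes assume a typical 64-bit CPython build and may differ elsewhere:

```python
import sys

# On 64-bit CPython, even a tiny int carries a full object header
# (reference count, type pointer, size field) before its payload digits.
print(sys.getsizeof(42))      # 28 bytes
print(sys.getsizeof(10**30))  # larger: big ints grow with the value
print(sys.getsizeof(True))    # 28 bytes: booleans are int objects too
```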
Let's calculate the memory impact for storing one million integers:
```
Storing 1,000,000 integers:

C/C++ (int array):
  - 1,000,000 × 4 bytes = 4,000,000 bytes = ~3.8 MB
  - Contiguous, cache-friendly

Java (int[] array):
  - 1,000,000 × 4 bytes = 4,000,000 bytes + ~16 bytes header = ~3.8 MB
  - Still contiguous, nearly optimal

Java (ArrayList<Integer>):
  - Object references: 1,000,000 × 8 bytes = 8,000,000 bytes
  - Integer objects: 1,000,000 × 16-24 bytes = 16,000,000-24,000,000 bytes
  - Total: ~24-32 MB (6-8× more!)
  - Non-contiguous, cache-unfriendly

Python (list of ints):
  - List structure: ~9 MB (references + overhead)
  - Integer objects: 1,000,000 × 28 bytes = 28,000,000 bytes
  - Total: ~37 MB (nearly 10× more!)
  - Non-contiguous, poor cache performance

NumPy (np.array with int64):
  - 1,000,000 × 8 bytes = 8,000,000 bytes + small header = ~7.6 MB
  - Contiguous, cache-friendly, nearly C performance
```

A Python list of integers uses nearly 10× more memory than a C array. This isn't a bug—it's the cost of Python's flexibility and dynamic typing. For small datasets it doesn't matter; for millions of elements it's the difference between fitting in cache and thrashing.
```python
import sys
import numpy as np

# Measure actual overhead in Python
def measure_list_memory(n):
    """Approximate memory for a list of integers"""
    lst = list(range(n))

    # Size of the list object itself (references)
    list_size = sys.getsizeof(lst)

    # Size of integer objects (small ints are interned, so approximation)
    int_sizes = sum(sys.getsizeof(i) for i in lst[:1000])
    avg_int_size = int_sizes / 1000

    total_estimate = list_size + n * avg_int_size
    return list_size, avg_int_size, total_estimate

n = 1_000_000
list_size, avg_int, total = measure_list_memory(n)
print(f"List structure: {list_size / 1e6:.2f} MB")
print(f"Average int size: {avg_int:.0f} bytes")
print(f"Total estimate: {total / 1e6:.2f} MB")

# Compare with NumPy
np_arr = np.arange(n, dtype=np.int64)
print(f"NumPy array (int64): {np_arr.nbytes / 1e6:.2f} MB")

# Memory ratio
print(f"Python list / NumPy ratio: {total / np_arr.nbytes:.1f}×")
```

Beyond memory, abstraction adds runtime performance overhead. Every operation that's a single CPU instruction in C becomes a multi-step process in higher-level languages.
In a dynamic language, each arithmetic operation involves type checks, boxing and unboxing, and dynamic dispatch (even a simple + may involve virtual table lookups). Let's measure the same operation—summing integers—across languages:
```cpp
#include <iostream>
#include <chrono>
#include <vector>

int main() {
    const int N = 100'000'000;

    // Native array
    int* arr = new int[N];
    for (int i = 0; i < N; i++) arr[i] = i;

    auto start = std::chrono::high_resolution_clock::now();

    long long sum = 0;
    for (int i = 0; i < N; i++) {
        sum += arr[i];
    }

    auto end = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);

    std::cout << "Sum: " << sum << "\n";
    std::cout << "Time: " << duration.count() << " ms\n";
    // Typical result: ~50-100ms with -O3

    delete[] arr;
    return 0;
}
```

NumPy's secret is NOT that Python became faster—it's that NumPy sidesteps Python entirely for the actual computation. The sum() method runs C code over contiguous memory. Python only orchestrates; it doesn't compute. This pattern—"move the hot loop to C"—is universal in high-performance Python.
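The Python side of that comparison is not shown above; a minimal sketch of what such a measurement might look like follows. N is reduced so the interpreted loop finishes quickly, NumPy is assumed to be installed, and the timings on any given machine will differ:

```python
import time
import numpy as np

N = 10_000_000  # smaller than the C++ run so the pure-Python loop stays practical

# Pure-Python loop: every iteration goes through type checks, boxing, and dispatch
start = time.perf_counter()
total = 0
for i in range(N):
    total += i
print(f"Python loop: {time.perf_counter() - start:.3f} s, sum={total}")

# NumPy: the loop runs as compiled C over a contiguous int64 buffer
arr = np.arange(N, dtype=np.int64)
start = time.perf_counter()
total = int(arr.sum())
print(f"NumPy sum(): {time.perf_counter() - start:.3f} s, sum={total}")
```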
Knowing the overhead exists is important, but knowing when it matters is crucial for making good engineering decisions. Not all code is performance-critical, and premature optimization is indeed the root of much evil.
```
Decision: Is primitive overhead a concern?

START
  │
  ▼
Is this code on the critical path?
  │
  ├─ NO ──► Use the highest-level abstraction. Stop.
  │
  ▼ YES
Is the data size > 100,000 elements?
  │
  ├─ NO ──► Use the highest-level abstraction. Stop.
  │
  ▼ YES
Is this inside a tight loop (>1M iterations)?
  │
  ├─ NO ──► Use native arrays but high-level language. Stop.
  │
  ▼ YES
Do you need latency < 10ms?
  │
  ├─ NO ──► Use NumPy/typed arrays, vectorize operations. Stop.
  │
  ▼ YES
Consider:
  - C/C++ for maximum control
  - Rust for safety + performance
  - Julia for numeric computing
  - Hand-optimized Java with primitives only
```

Modern systems often combine languages at different abstraction levels. Understanding how primitives translate across boundaries is essential for polyglot development.
| Concept | C/C++ | Java | Python | JavaScript | Protocol Buffers |
|---|---|---|---|---|---|
| 32-bit signed | int32_t | int | int (small) | Number (limited) | int32 |
| 64-bit signed | int64_t | long | int | BigInt | int64 |
| 64-bit float | double | double | float | Number | double |
| Boolean | bool | boolean | bool | boolean | bool |
| UTF-8 text | char*/std::string | String (UTF-16!) | str | string (UTF-16) | string |
| Byte array | uint8_t* | byte[] | bytes | Uint8Array | bytes |
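One practical consequence of this table: when primitives cross a process or wire boundary, pin their width explicitly rather than relying on a language's default integer. A minimal Python sketch using the standard struct module (the two-field layout here is just an illustration):

```python
import struct

# '<' = little-endian, no padding; 'i' = 32-bit signed, 'q' = 64-bit signed
payload = struct.pack("<iq", 42, 1 << 40)
print(len(payload))            # 12 bytes: exactly 4 + 8, no object headers
a, b = struct.unpack("<iq", payload)
assert (a, b) == (42, 1 << 40)
```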
The most common cross-language bug involves string encoding. C uses bytes (often UTF-8), Java/JavaScript use UTF-16 internally, Python 3 uses Unicode code points. Always be explicit about encoding at boundaries. Assume nothing.
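As a small illustration of being explicit at the boundary, here is a Python sketch (the sample string is arbitrary; the byte and code-unit counts apply to this specific string):

```python
text = "héllo 🚀"             # 7 Unicode code points from Python's view
data = text.encode("utf-8")   # encode explicitly at the boundary
print(len(text), len(data))   # 7 code points, 11 bytes on the wire

# A Java or JavaScript peer sees the same text as 8 UTF-16 code units,
# so never compare "string lengths" across languages without agreeing on units.
assert data.decode("utf-8") == text
```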
```python
# Python calling C via ctypes (simplified example)
import ctypes

# Load a C shared library
# lib = ctypes.CDLL("./mylib.so")

# Define argument and return types
# lib.add_integers.argtypes = [ctypes.c_int, ctypes.c_int]
# lib.add_integers.restype = ctypes.c_int

# Call it - Python int is converted to C int
# result = lib.add_integers(42, 58)

# What happens under the hood:
# 1. Python checks types and converts Python int (object) to C int (raw bits)
# 2. Pushes arguments according to C calling convention
# 3. Calls into native code
# 4. C function executes (no Python overhead)
# 5. Return value converted back to Python int object

# For arrays (NumPy makes this efficient):
import numpy as np

# NumPy arrays have a C-compatible memory layout
arr = np.array([1, 2, 3, 4], dtype=np.int32)

# The data pointer can be passed directly to C
# data_ptr = arr.ctypes.data_as(ctypes.POINTER(ctypes.c_int32))
# lib.process_array(data_ptr, len(arr))

# This is why NumPy is fast: NO conversion needed,
# C operates directly on the same memory Python sees
```

We've explored the conceptual foundations of value semantics and abstraction overhead—two interconnected ideas that explain why the same primitive type behaves differently across languages.
Understanding value semantics and abstraction overhead isn't about always choosing the lowest-level language. It's about making informed tradeoffs. Use high-level abstractions for developer productivity, but recognize when and where to drop down for performance. The best engineers move fluidly up and down the abstraction ladder as requirements demand.
You have completed Module 9: Language-Level View. You now understand how primitive types appear across C/C++, Java, Python, and JavaScript; how abstraction level affects memory, performance, and behavior; and how value semantics and overhead considerations guide practical engineering decisions. This knowledge will help you navigate polyglot environments and make informed choices across the language landscape.