Shared Memory Via Virtual Memory - Learning Module

Shared Libraries

The Unsung Heroes of Modern Computing

Every program you run—from your web browser to your text editor to your terminal—shares something remarkable: they all use the same copy of the C library in memory. The printf() function that your editor calls is the same physical code that your browser uses.

On a typical Linux desktop with 200 running processes:

Without shared libraries: 200 copies of libc (~10 MB each) = 2 GB just for the C library
With shared libraries: 1 copy of libc = 10 MB total

Shared libraries are the foundational mechanism enabling practical multi-process computing. But they're far more than memory optimization—they enable:

•Security updates without recompiling applications
•Plugin architectures where functionality loads dynamically
•Modular software with interchangeable components
•Smaller executables with faster disk I/O and downloads

This page dissects exactly how shared libraries work, from compilation through runtime symbol resolution, revealing the elegant engineering that makes modern software ecosystems possible.

What You Will Learn

By the end of this page, you will understand: how shared libraries differ from static libraries, the complete loading and linking process, Position-Independent Code (PIC) and its necessity, the Global Offset Table (GOT) and Procedure Linkage Table (PLT), lazy binding and its performance implications, and how the dynamic linker orchestrates runtime symbol resolution. You'll gain practical knowledge for debugging linking issues and optimizing application startup.

Static vs Dynamic Linking

Before diving into shared libraries, we must understand the fundamental distinction between static and dynamic linking—two different answers to the question: When and how do we resolve symbols (function calls, variable references) to actual memory addresses?

Static Linking

•When: At build time, during the link step (by the static linker ld)
•How: Copies library code directly into executable
•Result: Self-contained executable, no runtime dependencies
•Executable size: Large (includes all library code)
•Memory at runtime: Each process has its own copy
•Updates: Must relink the executable to pick up library updates
•Symbol resolution: Fixed addresses at link time

Dynamic Linking

•When: At runtime (by the dynamic linker ld.so)
•How: References library code, loaded separately
•Result: Requires shared libraries present at runtime
•Executable size: Small (just references to libraries)
•Memory at runtime: Single copy shared across processes
•Updates: Library updates automatically used by all programs
•Symbol resolution: Resolved at load time or first use

linking_comparison.sh
# Compile a simple program with static vs dynamic linking
 
# Dynamic linking (default)
$ gcc -o hello_dynamic hello.c
$ ls -lh hello_dynamic
-rwxr-xr-x 1 user user 16K hello_dynamic
 
$ ldd hello_dynamic  # Show dynamic dependencies
linux-vdso.so.1
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
/lib64/ld-linux-x86-64.so.2
 
# Static linking
$ gcc -static -o hello_static hello.c
$ ls -lh hello_static
-rwxr-xr-x 1 user user 880K hello_static
 
$ ldd hello_static
not a dynamic executable  # No dependencies!
 
# Memory comparison at runtime
$ ps -o rss= -p $(pgrep hello_dynamic)
1200   # KB resident set
$ ps -o rss= -p $(pgrep hello_static)
1280   # Similar - both touch similar amounts of code
 
# But with 100 different programs built each way...
# Dynamic: 100 * small_private + 1 * shared_libc ≈ 50 MB total
# Static:  100 * 880 KB ≈ 88 MB of binaries, each carrying its own copy of the library code

When to Use Static Linking

Static linking isn't obsolete—it's the right choice for: (1) Embedded systems with no dynamic linker, (2) Single-process deployments where sharing has no benefit, (3) Container images where reproducibility matters (no external dependencies), (4) Security-critical applications that must avoid library substitution attacks, (5) Performance-critical paths where lazy binding overhead is unacceptable.

Anatomy of a Shared Library

A shared library (.so on Linux/Unix, .dll on Windows, .dylib on macOS) is a special type of ELF (Executable and Linkable Format) file designed to be loaded at runtime and shared across processes. Let's examine its structure:

Key Sections in a Shared Library
Section	Contents	Purpose for Sharing
.text	Executable code	Shared read-only across all processes
.rodata	Read-only data (strings, constants)	Shared read-only across all processes
.data	Initialized writable data	Copy-on-Write; each process gets private copy on first write
.bss	Uninitialized data (zero-filled)	Private per process (zero pages until written)
.plt	Procedure Linkage Table	Shared; trampolines for lazy binding
.got	Global Offset Table	Private per process; holds resolved addresses
.got.plt	GOT for PLT entries	Private; initially holds PLT stub addresses
.dynsym	Dynamic symbol table	Read-only; lists exported/imported symbols
.rela.dyn	Data relocations	Used at load time to fix address references
.rela.plt	PLT relocations	Used for lazy binding of function calls

examine_library.sh
# Examine a shared library's structure
 
# View ELF headers
$ readelf -h /lib/x86_64-linux-gnu/libc.so.6
ELF Header:
  Type:                       DYN (Shared object file)
  Entry point address:        0x29cd0
  ...
 
# View program headers (segments)
$ readelf -l /lib/x86_64-linux-gnu/libc.so.6 | head -30
Program Headers:
  Type    Offset     VirtAddr           FileSiz  MemSiz   Flg Align
  LOAD    0x0000000  0x0000000000000000 0x1c041c 0x1c041c R   0x1000
  LOAD    0x001ce000 0x00000000001ce000 0x159e0f 0x159e0f R E 0x1000  # TEXT
  LOAD    0x00328000 0x0000000000328000 0x54d4   0x54d4   RW  0x1000  # DATA
 
# View dynamic section
$ readelf -d /lib/x86_64-linux-gnu/libc.so.6
Dynamic section:
  SONAME       libc.so.6
  NEEDED       ld-linux-x86-64.so.2
  SYMTAB       0x4e88
  STRTAB       0x159f8
  ...
 
# List exported symbols
$ nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep ' printf'
00000000000607a0 T printf
00000000000603d0 T printf_size
 
# View relocations
$ readelf -r /lib/x86_64-linux-gnu/libc.so.6 | head -20
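
As a rough illustration of where the "Type: DYN" line in the readelf output comes from, the sketch below reads the e_type field directly from an ELF header. The function name elf_type and the synthetic header bytes are assumptions made for this example; the header is hand-built, not copied from a real library.

```python
import struct

# e_type values defined by the ELF specification
ELF_TYPES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}


def elf_type(header: bytes) -> str:
    """Return the e_type name from the first bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian
    byte_order = "<" if header[5] == 1 else ">"
    # e_type is a 16-bit field immediately after the 16-byte e_ident
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return ELF_TYPES.get(e_type, f"unknown ({e_type})")


# Synthetic 18-byte header: magic, 64-bit class, little-endian, version 1,
# nine padding bytes, then e_type = 3 (ET_DYN). Illustrative only.
fake_header = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9) + struct.pack("<H", 3)
print(elf_type(fake_header))  # DYN
```

To try it on a real file, pass the first 18 bytes read in binary mode. Note that position-independent executables also report DYN, so this field alone does not distinguish a library from a PIE program.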

SONAME: The Library Identity

The SONAME (Shared Object Name) is the library's canonical identity, typically including the major version: libc.so.6. The actual file may be libc-2.31.so, with a symlink libc.so.6 → libc-2.31.so. Executables record the SONAME, not the filename, so a compatible minor-version upgrade only has to repoint the symlink; nothing needs relinking. The linker flag -Wl,-soname,libfoo.so.1 sets the SONAME when the library is linked.
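
The symlink indirection can be sketched in a scratch directory. Everything here is hypothetical: libfoo and its version numbers are invented, the files are empty placeholders rather than real libraries, and resolve_soname is a helper written for this example (it assumes a POSIX system where symlinks can be created).

```python
import os
import tempfile


def resolve_soname(directory: str, soname: str) -> str:
    """Follow the SONAME symlink to the real file it currently points at."""
    return os.path.basename(os.path.realpath(os.path.join(directory, soname)))


with tempfile.TemporaryDirectory() as libdir:
    # Hypothetical library: real file libfoo.so.1.0.0 with SONAME libfoo.so.1
    open(os.path.join(libdir, "libfoo.so.1.0.0"), "w").close()
    os.symlink("libfoo.so.1.0.0", os.path.join(libdir, "libfoo.so.1"))
    before = resolve_soname(libdir, "libfoo.so.1")

    # A minor upgrade installs a new file and repoints the symlink.
    open(os.path.join(libdir, "libfoo.so.1.1.0"), "w").close()
    os.remove(os.path.join(libdir, "libfoo.so.1"))
    os.symlink("libfoo.so.1.1.0", os.path.join(libdir, "libfoo.so.1"))
    after = resolve_soname(libdir, "libfoo.so.1")

print(before, "->", after)
```

A program that recorded the name libfoo.so.1 would pick up the newer file on its next start without being rebuilt, which is the point of recording the SONAME rather than the filename.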

Position-Independent Code (PIC)

The Problem: Where Will the Library Load?

When building a traditional (non-PIE) executable, the linker knows exactly where the code will be loaded: the segments' fixed virtual addresses are recorded in the ELF program headers. But shared libraries face a dilemma:

Library A might be loaded at address 0x7f000000 in Process 1
The same library A might be at address 0x7e000000 in Process 2
There's no way to predict the load address at compile time

If library code contained hardcoded addresses like:

mov rax, [0x7f001234]  ; Load global variable
call 0x7f005678        ; Call internal function

This code would only work if loaded at exactly address 0x7f000000. Sharing would be impossible because each process would need different addresses in the code.

The Solution: Position-Independent Code

PIC uses relative addressing: all code and data references are expressed relative to the current instruction pointer, not as absolute addresses.

pic_vs_non_pic.asm
; ==========================================
; Non-PIC code (absolute addressing) 
; Cannot be shared - addresses baked into code
; ==========================================
 
; Access global variable 'counter'
mov eax, [0x4012a0]        ; Absolute address - only works at one load address
 
; Call function 'helper'
call 0x401100              ; Absolute address - breaks if library moves
 
; ==========================================
; PIC code (relative addressing)
; CAN be shared - all references are relative
; ==========================================
 
; Access global variable via GOT (Global Offset Table)
mov rax, [rip + counter@GOTPCREL]         ; RIP-relative load of counter's address from its GOT slot
mov eax, [rax]                            ; Load the value of 'counter'
 
; Call an external (or preemptible) function
call helper@PLT                           ; RIP-relative call to helper's PLT stub
; Call an internal (static or hidden) function:
call local_helper                         ; Encoded as rel32: a displacement from the next instruction
 
; Access local data (RIP-relative)
lea rax, [rip + .LC0]      ; String constant at RIP-relative address
 
; ==========================================
; How x86-64 enables PIC efficiently
; ==========================================
; The RIP-relative addressing mode:
;   [rip + displacement]
; 
; This computes: address_of_next_instruction + displacement
; Since the displacement is relative, the code works at ANY load address!

Why RIP-Relative Addressing Works

Consider code at runtime:

Instruction at address: 0x7f00001000
[rip + displacement], where displacement = 0x500

Accessed address: 0x7f00001000 + (instruction size) + 0x500
                = 0x7f00001507 (assuming 7-byte instruction)

Now if the same code loads at a different address:

Instruction at address: 0x7e00001000
[rip + displacement], where displacement = 0x500 (unchanged!)

Accessed address: 0x7e00001000 + 7 + 0x500
                = 0x7e00001507

The relative relationship between the instruction and the data is preserved, regardless of where the library loads. The code bytes are identical, hence shareable.
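
The arithmetic above can be checked with a few lines of Python. This is only a sketch of the address calculation, not of real instruction decoding; rip_relative_target and the two base addresses are names and values chosen for this example to match the numbers in the text.

```python
def rip_relative_target(instr_addr: int, instr_len: int, disp: int) -> int:
    """Address accessed by [rip + disp]; RIP holds the NEXT instruction's address."""
    return instr_addr + instr_len + disp


for base in (0x7F00000000, 0x7E00000000):
    instr = base + 0x1000  # same offset inside the library in both processes
    target = rip_relative_target(instr, instr_len=7, disp=0x500)
    # Print the instruction address, the accessed address, and the
    # accessed address expressed as an offset from the load base.
    print(hex(instr), hex(target), hex(target - base))
```

Both iterations report the same offset from the load base (0x1507), which is why the encoded displacement never has to change.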

PIC on 32-bit x86

32-bit x86 lacks RIP-relative addressing. PIC on x86-32 has to obtain the current instruction address through a call/pop sequence (call pushes the return address, which the callee can then read), and compute offsets from there. This 'thunk' pattern adds overhead and complexity: call __x86.get_pc_thunk.bx; add ebx, _GLOBAL_OFFSET_TABLE_. It also ties up a general-purpose register (ebx) as the GOT pointer, which is why PIC had more performance impact on 32-bit systems.

Compiling with PIC

•-fpic: Generate position-independent code. GOT has size limit (platform-dependent, ~32KB on some archs). Slightly more efficient.
•-fPIC: Generate position-independent code. No GOT size limit. Use this for large libraries.
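•-pie: Link a position-independent executable (compile its objects with -fPIE) so that ASLR can also randomize the main program.
•Defaults: Shared libraries must be compiled with -fPIC (or -fpic) explicitly; -shared alone does not imply it. Many Linux distributions configure GCC to build PIE executables by default; -no-pie and -fno-pie turn that off.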
•Performance: Modern x86-64 PIC has near-zero overhead due to efficient RIP-relative addressing. 32-bit x86 PIC can have 5-10% overhead.

The Global Offset Table and Procedure Linkage Table

Even with position-independent code, shared libraries face another challenge: external symbols. When library A calls a function in library B, where is that function? The address isn't known until runtime when both libraries are loaded.

The Global Offset Table (GOT) and Procedure Linkage Table (PLT) solve this problem elegantly.

What is the GOT?

The GOT is a table of pointers located in the data segment (writable memory). Each entry in the GOT holds the actual runtime address of a global symbol (function or variable) from another library.

How it works:

At compile time, references to external symbols become GOT lookups:

// Source code
int x = external_var;

// Compiled (conceptually)
GOT[external_var_slot] = <resolved at runtime>
int x = *GOT[external_var_slot];

At load time, the dynamic linker fills in GOT entries with actual addresses.
At runtime, the code reads through the GOT to access external data.

GOT structure per process:

GOT[0]: Reserved (pointer to _DYNAMIC)
GOT[1]: Pointer to link_map structure
GOT[2]: Pointer to resolver function (for lazy binding)
GOT[3]: Address of first external symbol
GOT[4]: Address of second external symbol
...

Why the GOT is per-process:

The GOT is in writable memory and contains addresses specific to how libraries are loaded in each process. Process A might load libX.so at address 0x7f10... while Process B loads it at 0x7f20... Their GOT entries for libX symbols differ.

The GOT cannot be shared — it's part of the private data segment.
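
A toy model may make the split between shared code and private GOT more concrete. This is a conceptual sketch only: the symbol offsets, base addresses, and the helper names fill_got and run are all invented for illustration, and real GOT entries are filled by the dynamic linker from relocation records rather than from a Python dictionary.

```python
# Offsets of symbols inside a hypothetical libX.so (illustrative values)
LIBX_SYMBOLS = {"external_var": 0x4010, "external_func": 0x1230}

# "Code" shared by every process: it names GOT slots, never addresses.
SHARED_CODE = (("load_via_got", 0), ("call_via_got", 1))
SLOT_SYMBOLS = ("external_var", "external_func")


def fill_got(libx_base: int) -> list:
    """Per-process step done by the dynamic linker: base + symbol offset."""
    return [libx_base + LIBX_SYMBOLS[name] for name in SLOT_SYMBOLS]


def run(code, got):
    """Resolve each instruction's GOT slot to the address it would use."""
    return [got[slot] for _op, slot in code]


got_a = fill_got(0x7F1000000000)  # process A's load address for libX.so
got_b = fill_got(0x7F2000000000)  # process B's load address for libX.so
print([hex(a) for a in run(SHARED_CODE, got_a)])
print([hex(a) for a in run(SHARED_CODE, got_b)])
```

SHARED_CODE is the same object in both runs, while got_a and got_b differ; that mirrors why the code pages can be shared and the GOT pages cannot.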

The Dynamic Linker: ld.so

The dynamic linker (also called the runtime linker) is the unsung hero that makes shared libraries work. On Linux, it's typically ld-linux.so or ld-linux-x86-64.so.2. Let's trace exactly what happens when a dynamically-linked program starts.

Dynamic Linker Startup Sequence

•Kernel loads the executable — The kernel reads the ELF header, maps the executable's segments into memory, and notes the PT_INTERP segment specifying the dynamic linker path.
•Kernel loads the dynamic linker — Before transferring control to the program, the kernel loads the dynamic linker into the process's address space.
•Control transfers to ld.so — The kernel starts execution at the dynamic linker's entry point, not the program's main().
•ld.so self-relocates — The dynamic linker itself is position-independent. It first resolves its own relocations (bootstrapping).
•Process NEEDED entries — The linker reads the executable's dynamic section, finding all directly required libraries (DT_NEEDED entries).
•Load required libraries — Each library is loaded, and its dependencies are recursively processed (breadth-first or depth-first, implementation-dependent).
•Symbol resolution — For immediate binding, all symbols are resolved now. For lazy binding, only data relocations are processed; function relocations wait.
•Initialize libraries — Each library's .init and .init_array functions are called (in dependency order: dependencies first).
•Transfer to main() — Finally, control passes to the program's _start, which calls __libc_start_main, which calls main().
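
The loading and initialization steps can be sketched over a small dependency graph. The graph and library names below are hypothetical, and the traversal is a simplified model: it assumes a breadth-first load order and a "dependencies first" initialization order, and ignores details such as how a real dynamic linker orders objects involved in dependency cycles.

```python
from collections import deque

# Hypothetical DT_NEEDED graph; names are made up for illustration.
NEEDED = {
    "app": ["libui.so", "libnet.so"],
    "libui.so": ["libc.so.6"],
    "libnet.so": ["libc.so.6"],
    "libc.so.6": [],
}


def load_order(root):
    """Breadth-first walk over DT_NEEDED entries, loading each object once."""
    order, queue, seen = [], deque([root]), {root}
    while queue:
        obj = queue.popleft()
        order.append(obj)
        for dep in NEEDED[obj]:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return order


def init_order(root):
    """Post-order walk: every object comes after the objects it depends on."""
    order, done = [], set()

    def visit(obj):
        if obj in done:
            return
        done.add(obj)
        for dep in NEEDED[obj]:
            visit(dep)
        order.append(obj)

    visit(root)
    return order


print(load_order("app"))  # ['app', 'libui.so', 'libnet.so', 'libc.so.6']
print(init_order("app"))  # ['libc.so.6', 'libui.so', 'libnet.so', 'app']
```

In this model libc.so.6 is loaded last but initialized first, because every other object depends on it.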

tracing_dynamic_linking.sh
# Trace dynamic linker activity
 
# See library search and loading
$ LD_DEBUG=libs ./my_program
    12345: find library=libc.so.6 [0]; searching
    12345:  search path=/lib/x86_64-linux-gnu  (system search path)
    12345:   trying file=/lib/x86_64-linux-gnu/libc.so.6
    12345:
    12345: calling init: /lib/x86_64-linux-gnu/libc.so.6
 
# See symbol resolution
$ LD_DEBUG=bindings ./my_program
    12345: binding file ./my_program [0] to /lib/x86_64-linux-gnu/libc.so.6
           symbol 'printf' [GLIBC_2.2.5]
    12345: binding file ./my_program [0] to /lib/x86_64-linux-gnu/libc.so.6
           symbol 'malloc' [GLIBC_2.2.5]
 
# See all details
$ LD_DEBUG=all ./my_program 2>&1 | head -100
 
# See what ld.so would load without running the program
$ ldd ./my_program
    linux-vdso.so.1 (0x00007ffcc37fe000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f8b3c200000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f8b3c022000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f8b3c600000)
 
# Show which file the linker cache (/etc/ld.so.cache) maps libc.so.6 to
$ ldconfig -p | grep libc
    libc.so.6 (libc6,x86-64) => /lib/x86_64-linux-gnu/libc.so.6

Library Search Order
Priority	Source	Purpose
1	DT_RPATH in executable	Legacy; honored only when DT_RUNPATH is absent (security concerns)
2	LD_LIBRARY_PATH environment	Development/testing; not for production
3	DT_RUNPATH in executable	Modern alternative to RPATH
4	/etc/ld.so.cache	Cached search paths (ldconfig -p)
5	Default paths (/lib, /usr/lib)	System libraries

LD_LIBRARY_PATH Security

LD_LIBRARY_PATH is ignored for setuid/setgid programs to prevent privilege escalation attacks. An attacker could otherwise point it to a directory containing a malicious libc.so.6. For production deployments, use /etc/ld.so.conf.d/ or RUNPATH embedded in the executable instead.
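
The priority table and the setuid rule can be combined into one small function. This is a simplified sketch, not the dynamic linker's actual algorithm: search_dirs and its parameters are names invented here, the "<ld.so.cache>" entry is a placeholder for the cache lookup, and per-architecture default directories are left out.

```python
def search_dirs(rpath=(), runpath=(), ld_library_path=(), secure=False):
    """Places tried for a library, in priority order (simplified model)."""
    dirs = []
    # DT_RPATH is only honored when the object has no DT_RUNPATH.
    if rpath and not runpath:
        dirs += rpath
    # LD_LIBRARY_PATH is ignored in secure-execution mode (setuid/setgid).
    if not secure:
        dirs += ld_library_path
    dirs += runpath
    # Then the ld.so cache, then the default system directories.
    return dirs + ["<ld.so.cache>", "/lib", "/usr/lib"]


# Hypothetical paths: a developer override is honored for a normal program...
print(search_dirs(runpath=["/opt/app/lib"], ld_library_path=["/home/dev/lib"]))
# ...but dropped for a setuid program.
print(search_dirs(runpath=["/opt/app/lib"], ld_library_path=["/tmp/evil"], secure=True))
```

The second call shows why an attacker-controlled LD_LIBRARY_PATH cannot redirect a setuid program to a substitute libc.so.6.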

Symbol Versioning and Compatibility

How does glibc maintain binary compatibility across decades? A program compiled in 2005 still runs on a 2024 system. The answer is symbol versioning, a sophisticated mechanism that allows multiple implementations of the same symbol to coexist.

The Problem:

Suppose memcpy behavior changes between glibc versions:

glibc 2.2.5: memcpy happens to work for some overlapping regions, although the C standard never guaranteed it
glibc 2.14: memcpy is faster but no longer works for those overlapping copies (use memmove)

Old programs relying on the overlap behavior would break on the new system.

The Solution: Symbol Versions

The library provides multiple versions of memcpy:

memcpy@GLIBC_2.2.5   → old implementation
memcpy@@GLIBC_2.14   → new implementation (default for new programs)

Programs linked against an old glibc record that they need memcpy@GLIBC_2.2.5. Newly linked programs get memcpy@GLIBC_2.14. The library contains both!
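
The lookup rule can be modeled as a dictionary keyed by (name, version). This is a conceptual sketch: the dictionaries and the functions link_time_reference and runtime_bind are invented for illustration, and the error text only imitates the general style of a dynamic linker message.

```python
# (name, version) -> implementation label; DEFAULTS marks the "@@" version.
DEFINITIONS = {
    ("memcpy", "GLIBC_2.2.5"): "old implementation",
    ("memcpy", "GLIBC_2.14"): "new implementation",
}
DEFAULTS = {"memcpy": "GLIBC_2.14"}


def link_time_reference(name):
    """At build time the static linker records the default (@@) version."""
    return (name, DEFAULTS[name])


def runtime_bind(reference):
    """At run time the recorded (name, version) pair selects the definition."""
    try:
        return DEFINITIONS[reference]
    except KeyError:
        name, version = reference
        raise LookupError(f"version `{version}' not found for {name}") from None


old_binary_ref = ("memcpy", "GLIBC_2.2.5")   # recorded years ago
new_binary_ref = link_time_reference("memcpy")  # recorded when linking today
print(runtime_bind(old_binary_ref))  # old implementation
print(runtime_bind(new_binary_ref))  # new implementation
```

A reference to a version the library does not define raises LookupError here, which corresponds to the "version not found" failure seen when a binary is run against a glibc older than the one it was linked with.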

symbol_versioning.sh
# View versioned symbols in libc
$ readelf -sV /lib/x86_64-linux-gnu/libc.so.6 | grep memcpy
  1284: 00000000000a9100   436 IFUNC   GLOBAL DEFAULT   16 memcpy@@GLIBC_2.14
  1285: 00000000000a90d0     9 FUNC    GLOBAL DEFAULT   16 memcpy@GLIBC_2.2.5
 
# The @@ symbol is the default (for newly linked programs)
# The @ symbol is for backward compatibility
 
# Check what version your program needs
$ readelf -V ./my_program
Version needs section '.gnu.version_r':
  Addr: 0x00000000004003e8  Offset: 0x0003e8  Link: 6 (.dynstr)
  000000: Version: 1  File: libc.so.6  Cnt: 1
  0x0010:   Name: GLIBC_2.2.5  Flags: none  Version: 2
 
# Creating versioned symbols in your own library
$ cat version.map
MYLIB_1.0 {
    global:
        my_function;
    local:
        *;
};
 
MYLIB_2.0 {
    global:
        my_function;  # New version
} MYLIB_1.0;          # Inherits from 1.0
 
$ gcc -shared -o libmy.so my.c -Wl,--version-script=version.map

Symbol Versioning Guarantees

•Binary compatibility: Programs continue to work when system libraries are upgraded.
•New features: Libraries can add new functionality without breaking old programs.
•Behavior changes: When semantics must change, a new version is created.
•Clear errors: If a required version is missing, the dynamic linker reports exactly which version of which symbol is needed.
•Deprecation path: Old versions can emit warnings before eventual removal.

Auditing Symbol Requirements

To find the minimum glibc version required by a binary: objdump -T ./binary | grep GLIBC | sed 's/.*GLIBC_//' | sort -V | tail -1. This helps when deploying to older systems. If your program requires GLIBC_2.28 but the target has 2.17, you'll need static linking or a compatibility library.
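
The same audit can be written as a small script that parses saved objdump -T output. This is a sketch under assumptions: max_glibc_version is a name chosen here, and the sample text is made up in the general shape of objdump -T lines rather than captured from a real binary.

```python
import re


def max_glibc_version(objdump_output: str) -> str:
    """Highest GLIBC_x.y[.z] version tag mentioned in `objdump -T` output."""
    versions = re.findall(r"GLIBC_(\d+(?:\.\d+)+)", objdump_output)
    if not versions:
        raise ValueError("no GLIBC version tags found")
    # Compare numerically so that 2.14 sorts above 2.4.
    return max(versions, key=lambda v: tuple(int(part) for part in v.split(".")))


# Made-up sample; a real run would feed in the captured command output.
sample = """\
0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.2.5) printf
0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.14)  memcpy
0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.4)   __stack_chk_fail
"""
print(max_glibc_version(sample))  # 2.14
```

Comparing versions as integer tuples plays the role of sort -V in the shell pipeline: a plain string comparison would rank 2.4 above 2.14.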

Measuring Shared Library Efficiency

Understanding the real memory savings from shared libraries requires careful measurement. Not all parts of a library are shared equally, and the savings depend on how many processes use the library.

What Gets Shared in a Shared Library
Section	Shareable?	Reason
.text (code)	✅ Yes	Read-only, execute-only; identical across processes
.rodata (constants)	✅ Yes	Read-only; identical across processes
.plt (procedure linkage)	✅ Yes	Read-only code stubs; identical
.got.plt (GOT for PLT)	❌ No	Writable; contains process-specific addresses
.got (global offset table)	❌ No	Writable; process-specific
.data (initialized data)	⚠️ COW	Initially shared; copied on write
.bss (uninitialized data)	❌ No	Private per-process (anonymous pages)

measure_sharing.sh
# Measure actual sharing for a library
 
# 1. Find memory mappings for libc
$ grep libc /proc/$(pgrep -f firefox | head -1)/smaps
7f8b3c022000-7f8b3c1df000 r-xp ... /lib/x86_64-linux-gnu/libc-2.31.so
Size:               1844 kB
Rss:                1204 kB
Pss:                  12 kB   # ← Proportional share! 1204 KB shared by ~100 processes
Shared_Clean:       1204 kB   # ← All clean shared pages
Private_Clean:         0 kB
Private_Dirty:         0 kB
 
7f8b3c1df000-7f8b3c1e9000 rw-p ... /lib/x86_64-linux-gnu/libc-2.31.so
Size:                 40 kB
Rss:                  40 kB
Pss:                  40 kB   # ← All private (data segment)
Shared_Clean:          0 kB
Private_Dirty:        40 kB   # ← Process-private dirty pages
 
# 2. Count processes sharing libc
$ lsof /lib/x86_64-linux-gnu/libc.so.6 2>/dev/null | wc -l
247
 
# 3. Calculate savings
# Without sharing: 247 processes * 1.8 MB code = 445 MB
# With sharing: 1.8 MB (shared) + 247 * 40 KB (private) = 11.7 MB
# Savings: 433 MB (97% reduction!)
 
# 4. System-wide library sharing
$ smem -t -k | tail -10
# Shows RSS vs PSS for all processes, revealing sharing
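
The savings estimate in step 3 can be reproduced with a short calculation. The function name sharing_savings is an assumption for this sketch; the inputs are the figures from the listing above (247 processes, a 1844 kB code mapping, 40 kB of private data per process), and, like the listing, the "unshared" case counts only the code.

```python
def sharing_savings(processes, shared_kb, private_kb):
    """Return (unshared_kb, shared_total_kb, saved_fraction)."""
    unshared = processes * shared_kb             # one full code copy per process
    shared = shared_kb + processes * private_kb  # one code copy + per-process data
    return unshared, shared, 1 - shared / unshared


unshared, shared, saved = sharing_savings(processes=247, shared_kb=1844, private_kb=40)
print(unshared, "kB without sharing")
print(shared, "kB with sharing")
print(f"{saved:.0%} saved")
```

The exact megabyte figures depend on whether a megabyte is taken as 1000 or 1024 kB, so the sketch reports kilobytes and the percentage, which comes out at about 97% either way.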

The PSS Metric

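Proportional Set Size (PSS) is the fairest measure of per-process memory usage. For shared pages, PSS divides the page size by the number of sharers. If 100 processes share a 4 KB page, each is charged about 41 bytes of PSS. Summing PSS over all processes gives the memory actually in use without counting shared pages more than once. The memory that would be freed if a single process exited is only its private portion (Private_Clean + Private_Dirty), since the shared pages stay resident for the other sharers.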
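
The PSS rule is simple enough to express directly. In this sketch, pss_bytes is a name chosen for illustration and the page lists are invented examples; the kernel computes the real value per mapping and reports it in smaps.

```python
def pss_bytes(pages):
    """pages: iterable of (page_size_bytes, number_of_processes_mapping_it)."""
    return sum(size / sharers for size, sharers in pages)


shared_page = [(4096, 100)]                    # one page mapped by 100 processes
mixed = [(4096, 100)] * 10 + [(4096, 1)] * 2   # 10 shared pages + 2 private pages
print(pss_bytes(shared_page))  # 40.96
print(pss_bytes(mixed))
```

In the mixed case the two private pages dominate the total, which matches the smaps listing: the large shared code mapping contributes little PSS, while the small private data mapping is charged in full.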

Optimizing Library Sharing

To maximize sharing efficiency:

Technique	Impact	How
Minimize writable sections	More read-only sharing	Avoid global mutable state; use const
Use `-fPIC` correctly	Enable sharing	Required for shared libraries
Prelink libraries	Reduce private GOT updates	`prelink -a` (deprecated due to ASLR conflicts)
Large page support	Fewer TLB entries for shared libs	Transparent Huge Pages for .text
Avoid `dlopen()` in tight loops	Reduce GOT dirtying	Cache handles, load libraries once
Symbol visibility	Reduce relocation needs	`__attribute__((visibility("hidden")))`

Summary: Shared Libraries

We've explored the complete lifecycle and internals of shared libraries. Let's consolidate the key takeaways:

Key Takeaways

•Shared libraries enable massive memory savings — A single copy of library code serves all processes, reducing system memory usage by orders of magnitude.
•Position-Independent Code (PIC) is essential — RIP-relative addressing on x86-64 allows code to work at any load address, enabling the sharing of code pages.
•The GOT/PLT mechanism bridges the gap — The Global Offset Table holds resolved addresses (private), while the Procedure Linkage Table provides shared code trampolines for lazy binding.
•Lazy binding trades startup time for first-call latency — Functions are resolved on first use; LD_BIND_NOW and Full RELRO trade startup cost for security and determinism.
•The dynamic linker orchestrates everything — ld.so loads libraries, resolves symbols, and maintains the complex dependency graph at runtime.
•Symbol versioning ensures compatibility — Multiple symbol versions coexist, allowing binary compatibility across library upgrades spanning decades.
•Not all library content is shared — Code and read-only data share; GOT, data sections, and BSS are per-process. Understanding this is key to memory analysis.

What's Next:

Now that we understand how shared libraries enable code sharing across processes, we'll explore inter-process shared memory — a mechanism that allows processes to share data segments for explicit communication. Unlike the implicit sharing of library code, shared memory for IPC requires coordination, synchronization, and careful security considerations.

Page Complete

You now understand shared libraries at a systems programming level: PIC, GOT, PLT, lazy binding, the dynamic linker, and symbol versioning. This knowledge is essential for debugging linking issues, optimizing application startup, and understanding memory usage in production systems.

Shared Libraries

The Unsung Heroes of Modern Computing

On a typical Linux desktop with 200 running processes:

Without shared libraries: 200 copies of libc (~10 MB each) = 2 GB just for the C library
With shared libraries: 1 copy of libc = 10 MB total

Shared libraries are the foundational mechanism enabling practical multi-process computing. But they're far more than memory optimization—they enable:

Security updates without recompiling applications
Plugin architectures where functionality loads dynamically
Modular software with interchangeable components
Smaller executables with faster disk I/O and downloads

This page dissects exactly how shared libraries work, from compilation through runtime symbol resolution, revealing the elegant engineering that makes modern software ecosystems possible.

What You Will Learn

Static vs Dynamic Linking

Static Linking

•When: At compile time (by the static linker ld)
•How: Copies library code directly into executable
•Result: Self-contained executable, no runtime dependencies
•Executable size: Large (includes all library code)
•Memory at runtime: Each process has its own copy
•Updates: Must recompile to get library updates
•Symbol resolution: Fixed addresses at link time

Dynamic Linking

•When: At runtime (by the dynamic linker ld.so)
•How: References library code, loaded separately
•Result: Requires shared libraries present at runtime
•Executable size: Small (just references to libraries)
•Memory at runtime: Single copy shared across processes
•Updates: Library updates automatically used by all programs
•Symbol resolution: Resolved at load time or first use

linking_comparison.sh
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
# Compile a simple program with static vs dynamic linking
 
# Dynamic linking (default)
$ gcc -o hello_dynamic hello.c
$ ls -lh hello_dynamic
-rwxr-xr-x 1 user user 16K hello_dynamic
 
$ ldd hello_dynamic  # Show dynamic dependencies
linux-vdso.so.1
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
/lib64/ld-linux-x86-64.so.2
 
# Static linking
$ gcc -static -o hello_static hello.c
$ ls -lh hello_static
-rwxr-xr-x 1 user user 880K hello_static
 
$ ldd hello_static
not a dynamic executable  # No dependencies!
 
# Memory comparison at runtime
$ ps -o rss= -p $(pgrep hello_dynamic)
1200   # KB resident set
$ ps -o rss= -p $(pgrep hello_static)
1280   # Similar - both touch similar amounts of code
 
# But with 100 instances of each...
# Dynamic: 100 * small_private + 1 * shared_libc ≈ 50 MB total
# Static:  100 * full_binary ≈ 88 GB total (each is self-contained)

When to Use Static Linking

Anatomy of a Shared Library

Converting Mermaid diagram...

Key Sections in a Shared Library
Section	Contents	Purpose for Sharing
.text	Executable code	Shared read-only across all processes
.rodata	Read-only data (strings, constants)	Shared read-only across all processes
.data	Initialized writable data	Copy-on-Write; each process gets private copy on first write
.bss	Uninitialized data (zero-filled)	Private per process (zero pages until written)
.plt	Procedure Linkage Table	Shared; trampolines for lazy binding
.got	Global Offset Table	Private per process; holds resolved addresses
.got.plt	GOT for PLT entries	Private; initially hold PLT stub addresses
.dynsym	Dynamic symbol table	Read-only; lists exported/imported symbols
.rela.dyn	Data relocations	Used at load time to fix address references
.rela.plt	PLT relocations	Used for lazy binding of function calls

examine_library.sh
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
# Examine a shared library's structure
 
# View ELF headers
$ readelf -h /lib/x86_64-linux-gnu/libc.so.6
ELF Header:
  Type:                       DYN (Shared object file)
  Entry point address:        0x29cd0
  ...
 
# View program headers (segments)
$ readelf -l /lib/x86_64-linux-gnu/libc.so.6 | head -30
Program Headers:
  Type    Offset     VirtAddr           FileSiz  MemSiz   Flg Align
  LOAD    0x0000000  0x0000000000000000 0x1c041c 0x1c041c R   0x1000
  LOAD    0x001ce000 0x00000000001ce000 0x159e0f 0x159e0f R E 0x1000  # TEXT
  LOAD    0x00328000 0x0000000000328000 0x54d4   0x54d4   RW  0x1000  # DATA
 
# View dynamic section
$ readelf -d /lib/x86_64-linux-gnu/libc.so.6
Dynamic section:
  SONAME       libc.so.6
  NEEDED       ld-linux-x86-64.so.2
  SYMTAB       0x4e88
  STRTAB       0x159f8
  ...
 
# List exported symbols
$ nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep ' printf'
00000000000607a0 T printf
00000000000603d0 T printf_size
 
# View relocations
$ readelf -r /lib/x86_64-linux-gnu/libc.so.6 | head -20

SONAME: The Library Identity

Position-Independent Code (PIC)

The Problem: Where Will the Library Load?

When compiling a regular executable, the compiler knows exactly where the code will be loaded (at a fixed address specified in the ELF header). But shared libraries face a dilemma:

Library A might be loaded at address 0x7f000000 in Process 1
The same library A might be at address 0x7e000000 in Process 2
There's no way to predict the load address at compile time

If library code contained hardcoded addresses like:

mov rax, [0x7f001234]  ; Load global variable
call 0x7f005678        ; Call internal function

This code would only work if loaded at exactly address 0x7f000000. Sharing would be impossible because each process would need different addresses in the code.

The Solution: Position-Independent Code

PIC uses relative addressing: all code and data references are expressed relative to the current instruction pointer, not as absolute addresses.

pic_vs_non_pic.asm
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
; ==========================================
; Non-PIC code (absolute addressing) 
; Cannot be shared - addresses baked into code
; ==========================================
 
; Access global variable 'counter'
mov eax, [0x4012a0]        ; Absolute address - only works at one load address
 
; Call function 'helper'
call 0x401100              ; Absolute address - breaks if library moves
 
; ==========================================
; PIC code (relative addressing)
; CAN be shared - all references are relative
; ==========================================
 
; Access global variable via GOT (Global Offset Table)
lea rbx, [rip + _GLOBAL_OFFSET_TABLE_]  ; RIP-relative: GOT address
mov rax, [rbx + counter@GOTOFF]          ; Load 'counter' via GOT offset
 
; Call internal function (RIP-relative)
call helper@PLT                           ; Through PLT for external symbols
; or for internal functions:
call .helper - . + rip                    ; Direct RIP-relative call
 
; Access local data (RIP-relative)
lea rax, [rip + .LC0]      ; String constant at RIP-relative address
 
; ==========================================
; How x86-64 enables PIC efficiently
; ==========================================
; The RIP-relative addressing mode:
;   [rip + displacement]
; 
; This computes: address_of_next_instruction + displacement
; Since the displacement is relative, the code works at ANY load address!

Why RIP-Relative Addressing Works

Consider code at runtime:

Instruction at address: 0x7f00001000
[rip + displacement], where displacement = 0x500

Accessed address: 0x7f00001000 + (instruction size) + 0x500
                = 0x7f00001507 (assuming 7-byte instruction)

Now if the same code loads at a different address:

Instruction at address: 0x7e00001000
[rip + displacement], where displacement = 0x500 (unchanged!)

Accessed address: 0x7e00001000 + 7 + 0x500
                = 0x7e00001507

The relative relationship between the instruction and the data is preserved, regardless of where the library loads. The code bytes are identical, hence shareable.

PIC on 32-bit x86

Compiling with PIC

•-fpic: Generate position-independent code. GOT has size limit (platform-dependent, ~32KB on some archs). Slightly more efficient.
•-fPIC: Generate position-independent code. No GOT size limit. Use this for large libraries.
•-pie: Create a position-independent executable (for ASLR of main program).
•Default on x86-64: Most compilers default to PIC for shared libraries. The -fno-pic flag disables it.
•Performance: Modern x86-64 PIC has near-zero overhead due to efficient RIP-relative addressing. 32-bit x86 PIC can have 5-10% overhead.

The Global Offset Table and Procedure Linkage Table

The Global Offset Table (GOT) and Procedure Linkage Table (PLT) solve this problem elegantly.

What is the GOT?

How it works:

At compile time, references to external symbols become GOT lookups:

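// Source code
extern int external_var;
int x = external_var;

// Compiled (conceptually; GOT and external_var_slot are illustrative names)
// The dynamic linker stores &external_var in GOT[external_var_slot] at load time
int x = *(int *)GOT[external_var_slot];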

At load time, the dynamic linker fills in GOT entries with actual addresses.
At runtime, the code reads through the GOT to access external data.

GOT structure per process:

GOT[0]: Reserved (pointer to _DYNAMIC)
GOT[1]: Pointer to link_map structure
GOT[2]: Pointer to resolver function (for lazy binding)
GOT[3]: Address of first external symbol
GOT[4]: Address of second external symbol
...

Why the GOT is per-process:

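The GOT cannot be shared: it holds absolute addresses, and each process may map its libraries at different addresses (for example under ASLR). It therefore lives in the writable data segment, which is private to each process.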
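To make the split between shared code and a private GOT concrete, here is a toy Python model. It is only a sketch: SHARED_CODE, make_got, and run are hypothetical names, the "code" is a tuple of slot references, not machine instructions, and the slot numbers follow the GOT[3], GOT[4] layout listed above.

```python
# Shared, read-only "code": it names GOT slots, never absolute addresses.
SHARED_CODE = (("load", 3), ("load", 4))


def make_got(load_base, symbol_offsets):
    """Model the dynamic linker filling one process's GOT at load time."""
    got = {0: "_DYNAMIC", 1: "link_map", 2: "resolver"}  # reserved entries
    for i, offset in enumerate(symbol_offsets):
        got[3 + i] = load_base + offset  # absolute address, process-specific
    return got


def run(code, got):
    """Execute the shared code against one process's private GOT."""
    return [got[slot] for op, slot in code if op == "load"]


offsets = [0x1000, 0x2000]                    # symbol offsets inside the library
got_a = make_got(0x7F0000000000, offsets)     # process A's load address
got_b = make_got(0x7E0000000000, offsets)     # process B's load address

# Same code object, different results, because only the GOT differs.
print([hex(a) for a in run(SHARED_CODE, got_a)])
print([hex(a) for a in run(SHARED_CODE, got_b)])
```

The same SHARED_CODE object serves both "processes"; everything that depends on the load address is confined to the per-process dictionary.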


The Dynamic Linker: ld.so

Dynamic Linker Startup Sequence

•Kernel loads the executable — The kernel reads the ELF header, maps the executable's segments into memory, and notes the PT_INTERP segment specifying the dynamic linker path.
•Kernel loads the dynamic linker — Before transferring control to the program, the kernel loads the dynamic linker into the process's address space.
•Control transfers to ld.so — The kernel starts execution at the dynamic linker's entry point, not the program's main().
•ld.so self-relocates — The dynamic linker itself is position-independent. It first resolves its own relocations (bootstrapping).
•Process NEEDED entries — The linker reads the executable's dynamic section, finding all directly required libraries (DT_NEEDED entries).
•Load required libraries — Each library is loaded, and its dependencies are recursively processed (breadth-first or depth-first, implementation-dependent).
•Symbol resolution — For immediate binding, all symbols are resolved now. For lazy binding, only data relocations are processed; function relocations wait.
•Initialize libraries — Each library's .init and .init_array functions are called (in dependency order: dependencies first).
•Transfer to main() — Finally, control passes to the program's _start, which calls __libc_start_main, which calls main().
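The first step in the sequence above depends on the PT_INTERP program header. As a rough illustration, the following Python sketch pulls the interpreter path out of an ELF image. It assumes a 64-bit little-endian ELF file and performs no validation beyond the magic bytes and class; read_interp is a hypothetical helper name.

```python
import struct

PT_INTERP = 3                      # program header type holding the loader path
EHDR_FMT = "<16sHHIQQQIHHHHHH"     # ELF64 file header, little-endian
PHDR_FMT = "<IIQQQQQQ"             # ELF64 program header, little-endian


def read_interp(data):
    """Return the PT_INTERP path from an ELF64 image, or None if absent."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch handles only ELF64 little-endian")
    header = struct.unpack_from(EHDR_FMT, data, 0)
    phoff, phentsize, phnum = header[5], header[9], header[10]
    for i in range(phnum):
        phdr = struct.unpack_from(PHDR_FMT, data, phoff + i * phentsize)
        p_type, p_offset, p_filesz = phdr[0], phdr[2], phdr[5]
        if p_type == PT_INTERP:
            # The segment content is a NUL-terminated path string.
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode()
    return None  # statically linked programs have no PT_INTERP
```

Passing the bytes of a dynamically linked x86-64 binary, for example read_interp(open("/bin/ls", "rb").read()), is expected to return the same loader path that ldd prints, though the exact path depends on the distribution.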

tracing_dynamic_linking.sh
# Trace dynamic linker activity
 
# See library search and loading
$ LD_DEBUG=libs ./my_program
    12345: find library=libc.so.6 [0]; searching
    12345:  search path=/lib/x86_64-linux-gnu  (system search path)
    12345:   trying file=/lib/x86_64-linux-gnu/libc.so.6
    12345:
    12345: calling init: /lib/x86_64-linux-gnu/libc.so.6
 
# See symbol resolution
$ LD_DEBUG=bindings ./my_program
    12345: binding file ./my_program [0] to /lib/x86_64-linux-gnu/libc.so.6
           symbol 'printf' [GLIBC_2.2.5]
    12345: binding file ./my_program [0] to /lib/x86_64-linux-gnu/libc.so.6
           symbol 'malloc' [GLIBC_2.2.5]
 
# See all details
$ LD_DEBUG=all ./my_program 2>&1 | head -100
 
# List the libraries ld.so would load (do not run ldd on untrusted binaries; it may execute them)
$ ldd ./my_program
    linux-vdso.so.1 (0x00007ffcc37fe000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f8b3c200000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f8b3c022000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f8b3c600000)
 
# Show cached library locations (the contents of /etc/ld.so.cache)
$ ldconfig -p | grep libc
    libc.so.6 (libc6,x86-64) => /lib/x86_64-linux-gnu/libc.so.6

Library Search Order
Priority	Source	Purpose
1	DT_RPATH in executable	Legacy; ignored when DT_RUNPATH is present
2	LD_LIBRARY_PATH environment	Development/testing; ignored in secure-execution mode (e.g. setuid programs)
3	DT_RUNPATH in executable	Modern alternative to RPATH
4	/etc/ld.so.cache	Cache of library locations built by ldconfig
5	Default paths (/lib, /usr/lib)	System libraries
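The priority table can be read as a small decision procedure. The Python sketch below models it under the assumption that the loader behaves as glibc's ld.so manual page describes; search_order and its parameters are hypothetical names, and the default directories are simplified to the two shown in the table.

```python
def search_order(rpath=(), runpath=(), ld_library_path=(), secure_mode=False):
    """Return the ordered list of places consulted for a needed library."""
    order = []
    if rpath and not runpath:
        # DT_RPATH is honored only when the binary has no DT_RUNPATH.
        order += list(rpath)
    if not secure_mode:
        # LD_LIBRARY_PATH is skipped in secure-execution mode (e.g. setuid).
        order += list(ld_library_path)
    order += list(runpath)
    order.append("/etc/ld.so.cache")
    order += ["/lib", "/usr/lib"]
    return order


# A development run: environment override plus an embedded RUNPATH.
print(search_order(runpath=["/opt/app/lib"], ld_library_path=["/tmp/build"]))

# The same binary run in secure-execution mode: the environment is ignored.
print(search_order(runpath=["/opt/app/lib"], ld_library_path=["/tmp/build"],
                   secure_mode=True))
```

The second call shows why LD_LIBRARY_PATH is unsuitable as a deployment mechanism: privileged programs drop it entirely.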

LD_LIBRARY_PATH Security

Symbol Versioning and Compatibility

The Problem:

Suppose memcpy behavior changes between glibc versions:

glibc 2.2.5: memcpy happens to tolerate overlapping regions (never guaranteed by the C standard)
glibc 2.14: memcpy is faster but doesn't handle overlaps (use memmove)

Old programs relying on the overlap behavior would break on the new system.

The Solution: Symbol Versions

The library provides multiple versions of memcpy:

memcpy@GLIBC_2.2.5   → old implementation
memcpy@@GLIBC_2.14   → new implementation (default for new programs)

Programs compiled against old glibc record they need memcpy@GLIBC_2.2.5. New programs get memcpy@GLIBC_2.14. The library contains both!
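The lookup rule described above can be modeled in a few lines of Python. This is a toy model, not how glibc stores its tables: SYMBOLS, DEFAULTS, link_time_reference, and runtime_lookup are hypothetical names, and the error text is invented for the sketch.

```python
# (name, version) -> implementation; both versions live in the same library.
SYMBOLS = {
    ("memcpy", "GLIBC_2.2.5"): "old implementation",
    ("memcpy", "GLIBC_2.14"): "new implementation",
}
DEFAULTS = {"memcpy": "GLIBC_2.14"}   # the @@ version


def link_time_reference(name):
    """What a newly linked program records: the default (@@) version."""
    return (name, DEFAULTS[name])


def runtime_lookup(reference):
    """What the loader does: an exact (name, version) match, or an error."""
    if reference not in SYMBOLS:
        name, version = reference
        raise LookupError(f"{name}: version {version} not provided")
    return SYMBOLS[reference]


old_program = ("memcpy", "GLIBC_2.2.5")        # recorded years ago
new_program = link_time_reference("memcpy")    # recorded today

print(runtime_lookup(old_program))
print(runtime_lookup(new_program))
```

Because the reference stored in each program carries its version, upgrading the library changes what new programs bind to without changing what old programs receive.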

symbol_versioning.sh
# View versioned symbols in libc
$ readelf -sV /lib/x86_64-linux-gnu/libc.so.6 | grep memcpy
  1284: 00000000000a9100   436 IFUNC   GLOBAL DEFAULT   16 memcpy@@GLIBC_2.14
  1285: 00000000000a90d0     9 FUNC    GLOBAL DEFAULT   16 memcpy@GLIBC_2.2.5
 
# The @@ symbol is the default (for newly linked programs)
# The @ symbol is for backward compatibility
 
# Check what version your program needs
$ readelf -V ./my_program
Version needs section '.gnu.version_r':
  Addr: 0x00000000004003e8  Offset: 0x0003e8  Link: 6 (.dynstr)
  000000: Version: 1  File: libc.so.6  Cnt: 1
  0x0010:   Name: GLIBC_2.2.5  Flags: none  Version: 2
 
# Creating versioned symbols in your own library
$ cat version.map
MYLIB_1.0 {
    global:
        my_function;
    local:
        *;
};
 
MYLIB_2.0 {
    global:
        my_function;  # New version
} MYLIB_1.0;          # Inherits from 1.0
 
# Note: exporting two versions of my_function also requires .symver directives in my.c
$ gcc -shared -fPIC -o libmy.so my.c -Wl,--version-script=version.map

Symbol Versioning Guarantees

•Binary compatibility: Programs continue to work when system libraries are upgraded.
•New features: Libraries can add new functionality without breaking old programs.
•Behavior changes: When semantics must change, a new version is created.
•Clear errors: If a required version is missing, the dynamic linker reports exactly which version of which symbol is needed.
•Deprecation path: Old versions can emit warnings before eventual removal.

Auditing Symbol Requirements

Measuring Shared Library Efficiency

Understanding the real memory savings from shared libraries requires careful measurement. Not all parts of a library are shared equally, and the savings depend on how many processes use the library.

What Gets Shared in a Shared Library
Section	Shareable?	Reason
.text (code)	✅ Yes	Read-only and executable; identical across processes
.rodata (constants)	✅ Yes	Read-only; identical across processes
.plt (procedure linkage)	✅ Yes	Read-only code stubs; identical
.got.plt (GOT for PLT)	❌ No	Writable; contains process-specific addresses
.got (global offset table)	❌ No	Writable; process-specific
.data (initialized data)	⚠️ COW	Initially shared; copied on write
.bss (uninitialized data)	❌ No	Private per-process (anonymous pages)

measure_sharing.sh
# Measure actual sharing for a library
 
# 1. Find memory mappings for libc
$ grep -A 10 libc /proc/$(pgrep -f firefox | head -1)/smaps   # output abbreviated below
7f8b3c022000-7f8b3c1df000 r-xp ... /lib/x86_64-linux-gnu/libc-2.31.so
Size:               1844 kB
Rss:                1204 kB
Pss:                  12 kB   # ← Proportional share! 1204 KB shared by ~100 processes
Shared_Clean:       1204 kB   # ← All clean shared pages
Private_Clean:         0 kB
Private_Dirty:         0 kB
 
7f8b3c1df000-7f8b3c1e9000 rw-p ... /lib/x86_64-linux-gnu/libc-2.31.so
Size:                 40 kB
Rss:                  40 kB
Pss:                  40 kB   # ← All private (data segment)
Shared_Clean:          0 kB
Private_Dirty:        40 kB   # ← Process-private dirty pages
 
# 2. Count processes sharing libc (approximate: the count includes lsof's header line)
$ lsof /lib/x86_64-linux-gnu/libc.so.6 2>/dev/null | wc -l
247
 
# 3. Calculate savings
# Without sharing: 247 processes * 1.8 MB code = 445 MB
# With sharing: 1.8 MB (shared) + 247 * 40 KB (private) = 11.7 MB
# Savings: 433 MB (97% reduction!)
 
# 4. System-wide library sharing
$ smem -t -k | tail -10
# Shows RSS vs PSS for all processes, revealing sharing

The PSS Metric
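PSS (proportional set size) charges each process for its private pages in full and for each shared page divided by the number of processes mapping it. The Python sketch below applies that rule to the figures from the smaps excerpt above. It is a simplification: the kernel accounts page by page, whereas this model assumes every page in a region has the same number of sharers, and pss_kb and rss_kb are hypothetical helper names.

```python
def rss_kb(private_kb, shared_regions):
    """Resident set size: every resident page counted in full."""
    return private_kb + sum(kb for kb, _sharers in shared_regions)


def pss_kb(private_kb, shared_regions):
    """Proportional set size: shared pages split evenly among sharers.

    shared_regions is an iterable of (resident_kb, number_of_sharers) pairs.
    """
    return private_kb + sum(kb / sharers for kb, sharers in shared_regions)


# 1204 kB of libc text shared by roughly 100 processes, 40 kB private data.
libc_regions = [(1204, 100)]
print("RSS:", rss_kb(40, libc_regions), "kB")
print("PSS:", round(pss_kb(40, libc_regions), 2), "kB")
```

With these inputs RSS is 1244 kB while PSS is about 52 kB. Summing PSS across all processes gives a realistic system total, whereas summing RSS counts the shared pages once per process.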

Optimizing Library Sharing

To maximize sharing efficiency:

Technique	Impact	How
Minimize writable sections	More read-only sharing	Avoid global mutable state; use const
Use `-fPIC` correctly	Enable sharing	Required for shared libraries
Prelink libraries	Reduce private GOT updates	`prelink -a` (deprecated due to ASLR conflicts)
Large page support	Fewer TLB entries for shared libs	Transparent Huge Pages for .text
Avoid `dlopen()` in tight loops	Reduce GOT dirtying	Cache handles, load libraries once
Symbol visibility	Reduce relocation needs	`__attribute__((visibility("hidden")))`

Summary: Shared Libraries

We've explored the complete lifecycle and internals of shared libraries. Let's consolidate the key takeaways:

Key Takeaways

•Shared libraries enable massive memory savings — A single copy of library code serves all processes, reducing system memory usage by orders of magnitude.
•Position-Independent Code (PIC) is essential — RIP-relative addressing on x86-64 allows code to work at any load address, enabling the sharing of code pages.
•The GOT/PLT mechanism bridges the gap — The Global Offset Table holds resolved addresses (private), while the Procedure Linkage Table provides shared code trampolines for lazy binding.
•Lazy binding trades startup time for first-call latency — Functions are resolved on first use; LD_BIND_NOW and Full RELRO trade startup cost for security and determinism.
•The dynamic linker orchestrates everything — ld.so loads libraries, resolves symbols, and maintains the complex dependency graph at runtime.
•Symbol versioning ensures compatibility — Multiple symbol versions coexist, allowing binary compatibility across library upgrades spanning decades.
•Not all library content is shared — Code and read-only data share; GOT, data sections, and BSS are per-process. Understanding this is key to memory analysis.
