Every program you run—from your web browser to your text editor to your terminal—shares something remarkable: they all use the same copy of the C library in memory. The printf() function that your editor calls is the same physical code that your browser uses.
On a typical Linux desktop with 200 running processes:
Shared libraries are the foundational mechanism enabling practical multi-process computing. But they're far more than memory optimization—they enable:
This page dissects exactly how shared libraries work, from compilation through runtime symbol resolution, revealing the elegant engineering that makes modern software ecosystems possible.
By the end of this page, you will understand: how shared libraries differ from static libraries, the complete loading and linking process, Position-Independent Code (PIC) and its necessity, the Global Offset Table (GOT) and Procedure Linkage Table (PLT), lazy binding and its performance implications, and how the dynamic linker orchestrates runtime symbol resolution. You'll gain practical knowledge for debugging linking issues and optimizing application startup.
Before diving into shared libraries, we must understand the fundamental distinction between static and dynamic linking—two different answers to the question: When and how do we resolve symbols (function calls, variable references) to actual memory addresses?
```bash
# Compile a simple program with static vs dynamic linking

# Dynamic linking (default)
$ gcc -o hello_dynamic hello.c
$ ls -lh hello_dynamic
-rwxr-xr-x 1 user user 16K hello_dynamic

$ ldd hello_dynamic    # Show dynamic dependencies
linux-vdso.so.1
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
/lib64/ld-linux-x86-64.so.2

# Static linking
$ gcc -static -o hello_static hello.c
$ ls -lh hello_static
-rwxr-xr-x 1 user user 880K hello_static

$ ldd hello_static
not a dynamic executable    # No dependencies!

# Memory comparison at runtime
$ ps -o rss= -p $(pgrep hello_dynamic)
1200    # KB resident set
$ ps -o rss= -p $(pgrep hello_static)
1280    # Similar - both touch similar amounts of code

# But with 100 instances of each...
# Dynamic: 100 * small_private + 1 * shared_libc ≈ 50 MB total
# Static: 100 * full_binary ≈ 88 MB total (each is self-contained)
```
Static linking isn't obsolete; it's the right choice for: (1) Embedded systems with no dynamic linker, (2) Single-process deployments where sharing has no benefit, (3) Container images where reproducibility matters (no external dependencies), (4) Security-critical applications that must avoid library substitution attacks, (5) Performance-critical paths where lazy binding overhead is unacceptable.
A shared library (.so on Linux/Unix; the Windows and macOS counterparts are .dll and .dylib) is, on Linux, a special type of ELF (Executable and Linkable Format) file designed to be loaded at runtime and shared across processes. Let's examine its structure:
| Section | Contents | Purpose for Sharing |
|---|---|---|
| .text | Executable code | Shared read-only across all processes |
| .rodata | Read-only data (strings, constants) | Shared read-only across all processes |
| .data | Initialized writable data | Copy-on-Write; each process gets private copy on first write |
| .bss | Uninitialized data (zero-filled) | Private per process (zero pages until written) |
| .plt | Procedure Linkage Table | Shared; trampolines for lazy binding |
| .got | Global Offset Table | Private per process; holds resolved addresses |
| .got.plt | GOT for PLT entries | Private; entries initially hold PLT stub addresses |
| .dynsym | Dynamic symbol table | Read-only; lists exported/imported symbols |
| .rela.dyn | Data relocations | Used at load time to fix address references |
| .rela.plt | PLT relocations | Used for lazy binding of function calls |
```bash
# Examine a shared library's structure

# View ELF headers
$ readelf -h /lib/x86_64-linux-gnu/libc.so.6
ELF Header:
  Type:                DYN (Shared object file)
  Entry point address: 0x29cd0
  ...

# View program headers (segments)
$ readelf -l /lib/x86_64-linux-gnu/libc.so.6 | head -30
Program Headers:
  Type  Offset     VirtAddr           FileSiz  MemSiz   Flg Align
  LOAD  0x0000000  0x0000000000000000 0x1c041c 0x1c041c R   0x1000
  LOAD  0x001ce000 0x00000000001ce000 0x159e0f 0x159e0f R E 0x1000  # TEXT
  LOAD  0x00328000 0x0000000000328000 0x54d4   0x54d4   RW  0x1000  # DATA

# View dynamic section
$ readelf -d /lib/x86_64-linux-gnu/libc.so.6
Dynamic section:
  SONAME  libc.so.6
  NEEDED  ld-linux-x86-64.so.2
  SYMTAB  0x4e88
  STRTAB  0x159f8
  ...

# List exported symbols
$ nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep ' printf'
00000000000607a0 T printf
00000000000603d0 T printf_size

# View relocations
$ readelf -r /lib/x86_64-linux-gnu/libc.so.6 | head -20
```
The SONAME (Shared Object Name) is the library's canonical identity, typically including the major version: libc.so.6. The actual file may be libc-2.31.so, with a symlink libc.so.6 → libc-2.31.so. Executables record the SONAME, not the filename, allowing minor version upgrades without relinking. The linker flag -Wl,-soname,libfoo.so.1 sets the SONAME at link time.
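The `Type: DYN` line that readelf reports comes from a single 16-bit field in the ELF header. As a minimal sketch (the helper names `elf_type` and `is_shared_object` are illustrative, not part of any tool), the same check can be made by reading the first 18 bytes of a file:

```python
import struct

# e_type values defined by the ELF specification
ET_EXEC = 2   # executable linked for a fixed address
ET_DYN = 3    # shared object (PIE executables also use this type)

def elf_type(header: bytes) -> int:
    """Return the e_type field from the leading bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] is byte 5: 1 = little-endian, 2 = big-endian
    byte_order = "<" if header[5] == 1 else ">"
    # e_type is the 16-bit field right after the 16-byte e_ident array
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return e_type

def is_shared_object(path: str) -> bool:
    """True if the file at `path` has ELF type DYN."""
    with open(path, "rb") as f:
        return elf_type(f.read(18)) == ET_DYN
```

On a Linux system, `is_shared_object("/lib/x86_64-linux-gnu/libc.so.6")` would be expected to return True, assuming the library lives at that path. A DYN type alone does not distinguish a library from a position-independent executable.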
The Problem: Where Will the Library Load?
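When compiling a regular (non-PIE) executable, the toolchain knows exactly where the code will be loaded (at a fixed address recorded in the ELF file). Shared libraries have no such guarantee: the load address is unknown at build time, and the same library can be mapped at different addresses in different processes.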
If library code contained hardcoded addresses like:
mov rax, [0x7f001234] ; Load global variable
call 0x7f005678 ; Call internal function
This code would only work if loaded at exactly address 0x7f000000. Sharing would be impossible because each process would need different addresses in the code.
The Solution: Position-Independent Code
PIC uses relative addressing: all code and data references are expressed relative to the current instruction pointer, not as absolute addresses.
```asm
; ==========================================
; Non-PIC code (absolute addressing)
; Cannot be shared - addresses baked into code
; ==========================================

; Access global variable 'counter'
mov eax, [0x4012a0]      ; Absolute address - only works at one load address

; Call function 'helper'
call 0x401100            ; Absolute address - breaks if library moves

; ==========================================
; PIC code (relative addressing)
; CAN be shared - all references are relative
; ==========================================

; Access global variable via GOT (Global Offset Table)
lea rbx, [rip + _GLOBAL_OFFSET_TABLE_]  ; RIP-relative: GOT address
mov rax, [rbx + counter@GOTOFF]         ; Load 'counter' via GOT offset

; Call a function
call helper@PLT          ; Through the PLT for external symbols
; or for internal functions:
call helper              ; Direct call: encoded as a displacement from the next instruction

; Access local data (RIP-relative)
lea rax, [rip + .LC0]    ; String constant at RIP-relative address

; ==========================================
; How x86-64 enables PIC efficiently
; ==========================================
; The RIP-relative addressing mode:
;   [rip + displacement]
;
; This computes: address_of_next_instruction + displacement
; Since the displacement is relative, the code works at ANY load address!
```
Why RIP-Relative Addressing Works
Consider code at runtime:
Instruction at address: 0x7f00001000
[rip + displacement], where displacement = 0x500
Accessed address: 0x7f00001000 + (instruction size) + 0x500
= 0x7f00001507 (assuming 7-byte instruction)
Now if the same code loads at a different address:
Instruction at address: 0x7e00001000
[rip + displacement], where displacement = 0x500 (unchanged!)
Accessed address: 0x7e00001000 + 7 + 0x500
= 0x7e00001507
The relative relationship between the instruction and the data is preserved, regardless of where the library loads. The code bytes are identical, hence shareable.
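The arithmetic above can be checked with a short sketch. The offsets 0x1000 and 0x1507 and the 7-byte instruction length are the example's own numbers; the function name is illustrative only:

```python
def rip_relative_target(instr_addr: int, instr_len: int, displacement: int) -> int:
    """Address referenced by [rip + displacement].

    RIP holds the address of the NEXT instruction, so the instruction
    length is added before the displacement.
    """
    return instr_addr + instr_len + displacement

INSTR_OFFSET = 0x1000   # where the instruction sits inside the library
DATA_OFFSET = 0x1507    # where the referenced data sits inside the library
INSTR_LEN = 7           # assumed instruction size, as in the example

# The displacement is fixed at link time from offsets alone.
displacement = DATA_OFFSET - (INSTR_OFFSET + INSTR_LEN)

# Load the same bytes at two different base addresses.
for base in (0x7f00000000, 0x7e00000000):
    target = rip_relative_target(base + INSTR_OFFSET, INSTR_LEN, displacement)
    print(hex(base), "->", hex(target), target == base + DATA_OFFSET)
```

The displacement never mentions the base address, which is why the encoded instruction bytes can be identical in every process.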
32-bit x86 lacks RIP-relative addressing. PIC on x86-32 requires obtaining the current instruction address through a call/pop sequence (since call pushes the return address), then computing offsets from there. This 'thunk' pattern adds overhead and complexity: call __x86.get_pc_thunk.bx; add ebx, _GLOBAL_OFFSET_TABLE_. This is why PIC had a larger performance impact on 32-bit systems.
Compilers emit PIC when given -fPIC (or -fpic); the -fno-pic flag disables it.
Even with position-independent code, shared libraries face another challenge: external symbols. When library A calls a function in library B, where is that function? The address isn't known until runtime when both libraries are loaded.
The Global Offset Table (GOT) and Procedure Linkage Table (PLT) solve this problem elegantly.
What is the GOT?
The GOT is a table of pointers located in the data segment (writable memory). Each entry in the GOT holds the actual runtime address of a global symbol (function or variable) from another library.
How it works:
// Source code
int x = external_var;
// Compiled (conceptually)
GOT[external_var_slot] = <resolved at runtime>
int x = *GOT[external_var_slot];
At load time, the dynamic linker fills in GOT entries with actual addresses.
At runtime, the code reads through the GOT to access external data.
GOT structure per process:
GOT[0]: Reserved (pointer to _DYNAMIC)
GOT[1]: Pointer to link_map structure
GOT[2]: Pointer to resolver function (for lazy binding)
GOT[3]: Address of first external symbol
GOT[4]: Address of second external symbol
...
Why the GOT is per-process:
The GOT is in writable memory and contains addresses specific to how libraries are loaded in each process. Process A might load libX.so at address 0x7f10... while Process B loads it at 0x7f20... Their GOT entries for libX symbols differ.
The GOT cannot be shared — it's part of the private data segment.
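To make the interplay between the PLT, the GOT, and lazy binding concrete, here is a toy model in Python. It is a simulation under stated assumptions, not how ld.so is implemented: Python callables stand in for runtime addresses, an empty GOT slot stands in for a slot that still points back at its PLT stub, and the class and method names are invented for illustration.

```python
class Process:
    """Toy model of one process's private GOT and lazy PLT binding."""

    def __init__(self, loaded_symbols):
        # name -> callable; stands in for "symbol -> runtime address"
        self.loaded_symbols = loaded_symbols
        self.got = {}            # per-process, writable table of resolved targets
        self.resolver_calls = 0  # how often the slow path ran

    def _resolve(self, name):
        """Stand-in for the resolver the dynamic linker installs in GOT[2]."""
        self.resolver_calls += 1
        target = self.loaded_symbols[name]
        self.got[name] = target  # patch the slot so later calls skip the resolver
        return target

    def call_via_plt(self, name, *args):
        """Stand-in for a PLT stub: jump through the GOT slot."""
        target = self.got.get(name)
        if target is None:       # first call: slot not yet bound
            target = self._resolve(name)
        return target(*args)


proc = Process({"strlen": len})
print(proc.call_via_plt("strlen", "hello"), proc.resolver_calls)
print(proc.call_via_plt("strlen", "shared"), proc.resolver_calls)
```

Only the first call to each function pays for symbol lookup. The write into `got` is also why the table has to live in private, writable memory, while the stub code itself stays shareable.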
The dynamic linker (also called the runtime linker) is the unsung hero that makes shared libraries work. On Linux, it's typically ld-linux.so or ld-linux-x86-64.so.2. Let's trace exactly what happens when a dynamically-linked program starts.
```bash
# Trace dynamic linker activity

# See library search and loading
$ LD_DEBUG=libs ./my_program
     12345: find library=libc.so.6 [0]; searching
     12345:  search path=/lib/x86_64-linux-gnu (system search path)
     12345:   trying file=/lib/x86_64-linux-gnu/libc.so.6
     12345:
     12345: calling init: /lib/x86_64-linux-gnu/libc.so.6

# See symbol resolution
$ LD_DEBUG=bindings ./my_program
     12345: binding file ./my_program [0] to /lib/x86_64-linux-gnu/libc.so.6
            symbol 'printf' [GLIBC_2.2.5]
     12345: binding file ./my_program [0] to /lib/x86_64-linux-gnu/libc.so.6
            symbol 'malloc' [GLIBC_2.2.5]

# See all details
$ LD_DEBUG=all ./my_program 2>&1 | head -100

# See what ld.so would load without running the program
$ ldd ./my_program
    linux-vdso.so.1 (0x00007ffcc37fe000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f8b3c200000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f8b3c022000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f8b3c600000)

# Show what the linker cache records for libc
$ ldconfig -p | grep libc
    libc.so.6 (libc6,x86-64) => /lib/x86_64-linux-gnu/libc.so.6
```
| Priority | Source | Purpose |
|---|---|---|
| 1 | DT_RPATH in executable | Legacy; rarely used (security concerns); skipped when DT_RUNPATH is present |
| 2 | LD_LIBRARY_PATH environment | Development/testing; not for production |
| 3 | DT_RUNPATH in executable | Modern alternative to RPATH |
| 4 | /etc/ld.so.cache | Cached search paths (ldconfig -p) |
| 5 | Default paths (/lib, /usr/lib) | System libraries |
LD_LIBRARY_PATH is ignored for setuid/setgid programs to prevent privilege escalation attacks. An attacker could otherwise point it to a directory containing a malicious libc.so.6. For production deployments, use /etc/ld.so.conf.d/ or RUNPATH embedded in the executable instead.
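The search order in the table, together with the setuid rule, can be sketched as a small lookup function. This is a simplified model with hypothetical names (`search_candidates`, `find_library`, the `secure` flag), and it relies on one detail not stated in the table but documented for ld.so: DT_RPATH is skipped when DT_RUNPATH is present. The real linker also handles hardware-capability subdirectories, `$ORIGIN` expansion, and more.

```python
import os

def search_candidates(soname, *, rpath=(), ld_library_path=(), runpath=(),
                      cache=None, default_dirs=("/lib", "/usr/lib"), secure=False):
    """Yield candidate paths for `soname` in priority order."""
    if not runpath:                 # assumption: DT_RPATH is ignored if DT_RUNPATH exists
        for d in rpath:
            yield f"{d}/{soname}"
    if not secure:                  # setuid/setgid programs ignore LD_LIBRARY_PATH
        for d in ld_library_path:
            yield f"{d}/{soname}"
    for d in runpath:
        yield f"{d}/{soname}"
    if cache and soname in cache:   # models /etc/ld.so.cache as a dict
        yield cache[soname]
    for d in default_dirs:
        yield f"{d}/{soname}"

def find_library(soname, exists=os.path.exists, **sources):
    """Return the first candidate that exists, or None."""
    for path in search_candidates(soname, **sources):
        if exists(path):
            return path
    return None
```

Passing a custom `exists` predicate makes the order easy to experiment with on a fake filesystem, for example to see how the same SONAME resolves differently with `secure=True`.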
How does glibc maintain binary compatibility across decades? A program compiled in 2005 still runs on a 2024 system. The answer is symbol versioning, a sophisticated mechanism that allows multiple implementations of the same symbol to coexist.
The Problem:
Suppose memcpy behavior changes between glibc versions:
- Older glibc: memcpy handles overlapping regions
- Newer glibc: memcpy is faster but doesn't handle overlaps (use memmove)
Old programs relying on the overlap behavior would break on the new system.
The Solution: Symbol Versions
The library provides multiple versions of memcpy:
memcpy@GLIBC_2.2.5 → old implementation
memcpy@@GLIBC_2.14 → new implementation (default for new programs)
Programs compiled against old glibc record they need memcpy@GLIBC_2.2.5. New programs get memcpy@GLIBC_2.14. The library contains both!
```bash
# View versioned symbols in libc
$ readelf -sV /lib/x86_64-linux-gnu/libc.so.6 | grep memcpy
  1284: 00000000000a9100   436 IFUNC   GLOBAL DEFAULT   16 memcpy@@GLIBC_2.14
  1285: 00000000000a90d0     9 FUNC    GLOBAL DEFAULT   16 memcpy@GLIBC_2.2.5

# The @@ symbol is the default (for newly linked programs)
# The @ symbol is for backward compatibility

# Check what version your program needs
$ readelf -V ./my_program
Version needs section '.gnu.version_r':
 Addr: 0x00000000004003e8  Offset: 0x0003e8  Link: 6 (.dynstr)
  000000: Version: 1  File: libc.so.6  Cnt: 1
  0x0010:   Name: GLIBC_2.2.5  Flags: none  Version: 2

# Creating versioned symbols in your own library
$ cat version.map
MYLIB_1.0 {
    global: my_function;
    local: *;
};

MYLIB_2.0 {
    global: my_function;  # New version
} MYLIB_1.0;              # Inherits from 1.0

$ gcc -shared -fPIC -o libmy.so my.c -Wl,--version-script=version.map
```
To find the minimum glibc version required by a binary: objdump -T ./binary | grep GLIBC | sed 's/.*GLIBC_//' | sort -V | tail -1. This helps when deploying to older systems. If your program requires GLIBC_2.28 but the target has 2.17, you'll need static linking or a compatibility library.
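The `@` versus `@@` distinction can be modeled as a lookup keyed by (name, version). The sketch below is a conceptual illustration only; the class and its methods are hypothetical and do not correspond to glibc internals or to the ELF version tables.

```python
class VersionedLibrary:
    """Toy model of a library exporting several versions of one symbol."""

    def __init__(self):
        self._impls = {}     # (name, version) -> implementation
        self._default = {}   # name -> version marked with '@@'

    def define(self, name, version, impl, default=False):
        self._impls[(name, version)] = impl
        if default:
            self._default[name] = version

    def link(self, name):
        """Link time: a newly built program records the default version."""
        return (name, self._default[name])

    def resolve(self, name, version):
        """Run time: bind to exactly the version the program recorded."""
        return self._impls[(name, version)]


libc = VersionedLibrary()
libc.define("memcpy", "GLIBC_2.2.5", lambda: "old implementation")
libc.define("memcpy", "GLIBC_2.14", lambda: "new implementation", default=True)

old_program_needs = ("memcpy", "GLIBC_2.2.5")   # recorded when it was built long ago
new_program_needs = libc.link("memcpy")          # recorded today

print(libc.resolve(*old_program_needs)())
print(libc.resolve(*new_program_needs)())
```

Because each binary carries the version it was linked against, upgrading the library changes what new builds get without changing what existing binaries bind to.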
Understanding the real memory savings from shared libraries requires careful measurement. Not all parts of a library are shared equally, and the savings depend on how many processes use the library.
| Section | Shareable? | Reason |
|---|---|---|
| .text (code) | ✅ Yes | Read-only and executable; identical across processes |
| .rodata (constants) | ✅ Yes | Read-only; identical across processes |
| .plt (procedure linkage) | ✅ Yes | Read-only code stubs; identical |
| .got.plt (GOT for PLT) | ❌ No | Writable; contains process-specific addresses |
| .got (global offset table) | ❌ No | Writable; process-specific |
| .data (initialized data) | ⚠️ COW | Initially shared; copied on write |
| .bss (uninitialized data) | ❌ No | Private per-process (anonymous pages) |
```bash
# Measure actual sharing for a library

# 1. Find memory mappings for libc (-A prints the fields after each mapping line; output abridged)
$ grep -A 12 libc /proc/$(pgrep -f firefox | head -1)/smaps
7f8b3c022000-7f8b3c1df000 r-xp ... /lib/x86_64-linux-gnu/libc-2.31.so
Size:               1844 kB
Rss:                1204 kB
Pss:                  12 kB   # ← Proportional share! 1204 KB shared by ~100 processes
Shared_Clean:       1204 kB   # ← All clean shared pages
Private_Clean:         0 kB
Private_Dirty:         0 kB

7f8b3c1df000-7f8b3c1e9000 rw-p ... /lib/x86_64-linux-gnu/libc-2.31.so
Size:                 40 kB
Rss:                  40 kB
Pss:                  40 kB   # ← All private (data segment)
Shared_Clean:          0 kB
Private_Dirty:        40 kB   # ← Process-private dirty pages

# 2. Count processes sharing libc
$ lsof /lib/x86_64-linux-gnu/libc.so.6 2>/dev/null | wc -l
247

# 3. Calculate savings
# Without sharing: 247 processes * 1.8 MB code = 445 MB
# With sharing: 1.8 MB (shared) + 247 * 40 KB (private) = 11.7 MB
# Savings: 433 MB (97% reduction!)

# 4. System-wide library sharing
$ smem -t -k | tail -10
# Shows RSS vs PSS for all processes, revealing sharing
```
Proportional Set Size (PSS) is the fairest measure of per-process memory usage. For shared pages, PSS divides the page size by the number of sharers. If 100 processes share a 4 KB page, each is charged about 41 bytes of PSS. Summing PSS over all processes gives the physical memory actually in use; the memory that would be freed if a single process exited is only its private pages (its unique set size, USS).
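The PSS and savings arithmetic can be reproduced in a few lines. This sketch uses the figures from the transcript above and mirrors its simplified accounting (code counted once when shared, private data counted per process). The function names are illustrative, and real PSS is computed by the kernel page by page rather than per mapping.

```python
def pss_kb(mappings):
    """Approximate PSS from (rss_kb, number_of_sharers) pairs."""
    return sum(rss_kb / sharers for rss_kb, sharers in mappings)

def sharing_savings(processes, shared_kb, private_kb):
    """Return (kb_without_sharing, kb_with_sharing, fractional_reduction)."""
    without = processes * shared_kb                    # every process holds its own code copy
    with_sharing = shared_kb + processes * private_kb  # one code copy plus per-process data
    return without, with_sharing, 1 - with_sharing / without

# libc text: 1204 kB resident, shared by ~100 processes; data: 40 kB, private
print(round(pss_kb([(1204, 100), (40, 1)]), 2), "kB PSS for libc in one process")

without, with_sharing, reduction = sharing_savings(247, 1800, 40)
print(without, "kB without sharing;", with_sharing, "kB with sharing;",
      f"{reduction:.0%} reduction")
```

The per-process private cost scales with the process count while the shared cost does not, which is why the reduction approaches 100% as more processes map the same library.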
Optimizing Library Sharing
To maximize sharing efficiency:
| Technique | Impact | How |
|---|---|---|
| Minimize writable sections | More read-only sharing | Avoid global mutable state; use const |
| Use -fPIC correctly | Enable sharing | Required for shared libraries |
| Prelink libraries | Reduce private GOT updates | prelink -a (deprecated due to ASLR conflicts) |
| Large page support | Fewer TLB entries for shared libs | Transparent Huge Pages for .text |
| Avoid dlopen() in tight loops | Reduce GOT dirtying | Cache handles, load libraries once |
| Symbol visibility | Reduce relocation needs | __attribute__((visibility("hidden"))) |
We've explored the complete lifecycle and internals of shared libraries. Let's consolidate the key takeaways:
What's Next:
Now that we understand how shared libraries enable code sharing across processes, we'll explore inter-process shared memory — a mechanism that allows processes to share data segments for explicit communication. Unlike the implicit sharing of library code, shared memory for IPC requires coordination, synchronization, and careful security considerations.
You now understand shared libraries at a systems programming level: PIC, GOT, PLT, lazy binding, the dynamic linker, and symbol versioning. This knowledge is essential for debugging linking issues, optimizing application startup, and understanding memory usage in production systems.