Ext2 Ext3 Ext4 - Learning Module


Linux Extended File System

The File System That Powers Linux

When Linus Torvalds released Linux 0.01 in 1991, the fledgling operating system borrowed the Minix file system from Andrew Tanenbaum's educational OS. This was a practical necessity—Linux needed some way to store files—but Minix's limitations quickly became apparent. It supported filenames of only 14 characters and partitions no larger than 64 megabytes. Even by early 1990s standards, these constraints were unacceptable for a system with ambitions beyond educational use.

This limitation sparked a revolution: the development of the Extended File System (ext) family, which would grow to become the most influential file system lineage in Linux history. From ext2's elegant simplicity to ext3's reliability breakthrough to ext4's modern performance optimizations, this file system family has evolved alongside Linux itself, powering everything from embedded devices to the world's most powerful supercomputers.

Understanding the ext file system family isn't merely an academic exercise; it's essential knowledge for any systems programmer, Linux administrator, or operating systems engineer. The design patterns, trade-offs, and engineering decisions embodied in ext2/ext3/ext4 represent decades of accumulated wisdom about persistent storage management.

What You Will Learn

By the end of this page, you will understand the historical evolution of the ext file system family, the design philosophy that guided each iteration, and the high-level architecture shared across ext2, ext3, and ext4. You'll gain insight into why Linux needed its own file system and how each version addressed the limitations of its predecessor.

Historical Evolution: From Minix to ext4

The ext file system family represents one of the most significant engineering efforts in Linux history. Each iteration addressed specific limitations of its predecessor while preserving backward compatibility and a coherent design philosophy.

The Minix File System Era (1991)

Linux's journey began with the Minix file system, designed by Andrew Tanenbaum for his educational operating system:

14-character filename limit: Severely restrictive for practical use
64 MB maximum partition size: Inadequate even for 1991 hard drives
Simple but limiting design: Single inode table, single block group
No optimization for modern storage: No consideration for disk caching or sequential access patterns

These limitations weren't Minix's fault—it was designed for teaching, not production use. But Linux needed more.

Evolution of the Extended File System Family
File System	Year	Key Innovation	Maximum Limits
Minix	1987	Educational simplicity	64 MB volume, 14-char names
ext	1992	Extended limits	2 GB volume, 255-char names
ext2	1993	Block groups, reliability	4 TB volume (later 32 TB)
ext3	2001	Journaling	Same as ext2 + crash recovery
ext4	2008	Extents, 48-bit addressing	1 EB volume, 16 TB files

The Original ext (1992)

Rémy Card developed the first extended file system to overcome Minix's limitations:

Extended filename limit to 255 characters
Supported volumes up to 2 GB
Was the first file system built on Linux's new Virtual File System (VFS) layer
Still used a simple structure without block groups

However, ext had its own problems: fragmentation, poor performance as files were modified, and a single timestamp per file instead of separate access, inode-change, and modification times. These issues drove the development of ext2.

ext2: The Foundation (1993)

Rémy Card, along with Theodore Ts'o and Stephen Tweedie, designed ext2 as a complete rethinking of the extended file system concept:

Block group organization: Partitioned the disk for improved locality
Separation of metadata and data: Inodes separate from file content
Preallocated inode space: Fixed allocation strategy for metadata
Sparse superblock placement: Redundancy without excessive overhead

ext2 became the standard Linux file system for nearly a decade, proven reliable for daily use.


ext3: Reliability Through Journaling (2001)

Stephen Tweedie led the development of ext3, targeting a critical weakness: crash recovery. After an unexpected shutdown, ext2 metadata could be left inconsistent, and only a full fsck could repair it. ext3 added:

Journaling implementation: Write-ahead logging for metadata operations
Three journaling modes: Flexibility between performance and safety
Backward compatibility: ext3 partitions readable as ext2
Online resizing: Grow file systems without unmounting

ext3's journaling eliminated the dreaded post-crash fsck runs that could take hours on large partitions.

ext4: Modern Requirements (2008)

Theodore Ts'o led ext4 development, addressing ext3's scalability limits:

Extents replace indirect blocks: Dramatically better large file handling
48-bit block addressing: Theoretical 1 EB volume support
Delayed allocation: Improved sequential write performance
Persistent preallocation: Guaranteed contiguous space
Nanosecond timestamps: Precision for modern applications

ext4 remains the default file system for many Linux distributions today, balancing maturity, performance, and reliability.

Evolutionary, Not Revolutionary

A key design principle throughout the ext family is evolutionary development. Each version builds on its predecessor rather than starting fresh, maintaining compatibility while adding features. An ext3 volume can be mounted as ext2 once its journal is clean, and the ext4 driver can mount ext2 and ext3 volumes. This approach minimized risk and simplified migration.

Design Philosophy and Principles

The ext file system family embodies specific design principles that guided its development across three decades. Understanding these principles illuminates why the file systems work the way they do.

Principle 1: Locality of Reference

The ext file systems are designed around the assumption that files in the same directory are frequently accessed together, and that sequential file access is common:

Block groups keep related inodes and their data blocks close
Directory inodes are spread across block groups to distribute load
Preallocated blocks encourage data locality

This design optimizes for rotational disk seeks while remaining beneficial for SSDs.

Core Design Principles

•Locality of Reference — Keep related data physically close to minimize seek time and improve cache efficiency
•Reliability Over Performance — Prefer designs that maintain data integrity, even at modest performance cost
•Backward Compatibility — New features should not break ability to mount older file systems
•Unix Semantics — Full support for POSIX permissions, symbolic links, hard links, and special files
•Simplicity Where Possible — Avoid complexity unless the feature provides clear, substantial benefit
•Graceful Degradation — Corruption in one area should not cascade to unrelated data

Principle 2: Reliability Over Performance

The ext designers consistently prioritized data safety:

ext2's redundant superblock and group descriptor copies, which allow recovery when the primary copy is damaged
ext3's journaling even when it added write amplification
ext4's write barriers (enabled by default) and its metadata and journal checksums

This philosophy reflects Linux's server heritage where data loss is unacceptable.

Principle 3: POSIX Compatibility

Unlike FAT or NTFS, the ext family was designed from the ground up for Unix semantics:

Proper permission bits (owner/group/other, rwx)
Hard and symbolic links
Special files (devices, sockets, FIFOs)
Case-sensitive filenames
Inode-based metadata separation

Principle 4: Separation of Concerns

The ext architecture cleanly separates:

Inodes: Metadata about files (timestamps, permissions, block pointers)
Data blocks: Actual file content
Directory entries: Mappings from names to inode numbers
Superblock: File system metadata

This separation enables efficient metadata operations without touching file data and vice versa.

Why This Matters

Understanding design philosophy helps predict behavior in unfamiliar situations. When debugging file system issues or designing applications, knowing that ext prioritizes reliability over raw performance explains many default behaviors (like sync operations, journal writes, and barrier usage).

High-Level Architecture Overview

All three ext file systems share a common high-level architecture, with ext3 and ext4 adding features on top of ext2's foundation. Let's examine the fundamental structure.

The Partition Layout

An ext file system divides the storage partition into fixed-size block groups. Block groups are the fundamental organizational unit, containing both data and metadata.


Key Components:

Boot Sector (1024 bytes): Reserved for bootloader code, not managed by the file system
Superblock: Contains file system metadata—block size, inode count, free block count, state flags, and more. Critical for mounting.
Group Descriptors: Table describing each block group—locations of bitmaps, inode tables, and free counts.
Block Bitmap: One bit per data block in the group, tracking allocation status.
Inode Bitmap: One bit per inode slot in the group, tracking which inodes are in use.
Inode Table: Array of inode structures containing file metadata.
Data Blocks: Actual file and directory content.

Sparse Superblock Feature

Storing backups of the superblock and group descriptors in every block group would waste significant space. With the sparse_super feature, ext2 and its successors keep copies only in block groups 0 and 1 and in groups whose number is a power of 3, 5, or 7:

Groups with a superblock copy: 0 (primary), 1, 3, 5, 7, 9, 25, 27, 49, 81, 125, 243, ...

This provides redundancy for recovery while minimizing overhead.
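The placement rule reduces to a short predicate. The sketch below expresses the rule as stated above; it is not the kernel's own helper, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if n is an exact positive power of base (base^1, base^2, ...). */
static bool is_power_of(uint32_t n, uint32_t base)
{
    if (n < base)
        return false;
    while (n % base == 0)
        n /= base;
    return n == 1;
}

/* Sketch: does block group `group` hold a superblock copy under sparse_super?
 * Group 0 holds the primary, group 1 always holds a backup, and otherwise
 * only groups numbered 3^k, 5^k or 7^k do. */
bool group_has_superblock(uint32_t group)
{
    if (group <= 1)
        return true;
    return is_power_of(group, 3) || is_power_of(group, 5) || is_power_of(group, 7);
}
```

For example, group_has_superblock(243) and group_has_superblock(343) are true, while group_has_superblock(15) is false because 15 is a product of 3 and 5 rather than a power of either.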

Block Group Component Sizes (4KB block size example)
Component	Typical Size	Purpose
Superblock	1024 bytes (occupies 1 block in groups that hold a copy)	File system metadata
Group Descriptors	Variable (depends on group count)	Per-group metadata
Block Bitmap	1 block = 32,768 blocks tracked	Block allocation status
Inode Bitmap	1 block = 32,768 inodes tracked	Inode allocation status
Inode Table	Variable (inodes per group × 256 bytes)	File metadata storage
Data Blocks	Remainder of group	File and directory content

Block Size Matters

The block size (1KB, 2KB, or 4KB) is chosen at file system creation and cannot be changed. Larger blocks reduce metadata overhead and improve sequential performance but waste space for small files. 4KB is the modern default, matching typical page sizes and storage sector sizes.
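Because one bitmap block tracks one block per bit, the block size also fixes the size of a block group. A minimal sketch of that arithmetic, assuming a group holds at most 8 × block_size blocks and ignoring the first-data-block offset used with 1 KB blocks; the helper names are hypothetical:

```c
#include <stdint.h>

/* One bitmap block has block_size * 8 bits, one per block in the group. */
uint32_t blocks_per_group(uint32_t block_size)
{
    return block_size * 8u;
}

/* Bytes covered by one full block group. */
uint64_t bytes_per_group(uint32_t block_size)
{
    return (uint64_t)blocks_per_group(block_size) * block_size;
}

/* Groups needed for a volume of total_blocks blocks (the last may be partial). */
uint64_t group_count(uint64_t total_blocks, uint32_t block_size)
{
    uint64_t bpg = blocks_per_group(block_size);
    return (total_blocks + bpg - 1) / bpg;
}
```

With 4 KB blocks this gives 32,768 blocks (128 MB) per group, so a 1 TB volume of 268,435,456 blocks has 8,192 groups; with 1 KB blocks a group shrinks to 8,192 blocks (8 MB).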

Inodes in the ext File System

The inode is the cornerstone of Unix file system design, and the ext family implements inodes with careful attention to efficiency and extensibility.

What an Inode Contains

Each inode stores everything about a file except its name and content:

ext4_inode_structure.c
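// Simplified view of the on-disk ext4 inode (the kernel's definition is
// struct ext4_inode in fs/ext4/ext4.h). Compiles as C11 on Linux.
#include <linux/types.h>    // __le16, __le32, __u8

struct ext4_extent_header {     // 12 bytes
    __le16  eh_magic;           // 0xF30A
    __le16  eh_entries;         // Number of valid entries
    __le16  eh_max;             // Capacity in entries
    __le16  eh_depth;           // 0 = entries are leaf extents
    __le32  eh_generation;      // Generation of the tree
};

struct ext4_extent {            // 12 bytes
    __le32  ee_block;           // First logical block covered
    __le16  ee_len;             // Number of blocks covered
    __le16  ee_start_hi;        // High 16 bits of physical block
    __le32  ee_start_lo;        // Low 32 bits of physical block
};

struct ext4_inode {
    __le16  i_mode;         // File type and permissions
    __le16  i_uid;          // Owner user ID (low 16 bits)
    __le32  i_size_lo;      // Size in bytes (low 32 bits)
    __le32  i_atime;        // Last access time
    __le32  i_ctime;        // Last inode change time
    __le32  i_mtime;        // Last modification time
    __le32  i_dtime;        // Deletion time
    __le16  i_gid;          // Owner group ID (low 16 bits)
    __le16  i_links_count;  // Number of hard links
    __le32  i_blocks_lo;    // Block count (512-byte units, low 32 bits)
    __le32  i_flags;        // File flags (immutable, append-only, extents, etc.)
    __le32  osd1;           // OS-dependent field (inode version on Linux)

    union {                 // 60 bytes; the kernel declares only i_block[15]
        // ext2/ext3: 12 direct + 3 indirect block pointers
        __le32  i_block[15];
        // ext4 with the extents flag set: root of the extent tree
        struct {
            struct ext4_extent_header   i_extent_header;
            struct ext4_extent          i_extent[4];
        };
    };

    __le32  i_generation;   // File version (NFS)
    __le32  i_file_acl_lo;  // Extended attributes block
    __le32  i_size_high;    // Size in bytes (high 32 bits)
    __le32  i_obso_faddr;   // Obsolete fragment address
    __u8    osd2[12];       // OS-dependent: high 16 bits of block count, xattr block,
                            // uid and gid, plus the low 16 bits of the checksum

    // Fields past the original 128 bytes; i_extra_isize says how many are present.
    // In a 256-byte inode, the space after them holds in-inode extended attributes.
    __le16  i_extra_isize;  // Size of extra inode fields
    __le16  i_checksum_hi;  // Inode checksum (high 16 bits)
    __le32  i_ctime_extra;  // Extra ctime precision (nsec << 2 | epoch)
    __le32  i_mtime_extra;  // Extra mtime precision
    __le32  i_atime_extra;  // Extra atime precision
    __le32  i_crtime;       // File creation time
    __le32  i_crtime_extra; // Extra creation time precision
    __le32  i_version_hi;   // Inode version (high 32 bits)
    __le32  i_projid;       // Project ID
};

_Static_assert(sizeof(struct ext4_inode) == 160, "128-byte base + 32 bytes of ext4 extras");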

Inode Size Evolution

Version	Inode Size	Key Additions
ext2	128 bytes	Core metadata, 15 block pointers
ext3	128+ bytes	Optional larger inodes
ext4	256 bytes (default)	Nanosecond timestamps, creation time, checksums
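The *_extra fields are what make nanosecond timestamps and post-2038 dates possible. As the structure's comment notes, each holds (nsec << 2 | epoch): the low 2 bits extend the signed 32-bit seconds field, and the upper 30 bits carry nanoseconds. A minimal decoding sketch, assuming both values have already been converted from little-endian; decode_ext4_time and struct ext4_time are hypothetical names, not kernel APIs:

```c
#include <stdint.h>

struct ext4_time {
    int64_t  sec;   /* seconds relative to 1970-01-01 UTC */
    uint32_t nsec;  /* nanoseconds, 0..999999999 */
};

/* raw:   the 32-bit base field (e.g. i_mtime), interpreted as signed
 * extra: the matching *_extra field, laid out as (nsec << 2) | epoch */
struct ext4_time decode_ext4_time(uint32_t raw, uint32_t extra)
{
    struct ext4_time t;
    /* The epoch bits add multiples of 2^32 seconds to the signed base value. */
    t.sec  = (int64_t)(int32_t)raw + ((int64_t)(extra & 0x3u) << 32);
    t.nsec = extra >> 2;
    return t;
}
```

With epoch bits of 0 the range is the classic 1901 to 2038 window; an epoch of 1 moves a raw value of 0x80000000 from -2147483648 to 2147483648 seconds, just past the 2038 rollover. The highest epoch reaches the year 2446.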

The Block Pointer Array

In ext2/ext3, the i_block[15] array uses a hierarchical scheme:

i_block[0-11]: Direct pointers to data blocks
i_block[12]: Single indirect block (pointer to block of pointers)
i_block[13]: Double indirect block (points to a block of pointers to single indirect blocks)
i_block[14]: Triple indirect block (adds a third level of indirection)

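With 4KB blocks and 4-byte pointers, each indirect block holds 1,024 pointers. The tree can reach about 4 TB, but ext2/ext3 cap a single file at 2 TB with 4 KB blocks because i_blocks counts 512-byte sectors in a 32-bit field: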

Type	Pointers	Blocks Addressed	Max Data
12 Direct	12	12	48 KB
1 Indirect	1024	1,024	4 MB
1 Double	1024²	1,048,576	4 GB
1 Triple	1024³	1,073,741,824	4 TB
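Finding which i_block slot serves a given logical block comes down to subtracting each level's capacity in turn. A sketch assuming 4-byte pointers (block_size / 4 pointers per indirect block); the enum and function names are hypothetical:

```c
#include <stdint.h>

enum ptr_level { LEVEL_DIRECT, LEVEL_SINGLE, LEVEL_DOUBLE, LEVEL_TRIPLE, LEVEL_OUT_OF_RANGE };

/* Returns the level that maps logical block `lblk` and stores in *offset
 * the block's index within that level's subtree (0 if out of range). */
enum ptr_level classify_block(uint64_t lblk, uint32_t block_size, uint64_t *offset)
{
    const uint64_t p = block_size / 4;    /* pointers per indirect block */

    *offset = 0;
    if (lblk < 12) {                      /* i_block[0..11] */
        *offset = lblk;
        return LEVEL_DIRECT;
    }
    lblk -= 12;
    if (lblk < p) {                       /* i_block[12] */
        *offset = lblk;
        return LEVEL_SINGLE;
    }
    lblk -= p;
    if (lblk < p * p) {                   /* i_block[13] */
        *offset = lblk;
        return LEVEL_DOUBLE;
    }
    lblk -= p * p;
    if (lblk < p * p * p) {               /* i_block[14] */
        *offset = lblk;
        return LEVEL_TRIPLE;
    }
    return LEVEL_OUT_OF_RANGE;
}
```

For a double-indirect offset, offset / p selects the entry in the first indirect block and offset % p the entry in the second; reading a block in the triple range therefore costs three extra metadata block reads when nothing is cached.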

ext4's Extent Revolution

ext4 replaces this indirect block scheme with extents—ranges of contiguous blocks:

Extent: (logical block 0, physical block 1000, length 500)
        "Logical blocks 0-499 map to physical blocks 1000-1499"

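A single extent can describe up to 32,768 contiguous blocks (128 MB with 4 KB blocks), and the inode itself holds up to four extents, so a contiguously laid-out file of up to 512 MB needs no extra mapping blocks at all. Larger or fragmented files grow a tree of extent blocks. We'll explore extents in depth later in this module.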
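Mapping a logical block through a list of extents is a binary search followed by an offset calculation. The sketch below assumes the leaf extents have already been read into a simplified in-memory form sorted by logical block; struct extent and extent_lookup are hypothetical, and the real extent tree adds index levels above these leaves.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified in-memory extent (hypothetical, not the on-disk ext4_extent). */
struct extent {
    uint32_t logical;   /* first logical block covered */
    uint64_t physical;  /* first physical block */
    uint16_t len;       /* number of blocks covered */
};

/* Returns the physical block for logical block `lblk`, or -1 for a hole.
 * `ext` must be sorted by `logical` and non-overlapping. */
int64_t extent_lookup(const struct extent *ext, size_t n, uint32_t lblk)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {                  /* find the last extent with logical <= lblk */
        size_t mid = lo + (hi - lo) / 2;
        if (ext[mid].logical <= lblk)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return -1;                     /* lblk precedes every extent */
    const struct extent *e = &ext[lo - 1];
    if (lblk - e->logical >= e->len)
        return -1;                     /* past the end of that extent: a hole */
    return (int64_t)(e->physical + (lblk - e->logical));
}
```

Using the example above, the extent {0, 1000, 500} maps logical block 42 to physical block 1042, and a lookup past block 499 falls through to the next extent or reports a hole.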

Inode Count is Fixed

The number of inodes is determined at file system creation (using the -N inode-count or -i bytes-per-inode options to mke2fs). Apart from growing the file system, whose new block groups bring their own inodes, you cannot add more inodes later. A system storing millions of tiny files can run out of inodes while having plenty of free space. Default ratio: one inode per 16KB of storage.
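The default ratio translates directly into an inode budget. A sketch assuming the 16 KB bytes-per-inode ratio and 256-byte inodes, and ignoring mke2fs's rounding to whole inode-table blocks per group; both helpers are hypothetical:

```c
#include <stdint.h>

/* Approximate number of inodes mke2fs creates for a volume. */
uint64_t inode_count(uint64_t fs_bytes, uint32_t bytes_per_inode)
{
    return fs_bytes / bytes_per_inode;
}

/* Space reserved for inode tables, given the on-disk inode size. */
uint64_t inode_table_bytes(uint64_t inodes, uint32_t inode_size)
{
    return inodes * inode_size;
}
```

A 1 TB (2^40-byte) volume gets 67,108,864 inodes, whose tables occupy 16 GB. Storing more than 67 million files, which means an average file size under 16 KB, exhausts the inodes before the blocks run out.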

Directory Implementation

In the ext file system family, directories are special files whose data blocks contain directory entries. The directory structure has evolved significantly:

Classic ext2 Directory Entries (Linear List)

The original ext2 directory format uses variable-length entries in a linear list:

ext2_dir_entry.c
// ext2 directory entry structures (kernel types from <linux/types.h>)
#include <linux/types.h>

// ext2 directory entry (original format)
struct ext2_dir_entry {
    __le32  inode;          // Inode number (0 = unused entry)
    __le16  rec_len;        // Total entry length incl. padding (multiple of 4)
    __le16  name_len;       // Actual filename length
    char    name[];         // Filename (NOT null-terminated)
};
 
// ext2 directory entry v2 (with file type): name_len shrinks to one byte
struct ext2_dir_entry_2 {
    __le32  inode;          // Inode number
    __le16  rec_len;        // Entry length
    __u8    name_len;       // Filename length
    __u8    file_type;      // File type (dir, regular, symlink, etc.)
    char    name[];         // Filename
};
 
// File type values
#define EXT2_FT_UNKNOWN     0
#define EXT2_FT_REG_FILE    1   // Regular file
#define EXT2_FT_DIR         2   // Directory
#define EXT2_FT_CHRDEV      3   // Character device
#define EXT2_FT_BLKDEV      4   // Block device
#define EXT2_FT_FIFO        5   // Named pipe
#define EXT2_FT_SOCK        6   // Socket
#define EXT2_FT_SYMLINK     7   // Symbolic link

Directory Layout Example:

+--------+---------+--------+-------------+--------+---------+--------+-------------+
| inode  | rec_len | name   | name        | inode  | rec_len | name   | name        |
| 2      | 12      | len=1  | "."         | 2      | 12      | len=2  | ".."        |
+--------+---------+--------+-------------+--------+---------+--------+-------------+
| inode  | rec_len | name   | name        | inode  | rec_len | name   | name        |
| 45678  | 20      | len=9  | "hello.txt" | 12345  | <rem>   | len=3  | "dir"       |
+--------+---------+--------+-------------+--------+---------+--------+-------------+

Key observations:

Entries are packed consecutively within data blocks
rec_len allows variable-size entries (including padding)
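Deleted entries are usually merged into the preceding entry by enlarging its rec_len; only an entry at the start of a block is marked free by setting inode = 0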
Last entry's rec_len extends to block end
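Walking a linear directory block follows directly from these rules: read the 8-byte header, compare the name, and advance by rec_len. A minimal sketch over an in-memory block in the ext2_dir_entry_2 layout, decoding the little-endian fields byte by byte; dir_rec_len and dir_block_lookup are hypothetical helpers, and the bounds checks are a simplified version of what a real driver validates.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint16_t get_le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Minimum record length: 8-byte header plus name, rounded up to 4 bytes. */
uint16_t dir_rec_len(uint8_t name_len)
{
    return (uint16_t)((8u + name_len + 3u) & ~3u);
}

/* Linear search of one directory block for `name`.
 * Returns its inode number, or 0 if absent or the block looks corrupt. */
uint32_t dir_block_lookup(const uint8_t *block, size_t block_size, const char *name)
{
    size_t name_len = strlen(name);
    size_t off = 0;
    while (off + 8 <= block_size) {
        uint32_t ino     = get_le32(block + off);
        uint16_t rec_len = get_le16(block + off + 4);
        uint8_t  nlen    = block[off + 6];       /* block[off + 7] is file_type */
        if (rec_len < 8 || off + rec_len > block_size || 8u + nlen > rec_len)
            return 0;                            /* malformed entry: stop */
        if (ino != 0 && nlen == name_len && memcmp(block + off + 8, name, nlen) == 0)
            return ino;
        off += rec_len;                          /* rec_len includes any slack */
    }
    return 0;
}
```

For the layout shown above, dir_rec_len(9) is 20 for "hello.txt", and dir_block_lookup returns 12345 for "dir". Every miss visits every entry, which is the O(n) cost discussed next.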

The Performance Problem

Looking up a name means scanning the directory entries in order, which is O(n) per lookup. Every open, stat, or file creation (which must first confirm the name is unused) pays this cost, so directories with thousands of files became painfully slow.

Hash Tree Directories (HTree)

ext3 introduced optional hash tree (HTree) indexing for directories:

Directory blocks organized as a B-tree variant
Entries hashed by filename for fast lookup
Falls back to linear if tree becomes corrupted
Enabled by default in ext3/ext4
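The core of an HTree lookup is to hash the name and binary-search a sorted index of (hash, block) pairs for the one leaf block that can hold it. The sketch below is a one-level illustration under stated assumptions: it uses 32-bit FNV-1a as a stand-in hash (ext3/ext4 use their own hash functions, by default a half-MD4 variant, which is not reproduced here), and the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in hash (32-bit FNV-1a); NOT the hash ext3/ext4 store on disk. */
uint32_t name_hash(const char *name)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; name[i] != '\0'; i++) {
        h ^= (uint8_t)name[i];
        h *= 16777619u;
    }
    return h;
}

/* Index entry: leaf `block` holds names whose hash is >= min_hash and
 * below the next entry's min_hash. Entry 0 covers everything below entry 1. */
struct htree_entry {
    uint32_t min_hash;
    uint32_t block;
};

/* Binary search (n >= 1) for the leaf block that would contain `name`. */
uint32_t htree_find_leaf(const struct htree_entry *idx, size_t n, const char *name)
{
    uint32_t h = name_hash(name);
    size_t lo = 1, hi = n;      /* idx[0].min_hash is never compared */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (idx[mid].min_hash <= h)
            lo = mid + 1;
        else
            hi = mid;
    }
    return idx[lo - 1].block;
}
```

Only the chosen leaf block is then scanned linearly, so a lookup touches one index block per tree level plus one leaf instead of every block in the directory.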


Directory Lookup Performance Comparison
Structure	Lookup Time	Insert Time	Practical Limit
Linear (ext2 default)	O(n)	O(n) (name check plus free-space scan)	~1,000 entries
HTree (ext3+)	O(log n)	O(log n)	~10,000,000+ entries
With inline data	O(1) for tiny dirs	N/A	~60 bytes of entries

Feature Comparison: ext2 vs ext3 vs ext4

While sharing a common architecture, each ext version introduced significant features. Let's compare them systematically.

Comprehensive Feature Comparison
Feature	ext2	ext3	ext4
Maximum volume size	4 TB (later 32 TB)	4/16/32 TB	1 EB (exabyte)
Maximum file size	2 TB	2 TB	16 TB
Maximum filename length	255 bytes	255 bytes	255 bytes
Journaling	❌ None	✅ Yes	✅ Yes (with checksums)
Block allocation	Indirect blocks	Indirect blocks	Extents + indirect fallback
Online defragmentation	❌ No	❌ No	✅ Via e4defrag
Delayed allocation	❌ No	❌ No	✅ Yes
Persistent preallocation	❌ No	❌ No	✅ fallocate()
Subsecond timestamps	❌ No	❌ No	✅ Nanosecond
Creation timestamp	❌ No	❌ No	✅ Yes
Directory indexing	❌ Linear only	✅ HTree	✅ HTree
Metadata checksums	❌ No	❌ No	✅ Optional
Barrier support	N/A	✅ Yes	✅ Yes (default)
Quota journaling	❌ No	✅ Yes (usrjquota/grpjquota)	✅ Yes

Why ext2 Still Has a Place

Despite being "outdated," ext2 remains valuable:

Boot partitions: No journal overhead for read-mostly /boot
Flash storage with wear-leveling: Less write amplification
Embedded systems: Simpler, smaller driver code
Recovery scenarios: Easier to repair without journal complexity

Why ext3 Over ext2

The single killer feature: crash recovery. Without journaling:

Unclean shutdown requires full fsck (filesystem check)
fsck on large volumes can take hours
Data structures may be inconsistent
Risk of data loss during recovery

With journaling:

Recovery replays committed journal transactions (seconds)
Guaranteed metadata consistency
Much lower risk of corruption
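Replay is fast and safe because a transaction counts only once its commit record reaches the journal. The sketch below models the journal as an in-memory array of records in a hypothetical, heavily simplified format (not the JBD/JBD2 on-disk layout, and without revoke records or checksums): recovery copies the blocks of every committed transaction to their home locations and ignores a trailing transaction that never committed.

```c
#include <stddef.h>
#include <stdint.h>

enum rec_type { REC_DATA, REC_COMMIT };

/* Simplified journal record (hypothetical format). */
struct jrec {
    enum rec_type type;
    uint32_t tid;        /* transaction id */
    uint32_t target;     /* REC_DATA: home block number (caller keeps it in range) */
    uint32_t value;      /* REC_DATA: new contents; one word stands in for a block */
};

/* Replays committed transactions into `disk` (indexed by block number).
 * Returns the number of transactions applied. */
size_t journal_replay(const struct jrec *log, size_t n, uint32_t *disk)
{
    size_t applied = 0, start = 0;
    for (size_t i = 0; i < n; i++) {
        if (log[i].type != REC_COMMIT)
            continue;
        /* Copy this transaction's logged blocks to their home locations. */
        for (size_t j = start; j < i; j++)
            if (log[j].type == REC_DATA && log[j].tid == log[i].tid)
                disk[log[j].target] = log[j].value;
        applied++;
        start = i + 1;
    }
    /* Records after the last commit belong to an interrupted transaction. */
    return applied;
}
```

Because replay only rewrites the logged blocks, its cost depends on the journal's size rather than the volume's, which is why recovery takes seconds where a full fsck could take hours.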

Why ext4 Over ext3

For modern systems, ext4's advantages are compelling:

Extents: Dramatically better large file performance
Delayed allocation: Improved write throughput
Multiblock allocation: Reduced fragmentation
Nanosecond timestamps: Distinguish file changes made within the same second
Persistent preallocation: Guaranteed space for critical files
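From user space, persistent preallocation is requested with fallocate(2) or the portable posix_fallocate(3). A minimal sketch that reserves space for a file before writing it; preallocate_file is a hypothetical helper. On ext4, glibc's posix_fallocate issues the fallocate system call, which allocates unwritten extents instead of writing zeros; on file systems without that support, glibc falls back to writing the range, which is much slower.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates (or opens) `path` and reserves `len` bytes for it.
 * Returns 0 on success or an errno value on failure. */
int preallocate_file(const char *path, off_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return errno;
    int err = posix_fallocate(fd, 0, len);   /* returns an error number, not -1 */
    if (close(fd) != 0 && err == 0)
        err = errno;
    return err;
}
```

Because the call extends the file size, a later stat reports the full length even before any data is written, and reads from the reserved range return zeros. A subsequent write into that range cannot fail with ENOSPC.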

Migration Path

ext partitions can often be upgraded in place: ext2 → ext3 by adding a journal (tune2fs -j), and ext3 → ext4 by enabling ext4 features such as extents (tune2fs -O extents) followed by a forced e2fsck. However, in-place upgrades don't convert existing files to use new features; only newly created files benefit. Once extents are in use, the volume can no longer be mounted as ext3.

ext4 in Modern Linux Systems

ext4 remains the default file system for many Linux distributions, despite competition from newer options like btrfs and XFS. Understanding its role helps contextualize technical decisions.

Default in Major Distributions:

Ubuntu: ext4 default since 9.10 (2009)
Fedora: ext4 default from Fedora 11 until Fedora 33 (2020), when Workstation switched to btrfs
RHEL/CentOS: XFS default for / since RHEL 7; ext4 fully supported
Debian: ext4 default
Android: ext4 for system and data partitions on many devices; F2FS is increasingly used for /data

Why ext4 Endures:

ext4 Strengths

•Maturity: 15+ years of production testing
•Reliability: Proven crash recovery
•Performance: Excellent for most workloads
•Tool support: Extensive utilities
•Documentation: Well-understood behavior
•Compatibility: Works everywhere Linux runs

ext4 Limitations

•No snapshots: Unlike btrfs/ZFS
•No transparent compression: Unlike btrfs/ZFS (fscrypt provides encryption, not compression)
•No built-in RAID: Requires mdadm/LVM
•No data checksums: Metadata only
•No online shrink: Can grow while mounted; shrinking requires unmounting
•Fixed inode count: Set at creation; cannot be increased independently

Typical ext4 Usage Scenarios:

Scenario	Configuration	Notes
Root filesystem	Default mount options	Balanced performance/safety
/home	user_xattr,acl	Extended attributes for desktop integration
Database storage	noatime,data=ordered	Reduce metadata writes (noatime implies nodiratime; data=ordered is the default)
Log aggregation	noatime,nobarrier*	Maximum write performance
Virtual machine images	discard,noatime	SSD optimization
Container storage	noatime,journal_checksum	Reliability with performance

*nobarrier reduces safety but improves performance; use only with battery-backed cache.

Performance Tuning Parameters:

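# Create an ext4 file system aligned to a RAID array:
# stride = RAID chunk / block size (e.g. 64 KB / 4 KB = 16),
# stripe_width = stride × data disks (e.g. 16 × 4 = 64).
# Disabling metadata_csum drops checksum protection; disabling 64bit caps the volume at 16 TB.
mkfs.ext4 -O ^metadata_csum,^64bit -E stride=16,stripe_width=64 /dev/sda1

# Mount with performance options (these trade crash safety for speed):
# commit=60 flushes the journal every 60 s instead of the default 5 s;
# barrier=0 and data=writeback risk lost or stale data after a crash.
# max_batch_time (microseconds) is a mount option, not a /proc tunable.
mount -o noatime,commit=60,barrier=0,data=writeback,max_batch_time=0 /dev/sda1 /mnt

# Enable laptop mode: batch writeback so the disk can stay idle longer
echo 5 > /proc/sys/vm/laptop_mode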

Emerging Competition

While ext4 remains dominant, btrfs offers snapshots and checksums, XFS excels at parallel I/O, and F2FS is optimized for flash storage. For new deployments, evaluate whether ext4's stability or competitors' features better match requirements.

Summary: The Linux Extended File System Family

We've established the foundational understanding of the ext file system family. Let's consolidate the key concepts before diving into detailed component analysis:

Key Takeaways

•The ext family evolved to address specific Linux needs — From Minix's limitations through ext4's modern features, each version solved real problems while maintaining compatibility.
•Block groups are the fundamental organizational unit — Partitions are divided into block groups containing related metadata and data, improving locality.
•Inodes store all metadata except filename — The separation of metadata (inode) from naming (directory entry) from content (data blocks) is central to ext design.
•Journaling was ext3's breakthrough — Write-ahead logging eliminated hours-long fsck operations after crashes.
•ext4 introduced extents for efficiency — Replacing indirect blocks with contiguous extent ranges dramatically improved large file handling.
•Design prioritizes reliability — Consistency and data safety take precedence over raw performance throughout the family.

What's Next:

The following pages dive deep into the specific components:

Block Groups: How partitioning the disk into groups improves locality and enables parallelism
Superblock: The critical metadata structure that defines file system parameters
Ext3 Journaling: Write-ahead logging implementation and journaling modes
Ext4 Extents: Modern extent-based allocation replacing indirect block pointers

With this foundational understanding, you're prepared to explore each component in the depth required for true mastery of Linux file system internals.


You now understand the historical evolution, design philosophy, and high-level architecture of the ext file system family. This foundation will support your understanding of block groups, superblocks, journaling, and extent allocation in the pages that follow.

Linux Extended File System

The File System That Powers Linux

What You Will Learn

Historical Evolution: From Minix to ext4

The Minix File System Era (1991)

Linux's journey began with the Minix file system, designed by Andrew Tanenbaum for his educational operating system:

14-character filename limit: Severely restrictive for practical use
64 MB maximum partition size: Inadequate even for 1991 hard drives
Simple but limiting design: Single inode table, single block group
No optimization for modern storage: No consideration for disk caching or sequential access patterns

These limitations weren't Minix's fault—it was designed for teaching, not production use. But Linux needed more.

Evolution of the Extended File System Family
File System	Year	Key Innovation	Maximum Limits
Minix	1987	Educational simplicity	64 MB volume, 14-char names
ext	1992	Extended limits	2 GB volume, 255-char names
ext2	1993	Block groups, reliability	4 TB volume (later 32 TB)
ext3	2001	Journaling	Same as ext2 + crash recovery
ext4	2008	Extents, 48-bit addressing	1 EB volume, 16 TB files

The Original ext (1992)

Rémy Card developed the first extended file system to overcome Minix's limitations:

Extended filename limit to 255 characters
Supported volumes up to 2 GB
Introduced Virtual File System (VFS) compatibility
Still used a simple structure without block groups

However, ext had its own problems: fragmentation, poor performance with file modifications, and timestamp limitations. These issues drove the development of ext2.

ext2: The Foundation (1993)

Rémy Card, along with Theodore Ts'o and Stephen Tweedie, designed ext2 as a complete rethinking of the extended file system concept:

Block group organization: Partitioned the disk for improved locality
Separation of metadata and data: Inodes separate from file content
Preallocated inode space: Fixed allocation strategy for metadata
Sparse superblock placement: Redundancy without excessive overhead

ext2 became the standard Linux file system for nearly a decade, proven reliable for daily use.

Converting Mermaid diagram...

ext3: Reliability Through Journaling (2001)

Stephen Tweedie led the development of ext3, focusing on a critical weakness: crash recovery. ext2 could suffer data corruption after unexpected shutdowns:

Journaling implementation: Write-ahead logging for metadata operations
Three journaling modes: Flexibility between performance and safety
Backward compatibility: ext3 partitions readable as ext2
Online resizing: Grow file systems without unmounting

ext3's journaling eliminated the dreaded fsck runs that could take hours on large partitions.

ext4: Modern Requirements (2008)

Theodore Ts'o led ext4 development, addressing ext3's scalability limits:

Extents replace indirect blocks: Dramatically better large file handling
48-bit block addressing: Theoretical 1 EB volume support
Delayed allocation: Improved sequential write performance
Persistent preallocation: Guaranteed contiguous space
Nanosecond timestamps: Precision for modern applications

ext4 remains the default file system for most Linux distributions today, balancing maturity, performance, and reliability.

Evolutionary, Not Revolutionary

Design Philosophy and Principles

Principle 1: Locality of Reference

The ext file systems are designed around the assumption that files in the same directory are frequently accessed together, and that sequential file access is common:

Block groups keep related inodes and their data blocks close
Directory inodes are spread across block groups to distribute load
Preallocated blocks encourage data locality

This design optimizes for rotational disk seeks while remaining beneficial for SSDs.

Core Design Principles

•Locality of Reference — Keep related data physically close to minimize seek time and improve cache efficiency
•Reliability Over Performance — Prefer designs that maintain data integrity, even at modest performance cost
•Backward Compatibility — New features should not break ability to mount older file systems
•Unix Semantics — Full support for POSIX permissions, symbolic links, hard links, and special files
•Simplicity Where Possible — Avoid complexity unless the feature provides clear, substantial benefit
•Graceful Degradation — Corruption in one area should not cascade to unrelated data

Principle 2: Reliability Over Performance

The ext designers consistently prioritized data safety:

ext2's synchronous metadata writes (at performance cost)
ext3's journaling even when it added write amplification
ext4's barriers and checksums for data integrity

This philosophy reflects Linux's server heritage where data loss is unacceptable.

Principle 3: POSIX Compatibility

Unlike FAT or NTFS, the ext family was designed from the ground up for Unix semantics:

Proper permission bits (owner/group/other, rwx)
Hard and symbolic links
Special files (devices, sockets, FIFOs)
Case-sensitive filenames
Inode-based metadata separation

Principle 4: Separation of Concerns

The ext architecture cleanly separates:

Inodes: Metadata about files (timestamps, permissions, block pointers)
Data blocks: Actual file content
Directory entries: Mappings from names to inode numbers
Superblock: File system metadata

This separation enables efficient metadata operations without touching file data and vice versa.

Why This Matters

High-Level Architecture Overview

All three ext file systems share a common high-level architecture, with ext3 and ext4 adding features on top of ext2's foundation. Let's examine the fundamental structure.

The Partition Layout

An ext file system divides the storage partition into fixed-size block groups. Block groups are the fundamental organizational unit, containing both data and metadata.

Converting Mermaid diagram...

Key Components:

Boot Sector (1024 bytes): Reserved for bootloader code, not managed by the file system
Superblock: Contains file system metadata—block size, inode count, free block count, state flags, and more. Critical for mounting.
Group Descriptors: Table describing each block group—locations of bitmaps, inode tables, and free counts.
Block Bitmap: One bit per data block in the group, tracking allocation status.
Inode Bitmap: One bit per inode slot in the group, tracking which inodes are in use.
Inode Table: Array of inode structures containing file metadata.
Data Blocks: Actual file and directory content.

Sparse Superblock Feature

Groups with superblock backups: 0, 1, 3, 5, 7, 9, 25, 27, 49, 81, 125, 243, ...

This provides redundancy for recovery while minimizing overhead.

Block Group Component Sizes (4KB block size example)
Component	Typical Size	Purpose
Superblock	1 block (4 KB)	File system metadata
Group Descriptors	Variable (depends on group count)	Per-group metadata
Block Bitmap	1 block = 32,768 blocks tracked	Block allocation status
Inode Bitmap	1 block = 32,768 inodes tracked	Inode allocation status
Inode Table	Variable (inodes per group × 256 bytes)	File metadata storage
Data Blocks	Remainder of group	File and directory content

Block Size Matters

Inodes in the ext File System

The inode is the cornerstone of Unix file system design, and the ext family implements inodes with careful attention to efficiency and extensibility.

What an Inode Contains

Each inode stores everything about a file except its name and content:

ext4_inode_structure.c
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
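// Simplified ext4 on-disk inode, following the kernel's struct ext4_inode.
// __le16/__le32 are little-endian on-disk integer types from <linux/types.h>.
#include <linux/types.h>

struct ext4_extent_header {
    __le16  eh_magic;       // 0xF30A
    __le16  eh_entries;     // Number of valid entries after the header
    __le16  eh_max;         // Capacity, in entries
    __le16  eh_depth;       // 0 = entries are leaf extents
    __le32  eh_generation;  // Generation of the tree
};

struct ext4_extent {
    __le32  ee_block;       // First logical block covered
    __le16  ee_len;         // Number of blocks covered
    __le16  ee_start_hi;    // Physical start block (high 16 bits)
    __le32  ee_start_lo;    // Physical start block (low 32 bits)
};

struct ext4_inode {
    __le16  i_mode;         // File type and permissions
    __le16  i_uid;          // Owner user ID (low 16 bits)
    __le32  i_size_lo;      // Size in bytes (low 32 bits)
    __le32  i_atime;        // Last access time
    __le32  i_ctime;        // Last inode change time
    __le32  i_mtime;        // Last modification time
    __le32  i_dtime;        // Deletion time
    __le16  i_gid;          // Owner group ID (low 16 bits)
    __le16  i_links_count;  // Number of hard links
    __le32  i_blocks_lo;    // Block count (512-byte units, low 32 bits)
    __le32  i_flags;        // File flags (immutable, append-only, etc.)
    __le32  osd1;           // OS-dependent (inode version on Linux)

    union {                 // 60 bytes either way
        // ext2/ext3: 12 direct + 3 indirect block pointers
        __le32  i_block[15];
        // ext4: extent tree root (12-byte header + 4 x 12-byte extents)
        struct {
            struct ext4_extent_header   i_extent_header;
            struct ext4_extent          i_extent[4];
        };
    };

    __le32  i_generation;   // File version (NFS)
    __le32  i_file_acl_lo;  // Extended attributes block
    __le32  i_size_high;    // Size in bytes (high 32 bits)
    __le32  i_obso_faddr;   // Obsolete fragment address
    __u8    osd2[12];       // OS-dependent: high bits of block count, ACL block,
                            // uid and gid; low 16 bits of the inode checksum
    // The original 128-byte ext2 inode ends here.

    // Extended fields, present when i_extra_isize covers them
    __le16  i_extra_isize;  // Size of extra inode fields
    __le16  i_checksum_hi;  // Inode checksum (high 16 bits)
    __le32  i_ctime_extra;  // Extra ctime precision (nsec << 2 | epoch)
    __le32  i_mtime_extra;  // Extra mtime precision
    __le32  i_atime_extra;  // Extra atime precision
    __le32  i_crtime;       // File creation time
    __le32  i_crtime_extra; // Extra creation time precision
    __le32  i_version_hi;   // Inode version (high 32 bits)
    __le32  i_projid;       // Project ID
};

// The struct is 160 bytes. In a 256-byte inode record, the remaining space
// holds in-inode extended attributes.
_Static_assert(sizeof(struct ext4_inode) == 160, "unexpected padding");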
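Several inode values are split across fields and must be reassembled. The sketch below assumes the inode has already been read from disk and its little-endian fields converted to host order. The helper names are hypothetical, not kernel functions.

- The 64-bit size is split between i_size_lo and i_size_high.
- Each *_extra word packs nanoseconds in its upper 30 bits and two "epoch" bits in its low 2 bits, matching the struct comment (nsec << 2 | epoch). The epoch bits extend the signed 32-bit seconds field past 2038.

```c
#include <stdint.h>

/* i_size_lo and i_size_high together form the 64-bit file size. */
static inline uint64_t inode_size(uint32_t size_lo, uint32_t size_high)
{
    return ((uint64_t)size_high << 32) | size_lo;
}

/* Seconds: sign-extend the legacy 32-bit field, then add the epoch bits
   from the matching *_extra word as bits 32-33. */
static inline int64_t inode_time_sec(uint32_t base, uint32_t extra)
{
    int64_t sec = (int32_t)base;
    sec += (int64_t)(extra & 3u) << 32;
    return sec;
}

/* Nanoseconds live in bits 2..31 of the *_extra word. */
static inline uint32_t inode_time_nsec(uint32_t extra)
{
    return extra >> 2;
}
```

For example, a base of 0x80000000 with epoch bits 01 decodes to 2,147,483,648 seconds rather than a negative (pre-1970) value.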

Inode Size Evolution

Version	Inode Size	Key Additions
ext2	128 bytes	Core metadata, 15 block pointers
ext3	128+ bytes	Optional larger inodes
ext4	256 bytes (default)	Nanosecond timestamps, creation time, checksums

The Block Pointer Array

In ext2/ext3, the i_block[15] array uses a hierarchical scheme:

i_block[0-11]: Direct pointers to data blocks
i_block[12]: Single indirect block (points to a block of pointers to data blocks)
i_block[13]: Double indirect block (points to a block of pointers to single-indirect blocks)
i_block[14]: Triple indirect block (points to a block of pointers to double-indirect blocks)
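To read logical block n of a file, the driver first decides which i_block slot covers n, then which index to follow at each indirection level. The sketch below illustrates that decision. It assumes 4-byte pointers, so a block holds block_size / 4 of them. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical result: the i_block[] slot and the index at each level. */
struct blk_path {
    int slot;          /* 0-11 direct, 12 single, 13 double, 14 triple */
    int depth;         /* pointer blocks to traverse below the slot (0-3) */
    uint32_t idx[3];   /* index into each pointer block, top level first */
};

/* Returns 0 on success, -1 if n is past the triple-indirect range. */
int map_logical_block(uint64_t n, uint32_t block_size, struct blk_path *p)
{
    const uint64_t P = block_size / 4;   /* pointers per block */

    if (n < 12) {                        /* direct pointers */
        p->slot = (int)n;
        p->depth = 0;
        return 0;
    }
    n -= 12;
    if (n < P) {                         /* single indirect */
        p->slot = 12;
        p->depth = 1;
        p->idx[0] = (uint32_t)n;
        return 0;
    }
    n -= P;
    if (n < P * P) {                     /* double indirect */
        p->slot = 13;
        p->depth = 2;
        p->idx[0] = (uint32_t)(n / P);
        p->idx[1] = (uint32_t)(n % P);
        return 0;
    }
    n -= P * P;
    if (n < P * P * P) {                 /* triple indirect */
        p->slot = 14;
        p->depth = 3;
        p->idx[0] = (uint32_t)(n / (P * P));
        p->idx[1] = (uint32_t)((n / P) % P);
        p->idx[2] = (uint32_t)(n % P);
        return 0;
    }
    return -1;
}
```

Each level of depth costs one extra block read, which is why random access deep into a large ext2/ext3 file is slower than access near its start.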

With 4KB blocks and 4-byte pointers:

Type	Pointers	Blocks Addressed	Max Data
12 Direct	12	12	48 KB
1 Indirect	1024	1,024	4 MB
1 Double	1024²	1,048,576	4 GB
1 Triple	1024³	1,073,741,824	4 TB
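The table's figures follow from one formula: with P = block_size / 4 pointers per block, the four regions address 12, P, P², and P³ blocks. The sketch below reproduces them. The function names are hypothetical.

In practice, ext2/ext3 files stop at about 2 TB with 4 KB blocks, which matches the feature comparison later on this page. The cap comes from i_blocks, which counts 512-byte units in 32 bits, not from the pointer tree.

```c
#include <stdint.h>

/* Bytes addressable through one region of i_block[], assuming 4-byte
   pointers. level: 0 = direct, 1 = single, 2 = double, 3 = triple. */
uint64_t indirect_capacity_bytes(uint32_t block_size, int level)
{
    const uint64_t P = block_size / 4;   /* pointers per block */
    uint64_t blocks;
    switch (level) {
    case 0:  blocks = 12;         break;
    case 1:  blocks = P;          break;
    case 2:  blocks = P * P;      break;
    case 3:  blocks = P * P * P;  break;
    default: return 0;
    }
    return blocks * block_size;
}

/* Sum of all four regions: the theoretical pointer-tree limit. */
uint64_t indirect_total_bytes(uint32_t block_size)
{
    uint64_t total = 0;
    for (int level = 0; level <= 3; level++)
        total += indirect_capacity_bytes(block_size, level);
    return total;
}
```

With 1 KB blocks, P drops to 256 and the triple-indirect region shrinks to 16 GB, which shows how strongly block size drives the limit.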

ext4's Extent Revolution

ext4 replaces this indirect block scheme with extents—ranges of contiguous blocks:

Extent: (logical block 0, physical block 1000, length 500)
        "Logical blocks 0-499 map to physical blocks 1000-1499"

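A single extent can map up to 32,768 contiguous blocks (128 MB with 4 KB blocks), so the four extents that fit inside the inode itself can cover 512 MB of contiguous data. Compared with one pointer per block, this dramatically reduces metadata overhead for large files. We'll explore extents in depth later in this module.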
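Resolving a logical block through a sorted list of extents amounts to a binary search followed by an offset. The sketch below works on a flat in-memory array that stands in for one leaf of the extent tree. The struct and function names are hypothetical, and the real on-disk ext4_extent splits the physical start into ee_start_hi and ee_start_lo. Returning 0 for a hole mirrors the ext2 convention that block pointer 0 means "not allocated".

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified in-memory extent (not the on-disk layout). */
struct extent {
    uint32_t logical;    /* first logical block covered */
    uint64_t physical;   /* first physical block */
    uint16_t len;        /* number of blocks */
};

/* ext[] is sorted by logical block. Returns the physical block for lblk,
   or 0 if lblk falls in a hole. */
uint64_t extent_lookup(const struct extent *ext, size_t n, uint32_t lblk)
{
    size_t lo = 0, hi = n;            /* find the last extent with logical <= lblk */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ext[mid].logical <= lblk)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return 0;                     /* before the first extent */
    const struct extent *e = &ext[lo - 1];
    if (lblk - e->logical < e->len)
        return e->physical + (lblk - e->logical);
    return 0;                         /* past the end of that extent */
}
```

With the extent from the example above, {0, 1000, 500}, logical block 499 resolves to physical block 1499.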

Inode Count is Fixed

Directory Implementation

In the ext file system family, directories are special files whose data blocks contain directory entries. The directory structure has evolved significantly:

Classic ext2 Directory Entries (Linear List)

The original ext2 directory format uses variable-length entries in a linear list:

ext2_dir_entry.c
// ext2 directory entry (original format)
struct ext2_dir_entry {
    __le32  inode;          // Inode number (0 = unused entry)
    __le16  rec_len;        // Length of this entry (for alignment)
    __le16  name_len;       // Actual filename length
    char    name[];         // Filename (NOT null-terminated)
};
 
// ext2 directory entry v2 (with file type)
struct ext2_dir_entry_2 {
    __le32  inode;          // Inode number
    __le16  rec_len;        // Entry length
    __u8    name_len;       // Filename length
    __u8    file_type;      // File type (dir, regular, symlink, etc.)
    char    name[];         // Filename
};
 
// File type values
#define EXT2_FT_UNKNOWN     0
#define EXT2_FT_REG_FILE    1   // Regular file
#define EXT2_FT_DIR         2   // Directory
#define EXT2_FT_CHRDEV      3   // Character device
#define EXT2_FT_BLKDEV      4   // Block device
#define EXT2_FT_FIFO        5   // Named pipe
#define EXT2_FT_SOCK        6   // Socket
#define EXT2_FT_SYMLINK     7   // Symbolic link

Directory Layout Example (root directory, inode 2, so "." and ".." both refer to inode 2; entries are packed one after another in the block, and the file_type byte is omitted):

+--------+---------+----------+-------------+
| inode  | rec_len | name_len | name        |
+--------+---------+----------+-------------+
| 2      | 12      | 1        | "."         |
| 2      | 12      | 2        | ".."        |
| 45678  | 20      | 9        | "hello.txt" |
| 12345  | <rem>   | 3        | "dir"       |
+--------+---------+----------+-------------+

Key observations:

Entries are packed consecutively within data blocks
rec_len allows variable-size entries (including padding)
Deleted entries have inode = 0 but remain in the list
Last entry's rec_len extends to block end
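The observations above translate directly into code. A record's minimum length is the 8-byte header plus the name, rounded up to a multiple of 4. A scan advances by rec_len, not name_len, and skips entries whose inode is 0.

The sketch below builds the example block and searches it. It assumes the ext2_dir_entry_2 layout (inode, rec_len, name_len, file_type, name) and a little-endian host, so fields are copied without byte swapping. dir_put and dir_lookup are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimum record length: 8-byte header plus name, rounded up to 4 bytes. */
uint16_t dir_rec_len(size_t name_len)
{
    return (uint16_t)((8 + name_len + 3) & ~(size_t)3);
}

/* Write one entry at off; returns the offset of the next entry. */
size_t dir_put(uint8_t *block, size_t off, uint32_t ino, uint8_t type,
               const char *name, uint16_t rec_len)
{
    size_t len = strlen(name);
    memcpy(block + off, &ino, 4);
    memcpy(block + off + 4, &rec_len, 2);
    block[off + 6] = (uint8_t)len;
    block[off + 7] = type;
    memcpy(block + off + 8, name, len);      /* stored without a NUL terminator */
    return off + rec_len;
}

/* Linear scan of one directory block; returns the inode number or 0. */
uint32_t dir_lookup(const uint8_t *block, size_t block_size, const char *name)
{
    size_t want = strlen(name);
    for (size_t off = 0; off + 8 <= block_size; ) {
        uint32_t ino;
        uint16_t rec_len;
        memcpy(&ino, block + off, 4);
        memcpy(&rec_len, block + off + 4, 2);
        uint8_t name_len = block[off + 6];
        if (rec_len < 8 || off + rec_len > block_size)
            return 0;                         /* corrupt entry: stop scanning */
        if (ino != 0 && name_len == want &&   /* inode 0 marks a deleted entry */
            memcmp(block + off + 8, name, want) == 0)
            return ino;
        off += rec_len;
    }
    return 0;
}
```

Building the example uses record lengths 12, 12, and 20 for ".", "..", and "hello.txt", with the last entry's rec_len set to the remainder of the block. The file_type values 1 and 2 correspond to EXT2_FT_REG_FILE and EXT2_FT_DIR.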

The Performance Problem

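Linear search through directory entries has O(n) complexity. Every name lookup (opening or stat-ing a file, or creating one, which must first confirm the name is unused) may scan the whole directory. In directories with thousands of files, operations that touch many names, such as ls -l stat-ing every entry, therefore became painfully slow.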

Hash Tree Directories (HTree)

ext3 introduced optional hash tree (HTree) indexing for directories:

Directory blocks organized as a B-tree variant
Entries hashed by filename for fast lookup
Falls back to linear if tree becomes corrupted
Enabled by default in ext3/ext4
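An HTree lookup works in three steps:

1. Hash the name.
2. Binary-search an index block of sorted (hash, block) pairs to choose one leaf block.
3. Scan only that leaf linearly.

The sketch below shows the second step, with two stand-ins that differ from ext4. The hash is 32-bit FNV-1a, whereas ext4 uses half-MD4 by default, with TEA and a legacy hash as alternatives. The types and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified index entry: leaf `block` holds names whose hash is at least
   `hash` and below the next entry's hash. */
struct dx_entry_sketch {
    uint32_t hash;
    uint32_t block;
};

/* Stand-in hash (32-bit FNV-1a), for illustration only. */
uint32_t name_hash(const char *name)
{
    uint32_t h = 2166136261u;
    for (; *name; name++) {
        h ^= (uint8_t)*name;
        h *= 16777619u;
    }
    return h;
}

/* Find the last entry with hash <= target. Entry 0 covers every hash
   below entries[1].hash, so the search starts at index 1. */
uint32_t dx_pick_leaf(const struct dx_entry_sketch *e, size_t n, uint32_t target)
{
    size_t lo = 1, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (e[mid].hash <= target)
            lo = mid + 1;
        else
            hi = mid;
    }
    return e[lo - 1].block;
}
```

A lookup is then dx_pick_leaf(index, n, name_hash(name)) followed by a linear scan of the returned block. This is why lookup cost grows with the number of index levels rather than with the number of entries.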


Directory Lookup Performance Comparison
Structure	Lookup Time	Insert Time	Practical Limit
Linear (ext2 default)	O(n)	O(n) (must scan for duplicates and free space)	~1,000 entries
HTree (ext3+)	O(log n)	O(log n)	~10,000,000+ entries
With inline data	O(1) for tiny dirs	N/A	~60 bytes of entries

Feature Comparison: ext2 vs ext3 vs ext4

While sharing a common architecture, each ext version introduced significant features. Let's compare them systematically.

Comprehensive Feature Comparison
Feature	ext2	ext3	ext4
Maximum volume size	4 to 32 TB (by block size)	4 to 32 TB (by block size)	1 EB (exabyte)
Maximum file size (4 KB blocks)	2 TB	2 TB	16 TB
Maximum filename length	255 bytes	255 bytes	255 bytes
Journaling	❌ None	✅ Yes	✅ Yes (with checksums)
Block allocation	Indirect blocks	Indirect blocks	Extents + indirect fallback
Online defragmentation	❌ No	❌ No	✅ Via e4defrag
Delayed allocation	❌ No	❌ No	✅ Yes
Persistent preallocation	❌ No	❌ No	✅ fallocate()
Subsecond timestamps	❌ No	❌ No	✅ Nanosecond
Creation timestamp	❌ No	❌ No	✅ Yes
Directory indexing	❌ Linear only	✅ HTree	✅ HTree
Metadata checksums	❌ No	❌ No	✅ Optional
Barrier support	N/A	✅ Yes	✅ Yes (default)
Quota journaling	❌ No	✅ Yes	✅ Yes

Why ext2 Still Has a Place

Despite being "outdated," ext2 remains valuable:

Boot partitions: No journal overhead for read-mostly /boot
Flash storage with wear-leveling: Less write amplification
Embedded systems: Simpler, smaller driver code
Recovery scenarios: Easier to repair without journal complexity

Why ext3 Over ext2

The single killer feature: crash recovery. Without journaling:

Unclean shutdown requires full fsck (filesystem check)
fsck on large volumes can take hours
Data structures may be inconsistent
Risk of data loss during recovery

With journaling:

Recovery replays committed journal transactions (seconds)
Guaranteed metadata consistency
Much lower risk of corruption
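The key property of replay is that a transaction's block images are written to their home locations only if its commit record reached the journal. Anything after the last commit is discarded, so the file system never sees half of an update.

The sketch below is a toy model of that rule. It is not JBD's on-disk format and ignores revoke records and multi-pass scanning. It assumes transactions appear in the log one after another, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy journal record: a block image belonging to transaction tid, or a
   commit marker for that transaction. */
enum rec_kind { REC_BLOCK, REC_COMMIT };

struct jrec {
    enum rec_kind kind;
    uint32_t tid;        /* transaction ID */
    uint32_t target;     /* home block number (REC_BLOCK only) */
    uint8_t  data[16];   /* block contents (tiny blocks for the sketch) */
};

/* Copy block images home for committed transactions only.
   Returns the number of blocks written. */
size_t journal_replay(const struct jrec *log, size_t n,
                      uint8_t disk[][16], size_t disk_blocks)
{
    size_t written = 0, start = 0;
    for (size_t i = 0; i < n; i++) {
        if (log[i].kind != REC_COMMIT)
            continue;
        /* log[start..i) holds the records of the transaction committed at i */
        for (size_t j = start; j < i; j++) {
            if (log[j].kind == REC_BLOCK && log[j].tid == log[i].tid &&
                log[j].target < disk_blocks) {
                memcpy(disk[log[j].target], log[j].data, 16);
                written++;
            }
        }
        start = i + 1;
    }
    /* Records after the last commit belong to an unfinished transaction
       and are ignored. */
    return written;
}
```

Because replay only rewrites whole block images, running it twice gives the same result. That is what lets recovery finish in seconds without examining the rest of the disk.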

Why ext4 Over ext3

For modern systems, ext4's advantages are compelling:

Extents: Dramatically better large file performance
Delayed allocation: Improved write throughput
Multiblock allocation: Reduced fragmentation
Nanosecond timestamps: Needed to order changes made within the same second (build tools such as make, file synchronization, logging)
Persistent preallocation: Guaranteed space for critical files
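Persistent preallocation is requested from user space through fallocate(). The sketch below uses the portable wrapper posix_fallocate(3). On Linux, glibc implements it with the fallocate(2) system call and falls back to writing zeros only if the file system lacks support. On ext4, the reserved range becomes unwritten extents that read back as zeros. preallocate_file is a hypothetical helper, and the /tmp path in the usage is an arbitrary choice.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Reserve len bytes for path up front and report the resulting size and
   allocated bytes. Returns 0 on success, nonzero on failure. */
int preallocate_file(const char *path, off_t len, off_t *size_out, off_t *alloc_out)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    /* posix_fallocate returns an error number instead of setting errno. */
    int err = posix_fallocate(fd, 0, len);
    if (err == 0) {
        struct stat st;
        if (fstat(fd, &st) == 0) {
            *size_out = st.st_size;
            *alloc_out = (off_t)st.st_blocks * 512;  /* st_blocks counts 512-byte units */
        } else {
            err = -1;
        }
    }
    close(fd);
    return err;
}
```

For example, preallocate_file("/tmp/prealloc_demo.bin", 1 << 20, &size, &alloc) should report a 1 MB size with at least 1 MB allocated. Later writes into that range cannot fail for lack of space.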

Migration Path

ext4 in Modern Linux Systems

ext4 remains the default file system for many Linux distributions, despite competition from btrfs and XFS. Understanding its role helps contextualize technical decisions.

Default in Major Distributions:

Ubuntu: ext4 default since 9.10 (2009)
Fedora: btrfs default for desktop editions since Fedora 33 (2020); ext4 was the default before that
RHEL/CentOS: XFS default for /, ext4 widely supported
Debian: ext4 default
Android: ext4 long used for system and data partitions; many devices now use F2FS for /data

Why ext4 Endures:

ext4 Strengths

•Maturity: 15+ years of production testing
•Reliability: Proven crash recovery
•Performance: Excellent for most workloads
•Tool support: Extensive utilities
•Documentation: Well-understood behavior
•Compatibility: Works everywhere Linux runs

ext4 Limitations

•No snapshots: Unlike btrfs/ZFS
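•No built-in compression: Unlike btrfs/ZFS (encryption, by contrast, is built in via fscrypt)
•No built-in RAID: Requires mdadm/LVM
•No data checksums: Metadata only
•No online shrink: Can grow while mounted; shrinking requires unmounting
•Subdirectory limit: ext3 capped a directory at about 32,000 subdirectories; ext4 lifts this with dir_nlink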

Typical ext4 Usage Scenarios:

Scenario	Configuration	Notes
Root filesystem	Default mount options	Balanced performance/safety
/home	user_xattr,acl	Extended attributes for desktop integration
Database storage	noatime,nodiratime,data=ordered	Reduce metadata writes
Log aggregation	noatime,nobarrier*	Maximum write performance
Virtual machine images	discard,noatime	SSD optimization
Container storage	noatime,journal_checksum	Reliability with performance

*nobarrier reduces safety but improves performance; use only with battery-backed cache.

Performance Tuning Parameters:

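# Create an ext4 filesystem tuned for a striped RAID array (erases /dev/sda1).
# ^metadata_csum and ^64bit drop metadata checksums and support for volumes
# over 16 TB; keep both unless you have measured a benefit.
mkfs.ext4 -O ^metadata_csum,^64bit -E stride=16,stripe_width=64 /dev/sda1

# Mount with performance options. commit=60 lengthens the journal commit
# interval (default 5 s); barrier=0 and data=writeback weaken crash safety.
mount -o noatime,commit=60,barrier=0,data=writeback /dev/sda1 /mnt

# Adjust runtime parameters
# max_batch_time (microseconds) is an ext4 mount option, not a /proc file
mount -o remount,max_batch_time=0 /mnt
echo 5 > /proc/sys/vm/laptop_mode  # Aggressive write coalescing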
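The stride and stripe_width values in the mkfs line follow from RAID geometry:

- stride is the RAID chunk size divided by the file system block size.
- stripe_width is stride times the number of data-bearing disks.

The sketch below computes both. The text does not state the array layout, so the usage assumes a 64 KB chunk, 4 KB blocks, and 4 data disks, which reproduces stride=16 and stripe_width=64. The function name is hypothetical.

```c
#include <stdint.h>

/* mkfs.ext4 -E hints derived from RAID geometry. */
struct raid_hint {
    uint32_t stride;        /* file system blocks per RAID chunk */
    uint32_t stripe_width;  /* file system blocks per full data stripe */
};

struct raid_hint compute_raid_hint(uint32_t chunk_bytes, uint32_t block_bytes,
                                   uint32_t data_disks)
{
    struct raid_hint h;
    h.stride = chunk_bytes / block_bytes;
    h.stripe_width = h.stride * data_disks;
    return h;
}
```

For a RAID 5 array, count only data disks: a 5-disk RAID 5 has 4 data disks per stripe.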

Emerging Competition

Summary: The Linux Extended File System Family

We've established the foundational understanding of the ext file system family. Let's consolidate the key concepts before diving into detailed component analysis:

Key Takeaways

•The ext family evolved to address specific Linux needs — From Minix's limitations through ext4's modern features, each version solved real problems while maintaining compatibility.
•Block groups are the fundamental organizational unit — Partitions are divided into block groups containing related metadata and data, improving locality.
•Inodes store all metadata except filename — The separation of metadata (inode) from naming (directory entry) from content (data blocks) is central to ext design.
•Journaling was ext3's breakthrough — Write-ahead logging eliminated hours-long fsck operations after crashes.
•ext4 introduced extents for efficiency — Replacing indirect blocks with contiguous extent ranges dramatically improved large file handling.
•Design prioritizes reliability — Consistency and data safety take precedence over raw performance throughout the family.

What's Next:

The following pages dive deep into the specific components:

Block Groups: How partitioning the disk into groups improves locality and enables parallelism
Superblock: The critical metadata structure that defines file system parameters
Ext3 Journaling: Write-ahead logging implementation and journaling modes
Ext4 Extents: Modern extent-based allocation replacing indirect block pointers

With this foundational understanding, you're prepared to explore each component in the depth required for true mastery of Linux file system internals.
