In-Memory Databases

Level: Advanced

Duration: 75 mins

Topic: In-Memory Databases

SAP HANA

The Enterprise In-Memory Revolution

In 2010, SAP introduced a database that would fundamentally reshape enterprise computing. SAP HANA (High-performance ANalytic Appliance) wasn't just another database—it was a declaration that the era of disk-based enterprise systems was ending.

HANA represented decades of database research culminating in a production system capable of running critical enterprise workloads entirely in memory. Its design influenced an entire generation of database systems and demonstrated that in-memory computing could scale to enterprise requirements.

This page examines HANA in depth: its architectural innovations, hybrid row-column engine, and the design decisions that made it the foundation for SAP's entire enterprise software portfolio.

What You Will Learn

By the end of this page, you will understand: HANA's architectural philosophy and key innovations, how the hybrid row-column engine works, the role of HANA in SAP's enterprise ecosystem, advanced features like multi-tiered storage and native HANA development, and when HANA is the right choice for your workload.

The Genesis of SAP HANA

The Problem SAP Faced

By the mid-2000s, SAP's enterprise customers were hitting performance walls. SAP ERP systems—running on traditional databases like Oracle, DB2, or SQL Server—struggled with:

Reporting Latency: Complex business reports took hours or overnight batch runs
Aggregate Tables: Pre-computed aggregates required massive storage and complex ETL
Separate OLTP/OLAP Systems: Transactional and analytical workloads couldn't coexist efficiently
Data Growth: As data volumes exploded, traditional scaling hit diminishing returns

SAP's vision was radical: eliminate the architectural separation between transactional and analytical processing by making everything fast enough to run in real-time.

The Research Foundation

HANA drew on decades of academic research and on earlier SAP products:

TREX: SAP's columnar search engine, developed in the late 1990s
MaxDB/liveCache: SAP's disk-based relational database (MaxDB) and liveCache, its memory-resident variant
H-Store/VoltDB: Academic work on main-memory OLTP from MIT/Brown
MonetDB: Columnar database research from CWI Amsterdam
Hasso Plattner Institute Research: Extensive work on in-memory data management

SAP HANA Evolution Timeline
Year	Version/Event	Key Development
2010	Initial Release	In-memory, column-oriented database for analytics
2012	HANA 1.0 SP5	Added row store for OLTP, enabling hybrid workloads
2013	Suite on HANA	SAP Business Suite certified to run on HANA
2015	S/4HANA Launch	Complete SAP ERP rebuilt native on HANA
2016	HANA 2.0	Multi-tenant, advanced analytics, warm data tiering
2018	HANA Cockpit	Enhanced administration and monitoring
2020	HANA Cloud	Cloud-native managed HANA service
2023	HANA Cloud QRC	Quarterly release cycle, enhanced vector/AI features

The Strategic Bet

SAP made a multi-billion dollar strategic bet on in-memory technology. Rather than continuing to support multiple traditional databases, SAP rebuilt its entire product line on HANA. S/4HANA, SAP's current ERP platform, runs ONLY on HANA—a platform-level commitment unprecedented in enterprise software.

HANA Architecture Overview

HANA's architecture is designed from the ground up for in-memory operation while supporting durability, high availability, and enterprise-grade reliability.

Core Architectural Components

hana_architecture.txt

SAP HANA Architecture Overview
 
┌─────────────────────────────────────────────────────────────────────────┐
│                              INDEX SERVER                                │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │                        SQL/MDX Processor                        │   │
│  │  • Query parsing, optimization, execution planning              │   │
│  │  • Join engine, aggregation engine                              │   │
│  └─────────────────────────────────────────────────────────────────┘   │
│                                   │                                      │
│         ┌─────────────────────────┼─────────────────────────┐           │
│         ▼                         ▼                         ▼           │
│  ┌──────────────┐  ┌─────────────────────────┐  ┌─────────────────┐    │
│  │   ROW STORE  │  │       COLUMN STORE      │  │   TEXT SEARCH   │    │
│  │              │  │                         │  │   (TREX indexer)│    │
│  │ • OLTP       │  │ • Analytics (OLAP)      │  │ • Full-text     │    │
│  │ • Transact.  │  │ • Aggregations          │  │ • Fuzzy search  │    │
│  │ • Row format │  │ • Compression           │  │ • Text mining   │    │
│  └──────────────┘  └─────────────────────────┘  └─────────────────┘    │
│                                   │                                      │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │                       PERSISTENCE LAYER                          │   │
│  │  • Savepoints, Logging, Recovery                                 │   │
│  │  • Log volumes, Data volumes                                     │   │
│  └─────────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────┘
         │                              │                        │
         ▼                              ▼                        ▼
┌──────────────┐             ┌──────────────────────┐    ┌─────────────┐
│ NAME SERVER  │             │   XS ENGINE          │    │ PREPROCESSOR│
│ (Distributed │             │ (Web application     │    │ SERVER      │
│  catalog)    │             │  server)             │    │ (Text prep) │
└──────────────┘             └──────────────────────┘    └─────────────┘

Key Components Explained

1. Index Server

The core database engine containing:

Session Manager: Handles client connections and authentication
SQL/MDX Processor: Parses, optimizes, and executes queries
Transaction Manager: ACID compliance, MVCC implementation
Authorization Manager: Fine-grained access control
Persistence Layer: Durability through logging and savepoints

2. Row Store

Optimized for transactional workloads:

Classic row-oriented storage layout
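Held entirely in memory and loaded at startup; NSE (Native Storage Extension), HANA's disk-based warm-data feature, applies to column store tables rather than the row store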
Used for system tables, small lookup tables, and OLTP-heavy tables

3. Column Store

The heart of HANA's analytical power:

True columnar storage with advanced compression
Delta merge architecture (main + delta stores)
Dictionary encoding, run-length encoding, sparse encoding
Vectorized query execution

4. Extended Application Services (XS)

Built-in web application server enabling "Native HANA development"—applications running directly on the database without middleware layers.

The Hybrid Row-Column Engine

HANA's defining innovation is its hybrid engine that seamlessly combines row and column storage, enabling both OLTP and OLAP on the same data without ETL or replication.

Why Hybrid Matters

Traditional architectures force a choice:

Row stores excel at transactions but struggle with analytics
Column stores excel at analytics but struggle with transactions

Enterprises needed both. The conventional solution—replicate data to separate OLTP and OLAP systems—meant:

Minutes to hours of data latency
Complex ETL pipelines to maintain
Storage costs doubled
Inconsistencies between operational and analytical data

HANA's hybrid approach eliminates this dichotomy.

delta_merge_architecture.txt
HANA Column Store Delta-Merge Architecture
 
WRITE PATH:
┌─────────────────────────────────────────────────────────────────┐
│  INSERT/UPDATE arrives                                          │
│       │                                                         │
│       ▼                                                         │
│  ┌───────────────────────────────────────────┐                  │
│  │           L1 DELTA (Row-Oriented)         │                  │
│  │  • Unsorted, uncompressed                 │                  │
│  │  • Optimized for single-row inserts       │                  │
│  │  • Small buffer (MBs)                     │                  │
│  └───────────────────────────────────────────┘                  │
│       │ (background merge when threshold)                       │
│       ▼                                                         │
│  ┌───────────────────────────────────────────┐                  │
│  │          L2 DELTA (Column-Oriented)       │                  │
│  │  • Unsorted dictionary, basic compression │                  │
│  │  • Larger buffer (10s-100s MB)            │                  │
│  └───────────────────────────────────────────┘                  │
│       │ (MERGE operation when threshold)                        │
│       ▼                                                         │
│  ┌───────────────────────────────────────────┐                  │
│  │              MAIN STORE                    │                  │
│  │  • Fully compressed, optimized for reads  │                  │
│  │  • Read-optimized columnar layout         │                  │
│  │  • Global dictionary, heavy compression    │                  │
│  └───────────────────────────────────────────┘                  │
└─────────────────────────────────────────────────────────────────┘
 
READ PATH:
Query reads from BOTH Main Store AND Delta Stores
Results are merged transparently
Latest version always visible

The Delta Merge Process

L1 Delta (Row Delta): Incoming writes go to a small, row-oriented buffer optimized for single-row operations. This provides fast insert performance without touching the compressed main store.
L2 Delta (Columnar Delta): When L1 fills, data is converted to columnar format with dictionary encoding. The L2 dictionary is left unsorted so that new values can be appended cheaply; a sorted dictionary is built only in the main store. This provides good query performance while still accepting new data.
Main Store Merge: Background processes periodically merge L2 delta into the main store, applying full compression optimizations. This is the "delta merge" operation that database administrators monitor and tune.
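
The write and read paths described above can be sketched in a few lines of Python. This is a toy model, not HANA's implementation: the class name `ColumnWithDelta` and its methods are hypothetical, and the L1 and L2 deltas are collapsed into a single uncompressed buffer to keep the example short.

```python
from bisect import bisect_left


class ColumnWithDelta:
    """Toy single-column store: compressed main part plus an append-only delta."""

    def __init__(self):
        self.main_dict = []    # sorted distinct values (main store dictionary)
        self.main_codes = []   # one dictionary index per row in the main store
        self.delta = []        # uncompressed recent writes, in arrival order

    def insert(self, value):
        # Writes only touch the delta; the compressed main store is untouched.
        self.delta.append(value)

    def scan(self):
        # Reads combine main (decoded through the dictionary) and delta.
        yield from (self.main_dict[c] for c in self.main_codes)
        yield from self.delta

    def count_equal(self, value):
        # Main store: binary-search the sorted dictionary, then compare codes.
        i = bisect_left(self.main_dict, value)
        in_main = 0
        if i < len(self.main_dict) and self.main_dict[i] == value:
            in_main = sum(1 for c in self.main_codes if c == i)
        # Delta: plain value comparison on the small uncompressed buffer.
        return in_main + sum(1 for v in self.delta if v == value)

    def merge(self):
        # Delta merge: rebuild a sorted dictionary over main + delta,
        # re-encode every row, then empty the delta.
        rows = list(self.scan())
        self.main_dict = sorted(set(rows))
        code_of = {v: i for i, v in enumerate(self.main_dict)}
        self.main_codes = [code_of[v] for v in rows]
        self.delta = []
```

Note that `merge` re-encodes the whole column because a new value can shift every code in a sorted dictionary. That cost is the reason writes are buffered in a delta instead of being applied to the main store directly.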

Why This Design Works

Write Performance: Inserts avoid expensive main store compression operations
Read Performance: Main store is heavily optimized for scan performance
Consistency: MVCC ensures readers always see consistent data across all stores
Compression: Main store achieves maximum compression ratios
Memory Efficiency: Delta stores are small; main store is heavily compressed

Delta Merge Tuning

Delta merge operations can cause temporary performance variations. For high-throughput systems, HANA administrators configure merge thresholds, scheduling, and parallelism. Auto-merge handles most cases, but understanding the mechanism helps diagnose performance patterns.

HANA Compression Technologies

Compression in HANA isn't just about saving space—it's a performance feature. Smaller data means more fits in cache, more fits in memory, and faster scans.

Compression Techniques

HANA automatically selects the optimal compression for each column based on data characteristics:

HANA Column Compression Methods
Technique	How It Works	Best For	Typical Ratio
Dictionary Encoding	Replace values with integer codes referencing dictionary	String columns with repeated values	5:1 to 20:1
Run-Length Encoding (RLE)	Store (value, count) for consecutive identical values	Sorted columns, low cardinality	10:1 to 100:1
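Cluster Encoding	Split the attribute vector into fixed-size blocks and store a block that holds a single repeated value only once	Columns where equal values occur in long stretches	4:1 to 15:1
Sparse Encoding	Store only the values that differ from the most frequent value, with their positions	Columns with many NULLs or defaults	10:1 to 50:1
Bit-Packed/Prefix Encoding	Bit-packing uses the minimum number of bits per code; prefix encoding stores a value repeated at the start of the column once, with a count	Integer columns with limited range; columns that begin with one dominant value	2:1 to 8:1
Indirect Encoding	Split the attribute vector into blocks and give a block with few distinct values its own small local dictionary	Columns whose values are locally repetitive	3:1 to 10:1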
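
Two of the techniques in the table are simple enough to sketch directly. The functions below are a minimal illustration of run-length and sparse encoding on plain Python lists; the function names are hypothetical, the input to `sparse_encode` is assumed to be non-empty, and HANA's actual on-memory formats are more elaborate.

```python
from collections import Counter
from itertools import groupby


def rle_encode(values):
    """Run-length encoding: one (value, run_length) pair per run."""
    return [(v, len(list(run))) for v, run in groupby(values)]


def rle_decode(pairs):
    return [v for v, n in pairs for _ in range(n)]


def sparse_encode(values):
    """Sparse encoding: drop the most frequent value, keep only the exceptions."""
    default, _ = Counter(values).most_common(1)[0]
    exceptions = {i: v for i, v in enumerate(values) if v != default}
    return default, len(values), exceptions


def sparse_decode(default, length, exceptions):
    return [exceptions.get(i, default) for i in range(length)]
```

Run-length encoding pays off only when equal values sit next to each other, which is why it is listed for sorted, low-cardinality columns. Sparse encoding does not depend on order, only on one value dominating the column.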

Dictionary Encoding in Detail

Dictionary encoding is HANA's cornerstone compression technique. Consider a column storing country names:

dictionary_encoding_hana.txt
HANA Dictionary Encoding Example
 
Original Column: CUSTOMER_COUNTRY (10 million rows)
['United States', 'Germany', 'United States', 'France', 'Germany', 
 'United States', 'United Kingdom', 'Germany', ...]
 
Dictionary (Sorted):
0 → 'France'           (1.2 million occurrences)
1 → 'Germany'          (2.5 million occurrences)  
2 → 'United Kingdom'   (1.8 million occurrences)
3 → 'United States'    (4.5 million occurrences)
 
Encoded Column (attribute vector):
[3, 1, 3, 0, 1, 3, 2, 1, ...]  (2 bits per value = 0.25 bytes)
 
Storage Analysis:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Original: 10M rows × ~12 bytes avg = 120 MB
  Dictionary: 4 entries × ~15 bytes = 60 bytes
  Attribute Vector: 10M × 2 bits = 2.5 MB
  Compressed Total: ~2.5 MB
  Compression Ratio: ~48:1
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 
Query Execution Benefit:
• Equality filter: Compare encoded integers (2 bits), not strings
• GROUP BY: Group on the integer codes; decode only the distinct group keys
• Aggregation: SIMD processes many packed codes per instruction (e.g. 128 two-bit codes per 256-bit register)
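
The encoding and the storage arithmetic in the example above can be reproduced with a short Python sketch. The function names are hypothetical and the code models only the idea of a sorted dictionary plus a bit-packed attribute vector, not HANA's internal layout.

```python
def dictionary_encode(values):
    """Return (sorted dictionary, attribute vector of integer codes)."""
    dictionary = sorted(set(values))
    code_of = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]


def bits_per_code(dictionary_size):
    """Minimum number of bits needed to address every dictionary entry."""
    return max(1, (dictionary_size - 1).bit_length())


def attribute_vector_bytes(row_count, dictionary_size):
    """Size of the bit-packed attribute vector in bytes."""
    return row_count * bits_per_code(dictionary_size) / 8


countries = ['United States', 'Germany', 'United States', 'France',
             'Germany', 'United States', 'United Kingdom', 'Germany']
dictionary, codes = dictionary_encode(countries)
print(dictionary)  # ['France', 'Germany', 'United Kingdom', 'United States']
print(codes)       # [3, 1, 3, 0, 1, 3, 2, 1]

# 10 million rows with 4 distinct values need 2 bits each.
print(attribute_vector_bytes(10_000_000, 4))  # 2500000.0 bytes, i.e. 2.5 MB
```

Dividing the 120 MB uncompressed estimate by the 2.5 MB attribute vector gives the 48:1 ratio quoted above; the 60-byte dictionary is negligible at this row count.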

Compression and Query Execution

HANA executes queries directly on compressed data wherever possible:

Predicate Evaluation: Filters comparing to dictionary values can compare encoded integers
Join Processing: Dictionary-encoded columns can join by comparing encoded values
Aggregation: Count/sum operations work on compressed representation
Late Materialization: Only decompress columns actually needed for output

This "operate on compressed data" capability multiplies the benefit: smaller data AND faster processing.
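
As a rough illustration of predicate evaluation on encoded values and late materialization, the sketch below filters a dictionary-encoded column by comparing integer codes and decodes strings only for the rows that survive. All names are hypothetical, and the columns are plain Python lists instead of bit-packed vectors.

```python
from bisect import bisect_left


def code_for(dictionary, value):
    """Look up a value's code in a sorted dictionary; None if absent."""
    i = bisect_left(dictionary, value)
    return i if i < len(dictionary) and dictionary[i] == value else None


def filter_positions(dictionary, codes, value):
    """Evaluate `column = value` by comparing integer codes, not strings."""
    code = code_for(dictionary, value)
    if code is None:
        return []  # value not in dictionary: no row can match, skip the scan
    return [pos for pos, c in enumerate(codes) if c == code]


def materialize(dictionary, codes, positions):
    """Late materialization: decode only the rows that are actually needed."""
    return [dictionary[codes[p]] for p in positions]


country_dict = ['France', 'Germany', 'United States']
country_codes = [2, 1, 2, 0, 1]
amounts = [10, 20, 30, 40, 50]

hits = filter_positions(country_dict, country_codes, 'Germany')
print(hits)                                             # [1, 4]
print(sum(amounts[p] for p in hits))                    # 70
print(materialize(country_dict, country_codes, hits))   # ['Germany', 'Germany']
```

The string comparison happens once, in the dictionary lookup. The scan itself touches only integers, and a value that is missing from the dictionary rules out every row without scanning at all.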

Typical Compression Ratios

Real SAP customer deployments typically see overall compression ratios of 3:1 to 10:1 compared to disk-based databases. Transaction data with many string columns might achieve 5:1, while highly structured data with repeated patterns can exceed 20:1. At a ratio of about 10:1, a dataset of 10-30 TB raw size fits in roughly 1-3 TB of RAM, before allowing for the working memory that query processing needs.

HANA Query Processing

HANA's query engine leverages in-memory architecture to achieve performance that would be impossible with disk-based designs.

Query Execution Model

HANA uses a combination of execution strategies:

HANA Query Execution Features

• Vectorized Execution: Process data in batches (vectors) of thousands of values, enabling SIMD operations and cache-efficient processing.
• Parallelization: Queries automatically parallelize across available CPU cores. Column scans, aggregations, and joins all leverage multi-core processing.
• Partition Pruning: When tables are partitioned, only relevant partitions are scanned based on filter predicates.
• Push-down Optimization: Predicates and projections are pushed as close to data as possible, minimizing data movement.
• Lazy Evaluation: Only materialize columns that are actually needed for results; intermediate results stay in compressed form.
• Compilation: Frequently executed queries can be compiled to native code for maximum performance.
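
Partition pruning from the list above is easy to picture in code. The sketch below assumes each partition carries hypothetical `min_year` and `max_year` metadata and skips any partition whose range cannot contain the filter value; it illustrates the idea only and says nothing about how HANA stores partition metadata.

```python
def prune_partitions(partitions, year):
    """Keep only partitions whose [min_year, max_year] range can contain year."""
    return [p for p in partitions if p["min_year"] <= year <= p["max_year"]]


def sum_amount_for_year(partitions, year):
    """Return (sum of amount for the year, number of partitions scanned)."""
    total = 0
    scanned = 0
    for part in prune_partitions(partitions, year):
        scanned += 1
        total += sum(amount for y, amount in part["rows"] if y == year)
    return total, scanned


partitions = [
    {"min_year": 2021, "max_year": 2021, "rows": [(2021, 5)]},
    {"min_year": 2022, "max_year": 2023, "rows": [(2022, 1), (2023, 10)]},
    {"min_year": 2023, "max_year": 2024, "rows": [(2023, 7), (2024, 2)]},
]
print(sum_amount_for_year(partitions, 2023))  # (17, 2): one partition skipped
```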

Join Processing

HANA implements multiple join algorithms optimized for in-memory execution:

Hash Join: Standard for equi-joins. Build a hash table on the smaller input, then probe it with the larger input. The hash table is held in memory, so a very large build side costs RAM rather than disk I/O.

Merge Join: Effective when both sides are already sorted (common for column stores with sorted dictionaries).

Partition-Wise Join: When both tables are partitioned on join keys, each partition pair can be joined independently in parallel.

Bloom Filter Optimization: Before expensive join operations, Bloom filters prune non-matching rows, dramatically reducing join input size.
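
A minimal sketch of a hash join with a Bloom filter in front of the probe phase is shown below. The `BloomFilter` class, its sizing defaults, and the `hash_join` signature are all assumptions made for illustration; they are not HANA APIs.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        return all((self.bits >> p) & 1 for p in self._positions(key))


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Equi-join: build a hash table on the smaller input, probe with the larger."""
    table = {}
    bloom = BloomFilter()
    for row in build_rows:
        table.setdefault(row[build_key], []).append(row)
        bloom.add(row[build_key])
    for row in probe_rows:
        key = row[probe_key]
        if not bloom.might_contain(key):
            continue  # definitely no match: skip the hash table probe
        for match in table.get(key, []):
            yield {**match, **row}
```

In a single process the Bloom filter saves little, because the dictionary lookup is already cheap. Its value shows when the filter is small enough to be pushed into the scan of the probe side, or shipped to another node, so that non-matching rows are dropped before they reach the join.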

hana_parallel_query.txt
HANA Parallel Query Execution
 
Query: SELECT region, SUM(amount) FROM sales 
       WHERE year = 2023 GROUP BY region;
 
┌─────────────────────────────────────────────────────────────────┐
│  SALES Table (100 million rows, 16 partitions)                  │
│                                                                 │
│  ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ... ┌─────┐                   │
│  │ P1  │ │ P2  │ │ P3  │ │ P4  │     │ P16 │                   │
│  └──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘     └──┬──┘                   │
│     │       │       │       │           │                       │
│     ▼       ▼       ▼       ▼           ▼                       │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │            PARALLEL SCAN + FILTER (16 threads)           │  │
│  │            WHERE year = 2023 (filter on encoded value)   │  │
│  └──────────────────────────────────────────────────────────┘  │
│     │       │       │       │           │                       │
│     ▼       ▼       ▼       ▼           ▼                       │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │            PARALLEL LOCAL AGGREGATION                     │  │
│  │            SUM(amount) GROUP BY region (per partition)    │  │
│  └──────────────────────────────────────────────────────────┘  │
│                          │                                      │
│                          ▼                                      │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │            GLOBAL MERGE AGGREGATION                       │  │
│  │            Combine partial results from partitions        │  │
│  └──────────────────────────────────────────────────────────┘  │
│                          │                                      │
│                          ▼                                      │
│                    FINAL RESULT                                 │
│                    (5 regions, aggregated)                      │
└─────────────────────────────────────────────────────────────────┘
 
Execution Time: ~200ms (vs. minutes in disk-based systems)
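
The scan, local aggregation, and global merge stages in the diagram map onto a short Python sketch. The function names and the tuple layout `(year, region, amount)` are assumptions for this example. Threads in CPython do not give real CPU parallelism for this kind of work, so the code shows the structure of the plan, not its speed.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def local_aggregate(partition, year):
    """Scan one partition, filter on year, and sum amount per region."""
    partial = Counter()
    for row_year, region, amount in partition:
        if row_year == year:
            partial[region] += amount
    return partial


def parallel_sum_by_region(partitions, year, workers=4):
    """Run local aggregation per partition, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda p: local_aggregate(p, year), partitions)
        total = Counter()
        for partial in partials:
            total.update(partial)  # global merge: add partial sums together
    return dict(total)


partitions = [
    [(2023, "EMEA", 10), (2022, "EMEA", 99)],
    [(2023, "APJ", 5), (2023, "EMEA", 1)],
]
print(parallel_sum_by_region(partitions, 2023))  # {'EMEA': 11, 'APJ': 5}
```

Each worker returns at most one row per region, so the global merge handles a handful of values regardless of how many rows were scanned.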

Understanding EXPLAIN PLAN

HANA's EXPLAIN PLAN output shows parallel execution patterns, partition pruning decisions, and join strategies. Learning to read HANA execution plans is essential for query optimization. Key indicators: parallelism degree, table access methods (column vs. row store), and join types.

Persistence and Recovery

Despite being an "in-memory" database, HANA provides full ACID durability through sophisticated persistence mechanisms.

Write-Ahead Logging (Redo Log)

Every transactional change is written to the redo log before being acknowledged. The log is on persistent storage (SSD or high-performance disk), ensuring durability even if the server loses power.

Savepoints

Periodically, HANA writes a savepoint: the pages changed since the previous savepoint are flushed to the data volume, leaving a consistent on-disk image of the database. These savepoints:

Capture a consistent image of the in-memory state
Allow log truncation (old log entries no longer needed for recovery)
Provide fast recovery starting points

hana_persistence.txt
HANA Persistence Architecture
 
MEMORY (In-Memory Database State)
┌─────────────────────────────────────────────────────────────────┐
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │ Row Store   │  │Column Store │  │  Catalog    │ ◄── Live     │
│  │   Tables    │  │  (Main+Δ)   │  │  Metadata   │     State    │
│  └─────────────┘  └─────────────┘  └─────────────┘              │
└─────────────────────────────────────────────────────────────────┘
         │                   │                    │
         │    Transaction    │                    │
         │    completes      │                    │
         ▼                   ▼                    ▼
┌─────────────────────────────────────────────────────────────────┐
│                  REDO LOG (on SSD/Disk)                         │
│  ┌───────┬───────┬───────┬───────┬───────┬───────┬─────────┐   │
│  │ LSN 1 │ LSN 2 │ LSN 3 │ LSN 4 │ LSN 5 │ LSN 6 │   ...   │   │
│  └───────┴───────┴───────┴───────┴───────┴───────┴─────────┘   │
│  Every transaction writes here BEFORE acknowledgment            │
│  (synchronous write for durability)                             │
└─────────────────────────────────────────────────────────────────┘
                              │
                              │ Periodic (every few minutes)
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      SAVEPOINT                                   │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │  Complete consistent snapshot of in-memory state        │   │
│  │  Written to DATA VOLUME (persistent storage)            │   │
│  │  Enables log truncation (old LSNs no longer needed)     │   │
│  └─────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
 
RECOVERY PROCESS:
1. Load most recent savepoint (fast—direct load)
2. Replay redo log from savepoint LSN to current
3. Database fully recovered, ready for connections
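
The interplay of redo log, savepoint, and recovery can be modelled with a toy key-value store. Everything here is hypothetical (class name, methods, in-memory lists standing in for the log and data volumes), and the savepoint copies the full state for simplicity where a real system would write only changed pages.

```python
class TinyStore:
    """Toy in-memory store with a redo log and savepoints."""

    def __init__(self):
        self.state = {}            # live in-memory data
        self.log = []              # stands in for the log volume
        self.savepoint = ({}, 0)   # (snapshot, last LSN it covers)
        self.next_lsn = 1

    def put(self, key, value):
        # Write-ahead: append the log record before applying the change.
        self.log.append((self.next_lsn, key, value))
        self.state[key] = value
        self.next_lsn += 1

    def write_savepoint(self):
        covered = self.next_lsn - 1
        self.savepoint = (dict(self.state), covered)
        # Log records already covered by the savepoint can be dropped.
        self.log = [entry for entry in self.log if entry[0] > covered]

    def recover(self):
        # 1. Load the most recent savepoint.
        snapshot, covered = self.savepoint
        self.state = dict(snapshot)
        # 2. Replay log records written after the savepoint.
        for lsn, key, value in self.log:
            if lsn > covered:
                self.state[key] = value


store = TinyStore()
store.put("a", 1)
store.put("b", 2)
store.write_savepoint()
store.put("a", 3)
store.state = {}      # simulate losing memory contents
store.recover()
print(store.state)    # {'a': 3, 'b': 2}
```

The sketch also shows why recovery time grows with the amount of log written since the last savepoint: step 2 replays every record after the covered LSN.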

Recovery Time Optimization

Recovery time is critical for availability. HANA optimizes recovery through:

Parallel Log Replay: Multiple threads replay different portions of the log simultaneously
Incremental Savepoints: Only changed pages are written, not entire database
PreLoad for Tables: Critical tables can be configured for early loading during startup
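Fast Restart Option and Persistent Memory: Keep column store main data in a tmpfs file system (survives a HANA service restart) or in persistent memory such as Intel Optane (survives a server restart), so it need not be reloaded from disk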

High Availability

For mission-critical deployments, HANA supports:

System Replication: Synchronous or asynchronous replication to standby servers
Host Auto-Failover: Automatic failover within a cluster
Tenant Database Isolation: Multiple isolated databases on shared infrastructure
Backup Integration: Online backup to third-party backup systems

Recovery Time Considerations

Recovery time depends on database size and log volume since last savepoint. Large databases (multi-TB) may take 10-30 minutes to recover. For applications requiring sub-minute RTO, synchronous system replication with automatic failover is essential.

HANA in the Enterprise Ecosystem

HANA's significance extends beyond its technical capabilities—it serves as the foundation for SAP's entire enterprise software portfolio.

S/4HANA: The Simplified ERP

S/4HANA is SAP's current-generation ERP, designed exclusively for HANA. The database's capabilities enabled dramatic simplifications:

Aggregate Table Elimination: Traditional ERPs maintained hundreds of aggregate tables for reporting performance. With HANA's real-time analytics, these are eliminated.
Single Source of Truth: No more separate OLTP and OLAP systems. Operational and analytical queries hit the same data.
Simplified Data Model: The S/4HANA database schema is 75% smaller than earlier SAP ERP systems.
Real-Time Insights: Financial reports that took overnight batch runs now complete in seconds.

SAP HANA Ecosystem Components
Component	Purpose	Key Capabilities
S/4HANA	Core ERP	Finance, supply chain, manufacturing, sales
BW/4HANA	Data warehousing	Enterprise analytics, OLAP cubes on HANA
SAP Analytics Cloud	Cloud BI	Dashboards, planning, predictive analytics
SAP Data Intelligence	Data integration	ETL, data pipelines, metadata management
SAP HANA Cloud	Database-as-a-Service	Managed HANA in hyperscaler clouds
SAP Business Technology Platform	PaaS	Custom app development on HANA stack

Native HANA Development

HANA isn't just a database—it's an application platform. The XS (Extended Application Services) engine enables:

Server-Side JavaScript: Business logic running directly in the database tier
OData Services: REST APIs generated automatically from database artifacts
Calculation Views: Complex analytics defined graphically, executed in-database
Stored Procedures: SQLScript for complex database logic

This "code-to-data" approach minimizes data movement by pushing computation to where data resides.

Multi-Model Capabilities

Modern HANA supports multiple data models: relational, document (JSON), graph, and spatial. This multi-model capability reduces the need for specialized databases. A single HANA instance can store transactional data relationally, documents as JSON, and relationship networks as graphs—all queryable via SQL.

When to Choose HANA

HANA represents a significant investment. Understanding when it's the right choice—and when alternatives may be better—is essential for architects.

HANA is Ideal For

• SAP application environments (mandatory for S/4HANA)
• Hybrid OLTP/OLAP workloads needing both on same data
• Real-time analytics where latency matters critically
• Enterprise data warehousing at large scale
• Organizations already invested in SAP ecosystem
• Complex analytics with large parallel scan requirements
• Multi-model requirements (relational + graph + JSON)

Consider Alternatives When

• Purely OLTP workloads (VoltDB, CockroachDB may be simpler)
• Cost is primary constraint (HANA licensing is significant)
• Dataset far exceeds available RAM budget
• No existing SAP investment or expertise
• Simple key-value access patterns (Redis is more appropriate)
• Open-source requirement (PostgreSQL, ClickHouse alternatives)
• Pure analytics without transactions (ClickHouse, Druid)

Cost Considerations

HANA's costs include:

Licensing: Per-GB or per-TB RAM licensing (significant expense)
Hardware: Certified HANA appliances or cloud instances with large memory
Expertise: HANA administration requires specialized skills
Migration: Migrating to HANA involves substantial project costs

However, total cost of ownership (TCO) should account for:

Eliminated aggregate tables and ETL
Simplified architecture (fewer systems to maintain)
Reduced report development time
Business value of real-time insights

Page Complete

You now have a comprehensive understanding of SAP HANA: its genesis, architecture, hybrid engine design, compression technologies, query processing, persistence mechanisms, and role in the enterprise ecosystem. Next, we'll examine Redis—a fundamentally different in-memory database optimized for simplicity and extreme performance in specific use cases.

Database Management SystemsIn-Memory Databases

In-Memory Databases

LevelAdvanced

Duration75 mins

TopicIn-Memory Databases

3 / 5

SAP HANA

The Enterprise In-Memory Revolution

This page examines HANA in depth: its architectural innovations, hybrid row-column engine, and the design decisions that made it the foundation for SAP's entire enterprise software portfolio.

What You Will Learn

The Genesis of SAP HANA

The Problem SAP Faced

By the mid-2000s, SAP's enterprise customers were hitting performance walls. SAP ERP systems—running on traditional databases like Oracle, DB2, or SQL Server—struggled with:

Reporting Latency: Complex business reports took hours or overnight batch runs
Aggregate Tables: Pre-computed aggregates required massive storage and complex ETL
Separate OLTP/OLAP Systems: Transactional and analytical workloads couldn't coexist efficiently
Data Growth: As data volumes exploded, traditional scaling hit diminishing returns

SAP's vision was radical: eliminate the architectural separation between transactional and analytical processing by making everything fast enough to run in real-time.

The Research Foundation

HANA drew on decades of academic research:

TREX: SAP's columnar search engine, developed in the late 1990s
MaxDB: SAP's in-memory technology (liveCache)
H-Store/VoltDB: Academic work on main-memory OLTP from MIT/Brown
MonetDB: Columnar database research from CWI Amsterdam
Hasso Plattner Institute Research: Extensive work on in-memory data management

SAP HANA Evolution Timeline
Year	Version/Event	Key Development
2010	Initial Release	In-memory, column-oriented database for analytics
2012	HANA 1.0 SP5	Added row store for OLTP, enabling hybrid workloads
2013	Suite on HANA	SAP Business Suite certified to run on HANA
2015	S/4HANA Launch	Complete SAP ERP rebuilt native on HANA
2016	HANA 2.0	Multi-tenant, advanced analytics, warm data tiering
2018	HANA Cockpit	Enhanced administration and monitoring
2020	HANA Cloud	Cloud-native managed HANA service
2023	HANA Cloud QRC	Quarterly release cycle, enhanced vector/AI features

The Strategic Bet

HANA Architecture Overview

HANA's architecture is designed from the ground up for in-memory operation while supporting durability, high availability, and enterprise-grade reliability.

Core Architectural Components

hana_architecture.txt

SAP HANA Architecture Overview
 
┌─────────────────────────────────────────────────────────────────────────┐
│                              INDEX SERVER                                │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │                        SQL/MDX Processor                        │   │
│  │  • Query parsing, optimization, execution planning              │   │
│  │  • Join engine, aggregation engine                              │   │
│  └─────────────────────────────────────────────────────────────────┘   │
│                                   │                                      │
│         ┌─────────────────────────┼─────────────────────────┐           │
│         ▼                         ▼                         ▼           │
│  ┌──────────────┐  ┌─────────────────────────┐  ┌─────────────────┐    │
│  │   ROW STORE  │  │       COLUMN STORE      │  │   TEXT SEARCH   │    │
│  │              │  │                         │  │   (TREX indexer)│    │
│  │ • OLTP       │  │ • Analytics (OLAP)      │  │ • Full-text     │    │
│  │ • Transact.  │  │ • Aggregations          │  │ • Fuzzy search  │    │
│  │ • Row format │  │ • Compression           │  │ • Text mining   │    │
│  └──────────────┘  └─────────────────────────┘  └─────────────────┘    │
│                                   │                                      │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │                       PERSISTENCE LAYER                          │   │
│  │  • Savepoints, Logging, Recovery                                 │   │
│  │  • Log volumes, Data volumes                                     │   │
│  └─────────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────┘
         │                              │                        │
         ▼                              ▼                        ▼
┌──────────────┐             ┌──────────────────────┐    ┌─────────────┐
│ NAME SERVER  │             │   XS ENGINE          │    │ PREPROCESSOR│
│ (Distributed │             │ (Web application     │    │ SERVER      │
│  catalog)    │             │  server)             │    │ (ETL, CALC) │
└──────────────┘             └──────────────────────┘    └─────────────┘

Key Components Explained

1. Index Server

The core database engine containing:

Session Manager: Handles client connections and authentication
SQL/MDX Processor: Parses, optimizes, and executes queries
Transaction Manager: ACID compliance, MVCC implementation
Authorization Manager: Fine-grained access control
Persistence Layer: Durability through logging and savepoints

2. Row Store

Optimized for transactional workloads:

Classic row-oriented storage layout
NSE (Native Storage Extension) for on-disk spilling
Used for system tables, small lookup tables, and OLTP-heavy tables

3. Column Store

The heart of HANA's analytical power:

True columnar storage with advanced compression
Delta merge architecture (main + delta stores)
Dictionary encoding, run-length encoding, sparse encoding
Vectorized query execution

4. Extended Application Services (XS)

Built-in web application server enabling "Native HANA development"—applications running directly on the database without middleware layers.

The Hybrid Row-Column Engine

HANA's defining innovation is its hybrid engine that seamlessly combines row and column storage, enabling both OLTP and OLAP on the same data without ETL or replication.

Why Hybrid Matters

Traditional architectures force a choice:

Row stores excel at transactions but struggle with analytics
Column stores excel at analytics but struggle with transactions

Enterprises needed both. The conventional solution—replicate data to separate OLTP and OLAP systems—meant:

Minutes to hours of data latency
Complex ETL pipelines to maintain
Storage costs doubled
Inconsistencies between operational and analytical data

HANA's hybrid approach eliminates this dichotomy.

delta_merge_architecture.txt
HANA Column Store Delta-Merge Architecture
 
WRITE PATH:
┌─────────────────────────────────────────────────────────────────┐
│  INSERT/UPDATE arrives                                          │
│       │                                                         │
│       ▼                                                         │
│  ┌───────────────────────────────────────────┐                  │
│  │           L1 DELTA (Row-Oriented)         │                  │
│  │  • Unsorted, uncompressed                 │                  │
│  │  • Optimized for single-row inserts       │                  │
│  │  • Small buffer (MBs)                     │                  │
│  └───────────────────────────────────────────┘                  │
│       │ (background merge when threshold)                       │
│       ▼                                                         │
│  ┌───────────────────────────────────────────┐                  │
│  │          L2 DELTA (Column-Oriented)       │                  │
│  │  • Unsorted dictionary, basic compression │                  │
│  │  • Larger buffer (10s-100s MB)            │                  │
│  └───────────────────────────────────────────┘                  │
│       │ (MERGE operation when threshold)                        │
│       ▼                                                         │
│  ┌───────────────────────────────────────────┐                  │
│  │              MAIN STORE                   │                  │
│  │  • Fully compressed, optimized for reads  │                  │
│  │  • Read-optimized columnar layout         │                  │
│  │  • Sorted dictionary, heavy compression   │                  │
│  └───────────────────────────────────────────┘                  │
└─────────────────────────────────────────────────────────────────┘
 
READ PATH:
Query reads from BOTH Main Store AND Delta Stores
Results are merged transparently
Latest version always visible

The Delta Merge Process

L1 Delta (Row Delta): Incoming writes go to a small, row-oriented buffer optimized for single-row operations. This provides fast insert performance without touching the compressed main store.
L2 Delta (Columnar Delta): When L1 reaches its threshold, rows are converted to a columnar format with dictionary encoding and basic compression. The dictionary is kept unsorted so that new values can simply be appended. This provides good query performance while still accepting new data cheaply.
Main Store Merge: Background processes periodically merge L2 delta into the main store, applying full compression optimizations. This is the "delta merge" operation that database administrators monitor and tune.
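
The write and read paths described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not HANA's implementation: it collapses L1 and L2 into a single uncompressed delta, and the class name `ColumnWithDelta` and the `merge_threshold` parameter are hypothetical.

```python
from bisect import bisect_left


class ColumnWithDelta:
    """Toy single-column store: read-optimized main part plus a write-optimized delta."""

    def __init__(self, merge_threshold=4):
        self.main_dict = []    # sorted list of distinct values
        self.main_codes = []   # one dictionary code per row in the main part
        self.delta = []        # uncompressed, append-only values
        self.merge_threshold = merge_threshold  # hypothetical tuning knob

    def insert(self, value):
        # Writes only touch the delta; the compressed main part is left alone.
        self.delta.append(value)
        if len(self.delta) >= self.merge_threshold:
            self.merge()

    def merge(self):
        # Rebuild the sorted dictionary and re-encode every row against it.
        rows = [self.main_dict[c] for c in self.main_codes] + self.delta
        self.main_dict = sorted(set(rows))
        self.main_codes = [bisect_left(self.main_dict, v) for v in rows]
        self.delta = []

    def count_equal(self, value):
        # Reads consult both parts and combine the results.
        hits = 0
        pos = bisect_left(self.main_dict, value)
        if pos < len(self.main_dict) and self.main_dict[pos] == value:
            hits += sum(1 for c in self.main_codes if c == pos)
        hits += sum(1 for v in self.delta if v == value)
        return hits


col = ColumnWithDelta(merge_threshold=4)
for country in ["DE", "US", "DE", "FR", "US"]:
    col.insert(country)

print(col.main_dict, col.main_codes, col.delta)
print(col.count_equal("US"))
```

The fourth insert triggers a merge, so the fifth value sits in the delta while the first four are dictionary-encoded in the main part. The count for "US" still sees both.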

Why This Design Works

Write Performance: Inserts avoid expensive main store compression operations
Read Performance: Main store is heavily optimized for scan performance
Consistency: MVCC ensures readers always see consistent data across all stores
Compression: Main store achieves maximum compression ratios
Memory Efficiency: Delta stores are small; main store is heavily compressed

HANA Compression Technologies

Compression in HANA isn't just about saving space—it's a performance feature. Smaller data means more fits in cache, more fits in memory, and faster scans.

Compression Techniques

HANA automatically selects the optimal compression for each column based on data characteristics:

HANA Column Compression Methods
Technique	How It Works	Best For	Typical Ratio
Dictionary Encoding	Replace values with integer codes referencing dictionary	String columns with repeated values	5:1 to 20:1
Run-Length Encoding (RLE)	Store (value, count) for consecutive identical values	Sorted columns, low cardinality	10:1 to 100:1
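Cluster Encoding	Split the column into fixed-size blocks; a block holding one repeated value is stored as a single entry	Columns where equal values cluster together	4:1 to 15:1
Sparse Encoding	Remove the most frequent value and store only the remaining values with their positions	Columns dominated by one value (NULLs or defaults)	10:1 to 50:1
Bit-Packed/Prefix Encoding	Pack each code into the minimum number of bits; prefix encoding also stores a long leading run of one value only once	Integer columns with limited range	2:1 to 8:1
Indirect Encoding	Split the column into blocks and give blocks with few distinct values their own small dictionary	Columns whose blocks each hold few distinct values	3:1 to 10:1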
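
To make two rows of the table concrete, here is a minimal Python sketch of run-length and sparse encoding applied to a vector of dictionary codes. The function names are hypothetical and the layouts are simplified for illustration; HANA's internal formats are not shown here.

```python
from collections import Counter
from itertools import groupby


def rle_encode(codes):
    """Collapse each run of identical codes into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(codes)]


def rle_decode(runs):
    return [value for value, count in runs for _ in range(count)]


def sparse_encode(codes):
    """Drop the most frequent code; keep (position, value) for everything else."""
    default = Counter(codes).most_common(1)[0][0]
    exceptions = [(i, v) for i, v in enumerate(codes) if v != default]
    return default, len(codes), exceptions


def sparse_decode(default, length, exceptions):
    codes = [default] * length
    for position, value in exceptions:
        codes[position] = value
    return codes


sorted_codes = [0, 0, 0, 1, 1, 2, 2, 2, 2]
runs = rle_encode(sorted_codes)

mostly_default = [7, 7, 0, 7, 7, 7, 3, 7]
sparse = sparse_encode(mostly_default)

print(runs)
print(sparse)
```

Run-length encoding pays off only when equal values are adjacent, which is why sort order matters. Sparse encoding pays off whenever one value dominates, regardless of order.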

Dictionary Encoding in Detail

Dictionary encoding is HANA's cornerstone compression technique. Consider a column storing country names:

dictionary_encoding_hana.txt
HANA Dictionary Encoding Example
 
Original Column: CUSTOMER_COUNTRY (10 million rows)
['United States', 'Germany', 'United States', 'France', 'Germany', 
 'United States', 'United Kingdom', 'Germany', ...]
 
Dictionary (Sorted):
0 → 'France'           (1.2 million occurrences)
1 → 'Germany'          (2.5 million occurrences)  
2 → 'United Kingdom'   (1.8 million occurrences)
3 → 'United States'    (4.5 million occurrences)
 
Encoded Column (attribute vector):
[3, 1, 3, 0, 1, 3, 2, 1, ...]  (2 bits per value = 0.25 bytes)
 
Storage Analysis:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Original: 10M rows × ~12 bytes avg = 120 MB
  Dictionary: 4 entries × ~15 bytes = 60 bytes
  Attribute Vector: 10M × 2 bits = 2.5 MB
  Compressed Total: ~2.5 MB
  Compression Ratio: ~48:1
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 
Query Execution Benefit:
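• Equality filter: Look up the literal in the dictionary once, then compare 2-bit integer codes instead of strings
• GROUP BY: Group on integer codes and use them as array indexes; strings are looked up once per group
• Aggregation: Bit-packed codes suit SIMD; a 256-bit register holds 128 two-bit codes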
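
The storage arithmetic in this example can be reproduced with a short Python sketch. The helper names `dictionary_encode` and `bits_per_code` are hypothetical, and the 12-byte average string size is the same assumption the example uses.

```python
def dictionary_encode(values):
    """Return (sorted dictionary, attribute vector of integer codes)."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]


def bits_per_code(distinct_count):
    """Minimum number of bits needed to address every dictionary entry."""
    return max(1, (distinct_count - 1).bit_length())


sample = ["United States", "Germany", "United States", "France",
          "Germany", "United States", "United Kingdom", "Germany"]
dictionary, codes = dictionary_encode(sample)

rows = 10_000_000
bits = bits_per_code(len(dictionary))
vector_mb = rows * bits / 8 / 1_000_000   # bit-packed attribute vector
original_mb = rows * 12 / 1_000_000       # assumed ~12 bytes per string
ratio = original_mb / vector_mb

print(dictionary)
print(codes)
print(bits, vector_mb, original_mb, ratio)
```

With four distinct values each code needs two bits, so the attribute vector for ten million rows occupies 2.5 MB against 120 MB of raw strings. The dictionary itself is negligible at this cardinality.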

Compression and Query Execution

HANA executes queries directly on compressed data wherever possible:

Predicate Evaluation: Filters comparing to dictionary values can compare encoded integers
Join Processing: Dictionary-encoded columns can join by comparing encoded values
Aggregation: Count/sum operations work on compressed representation
Late Materialization: Only decompress columns actually needed for output

This "operate on compressed data" capability multiplies the benefit: smaller data AND faster processing.
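
A minimal Python sketch of filtering and aggregating on encoded values follows. It assumes a sorted dictionary and plain Python lists. The `amounts` column and both function names are hypothetical, introduced only for illustration.

```python
from bisect import bisect_left

dictionary = ["France", "Germany", "United Kingdom", "United States"]  # sorted
codes = [3, 1, 3, 0, 1, 3, 2, 1]             # attribute vector for the country column
amounts = [10, 20, 30, 40, 50, 60, 70, 80]   # hypothetical measure column


def filter_equal(dictionary, codes, value):
    """Row positions where the column equals value, comparing integer codes only."""
    pos = bisect_left(dictionary, value)
    if pos == len(dictionary) or dictionary[pos] != value:
        return []  # value is not in the dictionary, so no row can match
    return [row for row, code in enumerate(codes) if code == pos]


def sum_by_code(dictionary, codes, measure):
    """SUM(measure) GROUP BY the encoded column, using codes as array indexes."""
    totals = [0] * len(dictionary)
    for code, amount in zip(codes, measure):
        totals[code] += amount
    # Late materialization: each string is looked up once per group, not per row.
    return {dictionary[code]: total for code, total in enumerate(totals)}


german_rows = filter_equal(dictionary, codes, "Germany")
totals = sum_by_code(dictionary, codes, amounts)

print(german_rows)
print(totals)
```

The string "Germany" is touched once, during the dictionary lookup. Every per-row comparison after that is an integer comparison.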

HANA Query Processing

HANA's query engine leverages in-memory architecture to achieve performance that would be impossible with disk-based designs.

Query Execution Model

HANA uses a combination of execution strategies:

HANA Query Execution Features

• Vectorized Execution: Process data in batches (vectors) of thousands of values, enabling SIMD operations and cache-efficient processing.
• Parallelization: Queries automatically parallelize across available CPU cores. Column scans, aggregations, and joins all leverage multi-core processing.
• Partition Pruning: When tables are partitioned, only relevant partitions are scanned based on filter predicates.
• Push-down Optimization: Predicates and projections are pushed as close to data as possible, minimizing data movement.
• Lazy Evaluation: Only materialize columns that are actually needed for results; intermediate results stay in compressed form.
• Compilation: Frequently executed queries can be compiled to native code for maximum performance.

Join Processing

HANA implements multiple join algorithms optimized for in-memory execution:

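Hash Join: Standard for equi-joins. Build a hash table on the smaller input, then probe it with the larger one. The hash table lives entirely in memory, so there is no disk-spill path to tune, but the build side must fit within the available memory.

Merge Join: Effective when both inputs are already ordered on the join key, which avoids a separate sort step. Sorted dictionaries help here because integer codes preserve value order, so ordering can be established on codes rather than on strings.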

Partition-Wise Join: When both tables are partitioned on join keys, each partition pair can be joined independently in parallel.

Bloom Filter Optimization: Before expensive join operations, Bloom filters prune non-matching rows, dramatically reducing join input size.
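
The build/probe pattern and the Bloom filter pre-check can be sketched as follows. This is an illustrative single-process model, not HANA's join operator. The `BloomFilter` class, its two simple hash positions, and the sample tables are assumptions made for the example.

```python
from collections import defaultdict


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024):
        self.size = size_bits
        self.bits = 0

    def _positions(self, key):
        # Two cheap positions derived from one hash; real filters use stronger hashing.
        h = hash(key)
        return (h % self.size, (h // self.size) % self.size)

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        return all((self.bits >> p) & 1 for p in self._positions(key))


def hash_join(build_rows, probe_rows, build_key, probe_key):
    table = defaultdict(list)
    bloom = BloomFilter()
    for row in build_rows:                  # build phase on the smaller input
        table[row[build_key]].append(row)
        bloom.add(row[build_key])
    joined = []
    for row in probe_rows:                  # probe phase on the larger input
        key = row[probe_key]
        if not bloom.might_contain(key):
            continue                        # pruned before touching the hash table
        for match in table.get(key, []):
            joined.append({**match, **row})
    return joined


regions = [{"region_id": 1, "region": "EMEA"}, {"region_id": 2, "region": "APJ"}]
sales = [{"region_id": 1, "amount": 100}, {"region_id": 3, "amount": 50},
         {"region_id": 2, "amount": 70}, {"region_id": 1, "amount": 30}]

result = hash_join(regions, sales, "region_id", "region_id")
print(result)
```

In a single process the filter saves little, since the hash lookup is already cheap. The benefit appears when the filter is applied during the scan of the probe side, or shipped to another node, so that non-matching rows never reach the join.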

hana_parallel_query.txt
HANA Parallel Query Execution
 
Query: SELECT region, SUM(amount) FROM sales 
       WHERE year = 2023 GROUP BY region;
 
┌─────────────────────────────────────────────────────────────────┐
│  SALES Table (100 million rows, 16 partitions)                  │
│                                                                 │
│  ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ... ┌─────┐                   │
│  │ P1  │ │ P2  │ │ P3  │ │ P4  │     │ P16 │                   │
│  └──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘     └──┬──┘                   │
│     │       │       │       │           │                       │
│     ▼       ▼       ▼       ▼           ▼                       │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │            PARALLEL SCAN + FILTER (16 threads)           │  │
│  │            WHERE year = 2023 (filter on encoded value)   │  │
│  └──────────────────────────────────────────────────────────┘  │
│     │       │       │       │           │                       │
│     ▼       ▼       ▼       ▼           ▼                       │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │            PARALLEL LOCAL AGGREGATION                     │  │
│  │            SUM(amount) GROUP BY region (per partition)    │  │
│  └──────────────────────────────────────────────────────────┘  │
│                          │                                      │
│                          ▼                                      │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │            GLOBAL MERGE AGGREGATION                       │  │
│  │            Combine partial results from partitions        │  │
│  └──────────────────────────────────────────────────────────┘  │
│                          │                                      │
│                          ▼                                      │
│                    FINAL RESULT                                 │
│                    (5 regions, aggregated)                      │
└─────────────────────────────────────────────────────────────────┘
 
Illustrative execution time: ~200 ms (a disk-based system scanning the same data may take minutes)
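
The plan shape in the diagram (parallel scan and filter, local aggregation per partition, then a global merge) can be sketched in Python. The partition contents and function names are hypothetical. Python threads are used only to show the structure; because of the interpreter lock they will not reproduce the speedup of a native engine.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def local_aggregate(partition, year):
    """Scan one partition, filter on year, and sum amounts per region."""
    partial = Counter()
    for row_year, region, amount in partition:
        if row_year == year:
            partial[region] += amount
    return partial


def parallel_sum_by_region(partitions, year, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda p: local_aggregate(p, year), partitions)
        total = Counter()
        for partial in partials:   # global merge of per-partition results
            total.update(partial)
    return dict(total)


# Hypothetical rows: (year, region, amount), split across three partitions.
partitions = [
    [(2023, "EMEA", 100), (2022, "EMEA", 999)],
    [(2023, "APJ", 40), (2023, "EMEA", 10)],
    [(2023, "AMER", 25)],
]

result = parallel_sum_by_region(partitions, 2023)
print(result)
```

Each worker returns at most one row per region, so the global merge handles a handful of partial results rather than the full table.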

Persistence and Recovery

Despite being an "in-memory" database, HANA provides full ACID durability through sophisticated persistence mechanisms.

Write-Ahead Logging (Redo Log)

Every transactional change is written to the redo log before being acknowledged. The log is on persistent storage (SSD or high-performance disk), ensuring durability even if the server loses power.

Savepoints

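Periodically (every five minutes by default), HANA writes a savepoint that brings the data volume on persistent storage to a consistent image of the database. These savepoints:

Capture a consistent image of the in-memory state, writing only the pages changed since the previous savepoint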
Allow log truncation (old log entries no longer needed for recovery)
Provide fast recovery starting points

hana_persistence.txt
HANA Persistence Architecture
 
MEMORY (In-Memory Database State)
┌─────────────────────────────────────────────────────────────────┐
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │ Row Store   │  │Column Store │  │  Catalog    │ ◄── Live     │
│  │   Tables    │  │  (Main+Δ)   │  │  Metadata   │     State    │
│  └─────────────┘  └─────────────┘  └─────────────┘              │
└─────────────────────────────────────────────────────────────────┘
         │                   │                    │
         │    Transaction    │                    │
         │    completes      │                    │
         ▼                   ▼                    ▼
┌─────────────────────────────────────────────────────────────────┐
│                  REDO LOG (on SSD/Disk)                         │
│  ┌───────┬───────┬───────┬───────┬───────┬───────┬─────────┐   │
│  │ LSN 1 │ LSN 2 │ LSN 3 │ LSN 4 │ LSN 5 │ LSN 6 │   ...   │   │
│  └───────┴───────┴───────┴───────┴───────┴───────┴─────────┘   │
│  Every transaction writes here BEFORE acknowledgment            │
│  (synchronous write for durability)                             │
└─────────────────────────────────────────────────────────────────┘
                              │
                              │ Periodic (every few minutes)
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      SAVEPOINT                                   │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │  Complete consistent snapshot of in-memory state        │   │
│  │  Written to DATA VOLUME (persistent storage)            │   │
│  │  Enables log truncation (old LSNs no longer needed)     │   │
│  └─────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
 
RECOVERY PROCESS:
1. Load most recent savepoint (fast—direct load)
2. Replay redo log from savepoint LSN to current
3. Database fully recovered, ready for connections
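
The interaction between redo log, savepoint, and recovery can be modeled in a few lines of Python. This is a conceptual sketch: the class `ToyDurableStore` is hypothetical, Python lists and dicts stand in for the log and data volumes, and a full copy replaces page-level savepoints.

```python
import copy


class ToyDurableStore:
    def __init__(self):
        self.state = {}            # in-memory data
        self.redo_log = []         # stands in for the log volume: (lsn, key, value)
        self.savepoint = ({}, 0)   # stands in for the data volume: (snapshot, lsn)
        self.next_lsn = 1

    def commit(self, key, value):
        # Write-ahead rule: the log entry is recorded before the change is applied.
        self.redo_log.append((self.next_lsn, key, value))
        self.state[key] = value
        self.next_lsn += 1

    def write_savepoint(self):
        last_lsn = self.next_lsn - 1
        self.savepoint = (copy.deepcopy(self.state), last_lsn)
        # Entries covered by the savepoint are no longer needed for crash recovery.
        self.redo_log = [entry for entry in self.redo_log if entry[0] > last_lsn]

    def recover(self):
        snapshot, savepoint_lsn = self.savepoint
        self.state = copy.deepcopy(snapshot)        # 1. load the latest savepoint
        for lsn, key, value in self.redo_log:       # 2. replay newer log entries
            if lsn > savepoint_lsn:
                self.state[key] = value
        return self.state                           # 3. ready for connections


store = ToyDurableStore()
store.commit("a", 1)
store.commit("b", 2)
store.write_savepoint()
store.commit("a", 3)
store.commit("c", 4)

store.state = {}               # simulate losing main memory in a crash
recovered = store.recover()
print(recovered, len(store.redo_log))
```

Only the two commits made after the savepoint are replayed, which is why more frequent savepoints shorten recovery at the cost of more background I/O.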

Recovery Time Optimization

Recovery time is critical for availability. HANA optimizes recovery through:

Parallel Log Replay: Multiple threads replay different portions of the log simultaneously
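Incremental Savepoints: Only changed pages are written, not the entire database
PreLoad for Tables: Critical tables can be configured for early loading during startup
Fast Restart Option: Keep column-store main data in a tmpfs file system so it survives a restart of the HANA service without being reloaded from disk; persistent memory (Intel Optane) extends this to surviving a server reboot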

High Availability

For mission-critical deployments, HANA supports:

System Replication: Synchronous or asynchronous replication to standby servers
Host Auto-Failover: Automatic failover within a cluster
Tenant Database Isolation: Multiple isolated databases on shared infrastructure
Backup Integration: Online backup to third-party backup systems

HANA in the Enterprise Ecosystem

HANA's significance extends beyond its technical capabilities—it serves as the foundation for SAP's entire enterprise software portfolio.

S/4HANA: The Simplified ERP

S/4HANA is SAP's current-generation ERP, designed exclusively for HANA. The database's capabilities enabled dramatic simplifications:

Aggregate Table Elimination: Traditional ERPs maintained hundreds of aggregate tables for reporting performance. With HANA's real-time analytics, these are eliminated.
Single Source of Truth: No more separate OLTP and OLAP systems. Operational and analytical queries hit the same data.
Simplified Data Model: The S/4HANA database schema is 75% smaller than earlier SAP ERP systems.
Real-Time Insights: Financial reports that took overnight batch runs now complete in seconds.

SAP HANA Ecosystem Components
Component	Purpose	Key Capabilities
S/4HANA	Core ERP	Finance, supply chain, manufacturing, sales
BW/4HANA	Data warehousing	Enterprise analytics, OLAP cubes on HANA
SAP Analytics Cloud	Cloud BI	Dashboards, planning, predictive analytics
SAP Data Intelligence	Data integration	ETL, data pipelines, metadata management
SAP HANA Cloud	Database-as-a-Service	Managed HANA in hyperscaler clouds
SAP Business Technology Platform	PaaS	Custom app development on HANA stack

Native HANA Development

HANA isn't just a database—it's an application platform. The XS (Extended Application Services) engine enables:

Server-Side JavaScript: Business logic running directly in the database tier
OData Services: REST APIs generated automatically from database artifacts
Calculation Views: Complex analytics defined graphically, executed in-database
Stored Procedures: SQLScript for complex database logic

This "code-to-data" approach minimizes data movement by pushing computation to where data resides.
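
The difference between moving data to the code and moving code to the data can be illustrated with any SQL database. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in. It is not HANA, XS, or SQLScript, and the `sales` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EMEA", 100), ("APJ", 40), ("EMEA", 10), ("AMER", 25)],
)

# Data-to-code: every row crosses the database boundary, then the app aggregates.
rows = conn.execute("SELECT region, amount FROM sales").fetchall()
app_totals = {}
for region, amount in rows:
    app_totals[region] = app_totals.get(region, 0) + amount

# Code-to-data: the aggregation runs inside the database; only results cross.
db_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
)

print(len(rows), len(db_totals))
print(app_totals == db_totals)
```

Both approaches produce the same totals, but the second transfers one row per region instead of one row per sale. That gap grows with table size.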

When to Choose HANA

HANA represents a significant investment. Understanding when it's the right choice—and when alternatives may be better—is essential for architects.

HANA is Ideal For

• SAP application environments (mandatory for S/4HANA)
• Hybrid OLTP/OLAP workloads needing both on same data
• Real-time analytics where latency matters critically
• Enterprise data warehousing at large scale
• Organizations already invested in SAP ecosystem
• Complex analytics with large parallel scan requirements
• Multi-model requirements (relational + graph + JSON)

Consider Alternatives When

• Purely OLTP workloads (VoltDB or CockroachDB may be simpler)
• Cost is the primary constraint (HANA licensing is significant)
• Dataset far exceeds the available RAM budget
• No existing SAP investment or expertise
• Simple key-value access patterns (Redis is more appropriate)
• Open-source requirement (PostgreSQL and ClickHouse are alternatives)
• Pure analytics without transactions (ClickHouse, Druid)

Cost Considerations

HANA's costs include:

Licensing: Per-GB or per-TB RAM licensing (significant expense)
Hardware: Certified HANA appliances or cloud instances with large memory
Expertise: HANA administration requires specialized skills
Migration: Migrating to HANA involves substantial project costs

However, total cost of ownership (TCO) should account for:

Eliminated aggregate tables and ETL
Simplified architecture (fewer systems to maintain)
Reduced report development time
Business value of real-time insights
