Every database system—whether it serves a small application or powers a global enterprise—begins with a single critical phase: installation and configuration. This foundational step determines not just whether the database will function, but how well it will perform, how secure it will be, and how maintainable it will remain throughout its operational lifetime.
A poorly configured database is a ticking time bomb. It may appear to work initially, but as data volumes grow, user concurrency increases, and security threats evolve, those initial configuration choices reveal their consequences—often catastrophically. The difference between a database that scales gracefully and one that becomes a bottleneck often traces back to decisions made during installation.
By the end of this page, you will understand the complete installation and configuration lifecycle for enterprise database systems. You'll learn to make informed decisions about hardware requirements, storage architecture, memory allocation, network configuration, and security hardening—skills that separate novice operators from seasoned Database Administrators.
Before executing any installation commands, experienced DBAs invest significant effort in pre-installation planning. This phase is often underestimated yet directly determines operational success. The goal is to gather requirements, assess constraints, and make architectural decisions that align with both current needs and future growth.
Requirements Gathering:
The first step involves understanding the workload characteristics that the database will serve. Different applications impose radically different demands:
| Workload Type | Characteristics | Key Configuration Focus |
|---|---|---|
| OLTP (Transactional) | High concurrency, short transactions, frequent writes, low latency requirements | Connection pooling, transaction log performance, row-level locking |
| OLAP (Analytical) | Complex queries, large scans, aggregations, batch processing | Memory for sorts, parallel query execution, columnar storage options |
| Mixed Workload | Combination of transactional and analytical queries | Resource isolation, query queuing, workload management |
| High Availability | Zero downtime tolerance, geographic distribution | Replication configuration, failover automation, cluster setup |
| Data Warehouse | Massive data volumes, ETL processes, historical data | Bulk loading optimization, partition strategies, compression |
Capacity Estimation:
Accurate capacity estimation prevents both under-provisioning (leading to performance issues) and over-provisioning (wasting resources and budget). The estimate should cover data volume and growth rate, concurrent connections, transaction throughput, and storage IOPS.
When estimating resources, plan for 10x your current requirements for data growth and 3x for concurrent connections. Database growth is rarely linear—applications that succeed often experience exponential growth. It's far easier (and cheaper) to provision extra capacity initially than to perform emergency migrations later.
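The headroom rule above can be expressed as a small helper. This is an illustrative sketch of the 10x/3x rule of thumb, not a vendor sizing tool; the function name and parameters are invented for this example.

```python
def provision(current_gb: float, current_connections: int,
              data_factor: float = 10.0, conn_factor: float = 3.0) -> dict:
    """Apply the 10x-data / 3x-connections headroom rule of thumb."""
    return {
        "storage_gb": current_gb * data_factor,
        "max_connections": int(current_connections * conn_factor),
    }

# 50 GB and 200 connections today:
plan = provision(current_gb=50, current_connections=200)
# -> provision 500 GB of storage and plan for 600 connections
```

Treat the output as a starting point for discussion with the application team, not a hard requirement.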
Architecture Decisions:
Pre-installation is also when fundamental architectural choices must be made: standalone versus clustered deployment, on-premises versus cloud or managed hosting, storage layout and tablespace placement, and the high-availability topology.
These decisions must be documented and approved before proceeding. Changing fundamental architecture post-installation often requires complete system rebuilds.
Database performance is ultimately constrained by hardware. While software optimization can improve efficiency, inadequate hardware creates a ceiling that no amount of tuning can overcome. Understanding hardware requirements involves deep knowledge of how databases utilize each component.
| Scale | Data Size | CPU Cores | RAM | Storage IOPS | Network |
|---|---|---|---|---|---|
| Small | < 100 GB | 4-8 | 16-32 GB | 5,000+ | 1 GbE |
| Medium | 100 GB - 1 TB | 8-16 | 64-128 GB | 20,000+ | 10 GbE |
| Large | 1 TB - 10 TB | 16-32 | 256-512 GB | 100,000+ | 25 GbE |
| Enterprise | 10 TB - 100 TB | 32-64 | 512 GB - 1 TB | 500,000+ | 100 GbE |
| Massive | > 100 TB | 64+ (clustered) | 1 TB+ (distributed) | 1M+ (distributed) | 100 GbE+ |
Never install a production database on spinning disks (HDDs) unless cost constraints are extreme and performance is not a concern. The IOPS difference between HDDs (~150) and NVMe SSDs (~500,000+) represents a 3,000x performance gap. Transaction log writes, random I/O for index lookups, and checkpoint operations all suffer dramatically on slow storage.
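The scale of that gap is easy to underestimate. Using the rough IOPS figures quoted above (illustrative; real devices vary), simple arithmetic shows what one million random index lookups cost on each medium:

```python
# Rough IOPS figures from the text (illustrative; actual devices vary widely)
hdd_iops = 150
nvme_iops = 500_000

gap = nvme_iops / hdd_iops            # ~3,333x

ops = 1_000_000                        # e.g., random index page reads
hdd_seconds = ops / hdd_iops           # ~6,667 s (almost two hours)
nvme_seconds = ops / nvme_iops         # 2 s
```

Sequential scans narrow the gap somewhat, but transaction logs and index lookups are dominated by exactly this kind of random or latency-sensitive I/O.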
Virtual vs. Physical Hardware:
Virtualization adds convenience but introduces considerations: resource contention with co-located VMs ("noisy neighbors"), hypervisor CPU and I/O overhead, and less predictable storage latency.
For critical databases, dedicated physical hardware or isolated VM clusters with guaranteed resources provide predictable performance.
The operating system serves as the foundation upon which the database runs. Proper OS configuration is essential for optimal database performance, stability, and security. Many database performance issues trace back to OS misconfiguration rather than database problems.
Kernel Parameters:
Linux (the most common production OS for databases) requires specific kernel tuning:
# Memory Management
vm.swappiness = 1                    # Minimize swap usage; keep data in RAM
vm.dirty_ratio = 15                  # 15% of RAM can be dirty before sync
vm.dirty_background_ratio = 3        # Background sync starts at 3%
vm.overcommit_memory = 0             # Heuristic overcommit handling

# File System
fs.file-max = 6815744                # Maximum file handles system-wide
fs.aio-max-nr = 1048576              # Async I/O events supported

# Network Stack
net.core.somaxconn = 4096            # Maximum socket connection backlog
net.core.netdev_max_backlog = 5000   # Network device queue length
net.ipv4.tcp_max_syn_backlog = 4096  # SYN queue size
net.ipv4.tcp_fin_timeout = 15        # Faster connection cleanup
net.ipv4.tcp_keepalive_time = 300    # Detect dead connections

# Shared Memory (for PostgreSQL, Oracle, etc.)
kernel.shmmax = 68719476736          # Maximum shared memory segment (64GB)
kernel.shmall = 16777216             # Total shared memory pages
kernel.sem = 250 32000 100 128       # Semaphore configuration
File System Selection:
File system choice significantly impacts database performance and reliability:
Mount Options:
How file systems are mounted affects performance:
/dev/sdb1 /data/postgresql xfs defaults,noatime,nodiratime,nobarrier 0 2
- noatime: Disables access time updates (significant I/O reduction)
- nodiratime: Disables directory access time updates
- nobarrier: Removes write barriers (only if hardware has battery-backed cache)
- Transparent Huge Pages: Disable by adding transparent_hugepage=never to kernel boot parameters.
- Resource limits: Raise nofile (open files) to 65535+, nproc (processes) to 16384+, and memlock to unlimited for databases that lock memory.
- NUMA: Set kernel.numa_balancing=0 and let the database manage NUMA affinity.
- I/O scheduler: Use the noop or none scheduler for SSDs/NVMe (no reordering needed); HDDs benefit from the deadline scheduler.

While swap provides a safety net, allowing a database to swap is almost always unacceptable in production. A swapping database becomes thousands of times slower. Configure vm.swappiness=1 (not 0, which can cause OOM issues), monitor swap usage aggressively, and ensure RAM is sized to avoid any swap activity.
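A provisioning script can assert these settings before the database ever starts. The sketch below is a minimal, hedged example: the RECOMMENDED values mirror the ones in this section, and audit_sysctl is an illustrative helper (a real script would read the values from /proc/sys or the sysctl command):

```python
# Recommended kernel settings, taken from the examples in this section
RECOMMENDED = {
    "vm.swappiness": 1,
    "net.core.somaxconn": 4096,
    "fs.file-max": 6815744,
}

def audit_sysctl(actual: dict) -> list:
    """Return (key, actual, recommended) tuples for values that deviate."""
    return [(key, actual.get(key), want)
            for key, want in RECOMMENDED.items()
            if actual.get(key) != want]

# Example: a host still running the default swappiness of 60
issues = audit_sysctl({"vm.swappiness": 60,
                       "net.core.somaxconn": 4096,
                       "fs.file-max": 6815744})
# -> [("vm.swappiness", 60, 1)]
```

Running such a check in configuration management catches drift before it causes a 2 a.m. incident.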
With infrastructure prepared, the actual database software installation can proceed. While installation specifics vary by database vendor, common principles apply universally.
Installation Methods:
Modern databases offer multiple installation approaches:
| Method | Advantages | Disadvantages | Best For |
|---|---|---|---|
| Package Manager (apt, yum) | Automated dependencies, easy updates, OS integration | May not have latest versions, limited customization | Standard deployments, development |
| Official Binaries | Latest versions, full control, vendor support | Manual dependency management, manual updates | Production with specific version requirements |
| Source Compilation | Maximum control, optimization for hardware | Complex, time-consuming, expertise required | Custom builds, embedded systems |
| Container (Docker) | Isolation, reproducibility, easy deployment | Orchestration complexity, persistence management | Microservices, development, Kubernetes |
| Cloud-Managed (RDS, Cloud SQL) | Automated management, built-in HA | Less control, vendor lock-in, cost | Teams without dedicated DBAs |
Directory Structure Planning:
A well-organized directory structure aids maintenance and performance:
/opt/database/ # Software binaries
├── current -> v15.3/ # Symlink to active version
├── v15.2/ # Previous version (for rollback)
└── v15.3/ # Current version
/data/database/ # Primary data files
├── base/ # Base tablespace
├── global/ # Cluster-wide tables
└── pg_tblspc/ # Additional tablespaces
/logs/database/ # Transaction logs (separate physical disk)
├── pg_wal/ # Write-ahead logs
└── archive/ # Archived WAL segments
/backup/database/ # Local backup staging
└── daily/ # Backup retention
For optimal performance, place data files, transaction logs, and temp space on separate physical devices. Transaction log writes are sequential and latency-critical; keeping them on dedicated storage prevents interference from random data I/O. This separation can improve transaction throughput by 30-50%.
Service Account Configuration:
Databases should never run as root. Create a dedicated service account:
# Create database user and group
groupadd -r postgres
useradd -r -g postgres -d /data/postgres -s /bin/bash postgres
# Set ownership of directories
chown -R postgres:postgres /data/database
chown -R postgres:postgres /logs/database
chmod 700 /data/database # Restrict access
Security Considerations During Installation:
Database configuration transforms a software installation into a tuned system. Initial configuration should be based on workload characteristics, available resources, and operational requirements. While databases have hundreds of parameters, a subset drives the majority of performance impact.
Memory Configuration:
Memory allocation is the most impactful configuration category:
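Before looking at concrete parameters, the percentage guidelines in this section can be turned into starting values with simple arithmetic. This is a back-of-envelope sketch (the function name is illustrative, and the 25%/75% splits are the rules of thumb quoted here, not vendor mandates):

```python
def memory_plan(total_ram_gb: int) -> dict:
    """Derive starting points from the ~25% / ~75% guidelines."""
    return {
        "shared_buffers_gb": total_ram_gb // 4,            # ~25% of RAM
        "effective_cache_size_gb": total_ram_gb * 3 // 4,  # ~75% of RAM
    }

# A dedicated server with 128 GB of RAM:
memory_plan(128)
# -> {"shared_buffers_gb": 32, "effective_cache_size_gb": 96}
```

These match the 32GB/96GB figures in the configuration example below for a 128 GB host; always validate against observed cache hit ratios after go-live.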
-- Buffer Pool / Shared Buffers (PostgreSQL example)
-- Aim for 25-40% of system RAM for dedicated database servers
shared_buffers = 32GB              -- 25% of 128GB RAM

-- Work Memory for query operations
work_mem = 256MB                   -- Memory per sort/hash operation
maintenance_work_mem = 2GB         -- Memory for maintenance tasks

-- Effective Cache Size (tells optimizer about OS cache)
effective_cache_size = 96GB        -- 75% of RAM

-- MySQL InnoDB equivalent
innodb_buffer_pool_size = 100GB    -- 70-80% of RAM for dedicated server
innodb_buffer_pool_instances = 16  -- At least 1GB per instance for large pools
Connection Configuration:
Connection handling directly affects concurrency capacity. Each connection consumes memory (including its own work_mem allocations), so the maximum connection count must be balanced against available RAM; for high-concurrency applications, an external connection pooler in front of the database keeps the server-side connection count low.
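A common sizing heuristic is to derive the connection limit from the application tier rather than guessing. The sketch below is illustrative (the function name, pool sizes, and admin reserve are assumptions, not any vendor's formula):

```python
def max_connections_estimate(app_servers: int, pool_size: int,
                             admin_reserve: int = 10) -> int:
    """Pooled application connections plus headroom for admin/monitoring."""
    return app_servers * pool_size + admin_reserve

# 8 application servers, each with a pool of 20 connections:
max_connections_estimate(app_servers=8, pool_size=20)  # -> 170
```

A limit derived this way stays small enough that worst-case memory use (roughly connections x work_mem per active sort) remains within the RAM budget.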
Write-Ahead Log (WAL) Configuration:
WAL/redo log configuration balances durability against performance:
# WAL Settings (PostgreSQL)
wal_level = replica            # minimal, replica, or logical
fsync = on                     # NEVER off in production
synchronous_commit = on        # Set off only to trade durability for speed (carefully)
wal_buffers = 64MB             # WAL write buffer
checkpoint_timeout = 15min     # Time between checkpoints
max_wal_size = 4GB             # WAL retention before checkpoint
min_wal_size = 1GB             # Minimum WAL to keep

# Archive settings for point-in-time recovery
archive_mode = on
archive_command = 'cp %p /archive/%f'

Some documentation mentions disabling fsync for performance. This is catastrophic advice for production databases. Without fsync, data that appears committed may not actually be on disk. A power failure or crash can corrupt the entire database. The performance gain is not worth risking data integrity.
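Checkpoint settings interact: a checkpoint fires when either checkpoint_timeout elapses or max_wal_size fills, whichever comes first. The toy model below shows which limit wins at a given WAL write rate (a simplification; real PostgreSQL also spreads checkpoint I/O via checkpoint_completion_target):

```python
def checkpoint_trigger(max_wal_size_gb: float, wal_mb_per_s: float,
                       checkpoint_timeout_min: float) -> tuple:
    """Which limit forces the next checkpoint under a steady WAL write rate?"""
    fill_minutes = max_wal_size_gb * 1024 / wal_mb_per_s / 60
    if fill_minutes < checkpoint_timeout_min:
        return ("max_wal_size", fill_minutes)
    return ("checkpoint_timeout", checkpoint_timeout_min)

# 4 GB max_wal_size, sustained 10 MB/s of WAL, 15-minute timeout:
checkpoint_trigger(4, 10, 15)
# WAL fills in ~6.8 minutes, so max_wal_size triggers before the timeout
```

If size-triggered checkpoints dominate under normal load, raising max_wal_size reduces checkpoint frequency at the cost of longer crash recovery.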
Query Optimizer Configuration:
The query optimizer makes execution decisions based on configured parameters and statistics:
-- PostgreSQL optimizer settings
random_page_cost = 1.1 -- Low for SSDs (default 4 is for HDDs)
seq_page_cost = 1.0 -- Sequential page cost baseline
effective_io_concurrency = 200 -- Concurrent I/O requests for SSDs
max_parallel_workers = 8 -- Workers for parallel queries
These values must be tuned to match actual hardware characteristics. Default values often assume slow hardware that no longer represents modern systems.
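Why does lowering random_page_cost matter? The planner compares estimated costs of alternative plans, and the random/sequential cost ratio decides when an index scan beats a full scan. The toy model below is a deliberate simplification (it is not PostgreSQL's actual cost formula, and the one-random-page-per-row assumption is illustrative):

```python
# Toy cost model: compare a sequential scan against an index scan
def seq_scan_cost(pages: int, seq_page_cost: float = 1.0) -> float:
    return pages * seq_page_cost

def index_scan_cost(rows_fetched: int, random_page_cost: float) -> float:
    # Simplifying assumption: roughly one random page read per fetched row
    return rows_fetched * random_page_cost

pages, rows = 10_000, 3_000
seq_scan_cost(pages)                       # 10000
index_scan_cost(rows, random_page_cost=4.0)  # 12000 -> seq scan looks cheaper
index_scan_cost(rows, random_page_cost=1.1)  # ~3300 -> index scan wins on SSD
```

With the HDD-era default of 4.0, the planner avoids index scans that would actually be fast on SSDs; setting 1.1 lets it choose them.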
Network configuration determines who and what can access the database, while security configuration protects both the data and the system from unauthorized access. These go hand-in-hand as critical post-installation tasks.
Network Binding:
Databases default to conservative network settings. Production deployment requires explicit configuration:
-- Listen on specific interfaces (never bind to 0.0.0.0 without firewall)
listen_addresses = 'localhost, 10.0.1.50' -- Specific IPs only
port = 5432 -- Default PostgreSQL port
Firewall Configuration:
Database ports should be protected by firewall rules allowing only authorized sources:
# iptables example - restrict database port to application servers
iptables -A INPUT -p tcp --dport 5432 -s 10.0.2.0/24 -j ACCEPT  # App subnet
iptables -A INPUT -p tcp --dport 5432 -s 10.0.3.0/24 -j ACCEPT  # Admin subnet
iptables -A INPUT -p tcp --dport 5432 -j DROP                   # Block all else

# firewalld example
firewall-cmd --zone=database --add-source=10.0.2.0/24 --permanent
firewall-cmd --zone=database --add-port=5432/tcp --permanent
Authentication Configuration:
Authentication controls how users prove their identity. PostgreSQL's pg_hba.conf exemplifies host-based authentication:
# TYPE  DATABASE     USER        ADDRESS          METHOD

# Local socket connections
local   all          postgres                     peer
local   all          all                          md5

# IPv4 local connections
host    all          all         127.0.0.1/32     scram-sha-256

# Application servers (specific subnet)
host    appdb        appuser     10.0.2.0/24      scram-sha-256

# Replication connections (standby servers)
host    replication  replicator  10.0.1.100/32    scram-sha-256
host    replication  replicator  10.0.1.101/32    scram-sha-256

# Deny everything else (implicit, but explicit for clarity)
host    all          all         0.0.0.0/0        reject

For encrypted client connections, also set ssl = on with proper certificate configuration.

Databases should NEVER be directly exposed to the public internet. Within minutes of exposure, automated scanners will find and attack the database. Use VPNs, bastion hosts, or private networks. If external access is required, use connection proxies with additional authentication layers.
Production databases require high availability (HA) to minimize downtime. HA configuration during initial setup is significantly easier than retrofitting later. The goal is eliminating single points of failure while maintaining data consistency.
Replication Topologies:
Different HA requirements demand different topologies:
| Topology | Description | RTO | RPO | Complexity |
|---|---|---|---|---|
| Primary-Standby | One primary, one or more standbys receiving replicated changes | Minutes | Seconds | Low |
| Primary-Synchronous Standby | Standby confirms receipt before primary commits | Seconds | Zero | Medium |
| Multi-Primary | Multiple nodes accept writes (conflict resolution required) | Seconds | Near-zero | High |
| Shared Storage | Multiple nodes share storage (only one active) | Seconds | Zero | Medium |
| Distributed (Consensus) | Multiple nodes vote on transactions (Raft/Paxos) | Seconds | Zero | Very High |
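The RTO column translates directly into an availability budget: every failover event consumes minutes from the year's allowed downtime, so the chosen topology must keep total recovery time inside the target. Converting an availability percentage into a downtime budget is plain arithmetic (365-day year; the function name is illustrative):

```python
def yearly_downtime_minutes(availability_pct: float) -> float:
    """Downtime budget implied by an availability target (365-day year)."""
    return 365 * 24 * 60 * (1 - availability_pct / 100)

for target in (99.9, 99.99, 99.999):
    print(target, round(yearly_downtime_minutes(target), 1))
# 99.9%  -> ~525.6 min/yr
# 99.99% -> ~52.6 min/yr
# 99.999% -> ~5.3 min/yr
```

A primary-standby topology with minutes-long RTO can exhaust a 99.99% budget in one or two incidents, which is why stricter targets push toward synchronous or consensus-based designs.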
Streaming Replication Configuration:
For PostgreSQL-style streaming replication, the primary server configuration:
# Primary server replication settings
wal_level = replica                      # Enable WAL for replication
max_wal_senders = 5                      # Maximum replication connections
wal_keep_size = 1GB                      # WAL retained for slow standbys
max_replication_slots = 5                # Replication slot limit
hot_standby = on                         # Allow read queries on standby

# Synchronous replication (for zero data loss)
synchronous_standby_names = 'standby1'   # At least one must confirm
synchronous_commit = on                  # Wait for standby acknowledgment
Standby Server Setup:
Initializing a standby requires a base backup from primary:
# On standby server
pg_basebackup -h primary-host -D /data/postgresql -U replicator -P -R
# The -R flag creates standby.signal and configures connection
Automatic Failover:
Manual failover is error-prone and slow. Production systems should have automated failover: a cluster manager continuously monitors node health, promotes a standby when the primary fails, and redirects client connections to the new primary.
A failover mechanism that has never been tested is not a failover mechanism—it's wishful thinking. Regularly (monthly at minimum) perform failover drills. Verify that applications reconnect, data is consistent, and the former primary can be rejoined as standby.
Installation and configuration are not complete until the system has been validated and thoroughly documented. This final phase ensures the database operates correctly and that future administrators can understand and maintain the configuration.
Configuration Validation Checklist:
Documentation Requirements:
Comprehensive documentation enables future maintenance and troubleshooting:
| Document | Contents | Update Frequency |
|---|---|---|
| Installation Guide | Step-by-step installation procedure, specific to this environment | On version upgrade |
| Configuration Reference | All non-default parameters with rationale for each setting | On configuration change |
| Network Diagram | Database servers, application servers, backup systems, network zones | On infrastructure change |
| Runbook | Procedures for common operations: restarts, failovers, emergency access | Quarterly review |
| Recovery Playbook | Step-by-step disaster recovery procedures with RTO/RPO verification | Quarterly test |
| Credential Inventory | All accounts with purpose and owners (not passwords!) | On user changes |
| Change Log | History of all configuration changes with dates and reasons | Continuous |
Modern infrastructure practices favor 'configuration as code'—storing database configuration in version control alongside deployment automation. Tools like Ansible, Terraform, and Puppet can provision and configure databases reproducibly. This approach enables infrastructure recovery, environment parity, and audit trails.
Baseline Metrics:
Capturing baseline performance metrics immediately after installation provides crucial reference points: typical query latency, transaction throughput, cache hit ratios, checkpoint duration, and I/O utilization under a known workload.
These baselines become invaluable when diagnosing future performance issues. "It's slower than before" is only meaningful if you know what "before" looked like.
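A baseline is only useful if you compare against it mechanically. The sketch below is an illustrative helper (function name, metric names, and the 20% tolerance are assumptions); it flags "higher is worse" metrics that have degraded beyond a tolerance relative to the stored baseline:

```python
def regressions(baseline: dict, current: dict, tolerance: float = 0.20) -> dict:
    """Flag higher-is-worse metrics that degraded more than `tolerance`."""
    flagged = {}
    for metric, base in baseline.items():
        now = current.get(metric)
        if now is not None and base > 0 and (now - base) / base > tolerance:
            flagged[metric] = (base, now)
    return flagged

baseline = {"p99_latency_ms": 12.0, "checkpoint_duration_s": 30.0}
current  = {"p99_latency_ms": 19.5, "checkpoint_duration_s": 31.0}
regressions(baseline, current)
# -> {"p99_latency_ms": (12.0, 19.5)}   (about 62% slower than baseline)
```

Storing the baseline alongside the configuration in version control turns "it feels slower" into a measurable, reviewable fact.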
Database installation and configuration form the foundation upon which all subsequent DBA responsibilities build. A properly configured database enables performance, security, and reliability; a poorly configured one guarantees problems.
Key Takeaways:
What's Next:
With the database properly installed and configured, the next essential DBA responsibility is Performance Monitoring—the continuous observation and analysis that ensures the database continues to operate optimally as workloads evolve.
You now understand the comprehensive process of database installation and configuration at an enterprise level. These foundational skills ensure that databases start their operational lives properly configured for performance, security, and reliability. The next page explores performance monitoring—how we ensure databases maintain optimal operation over time.