Loading content...
Understanding database metrics is only half the battle. Collecting, storing, visualizing, and acting upon those metrics requires specialized tooling—a monitoring stack that operates continuously, scales with your infrastructure, and remains reliable precisely when things break.
Database monitoring has evolved dramatically. Early DBAs relied on command-line queries run manually or via cron scripts. Today's production environments demand real-time metric collection, historical trend analysis, intelligent alerting, and integration with incident management systems. The tools you choose fundamentally shape your ability to maintain database health.
This page surveys the monitoring tool landscape—from native database utilities every DBA must master, through specialized database monitors, to comprehensive observability platforms that unify metrics across entire infrastructures.
By the end of this page, you will understand native monitoring capabilities in major database systems, evaluate specialized database monitoring solutions, architect comprehensive monitoring stacks using open-source and commercial tools, and implement best practices for metric collection and retention.
Every major database system includes built-in monitoring facilities. These native tools provide the deepest visibility into database internals, serving as the foundation for all other monitoring.
PostgreSQL Monitoring:
PostgreSQL offers extensive monitoring through system catalogs and extensions:
12345678910111213141516171819202122232425
-- Active session monitoringSELECT pid, usename, datname, state, query_start, now() - query_start AS duration, queryFROM pg_stat_activityWHERE state = 'active' AND query NOT LIKE '%pg_stat_activity%'; -- Table I/O statisticsSELECT schemaname, relname, heap_blks_read, heap_blks_hit, ROUND(100.0 * heap_blks_hit / NULLIF(heap_blks_hit + heap_blks_read, 0), 2) AS hit_ratioFROM pg_statio_user_tablesORDER BY heap_blks_read DESC LIMIT 10; -- Index usage analysisSELECT schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetchFROM pg_stat_user_indexesORDER BY idx_scan DESC; -- Query statistics (requires pg_stat_statements extension)SELECT query, calls, total_exec_time, mean_exec_time, rows, shared_blks_hit, shared_blks_readFROM pg_stat_statementsORDER BY total_exec_time DESC LIMIT 10;| View | Purpose | Key Metrics |
|---|---|---|
| pg_stat_activity | Current session state | Active queries, wait events, connection state |
| pg_stat_database | Per-database statistics | Transaction counts, block reads, cache hits |
| pg_stat_user_tables | Table access statistics | Sequential scans, index scans, row operations |
| pg_stat_user_indexes | Index usage statistics | Index scans, tuples read/fetched |
| pg_stat_bgwriter | Background writer activity | Buffers written, checkpoints |
| pg_stat_replication | Replication status | Replica lag, write/flush/replay LSN |
MySQL / MariaDB Monitoring:
MySQL provides monitoring through SHOW commands and the Performance Schema:
123456789101112131415161718192021222324252627282930313233343536
-- Global status variablesSHOW GLOBAL STATUS WHERE Variable_name IN ( 'Queries', 'Threads_connected', 'Threads_running', 'Innodb_buffer_pool_read_requests', 'Innodb_buffer_pool_reads', 'Innodb_row_lock_waits', 'Innodb_row_lock_time'); -- Process list with query textSELECT id, user, host, db, command, time, state, infoFROM information_schema.processlistWHERE command != 'Sleep'; -- InnoDB buffer pool hit ratioSELECT (1 - (Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests)) * 100 AS buffer_pool_hit_ratioFROM ( SELECT Variable_value AS Innodb_buffer_pool_reads FROM performance_schema.global_status WHERE Variable_name = 'Innodb_buffer_pool_reads') reads,( SELECT Variable_value AS Innodb_buffer_pool_read_requests FROM performance_schema.global_status WHERE Variable_name = 'Innodb_buffer_pool_read_requests') requests; -- Slow query digest from Performance SchemaSELECT DIGEST_TEXT, COUNT_STAR, SUM_TIMER_WAIT/1000000000000 AS total_time_sec, AVG_TIMER_WAIT/1000000000 AS avg_time_ms, SUM_ROWS_EXAMINED, SUM_ROWS_SENTFROM performance_schema.events_statements_summary_by_digestORDER BY SUM_TIMER_WAIT DESC LIMIT 10;SQL Server Monitoring:
SQL Server includes Dynamic Management Views (DMVs) and Extended Events:
12345678910111213141516171819202122232425262728293031323334353637
-- Currently executing queriesSELECT r.session_id, r.status, r.command, r.cpu_time, r.total_elapsed_time, r.logical_reads, r.writes, t.text AS query_textFROM sys.dm_exec_requests rCROSS APPLY sys.dm_exec_sql_text(r.sql_handle) tWHERE r.session_id > 50; -- Wait statisticsSELECT wait_type, waiting_tasks_count, wait_time_ms, max_wait_time_ms, signal_wait_time_msFROM sys.dm_os_wait_statsWHERE waiting_tasks_count > 0ORDER BY wait_time_ms DESC; -- Buffer pool usage by databaseSELECT DB_NAME(database_id) AS database_name, COUNT(*) * 8 / 1024 AS buffer_pool_mbFROM sys.dm_os_buffer_descriptorsGROUP BY database_idORDER BY buffer_pool_mb DESC; -- Top queries by CPUSELECT TOP 10 qs.total_worker_time/1000000 AS total_cpu_sec, qs.execution_count, qs.total_worker_time/qs.execution_count/1000 AS avg_cpu_ms, SUBSTRING(st.text, (qs.statement_start_offset/2)+1, ((CASE qs.statement_end_offset WHEN -1 THEN DATALENGTH(st.text) ELSE qs.statement_end_offset END - qs.statement_start_offset)/2)+1) AS query_textFROM sys.dm_exec_query_stats qsCROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) stORDER BY total_cpu_sec DESC;Third-party monitoring tools ultimately query these native interfaces. Understanding the underlying system views lets you debug monitoring issues, write custom queries for specific investigations, and validate what monitoring tools report.
Beyond SQL queries, databases and operating systems provide command-line utilities for real-time monitoring. These tools are invaluable for interactive troubleshooting and quick health checks.
Database-Specific CLI Tools:
| Database | Tool | Purpose |
|---|---|---|
| PostgreSQL | pg_top | Real-time query activity viewer (top-like interface) |
| PostgreSQL | pgbadger | Log analyzer generating HTML reports |
| PostgreSQL | pg_stat_monitor | Advanced query statistics extension |
| MySQL | mysqladmin | Server status and process management |
| MySQL | mytop | Real-time query monitoring (top-like) |
| MySQL | pt-query-digest | Slow query log analyzer from Percona Toolkit |
| SQL Server | sqlcmd | Command-line query execution |
| Oracle | SQL*Plus | Interactive SQL and PL/SQL execution |
| Oracle | adrci | Automatic Diagnostic Repository CLI |
Operating System Monitoring:
Database performance depends on OS resources. These tools monitor the host system:
1234567891011121314151617181920212223242526
# CPU, memory, and process overviewtop -c # Interactive process viewerhtop # Enhanced interactive viewer (if installed) # Virtual memory statisticsvmstat 1 # Per-second memory/CPU stats # Disk I/O monitoringiostat -xz 1 # Extended disk statistics per secondiotop # Per-process I/O activity # Network monitoringnetstat -an # All connectionsss -s # Socket statistics summaryiftop # Per-connection bandwidth # Memory usage breakdownfree -h # Human-readable memory summarycat /proc/meminfo # Detailed memory information # File system usagedf -h # Disk space by mount pointdu -sh /var/lib/postgresql # Database directory size # Database-specific file activitylsof -p $(pgrep postgres) # Files open by PostgreSQLIf your database server is using swap, performance is already severely degraded. Databases rely on predictable memory access patterns; swapping destroys this. Monitor swap usage and investigate immediately if any swap is active.
While native tools provide raw data, dedicated database monitoring solutions add critical capabilities: historical storage, trend analysis, intelligent alerting, and correlation across instances.
Open-Source Database Monitors:
| Tool | Target Databases | Key Features |
|---|---|---|
| Prometheus + PostgreSQL Exporter | PostgreSQL | Time-series metrics, PromQL queries, Grafana integration |
| Prometheus + MySQL Exporter | MySQL/MariaDB | Comprehensive MySQL metrics, Performance Schema integration |
| PMM (Percona Monitoring) | MySQL, PostgreSQL, MongoDB | Full-stack monitoring, query analytics, free and open-source |
| pgwatch2 | PostgreSQL | Metrics collection, pre-built dashboards, alerting |
| pg_monitor | PostgreSQL | Extension providing standardized metric views |
| VividCortex (SolarWinds DPM) | Multi-database | Query-level insights, ML-driven analysis |
Percona Monitoring and Management (PMM):
PMM deserves special attention as a comprehensive, free solution:
Commercial Database Monitoring:
Enterprise environments often require commercial solutions with support contracts:
| Solution | Strengths | Best For |
|---|---|---|
| Datadog Database Monitoring | Deep integrations, APM correlation, cloud-native | Multi-cloud, microservices environments |
| New Relic Database | Entity mapping, distributed tracing, AI analysis | Large-scale production with APM needs |
| SolarWinds DPA | Wait-time analysis, query tuning advisors | SQL Server and Oracle environments |
| Redgate SQL Monitor | SQL Server focus, detailed wait analysis | Microsoft SQL Server shops |
| Oracle Enterprise Manager | Deep Oracle integration, lifecycle management | Oracle database environments |
| SentryOne (now SolarWinds) | Query Plan analysis, workload comparison | SQL Server performance optimization |
If you're building a monitoring stack from scratch, Percona Monitoring and Management provides exceptional value. It's free, well-documented, actively maintained, and covers the vast majority of monitoring needs for MySQL and PostgreSQL.
Prometheus has emerged as the de facto standard for time-series metric collection in cloud-native environments. Understanding its architecture is essential for modern database monitoring.
Prometheus Architecture:
Database Exporters Configuration:
PostgreSQL exporter setup example:
12345678910111213141516171819202122232425262728293031323334
# Docker Compose setup for PostgreSQL monitoringversion: '3.8'services: postgres_exporter: image: prometheuscommunity/postgres-exporter environment: DATA_SOURCE_NAME: "postgresql://monitor:password@postgres:5432/postgres?sslmode=disable" ports: - "9187:9187" prometheus: image: prom/prometheus volumes: - ./prometheus.yml:/etc/prometheus/prometheus.yml ports: - "9090:9090" command: - '--config.file=/etc/prometheus/prometheus.yml' - '--storage.tsdb.retention.time=30d' ---# prometheus.ymlglobal: scrape_interval: 15s evaluation_interval: 15s scrape_configs: - job_name: 'postgresql' static_configs: - targets: ['postgres_exporter:9187'] relabel_configs: - source_labels: [__address__] target_label: instance replacement: 'prod-db-1'PromQL for Database Metrics:
PromQL (Prometheus Query Language) enables powerful metric analysis:
12345678910111213141516171819202122232425
# Current active connectionspg_stat_activity_count{state="active"} # Query rate over 5 minutesrate(pg_stat_database_xact_commit[5m]) # Buffer cache hit ratiosum(pg_stat_database_blks_hit) / (sum(pg_stat_database_blks_hit) + sum(pg_stat_database_blks_read)) * 100 # 95th percentile query latency (histogram)histogram_quantile(0.95, sum(rate(pg_query_duration_seconds_bucket[5m])) by (le)) # Connections approaching limit (>80%)pg_stat_activity_count / pg_settings_max_connections * 100 > 80 # Disk space growth rate (MB/hour)deriv(pg_database_size_bytes[1h]) / 1024 / 1024 # Replication lag in secondspg_replication_lag_seconds > 30 # Transaction rollback ratiorate(pg_stat_database_xact_rollback[5m]) / (rate(pg_stat_database_xact_commit[5m]) + rate(pg_stat_database_xact_rollback[5m])) * 100Prometheus stores each unique label combination as a separate time series. High-cardinality labels (like individual query texts or session IDs) can explode storage requirements. Use normalized query fingerprints, not raw SQL, when labeling query metrics.
While metrics provide numerical measurements, logs capture events, errors, and detailed query information. A complete monitoring strategy integrates both.
Database Log Types:
| Log Type | Content | Monitoring Value |
|---|---|---|
| Error Log | Server errors, startup/shutdown, crashes | Immediate incident detection |
| Slow Query Log | Queries exceeding time threshold | Performance optimization candidates |
| General Query Log | All executed queries (high overhead) | Debugging, auditing (not production) |
| Audit Log | Security-relevant events, access patterns | Compliance, security monitoring |
| Transaction Log | All database modifications | Recovery, replication (not for monitoring) |
Configuring Slow Query Logging:
12345678910111213141516171819
# postgresql.conflog_destination = 'stderr'logging_collector = onlog_directory = 'log'log_filename = 'postgresql-%Y-%m-%d.log' # Log slow querieslog_min_duration_statement = 1000 # Log queries > 1 second # Optional: Log all statements (high overhead)# log_statement = 'all' # Include useful contextlog_line_prefix = '%t [%p]: db=%d,user=%u,app=%a,client=%h 'log_checkpoints = onlog_connections = onlog_disconnections = onlog_lock_waits = onlog_temp_files = 0 # Log all temp file usageLog Aggregation Architecture:
Centralized log management is essential for multi-server environments:
If you're already using Prometheus + Grafana, consider Grafana Loki for logs. It uses the same label-based approach as Prometheus, stores logs efficiently without full indexing, and integrates seamlessly into existing Grafana dashboards alongside your metrics.
Cloud database services include built-in monitoring that integrates with the cloud provider's observability ecosystem. Understanding these tools is essential for cloud deployments.
AWS RDS Monitoring:
Azure SQL Database Monitoring:
Google Cloud SQL Monitoring:
| Feature | AWS RDS | Azure SQL | Cloud SQL |
|---|---|---|---|
| Wait Analysis | Performance Insights | Query Store | Query Insights |
| Index Recommendations | Limited | Automatic Tuning | Limited |
| OS-Level Metrics | Enhanced Monitoring | Built-in | Built-in |
| Query Normalization | ✅ | ✅ | ✅ |
| Historical Retention | 7 days (free) | 30 days | 7 days |
Cloud-provided monitoring is convenient but often limited in retention, customization, and cross-cloud visibility. Most production environments supplement cloud monitoring with their own observability stack (Prometheus, Datadog, etc.) for consistent monitoring across all environments.
A production monitoring stack requires thoughtful architecture. Consider collection frequency, storage requirements, visualization needs, and operational overhead.
Reference Architecture:
Key Architectural Decisions:
| Decision | Considerations | Recommendation |
|---|---|---|
| Scrape Interval | Granularity vs storage cost; 15s is standard | 15s for metrics; 1s for troubleshooting only |
| Retention Period | Disk space vs historical analysis needs | 30-90 days raw; 2 years downsampled |
| High Availability | Monitoring must survive when databases fail | Separate infrastructure; multiple Prometheus replicas |
| Remote Storage | Long-term storage and global queries | Thanos, Cortex, or Mimir for Prometheus |
| Security | Credential management, network isolation | Exporters inside network; encrypted transport |
Your monitoring system is itself critical infrastructure. If Prometheus goes down, you lose visibility during an incident. Implement health checks on monitoring components, keep them on separate infrastructure from monitored databases, and have fallback procedures for when monitoring fails.
We've surveyed the monitoring tool landscape from native utilities to enterprise platforms. The right choice depends on your scale, budget, and existing infrastructure.
What's Next:
With metrics collected and dashboards visualized, the next page addresses Alert Configuration—how to translate monitoring data into actionable notifications that wake you up for real problems while letting you sleep through noise.
You now understand the database monitoring tool landscape—from native database utilities to cloud-managed solutions to comprehensive observability platforms. Next, we'll focus on configuring alerts that turn this data into timely, actionable notifications.