Every second, billions of devices around the world ask the same question: what time is it? The answer comes from NTP—the Network Time Protocol—one of the oldest and most ubiquitous protocols on the internet, yet one that most engineers rarely think about.
NTP has been quietly running since 1985, designed by David Mills at the University of Delaware. It has evolved through four major versions while maintaining backward compatibility. Today, NTP synchronizes everything from smartphones to stock exchanges, from data centers to scientific instruments.
Understanding NTP is essential for any distributed systems engineer. When your distributed database depends on timestamps, when your logs need to be correlated across servers, when your security certificates need to be validated—NTP is the invisible foundation making it all work.
This page explores NTP in depth: its architecture, algorithms, accuracy bounds, failure modes, and best practices for production systems.
By the end of this page, you will understand: NTP's hierarchical stratum architecture; the clock discipline algorithm that smoothly adjusts local clocks; how to configure and monitor NTP in production; the accuracy you can expect in different environments; common failure modes and their mitigations; and when to consider alternatives like PTP.
NTP organizes time sources into a hierarchical structure called the stratum hierarchy. This design provides scalability, redundancy, and graceful degradation.
Stratum Levels:
Stratum 0: Reference clocks—the authoritative time sources. These are devices like atomic clocks, GPS receivers, or radio receivers tuned to time standards (WWV, DCF77). Stratum 0 devices are not directly networked; they connect to stratum 1 servers.
Stratum 1: Primary time servers directly connected to stratum 0 references. These are the root of the NTP hierarchy on the network. Examples include servers at national time institutes (NIST, PTB) and major internet companies.
Stratum 2: Servers synchronized to stratum 1. These often serve as the primary time source for organizations, reducing load on public stratum 1 servers.
Stratum 3-15: Each level synchronized to the level above. Stratum increases by 1 for each hop from stratum 1.
Stratum 16: Unsynchronized. A server advertising stratum 16 is saying "don't trust my time."
Why Stratum Hierarchy?
Scalability: Billions of devices can't all query a handful of atomic clocks. The hierarchy distributes load.
Redundancy: Each client can have multiple servers. If one fails or becomes inaccurate, others remain.
Locality: Organizations run local stratum 2-3 servers, reducing network latency and external dependencies.
| Stratum | Typical Examples | Accuracy to UTC | Use Case |
|---|---|---|---|
| 0 | Cesium clock, GPS receiver, WWVB receiver | ±1 nanosecond to ±1 microsecond | Reference standard |
| 1 | time.nist.gov, ptbtime1.ptb.de | ±10 microseconds | National labs, major providers |
| 2 | pool.ntp.org servers, Google time servers | ±1-10 milliseconds | Internet time infrastructure |
| 3 | Corporate NTP servers | ±10-100 milliseconds | Enterprise networks |
| 4-5 | Desktop computers, mobile devices | ±100-500 milliseconds | End-user devices |
NTP Pool Project:
The NTP Pool (pool.ntp.org) is a volunteer network of thousands of NTP servers. When you configure a device to use pool.ntp.org, DNS returns different server IPs based on your region, distributing load globally.
The pool provides zone-specific addresses:
- pool.ntp.org - Global
- north-america.pool.ntp.org - Regional
- us.pool.ntp.org - Country-specific

This geo-distribution reduces latency and improves synchronization quality.
Reference Clock Drivers:
Stratum 1 servers need drivers for their reference clocks. Common reference clock types include GPS/GNSS receivers (usually paired with a PPS pulse-per-second signal for sub-microsecond accuracy), longwave radio receivers (WWVB, DCF77, MSF), and directly attached atomic clocks (rubidium or cesium standards).
A common misconception: stratum doesn't directly indicate accuracy. A well-configured stratum 3 server with a low-latency connection might be more accurate than a poorly-configured stratum 2. NTP uses additional metrics (delay, dispersion, jitter) to select the best source.
NTP operates over UDP port 123. The protocol exchanges timestamps between client and server, using the round-trip to estimate and compensate for network delay.
NTP Packet Format (Simplified):
+-----+-----+------+-------+------+------+
| LI  | VN  | Mode | Strat | Poll | Prec |
| (2) | (3) | (3)  |  (8)  | (8)  | (8)  |
+-----+-----+------+-------+------+------+
|            Root Delay (32)             |
+----------------------------------------+
|          Root Dispersion (32)          |
+----------------------------------------+
|       Reference Identifier (32)        |
+----------------------------------------+
|       Reference Timestamp (64)         |
+----------------------------------------+
|         Origin Timestamp (64)          |
+----------------------------------------+
|         Receive Timestamp (64)         |
+----------------------------------------+
|        Transmit Timestamp (64)         |
+----------------------------------------+
Key Fields:
- LI (Leap Indicator): warns of an impending leap second.
- VN (Version Number): protocol version, currently 4.
- Mode: 3 = client, 4 = server, among others.
- Stratum: the sender's stratum level (1-15; 16 means unsynchronized).
- Poll and Precision: log2 of the poll interval and of the clock precision, in seconds.
- Root Delay / Root Dispersion: accumulated delay and error estimate back to the stratum 1 source.
- The four timestamps (Reference, Origin, Receive, Transmit) drive the offset calculation.
// NTP Timestamp Exchange and Offset Calculation
//
// NTP uses four timestamps:
// T1: Client transmit time (origin timestamp)
// T2: Server receive time
// T3: Server transmit time
// T4: Client receive time

CLIENT ALGORITHM:

FUNCTION synchronize_with(server):
    // Record local time when sending request
    T1 = local_clock()

    // Send NTP request (mode=3, client)
    request = create_ntp_packet(mode=CLIENT, transmit_timestamp=T1)
    send(request, server)

    // Receive response
    response, T4 = receive()   // T4 is local time at receipt

    // Extract server timestamps
    T2 = response.receive_timestamp    // When server received our request
    T3 = response.transmit_timestamp   // When server sent response

    // Calculate round-trip delay
    // Total time = (T4 - T1), but server processing took (T3 - T2)
    delay = (T4 - T1) - (T3 - T2)

    // Calculate clock offset
    // Server was at T2 when we were at T1, and at T3 when we were at T4
    // Assuming symmetric network delay: each direction took delay/2
    // Offset = server_time - client_time
    offset = ((T2 - T1) + (T3 - T4)) / 2

    RETURN (offset, delay)

// Why this formula works:
// Let θ = true offset (server ahead by θ)
// Let δ₁ = client→server delay, δ₂ = server→client delay
//
// T2 = T1 + δ₁ + θ   (server receive = client send + delay + offset)
// T4 = T3 + δ₂ - θ   (client receive = server send + delay - offset)
//
// From first equation:  T2 - T1 = δ₁ + θ
// From second equation: T4 - T3 = δ₂ - θ
//
// Adding: (T2-T1) + (T4-T3) = δ₁ + δ₂ = total network delay
// But (T4-T3) = -(T3-T4), so:
// (T2-T1) - (T3-T4) = δ₁ + δ₂
//
// Subtracting: (T2-T1) - (T4-T3) = δ₁ - δ₂ + 2θ
// If δ₁ = δ₂ (symmetric delay): (T2-T1) - (T4-T3) = 2θ
// So θ = ((T2-T1) + (T3-T4)) / 2

SERVER ALGORITHM:

ON_RECEIVE request from client:
    T2 = local_clock()   // Record receive time

    // Process request, prepare response
    response = create_ntp_packet(
        mode = SERVER,
        stratum = my_stratum,
        origin_timestamp = request.transmit_timestamp,  // T1 from client
        receive_timestamp = T2,
        transmit_timestamp = local_clock()  // T3, as late as possible
    )
    send(response, client)

Error Bounds:
The offset calculation assumes symmetric network delay. If the delays are asymmetric (δ₁ ≠ δ₂), the error in our offset estimate is:
Error = (δ₁ - δ₂) / 2
This error is bounded by:
|Error| ≤ delay / 2
Where delay is the measured round-trip time minus server processing time. Lower delay means tighter error bounds. This is why NTP prefers servers with lower round-trip times.
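The four-timestamp calculation can be sketched in a few lines. This is a minimal illustration, not a full NTP client; the function name and the example timestamps are invented for the demonstration:

```python
# Minimal sketch of the four-timestamp offset/delay calculation.
# All timestamps are floats in seconds; T1-T4 follow the text above.

def ntp_offset_delay(t1, t2, t3, t4):
    """Return (offset, delay) from one NTP exchange.

    offset: estimated server_time - client_time
    delay:  round-trip network delay, server processing excluded
    """
    delay = (t4 - t1) - (t3 - t2)
    offset = ((t2 - t1) + (t3 - t4)) / 2
    return offset, delay

# Example: client clock is 0.100 s behind the server, the network adds
# 0.015 s each way, and the server takes 0.001 s to process.
t1 = 10.000                  # client sends
t2 = t1 + 0.015 + 0.100     # server receives (delay + offset)
t3 = t2 + 0.001             # server replies
t4 = t3 + 0.015 - 0.100     # client receives (delay - offset)

offset, delay = ntp_offset_delay(t1, t2, t3, t4)
# With perfectly symmetric delay the estimate is exact:
# offset ≈ 0.100, delay ≈ 0.030
```

With asymmetric delays, re-running this with unequal one-way latencies shows the offset error of (δ₁ - δ₂)/2 described above.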
NTP Modes:
Client/Server: Most common. Client queries server, server responds.
Symmetric Active/Passive: Peers synchronize bidirectionally. Used between stratum 1-2 servers.
Broadcast/Multicast: Server periodically broadcasts time. Lower accuracy but reduces server load.
Manycast: Client discovers servers via multicast; transitions to unicast client/server mode.
Polling Interval:
NTP adapts its polling interval based on synchronization quality, typically ranging from 64 seconds (2^6) while converging up to 1024 seconds (2^10) once the clock tracks its source stably.
Adaptive polling balances accuracy (frequent polling) against network/server load (infrequent polling).
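The adaptation rule can be sketched as follows. The exponent bounds match the typical 2^6-2^10 range; the stability thresholds are invented for illustration:

```python
# Sketch of poll-interval adaptation. NTP keeps the interval as a power
# of two, typically between 2^6 = 64 s and 2^10 = 1024 s, doubling when
# the clock is stable and halving when offsets grow. The threshold
# values here are illustrative, not from any real implementation.

MIN_POLL_EXP, MAX_POLL_EXP = 6, 10

def next_poll_exp(poll_exp: int, offset: float,
                  stable: float = 0.001, unstable: float = 0.010) -> int:
    """Return the next poll exponent given the latest offset (seconds)."""
    if abs(offset) < stable:       # tracking well: poll less often
        return min(poll_exp + 1, MAX_POLL_EXP)
    if abs(offset) > unstable:     # tracking poorly: poll faster
        return max(poll_exp - 1, MIN_POLL_EXP)
    return poll_exp

assert next_poll_exp(6, 0.0005) == 7     # stable: back off to 128 s
assert next_poll_exp(10, 0.0005) == 10   # capped at 1024 s
assert next_poll_exp(8, 0.050) == 7      # noisy: back down to 128 s
```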
NTP timestamps are 64-bit: 32 bits for seconds since 1900, 32 bits for fractions (resolution ~232 picoseconds). However, actual precision is limited by OS clock resolution (typically 1ms-1μs) and network jitter (milliseconds). The packet format precision far exceeds practical accuracy.
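Converting between the NTP and Unix epochs is a common source of bugs, since NTP counts from 1900 and Unix from 1970. A minimal sketch of the conversion (function names are illustrative):

```python
# Sketch: converting between the 64-bit NTP timestamp format and Unix
# time. NTP counts seconds from 1900-01-01; Unix from 1970-01-01. The
# difference is 2,208,988,800 seconds (70 years, including 17 leap days).

NTP_UNIX_DELTA = 2_208_988_800  # seconds between the 1900 and 1970 epochs

def unix_to_ntp(unix_seconds: float) -> tuple[int, int]:
    """Split a Unix time into NTP (seconds, fraction) 32-bit fields."""
    ntp = unix_seconds + NTP_UNIX_DELTA
    seconds = int(ntp)
    fraction = int((ntp - seconds) * 2**32)  # one unit ~ 233 picoseconds
    return seconds, fraction

def ntp_to_unix(seconds: int, fraction: int) -> float:
    """Rebuild a Unix time from the two NTP 32-bit fields."""
    return seconds + fraction / 2**32 - NTP_UNIX_DELTA

secs, frac = unix_to_ntp(0.5)            # Unix epoch + half a second
assert secs == NTP_UNIX_DELTA
assert ntp_to_unix(secs, frac) == 0.5    # 0.5 is exactly representable
```

Note the 32-bit seconds field rolls over in 2036; NTPv4 handles this with era numbers, which this sketch ignores.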
Raw offset measurements are noisy. A naive approach of jumping to each new offset would cause clock instability. NTP uses a sophisticated clock discipline algorithm that filters noise and smoothly adjusts the local clock.
The Challenge:
Each offset measurement (θ) has error from network jitter, asymmetric routing, queuing delays, OS scheduling latency, and timestamp quantization.
We need to extract the true offset from noisy samples while also tracking clock frequency drift.
NTP's Approach: A Phase-Locked Loop (PLL)
NTP treats clock synchronization as a control system problem. The local clock is a noisy oscillator that must track a reference (the NTP servers). A phase-locked loop adjusts both:
The PLL has a time constant that trades off response speed against noise rejection: a longer time constant averages away more measurement noise but reacts more slowly to genuine clock changes.
// Simplified NTP Clock Discipline Algorithm

// State variables
offset_estimate: float = 0.0      // Estimated offset (phase)
frequency_estimate: float = 0.0   // Estimated frequency error (ppm)
poll_interval: int = 64           // Current polling interval (seconds)

// PLL parameters (simplified)
TIME_CONSTANT: float = 4.0    // Larger = more stable, slower response
FREQUENCY_GAIN: float = 0.25  // How quickly to adjust frequency

FUNCTION on_measurement(offset, delay):
    // Filter: ignore samples with very high delay (likely asymmetric)
    IF delay > MAX_ACCEPTABLE_DELAY:
        RETURN  // Discard this sample

    // Update offset estimate (phase correction)
    // Apply a fraction of the error based on time constant
    phase_adjustment = offset / (TIME_CONSTANT * poll_interval)

    // Update frequency estimate
    // If we consistently see positive offset, our clock is slow
    frequency_adjustment = offset * FREQUENCY_GAIN / (poll_interval ^ 2)
    frequency_estimate += frequency_adjustment

    // Apply adjustments
    slew_clock(phase_adjustment)
    adjust_clock_frequency(frequency_estimate)

    // Adapt polling interval
    IF |offset| < STABLE_THRESHOLD:
        poll_interval = min(poll_interval * 2, MAX_POLL)
    ELIF |offset| > UNSTABLE_THRESHOLD:
        poll_interval = max(poll_interval / 2, MIN_POLL)

FUNCTION slew_clock(adjustment):
    // Gradually adjust clock rather than jumping
    // Typical slew rate: 500 ppm (0.5ms per second)
    IF adjustment > 0:
        // Clock is slow, speed it up temporarily
        temporarily_adjust_rate(+SLEW_RATE)
        duration = adjustment / SLEW_RATE
    ELSE:
        // Clock is fast, slow it down
        temporarily_adjust_rate(-SLEW_RATE)
        duration = -adjustment / SLEW_RATE
    SCHEDULE restore_normal_rate(after=duration)

// For large initial offsets, NTP may "step" the clock
FUNCTION handle_large_offset(offset):
    IF |offset| > STEP_THRESHOLD (typically 128ms):
        IF system_just_started:
            // Step is acceptable during startup
            instant_adjust_clock(offset)
        ELSE:
            // Large offset during normal operation is suspicious
            // Might step, might panic, depends on configuration
            IF |offset| > PANIC_THRESHOLD (typically 1000s):
                ABORT("Clock offset too large, manual intervention needed")
            ELSE:
                instant_adjust_clock(offset)  // Or refuse and log

Stepping vs. Slewing:
Stepping: Instantly change the clock time. Fast, but can cause time to jump backward, breaking applications that assume monotonic time: timers fire early or never, logs appear out of order, and build tools and databases can misbehave.
Slewing: Gradually adjust the clock rate. The clock runs slightly fast or slow until it catches up. Safer, but slow: at the typical 500 ppm slew rate, correcting a 1-second offset takes over half an hour.
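The slewing time is simple arithmetic worth internalizing. A quick sketch, using the 500 ppm slew rate mentioned above:

```python
# Back-of-envelope: wall-clock time needed to correct an offset by
# slewing. At 500 ppm the clock gains or loses 0.5 ms per second of
# real time, so corrections are slow.

def slew_duration(offset_s: float, slew_ppm: float = 500.0) -> float:
    """Seconds of wall time needed to slew away `offset_s` seconds."""
    return abs(offset_s) / (slew_ppm * 1e-6)

# The 128 ms step threshold takes about 4 minutes to slew away:
assert abs(slew_duration(0.128) - 256.0) < 1e-6
# A full second takes about 33 minutes:
assert abs(slew_duration(1.0) - 2000.0) < 1e-6
```

This is why daemons step rather than slew large offsets: slewing a 5-minute error at 500 ppm would take roughly a week.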
NTP's Behavior: By default, ntpd slews offsets under 128 ms and steps larger ones; offsets beyond the panic threshold (around 1000 s) cause it to exit and demand manual intervention.
Clock Filter Algorithm:
NTP doesn't use just the latest measurement. It maintains a buffer of the last 8 samples and selects the best one, primarily the sample with the lowest round-trip delay, since low delay implies the tightest error bound.
This filtering rejects outliers while maintaining a stable offset estimate.
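The core idea of the filter fits in a few lines. This sketch keeps only (offset, delay) pairs and picks the minimum-delay sample; the real algorithm also weighs sample age and dispersion, and the class name is invented:

```python
# Sketch of the clock-filter idea: keep the last 8 (offset, delay)
# samples and trust the one measured through the least-congested path,
# i.e. the sample with the lowest round-trip delay.

from collections import deque

class ClockFilter:
    def __init__(self, size: int = 8):
        self.samples = deque(maxlen=size)  # (offset, delay) pairs

    def add(self, offset: float, delay: float) -> None:
        self.samples.append((offset, delay))

    def best_offset(self) -> float:
        # Lowest delay gives the tightest bound (|error| <= delay/2)
        offset, _delay = min(self.samples, key=lambda s: s[1])
        return offset

f = ClockFilter()
for offset, delay in [(0.012, 0.080), (0.003, 0.020), (0.040, 0.150)]:
    f.add(offset, delay)
assert f.best_offset() == 0.003  # the low-delay sample wins
```

Notice how the 40 ms outlier, measured through a congested 150 ms path, is simply never consulted.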
Many NTP implementations have a "don't step" option (e.g., ntpd -x, which raises the step threshold from 128 ms to 600 s). This forces slewing for all but extreme offsets. Useful for systems where time jumps are unacceptable, but understand that large offsets will take hours to correct.
When configured with multiple time servers, NTP must select which servers to trust and how to combine their inputs. This is handled by the selection and clustering algorithms.
The Problem:
Given 5 servers with offsets [-10ms, +5ms, +8ms, +50ms, -200ms], which do we believe?
NTP needs to identify and exclude the outliers, then combine the trustworthy servers.
Selection Algorithm (Intersection Algorithm):
Each server's measurement defines a correctness interval: [offset - error bound, offset + error bound]. The intersection algorithm (a variant of Marzullo's algorithm) finds the largest group of servers whose intervals mutually overlap (the "truechimers") and discards the rest as "falsetickers."
This algorithm is Byzantine-tolerant: with n servers, it can tolerate up to (n-1)/2 faulty servers.
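A simplified Marzullo-style sweep captures the spirit of the intersection step. This is an illustrative sketch, not NTP's exact algorithm (which also folds in root distance and dispersion); the interval bounds in the example are invented:

```python
# Simplified Marzullo-style interval intersection. Each server
# contributes a correctness interval [offset - bound, offset + bound];
# we find the region covered by the most intervals. Servers whose
# intervals contain that region are the "truechimers".

def intersect(intervals):
    """Return (lo, hi, count): the region covered by the most intervals."""
    edges = sorted(
        [(lo, +1) for lo, hi in intervals] +
        [(hi, -1) for lo, hi in intervals],
        key=lambda e: (e[0], -e[1]),  # at ties, opens before closes
    )
    best = count = 0
    lo = hi = None
    for i, (point, delta) in enumerate(edges):
        count += delta
        if count > best:              # new deepest overlap starts here
            best = count
            lo, hi = point, edges[i + 1][0]
    return lo, hi, best

# The five offsets from the text (ms), each given an invented ±20 ms
# bound, except the two outliers which get wider/narrower ones:
intervals = [(-30, 10), (-15, 25), (-12, 28), (30, 70), (-220, -180)]
lo, hi, count = intersect(intervals)
assert (lo, hi, count) == (-12, 10, 3)  # 3 truechimers agree on [-12, 10]
```

The +50 ms and -200 ms servers from the example fall outside the majority region and would be declared falsetickers.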
Clustering Algorithm:
Among truechimers, select the best servers based on jitter and synchronization distance, iteratively discarding the candidate that contributes most to the group's overall jitter until a small, consistent set of survivors remains.
The Combine Algorithm:
After selecting trusted servers, NTP combines their inputs. It doesn't simply average—it weights by quality:
combined_offset = Σ(weight_i × offset_i) / Σ(weight_i)
Where weight is inversely proportional to the selection distance (better servers have more weight).
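The weighted combination is a one-liner in practice. A minimal sketch (the distances in the example are invented):

```python
# Sketch of the combine step: a weighted average of the surviving
# servers' offsets, each weighted inversely by its "distance" (a
# quality metric combining delay and dispersion; lower is better).

def combine(offsets_and_distances):
    """offsets_and_distances: list of (offset, distance); returns offset."""
    weights = [1.0 / dist for _off, dist in offsets_and_distances]
    total = sum(weights)
    weighted = sum(w * off
                   for w, (off, _d) in zip(weights, offsets_and_distances))
    return weighted / total

# Three truechimers; the closest (lowest-distance) server dominates.
servers = [(0.005, 0.010), (0.008, 0.040), (-0.010, 0.080)]
combined = combine(servers)
assert abs(combined - 0.005) < 0.003  # pulled toward the +5 ms server
```

A plain average of these offsets would be +1 ms; the quality weighting instead lands near +4.2 ms, close to the most trustworthy source.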
Prefer and True Options:
NTP configuration allows marking servers:
prefer: This server should be chosen first if it's a truechimer. Useful for designating a particularly trusted local server.
true: Assume this server is always correct; never declare it a falseticker. Dangerous—only use for authoritative reference clocks.
The Orphan Mode:
What happens when all external time sources become unreachable? Orphan mode allows a group of servers to maintain internal synchronization: each server is configured with an orphan stratum (e.g., tos orphan 5 in ntpd); when every real source is lost, the group elects one member as a pseudo-reference and the others follow it.
This maintains internal consistency even during network partitions.
Some providers (Google, AWS) "smear" leap seconds—instead of inserting a 23:59:60 second, they slightly slow the clock over a many-hour window around the event to absorb the extra second. This avoids leap-second bugs in applications, but means their NTP time is slightly wrong during smearing. Don't mix smearing and non-smearing servers!
Several NTP implementations exist, each with different tradeoffs:
ntpd (Reference Implementation):
The original NTP daemon, maintained by NTP.org. Most feature-complete but largest codebase.
# /etc/ntp.conf
server 0.pool.ntp.org iburst
server 1.pool.ntp.org iburst
server 2.pool.ntp.org iburst
server 3.pool.ntp.org iburst
driftfile /var/lib/ntp/ntp.drift
restrict default kod nomodify notrap nopeer noquery
chronyd (Chrony):
Modern implementation, popular on Linux (default on RHEL/CentOS, Fedora). Better handles intermittent connectivity and virtualized environments.
# /etc/chrony.conf
pool pool.ntp.org iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3 # Step if offset > 1s, but only for first 3 updates
rtcsync
systemd-timesyncd:
Simple SNTP client, not a full NTP implementation. Suitable for client-only machines.
OpenNTPD:
BSD-focused, security-oriented, simpler than ntpd.
#!/bin/bash
# NTP Monitoring and Diagnostics

#---------------------------------------
# Check ntpd synchronization status
#---------------------------------------
echo "=== ntpq -p (peer status) ==="
ntpq -p
# Output columns:
# remote: server name/IP (* = current sync source, + = candidate)
# refid: server's reference (e.g., .GPS., another IP)
# st: stratum
# t: type (u=unicast, b=broadcast, l=local)
# when: seconds since last response
# poll: current poll interval
# reach: reachability (octal, 377 = last 8 polls succeeded)
# delay: round-trip delay (ms)
# offset: clock offset (ms)
# jitter: dispersion (ms)

echo ""
echo "=== ntpq -c rv (system status) ==="
ntpq -c rv
# Shows: stratum, precision, rootdelay, rootdisp, offset, sys_jitter

#---------------------------------------
# Check chrony synchronization status
#---------------------------------------
echo ""
echo "=== chronyc tracking ==="
chronyc tracking
# Shows: reference ID, stratum, update interval, offset, frequency error

echo ""
echo "=== chronyc sources -v ==="
chronyc sources -v
# Shows: mode, state, name, stratum, poll, reach, last sample offset

#---------------------------------------
# Check current time offset from NTP server
#---------------------------------------
echo ""
echo "=== ntpdate -q (query without setting) ==="
ntpdate -q pool.ntp.org 2>/dev/null || sntp pool.ntp.org
# Fall back to sntp if ntpdate not available

#---------------------------------------
# Check drift file (clock frequency error)
#---------------------------------------
echo ""
echo "=== Drift file (frequency error in ppm) ==="
cat /var/lib/ntp/ntp.drift 2>/dev/null || cat /var/lib/chrony/drift 2>/dev/null || echo "No drift file found"

#---------------------------------------
# Monitor offset over time
#---------------------------------------
echo ""
echo "=== Monitoring offset (Ctrl+C to stop) ==="
while true; do
    offset=$(chronyc tracking 2>/dev/null | grep "System time" | awk '{print $4}')
    if [ -z "$offset" ]; then
        offset=$(ntpq -c rv 2>/dev/null | grep -o 'offset=[0-9.-]*' | cut -d= -f2)
    fi
    echo "$(date '+%H:%M:%S') offset: ${offset:-unknown}"
    sleep 10
done

Key Configuration Options:
| Option | ntpd | chrony | Purpose |
|---|---|---|---|
| iburst | server ... iburst | (default) | Send 8 rapid requests at startup |
| minpoll / maxpoll | minpoll 6 maxpoll 10 | minpoll 6 maxpoll 10 | Control polling interval (2^n seconds) |
| prefer | server ... prefer | (not needed) | Mark preferred server |
| step threshold | tinker step 0.1 | makestep 0.1 3 | When to step vs. slew |
| drift file | driftfile /path | driftfile /path | Persist frequency correction |
| restrict | restrict default nomodify | allow/deny, cmdallow | Access control |
Cloud Environment Considerations:
Virtualized environments introduce additional challenges: virtual clocks pause during VM suspend, snapshot, and live migration; "steal time" delays the guest's timestamping; and timer hardware behavior varies across hypervisors and hosts. All of this adds noise that the clock discipline loop handles poorly.
Cloud Provider Solutions: The major clouds run internal, low-latency time services: AWS offers the Time Sync Service at the link-local address 169.254.169.123, Google Cloud exposes an internal NTP server (with leap smearing) via the metadata server, and Azure provides host-synchronized time to guests.
Recommendations for Cloud: Prefer the provider's internal time service over public pools, use chrony rather than ntpd (it copes better with suspended and stepped clocks), and never mix leap-smearing and non-smearing sources.
For most cloud workloads, this chrony config works well: makestep 1.0 -1 (always step if offset exceeds 1s), leapsecmode slew (smear leap seconds locally), maxdistance 16.0 (accept a reference up to 16s of root distance—handles cloud interruptions). Adjust based on your accuracy needs.
NTP security is surprisingly important. Many systems depend on accurate time for security functions:
Attack Vectors:
1. Off-path attacks: The attacker cannot see NTP traffic but tries to spoof server responses (guessing origin timestamps) or to forge Kiss-of-Death packets that silence a client's queries to a legitimate server.
2. On-path (MITM) attacks: The attacker can intercept and modify NTP traffic. Classic NTP is unauthenticated, so a MITM can rewrite timestamps outright, or shift time subtly by delaying packets asymmetrically (which even authentication cannot fully prevent).
3. Denial of Service: NTP servers have been abused as traffic amplifiers (notably via the mode-7 monlist command), and clients can be starved of time updates by dropping their queries. Mitigations include disabling monitor, applying restrict directives, and enabling kod (Kiss of Death) together with limited rate restrictions.
# Secure NTP Configuration (/etc/ntp.conf or /etc/chrony.conf)

#---------------------------------------
# ntpd example
#---------------------------------------
# Diverse server sources
# Note: time.google.com smears leap seconds; avoid mixing smeared and
# non-smeared sources if leap-second exactness matters to you.
server time1.google.com iburst
server time2.google.com iburst
server 0.pool.ntp.org iburst
server time.cloudflare.com iburst

# Drift file for frequency stability
driftfile /var/lib/ntp/ntp.drift

# Security restrictions
# Default: ignore all requests
restrict default kod limited nomodify notrap nopeer noquery

# Allow localhost full access
restrict 127.0.0.1
restrict ::1

# Allow queries from local network (adjust as needed)
restrict 192.168.0.0 mask 255.255.0.0 nomodify notrap

# Disable dangerous commands
disable monitor

# Log all clock sync events
logconfig =syncstatus +sysevents

#---------------------------------------
# chrony example (more secure defaults)
#---------------------------------------
# NTS-enabled (authenticated) servers
server time.cloudflare.com iburst nts
server nts.sth1.ntp.se iburst nts
server nts.ntp.se iburst nts

# Fallback to regular NTP
pool pool.ntp.org iburst

# Directory for NTS cookies
ntsdumpdir /var/lib/chrony

# Allow NTP client access from local network only
allow 192.168.0.0/16
deny all

# Only allow chronyc from localhost
bindcmdaddress 127.0.0.1
bindcmdaddress ::1

# Log significant changes
log measurements statistics tracking

Network Time Security (NTS):
NTS, standardized in RFC 8915 (2020), brings modern security to NTP: an initial TLS 1.3 handshake (NTS Key Establishment, port 4460) negotiates keys, after which NTP packets carry authenticated extension fields; encrypted cookies let servers remain stateless and preserve client unlinkability.
Supported by: chrony (4.0+), NTPsec, and public servers including Cloudflare (time.cloudflare.com) and Netnod (nts.ntp.se).
The Bottom Line on NTP Security:
For most environments: Use multiple redundant servers, restrict access, keep software updated. This handles casual attacks.
For high-security environments: Use NTS where possible, run local GPS-disciplined stratum 1, monitor aggressively, consider out-of-band verification.
For critical infrastructure: Consider dedicated time distribution networks (GPS, PTP with authentication), cryptographic time-stamping services, and defense-in-depth.
An attacker can announce a false leap second, causing vulnerable systems to insert or delete a second at the scheduled leap event. This has been used in targeted attacks. Ensure your NTP software handles leap seconds sanely, and consider leap smearing for internet-facing applications.
NTP isn't the only time synchronization protocol. Depending on your accuracy requirements and environment, alternatives may be more appropriate.
PTP (Precision Time Protocol, IEEE 1588):
Designed for high-precision synchronization in local networks: PTP achieves this through hardware timestamping at the NIC, boundary and transparent clocks in switches that cancel out queuing delay, and automatic master election via the Best Master Clock Algorithm.
Comparison:
| Feature | NTP | PTP |
|---|---|---|
| Typical accuracy | 1-10 ms | 10 ns - 1 μs |
| Hardware support | None required | NICs, switches |
| Network scope | Internet-wide | Usually LAN |
| Configuration | Simple | Complex |
| Cost | Free | Hardware investment |
GPS/GNSS Directly:
For highest accuracy without network dependency: a GPS/GNSS receiver with a PPS (pulse-per-second) output can discipline a local oscillator to within tens of nanoseconds of UTC, turning any server into a stratum 1 source.
Roughtime (Google):
Designed for bootstrapping secure time: Roughtime returns cryptographically signed responses with coarse (roughly second-level) accuracy, and chaining queries across multiple servers yields verifiable evidence if any server lies about the time.
| Requirement | Solution | Notes |
|---|---|---|
| General server/desktop | NTP to public pools | Default on most OSes |
| Enterprise internal | Local stratum 2 + NTP | Better reliability, control |
| Cloud workloads | Provider's time service | Lowest latency within cloud |
| Database cluster | NTP + tight monitoring | Alert on >10ms skew |
| Financial trading | PTP + GPS backup | Regulatory requirements (MiFID II) |
| Telecom/5G | PTP + GNSS | Sub-microsecond for packet timing |
| Scientific instruments | GPS disciplined oscillator | Traceable to UTC |
| IoT/embedded | SNTP or simplified NTP | Lower resource usage |
Hybrid Approaches:
1. NTP + PTP:
Use PTP within data centers for tight synchronization, NTP to external references for UTC alignment. Stratum 1 servers can use GPS+PTP to distribute sub-microsecond time to local servers.
2. NTP + GPS:
GPS receiver provides stratum 0 reference. Local NTP server distributes to network. Maintains operation during GPS outages using drift file.
3. Multiple Protocols:
Critical systems might use NTP for UTC traceability, PTP for tight intra-datacenter synchronization, GPS as the local reference, and continuous cross-checking among all three to detect failure or attack on any single source.
When to Upgrade from NTP:
For most general-purpose computing—web servers, databases, enterprise applications—NTP remains appropriate and sufficient.
Moving from NTP to PTP requires significant investment: PTP-capable NICs ($50-500 each), PTP-aware switches ($1000s-$10000s), GPS grandmaster clock ($5000+), and expertise. Only pursue if your application genuinely needs sub-millisecond accuracy and you can justify the investment.
NTP is the foundation of time synchronization across the internet—a remarkable achievement of decades of engineering that most users never notice. Let's consolidate the key insights into practical recommendations.
Practical Recommendations:
For development machines: Default OS configuration is usually fine. Verify NTP is running.
For production servers: Use cloud provider's time service plus diverse external sources. Monitor offset. Consider local stratum 2.
For distributed databases: Ensure all nodes use identical time source configuration. Monitor for skew exceeding your consistency bounds.
For security-sensitive systems: Enable NTS, run local GPS-disciplined server, cross-check with multiple sources.
What's Next:
The final page of this module covers Clock Drift, diving deeper into the physical causes of clock instability, measurement techniques, and compensation strategies. Understanding drift completes the picture of time in distributed systems.
You now understand NTP in depth—from stratum architecture to clock discipline algorithms, from security considerations to practical configuration. This knowledge enables you to properly configure, monitor, and troubleshoot time synchronization in production distributed systems. Next, we'll explore the physics and measurement of clock drift.