DNS Resolution - Learning Module


Resolution Process

The Complete Journey

When you click a link or type a URL, you initiate a sophisticated orchestration involving your operating system, local caches, remote resolvers, and potentially dozens of authoritative servers. This entire process typically completes in under 100 milliseconds—often under 10 milliseconds—yet involves multiple layers of caching, protocol exchanges, and fallback mechanisms.

In this page, we synthesize our understanding of recursive and iterative resolution into a complete end-to-end picture of the DNS resolution process. We'll trace every step from user action to IP address delivery, examining what happens at each layer and how optimizations make the system remarkably fast.

What You Will Learn

By the end of this page, you will understand: (1) The complete lifecycle of a DNS query from application to answer, (2) Every cache layer and when each is consulted, (3) How modern optimizations like prefetching and connection reuse work, (4) The timing characteristics of DNS resolution, and (5) How to trace and analyze the resolution path in practice.

The Resolution Stack

DNS resolution involves multiple layers, each with its own caching and processing logic. Understanding this stack is essential for efficient troubleshooting and optimization.

The Resolution Layer Stack:

DNS Resolution Layer Stack
Layer	Component	Cache Type	Typical TTL	Query Trigger
L1: Application	Browser DNS cache	Per-process	1-60 minutes	User navigates to URL
L2: OS	Operating system resolver cache	System-wide	Respects DNS TTL	Application calls gethostbyname()
L3: Local Network	Router/home DNS cache	Network-wide	Usually respects TTL	OS queries configured resolver
L4: Recursive Resolver	ISP/Public resolver cache	Resolver-wide	Respects DNS TTL	Local query misses
L5: Authoritative Server	Zone data (not a cache)	N/A - Source of truth	Defines TTL	Resolver queries authority

Cache Hierarchy Behavior:

Queries flow downward through this stack until a cache hit occurs. Responses flow upward, populating each cache along the way:

┌─────────────────────────────────────────────────────────────────┐
│  User clicks: https://www.example.com/page                      │
└───────────────────────────┬─────────────────────────────────────┘
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│  Browser Cache: Is www.example.com cached?                      │
│    ├─ HIT → Return immediately (0ms)                            │
│    └─ MISS → Query OS resolver                                  │
└───────────────────────────┬─────────────────────────────────────┘
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│  OS Cache: Is www.example.com in system cache?                  │
│    ├─ HIT → Return to browser (<1ms)                            │
│    └─ MISS → Query configured DNS server                        │
└───────────────────────────┬─────────────────────────────────────┘
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│  Recursive Resolver: Is www.example.com cached?                 │
│    ├─ HIT → Return to client (1-20ms network)                   │
│    └─ MISS → Begin iterative resolution (50-200ms)              │
└─────────────────────────────────────────────────────────────────┘
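The lookup order in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not how any particular browser or OS implements it: the function name `resolve_through_layers`, the plain dictionaries standing in for caches, and the address 203.0.113.10 are all assumptions, and TTL expiry is deliberately ignored.

```python
from typing import Callable, Optional

def resolve_through_layers(
    name: str,
    layers: list[dict[str, str]],
    authoritative_lookup: Callable[[str], Optional[str]],
) -> tuple[Optional[str], str]:
    """Walk the cache layers top-down; on a hit, back-fill the layers above."""
    for depth, cache in enumerate(layers):
        if name in cache:
            answer = cache[name]
            # Responses flow upward: populate every layer that missed.
            for upper in layers[:depth]:
                upper[name] = answer
            return answer, f"layer {depth}"
    # Every cache missed: fall through to (simulated) iterative resolution.
    answer = authoritative_lookup(name)
    if answer is not None:
        for cache in layers:
            cache[name] = answer
    return answer, "authoritative"

# Hypothetical state: only the recursive resolver has the record cached.
browser, os_cache, resolver = {}, {}, {"www.example.com": "203.0.113.10"}
layers = [browser, os_cache, resolver]
first = resolve_through_layers("www.example.com", layers, lambda n: None)
second = resolve_through_layers("www.example.com", layers, lambda n: None)
```

The first call is answered by the resolver layer and fills the browser and OS caches on the way back, so the second call is answered by the browser cache.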

Cache Hit Ratios in Practice

Rough figures for production systems: browser caches achieve 30-50% hit rates, OS caches add another 20-30%, and recursive resolver caches achieve 80-95% hit rates for popular domains. Most DNS queries are therefore answered from cache, with only about 5-10% requiring full iterative resolution. Actual ratios depend heavily on traffic mix and TTL values.

Complete Resolution Walkthrough

Let's trace a complete DNS query for www.example.com assuming all caches are empty—the worst-case (but most instructive) scenario.

Phase 1: Application Request

The user types www.example.com in the browser address bar and presses Enter:

Browser parses the URL, extracts hostname www.example.com
Browser checks its internal DNS cache (chrome://net-internals/#dns in Chrome)
Cache miss: Browser issues a system call to resolve the name

System API Call (Linux/macOS)
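// The browser essentially calls a system function like this
#include <netdb.h>
#include <stdio.h>
#include <sys/socket.h>

int main(void) {
    // Legacy API (obsolete, IPv4-only, not thread-safe):
    //   struct hostent *host = gethostbyname("www.example.com");
    // The modern, protocol-agnostic alternative:
    struct addrinfo hints = {0}, *result = NULL;
    hints.ai_socktype = SOCK_STREAM;
    int status = getaddrinfo("www.example.com", "https", &hints, &result);
    if (status != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(status));
        return 1;
    }
    // The call above blocks until resolution completes or times out
    // The OS resolver handles all DNS protocol details
    freeaddrinfo(result);
    return 0;
}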

Phase 2: Stub Resolver Processing

The operating system's stub resolver handles the application's request:

Check /etc/hosts (or Windows hosts file) for static mappings
Check the system DNS cache (systemd-resolved, nscd, or Windows DNS Client)
Cache miss: Prepare DNS query packet

DNS Query Packet Contents:

Header:
  ID: 0x4A2F (random 16-bit identifier for matching responses)
  Flags: QR=0 (query), RD=1 (recursion desired), OPCODE=0 (standard query)
  Questions: 1
  Answers: 0, Authority: 0, Additional: 0

Question Section:
  QNAME: www.example.com
  QTYPE: A (or AAAA for IPv6)
  QCLASS: IN (Internet)

Look up configured DNS server (from DHCP, static config, or systemd-resolved)
Send UDP datagram to configured resolver (usually port 53)
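To make the packet layout above concrete, here is a minimal sketch that encodes the same query in RFC 1035 wire format using only Python's standard library. The function name `build_query` is an assumption for illustration. It handles a single question with plain ASCII labels and omits EDNS0, which real stub resolvers usually add.

```python
import struct

def build_query(name: str, query_id: int, qtype: int = 1) -> bytes:
    """Encode a single-question DNS query (qtype 1 = A, 28 = AAAA)."""
    flags = 0x0100  # QR=0 (query), OPCODE=0 (standard), RD=1 (recursion desired)
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QCLASS=1 (IN)
    return header + question

packet = build_query("www.example.com", 0x4A2F)
```

The resulting bytes are what would be placed in the UDP datagram sent to the configured resolver.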


Phase 3: Recursive Resolver Processing

The recursive resolver receives the query and processes it:

Parse query, validate format, extract QNAME
Check cache for exact match www.example.com A
Check cache for cached NS records that shortcut resolution:
- Do we have NS records for example.com? → Skip to authoritative
- Do we have NS records for .com? → Skip root
- Neither? → Start from root hints
Begin iteration through DNS hierarchy
Cache all responses with their TTLs
Return final answer to stub resolver
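The NS shortcut check in the steps above amounts to finding the deepest ancestor zone for which NS records are already cached. A minimal sketch, assuming a hypothetical `ns_cache` dictionary keyed by zone name (real resolvers also track TTLs and glue addresses):

```python
def closest_cached_zone(qname: str, ns_cache: dict[str, list[str]]) -> str:
    """Return the deepest ancestor of qname with cached NS records, or '.' for root hints."""
    labels = qname.rstrip(".").split(".")
    # Try www.example.com, then example.com, then com.
    for i in range(len(labels)):
        zone = ".".join(labels[i:])
        if zone in ns_cache:
            return zone
    return "."  # Nothing cached: start from the root hints.

# Hypothetical cache contents for illustration.
cache = {"com": ["a.gtld-servers.net"], "example.com": ["ns1.example.com"]}
start = closest_cached_zone("www.example.com", cache)
```

With both entries cached, iteration starts at the example.com servers and skips the root and TLD queries entirely.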

Phase 4: Response Delivery

The answer propagates back up the stack:

Recursive resolver sends UDP response to stub resolver
Stub resolver populates OS cache
getaddrinfo() (or the legacy gethostbyname()) returns to the application
Browser caches the result
Browser initiates TCP connection to the IP address

Resolution Timing Analysis

Understanding DNS timing is crucial for performance optimization. Let's break down where time is spent:

Component Timing Breakdown:

DNS Resolution Timing Components (Cold Cache)
Step	Component	Typical Latency	Variables
1	Application to stub resolver	<0.1ms	Local function call
2	Stub resolver to recursive resolver	1-50ms	Network distance, protocol (UDP vs DoH)
3	Resolver to root server	10-30ms	Anycast location, network path
4	Resolver to TLD server	10-50ms	TLD server distribution
5	Resolver to authoritative server	10-150ms	Server location, load
6	Response propagation back	Same as forward path	—
—	Total (cold cache)	50-300ms	Highly variable
—	Total (hot cache at resolver)	1-20ms	Network RTT only
—	Total (OS cache hit)	<1ms	Memory access only

Timing Distribution in Practice:

Measurements of production DNS traffic typically show a distribution along these lines (approximate figures):

~70% of queries: Answered from resolver cache in <20ms
~20% of queries: Partial cache hit (some NS cached), 20-100ms
~10% of queries: Complete cache miss, 100-300ms
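A quick weighted average shows what this distribution means for a typical lookup. The per-bucket latencies below (10 ms, 60 ms, 200 ms) are assumed representative values picked from within each range, not measured data.

```python
# (share of queries, assumed representative latency in milliseconds)
buckets = [
    (0.70, 10.0),   # answered from resolver cache
    (0.20, 60.0),   # partial cache hit
    (0.10, 200.0),  # complete cache miss
]

expected_ms = sum(share * latency for share, latency in buckets)
miss_contribution = (0.10 * 200.0) / expected_ms
print(f"expected latency: about {expected_ms:.0f} ms")
print(f"share of that caused by full misses: {miss_contribution:.0%}")
```

Under these assumptions the 10% of queries that miss every cache account for roughly half of the average latency, which is why the optimizations discussed later concentrate on avoiding cold lookups.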

Factors Affecting Resolution Time:

Performance Variables

•Geographic distance: Resolvers and authoritative servers spread globally. Queries may traverse continents.
•Anycast deployment: Well-deployed anycast reduces latency by directing queries to nearby instances.
•TTL values: Lower TTLs mean more frequent resolution. A 60-second TTL means re-resolution every minute.
•Server load: Overloaded DNS servers may delay responses or drop packets.
•Network congestion: Packet loss causes timeouts and retransmissions.
•Protocol overhead: DNS-over-HTTPS adds TLS handshake time but enables connection reuse.
•DNSSEC validation: Adds time for cryptographic verification chain.

The First-Byte Problem

DNS resolution directly impacts Time-to-First-Byte (TTFB), a critical web performance metric. A 200ms DNS resolution adds 200ms before the browser can even begin the TCP handshake. For mobile users on high-latency networks, this can mean half a second just for DNS. This is why DNS prefetching and caching are so important for user experience.

Modern Optimizations

The DNS ecosystem has evolved numerous optimizations to minimize resolution latency:

1. DNS Prefetching

Browsers predict which hostnames will be needed and resolve them in advance:

HTML DNS Prefetch Hints
<!-- Instruct browser to prefetch DNS for domains used later -->
<link rel="dns-prefetch" href="//cdn.example.com">
<link rel="dns-prefetch" href="//api.example.com">
<link rel="dns-prefetch" href="//fonts.googleapis.com">
 
<!-- Preconnect does DNS + TCP + TLS handshake -->
<link rel="preconnect" href="https://cdn.example.com">

Browser prefetching behaviors:

Parse page HTML for links to other origins
Resolve those hostnames before user clicks
User hovers over a link? Prefetch that domain
Maintains a pool of pre-resolved hostnames

2. Resolver Cache Prefetching

Advanced recursive resolvers refresh cache entries before expiration:

Original TTL: 3600 seconds (1 hour)
Cache entry age: 3000 seconds (83% of TTL)
Action: Background refresh query to authoritative
Result: Cache always hot for popular domains
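The refresh decision above can be sketched as a small predicate. The 80% threshold and the `recently_queried` popularity flag are assumptions for illustration. Actual thresholds and popularity heuristics differ between resolver implementations.

```python
def should_prefetch(
    age_s: float,
    ttl_s: float,
    recently_queried: bool,
    threshold: float = 0.8,
) -> bool:
    """Decide whether to refresh a cache entry in the background."""
    still_valid = age_s < ttl_s
    near_expiry = age_s >= threshold * ttl_s
    # Only popular entries are worth refreshing; idle ones simply expire.
    return recently_queried and still_valid and near_expiry

refresh_now = should_prefetch(age_s=3000, ttl_s=3600, recently_queried=True)
```

With the values from the example (age 3000 s, TTL 3600 s), the entry is past the assumed 80% mark, so a background refresh would be triggered.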

3. Happy Eyeballs (RFC 8305)

When both A (IPv4) and AAAA (IPv6) records exist:

Query both A and AAAA simultaneously
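Start a connection attempt over the preferred family (IPv6) first
If that attempt has not completed within the Connection Attempt Delay, start an IPv4 attempt in parallel and use whichever connects first
Recommended Connection Attempt Delay: 250ms before the IPv4 attempt starts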
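One piece of Happy Eyeballs that is easy to show in isolation is the address ordering: candidates are interleaved by family so that a broken IPv6 path costs at most one attempt delay. The sketch below is a simplification of the sorting described in RFC 8305. The function name is an assumption, and it alternates strictly with IPv6 first, ignoring destination address selection policy.

```python
import ipaddress

def interleave_addresses(addresses: list[str]) -> list[str]:
    """Order candidate addresses IPv6-first, alternating address families."""
    v6 = [a for a in addresses if ipaddress.ip_address(a).version == 6]
    v4 = [a for a in addresses if ipaddress.ip_address(a).version == 4]
    ordered = []
    for i in range(max(len(v6), len(v4))):
        if i < len(v6):
            ordered.append(v6[i])
        if i < len(v4):
            ordered.append(v4[i])
    return ordered

# Documentation-range addresses used purely as examples.
attempt_order = interleave_addresses(["192.0.2.1", "192.0.2.2", "2001:db8::1"])
```

A client would then start one connection attempt per entry in this order, each staggered by the connection attempt delay, and keep the first one that succeeds.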

Client-Side Optimizations

•DNS prefetching for predicted navigations
•Local caching with appropriate TTL respect
•Parallel A/AAAA queries
•Connection pooling and reuse
•Happy Eyeballs for dual-stack

Resolver Optimizations

•TTL-aware cache prefetching
•SRTT-based server selection
•Query minimization (QNAME minimization)
•Negative response caching
•TCP/TLS connection reuse (DoT/DoH)

4. Anycast Deployment

Major DNS infrastructure uses BGP anycast to minimize latency:

Same IP address advertised from multiple global locations
Network routing automatically directs queries to nearest instance
Root servers: 13 identifiers, 1000+ actual instances worldwide
Public resolvers (8.8.8.8, 1.1.1.1): Present in major IXPs globally

5. EDNS Client Subnet (ECS)

For CDN optimization, resolvers can pass partial client IP to authoritative servers:

Client IP: 192.168.1.100 (private, behind NAT)
Resolver sends: EDNS0 Client Subnet: 203.0.113.0/24 (public prefix)
Authoritative server: Returns IP of nearest CDN edge
Result: Better content delivery performance
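The truncation step can be sketched with Python's `ipaddress` module. The client address 203.0.113.57 is a hypothetical public (post-NAT) address, and the default prefix lengths of 24 bits for IPv4 and 56 bits for IPv6 follow the recommendation in RFC 7871. Resolvers may be configured differently.

```python
import ipaddress

def ecs_prefix(client_ip: str, v4_bits: int = 24, v6_bits: int = 56) -> str:
    """Truncate a client address to the prefix a resolver would send in ECS."""
    addr = ipaddress.ip_address(client_ip)
    bits = v4_bits if addr.version == 4 else v6_bits
    # strict=False zeroes the host bits instead of raising an error.
    return str(ipaddress.ip_network(f"{client_ip}/{bits}", strict=False))

sent_to_authoritative = ecs_prefix("203.0.113.57")
```

Only the prefix leaves the resolver, so the authoritative server learns the client's approximate network but not the full address.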

Privacy vs Performance Trade-off

ECS improves CDN performance but exposes client location to authoritative servers. Privacy-focused resolvers (like 1.1.1.1) may disable ECS by default. DNS-over-HTTPS combined with no ECS provides best privacy but may result in suboptimal CDN routing.

Resolution Failure Modes

DNS resolution can fail in various ways. Understanding failure modes helps with troubleshooting and designing resilient systems.

Failure Categories:

DNS Resolution Failure Types
Failure Type	DNS Response	Meaning	Common Causes
NXDOMAIN	RCODE=3, AA=1	Domain does not exist	Typo in domain, domain expired, zone misconfiguration
NODATA	RCODE=0, AA=1, empty answer	Domain exists, no record of queried type	Querying A for domain with only MX records
SERVFAIL	RCODE=2	Server failure during resolution	Authoritative servers unreachable, DNSSEC validation failure
REFUSED	RCODE=5	Server refuses to answer	Querying authoritative server that doesn't serve zone
Timeout	No response	Network issue or server overload	Firewall blocking, DDoS, server down
Lame delegation	SERVFAIL or timeout	NS points to non-authoritative server	Zone transfer failure, misconfiguration
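The table translates into a simple classification routine. This is a rough sketch with an assumed function name. It treats a missing response as `None`, and it does not distinguish NODATA from a referral (both have RCODE=0 and an empty answer section), which a real client would separate by inspecting the authority section.

```python
from typing import Optional

def classify_response(rcode: Optional[int], answer_count: int = 0) -> str:
    """Map a DNS response code (or no response at all) to a failure type."""
    if rcode is None:
        return "Timeout"
    if rcode == 0:
        # NOERROR with an empty answer section means the name exists
        # but has no record of the queried type.
        return "Success" if answer_count > 0 else "NODATA"
    named = {2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}
    return named.get(rcode, f"RCODE={rcode}")

result = classify_response(rcode=0, answer_count=0)
```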

SERVFAIL Deep Dive:

SERVFAIL is one of the most frustrating failures because it indicates that something went wrong but not what. Common causes:

All authoritative servers unreachable: Network issues, DDoS attack, misconfiguration
DNSSEC validation failure: Signature expired, key mismatch, missing RRSIG
Lame delegation: NS records point to servers not configured for the zone
Timeout cascade: All servers responded too slowly
Malformed responses: Authoritative server returning invalid data

Diagnosing SERVFAIL
# Step 1: Try querying authoritative servers directly
$ dig @ns1.example.com example.com A +norec
 
# Step 2: Check if DNSSEC validation is the issue
$ dig @8.8.8.8 example.com A +cd  # +cd disables DNSSEC validation
# If this works but normal query fails, DNSSEC is the problem
 
# Step 3: Check DNSSEC chain
$ dig example.com A +dnssec
$ dig example.com DNSKEY
$ dig com DNSKEY
$ delv example.com A  # Validates DNSSEC chain
 
# Step 4: Test each authoritative server
$ dig @ns1.example.com example.com SOA +norec
$ dig @ns2.example.com example.com SOA +norec

Negative Caching Impact

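NXDOMAIN and NODATA responses are cached (negative caching), and resolvers may also cache SERVFAIL briefly. An NXDOMAIN for a briefly-misconfigured zone may persist in caches long after the fix. The negative cache TTL controls this: per RFC 2308 it is the lesser of the SOA record's own TTL and the SOA MINIMUM field. SERVFAIL must not be cached for more than five minutes, and some resolvers use much shorter periods (e.g., 30 seconds) to allow faster recovery.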
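As a worked example of how long a negative answer lingers: RFC 2308 defines the negative caching TTL as the smaller of the SOA record's own TTL and its MINIMUM field. The zone values below are hypothetical.

```python
def negative_cache_ttl(soa_record_ttl: int, soa_minimum: int) -> int:
    """Seconds a resolver may cache an NXDOMAIN/NODATA answer (RFC 2308)."""
    return min(soa_record_ttl, soa_minimum)

# Hypothetical zone: SOA served with TTL 3600 and MINIMUM field 300.
ttl = negative_cache_ttl(soa_record_ttl=3600, soa_minimum=300)
print(f"NXDOMAIN may be cached for up to {ttl} seconds")
```

In this case a name created right after someone queried it could stay invisible to that resolver's clients for up to five minutes, regardless of the TTL on the new record.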

Special Resolution Scenarios

Several special cases in DNS resolution require additional handling:

1. CNAME Chain Resolution

When an A query returns a CNAME, the resolver must continue resolving:

Query: www.example.com A

Step 1: www.example.com CNAME webserver.example.com
        (No A record at this name, follow CNAME)

Step 2: webserver.example.com CNAME lb.cdn.net
        (Another CNAME, continue following)

Step 3: lb.cdn.net A 203.0.113.100
        (Final answer)

Resolvers have CNAME chain limits (typically 8-16) to prevent infinite loops from misconfigurations like:

a.example.com CNAME b.example.com
b.example.com CNAME a.example.com  ← Infinite loop!
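Chain following with a hop limit and loop detection can be sketched as follows. The in-memory `records` mapping and the limit of 8 hops are assumptions for illustration. A real resolver issues a query at each step and applies its own limit.

```python
def follow_cnames(name: str, records: dict[str, tuple[str, str]], max_hops: int = 8) -> str:
    """Follow CNAMEs until an A record is found; records maps name -> (type, value)."""
    seen = set()
    for _ in range(max_hops + 1):
        if name in seen:
            raise ValueError(f"CNAME loop detected at {name}")
        seen.add(name)
        if name not in records:
            raise LookupError(f"no record for {name}")
        rtype, value = records[name]
        if rtype == "A":
            return value
        name = value  # CNAME: continue with the target name
    raise ValueError("CNAME chain exceeds hop limit")

chain = {
    "www.example.com": ("CNAME", "webserver.example.com"),
    "webserver.example.com": ("CNAME", "lb.cdn.net"),
    "lb.cdn.net": ("A", "203.0.113.100"),
}
address = follow_cnames("www.example.com", chain)
```

The `seen` set catches the a/b loop immediately, while the hop limit bounds the work for long but loop-free chains.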

2. MX Resolution and Additional Records

MX (Mail Exchange) resolution involves additional steps:

Query: example.com MX

Response:
  Answer: example.com MX 10 mail.example.com
          example.com MX 20 backup.example.com
  Additional: mail.example.com A 192.0.2.10
              backup.example.com A 192.0.2.20

Authoritative servers often include A records for MX targets in the Additional section to save a round trip. Resolvers use these but verify them if needed.
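A mail client's use of this response can be sketched as: sort by preference (lowest value first), then take addresses from the Additional section where present. The function name and data shapes are assumptions. A `None` address means a separate A/AAAA lookup is still needed.

```python
from typing import Optional

def mail_targets(
    mx_records: list[tuple[int, str]],
    additional: dict[str, str],
) -> list[tuple[str, Optional[str]]]:
    """Return (host, address-or-None) pairs, most preferred exchanger first."""
    return [(host, additional.get(host)) for _pref, host in sorted(mx_records)]

# Same records as above, but with only one address supplied in Additional.
mx = [(20, "backup.example.com"), (10, "mail.example.com")]
extra = {"mail.example.com": "192.0.2.10"}
targets = mail_targets(mx, extra)
```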

3. Split-Horizon DNS

Organizations may return different results based on query source:

Query Source	Result for `app.company.com`
Internal network	10.0.1.100 (private IP)
External internet	203.0.113.50 (public IP)

Implemented via:

View-based configuration in BIND
Separate internal/external DNS servers
Resolver policies based on client subnet
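Conceptually, a view-based setup selects a zone dataset by matching the client's source address against a list of networks. The sketch below is only an illustration of that idea in Python, not BIND configuration. The 10.0.0.0/8 internal range and the `VIEWS` table are assumptions.

```python
import ipaddress
from typing import Optional

INTERNAL_NETS = [ipaddress.ip_network("10.0.0.0/8")]  # assumed internal range

VIEWS = {
    "internal": {"app.company.com": "10.0.1.100"},
    "external": {"app.company.com": "203.0.113.50"},
}

def answer_for(client_ip: str, name: str) -> Optional[str]:
    """Pick the view by client source address, then look the name up in it."""
    addr = ipaddress.ip_address(client_ip)
    view = "internal" if any(addr in net for net in INTERNAL_NETS) else "external"
    return VIEWS[view].get(name)

inside = answer_for("10.2.3.4", "app.company.com")
outside = answer_for("198.51.100.7", "app.company.com")
```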

4. Load Balancing via DNS

DNS can distribute traffic across servers:

DNS Load Balancing Methods

•Round-robin: Return multiple A records; clients try first one (or random)
•Geo-DNS: Return different IPs based on client location (via ECS or resolver IP)
•Weighted: Return IPs with different frequencies based on server capacity
•Health-checked: Only return IPs of healthy servers (requires active monitoring)
•Latency-based: Return server with lowest latency to client region
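Round-robin combined with health checking can be sketched as a rotating answer list that skips unhealthy servers. All names here are hypothetical, and the health set would come from an external monitoring system in practice.

```python
from itertools import count
from typing import Callable, Optional

def make_round_robin(addresses: list[str]) -> Callable[..., list[str]]:
    """Return a function that yields the address list rotated by one per call."""
    counter = count()

    def next_answer(healthy: Optional[set[str]] = None) -> list[str]:
        # Drop addresses that the (external) health checker marked as down.
        pool = [a for a in addresses if healthy is None or a in healthy]
        if not pool:
            return []
        shift = next(counter) % len(pool)
        return pool[shift:] + pool[:shift]

    return next_answer

rr = make_round_robin(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
first_answer = rr()
second_answer = rr()
```

Because many clients use the first address they receive, rotating the order spreads load even though every response contains the same set of records.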

DNS Load Balancing Limitations

DNS-based load balancing has limitations: (1) clients cache results, so changes aren't immediate; (2) TTLs create trade-offs between freshness and performance; (3) some clients ignore multiple A records and always use the first. For true high-availability, combine DNS load balancing with layer-4 or layer-7 load balancers.

Practical Resolution Analysis

Let's examine practical tools and techniques for analyzing DNS resolution:

Using dig for Complete Analysis:

Comprehensive dig Analysis
# Basic query with timing
$ dig www.example.com +stats
;; Query time: 23 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
 
# Trace full resolution path
$ dig www.example.com +trace +additional
 
# Query specific server without recursion (iterative mode)
$ dig @ns1.example.com www.example.com +norec
 
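# Request all record types (many servers now return only a minimal
# answer to ANY, see RFC 8482; query specific types for reliable results)
$ dig www.example.com ANY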
 
# Query with DNSSEC validation info
$ dig www.example.com +dnssec +multi
 
# TCP mode (useful when UDP truncates)
$ dig www.example.com +tcp
 
# Specify query type and class
$ dig www.example.com AAAA IN
 
# Timing multiple resolvers
$ for resolver in 8.8.8.8 1.1.1.1 9.9.9.9; do
    echo -n "$resolver: "
    dig @$resolver www.example.com +stats 2>&1 | grep "Query time"
done
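When collecting timings from many dig runs, it can help to extract the query time programmatically. A minimal sketch that parses the statistics line shown above; the function name is an assumption and the sample text simply reuses the output from this section.

```python
import re
from typing import Optional

def parse_query_time_ms(dig_output: str) -> Optional[int]:
    """Extract the value of the ';; Query time: N msec' line, if present."""
    match = re.search(r"^;; Query time: (\d+) msec", dig_output, re.MULTILINE)
    return int(match.group(1)) if match else None

sample = ";; Query time: 23 msec\n;; SERVER: 8.8.8.8#53(8.8.8.8)\n"
elapsed_ms = parse_query_time_ms(sample)
```

Note that a single measurement mostly reflects whether the resolver had the name cached, so repeated runs against each resolver give a fairer comparison.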

Browser Developer Tools:

Chrome and Firefox expose DNS timing in developer tools:

Network panel: Each request shows DNS timing in waterfall
Performance panel: DNS resolution appears in timing breakdown
chrome://net-internals/#dns: Shows Chrome's DNS cache contents
about:networking#dns: Firefox's DNS cache viewer

Wireshark DNS Analysis:

Capturing DNS traffic reveals packet-level details:

Capture filter: port 53
Display filter: dns

Useful columns:
- Time (for latency analysis)
- Source/Destination (query direction)
- Info (query type and domain)
- dns.flags.response (distinguish query from response)
- dns.qry.name (domain being queried)
- dns.resp.ttl (TTL values in responses)

DNS Analysis Tools Comparison
Tool	Best For	Platform	Key Features
dig	Command-line queries	Unix/Linux/macOS	Detailed output, +trace, +dnssec
nslookup	Quick lookups	Cross-platform	Interactive mode, simple syntax
host	Simple forward/reverse lookup	Unix/Linux	Concise output
drill	DNSSEC analysis	Unix/Linux	Better DNSSEC output than dig
delv	DNSSEC validation	BIND package	Validates DNSSEC chain
dnstracer	Trace delegation	Unix/Linux	Shows each delegation step
Wireshark	Packet-level analysis	Cross-platform	Full packet capture and decode

Testing DNS Changes

When making DNS changes, remember caching at every level. Use dig @authoritative-server to verify the change at the source. Then wait for TTL expiration of the old record. Tools like DNS propagation checkers query servers worldwide to show when changes have propagated globally.

Summary: Resolution Process

We've traced the complete DNS resolution process from user action to IP address delivery. Let's consolidate the key concepts:

Key Takeaways

•Multiple cache layers exist: Browser, OS, local network, and resolver caches each reduce load on authoritative infrastructure.
•Most queries hit cache: Roughly 90% of queries are answered fully or partly from cache; only about 10% require full iterative resolution.
•Timing varies widely: From <1ms (OS cache) to 300ms+ (cold cache, distant authoritative); caching is essential for performance.
•Modern optimizations abound: Prefetching, anycast, SRTT selection, and connection reuse minimize latency.
•Failures have distinct types: NXDOMAIN, SERVFAIL, and timeouts each indicate different problems; systematic diagnosis is essential.
•Special cases require handling: CNAME chains, MX additional records, split-horizon, and load balancing add complexity.

What's next:

We've seen the complete resolution process. Next, we'll examine the resolver role in more detail—exploring how resolvers are configured, the critical services they provide beyond basic resolution, and how resolver selection impacts performance and privacy.
