A hospital's telemedicine system carries real-time video consultations between surgeons and remote operating rooms. Simultaneously, the same network handles administrative email, software updates, and backup traffic. If a large backup job delays video packets by even 200 milliseconds, visual artifacts appear—potentially dangerous during a surgery.
Quality of Service (QoS) is the suite of technologies that ensures this never happens. QoS provides mechanisms to guarantee network behavior for critical applications—bounded latency, minimum bandwidth, controlled jitter—regardless of other traffic on the network. It transforms best-effort networking into predictable, reliable infrastructure.
By the end of this page, you will understand end-to-end QoS implementation—from classification and marking at network edges, through queuing and scheduling in the core, to the DiffServ and IntServ models that define how QoS operates at scale. You'll be equipped to design and implement QoS architectures for enterprise and carrier networks.
Quality of Service refers to the capability of a network to provide differentiated and predictable service to network traffic. It encompasses all mechanisms that control traffic characteristics to provide predictable network behavior.
Why QoS Exists:
The fundamental network resource—bandwidth—is finite. When demand exceeds capacity, the network must decide which packets to forward first, which to delay, and which to drop.
Without QoS, these decisions are arbitrary (FIFO). With QoS, they become policy-driven and aligned with business needs.
QoS Metrics:
| Metric | Definition | Impact | Typical Target |
|---|---|---|---|
| Bandwidth | Data throughput (bits per second) | How much data can flow | ≥ committed rate |
| Latency (Delay) | Time from source to destination | Responsiveness, real-time quality | < 150ms for voice |
| Jitter | Variation in latency | Audio/video smoothness | < 30ms for voice |
| Packet Loss | Percentage of packets not delivered | Data integrity, retransmissions | < 1% for voice |
| Availability | Percentage of time service is accessible | Reliability | 99.99% for critical |
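Of these metrics, jitter is the least intuitive to measure. RTP receivers commonly estimate it with the exponentially weighted moving average defined in RFC 3550; a minimal Python sketch (the transit times here are illustrative):

```python
def update_jitter(jitter: float, transit_prev: float, transit_now: float) -> float:
    """One step of the RTP interarrival-jitter estimator (RFC 3550):
    J = J + (|D| - J) / 16, where D is the change in one-way transit time."""
    d = abs(transit_now - transit_prev)
    return jitter + (d - jitter) / 16.0

# Transit times (ms) for successive packets; variation drives jitter up.
transits = [40.0, 42.0, 39.0, 55.0, 41.0]
j = 0.0
for prev, now in zip(transits, transits[1:]):
    j = update_jitter(j, prev, now)
```

The 1/16 gain smooths out single outliers, so one late packet does not spike the jitter estimate, while sustained variation steadily raises it toward the per-voice target of under 30 ms.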
Traffic Type Requirements:
Different applications have vastly different requirements:
| Traffic Type | Bandwidth | Latency | Jitter | Loss | Example |
|---|---|---|---|---|---|
| Voice (VoIP) | Low (64 Kbps/call) | Critical (<150ms) | Critical (<30ms) | Low (<1%) | Phone calls |
| Video Conferencing | High (2-10 Mbps) | Critical (<200ms) | Critical (<50ms) | Low (<1%) | Zoom, Teams |
| Streaming Video | High (5-25 Mbps) | Medium (buffered) | Low (buffered) | Low | Netflix, YouTube |
| Interactive Data | Medium | Medium (<500ms) | Medium | Very Low | Web apps, SaaS |
| Bulk Transfer | High | Low | None | Very Low | Backups, sync |
| Background | Low priority | None | None | Acceptable | Updates, telemetry |
Voice (VoIP) is often the most demanding real-time application because it has strict requirements across ALL metrics simultaneously—low bandwidth but critical latency, jitter, and loss requirements. If your QoS can handle voice well, it can typically handle anything.
Two architectural models define how QoS operates across networks:
1. Integrated Services (IntServ)
IntServ provides per-flow guarantees through resource reservation. Each flow (e.g., a single VoIP call) gets explicit bandwidth and latency guarantees.
Mechanism:
- Applications request resources via RSVP (Resource Reservation Protocol) signaling
- Every router on the path performs admission control and, if capacity exists, installs per-flow state
- Admitted flows receive their reserved bandwidth and bounded delay
2. Differentiated Services (DiffServ)
DiffServ provides class-based QoS without per-flow state. Traffic is classified into a small number of classes, each receiving different treatment.
Mechanism:
- Edge routers classify traffic and mark the DSCP field in the IP header
- Core routers apply a per-hop behavior (PHB) based solely on the DSCP marking
- No signaling protocol and no per-flow state in the core
DiffServ Advantages:
- Scales to any number of flows (state is per-class, not per-flow)
- No signaling protocol required
- Simple, fast core routers
DiffServ Limitations:
- Guarantees are relative (per-class), not absolute (per-flow)
- Requires consistent class definitions across administrative domains
- No built-in admission control
| Aspect | IntServ | DiffServ |
|---|---|---|
| Guarantee type | Absolute (per-flow) | Relative (per-class) |
| State | Per-flow in every router | Per-class (fixed, small) |
| Scalability | Limited (thousands) | Excellent (unlimited flows) |
| Signaling | Required (RSVP) | None |
| Classification | 5-tuple (flow-based) | DSCP (class-based) |
| Deployment | Rare, specialized | Ubiquitous |
In practice, DiffServ is the dominant QoS model. IntServ's scalability limitations made it impractical for the internet and most enterprise networks. DiffServ's 'good enough' class-based approach scales far better. IntServ concepts survive in specific contexts like MPLS-TE and data center fabrics.
Implementing QoS requires a structured framework with distinct functional components:
1. Classification — Identifying what traffic is
2. Marking — Labeling packets for treatment
3. Policing — Enforcing traffic contracts
4. Shaping — Smoothing traffic output
5. Queuing — Organizing packets for transmission
6. Scheduling — Deciding which queue to service
Classification Methods:
Packets are classified using various criteria:
| Method | Layer | Criteria | Trust Required |
|---|---|---|---|
| Interface | L1 | Ingress port | None (physical) |
| VLAN | L2 | 802.1Q VLAN ID, CoS | Medium |
| IP Header | L3 | Source/dest IP, DSCP, protocol | Depends |
| Ports | L4 | TCP/UDP port numbers | Medium |
| DPI | L7 | Application signatures | High (inspect payload) |
| NBAR | L7 | Cisco Network-Based Application Recognition | High |
Classification Example (Cisco):
! Define class-maps for classification
class-map match-any VOICE
 match protocol rtp audio
 match dscp ef
class-map match-any VIDEO
 match protocol rtp video
 match dscp af41
class-map match-any BUSINESS-CRITICAL
 match access-group name CRITICAL-APPS
 match dscp af31
class-map match-any BEST-EFFORT
 match any
Marking (DSCP Values):
The 6-bit DSCP field allows 64 values, but standard PHBs use a subset:
| PHB | DSCP (Decimal) | DSCP (Binary) | Application |
|---|---|---|---|
| EF (Expedited Forwarding) | 46 | 101110 | VoIP, real-time |
| AF41 (Assured Forwarding) | 34 | 100010 | Video conferencing |
| AF31 | 26 | 011010 | Mission-critical data |
| AF21 | 18 | 010010 | Transactional data |
| AF11 | 10 | 001010 | Bulk data |
| CS3 (Class Selector) | 24 | 011000 | Signaling (SIP, H.323) |
| CS2 | 16 | 010000 | Network management |
| BE (Default) | 0 | 000000 | Best effort |
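The DSCP values above occupy the top 6 bits of the IP ToS/Traffic Class byte (the low 2 bits carry ECN), which is why packet captures often show EF as ToS 0xB8 rather than 46. A small conversion sketch:

```python
def dscp_to_tos(dscp: int) -> int:
    """DSCP occupies the top 6 bits of the ToS/Traffic Class byte;
    the low 2 bits are ECN. Shift left by 2 to get the ToS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit field (0-63)")
    return dscp << 2

def tos_to_dscp(tos: int) -> int:
    """Recover the DSCP value from a ToS/Traffic Class byte."""
    return (tos & 0xFF) >> 2

# EF (DSCP 46) appears on the wire as ToS byte 184 (0xB8)
assert dscp_to_tos(46) == 184
assert tos_to_dscp(0xB8) == 46
```

On most platforms an application can request a marking by setting the ToS byte on its socket (e.g. `setsockopt` with `IP_TOS`), which is exactly why the trust-boundary warning below matters.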
Never trust DSCP markings from untrusted sources. At trust boundaries (e.g., customer-facing interfaces), either reset all markings to default OR re-classify based on observed traffic. Trusting external markings allows attackers to gain priority by simply marking their traffic as EF.
Queuing is where differentiated treatment happens. Multiple queue strategies exist, each with different characteristics.
1. FIFO (First In, First Out)
Simplest: packets served in arrival order. Single queue, no differentiation.
Problems: High-priority packets wait behind low-priority. Bursty sources can consume all buffer space.
2. Priority Queuing (PQ)
Multiple queues with strict priority order. Higher-priority queue fully drained before lower.
Priority Order:
Queue 1 (Critical): Always served first
Queue 2 (High): Served when Queue 1 empty
Queue 3 (Normal): Served when Queues 1-2 empty
Queue 4 (Low): Served when Queues 1-3 empty
Advantage: Critical traffic gets minimal latency. Problem: Starvation—lower queues may never be served if higher queues always have traffic.
| Discipline | Queues | Fairness | Latency | Starvation | Use Case |
|---|---|---|---|---|---|
| FIFO | 1 | None | Variable | N/A | Simple/legacy |
| Priority (PQ) | Fixed (4-8) | None | Excellent for high | Possible | Real-time traffic |
| WFQ (Weighted Fair) | Per-flow | Yes (weighted) | Good | No | Fair sharing |
| CBWFQ | Per-class | Yes (configurable) | Good | No | Enterprise QoS |
| LLQ | PQ + CBWFQ | Yes + strict priority | Excellent | Controlled | Voice + data |
3. Weighted Fair Queuing (WFQ)
Allocates bandwidth fairly across flows, weighted by importance.
Flow A weight: 2
Flow B weight: 1
Flow C weight: 1
Total bandwidth: 10 Mbps
Allocation:
Flow A: (2/4) × 10 = 5 Mbps
Flow B: (1/4) × 10 = 2.5 Mbps
Flow C: (1/4) × 10 = 2.5 Mbps
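The allocation above is just a weighted proportional split, which a few lines of Python make explicit (flow names are illustrative):

```python
def wfq_allocate(weights: dict[str, float], capacity_mbps: float) -> dict[str, float]:
    """Split link capacity across active flows in proportion to weight,
    mirroring the WFQ bandwidth shares described above."""
    total = sum(weights.values())
    return {flow: w / total * capacity_mbps for flow, w in weights.items()}

alloc = wfq_allocate({"A": 2, "B": 1, "C": 1}, 10.0)
# {'A': 5.0, 'B': 2.5, 'C': 2.5}
```

Note that real WFQ recomputes shares as flows come and go: if Flow C stops sending, A and B immediately split the link 2:1, so idle capacity is never wasted.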
4. Class-Based Weighted Fair Queuing (CBWFQ)
WFQ applied to defined classes rather than individual flows:
! Cisco CBWFQ example
policy-map WAN-QOS
 class VOICE
  priority 1000      ! 1 Mbps strict priority (units are kbps)
 class VIDEO
  bandwidth 5000     ! 5 Mbps minimum guarantee
 class BUSINESS
  bandwidth 3000     ! 3 Mbps minimum
  random-detect      ! WRED for congestion avoidance
 class class-default
  fair-queue         ! Fair queuing for remaining traffic
5. Low Latency Queuing (LLQ)
Combines priority queuing with CBWFQ—strict priority for real-time, fair queuing for the rest:
LLQ Structure:
┌─────────────────────────────────────┐
│ Priority Queue (Voice, policed) │ ← Always served first
├─────────────────────────────────────┤
│ CBWFQ Queues (Video, Data, etc.) │ ← Fair sharing
└─────────────────────────────────────┘
LLQ is the de facto standard for enterprise voice/video QoS—it provides deterministic latency for real-time traffic while preventing starvation of data traffic.
Priority queues must be policed or rate-limited. If the priority queue can consume 100% of bandwidth, it will—starving everything else. Voice traffic should typically be limited to 30% or less of link bandwidth. Cisco's LLQ inherently implements this policing.
Active Queue Management (AQM) proactively manages queue congestion before overflow, improving network performance and fairness.
The Tail Drop Problem:
Simple queues drop packets when full (tail drop). This creates problems:
Random Early Detection (RED):
RED probabilistically drops packets as queue fills:
RED Parameters:
min_threshold: Start dropping at this queue depth
max_threshold: Drop probability = 100% here
max_probability: Max drop probability at max_threshold
Algorithm:
avg_queue = weighted_average(queue_depth)
if avg_queue < min_threshold:
drop_probability = 0
else if avg_queue < max_threshold:
drop_probability = max_probability ×
(avg_queue - min_threshold) /
(max_threshold - min_threshold)
else:
drop_probability = 1.0
if random() < drop_probability:
drop_packet()
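The pseudocode above translates almost directly into runnable Python (parameter names follow the pseudocode; the averaged queue depth is assumed to be computed elsewhere):

```python
import random

def red_drop(avg_queue: float, min_th: float, max_th: float, max_p: float) -> bool:
    """RED drop decision: probability ramps linearly from 0 at min_th
    to max_p at max_th, then jumps to 1.0 beyond max_th."""
    if avg_queue < min_th:
        p = 0.0
    elif avg_queue < max_th:
        p = max_p * (avg_queue - min_th) / (max_th - min_th)
    else:
        p = 1.0
    return random.random() < p

# Below min_threshold nothing is dropped; beyond max_threshold everything is.
assert not red_drop(10, min_th=20, max_th=60, max_p=0.1)
assert red_drop(80, min_th=20, max_th=60, max_p=0.1)
```

Because drops are probabilistic and spread over time, different TCP flows back off at different moments, which is precisely how RED avoids the global synchronization that tail drop causes.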
| Algorithm | Approach | Key Feature | Configuration |
|---|---|---|---|
| Tail Drop | Drop when full | Simple, default | None |
| RED | Random drop as fills | Early congestion signal | Complex (3 params) |
| WRED | RED per-traffic-class | Differentiated treatment | Complex + per-class |
| CoDel | Drop based on sojourn time | Targets latency directly | Simple (target, interval) |
| PIE | Drop probability from latency | Lighter than CoDel | Simple (target, tupdate) |
| FQ-CoDel | CoDel + fair queuing | Best overall | Minimal |
Weighted RED (WRED):
Applies different RED parameters per traffic class:
! Cisco WRED example
policy-map WRED-POLICY
 class BUSINESS
  bandwidth 30 percent
  random-detect dscp-based
  random-detect dscp af31 30 60 10  ! min-threshold max-threshold mark-prob denominator (max drop 1 in 10)
  random-detect dscp af32 25 50 10
  random-detect dscp af33 20 40 10  ! Lower thresholds = higher drop precedence dropped sooner
AF classes use three drop precedences (green/yellow/red). WRED drops higher precedence (yellow/red marked) packets earlier, preserving green-marked conforming traffic.
CoDel (Controlled Delay):
Modern AQM algorithm targeting sojourn time (how long packets wait in queue):
CoDel Logic:
target: 5ms max sojourn time
interval: 100ms measurement window
if all packets in last interval waited > target:
enter dropping state
periodically drop packets
decrease drop interval over time
else:
exit dropping state
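The state machine above can be sketched in Python. This is a deliberately simplified reading of the CoDel logic (loosely following RFC 8289, with the control-law details reduced to the `interval/sqrt(count)` drop spacing):

```python
import math

class CoDel:
    """Simplified CoDel: if sojourn time stays above `target` for a full
    `interval`, enter the dropping state and drop packets at intervals
    that shrink as interval / sqrt(count)."""

    def __init__(self, target_ms: float = 5.0, interval_ms: float = 100.0):
        self.target = target_ms
        self.interval = interval_ms
        self.dropping = False
        self.count = 0
        self.first_above = None   # time sojourn first exceeded target
        self.next_drop = 0.0

    def dequeue(self, now_ms: float, sojourn_ms: float) -> bool:
        """Return True if this dequeued packet should be dropped."""
        if sojourn_ms < self.target:
            # Queue is draining fast enough: leave dropping state.
            self.dropping = False
            self.first_above = None
            return False
        if self.first_above is None:
            self.first_above = now_ms
        if not self.dropping and now_ms - self.first_above >= self.interval:
            # Persistently above target for a full interval: start dropping.
            self.dropping = True
            self.count = 1
            self.next_drop = now_ms
        if self.dropping and now_ms >= self.next_drop:
            self.count += 1
            self.next_drop = now_ms + self.interval / math.sqrt(self.count)
            return True
        return False
```

With a constant 20 ms sojourn time and packets dequeued every 10 ms, the first drop lands once the delay has persisted for a full 100 ms interval, and subsequent drops arrive progressively faster until senders back off and sojourn falls below target.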
CoDel advantages:
FQ-CoDel (Fair Queuing CoDel) combines per-flow fair queuing with CoDel AQM. It is the default qdisc in many modern Linux distributions and is specified by the IETF (RFC 8290) for general-purpose queuing. It handles bufferbloat, prevents heavy flows from starving light flows, and requires no configuration.
Effective QoS requires consistent end-to-end design, not just individual device configuration.
Enterprise QoS Architecture:
QoS at Each Layer:
Access Layer (Classification & Marking):
- Classify and mark traffic as close to the source as possible
- Establish the trust boundary (trust IP phones, not user PCs)

Distribution Layer (Aggregation):
- Preserve existing markings; apply per-class queuing on aggregated uplinks

Core Layer (Simple & Fast):
- Trust DSCP and queue per class at line rate; no re-classification in the core

WAN Edge (Critical Point):
- Apply LLQ and shaping here; the WAN link is where congestion actually occurs
A simple, effective model uses 4 classes: 1) Voice (EF, priority queue, policed), 2) Video (AF4x, guaranteed bandwidth), 3) Business Critical (AF2x-AF3x, guaranteed minimum), 4) Best Effort (default, fair queuing). This covers most use cases without excessive complexity.
Modern network architectures—cloud, SD-WAN, containers—require adapted QoS approaches.
Cloud Network QoS:
Cloud providers implement QoS differently than traditional networks:
| Cloud Context | QoS Mechanism | Notes |
|---|---|---|
| VM Networking | Hypervisor-enforced bandwidth limits | Minimum/maximum per VM |
| VPC Egress | Per-instance egress bandwidth | Credit-based, burstable |
| Cross-AZ Traffic | Implicit priority (same-AZ faster) | Cost and latency differences |
| Dedicated Interconnect | DSCP honored, traditional QoS | Customer-managed markings |
Example: In AWS, each instance type has a defined network bandwidth ceiling, smaller instances burst above their baseline using a credit model, and DSCP markings are generally not acted on inside a VPC; traditional DSCP-based QoS applies mainly on dedicated connections such as Direct Connect.
SD-WAN QoS:
SD-WAN revolutionizes WAN QoS by treating multiple WAN paths as a pool:
SD-WAN QoS Features:
1. Application-Aware Routing
- Identify application (DPI, SaaS signatures)
- Choose best path based on real-time metrics
2. Path Quality Monitoring
- Continuous latency/jitter/loss measurement
- Per-path health metrics
3. Dynamic Path Selection
- VoIP → Path with lowest jitter
- Bulk → Path with highest bandwidth
- Critical → Path with best overall score
4. Forward Error Correction (FEC)
- Proactively add redundancy for real-time traffic
- Recover from packet loss without retransmission
5. Packet Duplication
- Send critical traffic over multiple paths
- First-arriving packet wins
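The dynamic path selection in step 3 is essentially a per-class scoring function over live path metrics. A hedged sketch (path names, metric values, and scoring weights are all illustrative, not any vendor's algorithm):

```python
# Hypothetical per-path metrics, as a path-quality monitor might report them.
paths = {
    "mpls":      {"latency_ms": 40, "jitter_ms": 2,  "loss_pct": 0.1, "bw_mbps": 100},
    "internet1": {"latency_ms": 25, "jitter_ms": 12, "loss_pct": 0.5, "bw_mbps": 500},
    "internet2": {"latency_ms": 60, "jitter_ms": 5,  "loss_pct": 2.0, "bw_mbps": 300},
}

def pick_path(app_class: str) -> str:
    """Choose a WAN path per application class, mirroring the policy above."""
    if app_class == "voip":
        # VoIP cares most about jitter.
        return min(paths, key=lambda p: paths[p]["jitter_ms"])
    if app_class == "bulk":
        # Bulk transfer wants raw bandwidth.
        return max(paths, key=lambda p: paths[p]["bw_mbps"])
    # Critical traffic: composite score (lower is better); weights are assumptions.
    return min(paths, key=lambda p: paths[p]["latency_ms"]
               + 2 * paths[p]["jitter_ms"] + 50 * paths[p]["loss_pct"])

# voip -> mpls (lowest jitter); bulk -> internet1 (highest bandwidth)
```

Real SD-WAN controllers re-run this decision continuously as probes update the metrics, which is what allows sub-second reaction to brownouts on any single path.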
Container Networking QoS:
Kubernetes and containers require different approaches:
# Kubernetes Pod QoS via resource limits
apiVersion: v1
kind: Pod
metadata:
  name: app
  annotations:
    # Network bandwidth annotations (support varies by CNI plugin)
    kubernetes.io/egress-bandwidth: "10M"
    kubernetes.io/ingress-bandwidth: "10M"
spec:
  containers:
  - name: app
    resources:
      requests:
        memory: "256Mi"
        cpu: "500m"
      limits:
        memory: "512Mi"
        cpu: "1000m"
CNI plugins like Calico, Cilium, and Antrea provide traffic shaping via Linux tc.
In zero-trust architectures, QoS must work with encrypted traffic. Since payload inspection is impossible, classification relies on metadata: IP addresses, port numbers, and increasingly, encryption details like TLS SNI or QUIC connection IDs. This is a significant shift from traditional deep packet inspection approaches.
Real-world QoS implementations provide practical insights.
Case Study 1: Enterprise VoIP Deployment
A 5,000-employee enterprise deploying VoIP with these requirements:
- Up to 1,000 concurrent calls across the WAN
- Toll-quality audio: under 150 ms latency, under 30 ms jitter, under 1% loss
Solution:
Bandwidth Calculation:
Voice: 1000 calls × 80 Kbps = 80 Mbps
Voice + overhead: 100 Mbps
WAN Link: 1 Gbps
Voice allocation: 10% (priority)
Policy:
1. Phones on dedicated Voice VLAN (trust DSCP)
2. Voice marked EF (DSCP 46)
3. Call signaling marked CS3 (DSCP 24)
4. LLQ policy: priority 100 Mbps for EF
5. CAC (Call Admission Control): reject new calls beyond 90% of the voice allocation
Case Study 2: Multi-Tenant Data Center
Data center with 100 tenants, each needing guaranteed bandwidth:
Challenges:
- Tenants don't trust each other
- Variable traffic patterns
- Fair burst sharing
- Predictable performance
Solution: Hierarchical Token Bucket (HTB)
Root: 100 Gbps data center uplink
├── Tenant-A: rate 10G, ceil 15G
│ ├── Web: rate 5G, ceil 10G
│ ├── Database: rate 3G, ceil 5G
│ └── Backup: rate 2G, ceil 15G
├── Tenant-B: rate 8G, ceil 12G
│ └── ...
└── Shared/Burst: rate 0, ceil 100G
Key Features:
- Each tenant gets guaranteed minimum
- Can burst above minimum if capacity available
- Hierarchical: tenant quota, then internal division
- Isolation: Tenant A's burst doesn't affect Tenant B's guarantee
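The borrowing behavior at one level of the HTB tree can be sketched as a two-pass allocation: guarantees first, then spare capacity lent out up to each class's ceil. (Real HTB lends quantum by quantum at dequeue time; this greedy one-pass version, with my own function and parameter names, only illustrates the rate/ceil semantics.)

```python
def htb_allocate(link: float, tenants: dict[str, tuple[float, float]],
                 demand: dict[str, float]) -> dict[str, float]:
    """One level of HTB-style sharing: each class first receives
    min(demand, rate); leftover capacity is then lent out, capped by ceil."""
    # Pass 1: satisfy guaranteed rates.
    alloc = {t: min(demand[t], rate) for t, (rate, ceil) in tenants.items()}
    spare = link - sum(alloc.values())
    # Pass 2: lend spare capacity up to each class's ceil.
    for t, (rate, ceil) in tenants.items():
        extra = min(spare, max(0.0, min(demand[t], ceil) - alloc[t]))
        alloc[t] += extra
        spare -= extra
    return alloc

# Tenant-A guaranteed 10G (ceil 15G), Tenant-B 8G (ceil 12G) on a 20G link.
tenants = {"A": (10.0, 15.0), "B": (8.0, 12.0)}
alloc = htb_allocate(20.0, tenants, demand={"A": 15.0, "B": 4.0})
# A borrows up to its ceil because B is only using 4G of its guarantee.
```

Tenant A ends up with 15 Gbps (its 10 Gbps guarantee plus 5 Gbps borrowed, capped by its ceil), while Tenant B's 8 Gbps guarantee remains available the moment its demand rises, which is the isolation property the case study relies on.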
Case Study 3: Global Video Conferencing
Global enterprise with video conferencing across continents:
Challenges:
- 200ms+ intercontinental latency
- Variable internet path quality
- Competing with video streaming, backups
- Different endpoint capabilities
Solution: Multi-layer QoS + Adaptive
1. End-to-end DSCP marking (AF41)
2. SD-WAN with dual internet + MPLS
3. Path selection: lowest-latency for video
4. FEC (10% redundancy) for packet loss compensation
5. Adaptive bitrate: 720p default, drops to 360p if congestion
6. Jitter buffer: 80ms at endpoints
Results:
- 99.5% of calls achieve 'good' quality
- Fallback to audio-only if path degraded
- Automatic failover between paths <2 seconds
Every QoS implementation should include monitoring. Track per-class throughput, drop rates, queue depths, and latency. Without measurement, you're flying blind—you won't know if QoS is helping until users complain.
We've comprehensively explored Quality of Service implementation—the culmination of traffic shaping concepts applied to deliver differentiated, predictable network behavior.
Module Complete:
You've now completed the Traffic Shaping module. You understand the foundational concepts, the leaky bucket and token bucket algorithms, rate limiting practices, and how these combine into comprehensive QoS implementations. This knowledge is directly applicable to network design, troubleshooting, and system architecture at any scale.
Congratulations! You've mastered Traffic Shaping from concepts through implementation. You understand leaky bucket, token bucket, rate limiting, and QoS—the mechanisms that make modern networks predictable, fair, and reliable. These skills are essential for network engineering, distributed systems design, and any role involving network-dependent applications.