Computer Networks: RTP and RTCP

Real-time Transport Protocol and Control Protocol

Level: Intermediate

Duration: 60 mins

Topic: RTP and RTCP

RTCP Feedback: The Control Plane of Real-time Media

The Hidden Intelligence Behind Smooth Video Calls

RTP moves media packets across networks, but media transport alone isn't sufficient for high-quality real-time communication. Consider these questions that RTP alone cannot answer:

- How does a sender know if receivers are experiencing packet loss?
- How do audio and video streams from the same source stay synchronized?
- How do participants know who else is in the session?
- How can senders adapt their bitrate to network conditions?
- How do applications report viewing statistics and quality metrics?

The Real-time Transport Control Protocol (RTCP), defined alongside RTP in RFC 3550, provides these essential control functions. While RTP is the 'data plane' moving media, RTCP is the 'control plane' providing feedback, synchronization, and session management.

This page explores RTCP's architecture, packet types, and mechanisms that enable modern real-time applications to deliver consistent quality despite network variability.

What You Will Learn

By the end of this page, you will understand RTCP's role in RTP sessions, the five standard packet types (SR, RR, SDES, BYE, APP), how senders and receivers use RTCP for feedback and synchronization, and modern extensions like RTCP-based congestion control.

RTCP Overview and Purpose

RTCP works alongside RTP to provide control and statistics for RTP media flows. It carries this information in packets separate from the media itself. RTP packets carry media data at high frequency (typically 50 packets/second for audio sent in 20 ms packets). RTCP packets are sent much less frequently, typically every few seconds, to minimize bandwidth overhead.

RTCP's primary functions:

1. Quality feedback: Receivers report packet loss, jitter, and delay statistics to senders
2. Synchronization: Provides mapping between RTP timestamps and wall-clock time
3. Identification: Associates SSRCs with human-readable names and metadata
4. Session management: Signals participant arrivals and departures
5. Application-specific data: Extensible mechanism for custom control information

RTP vs RTCP Comparison
Characteristic	RTP	RTCP
Purpose	Carry media data	Control and feedback
Typical frequency	50-100 packets/second	Every few seconds (5 s nominal minimum, randomized)
Bandwidth target	As needed by media	~5% of media bandwidth
Loss tolerance	Application-managed	Tolerated: reports are periodic with cumulative counters, so a lost report is superseded by the next
Direction	Usually one-way per stream	Bidirectional
Port (traditional)	Even port (e.g., 5004)	Adjacent odd port (5005)
Port (rtcp-mux, RFC 5761)	Shared port	Same port as RTP

RTCP Bandwidth Limiting

RTCP limits itself to approximately 5% of the session's media bandwidth. In a 1 Mbps video session, the RTCP budget is about 50 kbps, shared by all participants. Each participant lengthens its reporting interval as membership grows. As a result, control overhead stays within this budget regardless of how many participants join the session.

Compound RTCP packets:

RTCP packets are always sent in "compound" form: multiple RTCP packets stacked into a single UDP datagram. RFC 3550 mandates specific ordering:

1. SR or RR first: Sender Report if the participant is sending, Receiver Report otherwise
2. SDES second: Source Description (at minimum containing CNAME)
3. Other packets: APP or extension packets as needed, with BYE (if present) last

This compound structure ensures every RTCP transmission provides basic statistics and identification. That holds even when the primary purpose is something else, such as signaling departure via BYE.
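The sketch below makes the ordering rules concrete. It walks a received compound datagram using each packet's length field and reports any rule violations. It assumes an unencrypted datagram, and the function names are hypothetical. The SDES check only confirms that an SDES packet is present, not that it carries a CNAME.

```typescript
// RTCP packet type numbers from RFC 3550
const RTCP_SR = 200;
const RTCP_RR = 201;
const RTCP_SDES = 202;
const RTCP_BYE = 203;

// Walk a compound RTCP datagram and return each packet's type, in order
function listRtcpTypes(datagram: Uint8Array): number[] {
    const view = new DataView(datagram.buffer, datagram.byteOffset, datagram.byteLength);
    const types: number[] = [];
    let offset = 0;
    while (offset < datagram.byteLength) {
        if (offset + 4 > datagram.byteLength) throw new Error("trailing bytes after last RTCP packet");
        if ((view.getUint8(offset) >> 6) !== 2) throw new Error(`bad RTCP version at offset ${offset}`);
        types.push(view.getUint8(offset + 1));
        // Length field = packet length in 32-bit words minus one
        const byteLength = (view.getUint16(offset + 2) + 1) * 4;
        if (offset + byteLength > datagram.byteLength) throw new Error("truncated RTCP packet");
        offset += byteLength;
    }
    return types;
}

// Check the RFC 3550 compound rules; returns a list of problems (empty if none)
function validateCompound(datagram: Uint8Array): string[] {
    const types = listRtcpTypes(datagram);
    if (types.length === 0) return ["empty datagram"];
    const problems: string[] = [];
    if (types[0] !== RTCP_SR && types[0] !== RTCP_RR) problems.push("first packet must be SR or RR");
    if (!types.includes(RTCP_SDES)) problems.push("missing SDES packet");  // CNAME contents not inspected
    const byeIndex = types.indexOf(RTCP_BYE);
    if (byeIndex !== -1 && byeIndex !== types.length - 1) problems.push("BYE should be the last packet");
    return problems;
}
```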

Sender Report (SR)

The Sender Report (SR) is sent by participants who are actively sending RTP data. It contains two critical sections: sender information and reception reports for any streams this sender is receiving.

SR structure and fields:

RTCP Sender Report Structure

Packet Layout

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|    RC   |   PT=200      |             Length            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         SSRC of sender                        |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|              NTP timestamp (most significant word)            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              NTP timestamp (least significant word)           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         RTP timestamp                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     sender's packet count                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      sender's octet count                     |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                         Reception Report Blocks...            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
Field Descriptions:
V (2 bits)    : Version (always 2)
P (1 bit)     : Padding flag
RC (5 bits)   : Reception report count (0-31)
PT (8 bits)   : Packet type (200 for SR)
Length (16)   : Length in 32-bit words minus one
SSRC (32)     : Sender's SSRC identifier
NTP TS (64)   : Wall-clock time when report was sent
RTP TS (32)   : RTP timestamp corresponding to NTP time
Packet count  : Total RTP packets sent
Octet count   : Total payload bytes sent
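Every field sits at a fixed offset in big-endian (network) byte order, so reading an SR is a matter of `DataView` accesses. The sketch below assumes the buffer begins with an SR and leaves the reception report blocks unparsed. `parseSenderReport` is a hypothetical name.

```typescript
interface ParsedSenderReport {
    ssrc: number;
    ntpTimestamp: bigint;   // 64-bit NTP: seconds in the upper 32 bits, fraction in the lower 32
    rtpTimestamp: number;
    packetCount: number;
    octetCount: number;
    reportCount: number;    // RC: number of 24-byte reception report blocks that follow
}

function parseSenderReport(buf: Uint8Array): ParsedSenderReport {
    // Header (4) + SSRC (4) + NTP (8) + RTP TS (4) + packet count (4) + octet count (4)
    if (buf.byteLength < 28) throw new Error("SR needs at least 28 bytes");
    const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
    const first = view.getUint8(0);
    if ((first >> 6) !== 2) throw new Error("not version 2");
    if (view.getUint8(1) !== 200) throw new Error("not a Sender Report (PT=200)");
    const reportCount = first & 0x1f;
    const lengthBytes = (view.getUint16(2) + 1) * 4;  // length field is words minus one
    if (lengthBytes > buf.byteLength || lengthBytes < 28 + 24 * reportCount) {
        throw new Error("inconsistent SR length");
    }
    return {
        ssrc: view.getUint32(4),
        ntpTimestamp: view.getBigUint64(8),
        rtpTimestamp: view.getUint32(16),
        packetCount: view.getUint32(20),
        octetCount: view.getUint32(24),
        reportCount,
    };
}
```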

The NTP/RTP timestamp pair:

The most critical information in an SR is the relationship between the NTP timestamp (wall-clock time) and the RTP timestamp at that moment. This mapping is essential for:

1. Lip sync: Synchronizing audio and video streams from the same source
2. Inter-stream sync: Aligning streams from different sources in the same conference
3. Playout timing: Converting RTP timestamps to actual playback times

Without this mapping, a receiver knows only the relative timing of packets within a single stream, not their absolute timing.

Lip-sync Using SR Timestamps
TypeScript
interface SenderReport {
    ssrc: number;
    ntpTimestamp: bigint;  // 64-bit NTP timestamp (32.32 fixed point)
    rtpTimestamp: number;  // 32-bit RTP timestamp sampled at the same instant
    packetCount: number;
    octetCount: number;
}

// Per-stream mapping taken from the most recent SR for that SSRC
interface StreamSync {
    ntpTimestamp: bigint;
    rtpTimestamp: number;
    clockRate: number;  // e.g., 48000 for audio, 90000 for video
}

const NTP_UNITS_PER_SECOND = 4294967296;  // 2^32 fractional units per second

// Signed difference of two 32-bit RTP timestamps, correct across wraparound
// (valid while the two timestamps are less than 2^31 ticks apart)
function rtpTimestampDiff(a: number, b: number): number {
    return (a - b) | 0;
}

// Map an RTP timestamp to NTP wall-clock time using the stream's latest SR
function rtpToNtp(sync: StreamSync, rtpTs: number): bigint {
    const seconds = rtpTimestampDiff(rtpTs, sync.rtpTimestamp) / sync.clockRate;
    return sync.ntpTimestamp + BigInt(Math.round(seconds * NTP_UNITS_PER_SECOND));
}

function calculateSyncOffset(
    audioSync: StreamSync,
    videoSync: StreamSync,
    videoRtpTimestamp: number
): { expectedAudioRtpTs: number; offsetMs: number } {
    // Convert the video RTP timestamp to wall-clock (NTP) time
    const videoNtpTime = rtpToNtp(videoSync, videoRtpTimestamp);

    // Express that same instant on the audio stream's RTP clock
    const audioNtpSeconds = Number(videoNtpTime - audioSync.ntpTimestamp) / NTP_UNITS_PER_SECOND;
    const audioRtpOffset = Math.round(audioNtpSeconds * audioSync.clockRate);
    const expectedAudioRtpTs = (audioSync.rtpTimestamp + audioRtpOffset) >>> 0;  // wrap to 32 bits

    // offsetMs: distance of the video frame's capture time from the audio SR's reference instant
    return { expectedAudioRtpTs, offsetMs: audioNtpSeconds * 1000 };
}

// Example: play the video frame together with the audio sample at expectedAudioRtpTs
function synchronizePlayback(videoFrame: { rtpTs: number }, audioSync: StreamSync, videoSync: StreamSync) {
    const { expectedAudioRtpTs, offsetMs } = calculateSyncOffset(audioSync, videoSync, videoFrame.rtpTs);
    console.log(`Video frame aligns with audio RTP timestamp ${expectedAudioRtpTs} (${offsetMs.toFixed(1)} ms from the audio sync point)`);
}

NTP Timestamp Format

NTP timestamps are 64 bits. The upper 32 bits are seconds since January 1, 1900, and the lower 32 bits are a binary fraction of a second, in units of 1/2³² s (about 233 picoseconds). This resolution far exceeds what media synchronization requires.
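The conversions below are a small sketch with hypothetical helper names. They assume timestamps fall in the current NTP era; the 32-bit seconds field wraps in 2036, which this code ignores. `ntpMiddle32` extracts the "middle 32 bits" (low 16 bits of seconds plus high 16 bits of fraction) that reception reports use for LSR.

```typescript
// Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01)
const NTP_UNIX_OFFSET_SECONDS = 2208988800;
const TWO_POW_32 = 4294967296;

// Build a 64-bit NTP timestamp from a non-negative Unix time in milliseconds
function unixMsToNtp(unixMs: number): bigint {
    const seconds = Math.floor(unixMs / 1000) + NTP_UNIX_OFFSET_SECONDS;
    const fraction = Math.round(((unixMs % 1000) / 1000) * TWO_POW_32);
    // Addition (not OR) lets a rounded-up fraction carry into the seconds field
    return (BigInt(seconds) << BigInt(32)) + BigInt(fraction);
}

// Convert a 64-bit NTP timestamp back to Unix milliseconds
function ntpToUnixMs(ntp: bigint): number {
    const seconds = Number(ntp >> BigInt(32));
    const fraction = Number(ntp & BigInt(0xffffffff));
    return (seconds - NTP_UNIX_OFFSET_SECONDS) * 1000 + (fraction / TWO_POW_32) * 1000;
}

// Middle 32 bits, in units of 1/65536 s, as carried in LSR
function ntpMiddle32(ntp: bigint): number {
    return Number((ntp >> BigInt(16)) & BigInt(0xffffffff));
}
```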

Receiver Report (RR) and Reception Blocks

The Receiver Report (RR, packet type 201) is sent by participants who are receiving but not sending RTP data. Its structure is similar to SR but without the sender information section: it contains only the reporter's SSRC and the reception report blocks.

Both SR and RR contain Reception Report Blocks, one for each source the participant is receiving from. These blocks provide crucial feedback about reception quality:

Reception Report Block Structure

Block Layout

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 SSRC of source being reported                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| fraction lost |       cumulative number of packets lost       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           extended highest sequence number received           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      interarrival jitter                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         last SR (LSR)                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   delay since last SR (DLSR)                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
Field Descriptions:
SSRC source   : Which sender this report is about
Fraction lost : Fraction lost since last report (8-bit fixed point, lost/expected × 256)
Cumulative    : Total packets lost since session start (24-bit signed)
Highest seq   : Extended sequence number (low 16 bits: highest seq; high 16 bits: wrap count)
Jitter        : Interarrival jitter in timestamp units
LSR           : Middle 32 bits of last SR's NTP timestamp
DLSR          : Delay from receiving that SR to sending this report (units of 1/65536 s)
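A receiver computes these fields per source as packets arrive. The sketch below follows the reference logic in RFC 3550 Appendix A.3 (expected and lost packets) and A.8 (jitter) in simplified form. It omits source validation and large sequence-jump handling, and it assumes RTP timestamps and arrival times do not wrap during the measurement. The class and method names are hypothetical.

```typescript
class ReceptionStats {
    private baseSeq = -1;          // first sequence number seen (-1: nothing received yet)
    private maxSeq = 0;            // highest 16-bit sequence number seen
    private cycles = 0;            // sequence-number wraps, in units of 65536
    private received = 0;
    private expectedPrior = 0;
    private receivedPrior = 0;
    private lastTransit: number | null = null;
    private jitter = 0;            // RTP timestamp units

    // arrivalRtpUnits: local arrival time expressed on the stream's RTP clock
    onPacket(seq: number, rtpTimestamp: number, arrivalRtpUnits: number): void {
        if (this.baseSeq < 0) {
            this.baseSeq = seq;
            this.maxSeq = seq;
        } else if (seq < this.maxSeq && this.maxSeq - seq > 0x8000) {
            this.cycles += 0x10000;    // wrapped from 65535 back to 0
            this.maxSeq = seq;
        } else if (seq > this.maxSeq && seq - this.maxSeq < 0x8000) {
            this.maxSeq = seq;         // in-order advance; late or reordered packets leave maxSeq alone
        }
        this.received++;

        // Jitter: smoothed mean deviation of the transit-time difference, gain 1/16
        const transit = arrivalRtpUnits - rtpTimestamp;
        if (this.lastTransit !== null) {
            const d = Math.abs(transit - this.lastTransit);
            this.jitter += (d - this.jitter) / 16;
        }
        this.lastTransit = transit;
    }

    extendedHighestSeq(): number {
        return this.cycles + this.maxSeq;
    }

    private expected(): number {
        return this.baseSeq < 0 ? 0 : this.extendedHighestSeq() - this.baseSeq + 1;
    }

    // Expected minus received, clamped to the 24-bit signed field (negative if duplicates arrived)
    cumulativeLost(): number {
        const lost = this.expected() - this.received;
        return Math.max(-0x800000, Math.min(0x7fffff, lost));
    }

    // Call once per report: loss since the previous report as 8-bit fixed point (lost / expected * 256)
    fractionLost(): number {
        const expected = this.expected();
        const expectedInterval = expected - this.expectedPrior;
        const receivedInterval = this.received - this.receivedPrior;
        this.expectedPrior = expected;
        this.receivedPrior = this.received;
        const lostInterval = expectedInterval - receivedInterval;
        if (expectedInterval <= 0 || lostInterval <= 0) return 0;
        return Math.min(255, Math.floor((lostInterval * 256) / expectedInterval));
    }

    // Value placed in the report block's 32-bit jitter field
    jitterEstimate(): number {
        return Math.floor(this.jitter);
    }
}
```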

Understanding the feedback metrics:

Fraction Lost: An 8-bit fixed-point value giving the fraction of packets lost since the last report, scaled by 256. A value of 0 means no loss. A value of 255 means about 99.6% loss, because exactly 100% cannot be represented. This gives recent loss feedback for adaptive bitrate algorithms.

Cumulative Lost: A 24-bit signed value tracking total packets lost since the session began, computed as packets expected minus packets received. Negative values are possible if packets are duplicated. This provides an overall indication of session quality.

Interarrival Jitter: A smoothed estimate of the mean deviation in packet transit time, in RTP timestamp units. Higher values indicate less stable network conditions. Consider consecutive packets i-1 and i, with RTP timestamps S and arrival times R converted to RTP units. Receivers compute D(i-1, i) = (R_i - R_{i-1}) - (S_i - S_{i-1}) and apply exponential smoothing: J(i) = J(i-1) + (|D(i-1, i)| - J(i-1)) / 16

Round-Trip Time calculation:

The LSR and DLSR fields enable senders to calculate round-trip time to receivers:

Round-Trip Time Calculation

RTT Calculation Using RTCP:
 
Timeline:
t1: Sender sends SR with NTP timestamp = T1
t2: Receiver receives SR, notes LSR = middle 32 bits of T1
t3: Receiver sends RR with:
    - LSR = middle 32 bits of T1
    - DLSR = (t3 - t2) in 1/65536 second units
t4: Sender receives RR at NTP time T4
 
Calculation at sender (all values in units of 1/65536 second):
RTT = A - LSR - DLSR, where A = middle 32 bits of T4
    = (RR arrival time) - (time SR was sent) - (time the receiver held the SR before replying)
 
Example:
- Sender sent SR at NTP 0xB1B2C3C4.D5D6E7E8
- Receiver sends RR with LSR = 0xC3C4D5D6 (middle 32 bits)
- Receiver waited 50ms between receiving the SR and sending the RR: DLSR = 50ms × 65.536 ≈ 3277
- Sender receives RR at NTP 0xB1B2C3C5.15D6E7E8, so A = 0xC3C515D6
- RTT = (0xC3C515D6 - 0xC3C4D5D6 - 3277) / 65536 = (16384 - 3277) / 65536 ≈ 0.200 s = 200ms
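Here is the same arithmetic in code, as a sketch that takes the three 32-bit values directly. `rttFromReportBlock` is a hypothetical helper. Subtraction is done modulo 2^32 so the calculation survives the 16-bit seconds field wrapping.

```typescript
// RTT from a reception report block (RFC 3550 Section 6.4.1).
// All inputs are "middle 32 bits" NTP values, i.e. units of 1/65536 s.
// Returns milliseconds, or null when no RTT can be computed.
function rttFromReportBlock(arrivalMiddle32: number, lsr: number, dlsr: number): number | null {
    if (lsr === 0) return null;  // LSR = 0 means the receiver has not yet seen an SR from us
    // 32-bit modular subtraction, read as signed so clock glitches show up as negative
    const rttUnits = (arrivalMiddle32 - lsr - dlsr) | 0;
    if (rttUnits < 0) return null;
    return (rttUnits / 65536) * 1000;
}
```

LSR and DLSR both travel in the report, so the sender needs only its own clock. The receiver's clock never has to be synchronized with the sender's.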

Loss Detection Limitations

Reception report loss statistics have inherent delay—they're sent every few seconds and reflect the previous reporting interval. For real-time congestion control, modern systems use RTCP extensions like Transport-wide Congestion Control (TWCC) that provide per-packet feedback at higher frequency.

SDES, BYE, and APP Packets

Beyond SR and RR, RTCP defines three additional standard packet types for session management and extensibility.

Source Description (SDES) - Packet Type 202

SDES packets provide human-readable information about RTP sources. They contain one or more "chunks," each describing a source by its SSRC and a list of descriptive items:

SDES Item Types
Code	Name	Description	Required
1	CNAME	Canonical endpoint identifier—must be unique, persistent	Yes, always
2	NAME	Human-readable name (e.g., "Alice Smith")	No
3	EMAIL	Email address	No
4	PHONE	Phone number	No
5	LOC	Geographic location	No
6	TOOL	Application name and version	No
7	NOTE	Transient message (e.g., "Away from keyboard")	No
8	PRIV	Private extension items	No

CNAME importance:

The CNAME is the most critical SDES item. An SSRC is randomly generated and can change, but the CNAME is a persistent identifier for an endpoint. Every stream from one endpoint carries the same CNAME. RFC 3550 suggests the form "user@host", while RFC 7022 and WebRTC implementations favor randomly generated CNAMEs for privacy. The CNAME serves to:

- Associate multiple SSRCs from the same endpoint (audio + video)
- Survive SSRC changes due to collisions or reconnections
- Enable session accounting and logging
- Correlate streams for synchronization

SDES Packet Example

Packet Content

SDES Packet containing two sources:
 
Chunk 1 (Video source):
  SSRC: 0x12345678
  CNAME: alice@webrtc.example.com
  NAME: Alice's Camera
  TOOL: Chrome/120.0.0.0
 
Chunk 2 (Audio source):
  SSRC: 0xABCDEF00
  CNAME: alice@webrtc.example.com
  NAME: Alice's Microphone
  TOOL: Chrome/120.0.0.0
 
The receiver sees the identical CNAME on both SSRCs, so it
knows both streams come from the same participant and can use
SR timestamps from both for lip-sync.
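A receiver can keep this grouping in a small index keyed by CNAME. The sketch below assumes SDES chunks have already been parsed into (SSRC, CNAME) pairs. `ParticipantIndex` and its methods are hypothetical, and matching is on the exact CNAME string.

```typescript
class ParticipantIndex {
    private byCname = new Map<string, Set<number>>();
    private cnameOf = new Map<number, string>();

    // Record the CNAME from an SDES chunk (an SSRC re-announcing a new CNAME moves groups)
    addCname(ssrc: number, cname: string): void {
        const previous = this.cnameOf.get(ssrc);
        if (previous !== undefined && previous !== cname) {
            this.byCname.get(previous)?.delete(ssrc);
        }
        this.cnameOf.set(ssrc, cname);
        let group = this.byCname.get(cname);
        if (!group) {
            group = new Set<number>();
            this.byCname.set(cname, group);
        }
        group.add(ssrc);
    }

    // Forget an SSRC, e.g. after a BYE or a timeout
    remove(ssrc: number): void {
        const cname = this.cnameOf.get(ssrc);
        if (cname === undefined) return;
        this.cnameOf.delete(ssrc);
        this.byCname.get(cname)?.delete(ssrc);
    }

    // All SSRCs that share this SSRC's CNAME, i.e. candidates for lip-sync
    syncGroup(ssrc: number): number[] {
        const cname = this.cnameOf.get(ssrc);
        if (cname === undefined) return [];
        return Array.from(this.byCname.get(cname) ?? []);
    }
}
```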

BYE Packet - Packet Type 203

Sent when a participant leaves the session or stops a stream. Contains a list of SSRCs being shut down and an optional reason string. Receivers should stop expecting packets from these SSRCs and may free associated resources.

APP Packet - Packet Type 204

Provides an application-specific extension mechanism. Contains a 5-bit subtype, a 4-character ASCII name identifying the application, and arbitrary application data. Useful for proprietary extensions but not portable across different implementations.
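A BYE packet is simple to build. The header's count field holds the number of SSRCs, and the optional reason is a length-prefixed string padded to a 32-bit boundary. The sketch below uses a hypothetical `buildBye`. In a real session this packet would be appended after an SR or RR and an SDES in a compound packet.

```typescript
function buildBye(ssrcs: number[], reason?: string): Uint8Array {
    if (ssrcs.length === 0 || ssrcs.length > 31) throw new RangeError("BYE carries 1-31 SSRCs");
    const reasonBytes = reason ? new TextEncoder().encode(reason) : new Uint8Array(0);
    if (reasonBytes.length > 255) throw new RangeError("reason is limited to 255 octets");
    const reasonLen = reason ? 1 + reasonBytes.length : 0;  // length octet + text
    const total = Math.ceil((4 + 4 * ssrcs.length + reasonLen) / 4) * 4;
    const buf = new Uint8Array(total);  // zero-filled, so padding is already null octets
    const view = new DataView(buf.buffer);
    view.setUint8(0, 0x80 | ssrcs.length);  // V=2, P=0, SC=count
    view.setUint8(1, 203);                  // PT=BYE
    view.setUint16(2, total / 4 - 1);       // length in 32-bit words minus one
    ssrcs.forEach((ssrc, i) => view.setUint32(4 + 4 * i, ssrc >>> 0));
    if (reason) {
        const at = 4 + 4 * ssrcs.length;
        buf[at] = reasonBytes.length;
        buf.set(reasonBytes, at + 1);
    }
    return buf;
}
```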

Graceful Departure

Always send BYE packets when leaving a session. Without BYE, other participants must wait for timeout (typically 30+ seconds) before concluding a source is gone. BYE enables immediate cleanup and accurate participant counts.

APP Limitations

APP packets are vendor-specific and not standardized. For interoperable extensions, prefer RTCP XR (Extended Reports, RFC 3611) or defined header extensions. APP is best for closed systems where all endpoints are under your control.

RTCP Timing and Bandwidth Control

RTCP uses a sophisticated algorithm to determine when participants should send reports. This algorithm keeps RTCP from consuming excessive bandwidth, even as session size scales from 2 to thousands of participants.

The 5% bandwidth rule:

RTCP bandwidth is typically configured to 5% of the session's media bandwidth. For a 2 Mbps video call, RTCP gets 100 kbps. When senders make up at most a quarter of the members, this bandwidth is split:

- 75% allocated to receivers (RRs)
- 25% allocated to senders (SRs)

Otherwise, all members share the full RTCP bandwidth equally.

Reporting interval calculation:

The interval between RTCP transmissions is calculated based on:
1. Available RTCP bandwidth
2. Average RTCP packet size
3. Number of participants
4. Whether the participant is a sender or receiver-only

RTCP Interval Calculation

Algorithm

RTCP Transmission Interval Algorithm (RFC 3550, Section 6.3.1):
 
Given:
  - rtcp_bw: RTCP bandwidth in bytes/second
  - avg_rtcp_size: Average compound RTCP packet size (bytes)
  - members: Number of members in session
  - senders: Number of members that sent RTP recently
  - we_sent: True if we've sent RTP recently
 
Calculate base interval:
  n = members
  if senders <= members * 0.25:
    if we_sent:
      rtcp_bw = rtcp_bw * 0.25   // Senders share 25%
      n = senders
    else:
      rtcp_bw = rtcp_bw * 0.75   // Receivers share 75%
      n = members - senders
    
  base_interval = avg_rtcp_size * n / rtcp_bw
 
Apply minimum and randomization:
  // T_min = 5 seconds (2.5 s before our first report); an optional reduced
  // minimum of 360 / (session bandwidth in kbit/s) seconds may be used instead
  interval = max(base_interval, T_min)
  
  // Randomize 0.5x to 1.5x to prevent synchronized reports
  interval = interval * random(0.5, 1.5)
  
  // Divide by e - 3/2 to compensate for timer reconsideration,
  // which otherwise converges below the target RTCP bandwidth
  interval = interval / 1.21828
 
Example: 50 participants (2 of them senders), 2 Mbps video, 100-byte average RTCP
  rtcp_bw = 2,000,000 bits/s * 0.05 = 100,000 bits/s = 12,500 bytes/sec
  For a receiver: rtcp_bw = 12,500 * 0.75 = 9,375 bytes/sec, n = 50 - 2 = 48
  base = 100 * 48 / 9,375 = 0.512 sec
  After min: max(0.512, 5) = 5 seconds
  After random: 5 * random(0.5, 1.5) = 2.5 to 7.5 seconds
  After compensation: ≈ 2.05 to 6.16 seconds
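The pseudocode translates almost line for line into code. This sketch keeps the same simplifications (no timer reconsideration, no membership tracking). `rtcpIntervalSeconds` and `IntervalInputs` are hypothetical names, and `random` is injectable so the result can be checked deterministically.

```typescript
interface IntervalInputs {
    members: number;            // participants, including ourselves
    senders: number;            // participants that sent RTP recently
    rtcpBwBytesPerSec: number;  // RTCP share of the session bandwidth
    avgRtcpSize: number;        // average compound RTCP packet size in bytes
    weSent: boolean;            // did we send RTP recently?
    initial: boolean;           // true until our first RTCP packet is sent
}

const RTCP_MIN_TIME_S = 5;
const COMPENSATION = Math.E - 1.5;  // ≈ 1.21828

function rtcpIntervalSeconds(p: IntervalInputs, random: () => number = Math.random): number {
    const tMin = p.initial ? RTCP_MIN_TIME_S / 2 : RTCP_MIN_TIME_S;
    let n = p.members;
    let bw = p.rtcpBwBytesPerSec;
    // Split 25% / 75% only when senders are at most a quarter of the members
    if (p.senders <= p.members * 0.25) {
        if (p.weSent) {
            bw *= 0.25;
            n = p.senders;
        } else {
            bw *= 0.75;
            n = p.members - p.senders;
        }
    }
    const td = Math.max(tMin, (p.avgRtcpSize * n) / bw);
    // random() is in [0, 1), so the factor is in [0.5, 1.5)
    return (td * (random() + 0.5)) / COMPENSATION;
}
```

With `random` fixed at 0.5, the 50-participant example gives 5 / 1.21828, or about 4.1 s. Now consider 1,000 members with one sender, sharing 5% of a 64 kbit/s audio session (400 bytes/s). A receiver's interval then comes to about 273 s, which is why very large sessions see minutes-long intervals.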

RTCP Congestion Implications

In very large sessions (thousands of participants), RTCP intervals can stretch to minutes. This means loss and jitter feedback may be extremely delayed. Large-scale streaming usually limits active RTCP participants or uses alternative feedback mechanisms.

Reduced-size RTCP:

RFC 5506 introduced "reduced-size RTCP," allowing RTCP packets that don't follow the compound packet rules. Instead of always including SR/RR and SDES, systems can send individual feedback packets when only specific feedback is needed. This is crucial for modern congestion control, where feedback must be sent frequently but shouldn't waste bandwidth on redundant information.

Modern RTCP Extensions

The original RTCP specification focused on basic quality reporting. Modern real-time applications require more sophisticated feedback for congestion control, error recovery, and adaptive streaming. Several key extensions have been standardized:

RTCP Feedback (RFC 4585 - "AVPF")

This profile extends RTCP to support immediate feedback without waiting for the regular RTCP interval. Key feedback types include:

RTCP Feedback Message Types
Type	Name	Purpose	Use Case
NACK	Generic Negative Acknowledgment	Request retransmission of specific lost packets	Packet-level loss repair via retransmission
PLI	Picture Loss Indication	Request a new key frame (decoder lost picture data)	Decoder resynchronization after loss
SLI	Slice Loss Indication	Report loss of specific macroblocks	Partial video recovery
RPSI	Reference Picture Selection Indication	Request specific reference frame	Video error resilience
FIR	Full Intra Request (RFC 5104)	Force key frame immediately	Recording, new participant join
TMMBR	Temporary Maximum Media Stream Bit Rate Request (RFC 5104)	Request sender to limit bitrate	Bandwidth adaptation
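PLI and generic NACK are the two feedback messages receivers send most often. Both use the common RFC 4585 feedback header (sender SSRC, then media source SSRC), followed by feedback control information (FCI). In the sketch below the function names are hypothetical. Each NACK entry packs a packet ID (PID) plus a 16-bit bitmask (BLP) of the following lost packets; sequence-number wraparound is not merged into one entry here.

```typescript
const RTPFB = 205;  // transport-layer feedback
const PSFB = 206;   // payload-specific feedback

// Common RFC 4585 header: V=2, P=0, FMT, PT, length, sender SSRC, media source SSRC.
// The FCI starts at offset 12.
function feedbackPacket(fmt: number, pt: number, fciBytes: number, senderSsrc: number, mediaSsrc: number): Uint8Array {
    const buf = new Uint8Array(12 + fciBytes);
    const view = new DataView(buf.buffer);
    view.setUint8(0, 0x80 | (fmt & 0x1f));
    view.setUint8(1, pt);
    view.setUint16(2, buf.length / 4 - 1);  // 32-bit words minus one
    view.setUint32(4, senderSsrc >>> 0);
    view.setUint32(8, mediaSsrc >>> 0);
    return buf;
}

// PLI: PSFB with FMT=1 and no FCI
function buildPli(senderSsrc: number, mediaSsrc: number): Uint8Array {
    return feedbackPacket(1, PSFB, 0, senderSsrc, mediaSsrc);
}

// Generic NACK: RTPFB with FMT=1. Each 4-byte FCI entry is a PID plus a BLP bitmask
// in which bit i set means packet PID + i + 1 was also lost.
function buildGenericNack(senderSsrc: number, mediaSsrc: number, lostSeqs: number[]): Uint8Array {
    const sorted = Array.from(new Set(lostSeqs.map(s => s & 0xffff))).sort((a, b) => a - b);
    const entries: Array<[number, number]> = [];
    for (const seq of sorted) {
        const last = entries.length > 0 ? entries[entries.length - 1] : undefined;
        const distance = last ? seq - last[0] : 0;
        if (last && distance >= 1 && distance <= 16) {
            last[1] |= 1 << (distance - 1);
        } else {
            entries.push([seq, 0]);  // start a new PID entry
        }
    }
    const buf = feedbackPacket(1, RTPFB, 4 * entries.length, senderSsrc, mediaSsrc);
    const view = new DataView(buf.buffer);
    entries.forEach(([pid, blp], i) => {
        view.setUint16(12 + 4 * i, pid);
        view.setUint16(14 + 4 * i, blp);
    });
    return buf;
}
```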

Transport-wide Congestion Control (TWCC)

Modern WebRTC implementations use TWCC (draft-holmer-rmcat-transport-wide-cc-extensions) for precise congestion control. Instead of loss statistics aggregated over several seconds, TWCC provides per-packet acknowledgments:

- Sender includes a transport-wide sequence number in an RTP header extension
- Receiver sends RTCP feedback listing arrival times of recent packets
- Sender calculates network conditions from detailed timing information

This enables send-side bandwidth estimation algorithms, such as Google Congestion Control (GCC), to react within hundreds of milliseconds to network changes.

TWCC Feedback Cycle

TWCC Flow

Transport-wide Congestion Control (TWCC) Operation:
 
Sender side:
  1. Assign transport-wide sequence number to each RTP packet
  2. Include sequence number in RTP header extension
  3. Track send time for each sequence number
 
RTP Packets:
  [TransportSeq: 100, sent: 0ms]  ─────>
  [TransportSeq: 101, sent: 5ms]  ─────>
  [TransportSeq: 102, sent: 10ms] ─────>  (lost!)
  [TransportSeq: 103, sent: 15ms] ─────>
  [TransportSeq: 104, sent: 20ms] ─────>
 
Receiver side (every ~100ms):
  Send RTCP Transport Feedback:
    Base sequence: 100
    Packet received status (arrival times relative to packet 100):
      100: received at +0ms
      101: received at +8ms (delay +3ms vs. packet 100)
      102: NOT received
      103: received at +25ms (delay +10ms vs. packet 100)
      104: received at +28ms (delay +8ms vs. packet 100)
 
Sender analysis:
  - Packet 102 was lost
  - Relative delay trends upward (0, 3, 10, 8 ms) → queue building, congestion likely
  - Adjust sending rate accordingly
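The sender-side analysis can be sketched as comparing inter-arrival gaps with inter-departure gaps. A positive difference means the packet spent longer in queues than its predecessor. This is a simplified illustration, not GCC itself. `analyzeTwcc` is hypothetical, arrival times are assumed to be already decoded from the feedback into milliseconds, and 16-bit sequence wraparound is ignored.

```typescript
interface TwccResult {
    lost: number[];
    delayDeltas: Array<{ seq: number; deltaMs: number }>;
}

// sendTimes: transport seq -> local send time (ms)
// arrivals:  transport seq -> remote arrival time (ms, arbitrary base), or null if reported lost
function analyzeTwcc(sendTimes: Map<number, number>, arrivals: Map<number, number | null>): TwccResult {
    const lost: number[] = [];
    const delayDeltas: Array<{ seq: number; deltaMs: number }> = [];
    let prev: { send: number; arrival: number } | null = null;
    const seqs = Array.from(arrivals.keys()).sort((a, b) => a - b);
    for (const seq of seqs) {
        const send = sendTimes.get(seq);
        if (send === undefined) continue;  // not a packet we are tracking
        const arrival = arrivals.get(seq);
        if (arrival === null || arrival === undefined) {
            lost.push(seq);
            continue;
        }
        if (prev) {
            // Inter-arrival gap minus inter-departure gap: > 0 means queuing delay grew
            delayDeltas.push({ seq, deltaMs: (arrival - prev.arrival) - (send - prev.send) });
        }
        prev = { send, arrival };
    }
    return { lost, delayDeltas };
}
```

Applied to the example above, it reports packet 102 as lost and delay changes of +3, +7, and -2 ms. These sum to the 8 ms growth seen at packet 104. Production estimators such as GCC filter these values over groups of packets rather than reacting to each one.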

TWCC vs Traditional RTCP

Traditional RR/SR provides loss percentages every 5+ seconds. TWCC provides per-packet status every 50-100ms. This 50x improvement in feedback granularity enables modern congestion control algorithms to maintain quality during rapid network changes.

RTCP in Practice

Understanding RTCP conceptually is different from implementing and debugging it in real systems. Here are practical considerations for working with RTCP:

RTCP MUX (RFC 5761):

Historically, RTP used port N and RTCP used port N+1. Modern systems (especially WebRTC) multiplex RTP and RTCP on the same port. Receivers demultiplex using the second byte of the packet. RTCP packet types 192-223 occupy the same bits as an RTP marker bit plus payload types 64-95. RTP payload types in that range are therefore avoided, which lets receivers tell the two apart.

RTP/RTCP Demultiplexing
TypeScript
function demultiplexRtpRtcp(packet: Uint8Array): 'rtp' | 'rtcp' | 'unknown' {
    if (packet.length < 4) return 'unknown';  // shorter than any RTCP header
    
    const firstByte = packet[0];
    const secondByte = packet[1];
    
    // Version must be 2 for both RTP and RTCP
    const version = (firstByte >> 6) & 0x03;
    if (version !== 2) return 'unknown';
    
    // The second byte holds marker bit + 7-bit PT for RTP, or the 8-bit packet type for RTCP.
    // RFC 5761: values 192-223 are treated as RTCP. With the marker bit set, these equal
    // RTP payload types 64-95, which is why RTP must not use that range when multiplexing.
    // RTCP types: 200 (SR), 201 (RR), 202 (SDES), 203 (BYE), 204 (APP), 205+ for extensions
    if (secondByte >= 192 && secondByte <= 223) {
        return 'rtcp';
    }
    
    // Everything else is RTP, provided it can hold the 12-byte fixed RTP header.
    // Regular RTP uses 0-34 (static) or 96-127 (dynamic) payload types.
    return packet.length >= 12 ? 'rtp' : 'unknown';
}
 
// RTCP packet type constants
const RTCP_TYPE = {
    SR: 200,    // Sender Report
    RR: 201,    // Receiver Report
    SDES: 202,  // Source Description
    BYE: 203,   // Goodbye
    APP: 204,   // Application-specific
    RTPFB: 205, // Transport layer feedback (NACK, TWCC)
    PSFB: 206,  // Payload-specific feedback (PLI, FIR)
    XR: 207,    // Extended Reports
} as const;

Compound packet construction:

When building RTCP compound packets, remember these requirements:

1. SR or RR must be first (required)
2. SDES must be included (at minimum with CNAME)
3. Total compound size should be reasonable (~128-512 bytes typical)
4. Each RTCP packet's length is a whole number of 32-bit words (SDES chunks are null-padded to achieve this). The P bit and explicit padding are needed only on the last packet, typically for encryption.
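The sketch below builds the smallest valid compound packet, for a receive-only endpoint with nothing to report yet. It consists of an RR with no report blocks followed by an SDES packet carrying only the CNAME. `buildMinimalCompound` is a hypothetical name. The zero-filled buffer supplies the SDES chunk's null terminator and padding.

```typescript
function buildMinimalCompound(ssrc: number, cname: string): Uint8Array {
    const text = new TextEncoder().encode(cname);
    if (text.length > 255) throw new RangeError("CNAME is limited to 255 octets");
    // RR: 4-byte header + 4-byte SSRC, no report blocks
    const rrLen = 8;
    // SDES chunk: SSRC + item (type, length, text) + at least one null octet, rounded up to 32 bits
    const chunkLen = Math.ceil((4 + 2 + text.length + 1) / 4) * 4;
    const sdesLen = 4 + chunkLen;
    const buf = new Uint8Array(rrLen + sdesLen);  // zero-filled: terminator and padding are nulls
    const view = new DataView(buf.buffer);
    view.setUint8(0, 0x80);                 // V=2, P=0, RC=0
    view.setUint8(1, 201);                  // PT=RR
    view.setUint16(2, rrLen / 4 - 1);       // length in 32-bit words minus one
    view.setUint32(4, ssrc >>> 0);
    view.setUint8(8, 0x81);                 // V=2, P=0, SC=1
    view.setUint8(9, 202);                  // PT=SDES
    view.setUint16(10, sdesLen / 4 - 1);
    view.setUint32(12, ssrc >>> 0);
    view.setUint8(16, 1);                   // item type 1 = CNAME
    view.setUint8(17, text.length);
    buf.set(text, 18);
    return buf;
}
```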

Debugging RTCP

When debugging RTCP issues, capture packets with Wireshark—it fully decodes all RTCP fields. Common issues: CNAME not matching between streams (breaks lip-sync), incorrect timestamp calculation (wrong RTT), and RTCP not reaching peers (firewall/NAT issues).

Summary: Mastering RTCP Feedback

We've explored RTCP's architecture, packet types, timing mechanisms, and modern extensions that enable real-time communication systems to adapt to network conditions and maintain quality.

Key Takeaways

• RTCP is RTP's control plane: Provides feedback, synchronization, and session management that RTP media transport cannot.
• SR enables synchronization: NTP/RTP timestamp pairs in Sender Reports allow lip-sync and inter-stream coordination.
• Reception blocks provide feedback: Loss, jitter, and round-trip time enable senders to adapt to network conditions.
• SDES identifies participants: CNAME provides persistent identity across SSRC changes; essential for stream correlation.
• Bandwidth is automatically managed: RTCP limits itself to ~5% of session bandwidth, scaling intervals with participant count.
• Modern extensions enable real-time adaptation: AVPF allows immediate feedback messages (NACK, PLI), and TWCC provides per-packet feedback for sophisticated congestion control.

What's next:

Now that we understand RTP data transport and RTCP control, we'll explore how these protocols enable multimedia streaming: the architectures, protocols, and techniques that power everything from video conferencing to live streaming platforms.


You now understand how RTCP provides the feedback and synchronization essential for real-time media. This knowledge is critical for implementing adaptive streaming, debugging quality issues, and understanding how modern video conferencing achieves consistent quality.

3 / 5

Loading learning content...

Computer NetworksRTP and RTCP

Real-time Transport Protocol and Control Protocol

LevelIntermediate

Duration60 mins

TopicRTP and RTCP

3 / 5

RTCP Feedback: The Control Plane of Real-time Media

The Hidden Intelligence Behind Smooth Video Calls

What You Will Learn

RTCP Overview and Purpose

RTP vs RTCP Comparison
Characteristic	RTP	RTCP
Purpose	Carry media data	Control and feedback
Typical frequency	50-100 packets/second	Once every 1-5 seconds
Bandwidth target	As needed by media	~5% of media bandwidth
Loss tolerance	Application-managed	Low—important for feedback
Direction	Usually one-way per stream	Bidirectional
Port (traditional)	Even port (e.g., 5004)	Adjacent odd port (5005)
Port (BUNDLE)	Same as RTP	Same as RTP

RTCP Bandwidth Limiting

Sender Report (SR)

RTCP Sender Report Structure

Packet Layout

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|    RC   |   PT=200      |             Length            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         SSRC of sender                        |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|              NTP timestamp (most significant word)            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              NTP timestamp (least significant word)           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         RTP timestamp                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     sender's packet count                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      sender's octet count                     |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                         Reception Report Blocks...            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
Field Descriptions:
V (2 bits)    : Version (always 2)
P (1 bit)     : Padding flag
RC (5 bits)   : Reception report count (0-31)
PT (8 bits)   : Packet type (200 for SR)
Length (16)   : Length in 32-bit words minus one
SSRC (32)     : Sender's SSRC identifier
NTP TS (64)   : Wall-clock time when report was sent
RTP TS (32)   : RTP timestamp corresponding to NTP time
Packet count  : Total RTP packets sent
Octet count   : Total payload bytes sent

Lip-sync Using SR Timestamps
TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
interface SenderReport {
    ssrc: number;
    ntpTimestamp: bigint;  // 64-bit NTP timestamp
    rtpTimestamp: number;  // 32-bit RTP timestamp
    packetCount: number;
    octetCount: number;
}
 
interface StreamSync {
    ntpTimestamp: bigint;
    rtpTimestamp: number;
    clockRate: number;  // e.g., 48000 for audio, 90000 for video
}
 
function calculateSyncOffset(
    audioSync: StreamSync,
    videoSync: StreamSync,
    videoRtpTimestamp: number
): number {
    // Convert video RTP timestamp to NTP time
    const videoRtpOffset = videoRtpTimestamp - videoSync.rtpTimestamp;
    const videoNtpOffset = BigInt(Math.floor(
        (videoRtpOffset / videoSync.clockRate) * 4294967296  // NTP seconds fraction
    ));
    const videoNtpTime = videoSync.ntpTimestamp + videoNtpOffset;
    
    // Calculate corresponding audio RTP timestamp for this NTP time
    const audioNtpOffset = Number(videoNtpTime - audioSync.ntpTimestamp);
    const audioNtpSeconds = audioNtpOffset / 4294967296;
    const audioRtpOffset = Math.floor(audioNtpSeconds * audioSync.clockRate);
    const expectedAudioRtpTs = audioSync.rtpTimestamp + audioRtpOffset;
    
    // Return offset in milliseconds
    return (audioNtpSeconds) * 1000;
}
 
// Example: Ensuring video frame plays when corresponding audio arrives
function synchronizePlayback(videoFrame: { rtpTs: number }, audioSync: StreamSync, videoSync: StreamSync) {
    const syncOffsetMs = calculateSyncOffset(audioSync, videoSync, videoFrame.rtpTs);
    console.log(`Video frame should play ${syncOffsetMs}ms relative to audio sync point`);
}

NTP Timestamp Format

Receiver Report (RR) and Reception Blocks

Reception Report Block Structure

Block Layout

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 SSRC of source being reported                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| fraction lost |       cumulative number of packets lost       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           extended highest sequence number received           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      interarrival jitter                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         last SR (LSR)                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   delay since last SR (DLSR)                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
Field Descriptions:
SSRC source   : Which sender this report is about
Fraction lost : Packets lost / expected since last report (8-bit)
Cumulative    : Total packets lost since session start (24-bit signed)
Highest seq   : Extended sequence number (with wrap count)
Jitter        : Interarrival jitter in timestamp units
LSR           : Middle 32 bits of last SR's NTP timestamp
DLSR          : Delay from receiving SR to sending this RR

Round-Trip Time Calculation

RTT Calculation Using RTCP:
 
Timeline:
t1: Sender sends SR with NTP timestamp = T1
t2: Receiver receives SR, notes LSR = middle 32 bits of T1
t3: Receiver sends RR with:
    - LSR = middle 32 bits of T1
    - DLSR = (t3 - t2) in 1/65536 second units
t4: Sender receives RR at NTP time T4
 
Calculation at sender:
RTT = T4 - LSR - DLSR
    = CurrentNTP - (NTP when SR was sent) - (processing delay at receiver)
 
Example:
- Sender sent SR at NTP 0xB1B2C3C4.D5D6E7E8
- Receiver sends RR with LSR = 0xC3C4D5D6 (middle 32 bits)
- Receiver spent 50ms processing: DLSR = 50ms × 65.536 = 3277
- Sender receives RR at NTP 0xB1B2C3C5.15D6E7E8
- RTT = (0xC3C515D6 - 0xC3C4D5D6 - 3277) / 65536 = ~195ms

Loss Detection Limitations

SDES, BYE, and APP Packets

SDES Item Types
Code	Name	Description	Required
1	CNAME	Canonical endpoint identifier—must be unique, persistent	Yes, always
2	NAME	Human-readable name (e.g., "Alice Smith")	No
3	EMAIL	Email address	No
4	PHONE	Phone number	No
5	LOC	Geographic location	No
6	TOOL	Application name and version	No
7	NOTE	Transient message (e.g., "Away from keyboard")	No
8	PRIV	Private extension items	No

SDES Packet Example

Packet Content

SDES Packet containing two sources:
 
Chunk 1 (Video source):
  SSRC: 0x12345678
  CNAME: alice_video@webrtc.example.com
  NAME: Alice's Camera
  TOOL: Chrome/120.0.0.0
 
Chunk 2 (Audio source):  
  SSRC: 0xABCDEF00
  CNAME: alice_audio@webrtc.example.com
  NAME: Alice's Microphone
  TOOL: Chrome/120.0.0.0
 
Receiver uses matching CNAME prefix ("alice_") to know
both SSRCs are from the same participant and can use
SR timestamps from both for lip-sync.

Graceful Departure

APP Limitations

RTCP Timing and Bandwidth Control

RTCP Interval Calculation

Algorithm

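RTCP Transmission Interval Algorithm (RFC 3550, Section 6.3.1):
 
Given:
  - rtcp_bw: RTCP bandwidth in bytes/second (normally 5% of session bandwidth)
  - avg_rtcp_size: Average compound RTCP packet size (bytes, including UDP/IP headers)
  - members: Number of members in session
  - senders: Number of members that have sent RTP recently
  - we_sent: True if we've sent RTP recently
  - initial: True if we have not yet sent an RTCP packet
 
Calculate base interval:
  if senders <= members * 0.25:
    if we_sent:
      C = avg_rtcp_size / (0.25 * rtcp_bw)  // Senders share 25%
      n = senders
    else:
      C = avg_rtcp_size / (0.75 * rtcp_bw)  // Receivers share 75%
      n = members - senders
  else:
    C = avg_rtcp_size / rtcp_bw             // Everyone shares 100%
    n = members
    
  base_interval = n * C
 
Apply minimum and randomization:
  // T_min = 5 seconds (2.5 seconds before our first RTCP packet).
  // RFC 3550 also allows a reduced minimum of 360 / (session bandwidth in kbit/s) seconds.
  interval = max(base_interval, T_min)
  
  // Randomize 0.5x to 1.5x to prevent synchronized bursts
  interval = interval * random(0.5, 1.5)
  
  // Divide by e - 3/2 to compensate for timer reconsideration,
  // which otherwise converges below the intended RTCP bandwidth
  interval = interval / 1.21828
 
Example: 50 participants (assume 2 senders, 48 receivers), 2 Mbps session, 100-byte average RTCP
  rtcp_bw = 2,000,000 bit/s * 0.05 / 8 = 12,500 bytes/sec
  For a receiver: C = 100 / (0.75 * 12,500) ≈ 0.0107 sec, n = 48
  base = 48 * 100 / 9,375 = 0.512 sec
  After min: max(0.512, 5) = 5 seconds
  After random: 5 * random(0.5, 1.5) = 2.5 to 7.5 seconds
  After compensation: 2.5 / 1.21828 to 7.5 / 1.21828 ≈ 2.05 to 6.16 seconds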
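The deterministic part of this calculation translates directly into TypeScript. This is a sketch: it assumes the 5% RTCP share and the 25%/75% sender/receiver split from RFC 3550, omits timer reconsideration and the optional reduced minimum, and takes the random source as a parameter so the result can be checked. `rtcpIntervalSeconds` and its parameter names are illustrative.

```typescript
interface RtcpIntervalParams {
  sessionBandwidthBps: number; // Session bandwidth in bits per second
  avgRtcpSize: number;         // Average compound RTCP packet size in bytes
  members: number;             // Current session membership
  senders: number;             // Members that sent RTP recently
  weSent: boolean;             // True if this participant sent RTP recently
  initial: boolean;            // True until this participant sends its first RTCP packet
}

const COMPENSATION = Math.E - 1.5; // ≈ 1.21828

function rtcpIntervalSeconds(p: RtcpIntervalParams, random: () => number = Math.random): number {
  const rtcpBw = (p.sessionBandwidthBps * 0.05) / 8; // 5% of session bandwidth, in bytes/s
  let c: number;
  let n: number;
  if (p.senders <= p.members * 0.25) {
    if (p.weSent) {
      c = p.avgRtcpSize / (0.25 * rtcpBw); // senders share 25% of RTCP bandwidth
      n = p.senders;
    } else {
      c = p.avgRtcpSize / (0.75 * rtcpBw); // receivers share the other 75%
      n = p.members - p.senders;
    }
  } else {
    c = p.avgRtcpSize / rtcpBw; // everyone shares the full RTCP bandwidth
    n = p.members;
  }
  const tMin = p.initial ? 2.5 : 5;
  const td = Math.max(tMin, n * c);
  const t = td * (0.5 + random()); // factor in [0.5, 1.5) when random() is in [0, 1)
  return t / COMPENSATION;
}
```

With the example parameters and `random = () => 0.5`, the result is 5 / 1.21828 ≈ 4.1 seconds. With 1000 members the `n * C` term exceeds the 5-second floor, so the interval grows with the session size.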

RTCP Congestion Implications

Modern RTCP Extensions

RTCP Feedback Message Types
Type	Name	Purpose	Use Case
NACK	Generic Negative Acknowledgment	Request retransmission of specific lost RTP packets	Packet-level loss recovery
PLI	Picture Loss Indication	Report loss of coded picture data; sender typically answers with a key frame	Decoder recovery after loss
SLI	Slice Loss Indication	Report loss of specific macroblocks	Partial video recovery
RPSI	Reference Picture Selection Indication	Request specific reference frame	Video error resilience
FIR	Full Intra Request	Force key frame immediately	Recording, new participant join
TMMBR	Temporary Maximum Media Stream Bit Rate Request	Request sender to limit bitrate	Bandwidth adaptation
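NACK is the simplest of these messages to encode. In the Generic NACK format (RFC 4585, Section 6.2.1, carried in an RTPFB packet with FMT = 1), each 32-bit entry holds a packet ID (PID) and a 16-bit bitmask (BLP) in which bit i flags packet PID + i + 1 as also lost. The sketch below packs a list of lost sequence numbers into such entries; `packNacks` and `encodeNackFci` are hypothetical names, and the sort ignores 16-bit sequence-number wraparound for brevity.

```typescript
interface NackEntry {
  pid: number; // Sequence number of the first lost packet in this entry
  blp: number; // Bitmask of following lost packets: bit i => PID + i + 1
}

function packNacks(lostSeqs: number[]): NackEntry[] {
  // Deduplicate and sort; assumes the list does not straddle the 65535 -> 0 wrap
  const seqs = Array.from(new Set(lostSeqs)).sort((a, b) => a - b);
  const entries: NackEntry[] = [];
  for (const seq of seqs) {
    const last = entries.length > 0 ? entries[entries.length - 1] : undefined;
    const offset = last !== undefined ? seq - last.pid : 0;
    if (last !== undefined && offset >= 1 && offset <= 16) {
      last.blp |= 1 << (offset - 1); // fold into the previous entry's bitmask
    } else {
      entries.push({ pid: seq, blp: 0 }); // too far away: start a new entry
    }
  }
  return entries;
}

// Serialize entries as the Feedback Control Information (FCI) of the NACK packet
function encodeNackFci(entries: NackEntry[]): Uint8Array {
  const out = new Uint8Array(entries.length * 4);
  const view = new DataView(out.buffer);
  entries.forEach((entry, i) => {
    view.setUint16(i * 4, entry.pid);     // PID, network byte order
    view.setUint16(i * 4 + 2, entry.blp); // BLP, network byte order
  });
  return out;
}
```

For losses 100, 101, 103, 117 and 118 this yields two entries, (PID 100, BLP 0b101) and (PID 117, BLP 0b1), instead of five separate requests.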

TWCC Feedback Cycle

TWCC Flow

Transport-wide Congestion Control (TWCC) Operation:
 
Sender side:
  1. Assign transport-wide sequence number to each RTP packet
  2. Include sequence number in RTP header extension
  3. Track send time for each sequence number
 
RTP Packets:
  [TransportSeq: 100, sent: 0ms]  ─────>
  [TransportSeq: 101, sent: 5ms]  ─────>
  [TransportSeq: 102, sent: 10ms] ─────>  (lost!)
  [TransportSeq: 103, sent: 15ms] ─────>
  [TransportSeq: 104, sent: 20ms] ─────>
 
Receiver side (every ~100ms):
  Send RTCP Transport Feedback:
    Base sequence: 100
    Packet received status:
      100: received at +0ms
      101: received at +8ms
      102: NOT received
      103: received at +25ms
      104: received at +28ms
 
Sender analysis (one-way delay = arrival - send, relative to packet 100):
  - Packet 102 was lost
  - Delays: 100 → 0ms, 101 → +3ms, 103 → +10ms, 104 → +8ms
  - Delay trending upward → queue building, possible congestion
  - Adjust sending rate accordingly
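The sender's analysis step can be approximated in a few lines. This sketch only extracts losses and the per-packet delay variation (arrival gap minus send gap between consecutive received packets) from one feedback report; production delay-based controllers group packets into bursts and filter these deltas over time before changing the rate. `PacketRecord` and `analyzeFeedback` are illustrative names.

```typescript
interface PacketRecord {
  seq: number;              // Transport-wide sequence number
  sendMs: number;           // Send time recorded by the sender
  arrivalMs: number | null; // Arrival time from feedback; null = not received
}

interface FeedbackSummary {
  lost: number[];            // Sequence numbers reported as not received
  delayDeltasMs: number[];   // (arrival gap - send gap) per consecutive received pair
  totalDelayChangeMs: number;
}

function analyzeFeedback(records: PacketRecord[]): FeedbackSummary {
  const lost: number[] = [];
  const delayDeltasMs: number[] = [];
  let prev: { sendMs: number; arrivalMs: number } | null = null;

  for (const rec of records) {
    const arrival = rec.arrivalMs;
    if (arrival === null) {
      lost.push(rec.seq);
      continue;
    }
    if (prev !== null) {
      // Positive delta: this packet spent longer in transit than the previous one
      delayDeltasMs.push((arrival - prev.arrivalMs) - (rec.sendMs - prev.sendMs));
    }
    prev = { sendMs: rec.sendMs, arrivalMs: arrival };
  }

  const totalDelayChangeMs = delayDeltasMs.reduce((sum, d) => sum + d, 0);
  return { lost, delayDeltasMs, totalDelayChangeMs };
}
```

For the flow above the deltas are +3, +7 and -2 ms, summing to +8 ms: packet 104 spent 8 ms longer in transit than packet 100, which is the kind of upward drift a delay-based controller reacts to.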

TWCC vs Traditional RTCP

RTCP in Practice

RTP/RTCP Demultiplexing
TypeScript
function demultiplexRtpRtcp(packet: Uint8Array): 'rtp' | 'rtcp' | 'unknown' {
    if (packet.length < 2) return 'unknown';
    
    const firstByte = packet[0];
    const secondByte = packet[1];
    
    // Version must be 2 for both RTP and RTCP
    const version = (firstByte >> 6) & 0x03;
    if (version !== 2) return 'unknown';
    
    // The second byte is the marker bit + 7-bit payload type (PT) for RTP,
    // or the full 8-bit packet type for RTCP.
    // Per RFC 5761, a second byte of 192-223 marks RTCP: these values collide
    // with RTP PT 64-95 when the marker bit is set (e.g. SR: 200 & 0x7F = 72),
    // so RTP must not use PT 64-95 when RTP and RTCP share a port.
    if (secondByte >= 192 && secondByte <= 223) {
        return 'rtcp';  // SR 200, RR 201, SDES 202, BYE 203, APP 204, RTPFB 205, PSFB 206, XR 207
    }
    
    const payloadType = secondByte & 0x7F;  // Strip the marker bit
    
    // Regular RTP uses 0-34 (static) or 96-127 (dynamic)
    if (payloadType < 64 || payloadType > 95) {
        return 'rtp';
    }
    
    // PT 64-95 with the marker bit clear: should not occur with proper configuration
    return 'unknown';
}
 
// RTCP packet type constants
const RTCP_TYPE = {
    SR: 200,    // Sender Report
    RR: 201,    // Receiver Report
    SDES: 202,  // Source Description
    BYE: 203,   // Goodbye
    APP: 204,   // Application-specific
    RTPFB: 205, // Transport layer feedback (NACK, TWCC)
    PSFB: 206,  // Payload-specific feedback (PLI, FIR)
    XR: 207,    // Extended Reports
} as const;
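RTCP packets usually travel as compound packets: several RTCP packets concatenated in one datagram. Each header's length field counts 32-bit words minus one, so a receiver can walk the buffer packet by packet. The sketch below shows that loop, assuming the input has already been classified as RTCP (for example by the demultiplexer above). `splitCompoundRtcp` and `RtcpHeader` are hypothetical names, and the sketch does not check ordering rules such as RFC 3550's requirement that a compound packet begin with an SR or RR.

```typescript
interface RtcpHeader {
  version: number;
  padding: boolean;
  count: number;      // Report count (SR/RR), source count (SDES/BYE), or FMT (RTPFB/PSFB)
  packetType: number; // e.g. 200 = SR, 201 = RR, 205 = RTPFB, 206 = PSFB
  byteLength: number; // Size of this packet including its 4-byte header
  offset: number;     // Position of this packet within the compound buffer
}

function splitCompoundRtcp(buf: Uint8Array): RtcpHeader[] {
  const packets: RtcpHeader[] = [];
  let offset = 0;
  while (offset + 4 <= buf.length) {
    const first = buf[offset];
    const lengthWords = (buf[offset + 2] << 8) | buf[offset + 3];
    const byteLength = (lengthWords + 1) * 4; // length field = 32-bit words minus one
    if ((first >> 6) !== 2 || offset + byteLength > buf.length) {
      throw new Error("malformed RTCP packet at offset " + offset);
    }
    packets.push({
      version: first >> 6,
      padding: (first & 0x20) !== 0,
      count: first & 0x1f,
      packetType: buf[offset + 1],
      byteLength,
      offset,
    });
    offset += byteLength;
  }
  if (offset !== buf.length) throw new Error("trailing bytes after last RTCP packet");
  return packets;
}
```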

Debugging RTCP

Summary: Mastering RTCP Feedback

We've explored RTCP's architecture, packet types, timing mechanisms, and modern extensions that enable real-time communication systems to adapt to network conditions and maintain quality.

Key Takeaways

• RTCP is RTP's control plane: it provides the feedback, synchronization, and session management that RTP media transport cannot.
• SR enables synchronization: NTP/RTP timestamp pairs in Sender Reports allow lip-sync and inter-stream coordination.
• Reception blocks provide feedback: loss, jitter, and (via LSR/DLSR) round-trip time let senders adapt to network conditions.
• SDES identifies participants: CNAME provides a persistent identity across SSRC changes and is essential for stream correlation.
• Bandwidth is automatically managed: RTCP limits itself to ~5% of session bandwidth, scaling intervals with participant count.
• Modern extensions enable real-time adaptation: AVPF allows timely feedback messages such as NACK and PLI, and TWCC provides per-packet feedback for sophisticated congestion control.
