RTP moves media packets across networks, but media transport alone isn't sufficient for high-quality real-time communication. Consider these questions that RTP alone cannot answer:

- How does a sender know if receivers are experiencing packet loss?
- How do audio and video streams from the same source stay synchronized?
- How do participants know who else is in the session?
- How can senders adapt their bitrate to network conditions?
- How do applications report viewing statistics and quality metrics?

The Real-time Transport Control Protocol (RTCP), defined alongside RTP in RFC 3550, provides these essential control functions. While RTP is the 'data plane' moving media, RTCP is the 'control plane' providing feedback, synchronization, and session management.

This page explores RTCP's architecture, packet types, and mechanisms that enable modern real-time applications to deliver consistent quality despite network variability.
By the end of this page, you will understand RTCP's role in RTP sessions, the five standard packet types (SR, RR, SDES, BYE, APP), how senders and receivers use RTCP for feedback and synchronization, and modern extensions like RTCP-based congestion control.
RTCP works alongside RTP to provide out-of-band control and statistics for RTP media flows. While RTP packets carry actual media data at high frequency (typically 50 packets/second for audio), RTCP packets are sent much less frequently, typically every few seconds, to minimize bandwidth overhead.

RTCP's primary functions:

1. Quality feedback: Receivers report packet loss, jitter, and delay statistics to senders
2. Synchronization: Provides mapping between RTP timestamps and wall-clock time
3. Identification: Associates SSRCs with human-readable names and metadata
4. Session management: Signals participant arrivals and departures
5. Application-specific data: Extensible mechanism for custom control information
| Characteristic | RTP | RTCP |
|---|---|---|
| Purpose | Carry media data | Control and feedback |
| Typical frequency | 50-100 packets/second | Once every 1-5 seconds |
| Bandwidth target | As needed by media | ~5% of media bandwidth |
| Loss tolerance | Application-managed | Low—important for feedback |
| Direction | Usually one-way per stream | Bidirectional |
| Port (traditional) | Even port (e.g., 5004) | Adjacent odd port (5005) |
| Port (BUNDLE) | Same as RTP | Same as RTP |
RTCP automatically limits itself to approximately 5% of the session's media bandwidth. In a 1 Mbps video session, RTCP uses about 50 kbps. This ensures control overhead never overwhelms the actual media, regardless of how many participants join the session.
Compound RTCP packets:

RTCP packets are always sent in "compound" form: multiple RTCP packet types combined into a single UDP datagram. RFC 3550 mandates specific ordering:

1. SR or RR first: Sender Report if the participant is sending, Receiver Report otherwise
2. SDES second: Source Description (at minimum containing CNAME)
3. Other packets: BYE, APP, or extension packets as needed

This compound structure ensures every RTCP transmission provides basic statistics and identification, even when the primary purpose is something else (like signaling departure via BYE).
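The ordering rules above can be checked mechanically. The following is a minimal sketch, not a complete validator: the names `splitCompound` and `isValidCompound` are hypothetical, and the check only confirms that the datagram starts with SR or RR and contains an SDES packet somewhere. It does not look inside the SDES packet for a CNAME item.

```typescript
// RTCP packet types from RFC 3550.
const PT_SR = 200;
const PT_RR = 201;
const PT_SDES = 202;

interface RtcpHeader {
  packetType: number;
  count: number; // RC/SC field (5 bits)
  lengthBytes: number; // total size of this RTCP packet in bytes
}

// Walk a compound RTCP datagram and collect the header of each member packet.
function splitCompound(datagram: Uint8Array): RtcpHeader[] {
  const view = new DataView(datagram.buffer, datagram.byteOffset, datagram.byteLength);
  const headers: RtcpHeader[] = [];
  let offset = 0;
  while (offset + 4 <= datagram.length) {
    const firstByte = view.getUint8(offset);
    if (firstByte >> 6 !== 2) {
      throw new Error(`bad RTCP version at offset ${offset}`);
    }
    // The length field counts 32-bit words minus one.
    const lengthBytes = (view.getUint16(offset + 2) + 1) * 4;
    if (offset + lengthBytes > datagram.length) {
      throw new Error("truncated RTCP packet");
    }
    headers.push({
      packetType: view.getUint8(offset + 1),
      count: firstByte & 0x1f,
      lengthBytes,
    });
    offset += lengthBytes;
  }
  return headers;
}

// True when the datagram starts with SR or RR and also carries an SDES packet.
function isValidCompound(datagram: Uint8Array): boolean {
  const headers = splitCompound(datagram);
  const first = headers[0];
  if (first === undefined) return false;
  const startsWithReport = first.packetType === PT_SR || first.packetType === PT_RR;
  const hasSdes = headers.some((h) => h.packetType === PT_SDES);
  return startsWithReport && hasSdes;
}
```

A receiver might run a check like this before parsing individual packets, and skip it for sessions that have negotiated reduced-size RTCP.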
The Sender Report (SR) is sent by participants who are actively sending RTP data. It contains two critical sections: sender information and reception reports for any streams this sender is receiving.\n\nSR structure and fields:
```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|    RC   |   PT=200      |            Length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         SSRC of sender                        |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|             NTP timestamp (most significant word)             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             NTP timestamp (least significant word)            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         RTP timestamp                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     sender's packet count                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      sender's octet count                     |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                   Reception Report Blocks...                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Field Descriptions:
V (2 bits)    : Version (always 2)
P (1 bit)     : Padding flag
RC (5 bits)   : Reception report count (0-31)
PT (8 bits)   : Packet type (200 for SR)
Length (16)   : Length in 32-bit words minus one
SSRC (32)     : Sender's SSRC identifier
NTP TS (64)   : Wall-clock time when report was sent
RTP TS (32)   : RTP timestamp corresponding to NTP time
Packet count  : Total RTP packets sent
Octet count   : Total payload bytes sent
```

The NTP/RTP timestamp pair:

The most critical information in an SR is the relationship between the NTP timestamp (wall-clock time) and the RTP timestamp at that moment. This mapping is essential for:

1. Lip sync: Synchronizing audio and video streams from the same source
2. Inter-stream sync: Aligning streams from different sources in the same conference
3. Playout timing: Converting RTP timestamps to actual playback times

Without this mapping, a receiver has no way to know the absolute timing of RTP packets, only relative timing within a single stream.
```typescript
interface SenderReport {
  ssrc: number;
  ntpTimestamp: bigint; // 64-bit NTP timestamp
  rtpTimestamp: number; // 32-bit RTP timestamp
  packetCount: number;
  octetCount: number;
}

interface StreamSync {
  ntpTimestamp: bigint; // NTP timestamp from the stream's latest SR
  rtpTimestamp: number; // RTP timestamp from the same SR
  clockRate: number; // e.g., 48000 for audio, 90000 for video
}

// One second expressed in NTP units (the low 32 bits are a binary fraction).
const NTP_UNITS_PER_SECOND = 4294967296; // 2^32

// Returns, in milliseconds, where the given video RTP timestamp falls
// relative to the wall-clock instant of the audio stream's latest SR.
function calculateSyncOffset(
  audioSync: StreamSync,
  videoSync: StreamSync,
  videoRtpTimestamp: number
): number {
  // Convert the video RTP timestamp to NTP time.
  // "| 0" yields a signed 32-bit difference, so RTP timestamp wrap-around is tolerated.
  const videoRtpOffset = (videoRtpTimestamp - videoSync.rtpTimestamp) | 0;
  const videoNtpOffset = BigInt(
    Math.floor((videoRtpOffset / videoSync.clockRate) * NTP_UNITS_PER_SECOND)
  );
  const videoNtpTime = videoSync.ntpTimestamp + videoNtpOffset;

  // Express that wall-clock time relative to the audio sync point.
  const audioNtpOffset = Number(videoNtpTime - audioSync.ntpTimestamp);
  const audioNtpSeconds = audioNtpOffset / NTP_UNITS_PER_SECOND;

  // Return offset in milliseconds
  return audioNtpSeconds * 1000;
}

// Example: Ensuring video frame plays when corresponding audio arrives
function synchronizePlayback(
  videoFrame: { rtpTs: number },
  audioSync: StreamSync,
  videoSync: StreamSync
) {
  const syncOffsetMs = calculateSyncOffset(audioSync, videoSync, videoFrame.rtpTs);
  console.log(`Video frame should play ${syncOffsetMs}ms relative to audio sync point`);
}
```

NTP timestamps are 64 bits: the upper 32 bits are seconds since January 1, 1900, and the lower 32 bits are fractional seconds (1/2³² second precision). This gives a resolution of roughly 233 picoseconds, far exceeding what media synchronization requires.
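To connect the SR layout to the timestamp format, here is a hedged sketch of reading the sender information out of a raw SR packet and converting its NTP timestamp to Unix milliseconds. The names `parseSenderInfo`, `ParsedSenderInfo` and `ntpToUnixMs` are illustrative only, and the conversion assumes the timestamp lies in the first NTP era (before the 2036 rollover).

```typescript
// Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
const NTP_UNIX_OFFSET_SECONDS = 2208988800;
const TWO_POW_32 = 4294967296;

interface ParsedSenderInfo {
  ssrc: number;
  ntpSeconds: number; // upper 32 bits of the NTP timestamp
  ntpFraction: number; // lower 32 bits, in units of 1/2^32 second
  rtpTimestamp: number;
  packetCount: number;
  octetCount: number;
}

// Read the fixed part of an SR: header, SSRC and the 20-byte sender info block.
function parseSenderInfo(packet: Uint8Array): ParsedSenderInfo {
  // 4-byte header + 4-byte SSRC + 20 bytes of sender info = 28 bytes minimum.
  if (packet.length < 28) throw new Error("SR packet too short");
  const view = new DataView(packet.buffer, packet.byteOffset, packet.byteLength);
  const version = view.getUint8(0) >> 6;
  const packetType = view.getUint8(1);
  if (version !== 2 || packetType !== 200) {
    throw new Error("not an RTCP SR packet");
  }
  // DataView reads are big-endian by default, which matches network byte order.
  return {
    ssrc: view.getUint32(4),
    ntpSeconds: view.getUint32(8),
    ntpFraction: view.getUint32(12),
    rtpTimestamp: view.getUint32(16),
    packetCount: view.getUint32(20),
    octetCount: view.getUint32(24),
  };
}

// Convert an NTP timestamp (seconds + binary fraction) to Unix milliseconds.
function ntpToUnixMs(ntpSeconds: number, ntpFraction: number): number {
  const wholeMs = (ntpSeconds - NTP_UNIX_OFFSET_SECONDS) * 1000;
  const fractionMs = (ntpFraction / TWO_POW_32) * 1000;
  return wholeMs + fractionMs;
}
```

Keeping the seconds and fraction as two 32-bit numbers avoids `bigint` arithmetic; an implementation that needs the full 64-bit value could combine them instead.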
The Receiver Report (RR) is sent by participants who are receiving but not sending RTP data. Its structure is similar to SR but without the sender information section: only the SSRC and reception report blocks.

Both SR and RR contain Reception Report Blocks, one for each source the participant is receiving from. These blocks provide crucial feedback about reception quality:
```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 SSRC of source being reported                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| fraction lost |       cumulative number of packets lost       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           extended highest sequence number received           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      interarrival jitter                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         last SR (LSR)                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   delay since last SR (DLSR)                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Field Descriptions:
SSRC source   : Which sender this report is about
Fraction lost : Packets lost / expected since last report (8-bit)
Cumulative    : Total packets lost since session start (24-bit signed)
Highest seq   : Extended sequence number (with wrap count)
Jitter        : Interarrival jitter in timestamp units
LSR           : Middle 32 bits of last SR's NTP timestamp
DLSR          : Delay from receiving SR to sending this RR
```

Understanding the feedback metrics:

Fraction Lost: An 8-bit fixed-point value representing packet loss since the last report, computed as (packets lost / packets expected) × 256. A value of 0 means no loss, 64 means 25% loss, and 255 means nearly every packet was lost. This provides real-time loss feedback for adaptive bitrate algorithms.

Cumulative Lost: A 24-bit signed value tracking total packets lost since the session began. Negative values are possible if packets are duplicated. This provides overall session quality indication.

Interarrival Jitter: A smoothed estimate of variance in packet arrival times. Higher values indicate less stable network conditions. Receivers use exponential smoothing: J(i) = J(i-1) + (|D(i-1,i)| - J(i-1))/16, where D(i-1,i) is the difference in transit time between packet i and the packet before it.

Round-Trip Time calculation:

The LSR and DLSR fields enable senders to calculate round-trip time to receivers:
```
RTT Calculation Using RTCP:

Timeline:
t1: Sender sends SR with NTP timestamp = T1
t2: Receiver receives SR, notes LSR = middle 32 bits of T1
t3: Receiver sends RR with:
    - LSR = middle 32 bits of T1
    - DLSR = (t3 - t2) in 1/65536 second units
t4: Sender receives RR at NTP time T4

Calculation at sender (all values in 1/65536 second units):
RTT = middle32(T4) - LSR - DLSR
    = (NTP when RR arrived) - (NTP when SR was sent) - (processing delay at receiver)

Example:
- Sender sent SR at NTP 0xB1B2C3C4.D5D6E7E8
- Receiver sends RR with LSR = 0xC3C4D5D6 (middle 32 bits)
- Receiver spent 50ms processing: DLSR = 50ms × 65.536 = 3277
- Sender receives RR at NTP 0xB1B2C3C5.15D6E7E8 (middle 32 bits: 0xC3C515D6)
- RTT = (0xC3C515D6 - 0xC3C4D5D6 - 3277) / 65536
      = (16384 - 3277) / 65536 = ~200ms
```

Reception report loss statistics have inherent delay: they're sent every few seconds and reflect the previous reporting interval. For real-time congestion control, modern systems use RTCP extensions like Transport-wide Congestion Control (TWCC) that provide per-packet feedback at higher frequency.
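The jitter update and the LSR/DLSR round-trip calculation described above can be sketched in a few lines. This is an illustration under stated assumptions rather than a reference implementation: the function names are hypothetical, transit times are assumed to be already expressed in RTP timestamp units, and `rttSeconds` returns `undefined` when LSR is zero (meaning the receiver has not yet seen an SR).

```typescript
// One step of the interarrival jitter estimator.
// prevTransit and transit are (arrival time - RTP timestamp) for the
// previous and current packet, in RTP timestamp units.
function updateJitter(prevJitter: number, prevTransit: number, transit: number): number {
  const d = Math.abs(transit - prevTransit);
  return prevJitter + (d - prevJitter) / 16;
}

// Middle 32 bits of a 64-bit NTP timestamp: the low 16 bits of the seconds
// and the high 16 bits of the fraction (a 16.16 fixed-point value).
function ntpMiddle32(ntpSeconds: number, ntpFraction: number): number {
  return (((ntpSeconds & 0xffff) << 16) | (ntpFraction >>> 16)) >>> 0;
}

// Round-trip time in seconds from a reception report block.
// arrivalMiddle32 is the middle 32 bits of the sender's NTP clock when the RR arrived.
function rttSeconds(arrivalMiddle32: number, lsr: number, dlsr: number): number | undefined {
  if (lsr === 0) return undefined; // receiver has not received an SR yet
  // Unsigned 32-bit arithmetic tolerates wrap-around of the 16.16 clock.
  const rttUnits = (arrivalMiddle32 - lsr - dlsr) >>> 0;
  return rttUnits / 65536;
}
```

Feeding in the timestamps from the worked example (SR sent at 0xB1B2C3C4.D5D6E7E8, RR received at 0xB1B2C3C5.15D6E7E8, DLSR of 3277) exercises both `ntpMiddle32` and `rttSeconds`.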
Beyond SR and RR, RTCP defines three additional standard packet types for session management and extensibility.

Source Description (SDES) - Packet Type 202

SDES packets provide human-readable information about RTP sources. They contain one or more "chunks," each describing a source by its SSRC and a list of descriptive items:
| Code | Name | Description | Required |
|---|---|---|---|
| 1 | CNAME | Canonical endpoint identifier—must be unique, persistent | Yes, always |
| 2 | NAME | Human-readable name (e.g., "Alice Smith") | No |
| 3 | EMAIL | Email address | No |
| 4 | PHONE | Phone number | No |
| 5 | LOC | Geographic location | No |
| 6 | TOOL | Application name and version | No |
| 7 | NOTE | Transient message (e.g., "Away from keyboard") | No |
| 8 | PRIV | Private extension items | No |
CNAME importance:

The CNAME is the most critical SDES item. Unlike SSRC, which is randomly generated and can change, CNAME is a persistent identifier for an endpoint. It typically follows the format "user@host" and serves to:

- Associate multiple SSRCs from the same endpoint (audio + video)
- Survive SSRC changes due to collisions or reconnections
- Enable session accounting and logging
- Correlate streams for synchronization
```
SDES Packet containing two sources:

Chunk 1 (Video source):
  SSRC: 0x12345678
  CNAME: alice@webrtc.example.com
  NAME: Alice's Camera
  TOOL: Chrome/120.0.0.0

Chunk 2 (Audio source):
  SSRC: 0xABCDEF00
  CNAME: alice@webrtc.example.com
  NAME: Alice's Microphone
  TOOL: Chrome/120.0.0.0

Receiver uses the identical CNAME to know
both SSRCs are from the same participant and can use
SR timestamps from both for lip-sync.
```

BYE Packet - Packet Type 203

Sent when a participant leaves the session or stops a stream. Contains a list of SSRCs being shut down and an optional reason string. Receivers should stop expecting packets from these SSRCs and may free associated resources.

APP Packet - Packet Type 204

Provides an application-specific extension mechanism. Contains a 4-character ASCII name (identifying the application) and arbitrary application data. Useful for proprietary extensions but not portable across different implementations.
Always send BYE packets when leaving a session. Without BYE, other participants must wait for timeout (typically 30+ seconds) before concluding a source is gone. BYE enables immediate cleanup and accurate participant counts.
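As a rough illustration of what a BYE looks like on the wire, the sketch below serializes one. The function name `buildBye` is hypothetical, the reason text is assumed to be ASCII for simplicity (RFC 3550 allows UTF-8), and the result is only the BYE packet itself: under the compound rules it would still be appended after an SR or RR and an SDES packet.

```typescript
// Serialize an RTCP BYE packet (packet type 203).
// Assumption for this sketch: `reason` contains only ASCII characters.
function buildBye(ssrcs: number[], reason: string = ""): Uint8Array {
  if (ssrcs.length === 0 || ssrcs.length > 31) {
    throw new Error("this sketch expects 1 to 31 SSRCs (the count field is 5 bits)");
  }
  const reasonBytes = Uint8Array.from(reason, (ch) => ch.charCodeAt(0));
  if (reasonBytes.length > 255) throw new Error("reason text too long");

  // Optional reason: 1 length byte + text, zero-padded to a 32-bit boundary.
  const reasonFieldLen =
    reasonBytes.length > 0 ? Math.ceil((1 + reasonBytes.length) / 4) * 4 : 0;
  const totalLen = 4 + ssrcs.length * 4 + reasonFieldLen;

  const packet = new Uint8Array(totalLen); // zero-filled, so padding is already 0
  const view = new DataView(packet.buffer);
  view.setUint8(0, (2 << 6) | ssrcs.length); // V=2, P=0, SC=source count
  view.setUint8(1, 203); // PT=BYE
  view.setUint16(2, totalLen / 4 - 1); // length in 32-bit words minus one
  ssrcs.forEach((ssrc, i) => view.setUint32(4 + i * 4, ssrc));

  if (reasonBytes.length > 0) {
    const reasonOffset = 4 + ssrcs.length * 4;
    view.setUint8(reasonOffset, reasonBytes.length);
    packet.set(reasonBytes, reasonOffset + 1);
  }
  return packet;
}
```

For example, `buildBye([0x12345678], "bye")` produces a 12-byte packet: a 4-byte header, one SSRC, and a 4-byte reason field.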
APP packets are vendor-specific and not standardized. For interoperable extensions, prefer RTCP XR (Extended Reports, RFC 3611) or defined header extensions. APP is best for closed systems where all endpoints are under your control.
RTCP uses a sophisticated algorithm to determine when participants should send reports. This algorithm ensures RTCP never consumes excessive bandwidth, even as session size scales from 2 to thousands of participants.

The 5% bandwidth rule:

RTCP bandwidth is typically configured to 5% of the session's media bandwidth. For a 2 Mbps video call, RTCP gets 100 kbps. When senders make up no more than a quarter of the participants, this bandwidth is split between the two groups:

- 75% allocated to receivers (RRs)
- 25% allocated to senders (SRs)

Reporting interval calculation:

The interval between RTCP transmissions is calculated based on:
1. Available RTCP bandwidth
2. Average RTCP packet size
3. Number of participants
4. Whether the participant is a sender or receiver-only
```
RTCP Transmission Interval Algorithm (RFC 3550):

Given:
  - rtcp_bw: RTCP bandwidth in bytes/second
  - avg_rtcp_size: Average RTCP packet size (bytes)
  - members: Number of members in session
  - senders: Number of members currently sending RTP
  - we_sent: True if we've sent RTP recently

Calculate base interval:
  n = members
  if senders <= 0.25 * members:
    if we_sent:
      rtcp_bw = rtcp_bw * 0.25    // Senders share 25%
      n = senders
    else:
      rtcp_bw = rtcp_bw * 0.75    // Receivers share 75%
      n = members - senders

  base_interval = avg_rtcp_size * n / rtcp_bw

Apply minimums and randomization:
  // Never faster than T_min = 5 seconds (2.5 seconds before the first report;
  // an application may scale T_min to 360 / session bandwidth in kbps)
  interval = max(base_interval, T_min)

  // Randomize 0.5x to 1.5x to prevent synchronization
  interval = interval * random(0.5, 1.5)

  // Compensate for randomization over many sends (e - 1.5)
  interval = interval / 1.21828

Example: 50 participants (assuming 1 sender), 2Mbps video, 100 byte average RTCP
  rtcp_bw = 2,000,000 * 0.05 / 8 = 12,500 bytes/sec

  For receiver: share = 12,500 * 0.75 = 9,375 bytes/sec, n = 49
    base = 100 * 49 / 9,375 ≈ 0.52 sec
    After min: max(0.52, 5) = 5 seconds
    After random: 5 * random(0.5, 1.5) = 2.5 to 7.5 seconds
    After compensation: ≈ 2.1 to 6.2 seconds
```

In very large sessions (thousands of participants), RTCP intervals can stretch to minutes. This means loss and jitter feedback may be extremely delayed. Large-scale streaming usually limits active RTCP participants or uses alternative feedback mechanisms.
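The interval computation can also be written as runnable code. The sketch below follows the structure of the RFC 3550 algorithm but omits the initial-report halving and timer reconsideration. The `IntervalInput` shape and function names are hypothetical, and the random source is injectable only so the result can be tested.

```typescript
interface IntervalInput {
  sessionBandwidthBps: number; // media bandwidth in bits/second
  avgRtcpSizeBytes: number;
  members: number;
  senders: number;
  weSent: boolean;
  random?: () => number; // uniform in [0, 1); defaults to Math.random
}

const RTCP_FRACTION = 0.05; // 5% of session bandwidth
const SENDER_SHARE = 0.25;
const T_MIN_SECONDS = 5;
const COMPENSATION = Math.E - 1.5; // about 1.21828

// Deterministic interval before randomization, in seconds.
function deterministicInterval(input: IntervalInput): number {
  // Convert bits/second to bytes/second for the RTCP share.
  let rtcpBytesPerSec = (input.sessionBandwidthBps * RTCP_FRACTION) / 8;
  let n = input.members;
  if (input.senders <= input.members * SENDER_SHARE) {
    if (input.weSent) {
      rtcpBytesPerSec *= SENDER_SHARE;
      n = input.senders;
    } else {
      rtcpBytesPerSec *= 1 - SENDER_SHARE;
      n = input.members - input.senders;
    }
  }
  const base = (input.avgRtcpSizeBytes * n) / rtcpBytesPerSec;
  return Math.max(base, T_MIN_SECONDS);
}

// Randomized interval: uniform over [0.5, 1.5) times the deterministic
// interval, divided by the compensation factor.
function nextInterval(input: IntervalInput): number {
  const rand = (input.random ?? Math.random)();
  return (deterministicInterval(input) * (0.5 + rand)) / COMPENSATION;
}
```

With 10,000 members, one sender, a 2 Mbps session and 100-byte packets, `deterministicInterval` comes out above 100 seconds for a receiver, which shows how reporting stretches toward minutes in very large sessions.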
Reduced-size RTCP:

RFC 5506 introduced "reduced-size RTCP," allowing RTCP packets that don't follow the compound packet rules. Instead of always including SR/RR and SDES, systems can send individual feedback packets when only specific feedback is needed. This is crucial for modern congestion control, where feedback must be sent frequently but shouldn't waste bandwidth on redundant information.
The original RTCP specification focused on basic quality reporting. Modern real-time applications require more sophisticated feedback for congestion control, error recovery, and adaptive streaming. Several key extensions have been standardized:

RTCP Feedback (RFC 4585 - "AVPF")

This profile extends RTCP to support immediate feedback without waiting for the regular RTCP interval. Key feedback types, including the codec control messages (FIR, TMMBR) later added by RFC 5104, are:
| Type | Name | Purpose | Use Case |
|---|---|---|---|
| NACK | Negative Acknowledgment | Request retransmission of specific packets | Lost-packet recovery |
| PLI | Picture Loss Indication | Request new key frame (entire picture lost) | Video sync recovery |
| SLI | Slice Loss Indication | Report loss of specific macroblocks | Partial video recovery |
| RPSI | Reference Picture Selection Indication | Request specific reference frame | Video error resilience |
| FIR | Full Intra Request | Force key frame immediately | Recording, new participant join |
| TMMBR | Temporary Maximum Media Stream Bit Rate Request | Request sender to limit bitrate | Bandwidth adaptation |
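To make the NACK entry in the table more concrete: a generic NACK in RFC 4585 reports losses as pairs of a 16-bit packet ID (PID) and a 16-bit bitmask (BLP) covering the 16 sequence numbers that follow it. The sketch below packs a list of lost sequence numbers into such pairs. The name `encodeNack` is illustrative, and the input is assumed to be sorted ascending, free of duplicates, and not spanning a sequence-number wrap.

```typescript
interface NackEntry {
  pid: number; // first lost sequence number covered by this entry
  blp: number; // bit (k - 1) set means sequence number pid + k was also lost
}

// Pack lost RTP sequence numbers into generic NACK (PID, BLP) pairs.
// Assumption: lostSeqs is sorted ascending, unique, and does not wrap past 65535.
function encodeNack(lostSeqs: number[]): NackEntry[] {
  const entries: NackEntry[] = [];
  for (const seq of lostSeqs) {
    const last = entries[entries.length - 1];
    const delta = last ? seq - last.pid : 0;
    if (last && delta >= 1 && delta <= 16) {
      // Falls within the 16 packets after the current PID: set its bit.
      last.blp |= 1 << (delta - 1);
    } else {
      // Start a new entry with this sequence number as the PID.
      entries.push({ pid: seq, blp: 0 });
    }
  }
  return entries;
}
```

For instance, losses of 100, 101 and 103 fit in one entry with PID 100 and BLP 0x0005, so a burst of nearby losses costs only four bytes of feedback.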
Transport-wide Congestion Control (TWCC)

Modern WebRTC implementations use TWCC (draft-holmer-rmcat-transport-wide-cc) for precise congestion control. Instead of loss statistics aggregated over several seconds, TWCC provides per-packet acknowledgments:

- Sender includes transport-wide sequence numbers in RTP extension
- Receiver sends RTCP feedback listing arrival times of recent packets
- Sender calculates network conditions from detailed timing information

This enables algorithms like Google Congestion Control (GCC) and Send-side Bandwidth Estimation to react within hundreds of milliseconds to network changes.
```
Transport-wide Congestion Control (TWCC) Operation:

Sender side:
  1. Assign transport-wide sequence number to each RTP packet
  2. Include sequence number in RTP header extension
  3. Track send time for each sequence number

RTP Packets:
  [TransportSeq: 100, sent: 0ms]  ─────>
  [TransportSeq: 101, sent: 5ms]  ─────>
  [TransportSeq: 102, sent: 10ms] ─────> (lost!)
  [TransportSeq: 103, sent: 15ms] ─────>
  [TransportSeq: 104, sent: 20ms] ─────>

Receiver side (every ~100ms):
  Send RTCP Transport Feedback:
    Base sequence: 100
    Packet received status:
      100: received at +0ms
      101: received at +8ms (jitter: +3ms)
      102: NOT received
      103: received at +25ms (jitter: +7ms after gap)
      104: received at +28ms

Sender analysis:
  - Packet 102 was lost
  - Packets showing increasing delay → congestion building
  - Adjust sending rate accordingly
```

Traditional RR/SR provides loss percentages every 5+ seconds. TWCC provides per-packet status every 50-100ms. This improvement of 50x or more in feedback granularity enables modern congestion control algorithms to maintain quality during rapid network changes.
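The sender-side analysis step can be approximated as follows. This is a deliberately simplified sketch, not how GCC or any particular implementation works: it pairs each received packet with the previous received one, compares the arrival spacing with the send spacing, and reports the loss fraction. The `PacketRecord` shape and `analyzeFeedback` name are assumptions made for illustration.

```typescript
interface PacketRecord {
  seq: number; // transport-wide sequence number
  sentMs: number; // local send time
  receivedMs: number | null; // arrival time from feedback, null if lost
}

interface FeedbackSummary {
  lossFraction: number;
  delayDeltasMs: number[]; // arrival spacing minus send spacing, per received pair
  accumulatedDelayMs: number; // sum of deltas; positive suggests queues are growing
}

function analyzeFeedback(records: PacketRecord[]): FeedbackSummary {
  let lost = 0;
  let prevSentMs: number | null = null;
  let prevReceivedMs: number | null = null;
  const delayDeltasMs: number[] = [];

  for (const record of records) {
    if (record.receivedMs === null) {
      lost += 1;
      continue;
    }
    if (prevSentMs !== null && prevReceivedMs !== null) {
      const arrivalGap = record.receivedMs - prevReceivedMs;
      const sendGap = record.sentMs - prevSentMs;
      delayDeltasMs.push(arrivalGap - sendGap);
    }
    prevSentMs = record.sentMs;
    prevReceivedMs = record.receivedMs;
  }

  const accumulatedDelayMs = delayDeltasMs.reduce((sum, d) => sum + d, 0);
  const lossFraction = records.length > 0 ? lost / records.length : 0;
  return { lossFraction, delayDeltasMs, accumulatedDelayMs };
}
```

A real estimator would filter these deltas over time and compare them with an adaptive threshold before changing the send rate; a single positive sum over five packets is only a hint.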
Understanding RTCP conceptually is different from implementing and debugging it in real systems. Here are practical considerations for working with RTCP:

RTCP MUX (RFC 5761):

Historically, RTP used port N and RTCP used port N+1. Modern systems (especially WebRTC) multiplex RTP and RTCP on the same port. Demultiplexing relies on the second byte of each packet: RTCP packet types 192-223 would look like RTP payload types 64-95 with the marker bit set, so RTP payload types in that range are avoided, which lets receivers tell the two apart.
```typescript
function demultiplexRtpRtcp(packet: Uint8Array): 'rtp' | 'rtcp' | 'unknown' {
  if (packet.length < 2) return 'unknown';

  const firstByte = packet[0];
  const secondByte = packet[1];

  // Version must be 2 for both RTP and RTCP
  const version = (firstByte >> 6) & 0x03;
  if (version !== 2) return 'unknown';

  // The second byte contains the marker bit + 7-bit PT for RTP,
  // or the 8-bit packet type for RTCP.
  // RTCP types: 200 (SR), 201 (RR), 202 (SDES), 203 (BYE), 204 (APP), etc.
  // These correspond to RTP PT 72-79, which are avoided in RTP.
  const payloadType = secondByte & 0x7f; // 7 bits for RTP
  const rtcpType = secondByte; // 8 bits for RTCP

  // RFC 5761 treats second-byte values 192-223 as RTCP.
  // This covers the standard types 200-204 and the extensions from 205 up.
  if (rtcpType >= 192 && rtcpType <= 223) {
    return 'rtcp';
  }

  // RTP payload types 64-95 are kept free to avoid RTCP collisions
  // (for example, 200-204 & 0x7F = 72-76).
  // Regular RTP uses 0-34 (static) or 96-127 (dynamic).
  if (payloadType < 64 || payloadType > 95) {
    return 'rtp';
  }

  // Ambiguous range - should not occur with proper configuration
  return 'unknown';
}

// RTCP packet type constants
const RTCP_TYPE = {
  SR: 200, // Sender Report
  RR: 201, // Receiver Report
  SDES: 202, // Source Description
  BYE: 203, // Goodbye
  APP: 204, // Application-specific
  RTPFB: 205, // Transport layer feedback (NACK, TWCC)
  PSFB: 206, // Payload-specific feedback (PLI, FIR)
  XR: 207, // Extended Reports
} as const;
```

Compound packet construction:

When building RTCP compound packets, remember these requirements:

1. SR or RR must be first (required)
2. SDES must be included (at minimum with CNAME)
3. Total compound size should be reasonable (~128-512 bytes typical)
4. Every RTCP packet is a multiple of 32 bits long; if padding is needed (for example, for encryption block alignment), add it only to the last packet in the compound
When debugging RTCP issues, capture packets with Wireshark—it fully decodes all RTCP fields. Common issues: CNAME not matching between streams (breaks lip-sync), incorrect timestamp calculation (wrong RTT), and RTCP not reaching peers (firewall/NAT issues).
We've explored RTCP's architecture, packet types, timing mechanisms, and modern extensions that enable real-time communication systems to adapt to network conditions and maintain quality.
What's next:

Now that we understand RTP data transport and RTCP control, we'll explore how these protocols enable multimedia streaming: the architectures, protocols, and techniques that power everything from video conferencing to live streaming platforms.
You now understand how RTCP provides the feedback and synchronization essential for real-time media. This knowledge is critical for implementing adaptive streaming, debugging quality issues, and understanding how modern video conferencing achieves consistent quality.