Computer NetworksData Link Layer

Framing: Byte Stuffing

LevelIntermediate

Duration75 mins

TopicData Link Layer

2 / 5

Escape Sequences: The Key to Data Transparency

The Data Transparency Challenge

In the previous page, we established that flag bytes (0x7E) provide an elegant solution for frame delimitation. But this solution introduces a critical problem: What happens when the payload data itself contains the flag byte value?

Imagine transmitting a compressed binary file, an encrypted message, or raw image data. These data types can contain any arbitrary byte sequence—including 0x7E. If the receiver encounters 0x7E in the middle of a payload, it will incorrectly interpret this as a frame boundary, causing catastrophic parsing errors.

This is the data transparency problem, and its solution—escape sequences—represents one of the most elegant patterns in protocol design. The term "transparency" here means the ability to transmit arbitrary data transparently, without the data being misinterpreted as protocol control information.

What You Will Master

By the end of this page, you will understand the theoretical foundations of escape sequences, why they are necessary for data transparency, how they are designed and implemented, the relationship between escape characters and the byte stuffing process, and the tradeoffs inherent in different escape sequence designs.

The Transparency Problem in Depth

To truly understand escape sequences, we must first fully appreciate the severity of the transparency problem.

Scenario Analysis:

Consider a PPP frame carrying an IP packet. The frame structure is:

[0x7E] [Address] [Control] [Protocol] [IP Packet Data] [FCS] [0x7E]

Now suppose the IP packet payload (perhaps containing compressed web content) includes the following byte sequence:

... 0x45 0x00 0x7E 0x12 0x34 ...
               ↑
               Contains flag byte value!

Without Escape Handling:

The receiver's state machine would process this as:

[0x7E] → Begin frame
[Address, Control, Protocol] → Header received
[0x45, 0x00] → Payload bytes accumulated
[0x7E] → END OF FRAME DETECTED!
[0x12, 0x34, ...] → Beginning of "next frame" (?!)

The receiver now believes it has received a complete (but corrupted) frame, and potentially starts parsing random data as the header of a new frame. The consequences cascade:

The current frame is truncated and will fail CRC validation
Subsequent bytes are misinterpreted as frame headers
Synchronization is lost until the next actual flag happens to resynchronize
Multiple frames may be corrupted before recovery

Impact Without Transparency

Without data transparency mechanisms, any protocol that can carry arbitrary binary data would fail catastrophically whenever the data happened to contain reserved byte values. This would make the protocol useless for general-purpose data transmission—which is unacceptable for a general data link layer protocol.

Probability Analysis:

How often would this problem occur? For truly random data:

Probability of any single byte being 0x7E: 1/256 ≈ 0.39%
For a 1000-byte payload: Probability of at least one 0x7E ≈ 1 - (255/256)^1000 ≈ 98%

This means nearly every large frame would be corrupted! Even for small frames:

100-byte payload: ~32% chance of containing 0x7E
500-byte payload: ~86% chance of containing 0x7E

Conclusion: Data transparency is not optional; it is essential for any protocol that transmits arbitrary data.

Historical Note:

Early character-oriented protocols like BISYNC (Binary Synchronous Communication) used printable ASCII characters for data and reserved non-printable control characters for protocol functions. This worked for text but failed for binary data. The escape sequence mechanism was developed specifically to enable binary-transparent communications.

The Escape Sequence Concept

An escape sequence is a multi-byte code that represents a single byte value that cannot appear directly in the data stream. The concept is borrowed from many domains—programming languages use escape sequences like \n for newlines, and communication protocols use them to represent reserved bytes.

Basic Principle:

Define an escape character (also called escape byte or control escape) that signals: "The next byte should be interpreted specially, not literally."

When the transmitter encounters a byte that cannot appear directly in the frame, it replaces that byte with a two-byte escape sequence:

Original byte → [ESC] [Modified byte]

The receiver, upon seeing the escape character, knows to interpret the next byte specially and reconstruct the original value.

The PPP Escape Scheme:

PPP uses the following escape encoding:

Escape byte: 0x7D
Transformation: XOR the escaped byte with 0x20

This produces the following mappings:

PPP Escape Sequence Mappings
Original Byte	Hexadecimal	Escaped Sequence	Explanation
Flag (0x7E)	01111110	0x7D 0x5E	0x7E XOR 0x20 = 0x5E
Escape (0x7D)	01111101	0x7D 0x5D	0x7D XOR 0x20 = 0x5D
Control chars (0x00-0x1F)	Various	0x7D [byte XOR 0x20]	ASCII control characters
DEL (0x7F)	01111111	0x7D 0x5F	0x7F XOR 0x20 = 0x5F

Why XOR with 0x20?

The choice of 0x20 for the XOR transformation is deliberate:

Single-bit toggle: 0x20 = 0b00100000 toggles only bit 5, a simple operation
Invertibility: XOR is its own inverse: (A XOR 0x20) XOR 0x20 = A
Range transformation: For control characters (0x00-0x1F), XOR 0x20 produces printable ASCII characters (0x20-0x3F), which aids debugging
Distinctiveness: The resulting bytes (0x5E, 0x5D, etc.) are clearly different from the originals

Mathematical Properties:

The escape encoding maintains several important properties:

Bijective (one-to-one): Each original byte maps to exactly one escape sequence
Reversible: The receiver can unambiguously reconstruct the original byte
Self-contained: No additional length information is needed
Bounded expansion: Maximum 2x expansion (if every byte needs escaping)

Escape Sequence Design Principle

A well-designed escape scheme uses a transformation that is simple to compute, easy to verify, and produces clearly distinguishable escaped values. The XOR-based approach in PPP exemplifies these principles while remaining efficient for implementation in both hardware and software.

Why Escape the Escape Character?

One subtlety that often confuses newcomers: the escape character itself (0x7D) must also be escaped. This might seem like infinite recursion, but it's essential for correctness.

The Problem Without Escaping ESC:

Suppose we only escape the flag byte (0x7E → 0x7D 0x5E) but allow 0x7D to appear literally in data.

Now consider data that naturally contains: ... 0x7D 0x5E ...

The receiver would see:

0x7D → "Escape sequence starting!"
0x5E → "Modified byte, XOR to recover original"
Result: Receiver outputs 0x5E XOR 0x20 = 0x7E

But the sender meant to transmit the literal bytes 0x7D 0x5E! The receiver has incorrectly produced a single byte 0x7E instead of the two bytes 0x7D and 0x5E.

The Solution:

The escape character itself must be escaped:

Original 0x7D → Escaped to 0x7D 0x5D
Original 0x7E → Escaped to 0x7D 0x5E

Now the receiver can always distinguish:

0x7D 0x5D → Literal 0x7D
0x7D 0x5E → Literal 0x7E
Literal 0x7D 0x5E in payload → Transmitted as 0x7D 0x5D 0x5E

Without ESC Escaping

•Data 7D 5E → Transmitted as 7D 5E
•Receiver sees 7D 5E → Interprets as escape sequence
•Receiver outputs: 7E (wrong!)
•Result: Data corruption

With ESC Escaping

•Data 7D 5E → Transmitted as 7D 5D 5E
•Receiver sees 7D 5D → Outputs 7D
•Receiver sees 5E → Outputs 5E
•Result: Correct data received

General Principle:

In any escape sequence scheme, the escape character itself must be escapable. This creates a complete and unambiguous encoding:

Rule 1: Reserved bytes are replaced with [ESC][modified]
Rule 2: ESC byte is replaced with [ESC][ESC-modified]
Rule 3: All other bytes pass through unchanged

This three-rule system handles all cases:

Original: 0x01 0x7E 0x02 0x7D 0x03 0x7E 0x7D 0x04
Encoded:  0x01 [7D 5E] 0x02 [7D 5D] 0x03 [7D 5E] [7D 5D] 0x04
          ↑    └──┬──┘  ↑   └──┬──┘  ↑   └──┬──┘ └──┬──┘  ↑
          pass  flag   pass  esc   pass  flag    esc   pass

Recursion Avoidance

Note that escaping the escape character does not create infinite recursion. The modified version of ESC (0x5D) is not itself a reserved byte, so it requires no further escaping. The scheme terminates in exactly one expansion.

The PPP Async Control Character Map

PPP defines an extended set of characters that may need escaping beyond just the flag and escape bytes. This is controlled by the Async Control Character Map (ACCM), negotiated during PPP's Link Control Protocol (LCP) phase.

Why Escape Additional Characters?

Certain byte values can interfere with the underlying serial communication hardware or software:

0x00 (NUL): Some systems interpret this as string terminator
0x03 (ETX): Can trigger buffer flush on some terminals
0x11 (XON), 0x13 (XOFF): Software flow control signals
0x1B (ESC): Escape to terminal command mode
0x7F (DEL): Delete character, often handled specially

If these bytes appear in frame data and pass through unescaped, the serial port driver or modem might misinterpret them, causing communication problems independent of the PPP protocol itself.

The ACCM Mechanism:

The ACCM is a 32-bit bitmap where each bit corresponds to a control character (0x00-0x1F):

ACCM bit 0  → Escape 0x00 if set
ACCM bit 1  → Escape 0x01 if set
ACCM bit 2  → Escape 0x02 if set
...
ACCM bit 31 → Escape 0x1F if set

The default ACCM is 0xFFFFFFFF, meaning all 32 control characters are escaped. During LCP negotiation, peers can agree to escape fewer characters for better efficiency.

Common Control Characters Requiring Escaping
Byte	Name	Reason for Escaping
0x00	NUL	String terminator in C-style implementations
0x03	ETX	End of Text; may trigger buffer operations
0x11	XON (DC1)	Software flow control: resume transmission
0x13	XOFF (DC3)	Software flow control: pause transmission
0x1B	ESC	Terminal escape sequence initiator
0x7D	PPP ESC	Protocol escape character (always escaped)
0x7E	PPP FLAG	Protocol frame delimiter (always escaped)
0x7F	DEL	Delete character; special handling on many systems

ACCM Negotiation Example:

During LCP negotiation, the two peers exchange ACCM options:

Peer A → Peer B: LCP Configure-Request with ACCM = 0x000A0000
                 (Escape only XON/XOFF: bits 17 and 19)

Peer B → Peer A: LCP Configure-Ack (accepting the proposed ACCM)

After negotiation, both peers know that only characters 0x11 (XON) and 0x13 (XOFF), plus the mandatory 0x7D and 0x7E, need escaping. All other control characters can be transmitted literally.

Efficiency Implications:

Reducing the escaped character set improves efficiency:

Default ACCM (0xFFFFFFFF): 34 characters escaped (32 control + ESC + FLAG)
Minimal ACCM (0x00000000): 2 characters escaped (ESC + FLAG only)

For binary data with uniform distribution:

Default: ~13% of bytes need escaping
Minimal: ~0.8% of bytes need escaping

The efficiency gain is significant for long-running connections over reliable media.

Asymmetric ACCM

The ACCM is negotiated independently in each direction. Peer A might request a minimal ACCM for data it receives, while Peer B might require a full ACCM due to its serial hardware. This asymmetry allows each peer to protect against its own system's vulnerabilities without imposing overhead on the other direction.

Sender Escape Processing

Implementing escape sequence insertion on the transmitter side requires a careful byte-by-byte processing algorithm. Let's develop a complete, production-quality implementation.

Algorithm Outline:

For each byte in the original frame:
  1. Check if the byte requires escaping (based on ACCM or reserved values)
  2. If escaping required:
     a. Emit the escape byte (0x7D)
     b. Emit the byte XOR 0x20
  3. If no escaping required:
     a. Emit the byte unchanged

Implementation Considerations:

Buffer Management: The escaped frame may be up to 2x the size of the original. Buffer allocation must account for worst-case expansion.
Efficiency: The escaping check should be fast, as it runs for every byte. Table lookup can be faster than bit manipulation for ACCM checks.
Flag Handling: The algorithm processes the payload only; the surrounding flag bytes are added separately and never escaped.
FCS Calculation: The FCS (CRC) is calculated on the unescaped data but is itself subject to escaping when transmitted.

escape_encoder.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
"""
PPP Escape Sequence Encoder
 
This module implements the transmitter-side escape processing
for PPP asynchronous framing, as specified in RFC 1662.
"""
 
from typing import List, Callable
 
class PPPEscapeEncoder:
    """
    Encodes frame data by inserting escape sequences for reserved bytes.
    
    This encoder handles:
    - Flag byte (0x7E) escaping (mandatory)
    - Escape byte (0x7D) escaping (mandatory)  
    - Control character escaping (ACCM-controlled)
    - Optional high-byte escaping (0x80-0xFF range, if configured)
    """
    
    FLAG_BYTE = 0x7E
    ESCAPE_BYTE = 0x7D
    XOR_MASK = 0x20
    
    def __init__(self, accm: int = 0xFFFFFFFF):
        """
        Initialize the encoder with an Async Control Character Map.
        
        Args:
            accm: 32-bit bitmap controlling which control characters
                  (0x00-0x1F) require escaping. Default is all escaped.
        """
        self.accm = accm
        self._build_escape_table()
    
    def _build_escape_table(self):
        """
        Pre-build a lookup table for O(1) escape decisions.
        
        Using a lookup table is faster than computing escape decisions
        for each byte, especially at high data rates.
        """
        self._must_escape = [False] * 256
        
        # Flag and escape bytes are always escaped
        self._must_escape[self.FLAG_BYTE] = True
        self._must_escape[self.ESCAPE_BYTE] = True
        
        # Control characters based on ACCM
        for i in range(32):
            if self.accm & (1 << i):
                self._must_escape[i] = True
        
        # 0x7F (DEL) is often escaped as well
        # (Some implementations include this in the escape set)
        # self._must_escape[0x7F] = True
    
    def update_accm(self, accm: int):
        """
        Update the ACCM after LCP negotiation.
        
        This would be called when the link configuration changes,
        such as after successful LCP negotiation reduces the escape set.
        """
        self.accm = accm
        self._build_escape_table()
    
    def encode(self, data: bytes) -> bytes:
        """
        Encode frame data with escape sequences.
        
        Args:
            data: Raw frame content (header + payload + FCS, no flags)
        
        Returns:
            Escaped data suitable for transmission between flags
        
        Note:
            This does NOT add surrounding flag bytes.
            Caller must add 0x7E before and after the returned data.
        """
        result = []
        
        for byte in data:
            if self._must_escape[byte]:
                # Insert escape sequence
                result.append(self.ESCAPE_BYTE)
                result.append(byte ^ self.XOR_MASK)
            else:
                # Pass through unchanged
                result.append(byte)
        
        return bytes(result)
    
    def encode_streaming(self, data: bytes, 
                         output_callback: Callable[[int], None]):
        """
        Stream-oriented encoding for memory-constrained environments.
        
        Instead of building a complete output buffer, this method
        calls the output callback for each byte as it's generated.
        
        Args:
            data: Raw frame content
            output_callback: Function called with each output byte
        """
        for byte in data:
            if self._must_escape[byte]:
                output_callback(self.ESCAPE_BYTE)
                output_callback(byte ^ self.XOR_MASK)
            else:
                output_callback(byte)
    
    def calculate_encoded_length(self, data: bytes) -> int:
        """
        Calculate the length of encoded data without actually encoding.
        
        Useful for buffer allocation when output size must be known
        before encoding begins.
        """
        extra = sum(1 for byte in data if self._must_escape[byte])
        return len(data) + extra
 
 
def demonstrate_escape_encoding():
    """
    Demonstrate escape sequence encoding with examples.
    """
    encoder = PPPEscapeEncoder(accm=0xFFFFFFFF)  # Full ACCM
    
    print("PPP Escape Sequence Encoding Demonstration")
    print("=" * 50)
    
    # Example 1: Data with flag byte
    data1 = bytes([0x01, 0x7E, 0x02])  # Contains flag
    encoded1 = encoder.encode(data1)
    print(f"\nExample 1: Data containing flag byte")
    print(f"  Original: {data1.hex(' ')}")
    print(f"  Encoded:  {encoded1.hex(' ')}")
    print(f"  Expansion: {len(data1)} → {len(encoded1)} bytes")
    
    # Example 2: Data with escape byte
    data2 = bytes([0x01, 0x7D, 0x02])  # Contains escape
    encoded2 = encoder.encode(data2)
    print(f"\nExample 2: Data containing escape byte")
    print(f"  Original: {data2.hex(' ')}")
    print(f"  Encoded:  {encoded2.hex(' ')}")
    
    # Example 3: Data with escape followed by what would decode as flag
    data3 = bytes([0x7D, 0x5E])  # Literal 0x7D 0x5E
    encoded3 = encoder.encode(data3)
    print(f"\nExample 3: Data that looks like escape sequence")
    print(f"  Original: {data3.hex(' ')}")
    print(f"  Encoded:  {encoded3.hex(' ')}")
    print(f"  (0x7D becomes 7D 5D, then 0x5E passes through)")
    
    # Example 4: Control characters
    data4 = bytes([0x11, 0x13, 0x41])  # XON, XOFF, 'A'
    encoded4 = encoder.encode(data4)
    print(f"\nExample 4: Control characters (XON, XOFF, 'A')")
    print(f"  Original: {data4.hex(' ')}")
    print(f"  Encoded:  {encoded4.hex(' ')}")
    
    # Example 5: Worst-case expansion
    data5 = bytes([0x7E] * 10)  # All flags
    encoded5 = encoder.encode(data5)
    print(f"\nExample 5: Worst-case (all flag bytes)")
    print(f"  Original: {data5.hex(' ')}")
    print(f"  Encoded:  {encoded5.hex(' ')}")
    print(f"  Expansion: {len(data5)} → {len(encoded5)} bytes (2x)")
    
    # ACCM comparison
    print(f"\n{'='*50}")
    print("ACCM Impact Analysis")
    
    # Data with control characters
    test_data = bytes(range(256))  # All possible bytes
    
    full_accm_encoder = PPPEscapeEncoder(accm=0xFFFFFFFF)
    min_accm_encoder = PPPEscapeEncoder(accm=0x00000000)
    
    full_encoded = full_accm_encoder.encode(test_data)
    min_encoded = min_accm_encoder.encode(test_data)
    
    print(f"\nAll 256 byte values:")
    print(f"  Full ACCM: {len(test_data)} → {len(full_encoded)} bytes")
    print(f"  Min ACCM:  {len(test_data)} → {len(min_encoded)} bytes")
    print(f"  Overhead: Full={len(full_encoded)-len(test_data)}, Min={len(min_encoded)-len(test_data)}")
 
 
if __name__ == "__main__":
    demonstrate_escape_encoding()

Performance Optimization

The lookup table approach (pre-computed escape decisions) is crucial for high-performance implementations. At 10 Mbps, the encoder must process over 1 million bytes per second. Function call overhead for each escape decision would be prohibitive.

Receiver Escape Processing

The receiver must perform the inverse operation: detect escape sequences and reconstruct the original byte values. This process is called unstuffing or escape decoding.

Algorithm Outline:

State = NORMAL
For each byte in the received frame:
  If State == NORMAL:
    If byte == ESCAPE:
      State = ESCAPED
    Else:
      Output byte unchanged
      
  Else (State == ESCAPED):
    Output byte XOR 0x20
    State = NORMAL

Key Observations:

Stateful Processing: The receiver must track whether the previous byte was an escape character. This is a two-state machine.
Single-Pass: Decoding is strictly left-to-right; no lookahead is required.
Buffer Shrinkage: The decoded data is always smaller than or equal to the input. In-place decoding is possible.
Error Conditions: What if a frame ends with a lone escape byte? This is an error condition (escaped byte missing).

escape_decoder.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
"""
PPP Escape Sequence Decoder
 
This module implements the receiver-side escape processing
for PPP asynchronous framing.
"""
 
from enum import Enum, auto
from typing import Optional, Tuple
 
class DecoderState(Enum):
    NORMAL = auto()    # Processing regular bytes
    ESCAPED = auto()   # Previous byte was 0x7D
 
class DecodeError(Exception):
    """Raised when escape sequence decoding fails."""
    pass
 
class PPPEscapeDecoder:
    """
    Decodes escape sequences to recover original frame data.
    
    This decoder handles the inverse of PPP escape encoding:
    - [0x7D 0x5E] → 0x7E
    - [0x7D 0x5D] → 0x7D
    - [0x7D 0xXX] → 0xXX XOR 0x20 (for other escaped bytes)
    """
    
    ESCAPE_BYTE = 0x7D
    XOR_MASK = 0x20
    
    def __init__(self):
        self.state = DecoderState.NORMAL
    
    def reset(self):
        """Reset decoder state for a new frame."""
        self.state = DecoderState.NORMAL
    
    def decode(self, data: bytes) -> bytes:
        """
        Decode escape sequences in received frame data.
        
        Args:
            data: Escaped frame data (between flags, flags removed)
        
        Returns:
            Original unescaped data
        
        Raises:
            DecodeError: If frame ends with incomplete escape sequence
        """
        self.reset()
        result = []
        
        for byte in data:
            decoded = self._process_byte(byte)
            if decoded is not None:
                result.append(decoded)
        
        # Check for incomplete escape at end of frame
        if self.state == DecoderState.ESCAPED:
            raise DecodeError("Frame ends with incomplete escape sequence")
        
        return bytes(result)
    
    def _process_byte(self, byte: int) -> Optional[int]:
        """
        Process a single byte through the decoder state machine.
        
        Returns the decoded byte if one is ready, None if more input needed.
        """
        if self.state == DecoderState.NORMAL:
            if byte == self.ESCAPE_BYTE:
                # Start escape sequence
                self.state = DecoderState.ESCAPED
                return None
            else:
                # Regular byte - pass through
                return byte
        
        else:  # ESCAPED state
            # Decode the escaped byte
            self.state = DecoderState.NORMAL
            return byte ^ self.XOR_MASK
    
    def decode_streaming(self, byte: int) -> Tuple[bool, Optional[int]]:
        """
        Stream-oriented decoding for interrupt-driven receivers.
        
        This method processes one byte at a time, maintaining state
        between calls. Suitable for byte-at-a-time UART reception.
        
        Args:
            byte: Single received byte
        
        Returns:
            (output_ready, decoded_byte) tuple
            output_ready: True if decoded_byte is valid
            decoded_byte: The decoded byte value, or None
        """
        decoded = self._process_byte(byte)
        return (decoded is not None, decoded)
 
 
def demonstrate_escape_decoding():
    """
    Demonstrate escape sequence decoding with examples.
    """
    decoder = PPPEscapeDecoder()
    
    print("PPP Escape Sequence Decoding Demonstration")
    print("=" * 50)
    
    # Example 1: Basic escape sequence
    encoded1 = bytes([0x01, 0x7D, 0x5E, 0x02])  # Contains escaped flag
    decoded1 = decoder.decode(encoded1)
    print(f"\nExample 1: Escaped flag byte")
    print(f"  Received: {encoded1.hex(' ')}")
    print(f"  Decoded:  {decoded1.hex(' ')}")
    print(f"  (7D 5E became 7E)")
    
    # Example 2: Escaped escape byte
    encoded2 = bytes([0x01, 0x7D, 0x5D, 0x02])
    decoded2 = decoder.decode(encoded2)
    print(f"\nExample 2: Escaped escape byte")
    print(f"  Received: {encoded2.hex(' ')}")
    print(f"  Decoded:  {decoded2.hex(' ')}")
    print(f"  (7D 5D became 7D)")
    
    # Example 3: Multiple escape sequences
    encoded3 = bytes([0x7D, 0x5E, 0x7D, 0x5D, 0x7D, 0x31])
    decoded3 = decoder.decode(encoded3)
    print(f"\nExample 3: Multiple escapes")
    print(f"  Received: {encoded3.hex(' ')}")
    print(f"  Decoded:  {decoded3.hex(' ')}")
    print(f"  (3 escape sequences → 3 bytes)")
    
    # Example 4: Verify round-trip encoding
    from escape_encoder import PPPEscapeEncoder  # Import encoder
    
    original = bytes([0x7E, 0x7D, 0x00, 0x11, 0x13, 0x41])
    encoder = PPPEscapeEncoder()
    encoded = encoder.encode(original)
    decoded = decoder.decode(encoded)
    
    print(f"\nExample 4: Round-trip verification")
    print(f"  Original: {original.hex(' ')}")
    print(f"  Encoded:  {encoded.hex(' ')}")
    print(f"  Decoded:  {decoded.hex(' ')}")
    print(f"  Match: {original == decoded}")
    
    # Example 5: Stream decoding
    print(f"\nExample 5: Stream (byte-by-byte) decoding")
    stream_decoder = PPPEscapeDecoder()
    stream_data = bytes([0x41, 0x7D, 0x5E, 0x42])
    print(f"  Input stream: {stream_data.hex(' ')}")
    print(f"  Byte-by-byte processing:")
    
    for byte in stream_data:
        ready, decoded = stream_decoder.decode_streaming(byte)
        hex_byte = f"0x{byte:02X}"
        if ready:
            print(f"    Received {hex_byte} → Output 0x{decoded:02X}")
        else:
            print(f"    Received {hex_byte} → (buffered, awaiting next byte)")
    
    # Example 6: Error case
    print(f"\nExample 6: Error case (incomplete escape)")
    try:
        bad_data = bytes([0x41, 0x7D])  # Ends with escape byte
        decoder.decode(bad_data)
    except DecodeError as e:
        print(f"  Data: {bad_data.hex(' ')}")
        print(f"  Error: {e}")
 
 
if __name__ == "__main__":
    demonstrate_escape_decoding()

Error Handling

A robust receiver must handle malformed escape sequences gracefully. If a frame ends with a lone escape byte, or if an escape sequence produces an invalid byte, the frame should be discarded and the error logged. Never allow malformed input to crash the receiver or corrupt state.

Historical Context: BISYNC DLE Stuffing

The escape sequence approach used in PPP evolved from earlier character-oriented protocols. Understanding this history provides insight into why certain design decisions were made.

BISYNC (Binary Synchronous Communication):

Developed by IBM in the 1960s, BISYNC was one of the first widely-deployed byte-oriented protocols. Its framing used ASCII control characters:

[SYN][SYN][SOH][Header][STX][Data][ETX][BCC]

Where:

SYN (0x16): Synchronization character
SOH (0x01): Start of Header
STX (0x02): Start of Text (payload begins)
ETX (0x03): End of Text (payload ends)
BCC: Block Check Character (error detection)

The DLE Stuffing Technique:

To achieve transparency, BISYNC used the DLE (Data Link Escape, 0x10) character:

When switching to transparent mode: Send DLE STX
End transparent mode: Send DLE ETX
To include literal DLE in data: Send DLE DLE

This is the ancestor of the PPP escape mechanism!

Evolution of Escape Mechanisms
Era	Protocol	Escape Method	Characteristics
1960s	BISYNC	DLE stuffing	Control character based; text-oriented
1970s	DDCMP	Byte count + CRC	Length field avoids stuffing overhead
1979	HDLC	Bit stuffing	Bit-oriented; more efficient
1980s	SLIP	Two escapes (END, ESC)	Minimal implementation
1994	PPP	0x7D + XOR	Unified approach; configurable ACCM

SLIP: The Minimalist Predecessor:

Serial Line Internet Protocol (SLIP), defined in RFC 1055, used an even simpler escape scheme:

END byte (0xC0): Frame delimiter
ESC byte (0xDB): Escape indicator
ESC_END (0xDC): Represents literal 0xC0 in data
ESC_ESC (0xDD): Represents literal 0xDB in data

Escape mappings:
0xC0 in data → 0xDB 0xDC
0xDB in data → 0xDB 0xDD

SLIP chose fixed escape sequences rather than a computational transformation (like PPP's XOR). This is slightly simpler but requires separate handling for each escaped byte.

Why PPP Improved on SLIP:

Extensibility: PPP's XOR approach handles any escaped byte uniformly
ACCM: Configurable escape sets for different serial environments
Protocol Field: PPP supports multiple upper-layer protocols (IP, IPX, etc.)
Authentication: PPP includes PAP, CHAP authentication options
Compression: PPP can compress headers and negotiate compression

Design Evolution

Each generation of protocols refined the escape mechanism based on experience with its predecessors. BISYNC taught the importance of transparency; SLIP proved simplicity was achievable; PPP added configurability and extensibility. Understanding this evolution helps appreciate why modern protocols look the way they do.

Summary: Escape Sequences for Transparency

We have now thoroughly explored escape sequences as the mechanism that enables data transparency in byte-oriented protocols. Let's consolidate our understanding:

Key Takeaways

•Data transparency means the ability to transmit arbitrary byte sequences without misinterpretation as protocol control information
•Escape sequences replace reserved byte values with multi-byte sequences that the receiver can decode
•The escape byte itself must be escaped to prevent ambiguity when payload data resembles escape sequences
•PPP's XOR-based transformation (byte XOR 0x20) provides a simple, reversible, uniform encoding
•The ACCM allows negotiation of which additional control characters require escaping
•Sender encoding and receiver decoding are implemented as simple, efficient state machines

The Complete Picture:

Escape sequences are one component of the broader byte stuffing process:

Flag Bytes (Page 1): Delimit frame boundaries
Escape Sequences (This Page): Enable transparency for reserved bytes
Byte Stuffing (Next Page): The complete algorithm combining flags and escaping
Transparency (Page 4): Theory and formal properties of transparent communication
PPP Example (Page 5): Complete real-world implementation

What's Next:

The next page examines byte stuffing as the complete algorithm that combines flag bytes and escape sequences. We'll develop the full transmitter and receiver implementations, analyze efficiency, and explore optimization strategies for production systems.

Page Complete

You now understand escape sequences as the key mechanism for data transparency. You can explain why they're necessary, how they work, and why the escape byte itself must be escaped. Next, we'll see how escape sequences combine with flag bytes in the complete byte stuffing algorithm.

2 / 5

Loading learning content...

Computer NetworksData Link Layer

Framing: Byte Stuffing

LevelIntermediate

Duration75 mins

TopicData Link Layer

2 / 5

Escape Sequences: The Key to Data Transparency

The Data Transparency Challenge

What You Will Master

The Transparency Problem in Depth

To truly understand escape sequences, we must first fully appreciate the severity of the transparency problem.

Scenario Analysis:

Consider a PPP frame carrying an IP packet. The frame structure is:

[0x7E] [Address] [Control] [Protocol] [IP Packet Data] [FCS] [0x7E]

Now suppose the IP packet payload (perhaps containing compressed web content) includes the following byte sequence:

... 0x45 0x00 0x7E 0x12 0x34 ...
               ↑
               Contains flag byte value!

Without Escape Handling:

The receiver's state machine would process this as:

[0x7E] → Begin frame
[Address, Control, Protocol] → Header received
[0x45, 0x00] → Payload bytes accumulated
[0x7E] → END OF FRAME DETECTED!
[0x12, 0x34, ...] → Beginning of "next frame" (?!)

The receiver now believes it has received a complete (but corrupted) frame, and potentially starts parsing random data as the header of a new frame. The consequences cascade:

The current frame is truncated and will fail CRC validation
Subsequent bytes are misinterpreted as frame headers
Synchronization is lost until the next actual flag happens to resynchronize
Multiple frames may be corrupted before recovery

Impact Without Transparency

Probability Analysis:

How often would this problem occur? For truly random data:

Probability of any single byte being 0x7E: 1/256 ≈ 0.39%
For a 1000-byte payload: Probability of at least one 0x7E ≈ 1 - (255/256)^1000 ≈ 98%

This means nearly every large frame would be corrupted! Even for small frames:

100-byte payload: ~32% chance of containing 0x7E
500-byte payload: ~86% chance of containing 0x7E

Conclusion: Data transparency is not optional; it is essential for any protocol that transmits arbitrary data.

Historical Note:

The Escape Sequence Concept

Basic Principle:

Define an escape character (also called escape byte or control escape) that signals: "The next byte should be interpreted specially, not literally."

When the transmitter encounters a byte that cannot appear directly in the frame, it replaces that byte with a two-byte escape sequence:

Original byte → [ESC] [Modified byte]

The receiver, upon seeing the escape character, knows to interpret the next byte specially and reconstruct the original value.

The PPP Escape Scheme:

PPP uses the following escape encoding:

Escape byte: 0x7D
Transformation: XOR the escaped byte with 0x20

This produces the following mappings:

PPP Escape Sequence Mappings
Original Byte	Hexadecimal	Escaped Sequence	Explanation
Flag (0x7E)	01111110	0x7D 0x5E	0x7E XOR 0x20 = 0x5E
Escape (0x7D)	01111101	0x7D 0x5D	0x7D XOR 0x20 = 0x5D
Control chars (0x00-0x1F)	Various	0x7D [byte XOR 0x20]	ASCII control characters
DEL (0x7F)	01111111	0x7D 0x5F	0x7F XOR 0x20 = 0x5F

Why XOR with 0x20?

The choice of 0x20 for the XOR transformation is deliberate:

Single-bit toggle: 0x20 = 0b00100000 toggles only bit 5, a simple operation
Invertibility: XOR is its own inverse: (A XOR 0x20) XOR 0x20 = A
Range transformation: For control characters (0x00-0x1F), XOR 0x20 produces printable ASCII characters (0x20-0x3F), which aids debugging
Distinctiveness: The resulting bytes (0x5E, 0x5D, etc.) are clearly different from the originals

Mathematical Properties:

The escape encoding maintains several important properties:

Bijective (one-to-one): Each original byte maps to exactly one escape sequence
Reversible: The receiver can unambiguously reconstruct the original byte
Self-contained: No additional length information is needed
Bounded expansion: Maximum 2x expansion (if every byte needs escaping)

Escape Sequence Design Principle

Why Escape the Escape Character?

One subtlety that often confuses newcomers: the escape character itself (0x7D) must also be escaped. This might seem like infinite recursion, but it's essential for correctness.

The Problem Without Escaping ESC:

Suppose we only escape the flag byte (0x7E → 0x7D 0x5E) but allow 0x7D to appear literally in data.

Now consider data that naturally contains: ... 0x7D 0x5E ...

The receiver would see:

0x7D → "Escape sequence starting!"
0x5E → "Modified byte, XOR to recover original"
Result: Receiver outputs 0x5E XOR 0x20 = 0x7E

But the sender meant to transmit the literal bytes 0x7D 0x5E! The receiver has incorrectly produced a single byte 0x7E instead of the two bytes 0x7D and 0x5E.

The Solution:

The escape character itself must be escaped:

Original 0x7D → Escaped to 0x7D 0x5D
Original 0x7E → Escaped to 0x7D 0x5E

Now the receiver can always distinguish:

0x7D 0x5D → Literal 0x7D
0x7D 0x5E → Literal 0x7E
Literal 0x7D 0x5E in payload → Transmitted as 0x7D 0x5D 0x5E

Without ESC Escaping

•Data 7D 5E → Transmitted as 7D 5E
•Receiver sees 7D 5E → Interprets as escape sequence
•Receiver outputs: 7E (wrong!)
•Result: Data corruption

With ESC Escaping

•Data 7D 5E → Transmitted as 7D 5D 5E
•Receiver sees 7D 5D → Outputs 7D
•Receiver sees 5E → Outputs 5E
•Result: Correct data received

General Principle:

In any escape sequence scheme, the escape character itself must be escapable. This creates a complete and unambiguous encoding:

Rule 1: Reserved bytes are replaced with [ESC][modified]
Rule 2: ESC byte is replaced with [ESC][ESC-modified]
Rule 3: All other bytes pass through unchanged

This three-rule system handles all cases:

Original: 0x01 0x7E 0x02 0x7D 0x03 0x7E 0x7D 0x04
Encoded:  0x01 [7D 5E] 0x02 [7D 5D] 0x03 [7D 5E] [7D 5D] 0x04
          ↑    └──┬──┘  ↑   └──┬──┘  ↑   └──┬──┘ └──┬──┘  ↑
          pass  flag   pass  esc   pass  flag    esc   pass

Recursion Avoidance

The PPP Async Control Character Map

Why Escape Additional Characters?

Certain byte values can interfere with the underlying serial communication hardware or software:

0x00 (NUL): Some systems interpret this as string terminator
0x03 (ETX): Can trigger buffer flush on some terminals
0x11 (XON), 0x13 (XOFF): Software flow control signals
0x1B (ESC): Escape to terminal command mode
0x7F (DEL): Delete character, often handled specially

If these bytes appear in frame data and pass through unescaped, the serial port driver or modem might misinterpret them, causing communication problems independent of the PPP protocol itself.

The ACCM Mechanism:

The ACCM is a 32-bit bitmap where each bit corresponds to a control character (0x00-0x1F):

ACCM bit 0  → Escape 0x00 if set
ACCM bit 1  → Escape 0x01 if set
ACCM bit 2  → Escape 0x02 if set
...
ACCM bit 31 → Escape 0x1F if set

The default ACCM is 0xFFFFFFFF, meaning all 32 control characters are escaped. During LCP negotiation, peers can agree to escape fewer characters for better efficiency.

Common Control Characters Requiring Escaping
Byte	Name	Reason for Escaping
0x00	NUL	String terminator in C-style implementations
0x03	ETX	End of Text; may trigger buffer operations
0x11	XON (DC1)	Software flow control: resume transmission
0x13	XOFF (DC3)	Software flow control: pause transmission
0x1B	ESC	Terminal escape sequence initiator
0x7D	PPP ESC	Protocol escape character (always escaped)
0x7E	PPP FLAG	Protocol frame delimiter (always escaped)
0x7F	DEL	Delete character; special handling on many systems

ACCM Negotiation Example:

During LCP negotiation, the two peers exchange ACCM options:

Peer A → Peer B: LCP Configure-Request with ACCM = 0x000A0000
                 (Escape only XON/XOFF: bits 17 and 19)

Peer B → Peer A: LCP Configure-Ack (accepting the proposed ACCM)

After negotiation, both peers know that only characters 0x11 (XON) and 0x13 (XOFF), plus the mandatory 0x7D and 0x7E, need escaping. All other control characters can be transmitted literally.

Efficiency Implications:

Reducing the escaped character set improves efficiency:

Default ACCM (0xFFFFFFFF): 34 characters escaped (32 control + ESC + FLAG)
Minimal ACCM (0x00000000): 2 characters escaped (ESC + FLAG only)

For binary data with uniform distribution:

Default: ~13% of bytes need escaping
Minimal: ~0.8% of bytes need escaping

The efficiency gain is significant for long-running connections over reliable media.

Asymmetric ACCM

Sender Escape Processing

Implementing escape sequence insertion on the transmitter side requires a careful byte-by-byte processing algorithm. Let's develop a complete, production-quality implementation.

Algorithm Outline:

For each byte in the original frame:
  1. Check if the byte requires escaping (based on ACCM or reserved values)
  2. If escaping required:
     a. Emit the escape byte (0x7D)
     b. Emit the byte XOR 0x20
  3. If no escaping required:
     a. Emit the byte unchanged

Implementation Considerations:

Buffer Management: The escaped frame may be up to 2x the size of the original. Buffer allocation must account for worst-case expansion.
Efficiency: The escaping check should be fast, as it runs for every byte. Table lookup can be faster than bit manipulation for ACCM checks.
Flag Handling: The algorithm processes the payload only; the surrounding flag bytes are added separately and never escaped.
FCS Calculation: The FCS (CRC) is calculated on the unescaped data but is itself subject to escaping when transmitted.

escape_encoder.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
"""
PPP Escape Sequence Encoder
 
This module implements the transmitter-side escape processing
for PPP asynchronous framing, as specified in RFC 1662.
"""
 
from typing import List, Callable
 
class PPPEscapeEncoder:
    """
    Encodes frame data by inserting escape sequences for reserved bytes.
    
    This encoder handles:
    - Flag byte (0x7E) escaping (mandatory)
    - Escape byte (0x7D) escaping (mandatory)  
    - Control character escaping (ACCM-controlled)
    - Optional high-byte escaping (0x80-0xFF range, if configured)
    """
    
    FLAG_BYTE = 0x7E
    ESCAPE_BYTE = 0x7D
    XOR_MASK = 0x20
    
    def __init__(self, accm: int = 0xFFFFFFFF):
        """
        Initialize the encoder with an Async Control Character Map.
        
        Args:
            accm: 32-bit bitmap controlling which control characters
                  (0x00-0x1F) require escaping. Default is all escaped.
        """
        self.accm = accm
        self._build_escape_table()
    
    def _build_escape_table(self):
        """
        Pre-build a lookup table for O(1) escape decisions.
        
        Using a lookup table is faster than computing escape decisions
        for each byte, especially at high data rates.
        """
        self._must_escape = [False] * 256
        
        # Flag and escape bytes are always escaped
        self._must_escape[self.FLAG_BYTE] = True
        self._must_escape[self.ESCAPE_BYTE] = True
        
        # Control characters based on ACCM
        for i in range(32):
            if self.accm & (1 << i):
                self._must_escape[i] = True
        
        # 0x7F (DEL) is often escaped as well
        # (Some implementations include this in the escape set)
        # self._must_escape[0x7F] = True
    
    def update_accm(self, accm: int):
        """
        Update the ACCM after LCP negotiation.
        
        This would be called when the link configuration changes,
        such as after successful LCP negotiation reduces the escape set.
        """
        self.accm = accm
        self._build_escape_table()
    
    def encode(self, data: bytes) -> bytes:
        """
        Encode frame data with escape sequences.
        
        Args:
            data: Raw frame content (header + payload + FCS, no flags)
        
        Returns:
            Escaped data suitable for transmission between flags
        
        Note:
            This does NOT add surrounding flag bytes.
            Caller must add 0x7E before and after the returned data.
        """
        result = []
        
        for byte in data:
            if self._must_escape[byte]:
                # Insert escape sequence
                result.append(self.ESCAPE_BYTE)
                result.append(byte ^ self.XOR_MASK)
            else:
                # Pass through unchanged
                result.append(byte)
        
        return bytes(result)
    
    def encode_streaming(self, data: bytes, 
                         output_callback: Callable[[int], None]):
        """
        Stream-oriented encoding for memory-constrained environments.
        
        Instead of building a complete output buffer, this method
        calls the output callback for each byte as it's generated.
        
        Args:
            data: Raw frame content
            output_callback: Function called with each output byte
        """
        for byte in data:
            if self._must_escape[byte]:
                output_callback(self.ESCAPE_BYTE)
                output_callback(byte ^ self.XOR_MASK)
            else:
                output_callback(byte)
    
    def calculate_encoded_length(self, data: bytes) -> int:
        """
        Calculate the length of encoded data without actually encoding.
        
        Useful for buffer allocation when output size must be known
        before encoding begins.
        """
        extra = sum(1 for byte in data if self._must_escape[byte])
        return len(data) + extra
 
 
def demonstrate_escape_encoding():
    """
    Demonstrate escape sequence encoding with examples.
    """
    encoder = PPPEscapeEncoder(accm=0xFFFFFFFF)  # Full ACCM
    
    print("PPP Escape Sequence Encoding Demonstration")
    print("=" * 50)
    
    # Example 1: Data with flag byte
    data1 = bytes([0x01, 0x7E, 0x02])  # Contains flag
    encoded1 = encoder.encode(data1)
    print(f"\nExample 1: Data containing flag byte")
    print(f"  Original: {data1.hex(' ')}")
    print(f"  Encoded:  {encoded1.hex(' ')}")
    print(f"  Expansion: {len(data1)} → {len(encoded1)} bytes")
    
    # Example 2: Data with escape byte
    data2 = bytes([0x01, 0x7D, 0x02])  # Contains escape
    encoded2 = encoder.encode(data2)
    print(f"\nExample 2: Data containing escape byte")
    print(f"  Original: {data2.hex(' ')}")
    print(f"  Encoded:  {encoded2.hex(' ')}")
    
    # Example 3: Data with escape followed by what would decode as flag
    data3 = bytes([0x7D, 0x5E])  # Literal 0x7D 0x5E
    encoded3 = encoder.encode(data3)
    print(f"\nExample 3: Data that looks like escape sequence")
    print(f"  Original: {data3.hex(' ')}")
    print(f"  Encoded:  {encoded3.hex(' ')}")
    print(f"  (0x7D becomes 7D 5D, then 0x5E passes through)")
    
    # Example 4: Control characters
    data4 = bytes([0x11, 0x13, 0x41])  # XON, XOFF, 'A'
    encoded4 = encoder.encode(data4)
    print(f"\nExample 4: Control characters (XON, XOFF, 'A')")
    print(f"  Original: {data4.hex(' ')}")
    print(f"  Encoded:  {encoded4.hex(' ')}")
    
    # Example 5: Worst-case expansion
    data5 = bytes([0x7E] * 10)  # All flags
    encoded5 = encoder.encode(data5)
    print(f"\nExample 5: Worst-case (all flag bytes)")
    print(f"  Original: {data5.hex(' ')}")
    print(f"  Encoded:  {encoded5.hex(' ')}")
    print(f"  Expansion: {len(data5)} → {len(encoded5)} bytes (2x)")
    
    # ACCM comparison
    print(f"\n{'='*50}")
    print("ACCM Impact Analysis")
    
    # Data with control characters
    test_data = bytes(range(256))  # All possible bytes
    
    full_accm_encoder = PPPEscapeEncoder(accm=0xFFFFFFFF)
    min_accm_encoder = PPPEscapeEncoder(accm=0x00000000)
    
    full_encoded = full_accm_encoder.encode(test_data)
    min_encoded = min_accm_encoder.encode(test_data)
    
    print(f"\nAll 256 byte values:")
    print(f"  Full ACCM: {len(test_data)} → {len(full_encoded)} bytes")
    print(f"  Min ACCM:  {len(test_data)} → {len(min_encoded)} bytes")
    print(f"  Overhead: Full={len(full_encoded)-len(test_data)}, Min={len(min_encoded)-len(test_data)}")
 
 
if __name__ == "__main__":
    demonstrate_escape_encoding()

Performance Optimization

Receiver Escape Processing

The receiver must perform the inverse operation: detect escape sequences and reconstruct the original byte values. This process is called unstuffing or escape decoding.

Algorithm Outline:

State = NORMAL
For each byte in the received frame:
  If State == NORMAL:
    If byte == ESCAPE:
      State = ESCAPED
    Else:
      Output byte unchanged
      
  Else (State == ESCAPED):
    Output byte XOR 0x20
    State = NORMAL

Key Observations:

Stateful Processing: The receiver must track whether the previous byte was an escape character. This is a two-state machine.
Single-Pass: Decoding is strictly left-to-right; no lookahead is required.
Buffer Shrinkage: The decoded data is always smaller than or equal to the input. In-place decoding is possible.
Error Conditions: What if a frame ends with a lone escape byte? This is an error condition (escaped byte missing).

escape_decoder.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
"""
PPP Escape Sequence Decoder
 
This module implements the receiver-side escape processing
for PPP asynchronous framing.
"""
 
from enum import Enum, auto
from typing import Optional, Tuple
 
class DecoderState(Enum):
    NORMAL = auto()    # Processing regular bytes
    ESCAPED = auto()   # Previous byte was 0x7D
 
class DecodeError(Exception):
    """Raised when escape sequence decoding fails."""
    pass
 
class PPPEscapeDecoder:
    """
    Decodes escape sequences to recover original frame data.
    
    This decoder handles the inverse of PPP escape encoding:
    - [0x7D 0x5E] → 0x7E
    - [0x7D 0x5D] → 0x7D
    - [0x7D 0xXX] → 0xXX XOR 0x20 (for other escaped bytes)
    """
    
    ESCAPE_BYTE = 0x7D
    XOR_MASK = 0x20
    
    def __init__(self):
        self.state = DecoderState.NORMAL
    
    def reset(self):
        """Reset decoder state for a new frame."""
        self.state = DecoderState.NORMAL
    
    def decode(self, data: bytes) -> bytes:
        """
        Decode escape sequences in received frame data.
        
        Args:
            data: Escaped frame data (between flags, flags removed)
        
        Returns:
            Original unescaped data
        
        Raises:
            DecodeError: If frame ends with incomplete escape sequence
        """
        self.reset()
        result = []
        
        for byte in data:
            decoded = self._process_byte(byte)
            if decoded is not None:
                result.append(decoded)
        
        # Check for incomplete escape at end of frame
        if self.state == DecoderState.ESCAPED:
            raise DecodeError("Frame ends with incomplete escape sequence")
        
        return bytes(result)
    
    def _process_byte(self, byte: int) -> Optional[int]:
        """
        Process a single byte through the decoder state machine.
        
        Returns the decoded byte if one is ready, None if more input needed.
        """
        if self.state == DecoderState.NORMAL:
            if byte == self.ESCAPE_BYTE:
                # Start escape sequence
                self.state = DecoderState.ESCAPED
                return None
            else:
                # Regular byte - pass through
                return byte
        
        else:  # ESCAPED state
            # Decode the escaped byte
            self.state = DecoderState.NORMAL
            return byte ^ self.XOR_MASK
    
    def decode_streaming(self, byte: int) -> Tuple[bool, Optional[int]]:
        """
        Stream-oriented decoding for interrupt-driven receivers.
        
        This method processes one byte at a time, maintaining state
        between calls. Suitable for byte-at-a-time UART reception.
        
        Args:
            byte: Single received byte
        
        Returns:
            (output_ready, decoded_byte) tuple
            output_ready: True if decoded_byte is valid
            decoded_byte: The decoded byte value, or None
        """
        decoded = self._process_byte(byte)
        return (decoded is not None, decoded)
 
 
def demonstrate_escape_decoding():
    """
    Demonstrate escape sequence decoding with examples.
    """
    decoder = PPPEscapeDecoder()
    
    print("PPP Escape Sequence Decoding Demonstration")
    print("=" * 50)
    
    # Example 1: Basic escape sequence
    encoded1 = bytes([0x01, 0x7D, 0x5E, 0x02])  # Contains escaped flag
    decoded1 = decoder.decode(encoded1)
    print(f"\nExample 1: Escaped flag byte")
    print(f"  Received: {encoded1.hex(' ')}")
    print(f"  Decoded:  {decoded1.hex(' ')}")
    print(f"  (7D 5E became 7E)")
    
    # Example 2: Escaped escape byte
    encoded2 = bytes([0x01, 0x7D, 0x5D, 0x02])
    decoded2 = decoder.decode(encoded2)
    print(f"\nExample 2: Escaped escape byte")
    print(f"  Received: {encoded2.hex(' ')}")
    print(f"  Decoded:  {decoded2.hex(' ')}")
    print(f"  (7D 5D became 7D)")
    
    # Example 3: Multiple escape sequences
    encoded3 = bytes([0x7D, 0x5E, 0x7D, 0x5D, 0x7D, 0x31])
    decoded3 = decoder.decode(encoded3)
    print(f"\nExample 3: Multiple escapes")
    print(f"  Received: {encoded3.hex(' ')}")
    print(f"  Decoded:  {decoded3.hex(' ')}")
    print(f"  (3 escape sequences → 3 bytes)")
    
    # Example 4: Verify round-trip encoding
    from escape_encoder import PPPEscapeEncoder  # Import encoder
    
    original = bytes([0x7E, 0x7D, 0x00, 0x11, 0x13, 0x41])
    encoder = PPPEscapeEncoder()
    encoded = encoder.encode(original)
    decoded = decoder.decode(encoded)
    
    print(f"\nExample 4: Round-trip verification")
    print(f"  Original: {original.hex(' ')}")
    print(f"  Encoded:  {encoded.hex(' ')}")
    print(f"  Decoded:  {decoded.hex(' ')}")
    print(f"  Match: {original == decoded}")
    
    # Example 5: Stream decoding
    print(f"\nExample 5: Stream (byte-by-byte) decoding")
    stream_decoder = PPPEscapeDecoder()
    stream_data = bytes([0x41, 0x7D, 0x5E, 0x42])
    print(f"  Input stream: {stream_data.hex(' ')}")
    print(f"  Byte-by-byte processing:")
    
    for byte in stream_data:
        ready, decoded = stream_decoder.decode_streaming(byte)
        hex_byte = f"0x{byte:02X}"
        if ready:
            print(f"    Received {hex_byte} → Output 0x{decoded:02X}")
        else:
            print(f"    Received {hex_byte} → (buffered, awaiting next byte)")
    
    # Example 6: Error case
    print(f"\nExample 6: Error case (incomplete escape)")
    try:
        bad_data = bytes([0x41, 0x7D])  # Ends with escape byte
        decoder.decode(bad_data)
    except DecodeError as e:
        print(f"  Data: {bad_data.hex(' ')}")
        print(f"  Error: {e}")
 
 
if __name__ == "__main__":
    demonstrate_escape_decoding()

Error Handling

Historical Context: BISYNC DLE Stuffing

The escape sequence approach used in PPP evolved from earlier character-oriented protocols. Understanding this history provides insight into why certain design decisions were made.

BISYNC (Binary Synchronous Communication):

Developed by IBM in the 1960s, BISYNC was one of the first widely-deployed byte-oriented protocols. Its framing used ASCII control characters:

[SYN][SYN][SOH][Header][STX][Data][ETX][BCC]

Where:

SYN (0x16): Synchronization character
SOH (0x01): Start of Header
STX (0x02): Start of Text (payload begins)
ETX (0x03): End of Text (payload ends)
BCC: Block Check Character (error detection)

The DLE Stuffing Technique:

To achieve transparency, BISYNC used the DLE (Data Link Escape, 0x10) character:

When switching to transparent mode: Send DLE STX
End transparent mode: Send DLE ETX
To include literal DLE in data: Send DLE DLE

This is the ancestor of the PPP escape mechanism!

Evolution of Escape Mechanisms
Era	Protocol	Escape Method	Characteristics
1960s	BISYNC	DLE stuffing	Control character based; text-oriented
1970s	DDCMP	Byte count + CRC	Length field avoids stuffing overhead
1979	HDLC	Bit stuffing	Bit-oriented; more efficient
1980s	SLIP	Two escapes (END, ESC)	Minimal implementation
1994	PPP	0x7D + XOR	Unified approach; configurable ACCM

SLIP: The Minimalist Predecessor:

Serial Line Internet Protocol (SLIP), defined in RFC 1055, used an even simpler escape scheme:

END byte (0xC0): Frame delimiter
ESC byte (0xDB): Escape indicator
ESC_END (0xDC): Represents literal 0xC0 in data
ESC_ESC (0xDD): Represents literal 0xDB in data

Escape mappings:
0xC0 in data → 0xDB 0xDC
0xDB in data → 0xDB 0xDD

SLIP chose fixed escape sequences rather than a computational transformation (like PPP's XOR). This is slightly simpler but requires separate handling for each escaped byte.

Why PPP Improved on SLIP:

Extensibility: PPP's XOR approach handles any escaped byte uniformly
ACCM: Configurable escape sets for different serial environments
Protocol Field: PPP supports multiple upper-layer protocols (IP, IPX, etc.)
Authentication: PPP includes PAP, CHAP authentication options
Compression: PPP can compress headers and negotiate compression

Design Evolution

Summary: Escape Sequences for Transparency

We have now thoroughly explored escape sequences as the mechanism that enables data transparency in byte-oriented protocols. Let's consolidate our understanding:

Key Takeaways

•Data transparency means the ability to transmit arbitrary byte sequences without misinterpretation as protocol control information
•Escape sequences replace reserved byte values with multi-byte sequences that the receiver can decode
•The escape byte itself must be escaped to prevent ambiguity when payload data resembles escape sequences
•PPP's XOR-based transformation (byte XOR 0x20) provides a simple, reversible, uniform encoding
•The ACCM allows negotiation of which additional control characters require escaping
•Sender encoding and receiver decoding are implemented as simple, efficient state machines

The Complete Picture:

Escape sequences are one component of the broader byte stuffing process:

Flag Bytes (Page 1): Delimit frame boundaries
Escape Sequences (This Page): Enable transparency for reserved bytes
Byte Stuffing (Next Page): The complete algorithm combining flags and escaping
Transparency (Page 4): Theory and formal properties of transparent communication
PPP Example (Page 5): Complete real-world implementation

What's Next:

Page Complete

2 / 5