Federated learning keeps raw data on client devices—but does that mean privacy is guaranteed? Absolutely not. The model updates that clients share can reveal remarkable amounts of information about their private training data.
Consider this: a gradient update for a neural network encodes how that network should change to better fit a client's data. An adversary observing these gradients can potentially reconstruct training samples, infer membership (whether a specific data point was used in training), or extract sensitive attributes. This isn't theoretical—practical attacks have demonstrated gradient-based data reconstruction with stunning accuracy.
True privacy in federated learning requires formal guarantees, not just architectural choices. This page explores the privacy threat landscape and the rigorous mathematical frameworks that provide provable protection.
By the end of this page, you will understand privacy attacks against federated learning, the formal definition and mechanisms of differential privacy, how secure aggregation cryptographically protects individual contributions, and how to compose privacy guarantees across multiple training rounds. You'll be equipped to design federated systems with rigorous, quantifiable privacy properties.
Before implementing defenses, we must understand what we're defending against. Privacy attacks on federated learning fall into three primary categories:
1. Gradient Inversion Attacks (Data Reconstruction)
These attacks attempt to reconstruct a client's training data from observed gradient updates. The intuition: if a gradient tells the server how to update weights to better fit certain data, that gradient implicitly encodes properties of that data.
The seminal work by Zhu et al. (2019), Deep Leakage from Gradients, demonstrated that by solving an optimization problem, an attacker can reconstruct training images with near-perfect fidelity from gradients alone.
```python
# Gradient Inversion Attack: Deep Leakage from Gradients
# Zhu et al., NeurIPS 2019

from typing import List, Tuple

import torch
import torch.nn.functional as F


def gradient_inversion_attack(
    model: torch.nn.Module,
    observed_gradients: List[torch.Tensor],
    image_shape: Tuple[int, int, int, int],  # (B, C, H, W)
    num_classes: int,
    num_iterations: int = 300,
    learning_rate: float = 1.0,
    tv_weight: float = 0.01  # Total variation regularization
) -> torch.Tensor:
    """
    Reconstruct training data from observed gradients.

    The attack optimizes dummy inputs such that their gradients match
    the observed gradients. When successful, dummy inputs closely
    resemble the original training data.

    Mathematical formulation:
        x* = argmin_x || ∇L(model, x) - ∇_observed ||² + λ·TV(x)

    Where TV(x) is total variation regularization for smoothness.

    Args:
        model: The neural network model
        observed_gradients: Gradients shared by the victim client
        image_shape: Shape of images to reconstruct
        num_classes: Number of output classes (used for dummy labels)
        num_iterations: Number of optimization steps
        learning_rate: Learning rate for reconstruction optimizer
        tv_weight: Weight for total variation regularization

    Returns:
        Reconstructed images approximating training data
    """
    # Initialize random "dummy" data to optimize
    dummy_data = torch.randn(image_shape, requires_grad=True)
    dummy_labels = torch.randn((image_shape[0], num_classes), requires_grad=True)

    optimizer = torch.optim.LBFGS(
        [dummy_data, dummy_labels],
        lr=learning_rate
    )

    for iteration in range(num_iterations):
        def closure():
            optimizer.zero_grad()

            # Compute gradients on dummy data
            dummy_outputs = model(dummy_data)
            dummy_loss = F.cross_entropy(
                dummy_outputs, F.softmax(dummy_labels, dim=-1)
            )
            dummy_gradients = torch.autograd.grad(
                dummy_loss, model.parameters(), create_graph=True
            )

            # Minimize distance between dummy gradients and observed gradients
            gradient_distance = sum(
                ((dg - og) ** 2).sum()
                for dg, og in zip(dummy_gradients, observed_gradients)
            )

            # Total variation regularization (encourages smooth images)
            tv_loss = total_variation(dummy_data)

            total_loss = gradient_distance + tv_weight * tv_loss
            total_loss.backward()
            return total_loss

        optimizer.step(closure)

        # Clamp to valid image range
        with torch.no_grad():
            dummy_data.clamp_(0, 1)

    return dummy_data.detach()


def total_variation(images: torch.Tensor) -> torch.Tensor:
    """
    Total variation regularization.

    Penalizes large differences between adjacent pixels, encouraging
    smooth, natural-looking reconstructions.
    """
    diff_h = images[:, :, 1:, :] - images[:, :, :-1, :]
    diff_w = images[:, :, :, 1:] - images[:, :, :, :-1]
    return torch.mean(diff_h ** 2) + torch.mean(diff_w ** 2)


# Attack effectiveness demonstration
# With batch size 1, reconstruction achieves >90% PSNR
# Larger batches make individual sample recovery harder
# But metadata (class distribution, statistics) still leaks
```

2. Membership Inference Attacks
These attacks determine whether a specific data point was part of a client's training set. If an attacker knows someone's medical record and can determine it was used to train a model, they've learned that person was a patient at a participating hospital.
Membership inference exploits the fact that models behave differently on data they've seen (lower loss, higher confidence) versus unseen data.
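As a simple illustration of this signal, the sketch below implements a loss-threshold membership test. The helper names and the default threshold are illustrative assumptions, not part of any specific published attack; practical attacks calibrate the threshold using shadow models or held-out data.

```python
# Minimal loss-threshold membership inference sketch (illustrative).
# Intuition: training members tend to have lower loss than non-members.
import torch
import torch.nn.functional as F


def membership_score(model: torch.nn.Module,
                     x: torch.Tensor,
                     y: torch.Tensor) -> float:
    """Per-example cross-entropy loss (lower suggests 'member')."""
    model.eval()
    with torch.no_grad():
        logits = model(x.unsqueeze(0))
        return F.cross_entropy(logits, y.unsqueeze(0)).item()


def infer_membership(model: torch.nn.Module,
                     x: torch.Tensor,
                     y: torch.Tensor,
                     threshold: float = 0.5) -> bool:
    """Guess 'member' when the loss falls below a calibrated threshold.

    The value 0.5 is a placeholder; real attacks calibrate it against
    shadow models trained on similar data.
    """
    return membership_score(model, x, y) < threshold
```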
3. Property Inference Attacks
These attacks extract aggregate properties of a client's data that may be sensitive even if individual records aren't exposed: for example, the fraction of records belonging to a particular demographic group, or whether a rare condition appears in a hospital's dataset at all. The table below summarizes the attack landscape:
| Attack Type | Attacker Goal | Information Leaked | Defense Approaches |
|---|---|---|---|
| Gradient Inversion | Reconstruct training samples | Individual data points | Gradient clipping, noise, SecAgg |
| Membership Inference | Determine if x ∈ training set | Data membership | Differential privacy, regularization |
| Property Inference | Learn aggregate data properties | Dataset statistics | Differential privacy, secure aggregation |
| Model Memorization | Extract memorized secrets | Verbatim training data | DP, deduplication, output filtering |
| Model Stealing | Replicate model functionality | Model IP/weights | Rate limiting, watermarking |
Don't dismiss these as theoretical. Gradient inversion attacks can reconstruct images recognizable to humans from a single gradient update. For text, attackers can recover specific sentences from language model gradients. The threat is real, and defenses are mandatory for any privacy-sensitive deployment.
Differential Privacy (DP) is the gold standard for formal privacy guarantees. Rather than making assumptions about attacker capabilities, DP provides guarantees that hold against any computationally unbounded adversary.
The Core Definition:
A randomized mechanism M satisfies (ε, δ)-differential privacy if for any two neighboring datasets D and D' (differing in exactly one record), and for any set of possible outputs S:
P[M(D) ∈ S] ≤ e^ε · P[M(D') ∈ S] + δ
Intuitively: the presence or absence of any individual's data barely affects the output distribution. An adversary observing the output cannot confidently determine whether any specific individual was in the dataset.
```python
# Differential Privacy Mechanisms for Federated Learning
from typing import List, Tuple

import numpy as np


class DifferentialPrivacy:
    """
    Implementation of core differential privacy mechanisms
    used in federated learning.
    """

    @staticmethod
    def gaussian_mechanism(
        true_value: np.ndarray,
        sensitivity: float,
        epsilon: float,
        delta: float
    ) -> np.ndarray:
        """
        Gaussian mechanism for (ε, δ)-differential privacy.

        Adds Gaussian noise calibrated to the L2 sensitivity of the query.

        Noise scale: σ = Δ₂ · √(2 ln(1.25/δ)) / ε

        Where Δ₂ is the L2 sensitivity: max ||f(D) - f(D')||₂
        over all neighboring datasets D, D'.

        Args:
            true_value: The true query result to privatize
            sensitivity: L2 sensitivity of the query
            epsilon: Privacy parameter ε
            delta: Privacy parameter δ

        Returns:
            Noisy value satisfying (ε, δ)-DP
        """
        # Calculate noise scale using Gaussian mechanism formula
        # σ ≥ Δ₂ · √(2 ln(1.25/δ)) / ε
        sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon

        # Sample Gaussian noise and add to true value
        noise = np.random.normal(0, sigma, size=true_value.shape)
        return true_value + noise

    @staticmethod
    def laplace_mechanism(
        true_value: np.ndarray,
        sensitivity: float,
        epsilon: float
    ) -> np.ndarray:
        """
        Laplace mechanism for ε-differential privacy (pure DP).

        Adds Laplace noise calibrated to the L1 sensitivity.

        Noise scale: b = Δ₁ / ε

        Where Δ₁ is the L1 sensitivity: max ||f(D) - f(D')||₁

        Provides pure DP (δ = 0) but requires more noise than Gaussian.

        Args:
            true_value: The true query result to privatize
            sensitivity: L1 sensitivity of the query
            epsilon: Privacy parameter ε

        Returns:
            Noisy value satisfying ε-DP
        """
        scale = sensitivity / epsilon
        noise = np.random.laplace(0, scale, size=true_value.shape)
        return true_value + noise


class DPFederatedLearning:
    """
    Differentially Private Federated Learning implementation
    following the DP-SGD approach (Abadi et al., 2016).
    """

    def __init__(
        self,
        target_epsilon: float,
        target_delta: float,
        clip_norm: float,
        noise_multiplier: float,
        num_rounds: int
    ):
        """
        Initialize DP-FL with privacy budget.

        Args:
            target_epsilon: Total privacy budget ε for all rounds
            target_delta: Target δ (typically 1/n for n users)
            clip_norm: Maximum L2 norm for client updates (sensitivity bound)
            noise_multiplier: σ/C ratio for Gaussian noise
            num_rounds: Total training rounds (affects privacy composition)
        """
        self.target_epsilon = target_epsilon
        self.target_delta = target_delta
        self.clip_norm = clip_norm
        self.noise_multiplier = noise_multiplier
        self.num_rounds = num_rounds

        # Track privacy spent so far
        self.rounds_completed = 0

    def clip_gradient(self, gradient: np.ndarray) -> np.ndarray:
        """
        Clip gradient to bound L2 sensitivity.

        Per-sample gradient clipping ensures that no single training
        example can influence the update by more than clip_norm.

        This is CRITICAL: without clipping, sensitivity is unbounded
        and no finite noise can achieve DP.
        """
        grad_norm = np.linalg.norm(gradient)
        if grad_norm > self.clip_norm:
            # Scale down to have exactly clip_norm magnitude
            gradient = gradient * (self.clip_norm / grad_norm)
        return gradient

    def add_noise_to_aggregate(
        self,
        aggregated_gradient: np.ndarray,
        num_clients: int
    ) -> np.ndarray:
        """
        Add calibrated Gaussian noise to aggregated update.

        The noise scale accounts for:
        1. The clip norm C (bounds sensitivity)
        2. The noise multiplier σ
        3. The number of clients (amplification via sampling)

        Returns:
            Noisy aggregate satisfying per-round DP guarantee
        """
        # Standard deviation of noise
        # σ_aggregate = noise_multiplier * C / num_clients
        sigma = self.noise_multiplier * self.clip_norm / num_clients

        noise = np.random.normal(0, sigma, size=aggregated_gradient.shape)
        noisy_aggregate = aggregated_gradient + noise

        self.rounds_completed += 1
        return noisy_aggregate

    def compute_privacy_spent(self) -> Tuple[float, float]:
        """
        Compute privacy budget spent using the moments accountant.

        The Rényi Differential Privacy (RDP) framework provides tight
        privacy composition, essential for multi-round FL.

        Returns:
            (epsilon_spent, delta) tuple
        """
        # Simplified privacy accounting
        # In practice, use tensorflow-privacy or opacus for tight bounds
        # (illustrative import; the exact module path depends on the
        # accounting library and version you use)
        from dp_accounting import compute_rdp, get_privacy_spent

        # Compute RDP at multiple orders
        orders = [1 + x / 10.0 for x in range(1, 100)]
        sampling_probability = 1.0  # If not subsampling clients

        rdp = compute_rdp(
            q=sampling_probability,
            noise_multiplier=self.noise_multiplier,
            steps=self.rounds_completed,
            orders=orders
        )

        epsilon_spent, _, _ = get_privacy_spent(
            orders, rdp, target_delta=self.target_delta
        )
        return epsilon_spent, self.target_delta


def dp_federated_averaging_round(
    global_model: np.ndarray,
    client_gradients: List[np.ndarray],
    dp_fl: DPFederatedLearning,
    client_weights: List[float]
) -> np.ndarray:
    """
    Execute one round of DP-FedAvg.

    Steps:
    1. Clip each client's gradient to bound sensitivity
    2. Compute weighted average of clipped gradients
    3. Add calibrated Gaussian noise to the aggregate
    4. Apply noisy update to global model
    """
    # Step 1: Clip each client's gradient
    clipped_gradients = [
        dp_fl.clip_gradient(grad) for grad in client_gradients
    ]

    # Step 2: Weighted average
    total_weight = sum(client_weights)
    aggregated = sum(
        (w / total_weight) * grad
        for w, grad in zip(client_weights, clipped_gradients)
    )

    # Step 3: Add noise
    noisy_aggregate = dp_fl.add_noise_to_aggregate(
        aggregated, num_clients=len(client_gradients)
    )

    # Step 4: Update model
    updated_model = global_model - noisy_aggregate  # Gradient descent
    return updated_model
```

Differentially Private Stochastic Gradient Descent (DP-SGD), introduced by Abadi et al. (2016), provides the foundational technique for training neural networks with formal privacy guarantees. In federated learning, DP-SGD is adapted to work with distributed client updates.
The DP-SGD Algorithm:
Per-Sample Gradient Clipping — Compute gradients for each sample individually and clip their L2 norm to a maximum value C. This bounds the sensitivity of the gradient computation.
Noise Addition — Add Gaussian noise proportional to C to the sum of clipped gradients. The noise scale σ determines the privacy-utility tradeoff.
Privacy Accounting — Track privacy loss across training iterations using advanced composition theorems (moments accountant, RDP).
Sensitivity Bounding via Clipping:
The key insight of DP-SGD is that clipping gradients bounds their sensitivity—the maximum change in the output from adding or removing one training sample.
Without clipping, a single outlier sample could produce an arbitrarily large gradient, requiring infinite noise for DP. With clipping, we guarantee:
Δ₂(gradient sum) ≤ C
This allows us to calibrate noise precisely: σ = noise_multiplier × C.
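As a concrete illustration of this calibration, here is a minimal NumPy sketch, assuming per-sample gradients have already been flattened into the rows of a matrix; the function name and default values are illustrative, not a reference implementation.

```python
import numpy as np


def clip_and_noise(per_sample_grads: np.ndarray,
                   clip_norm: float = 1.0,
                   noise_multiplier: float = 1.1) -> np.ndarray:
    """Clip each per-sample gradient to L2 norm C, sum, add N(0, (σC)²) noise."""
    # Per-sample clipping bounds any single example's contribution to C
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_sample_grads * scale

    # The sum of clipped gradients now has L2 sensitivity exactly C
    summed = clipped.sum(axis=0)

    # Gaussian noise calibrated to that sensitivity: std = noise_multiplier × C
    noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=summed.shape)
    return summed + noise


# Example: 32 per-sample gradients of dimension 10
grads = np.random.randn(32, 10)
private_sum = clip_and_noise(grads)
```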
The Clipping Norm Tradeoff:
| Parameter | Typical Range | Effect on Privacy | Effect on Utility |
|---|---|---|---|
| Clipping norm C | 0.1 - 10.0 | Lower C allows less noise for same ε | Too low clips informative gradients |
| Noise multiplier σ | 0.5 - 2.0 | Higher σ → stronger privacy (lower ε) | Higher σ → more noise → slower convergence |
| Batch size | 256 - 2048 | Smaller sampling fraction q → stronger amplification per step | Larger batches reduce relative noise; limited by memory |
| Epochs | 1 - 10 | More epochs → larger ε (composition) | More epochs → better model performance |
If each training step samples a fraction q of the data, you get privacy amplification by subsampling: the effective per-step ε is roughly q × ε_base for small ε. In federated learning, sampling a fraction of clients per round provides similar amplification. This is why training with smaller sampling fractions, and for fewer rounds, consumes less of the privacy budget.
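For a pure ε-DP mechanism under Poisson subsampling, the amplified guarantee has a closed form, ε' = ln(1 + q·(e^ε − 1)), which is approximately q·ε when ε is small. A quick sanity check of that approximation:

```python
import math


def amplified_epsilon(base_epsilon: float, sampling_rate: float) -> float:
    """Amplification-by-subsampling bound for a pure ε-DP mechanism."""
    return math.log(1.0 + sampling_rate * (math.exp(base_epsilon) - 1.0))


# Sampling 1% of the data per step turns a per-step ε of 1.0 into ~0.017
print(amplified_epsilon(1.0, 0.01))   # ≈ 0.0170
print(0.01 * 1.0)                     # the q·ε rule of thumb: 0.01
```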
Secure Aggregation (SecAgg) is a cryptographic protocol that ensures the server learns only the aggregate of client updates—never individual contributions. This provides a fundamentally different privacy guarantee than differential privacy, protecting against a curious-but-honest server.
The SecAgg Protocol (Bonawitz et al., 2017):
Each client masks their update with random values that sum to zero across all clients. The server receives masked updates and can compute their sum (where masks cancel), but cannot recover individual updates.
```python
# Secure Aggregation Protocol (Simplified)
# Based on Bonawitz et al., CCS 2017

from typing import Dict, List, Tuple

import numpy as np
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF


class SecureAggregation:
    """
    Secure Aggregation Protocol Implementation.

    Key insight: Clients add pairwise masks that cancel in the sum.
    For clients i, j: mask_{i,j} = -mask_{j,i}
    When server sums all updates, masks cancel, revealing true sum.

    Security property: Server learns only Σᵢxᵢ, not individual xᵢ.

    Protocol phases:
    1. Key advertisement: Clients exchange Diffie-Hellman public keys
    2. Share keys: Clients secret-share their keys for dropout recovery
    3. Masked input: Clients submit masked updates
    4. Unmasking: Surviving clients help reconstruct dropped clients' masks
    """

    def __init__(self, num_clients: int, threshold: int):
        """
        Initialize SecAgg protocol.

        Args:
            num_clients: Total number of participating clients
            threshold: Minimum clients needed for aggregation
        """
        self.num_clients = num_clients
        self.threshold = threshold
        self.client_keys: Dict[int, bytes] = {}
        self.pairwise_masks: Dict[Tuple[int, int], np.ndarray] = {}

    def setup_pairwise_keys(self) -> Dict[int, Dict[int, bytes]]:
        """
        Phase 1: Clients exchange Diffie-Hellman keys to establish
        pairwise shared secrets.

        Each pair (i, j) derives a shared secret s_{i,j} = s_{j,i}
        using Diffie-Hellman key exchange.
        """
        # In practice, use proper DH key exchange
        # Here we simulate with random shared secrets
        shared_secrets = {}
        for i in range(self.num_clients):
            shared_secrets[i] = {}
            for j in range(self.num_clients):
                if i != j:
                    # Derive symmetric pairwise key
                    shared_secrets[i][j] = self._derive_pairwise_key(i, j)
        return shared_secrets

    def _derive_pairwise_key(self, i: int, j: int) -> bytes:
        """
        Derive pairwise key using HKDF.

        In practice, this uses the Diffie-Hellman shared secret.
        """
        # Ensure consistent ordering for symmetric key
        pair = tuple(sorted([i, j]))
        seed = f"pair_{pair[0]}_{pair[1]}".encode()
        return HKDF(
            algorithm=hashes.SHA256(),
            length=32,
            salt=None,
            info=b'secagg-mask'
        ).derive(seed)

    def generate_pairwise_mask(
        self,
        client_i: int,
        client_j: int,
        shape: Tuple[int, ...],
        shared_key: bytes
    ) -> np.ndarray:
        """
        Generate pairwise mask m_{i,j}.

        Critical property: m_{i,j} = -m_{j,i}
        This ensures masks cancel in the sum.

        Implementation: Use shared key as PRG seed, negate if i > j.
        """
        # Use shared key to seed random number generator
        rng = np.random.default_rng(
            int.from_bytes(shared_key[:8], 'big')
        )

        # Generate mask
        mask = rng.standard_normal(shape).astype(np.float32)

        # Negate for one direction to ensure cancellation
        if client_i > client_j:
            mask = -mask

        return mask

    def mask_update(
        self,
        client_id: int,
        raw_update: np.ndarray,
        pairwise_keys: Dict[int, bytes]
    ) -> np.ndarray:
        """
        Mask a client's update for secure transmission.

        Masked update: ŷᵢ = xᵢ + Σⱼ m_{i,j} + rᵢ

        Where:
        - xᵢ is the raw update
        - m_{i,j} are pairwise masks (cancel with j's contribution)
        - rᵢ is a self-mask (for dropout recovery)
        """
        masked = raw_update.copy()

        # Add pairwise masks
        for j, key in pairwise_keys.items():
            mask = self.generate_pairwise_mask(
                client_id, j, raw_update.shape, key
            )
            masked += mask

        # Add self-mask (shared via secret sharing for dropout recovery)
        self_mask = self._generate_self_mask(client_id, raw_update.shape)
        masked += self_mask

        return masked

    def aggregate_masked_updates(
        self,
        masked_updates: Dict[int, np.ndarray],
        surviving_clients: List[int]
    ) -> np.ndarray:
        """
        Aggregate masked updates from surviving clients.

        For surviving clients, pairwise masks cancel:
            Σᵢ m_{i,j} + Σⱼ m_{j,i} = 0

        For dropped clients, surviving clients reconstruct the
        dropped clients' self-masks.

        Result: Server learns only Σᵢ xᵢ
        """
        # Sum all masked updates
        aggregate = sum(masked_updates.values())

        # Handle dropped clients' self-masks
        dropped_clients = set(range(self.num_clients)) - set(surviving_clients)
        for dropped_id in dropped_clients:
            # Reconstruct dropped client's self-mask from secret shares
            # (held by surviving clients)
            reconstructed_self_mask = self._reconstruct_self_mask(
                dropped_id, surviving_clients, aggregate.shape
            )
            # Subtract to cancel the self-mask
            aggregate -= reconstructed_self_mask

        return aggregate

    def _generate_self_mask(
        self,
        client_id: int,
        shape: Tuple[int, ...]
    ) -> np.ndarray:
        """Generate client's self-mask for dropout recovery."""
        rng = np.random.default_rng(client_id * 1000)
        return rng.standard_normal(shape).astype(np.float32)

    def _reconstruct_self_mask(
        self,
        dropped_id: int,
        surviving_clients: List[int],
        shape: Tuple[int, ...]
    ) -> np.ndarray:
        """
        Reconstruct dropped client's self-mask from secret shares.

        Uses Shamir's secret sharing for threshold reconstruction.
        """
        # In practice: collect shares from t surviving clients
        # and use polynomial interpolation
        return self._generate_self_mask(dropped_id, shape)


class ProtocolFailedError(Exception):
    """Raised when too few clients survive to complete aggregation."""


class SecAggWithDropouts(SecureAggregation):
    """
    Production-ready SecAgg handling client dropouts.

    The protocol tolerates up to n-t dropouts while maintaining
    security and correctness, where t is the threshold.
    """

    def execute_protocol(
        self,
        client_updates: Dict[int, np.ndarray],
        dropout_probability: float = 0.1
    ) -> np.ndarray:
        """
        Execute full SecAgg protocol with dropout handling.

        Protocol rounds:
        1. Advertise keys
        2. Share keys
        3. Submit masked inputs
        4. Unmask (handle dropouts)
        """
        # Simulate client dropouts
        active_clients = [
            cid for cid in client_updates.keys()
            if np.random.random() > dropout_probability
        ]

        if len(active_clients) < self.threshold:
            raise ProtocolFailedError(
                f"Only {len(active_clients)} clients survived, "
                f"need {self.threshold}"
            )

        # Aggregate the surviving clients' (already masked) updates,
        # recovering dropped clients' self-masks along the way
        return self.aggregate_masked_updates(
            {cid: client_updates[cid] for cid in active_clients},
            active_clients
        )
```

SecAgg vs. Differential Privacy:
Secure aggregation and differential privacy provide complementary protections: SecAgg hides each client's individual update from the server, while DP bounds what the aggregated update, and the final model, can reveal about any single record.
Best practice: Use both together. SecAgg ensures clients don't expose updates to the server. DP ensures the aggregated model doesn't leak individual information.
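One common way to combine them is distributed DP: each client clips its update, adds its own share of the Gaussian noise locally, and then applies the SecAgg mask before uploading. The sketch below assumes the `SecureAggregation` helper from the listing above and uses illustrative parameter values; production systems typically use discrete noise and finite-field arithmetic instead of raw floats.

```python
import numpy as np


def prepare_client_upload(raw_update: np.ndarray,
                          client_id: int,
                          secagg,                  # SecureAggregation instance (see sketch above)
                          pairwise_keys: dict,
                          clip_norm: float = 1.0,
                          noise_multiplier: float = 1.1,
                          num_clients: int = 100) -> np.ndarray:
    """Clip, add a local share of the DP noise, then mask for secure aggregation."""
    # 1. Clip the update so its L2 norm is at most clip_norm
    norm = np.linalg.norm(raw_update)
    if norm > clip_norm:
        raw_update = raw_update * (clip_norm / norm)

    # 2. Each client adds noise with std σC/√n, so the summed noise has std σC
    local_sigma = noise_multiplier * clip_norm / np.sqrt(num_clients)
    noisy = raw_update + np.random.normal(0.0, local_sigma, size=raw_update.shape)

    # 3. Mask with SecAgg so the server only ever sees the aggregate
    return secagg.mask_update(client_id, noisy, pairwise_keys)
```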
Computational Cost:
SecAgg imposes overhead: extra protocol rounds (key advertisement, key sharing, masked input, unmasking), pairwise key agreement and mask generation whose cost grows with the number of clients, and secret-sharing work to recover from dropouts.
For cross-device FL with millions of clients, this is significant. Optimizations include hierarchical SecAgg and single-server protocols.
SecAgg protects against honest-but-curious servers. If the server is actively malicious (sends different models to different clients, lies about aggregates), additional measures are needed: verifiable aggregation, Byzantine-robust protocols, or trusted execution environments.
Training a model requires many rounds of gradient updates. If each round provides (ε, δ)-DP, what is the total privacy guarantee after T rounds? This is the composition problem, and naive analysis vastly overestimates privacy loss.
Basic Composition Theorem:
If mechanisms M₁, M₂, ..., Mₜ each satisfy (εᵢ, δᵢ)-DP, their composition satisfies:
(Σεᵢ, Σδᵢ)-DP
This is simple but loose. For 1000 rounds with ε = 0.01 each, basic composition gives ε = 10, which is poor privacy.
Advanced Composition Theorem:
For T mechanisms each satisfying (ε₀, δ₀)-DP:
Total ε ≤ √(2T ln(1/δ')) · ε₀ + T · ε₀ · (e^ε₀ - 1)
For small ε₀, this is approximately O(√T · ε₀)—much better than O(T · ε₀).
| Composition Method | Total ε | Interpretation |
|---|---|---|
| Basic (linear) | 10.0 | Very poor privacy |
| Advanced (√T) | 0.45 | Reasonable privacy |
| Moments Accountant (RDP) | 0.35 | Tight, practical bound |
| Privacy Loss Distributions | 0.31 | State-of-the-art tight bound |
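As a quick sanity check on the first two rows, here is a minimal sketch of the basic and advanced bounds from the formulas above. Note that the advanced-composition total depends on the slack δ' you allocate, so exact figures vary with that choice, and the tighter accountant rows require the RDP or PLD machinery sketched later.

```python
import math


def basic_composition(eps0: float, T: int) -> float:
    """Basic composition: per-round ε adds up linearly."""
    return T * eps0


def advanced_composition(eps0: float, T: int, delta_prime: float) -> float:
    """Advanced composition bound: grows roughly like √T · ε₀ for small ε₀."""
    return (math.sqrt(2 * T * math.log(1.0 / delta_prime)) * eps0
            + T * eps0 * (math.exp(eps0) - 1.0))


# 1,000 rounds at ε₀ = 0.01: the basic bound is 10.0, as in the table above
print(basic_composition(0.01, 1000))
# advanced_composition(0.01, 1000, delta_prime) grows like √T, far below the
# linear bound; the exact value depends on the chosen δ'
```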
Rényi Differential Privacy (RDP):
RDP provides even tighter composition bounds. It tracks privacy loss via Rényi divergences, which compose additively: if each round satisfies (α, ρᵢ)-RDP, the composition satisfies (α, Σᵢ ρᵢ)-RDP, and the total can be converted back to an (ε, δ)-DP statement via ε ≤ ρ + log(1/δ)/(α − 1).
RDP is the standard for modern DP implementations (TensorFlow Privacy, Opacus).
Practical Privacy Budgeting:
Before training, decide: the total privacy budget ε you can tolerate, the target δ (typically at most 1/n for n users), the number of training rounds, and the client or sample sampling rate per round.
Then work backward: given target ε and T rounds, compute per-round noise multiplier needed.
```python
# Privacy Accounting with Rényi Differential Privacy
from dataclasses import dataclass
from typing import List, Tuple

import numpy as np


@dataclass
class PrivacyBudget:
    """Track privacy budget consumption."""
    target_epsilon: float
    target_delta: float
    consumed_epsilon: float = 0.0
    consumed_delta: float = 0.0

    def remaining(self) -> Tuple[float, float]:
        return (
            self.target_epsilon - self.consumed_epsilon,
            self.target_delta - self.consumed_delta
        )

    def is_exhausted(self) -> bool:
        return (
            self.consumed_epsilon >= self.target_epsilon
            or self.consumed_delta >= self.target_delta
        )


class RenyiDifferentialPrivacy:
    """
    Rényi Differential Privacy (RDP) accounting.

    Mironov, 2017: "Rényi Differential Privacy"

    Key advantages:
    1. Tight composition (RDP adds across mechanisms)
    2. Natural for subsampled Gaussian mechanism
    3. Converts to (ε, δ)-DP for final reporting
    """

    @staticmethod
    def compute_rdp_gaussian(
        sampling_rate: float,
        noise_multiplier: float,
        orders: List[float]
    ) -> List[float]:
        """
        Compute RDP for the subsampled Gaussian mechanism.

        For the Gaussian mechanism with subsampling probability q and
        noise multiplier σ (noise std = σ × sensitivity):

            RDP at order α ≈ α / (2σ²)   [simplified; the full formula is complex]

        Subsampling provides amplification: effective RDP is much smaller.

        Args:
            sampling_rate: Probability q of including each record
            noise_multiplier: σ = noise_std / sensitivity
            orders: Rényi orders α to compute RDP for

        Returns:
            RDP at each order
        """
        rdp = []
        for order in orders:
            if order == 1:
                # KL divergence (order 1 Rényi)
                rdp.append(0)  # Needs special handling
            elif sampling_rate == 1.0:
                # No subsampling: α / (2σ²)
                rdp.append(order / (2 * noise_multiplier ** 2))
            else:
                # With subsampling: use numerical integration
                # Approximation for small q:
                rdp.append(
                    np.log1p(
                        sampling_rate ** 2
                        * (np.exp(order / (2 * noise_multiplier ** 2)) - 1)
                    ) / (order - 1)
                )
        return rdp

    @staticmethod
    def rdp_to_epsilon(
        rdp_values: List[float],
        orders: List[float],
        target_delta: float
    ) -> float:
        """
        Convert RDP to (ε, δ)-DP.

        For each order α with RDP value ρ:
            ε ≤ ρ + log(1/δ) / (α - 1)

        Return the minimum ε across all orders.
        """
        epsilons = []
        for rdp, order in zip(rdp_values, orders):
            if order == 1:
                epsilon = rdp  # KL divergence equals ε for order 1
            else:
                epsilon = rdp + np.log(1 / target_delta) / (order - 1)
            epsilons.append(epsilon)
        return min(epsilons)

    @staticmethod
    def compose_rdp(
        rdp_per_round: List[float],
        num_rounds: int
    ) -> List[float]:
        """
        Compose RDP across multiple rounds.

        RDP composes additively: total RDP = sum of per-round RDP.
        This is the beauty of RDP—simple addition gives tight bounds.
        """
        return [rdp * num_rounds for rdp in rdp_per_round]


def plan_private_training(
    target_epsilon: float,
    target_delta: float,
    num_rounds: int,
    samples_per_round: int,
    total_samples: int
) -> float:
    """
    Plan private training by computing the required noise multiplier.

    Given a privacy budget and training plan, compute the minimum
    noise_multiplier that stays within budget.

    Args:
        target_epsilon: Privacy budget ε
        target_delta: Privacy budget δ
        num_rounds: Number of training rounds
        samples_per_round: Samples used per round
        total_samples: Total samples in dataset

    Returns:
        Required noise_multiplier σ
    """
    sampling_rate = samples_per_round / total_samples
    orders = [1 + x / 10.0 for x in range(1, 100)]

    # Binary search for noise multiplier
    low, high = 0.1, 100.0
    while high - low > 0.01:
        mid = (low + high) / 2

        # Compute RDP for this noise multiplier
        rdp_per_round = RenyiDifferentialPrivacy.compute_rdp_gaussian(
            sampling_rate, mid, orders
        )

        # Compose across rounds
        total_rdp = RenyiDifferentialPrivacy.compose_rdp(
            rdp_per_round, num_rounds
        )

        # Convert to epsilon
        achieved_epsilon = RenyiDifferentialPrivacy.rdp_to_epsilon(
            total_rdp, orders, target_delta
        )

        if achieved_epsilon > target_epsilon:
            low = mid   # Need more noise
        else:
            high = mid  # Can use less noise

    return high


# Example usage
noise_multiplier = plan_private_training(
    target_epsilon=1.0,
    target_delta=1e-5,
    num_rounds=100,
    samples_per_round=256,
    total_samples=60000  # MNIST size
)
print(f"Required noise multiplier: {noise_multiplier:.2f}")
```

Trusted Execution Environments provide a hardware-based approach to privacy, creating secure enclaves where computation occurs in a protected region that even the host system's administrator cannot inspect.
TEE Technologies: common options include Intel SGX, ARM TrustZone, AMD SEV, and cloud enclave services such as AWS Nitro Enclaves.
Numerous side-channel attacks have broken SGX guarantees in practice. Use TEEs as defense in depth alongside differential privacy and secure aggregation, not as the sole protection. The combination provides layered security: even if one layer fails, others maintain protection.
Privacy protection is not free. Adding noise to ensure differential privacy degrades model accuracy. Understanding and optimizing this tradeoff is essential for practical private FL.
The Fundamental Tradeoff:
There is no way to achieve perfect privacy (ε = 0) with any meaningful learning. The art is in finding the sweet spot for your use case.
| Privacy Level (ε) | Noise Multiplier (σ) | Top-1 Accuracy (%) | Accuracy Drop (points) |
|---|---|---|---|
| ∞ (no privacy) | 0 | 76.6 | 0% |
| 10 (weak) | 0.5 | 75.2 | -1.4% |
| 3 (moderate) | 1.0 | 71.8 | -4.8% |
| 1 (strong) | 2.0 | 65.4 | -11.2% |
| 0.3 (very strong) | 4.0 | 54.2 | -22.4% |
Strategies to Improve the Tradeoff:
More data — Privacy cost is per-sample. More samples means less noise per sample for the same total privacy.
More clients — In FL, noise is added after aggregation. With n clients, per-client noise is √n smaller for the same aggregate privacy.
Pre-training — Start from a pre-trained model (on public data). Fine-tuning requires fewer rounds than training from scratch.
Gradient compression — Communicating fewer gradient components naturally reduces what can be inferred. Combine with DP for better tradeoffs.
Private feature learning — Make early layers public (trained on non-sensitive features), keep only later layers private.
For most applications, ε between 1 and 10 provides meaningful privacy without catastrophic accuracy loss. ε < 1 is rarely achieved in practice without significant accuracy degradation. Work with your privacy and legal teams to determine acceptable thresholds—there's no universal 'right' value.
We've covered the critical privacy landscape of federated learning. Let's consolidate: model updates leak information through gradient inversion, membership inference, and property inference attacks; differential privacy gives formal (ε, δ) guarantees via clipping and calibrated noise; secure aggregation cryptographically hides individual updates from the server; privacy loss composes across rounds, and RDP-based accounting keeps the total budget tight; and every defense trades some accuracy for privacy, so the budget must be chosen deliberately.
What's Next:
With privacy fundamentals established, we'll tackle Communication Efficiency in the next page. Federated learning often operates over slow, metered networks where model updates of millions of parameters are prohibitively expensive. You'll learn gradient compression, quantization, and sparse communication techniques that reduce bandwidth requirements by 10-100x.
You now understand the privacy threat landscape and defense mechanisms in federated learning. You can implement DP-SGD, explain secure aggregation cryptographic principles, and reason about privacy composition. Next, we address the communication bottleneck that constrains real-world FL deployments.