Federated learning keeps raw data on client devices—but does that mean privacy is guaranteed? Absolutely not. The model updates that clients share can reveal remarkable amounts of information about their private training data.
Consider this: a gradient update for a neural network encodes how that network should change to better fit a client's data. An adversary observing these gradients can potentially reconstruct training samples, infer membership (whether a specific data point was used in training), or extract sensitive attributes. This isn't theoretical—practical attacks have demonstrated gradient-based data reconstruction with stunning accuracy.
True privacy in federated learning requires formal guarantees, not just architectural choices. This page explores the privacy threat landscape and the rigorous mathematical frameworks that provide provable protection.
By the end of this page, you will understand privacy attacks against federated learning, the formal definition and mechanisms of differential privacy, how secure aggregation cryptographically protects individual contributions, and how to compose privacy guarantees across multiple training rounds. You'll be equipped to design federated systems with rigorous, quantifiable privacy properties.
Before implementing defenses, we must understand what we're defending against. Privacy attacks on federated learning fall into three primary categories:
1. Gradient Inversion Attacks (Data Reconstruction)
These attacks attempt to reconstruct a client's training data from observed gradient updates. The intuition: if a gradient tells the server how to update weights to better fit certain data, that gradient implicitly encodes properties of that data.
The seminal work by Zhu et al. (2019), Deep Leakage from Gradients, demonstrated that by solving an optimization problem, an attacker can reconstruct training images with near-perfect fidelity from gradients alone.
```python
# Gradient Inversion Attack: Deep Leakage from Gradients
# Zhu et al., NeurIPS 2019

from typing import List, Tuple

import torch
import torch.nn.functional as F


def gradient_inversion_attack(
    model: torch.nn.Module,
    observed_gradients: List[torch.Tensor],
    image_shape: Tuple[int, int, int, int],  # (B, C, H, W)
    num_classes: int,
    num_iterations: int = 300,
    learning_rate: float = 1.0,
    tv_weight: float = 0.01  # Total variation regularization
) -> torch.Tensor:
    """
    Reconstruct training data from observed gradients.

    The attack optimizes dummy inputs such that their gradients match
    the observed gradients. When successful, dummy inputs closely
    resemble the original training data.

    Mathematical formulation:
        x* = argmin_x || ∇L(model, x) - ∇_observed ||² + λ·TV(x)

    Where TV(x) is total variation regularization for smoothness.

    Args:
        model: The neural network model
        observed_gradients: Gradients shared by the victim client
        image_shape: Shape of images to reconstruct
        num_classes: Number of output classes (used for dummy labels)
        num_iterations: Number of optimization steps
        learning_rate: Learning rate for reconstruction optimizer
        tv_weight: Weight for total variation regularization

    Returns:
        Reconstructed images approximating training data
    """
    # Initialize random "dummy" data to optimize
    dummy_data = torch.randn(image_shape, requires_grad=True)
    dummy_labels = torch.randn((image_shape[0], num_classes), requires_grad=True)

    optimizer = torch.optim.LBFGS(
        [dummy_data, dummy_labels],
        lr=learning_rate
    )

    for iteration in range(num_iterations):
        def closure():
            optimizer.zero_grad()

            # Compute gradients on dummy data
            dummy_outputs = model(dummy_data)
            dummy_loss = F.cross_entropy(
                dummy_outputs, F.softmax(dummy_labels, dim=-1)
            )
            dummy_gradients = torch.autograd.grad(
                dummy_loss, model.parameters(), create_graph=True
            )

            # Minimize distance between dummy gradients and observed gradients
            gradient_distance = sum(
                ((dg - og) ** 2).sum()
                for dg, og in zip(dummy_gradients, observed_gradients)
            )

            # Total variation regularization (encourages smooth images)
            tv_loss = total_variation(dummy_data)

            total_loss = gradient_distance + tv_weight * tv_loss
            total_loss.backward()
            return total_loss

        optimizer.step(closure)

        # Clamp to valid image range
        with torch.no_grad():
            dummy_data.clamp_(0, 1)

    return dummy_data.detach()


def total_variation(images: torch.Tensor) -> torch.Tensor:
    """
    Total variation regularization.

    Penalizes large differences between adjacent pixels, encouraging
    smooth, natural-looking reconstructions.
    """
    diff_h = images[:, :, 1:, :] - images[:, :, :-1, :]
    diff_w = images[:, :, :, 1:] - images[:, :, :, :-1]
    return torch.mean(diff_h ** 2) + torch.mean(diff_w ** 2)


# Attack effectiveness demonstration
# With batch size 1, reconstruction achieves >90% PSNR
# Larger batches make individual sample recovery harder
# But metadata (class distribution, statistics) still leaks
```

2. Membership Inference Attacks
These attacks determine whether a specific data point was part of a client's training set. If an attacker knows someone's medical record and can determine it was used to train a model, they've learned that person was a patient at a participating hospital.
Membership inference exploits the fact that models behave differently on data they've seen (lower loss, higher confidence) versus unseen data.
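As a simple illustration of this signal, the sketch below implements a loss-threshold membership test. The helper names and the default threshold are illustrative assumptions, not part of any specific published attack; practical attacks calibrate the threshold using shadow models or held-out data.

```python
# Minimal loss-threshold membership inference sketch (illustrative).
# Intuition: training members tend to have lower loss than non-members.
import torch
import torch.nn.functional as F


def membership_score(model: torch.nn.Module,
                     x: torch.Tensor,
                     y: torch.Tensor) -> float:
    """Per-example cross-entropy loss (lower suggests 'member')."""
    model.eval()
    with torch.no_grad():
        logits = model(x.unsqueeze(0))
        return F.cross_entropy(logits, y.unsqueeze(0)).item()


def infer_membership(model: torch.nn.Module,
                     x: torch.Tensor,
                     y: torch.Tensor,
                     threshold: float = 0.5) -> bool:
    """Guess 'member' when the loss falls below a calibrated threshold.

    The value 0.5 is a placeholder; real attacks calibrate it against
    shadow models trained on similar data.
    """
    return membership_score(model, x, y) < threshold
```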
3. Property Inference Attacks
These attacks extract aggregate properties of a client's data that may be sensitive even if individual records aren't exposed: for example, the fraction of records belonging to a particular demographic group, or whether a rare condition appears in a hospital's dataset at all. The table below summarizes the attack landscape:
| Attack Type | Attacker Goal | Information Leaked | Defense Approaches |
|---|---|---|---|
| Gradient Inversion | Reconstruct training samples | Individual data points | Gradient clipping, noise, SecAgg |
| Membership Inference | Determine if x ∈ training set | Data membership | Differential privacy, regularization |
| Property Inference | Learn aggregate data properties | Dataset statistics | Differential privacy, secure aggregation |
| Model Memorization | Extract memorized secrets | Verbatim training data | DP, deduplication, output filtering |
| Model Stealing | Replicate model functionality | Model IP/weights | Rate limiting, watermarking |
Don't dismiss these as theoretical. Gradient inversion attacks can reconstruct images recognizable to humans from a single gradient update. For text, attackers can recover specific sentences from language model gradients. The threat is real, and defenses are mandatory for any privacy-sensitive deployment.
Differential Privacy (DP) is the gold standard for formal privacy guarantees. Rather than making assumptions about attacker capabilities, DP provides guarantees that hold against any computationally unbounded adversary.
The Core Definition:
A randomized mechanism M satisfies (ε, δ)-differential privacy if for any two neighboring datasets D and D' (differing in exactly one record), and for any set of possible outputs S:
P[M(D) ∈ S] ≤ e^ε · P[M(D') ∈ S] + δ
Intuitively: the presence or absence of any individual's data barely affects the output distribution. An adversary observing the output cannot confidently determine whether any specific individual was in the dataset.
```python
# Differential Privacy Mechanisms for Federated Learning
from typing import List, Tuple

import numpy as np


class DifferentialPrivacy:
    """
    Implementation of core differential privacy mechanisms
    used in federated learning.
    """

    @staticmethod
    def gaussian_mechanism(
        true_value: np.ndarray,
        sensitivity: float,
        epsilon: float,
        delta: float
    ) -> np.ndarray:
        """
        Gaussian mechanism for (ε, δ)-differential privacy.

        Adds Gaussian noise calibrated to the L2 sensitivity of the query.

        Noise scale: σ = Δ₂ · √(2 ln(1.25/δ)) / ε

        Where Δ₂ is the L2 sensitivity: max ||f(D) - f(D')||₂
        over all neighboring datasets D, D'.

        Args:
            true_value: The true query result to privatize
            sensitivity: L2 sensitivity of the query
            epsilon: Privacy parameter ε
            delta: Privacy parameter δ

        Returns:
            Noisy value satisfying (ε, δ)-DP
        """
        # Calculate noise scale using Gaussian mechanism formula
        # σ ≥ Δ₂ · √(2 ln(1.25/δ)) / ε
        sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon

        # Sample Gaussian noise and add to true value
        noise = np.random.normal(0, sigma, size=true_value.shape)
        return true_value + noise

    @staticmethod
    def laplace_mechanism(
        true_value: np.ndarray,
        sensitivity: float,
        epsilon: float
    ) -> np.ndarray:
        """
        Laplace mechanism for ε-differential privacy (pure DP).

        Adds Laplace noise calibrated to the L1 sensitivity.

        Noise scale: b = Δ₁ / ε

        Where Δ₁ is the L1 sensitivity: max ||f(D) - f(D')||₁

        Provides pure DP (δ = 0) but requires more noise than Gaussian.

        Args:
            true_value: The true query result to privatize
            sensitivity: L1 sensitivity of the query
            epsilon: Privacy parameter ε

        Returns:
            Noisy value satisfying ε-DP
        """
        scale = sensitivity / epsilon
        noise = np.random.laplace(0, scale, size=true_value.shape)
        return true_value + noise


class DPFederatedLearning:
    """
    Differentially Private Federated Learning implementation
    following the DP-SGD approach (Abadi et al., 2016).
    """

    def __init__(
        self,
        target_epsilon: float,
        target_delta: float,
        clip_norm: float,
        noise_multiplier: float,
        num_rounds: int
    ):
        """
        Initialize DP-FL with privacy budget.

        Args:
            target_epsilon: Total privacy budget ε for all rounds
            target_delta: Target δ (typically 1/n for n users)
            clip_norm: Maximum L2 norm for client updates (sensitivity bound)
            noise_multiplier: σ/C ratio for Gaussian noise
            num_rounds: Total training rounds (affects privacy composition)
        """
        self.target_epsilon = target_epsilon
        self.target_delta = target_delta
        self.clip_norm = clip_norm
        self.noise_multiplier = noise_multiplier
        self.num_rounds = num_rounds

        # Track privacy spent so far
        self.rounds_completed = 0

    def clip_gradient(self, gradient: np.ndarray) -> np.ndarray:
        """
        Clip gradient to bound L2 sensitivity.

        Per-sample gradient clipping ensures that no single training
        example can influence the update by more than clip_norm.

        This is CRITICAL: without clipping, sensitivity is unbounded
        and no finite noise can achieve DP.
        """
        grad_norm = np.linalg.norm(gradient)
        if grad_norm > self.clip_norm:
            # Scale down to have exactly clip_norm magnitude
            gradient = gradient * (self.clip_norm / grad_norm)
        return gradient

    def add_noise_to_aggregate(
        self,
        aggregated_gradient: np.ndarray,
        num_clients: int
    ) -> np.ndarray:
        """
        Add calibrated Gaussian noise to aggregated update.

        The noise scale accounts for:
        1. The clip norm C (bounds sensitivity)
        2. The noise multiplier σ
        3. The number of clients (amplification via sampling)

        Returns:
            Noisy aggregate satisfying per-round DP guarantee
        """
        # Standard deviation of noise
        # σ_aggregate = noise_multiplier * C / num_clients
        sigma = self.noise_multiplier * self.clip_norm / num_clients

        noise = np.random.normal(0, sigma, size=aggregated_gradient.shape)
        noisy_aggregate = aggregated_gradient + noise

        self.rounds_completed += 1
        return noisy_aggregate

    def compute_privacy_spent(self) -> Tuple[float, float]:
        """
        Compute privacy budget spent using the moments accountant.

        The Rényi Differential Privacy (RDP) framework provides tight
        privacy composition, essential for multi-round FL.

        Returns:
            (epsilon_spent, delta) tuple
        """
        # Simplified privacy accounting
        # In practice, use tensorflow-privacy or opacus for tight bounds
        # (illustrative import; the exact module path depends on the
        # accounting library and version you use)
        from dp_accounting import compute_rdp, get_privacy_spent

        # Compute RDP at multiple orders
        orders = [1 + x / 10.0 for x in range(1, 100)]
        sampling_probability = 1.0  # If not subsampling clients

        rdp = compute_rdp(
            q=sampling_probability,
            noise_multiplier=self.noise_multiplier,
            steps=self.rounds_completed,
            orders=orders
        )

        epsilon_spent, _, _ = get_privacy_spent(
            orders, rdp, target_delta=self.target_delta
        )
        return epsilon_spent, self.target_delta


def dp_federated_averaging_round(
    global_model: np.ndarray,
    client_gradients: List[np.ndarray],
    dp_fl: DPFederatedLearning,
    client_weights: List[float]
) -> np.ndarray:
    """
    Execute one round of DP-FedAvg.

    Steps:
    1. Clip each client's gradient to bound sensitivity
    2. Compute weighted average of clipped gradients
    3. Add calibrated Gaussian noise to the aggregate
    4. Apply noisy update to global model
    """
    # Step 1: Clip each client's gradient
    clipped_gradients = [
        dp_fl.clip_gradient(grad) for grad in client_gradients
    ]

    # Step 2: Weighted average
    total_weight = sum(client_weights)
    aggregated = sum(
        (w / total_weight) * grad
        for w, grad in zip(client_weights, clipped_gradients)
    )

    # Step 3: Add noise
    noisy_aggregate = dp_fl.add_noise_to_aggregate(
        aggregated, num_clients=len(client_gradients)
    )

    # Step 4: Update model
    updated_model = global_model - noisy_aggregate  # Gradient descent
    return updated_model
```

Differentially Private Stochastic Gradient Descent (DP-SGD), introduced by Abadi et al. (2016), provides the foundational technique for training neural networks with formal privacy guarantees. In federated learning, DP-SGD is adapted to work with distributed client updates.
The DP-SGD Algorithm:
Per-Sample Gradient Clipping — Compute gradients for each sample individually and clip their L2 norm to a maximum value C. This bounds the sensitivity of the gradient computation.
Noise Addition — Add Gaussian noise proportional to C to the sum of clipped gradients. The noise scale σ determines the privacy-utility tradeoff.
Privacy Accounting — Track privacy loss across training iterations using advanced composition theorems (moments accountant, RDP).
Sensitivity Bounding via Clipping:
The key insight of DP-SGD is that clipping gradients bounds their sensitivity—the maximum change in the output from adding or removing one training sample.
Without clipping, a single outlier sample could produce an arbitrarily large gradient, requiring infinite noise for DP. With clipping, we guarantee:
Δ₂(gradient sum) ≤ C
This allows us to calibrate noise precisely: σ = noise_multiplier × C.
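As a concrete illustration of this calibration, here is a minimal NumPy sketch, assuming per-sample gradients have already been flattened into the rows of a matrix; the function name and default values are illustrative, not a reference implementation.

```python
import numpy as np


def clip_and_noise(per_sample_grads: np.ndarray,
                   clip_norm: float = 1.0,
                   noise_multiplier: float = 1.1) -> np.ndarray:
    """Clip each per-sample gradient to L2 norm C, sum, add N(0, (σC)²) noise."""
    # Per-sample clipping bounds any single example's contribution to C
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_sample_grads * scale

    # The sum of clipped gradients now has L2 sensitivity exactly C
    summed = clipped.sum(axis=0)

    # Gaussian noise calibrated to that sensitivity: std = noise_multiplier × C
    noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=summed.shape)
    return summed + noise


# Example: 32 per-sample gradients of dimension 10
grads = np.random.randn(32, 10)
private_sum = clip_and_noise(grads)
```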
The Clipping Norm Tradeoff:
| Parameter | Typical Range | Effect on Privacy | Effect on Utility |
|---|---|---|---|
| Clipping norm C | 0.1 - 10.0 | Lower C allows less noise for same ε | Too low clips informative gradients |
| Noise multiplier σ | 0.5 - 2.0 | Higher σ → stronger privacy (lower ε) | Higher σ → more noise → slower convergence |
| Batch size | 256 - 2048 | Smaller sampling fraction q → stronger amplification per step | Larger batches reduce relative noise; limited by memory |
| Epochs | 1 - 10 | More epochs → larger ε (composition) | More epochs → better model performance |
If each training step samples a fraction q of the data, you get privacy amplification by subsampling: the effective per-step ε is roughly q × ε_base for small ε. In federated learning, sampling a fraction of clients per round provides similar amplification. This is why training with smaller sampling fractions, and for fewer rounds, consumes less of the privacy budget.
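For a pure ε-DP mechanism under Poisson subsampling, the amplified guarantee has a closed form, ε' = ln(1 + q·(e^ε − 1)), which is approximately q·ε when ε is small. A quick sanity check of that approximation:

```python
import math


def amplified_epsilon(base_epsilon: float, sampling_rate: float) -> float:
    """Amplification-by-subsampling bound for a pure ε-DP mechanism."""
    return math.log(1.0 + sampling_rate * (math.exp(base_epsilon) - 1.0))


# Sampling 1% of the data per step turns a per-step ε of 1.0 into ~0.017
print(amplified_epsilon(1.0, 0.01))   # ≈ 0.0170
print(0.01 * 1.0)                     # the q·ε rule of thumb: 0.01
```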
Secure Aggregation (SecAgg) is a cryptographic protocol that ensures the server learns only the aggregate of client updates—never individual contributions. This provides a fundamentally different privacy guarantee than differential privacy, protecting against a curious-but-honest server.
The SecAgg Protocol (Bonawitz et al., 2017):
Each client masks their update with random values that sum to zero across all clients. The server receives masked updates and can compute their sum (where masks cancel), but cannot recover individual updates.
```python
# Secure Aggregation Protocol (Simplified)
# Based on Bonawitz et al., CCS 2017

from typing import Dict, List, Tuple

import numpy as np
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF


class SecureAggregation:
    """
    Secure Aggregation Protocol Implementation.

    Key insight: Clients add pairwise masks that cancel in the sum.
    For clients i, j: mask_{i,j} = -mask_{j,i}
    When server sums all updates, masks cancel, revealing true sum.

    Security property: Server learns only Σᵢxᵢ, not individual xᵢ.

    Protocol phases:
    1. Key advertisement: Clients exchange Diffie-Hellman public keys
    2. Share keys: Clients secret-share their keys for dropout recovery
    3. Masked input: Clients submit masked updates
    4. Unmasking: Surviving clients help reconstruct dropped clients' masks
    """

    def __init__(self, num_clients: int, threshold: int):
        """
        Initialize SecAgg protocol.

        Args:
            num_clients: Total number of participating clients
            threshold: Minimum clients needed for aggregation
        """
        self.num_clients = num_clients
        self.threshold = threshold
        self.client_keys: Dict[int, bytes] = {}
        self.pairwise_masks: Dict[Tuple[int, int], np.ndarray] = {}

    def setup_pairwise_keys(self) -> Dict[int, Dict[int, bytes]]:
        """
        Phase 1: Clients exchange Diffie-Hellman keys to establish
        pairwise shared secrets.

        Each pair (i, j) derives a shared secret s_{i,j} = s_{j,i}
        using Diffie-Hellman key exchange.
        """
        # In practice, use proper DH key exchange
        # Here we simulate with random shared secrets
        shared_secrets = {}
        for i in range(self.num_clients):
            shared_secrets[i] = {}
            for j in range(self.num_clients):
                if i != j:
                    # Derive symmetric pairwise key
                    shared_secrets[i][j] = self._derive_pairwise_key(i, j)
        return shared_secrets

    def _derive_pairwise_key(self, i: int, j: int) -> bytes:
        """
        Derive pairwise key using HKDF.

        In practice, this uses the Diffie-Hellman shared secret.
        """
        # Ensure consistent ordering for symmetric key
        pair = tuple(sorted([i, j]))
        seed = f"pair_{pair[0]}_{pair[1]}".encode()
        return HKDF(
            algorithm=hashes.SHA256(),
            length=32,
            salt=None,
            info=b'secagg-mask'
        ).derive(seed)

    def generate_pairwise_mask(
        self,
        client_i: int,
        client_j: int,
        shape: Tuple[int, ...],
        shared_key: bytes
    ) -> np.ndarray:
        """
        Generate pairwise mask m_{i,j}.

        Critical property: m_{i,j} = -m_{j,i}
        This ensures masks cancel in the sum.

        Implementation: Use shared key as PRG seed, negate if i > j.
        """
        # Use shared key to seed random number generator
        rng = np.random.default_rng(
            int.from_bytes(shared_key[:8], 'big')
        )

        # Generate mask
        mask = rng.standard_normal(shape).astype(np.float32)

        # Negate for one direction to ensure cancellation
        if client_i > client_j:
            mask = -mask

        return mask

    def mask_update(
        self,
        client_id: int,
        raw_update: np.ndarray,
        pairwise_keys: Dict[int, bytes]
    ) -> np.ndarray:
        """
        Mask a client's update for secure transmission.

        Masked update: ŷᵢ = xᵢ + Σⱼ m_{i,j} + rᵢ

        Where:
        - xᵢ is the raw update
        - m_{i,j} are pairwise masks (cancel with j's contribution)
        - rᵢ is a self-mask (for dropout recovery)
        """
        masked = raw_update.copy()

        # Add pairwise masks
        for j, key in pairwise_keys.items():
            mask = self.generate_pairwise_mask(
                client_id, j, raw_update.shape, key
            )
            masked += mask

        # Add self-mask (shared via secret sharing for dropout recovery)
        self_mask = self._generate_self_mask(client_id, raw_update.shape)
        masked += self_mask

        return masked

    def aggregate_masked_updates(
        self,
        masked_updates: Dict[int, np.ndarray],
        surviving_clients: List[int]
    ) -> np.ndarray:
        """
        Aggregate masked updates from surviving clients.

        For surviving clients, pairwise masks cancel:
            Σᵢ m_{i,j} + Σⱼ m_{j,i} = 0

        For dropped clients, surviving clients reconstruct the
        dropped clients' self-masks.

        Result: Server learns only Σᵢ xᵢ
        """
        # Sum all masked updates
        aggregate = sum(masked_updates.values())

        # Handle dropped clients' self-masks
        dropped_clients = set(range(self.num_clients)) - set(surviving_clients)
        for dropped_id in dropped_clients:
            # Reconstruct dropped client's self-mask from secret shares
            # (held by surviving clients)
            reconstructed_self_mask = self._reconstruct_self_mask(
                dropped_id, surviving_clients, aggregate.shape
            )
            # Subtract to cancel the self-mask
            aggregate -= reconstructed_self_mask

        return aggregate

    def _generate_self_mask(
        self,
        client_id: int,
        shape: Tuple[int, ...]
    ) -> np.ndarray:
        """Generate client's self-mask for dropout recovery."""
        rng = np.random.default_rng(client_id * 1000)
        return rng.standard_normal(shape).astype(np.float32)

    def _reconstruct_self_mask(
        self,
        dropped_id: int,
        surviving_clients: List[int],
        shape: Tuple[int, ...]
    ) -> np.ndarray:
        """
        Reconstruct dropped client's self-mask from secret shares.

        Uses Shamir's secret sharing for threshold reconstruction.
        """
        # In practice: collect shares from t surviving clients
        # and use polynomial interpolation
        return self._generate_self_mask(dropped_id, shape)


class ProtocolFailedError(Exception):
    """Raised when too few clients survive to complete aggregation."""


class SecAggWithDropouts(SecureAggregation):
    """
    Production-ready SecAgg handling client dropouts.

    The protocol tolerates up to n-t dropouts while maintaining
    security and correctness, where t is the threshold.
    """

    def execute_protocol(
        self,
        client_updates: Dict[int, np.ndarray],
        dropout_probability: float = 0.1
    ) -> np.ndarray:
        """
        Execute full SecAgg protocol with dropout handling.

        Protocol rounds:
        1. Advertise keys
        2. Share keys
        3. Submit masked inputs
        4. Unmask (handle dropouts)
        """
        # Simulate client dropouts
        active_clients = [
            cid for cid in client_updates.keys()
            if np.random.random() > dropout_probability
        ]

        if len(active_clients) < self.threshold:
            raise ProtocolFailedError(
                f"Only {len(active_clients)} clients survived, "
                f"need {self.threshold}"
            )

        # Aggregate the surviving clients' (already masked) updates,
        # recovering dropped clients' self-masks along the way
        return self.aggregate_masked_updates(
            {cid: client_updates[cid] for cid in active_clients},
            active_clients
        )
```

SecAgg vs. Differential Privacy:
Secure aggregation and differential privacy provide complementary protections: SecAgg hides each client's individual update from the server, while DP bounds what the aggregated update, and the final model, can reveal about any single record.
Best practice: Use both together. SecAgg ensures clients don't expose updates to the server. DP ensures the aggregated model doesn't leak individual information.
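One common way to combine them is distributed DP: each client clips its update, adds its own share of the Gaussian noise locally, and then applies the SecAgg mask before uploading. The sketch below assumes the `SecureAggregation` helper from the listing above and uses illustrative parameter values; production systems typically use discrete noise and finite-field arithmetic instead of raw floats.

```python
import numpy as np


def prepare_client_upload(raw_update: np.ndarray,
                          client_id: int,
                          secagg,                  # SecureAggregation instance (see sketch above)
                          pairwise_keys: dict,
                          clip_norm: float = 1.0,
                          noise_multiplier: float = 1.1,
                          num_clients: int = 100) -> np.ndarray:
    """Clip, add a local share of the DP noise, then mask for secure aggregation."""
    # 1. Clip the update so its L2 norm is at most clip_norm
    norm = np.linalg.norm(raw_update)
    if norm > clip_norm:
        raw_update = raw_update * (clip_norm / norm)

    # 2. Each client adds noise with std σC/√n, so the summed noise has std σC
    local_sigma = noise_multiplier * clip_norm / np.sqrt(num_clients)
    noisy = raw_update + np.random.normal(0.0, local_sigma, size=raw_update.shape)

    # 3. Mask with SecAgg so the server only ever sees the aggregate
    return secagg.mask_update(client_id, noisy, pairwise_keys)
```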
Computational Cost:
SecAgg imposes overhead: extra protocol rounds (key advertisement, key sharing, masked input, unmasking), pairwise key agreement and mask generation whose cost grows with the number of clients, and secret-sharing work to recover from dropouts.
For cross-device FL with millions of clients, this is significant. Optimizations include hierarchical SecAgg and single-server protocols.
SecAgg protects against honest-but-curious servers. If the server is actively malicious (sends different models to different clients, lies about aggregates), additional measures are needed: verifiable aggregation, Byzantine-robust protocols, or trusted execution environments.
Training a model requires many rounds of gradient updates. If each round provides (ε, δ)-DP, what is the total privacy guarantee after T rounds? This is the composition problem, and naive analysis vastly overestimates privacy loss.
Basic Composition Theorem:
If mechanisms M₁, M₂, ..., Mₜ each satisfy (εᵢ, δᵢ)-DP, their composition satisfies:
(Σεᵢ, Σδᵢ)-DP
This is simple but loose. For 1000 rounds with ε = 0.01 each, basic composition gives ε = 10, which is poor privacy.
Advanced Composition Theorem:
For T mechanisms each satisfying (ε₀, δ₀)-DP:
Total ε ≤ √(2T ln(1/δ')) · ε₀ + T · ε₀ · (e^ε₀ - 1)
For small ε₀, this is approximately O(√T · ε₀)—much better than O(T · ε₀).
| Composition Method | Total ε | Interpretation |
|---|---|---|
| Basic (linear) | 10.0 | Very poor privacy |
| Advanced (√T) | 0.45 | Reasonable privacy |
| Moments Accountant (RDP) | 0.35 | Tight, practical bound |
| Privacy Loss Distributions | 0.31 | State-of-the-art tight bound |
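As a quick sanity check on the first two rows, here is a minimal sketch of the basic and advanced bounds from the formulas above. Note that the advanced-composition total depends on the slack δ' you allocate, so exact figures vary with that choice, and the tighter accountant rows require the RDP or PLD machinery sketched later.

```python
import math


def basic_composition(eps0: float, T: int) -> float:
    """Basic composition: per-round ε adds up linearly."""
    return T * eps0


def advanced_composition(eps0: float, T: int, delta_prime: float) -> float:
    """Advanced composition bound: grows roughly like √T · ε₀ for small ε₀."""
    return (math.sqrt(2 * T * math.log(1.0 / delta_prime)) * eps0
            + T * eps0 * (math.exp(eps0) - 1.0))


# 1,000 rounds at ε₀ = 0.01: the basic bound is 10.0, as in the table above
print(basic_composition(0.01, 1000))
# advanced_composition(0.01, 1000, delta_prime) grows like √T, far below the
# linear bound; the exact value depends on the chosen δ'
```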
Rényi Differential Privacy (RDP):
RDP provides even tighter composition bounds. It tracks privacy loss via Rényi divergences, which compose additively: if each round satisfies (α, ρᵢ)-RDP, the composition satisfies (α, Σᵢ ρᵢ)-RDP, and the total can be converted back to an (ε, δ)-DP statement via ε ≤ ρ + log(1/δ)/(α − 1).
RDP is the standard for modern DP implementations (TensorFlow Privacy, Opacus).
Practical Privacy Budgeting:
Before training, decide: the total privacy budget ε you can tolerate, the target δ (typically at most 1/n for n users), the number of training rounds, and the client or sample sampling rate per round.
Then work backward: given target ε and T rounds, compute per-round noise multiplier needed.
```python
# Privacy Accounting with Rényi Differential Privacy
from dataclasses import dataclass
from typing import List, Tuple

import numpy as np


@dataclass
class PrivacyBudget:
    """Track privacy budget consumption."""
    target_epsilon: float
    target_delta: float
    consumed_epsilon: float = 0.0
    consumed_delta: float = 0.0

    def remaining(self) -> Tuple[float, float]:
        return (
            self.target_epsilon - self.consumed_epsilon,
            self.target_delta - self.consumed_delta
        )

    def is_exhausted(self) -> bool:
        return (
            self.consumed_epsilon >= self.target_epsilon
            or self.consumed_delta >= self.target_delta
        )


class RenyiDifferentialPrivacy:
    """
    Rényi Differential Privacy (RDP) accounting.

    Mironov, 2017: "Rényi Differential Privacy"

    Key advantages:
    1. Tight composition (RDP adds across mechanisms)
    2. Natural for subsampled Gaussian mechanism
    3. Converts to (ε, δ)-DP for final reporting
    """

    @staticmethod
    def compute_rdp_gaussian(
        sampling_rate: float,
        noise_multiplier: float,
        orders: List[float]
    ) -> List[float]:
        """
        Compute RDP for the subsampled Gaussian mechanism.

        For the Gaussian mechanism with subsampling probability q and
        noise multiplier σ (noise std = σ × sensitivity):

            RDP at order α ≈ α / (2σ²)   [simplified; the full formula is complex]

        Subsampling provides amplification: effective RDP is much smaller.

        Args:
            sampling_rate: Probability q of including each record
            noise_multiplier: σ = noise_std / sensitivity
            orders: Rényi orders α to compute RDP for

        Returns:
            RDP at each order
        """
        rdp = []
        for order in orders:
            if order == 1:
                # KL divergence (order 1 Rényi)
                rdp.append(0)  # Needs special handling
            elif sampling_rate == 1.0:
                # No subsampling: α / (2σ²)
                rdp.append(order / (2 * noise_multiplier ** 2))
            else:
                # With subsampling: use numerical integration
                # Approximation for small q:
                rdp.append(
                    np.log1p(
                        sampling_rate ** 2
                        * (np.exp(order / (2 * noise_multiplier ** 2)) - 1)
                    ) / (order - 1)
                )
        return rdp

    @staticmethod
    def rdp_to_epsilon(
        rdp_values: List[float],
        orders: List[float],
        target_delta: float
    ) -> float:
        """
        Convert RDP to (ε, δ)-DP.

        For each order α with RDP value ρ:
            ε ≤ ρ + log(1/δ) / (α - 1)

        Return the minimum ε across all orders.
        """
        epsilons = []
        for rdp, order in zip(rdp_values, orders):
            if order == 1:
                epsilon = rdp  # KL divergence equals ε for order 1
            else:
                epsilon = rdp + np.log(1 / target_delta) / (order - 1)
            epsilons.append(epsilon)
        return min(epsilons)

    @staticmethod
    def compose_rdp(
        rdp_per_round: List[float],
        num_rounds: int
    ) -> List[float]:
        """
        Compose RDP across multiple rounds.

        RDP composes additively: total RDP = sum of per-round RDP.
        This is the beauty of RDP—simple addition gives tight bounds.
        """
        return [rdp * num_rounds for rdp in rdp_per_round]


def plan_private_training(
    target_epsilon: float,
    target_delta: float,
    num_rounds: int,
    samples_per_round: int,
    total_samples: int
) -> float:
    """
    Plan private training by computing the required noise multiplier.

    Given a privacy budget and training plan, compute the minimum
    noise_multiplier that stays within budget.

    Args:
        target_epsilon: Privacy budget ε
        target_delta: Privacy budget δ
        num_rounds: Number of training rounds
        samples_per_round: Samples used per round
        total_samples: Total samples in dataset

    Returns:
        Required noise_multiplier σ
    """
    sampling_rate = samples_per_round / total_samples
    orders = [1 + x / 10.0 for x in range(1, 100)]

    # Binary search for noise multiplier
    low, high = 0.1, 100.0
    while high - low > 0.01:
        mid = (low + high) / 2

        # Compute RDP for this noise multiplier
        rdp_per_round = RenyiDifferentialPrivacy.compute_rdp_gaussian(
            sampling_rate, mid, orders
        )

        # Compose across rounds
        total_rdp = RenyiDifferentialPrivacy.compose_rdp(
            rdp_per_round, num_rounds
        )

        # Convert to epsilon
        achieved_epsilon = RenyiDifferentialPrivacy.rdp_to_epsilon(
            total_rdp, orders, target_delta
        )

        if achieved_epsilon > target_epsilon:
            low = mid   # Need more noise
        else:
            high = mid  # Can use less noise

    return high


# Example usage
noise_multiplier = plan_private_training(
    target_epsilon=1.0,
    target_delta=1e-5,
    num_rounds=100,
    samples_per_round=256,
    total_samples=60000  # MNIST size
)
print(f"Required noise multiplier: {noise_multiplier:.2f}")
```

Trusted Execution Environments provide a hardware-based approach to privacy, creating secure enclaves where computation occurs in a protected region that even the host system's administrator cannot inspect.
TEE Technologies: common options include Intel SGX, ARM TrustZone, AMD SEV, and cloud enclave services such as AWS Nitro Enclaves.
Numerous side-channel attacks have broken SGX guarantees in practice. Use TEEs as defense in depth alongside differential privacy and secure aggregation, not as the sole protection. The combination provides layered security: even if one layer fails, others maintain protection.
Privacy protection is not free. Adding noise to ensure differential privacy degrades model accuracy. Understanding and optimizing this tradeoff is essential for practical private FL.
The Fundamental Tradeoff:
There is no way to achieve perfect privacy (ε = 0) with any meaningful learning. The art is in finding the sweet spot for your use case.
| Privacy Level (ε) | Noise Multiplier (σ) | Top-1 Accuracy (%) | Accuracy Drop (points) |
|---|---|---|---|
| ∞ (no privacy) | 0 | 76.6 | 0% |
| 10 (weak) | 0.5 | 75.2 | -1.4% |
| 3 (moderate) | 1.0 | 71.8 | -4.8% |
| 1 (strong) | 2.0 | 65.4 | -11.2% |
| 0.3 (very strong) | 4.0 | 54.2 | -22.4% |
Strategies to Improve the Tradeoff:
More data — Privacy cost is per-sample. More samples means less noise per sample for the same total privacy.
More clients — In FL, noise is added after aggregation. With n clients, per-client noise is √n smaller for the same aggregate privacy.
Pre-training — Start from a pre-trained model (on public data). Fine-tuning requires fewer rounds than training from scratch.
Gradient compression — Communicating fewer gradient components naturally reduces what can be inferred. Combine with DP for better tradeoffs.
Private feature learning — Make early layers public (trained on non-sensitive features), keep only later layers private.
For most applications, ε between 1 and 10 provides meaningful privacy without catastrophic accuracy loss. ε < 1 is rarely achieved in practice without significant accuracy degradation. Work with your privacy and legal teams to determine acceptable thresholds—there's no universal 'right' value.
We've covered the critical privacy landscape of federated learning. Let's consolidate: model updates leak information through gradient inversion, membership inference, and property inference attacks; differential privacy gives formal (ε, δ) guarantees via clipping and calibrated noise; secure aggregation cryptographically hides individual updates from the server; privacy loss composes across rounds, and RDP-based accounting keeps the total budget tight; and every defense trades some accuracy for privacy, so the budget must be chosen deliberately.
What's Next:
With privacy fundamentals established, we'll tackle Communication Efficiency in the next page. Federated learning often operates over slow, metered networks where model updates of millions of parameters are prohibitively expensive. You'll learn gradient compression, quantization, and sparse communication techniques that reduce bandwidth requirements by 10-100x.
You now understand the privacy threat landscape and defense mechanisms in federated learning. You can implement DP-SGD, explain secure aggregation cryptographic principles, and reason about privacy composition. Next, we address the communication bottleneck that constrains real-world FL deployments.