In 2018, researchers at NVIDIA unveiled StyleGAN, a generative architecture that fundamentally reimagined how generators create images. While ProGAN had solved the high-resolution challenge through progressive growing, StyleGAN asked a different question: how can we give artists and researchers meaningful control over generation?
The core insight was revolutionary: instead of feeding the latent vector z directly into the generator, StyleGAN introduces an intermediate latent space W and uses style injection at each layer. This seemingly simple change had profound implications for disentanglement, controllability, and image quality.
StyleGAN2 further refined the architecture, eliminating artifacts and improving quality. Together, the StyleGAN family represents the pinnacle of GAN-based image synthesis.
By the end of this page, you will understand the mapping network and W latent space, adaptive instance normalization (AdaIN), style injection mechanism, noise injection for stochastic detail, the constant input paradigm, style mixing and its implications, and StyleGAN2's improvements including weight demodulation.
The first innovation of StyleGAN is the mapping network f: Z → W, an 8-layer MLP that transforms the input latent code z into an intermediate latent code w. This might seem like added complexity for no reason, but the W space has fundamentally different properties than Z.
Why Z is problematic:
Z is drawn from a spherical Gaussian N(0, I). This imposes a specific geometry on the latent space. But the space of realistic images is not spherical—it has complex, non-convex structure. Forcing the generator to map from a sphere to this complex manifold creates entanglement: changing one dimension of z affects multiple image attributes simultaneously.
Why W is better:
The mapping network learns to 'warp' the spherical Z into a W space that better matches the structure of images. W is not constrained to any particular distribution—it naturally takes whatever shape best represents the data. This leads to much better disentanglement.
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np


class MappingNetwork(nn.Module):
    """
    StyleGAN mapping network: f(z) → w

    8 fully-connected layers with LeakyReLU activation.
    Transforms the spherical latent Z into the disentangled W space.

    Key design choices:
    - Uses equalized learning rate (like ProGAN)
    - LeakyReLU activation throughout
    - Output dimension equals input (both 512 typically)
    - No normalization layers (pure MLP)
    """
    def __init__(
        self,
        z_dim=512,
        w_dim=512,
        num_layers=8,
        lr_multiplier=0.01  # Lower learning rate for the mapping network
    ):
        super().__init__()

        layers = []
        in_dim = z_dim
        for i in range(num_layers):
            out_dim = w_dim
            # Equalized linear layer
            layers.append(EqualizedLinear(in_dim, out_dim, lr_mul=lr_multiplier))
            layers.append(nn.LeakyReLU(0.2))
            in_dim = out_dim

        self.mapping = nn.Sequential(*layers)

    def forward(self, z):
        """
        Map z → w.

        z: [batch, z_dim] - sampled from N(0, I)
        returns: [batch, w_dim] - in learned W space
        """
        # Normalize z to the unit sphere (pixel norm style)
        z = z / (z.norm(dim=1, keepdim=True) + 1e-8)
        return self.mapping(z)


class EqualizedLinear(nn.Module):
    """Linear layer with equalized learning rate."""
    def __init__(self, in_features, out_features, lr_mul=1.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        # Scale for equalized learning rate
        self.scale = (1 / np.sqrt(in_features)) * lr_mul
        self.lr_mul = lr_mul

    def forward(self, x):
        return F.linear(x, self.weight * self.scale, self.bias * self.lr_mul)


# The W space emerges from training
"""
During training, the mapping network learns to:
1. Cluster similar image attributes together
2. Separate independent attributes into different directions
3. Create smooth interpolation paths

Empirical observations:
- Interpolating in Z: often passes through unrealistic images
- Interpolating in W: smooth, realistic transitions throughout
- Linear directions in W correspond to semantic attributes
  (age, gender, smile, glasses, etc.)

This is not explicitly supervised - it emerges from the generator's
pressure to create realistic images.
"""

The truncation trick pulls w vectors toward the mean w̄ (averaged over many samples): w' = w̄ + ψ(w - w̄). With ψ < 1, samples are more 'average' but of higher quality; with ψ > 1, samples are more varied but may contain artifacts. This works much better in W than in Z because W has a more regular distribution—its mean is meaningful.
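As a minimal sketch of the truncation trick, assuming a MappingNetwork instance like the one above (the sample count and the ψ value are arbitrary choices for illustration):

import torch

@torch.no_grad()
def truncate_w(mapping, z, psi=0.7, n_samples=10000):
    """Pull w toward the mean w̄: w' = w̄ + ψ(w - w̄)."""
    # Estimate w̄ by averaging the mapped w of many random z samples
    z_samples = torch.randn(n_samples, z.size(1), device=z.device)
    w_mean = mapping(z_samples).mean(dim=0, keepdim=True)  # [1, w_dim]

    # Interpolate each sample's w toward the mean
    w = mapping(z)
    return w_mean + psi * (w - w_mean)

# Example: ψ = 0.7 trades some diversity for higher sample quality
# mapping = MappingNetwork()
# w_truncated = truncate_w(mapping, torch.randn(8, 512), psi=0.7)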
Adaptive Instance Normalization (AdaIN) is the mechanism by which style information from W space is injected into the generator. Rather than using the latent code as input to the generator, StyleGAN uses it to modulate the activations at each layer.
The AdaIN operation:
$$\text{AdaIN}(x_i, y) = y_{s,i} \cdot \frac{x_i - \mu(x_i)}{\sigma(x_i)} + y_{b,i}$$
where x_i is the i-th feature channel, μ(x_i) and σ(x_i) are its spatial mean and standard deviation, and y_{s,i}, y_{b,i} are the per-channel scale and bias produced from the style code y, itself a learned affine transformation of w.
In words: normalize each feature channel, then scale and shift it based on the style code. The style code thus controls the statistics of the activations, not their spatial structure.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaIN(nn.Module):
    """
    Adaptive Instance Normalization.

    Takes feature maps and a style vector, outputs modulated features.
    The style controls the 'style' (statistics) while the features
    provide the 'content' (spatial structure).
    """
    def __init__(self, num_features, w_dim=512):
        super().__init__()
        # Affine transformation from w to scale and bias
        # 2 * num_features: one scale and one bias per feature channel
        self.style_transform = nn.Linear(w_dim, 2 * num_features)

        # Initialize to identity: scale=1, bias=0
        self.style_transform.weight.data.zero_()
        self.style_transform.bias.data[:num_features] = 1.0  # scales
        self.style_transform.bias.data[num_features:] = 0.0  # biases

        self.num_features = num_features

    def forward(self, x, w):
        """
        x: [batch, channels, height, width] - feature maps
        w: [batch, w_dim] - style vector
        returns: modulated feature maps
        """
        batch_size = x.size(0)

        # Get per-channel scale and bias from the style
        style = self.style_transform(w)  # [batch, 2*channels]
        scale = style[:, :self.num_features].view(batch_size, -1, 1, 1)
        bias = style[:, self.num_features:].view(batch_size, -1, 1, 1)

        # Instance normalization: normalize each sample's each channel
        # [B, C, H, W] → normalize over H, W for each (B, C) pair
        mean = x.mean(dim=[2, 3], keepdim=True)
        std = x.std(dim=[2, 3], keepdim=True) + 1e-8
        x_normalized = (x - mean) / std

        # Apply style modulation
        return scale * x_normalized + bias


# Why AdaIN works for style control
"""
Insight from neural style transfer:
- Content is captured by spatial patterns (where features activate)
- Style is captured by feature statistics (mean, variance, correlations)

By having AdaIN control only the statistics, we separate:
- Structure/layout: determined by the generator's learned spatial patterns
- Style/appearance: controlled by the W latent code

This separation is why StyleGAN achieves such good disentanglement:
- Early layers control global style (pose, face shape)
- Middle layers control features (hair style, face features)
- Late layers control fine details (colors, microstructure)
"""


class StyleGANSynthesisBlock(nn.Module):
    """
    A single block in the StyleGAN synthesis network.

    Each block:
    1. Upsamples the input (except the first block)
    2. Applies 2 convolutions with style modulation
    3. Adds noise for stochastic variation
    """
    def __init__(
        self,
        in_channels,
        out_channels,
        w_dim=512,
        resolution=None,  # For noise generation
        is_first_block=False
    ):
        super().__init__()
        self.is_first_block = is_first_block
        self.resolution = resolution

        if not is_first_block:
            self.upsample = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)

        # Two convolutional layers
        self.conv1 = ModulatedConv2d(in_channels, out_channels, 3)
        self.conv2 = ModulatedConv2d(out_channels, out_channels, 3)

        # Style transform for each conv
        self.style1 = nn.Linear(w_dim, in_channels)
        self.style2 = nn.Linear(w_dim, out_channels)

        # Noise injection after each conv
        self.noise_weight1 = nn.Parameter(torch.zeros(1))
        self.noise_weight2 = nn.Parameter(torch.zeros(1))

        self.activation = nn.LeakyReLU(0.2)

    def forward(self, x, w, noise=None):
        if not self.is_first_block:
            x = self.upsample(x)

        # First conv + style + noise
        style1 = self.style1(w)
        x = self.conv1(x, style1)
        x = x + self.noise_weight1 * self._get_noise(x, noise)
        x = self.activation(x)

        # Second conv + style + noise
        style2 = self.style2(w)
        x = self.conv2(x, style2)
        x = x + self.noise_weight2 * self._get_noise(x, noise)
        x = self.activation(x)

        return x

    def _get_noise(self, x, noise=None):
        if noise is None:
            noise = torch.randn(x.size(0), 1, x.size(2), x.size(3), device=x.device)
        return noise

In StyleGAN, a (potentially different) w vector is fed to each layer. This extended space is called W+ ('W-plus'). Using the same w at every layer is plain W space; using a different w per layer is W+ space. W+ provides more control but makes optimization harder when inverting real images.
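A small sketch of the distinction, assuming 18 style-injection layers (the helper name below is illustrative): in W space the same w is broadcast to every layer, while in W+ each layer gets its own row that can be edited independently.

import torch

def make_w_plus(w, num_layers=18):
    """Broadcast a single w [batch, w_dim] to W+ shape [batch, num_layers, w_dim]."""
    return w.unsqueeze(1).expand(-1, num_layers, -1).clone()

w = torch.randn(4, 512)       # stand-in for mapping(z)
w_plus = make_w_plus(w)       # identical rows: still equivalent to plain W

# In W+, one layer's style can be changed on its own,
# e.g. replacing only the style used at layer 5:
w_plus[:, 5] = torch.randn(4, 512)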
A key innovation in StyleGAN is per-pixel noise injection after each convolutional layer. This might seem counterintuitive—why add randomness to a generative process? The answer reveals a fundamental insight about image structure.
The observation:
Real images have two types of variation: meaningful, structural attributes (identity, pose, facial features) and purely stochastic details (the exact placement of hair strands, freckles, and skin texture).
If we force the latent code to control everything, including random details, it becomes overloaded and less disentangled. By providing explicit noise channels, we free the latent space to focus on meaningful variations.
import torch
import torch.nn as nn


class NoiseInjection(nn.Module):
    """
    Inject per-pixel Gaussian noise into feature maps.

    Noise is scaled by a learned per-channel weight, allowing
    the network to learn how much stochastic variation each
    feature channel should have.
    """
    def __init__(self, num_channels):
        super().__init__()
        # One learnable weight per channel, initialized to 0
        self.weight = nn.Parameter(torch.zeros(1, num_channels, 1, 1))

    def forward(self, x, noise=None):
        """
        x: [batch, channels, height, width]
        noise: optional [batch, 1, height, width] - shared across channels
        """
        if noise is None:
            batch, _, height, width = x.shape
            noise = torch.randn(batch, 1, height, width, device=x.device)

        # Broadcast noise across channels, scale by the learned weight
        return x + self.weight * noise


# Demonstration: Effect of noise
def demonstrate_noise_effect(generator, w):
    """
    Generate multiple images with the same w but different noise.

    Result: Images have identical global structure (same person)
    but different stochastic details (different exact hair strands,
    slightly different freckle patterns, etc.)
    """
    images = []
    for _ in range(5):
        # Same w, different noise realization
        noise = [torch.randn(1, 1, 2**i, 2**i) for i in range(2, 11)]
        img = generator(w, noise=noise)
        images.append(img)

    return images  # Same identity, different fine details


# What noise controls at different resolutions:
"""
Resolution  | Noise controls
4×4         | Barely visible effect (very coarse)
8×8         | Large-scale texture variations
16×16       | Hair shape variations
32×32       | Hair strand patterns, facial texture
64×64       | Finer hair details, skin texture
128×128     | Individual hair strands, pore patterns
256×256     | Very fine texture details
512×512     | Sub-pixel variations
1024×1024   | Finest details, almost imperceptible

Key insight: This hierarchy emerges automatically from training.
The network learns to use noise for resolution-appropriate details.
"""


class StyleGANGenerator(nn.Module):
    """
    Complete StyleGAN generator with noise injection.
    """
    def __init__(self, z_dim=512, w_dim=512, resolution=1024):
        super().__init__()
        self.mapping = MappingNetwork(z_dim, w_dim)

        # Learned constant input (replaces ProGAN's first layer)
        self.constant = nn.Parameter(torch.randn(1, 512, 4, 4))

        # Build synthesis network
        self.synthesis_blocks = nn.ModuleList()
        self.to_rgb_blocks = nn.ModuleList()

        channels = {4: 512, 8: 512, 16: 512, 32: 512, 64: 256,
                    128: 128, 256: 64, 512: 32, 1024: 16}
        # ... build blocks as shown in previous sections

    def forward(self, z, noise=None, truncation_psi=1.0):
        """
        Generate an image from z.

        z: latent code from N(0, I)
        noise: optional list of noise tensors per layer
        truncation_psi: interpolate toward the mean w (1.0 = no truncation)
        """
        # Map z → w
        w = self.mapping(z)

        # Apply the truncation trick
        if truncation_psi != 1.0:
            w_mean = self.get_mean_w()  # Compute from many z samples
            w = w_mean + truncation_psi * (w - w_mean)

        # Generate noise if not provided
        if noise is None:
            noise = self._generate_noise()

        # Start from the learned constant
        x = self.constant.expand(z.size(0), -1, -1, -1)

        # Apply synthesis blocks with style and noise
        for block, rgb, layer_noise in zip(
            self.synthesis_blocks, self.to_rgb_blocks, noise
        ):
            x = block(x, w, layer_noise)

        return x

Perhaps StyleGAN's most surprising design choice is that the generator starts from a learned constant rather than from the latent code z. In ProGAN and DCGAN, z is reshaped into a 4×4 feature map as the first layer. In StyleGAN, a 4×4×512 tensor is instead learned during training and shared by every generated image; all variation comes from style modulation and noise.
Why this works:
The constant input provides a stable 'canvas' for the synthesis network; the style (from W) and the noise completely determine the output. This makes the role of each component clear: the constant supplies the neutral spatial layout, the styles control the appearance, and the noise contributes stochastic detail.
This separation is key to StyleGAN's interpretability and control.
import torch
import torch.nn as nn


class StyleGANSynthesisNetwork(nn.Module):
    """
    StyleGAN synthesis network with constant input.

    Architecture:
    constant([1, 512, 4, 4]) → modulated convs → RGB output

    The constant is the only 'seed' - all variation comes from
    style injection and noise.
    """
    def __init__(self, w_dim=512, img_resolution=1024):
        super().__init__()

        # THE learned constant (4×4×512)
        # This is the same for every generated image
        self.constant = nn.Parameter(torch.ones(1, 512, 4, 4))
        nn.init.normal_(self.constant)

        # Channel progression
        self.channels = {
            4: 512, 8: 512, 16: 512, 32: 512,
            64: 256, 128: 128, 256: 64, 512: 32, 1024: 16
        }

        # First layer: modulate the constant
        self.first_style = nn.Linear(w_dim, 512)
        self.first_conv = ModulatedConv2d(512, 512, 3)

        # ... rest of synthesis blocks

    def forward(self, w, noise=None):
        """
        w: [batch, w_dim] or [batch, num_layers, w_dim] for W+ space
        """
        batch_size = w.size(0)

        # Expand constant to batch size
        x = self.constant.expand(batch_size, -1, -1, -1)

        # Apply first style modulation
        # (note: we can also apply noise here)
        w0 = w[:, 0] if w.dim() == 3 else w
        style = self.first_style(w0)
        x = self.first_conv(x, style)

        # Continue through synthesis blocks...
        return x


# Visualization: What the constant looks like
def analyze_constant(generator):
    """
    The constant is a 4×4×512 tensor. We can't visualize 512 channels,
    but we can analyze its statistics.
    """
    c = generator.synthesis.constant.data.squeeze(0)  # [512, 4, 4]

    print(f"Constant shape: {c.shape}")
    print(f"Mean: {c.mean():.4f}")
    print(f"Std: {c.std():.4f}")
    print(f"Min: {c.min():.4f}, Max: {c.max():.4f}")

    # The constant typically converges to a pattern that
    # encodes a "neutral" face layout that can be modulated
    # into any specific face via style injection


# Implications for image editing:
"""
Because all variation comes from style and noise, we can:

1. Style Mixing: Take styles from different images at different layers
   - Layers 0-3: coarse (pose, face shape) from image A
   - Layers 4-7: middle (hair, features) from image B
   - Layers 8+: fine (colors, texture) from image C

2. GAN Inversion: Find w (or w+) that generates a real image
   - Easier than finding z because W space is more regular
   - Enables editing real photos

3. Semantic Editing: Find directions in W that change specific attributes
   - Add "smile" vector to make any face smile
   - These directions often linearly combine
"""

Style mixing is both a training regularization technique and a powerful generation control mechanism. The idea is simple: instead of using one w vector for all layers, use different w vectors from different sources.
Training with style mixing:
During training, with probability p (often 0.9), a second latent code z2 is sampled alongside z1, a random crossover layer is chosen, and the generator uses w1 = f(z1) for the layers before the crossover and w2 = f(z2) for the layers after it.
This regularizes the network, preventing it from expecting correlated styles across layers and improving disentanglement.
import torch
import torch.nn as nn
import numpy as np


class StyleMixer:
    """
    Style mixing utilities for StyleGAN.
    """
    def __init__(self, generator, num_layers=18):
        """
        num_layers: total number of style injection points
        (e.g., 18 for 1024×1024 = 2 per resolution × 9 resolutions)
        """
        self.G = generator
        self.num_layers = num_layers

    def mix_styles(self, w1, w2, crossover_layer):
        """
        Create a mixed style tensor using w1 for early layers, w2 for late.

        w1, w2: [batch, w_dim]
        crossover_layer: int, layer index where we switch from w1 to w2
        returns: [batch, num_layers, w_dim]
        """
        batch_size = w1.size(0)
        w_dim = w1.size(1)

        # Expand to [batch, num_layers, w_dim]
        w = torch.zeros(batch_size, self.num_layers, w_dim, device=w1.device)
        for layer_idx in range(self.num_layers):
            if layer_idx < crossover_layer:
                w[:, layer_idx] = w1
            else:
                w[:, layer_idx] = w2

        return w

    def generate_mixed(self, z1, z2, crossover_layer):
        """Generate an image with style mixing."""
        w1 = self.G.mapping(z1)
        w2 = self.G.mapping(z2)
        w_mixed = self.mix_styles(w1, w2, crossover_layer)
        return self.G.synthesis(w_mixed)

    def style_mixing_matrix(self, num_sources=5, num_destinations=5):
        """
        Generate a grid showing style mixing between multiple latents.

        Rows: source images (provide coarse styles)
        Cols: destination images (provide fine styles)
        Cell (i,j): coarse from source i, fine from destination j
        """
        z_source = torch.randn(num_sources, 512)
        z_dest = torch.randn(num_destinations, 512)

        w_source = self.G.mapping(z_source)
        w_dest = self.G.mapping(z_dest)

        crossover_layer = 4  # Switch after 4 layers (controls coarse structure)

        grid = []
        for i in range(num_sources):
            row = []
            for j in range(num_destinations):
                # Coarse from source[i], fine from dest[j]
                w_mixed = self.mix_styles(
                    w_source[i:i+1], w_dest[j:j+1], crossover_layer
                )
                img = self.G.synthesis(w_mixed)
                row.append(img)
            grid.append(row)

        return grid


# What each layer range controls (for 1024×1024):
"""
Layer range | Resolution | Controls
0-1         | 4×4        | Overall face shape, pose
2-3         | 8×8        | Face shape details
4-5         | 16×16      | Hair style, eyes
6-7         | 32×32      | Hair, face features
8-9         | 64×64      | Smaller features
10-11       | 128×128    | Colors, textures
12-13       | 256×256    | Fine details
14-15       | 512×512    | Very fine details
16-17       | 1024×1024  | Microstructure

General categories:
- Coarse (0-3): Pose, face shape, general structure
- Middle (4-7): Facial features, hairstyle
- Fine (8+): Colors, textures, details

These emerge from training without explicit supervision!
"""


# Training with style mixing
def train_with_mixing(G, D, z_batch, mixing_prob=0.9):
    """Apply style mixing regularization during training."""
    batch_size = z_batch.size(0)

    if np.random.random() < mixing_prob:
        # Mix styles
        z2 = torch.randn_like(z_batch)
        crossover = np.random.randint(1, G.num_layers)

        w1 = G.mapping(z_batch)
        w2 = G.mapping(z2)

        # Create per-layer w
        w = torch.zeros(batch_size, G.num_layers, 512, device=z_batch.device)
        for layer in range(G.num_layers):
            w[:, layer] = w1 if layer < crossover else w2
    else:
        # No mixing - same w for all layers
        w = G.mapping(z_batch).unsqueeze(1).expand(-1, G.num_layers, -1)

    return G.synthesis(w)

Style mixing enables powerful creative applications: (1) creating a person with one face's structure but another's coloring; (2) transferring just the hairstyle from one image to another; (3) maintaining identity while changing lighting and texture. These operations are impossible or very difficult with pre-StyleGAN architectures.
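As a usage sketch of the StyleMixer class above, the first of those applications might look like the following (the pretrained generator object and the crossover point are assumptions for illustration):

import torch

# Hypothetical usage: combine one latent's coarse structure with
# another's middle/fine styles. `generator` is assumed to expose the
# mapping/synthesis interface used throughout this page.
mixer = StyleMixer(generator, num_layers=18)

z_structure = torch.randn(1, 512)   # supplies pose and face shape
z_appearance = torch.randn(1, 512)  # supplies hair, colors, texture

# Layers 0-3 take styles from z_structure, layers 4-17 from z_appearance
mixed = mixer.generate_mixed(z_structure, z_appearance, crossover_layer=4)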
StyleGAN produced stunning results, but careful inspection revealed characteristic artifacts—'water droplet' blobs that appeared in many generated images. StyleGAN2 (2019) traced these to the AdaIN normalization step and introduced elegant solutions.
The problem with AdaIN:
AdaIN normalizes feature statistics, destroying information about the relative magnitudes of features. When certain features should dominate (e.g., edge features in high-contrast areas), normalization inappropriately equalizes them, creating artifacts.
The key StyleGAN2 changes covered here are weight demodulation, which replaces AdaIN's normalization of activations with an equivalent rescaling of the convolution weights, and path length regularization, which encourages a smoother, more uniform mapping from W to images. Both are shown below.
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np


class ModulatedConv2d(nn.Module):
    """
    Modulated convolution used in StyleGAN2.

    Instead of modulating features (AdaIN), we modulate the weights.
    This achieves similar style control without normalizing features.

    Process:
    1. Scale conv weights by the style vector
    2. Demodulate weights to maintain expected signal statistics
    3. Apply convolution

    This is mathematically similar to AdaIN but avoids the artifacts.
    """
    def __init__(
        self,
        in_channels,
        out_channels,
        kernel_size,
        demodulate=True,
        lr_mul=1.0
    ):
        super().__init__()
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        self.demodulate = demodulate

        # Base convolution weight
        self.weight = nn.Parameter(
            torch.randn(out_channels, in_channels, kernel_size, kernel_size)
        )

        # Scale for equalized learning rate
        self.scale = (1 / np.sqrt(in_channels * kernel_size ** 2)) * lr_mul
        self.padding = kernel_size // 2

    def forward(self, x, style):
        """
        x: [batch, in_channels, height, width]
        style: [batch, in_channels] - per-channel modulation scales
        returns: [batch, out_channels, height, width]
        """
        batch_size, in_channels, height, width = x.shape

        # Scale base weight
        weight = self.weight * self.scale  # [out, in, k, k]

        # Modulate: multiply weight by style (per input channel)
        # style: [batch, in] → [batch, 1, in, 1, 1]
        style = style.view(batch_size, 1, in_channels, 1, 1)
        weight = weight.unsqueeze(0) * style  # [batch, out, in, k, k]

        if self.demodulate:
            # Demodulate: normalize by the expected output std
            # For each output channel, compute the std over input channels
            # and kernel positions: sigma = sqrt(sum(w^2))
            demod = torch.rsqrt(
                weight.pow(2).sum(dim=[2, 3, 4], keepdim=True) + 1e-8
            )  # [batch, out, 1, 1, 1]
            weight = weight * demod

        # Reshape for grouped convolution
        # This implements per-sample convolution efficiently
        weight = weight.view(
            batch_size * self.out_channels, in_channels,
            self.kernel_size, self.kernel_size
        )

        # Group convolution: each sample uses its own weights
        x = x.view(1, batch_size * in_channels, height, width)
        out = F.conv2d(x, weight, padding=self.padding, groups=batch_size)
        out = out.view(batch_size, self.out_channels, height, width)

        return out


# Why weight modulation works better:
"""
AdaIN approach (StyleGAN1):
1. Convolve:  y = conv(x)
2. Normalize: y' = (y - μ) / σ
3. Scale:     out = γ * y' + β

Problem: Step 2 destroys relative magnitude information.
If one feature is 10x larger than another (which conveys meaning),
normalization makes them similar in magnitude.

Weight modulation approach (StyleGAN2):
1. Modulate weights:   w' = w * s
2. Demodulate weights: w'' = w' / ||w'||
3. Convolve:           out = conv(x, w'')

The key insight: we can achieve the same statistical effect by
modifying the weights instead of the activations. The demodulation
ensures the output has unit variance (like instance norm), but we never
actually normalize the activations themselves.

Result: Same style control, no artifacts from feature normalization.
"""
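Written out in the notation of the StyleGAN2 paper, the modulate and demodulate steps implemented above are

$$w'_{ijk} = s_i \cdot w_{ijk}, \qquad w''_{ijk} = \frac{w'_{ijk}}{\sqrt{\sum_{i,k} \left(w'_{ijk}\right)^2 + \epsilon}}$$

where s_i is the style scale for input channel i, j indexes output channels, and k runs over the kernel's spatial positions. Demodulation rescales each output channel's weights to unit expected variance, reproducing the statistical effect of instance normalization without ever normalizing the activations. StyleGAN2's second addition, path length regularization, is sketched next.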
import torch
import numpy as np


def path_length_regularization(generator, latents, mean_path_length, decay=0.01):
    """
    Path length regularization for StyleGAN2.

    Encourages: ||J_w|| ≈ constant for all w

    where J_w is the Jacobian of the generator output w.r.t. the latents.
    This makes the mapping from W to images locally stable, improving
    interpolation quality and inversion.

    Note: `latents` must be part of the autograd graph (e.g. the output
    of the mapping network) so gradients w.r.t. it can be computed.
    """
    # Generate images and track gradients w.r.t. the latents
    images = generator(latents)

    # Random image-space direction, scaled so the estimate does not
    # depend on resolution
    noise = torch.randn_like(images) / np.sqrt(images.shape[2] * images.shape[3])

    # Gradient of (images * noise).sum() w.r.t. the latents:
    # a stochastic estimate of ||J||
    grad = torch.autograd.grad(
        outputs=(images * noise).sum(),
        inputs=latents,
        create_graph=True,
    )[0]

    # Path length for this batch
    path_lengths = torch.sqrt(grad.pow(2).sum(dim=1).mean(dim=0))

    # Update the running mean
    mean_path_length = mean_path_length + decay * (
        path_lengths.mean().item() - mean_path_length
    )

    # Penalty: encourage the path length to match the running mean
    path_penalty = (path_lengths - mean_path_length).pow(2).mean()

    return path_penalty, mean_path_length


# The intuition:
"""
If the generator is well-behaved, small perturbations in W
should cause small perturbations in the output image.

Mathematically: ||∂G(w)/∂w|| should be roughly constant.

The path length regularizer encourages this by:
1. Computing the Jacobian norm (path length)
2. Penalizing deviation from the average

Benefits:
- Smoother interpolations in W space
- Better GAN inversion (finding w for real images)
- More predictable edits when moving in W space
"""

StyleGAN's controllable, high-quality generation has enabled numerous applications beyond simple image synthesis.
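Before turning to those broader applications, here is a hypothetical sketch of where the path length penalty above would plug into a generator update; the non-saturating loss, the pl_weight value, and the G/D interfaces are assumptions for illustration, not the official training code.

import torch
import torch.nn.functional as F

def generator_step(G, D, g_optim, batch_size, mean_path_length, pl_weight=2.0):
    """One generator update with the path length penalty (illustrative)."""
    z = torch.randn(batch_size, 512)
    w = G.mapping(z)              # w stays in the autograd graph via the mapping
    fake = G.synthesis(w)

    # Non-saturating adversarial loss for the generator
    adv_loss = F.softplus(-D(fake)).mean()

    # Path length penalty (runs its own forward pass on G.synthesis)
    pl_penalty, mean_path_length = path_length_regularization(
        G.synthesis, w, mean_path_length
    )

    g_optim.zero_grad()
    (adv_loss + pl_weight * pl_penalty).backward()
    g_optim.step()

    return mean_path_length   # carry the running average to the next step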
StyleGAN's realism has raised significant ethical concerns. Generated faces can be used for fake profiles, disinformation, and fraud. The research community has responded with detection methods and watermarking techniques. Understanding these capabilities and their potential misuse is essential for responsible deployment.
StyleGAN and StyleGAN2 represent the pinnacle of GAN-based image synthesis, combining unprecedented quality with meaningful controllability.
StyleGAN's innovations—disentangled latent spaces, layer-wise control, separating content from style—have influenced architectures well beyond GANs. Many ideas in diffusion models and vision-language models trace conceptual ancestry to the StyleGAN paradigm. The next page covers Conditional GANs, which add explicit control through labels or input conditions.