Denoising Diffusion Probabilistic Models (DDPMs) represent a revolutionary approach in generative modeling that learns to reverse a gradual noise-corruption process. The core insight is elegant: rather than directly learning to generate complex data, we learn to iteratively remove small amounts of noise—a process that is fundamentally easier to model.
The diffusion paradigm consists of two complementary processes that work in opposition:
The forward process (also called the diffusion or noising process) systematically destroys information in the data by adding Gaussian noise according to a variance schedule β₁, β₂, ..., βₜ. Starting from clean data x₀, we progressively create noisier versions until the signal is completely obscured by pure noise.
The key mathematical insight is that we can compute any intermediate noisy sample xₜ directly from x₀ without simulating all intermediate steps. Using the cumulative product of noise retention factors:
$$\alpha_t = 1 - \beta_t$$ $$\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$$
The closed-form forward sampling becomes:
$$x_t = \sqrt{\bar{\alpha}_t} \cdot x_0 + \sqrt{1 - \bar{\alpha}_t} \cdot \epsilon$$
where ε is standard Gaussian noise.
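As a sketch (assuming NumPy and the hypothetical name `forward_sample`), the closed-form forward jump can be written as:

```python
import numpy as np

def forward_sample(x0, betas, t, noise):
    """Jump directly from clean data x0 to the noisy sample x_t."""
    alphas = 1.0 - np.asarray(betas, dtype=float)
    alpha_bar_t = np.prod(alphas[:t])  # cumulative product of noise retention factors
    return (np.sqrt(alpha_bar_t) * np.asarray(x0, dtype=float)
            + np.sqrt(1.0 - alpha_bar_t) * np.asarray(noise, dtype=float))
```

Because ᾱₜ collapses all t noising steps into one coefficient, this runs in a single vectorized operation regardless of how large t is.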
The backward process (also called the reverse or denoising process) is where the magic happens. Starting from pure noise, we iteratively denoise to reconstruct coherent data. At each step, a neural network predicts the noise component, which we then partially remove.
The reverse transition from xₜ to xₜ₋₁ uses the predicted noise ε̂ to compute the posterior mean:
$$\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}} \cdot \hat{\epsilon} \right)$$
The posterior variance for stochastic sampling is:
$$\tilde{\beta}_t = \frac{\beta_t \cdot (1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t}$$
The denoised sample is then:
$$x_{t-1} = \mu_\theta(x_t, t) + \sqrt{\tilde{\beta}_t} \cdot z$$
where z is standard Gaussian noise (set to zero when t = 1 for the final deterministic step).
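A minimal sketch of this reverse transition, assuming NumPy and the hypothetical name `backward_step` (with `z=None` standing in for the deterministic final step):

```python
import numpy as np

def backward_step(x_t, betas, t, eps_hat, z=None):
    """One reverse transition x_t -> x_{t-1} using the predicted noise eps_hat."""
    betas = np.asarray(betas, dtype=float)
    alphas = 1.0 - betas
    alpha_bar_t = np.prod(alphas[:t])
    # Posterior mean: remove the predicted noise component, then rescale
    mu = (np.asarray(x_t, dtype=float)
          - betas[t - 1] / np.sqrt(1.0 - alpha_bar_t) * np.asarray(eps_hat, dtype=float)
          ) / np.sqrt(alphas[t - 1])
    if t == 1 or z is None:
        return mu  # final deterministic step: no stochastic noise added
    alpha_bar_prev = np.prod(alphas[:t - 1])
    beta_tilde = betas[t - 1] * (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t)
    return mu + np.sqrt(beta_tilde) * np.asarray(z, dtype=float)
```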
Create a function that performs both the forward corruption and backward restoration steps of the diffusion process. Given clean data x₀, a noise schedule (betas), the current timestep t, a forward noise sample ε, a predicted noise ε̂, and a backward noise sample z, your implementation should:
• compute the noisy sample x_t using the closed-form forward equation, and
• compute the denoised sample x_{t-1} using the posterior mean and variance of the reverse step.
Special Case: When t = 1, the backward step produces the final reconstruction without adding any stochastic noise (z = 0), as we are returning to the data distribution.
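One possible shape for such a function, sketched with NumPy under the hypothetical name `diffusion_process` (inputs and outputs mirror the examples that follow):

```python
import numpy as np

def diffusion_process(x0, betas, t, forward_noise, predicted_noise, backward_noise=None):
    """Return (x_t, x_{t-1}): one forward jump and one reverse step."""
    x0 = np.asarray(x0, dtype=float)
    betas = np.asarray(betas, dtype=float)
    alphas = 1.0 - betas
    alpha_bar_t = np.prod(alphas[:t])

    # Forward: closed-form jump from x0 to x_t
    x_t = (np.sqrt(alpha_bar_t) * x0
           + np.sqrt(1.0 - alpha_bar_t) * np.asarray(forward_noise, dtype=float))

    # Backward: posterior mean computed from the predicted noise
    mu = (x_t - betas[t - 1] / np.sqrt(1.0 - alpha_bar_t)
          * np.asarray(predicted_noise, dtype=float)) / np.sqrt(alphas[t - 1])

    if t == 1 or backward_noise is None:
        return x_t, mu  # final step: deterministic, z = 0
    alpha_bar_prev = np.prod(alphas[:t - 1])
    beta_tilde = betas[t - 1] * (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t)
    return x_t, mu + np.sqrt(beta_tilde) * np.asarray(backward_noise, dtype=float)
```

Note how the t = 1 branch returns the posterior mean directly, matching the special case described above.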
x_0 = [1, 2]
betas = [0.1, 0.2]
timestep = 2
forward_noise = [0.5, -0.5]
predicted_noise = [0.5, -0.5]
backward_noise = [0, 0]

Output: {"x_t": [1.1131, 1.4325], "x_t_minus_1": [1.0332, 1.8129]}

Forward Process Calculation:
• Compute α₁ = 1 - 0.1 = 0.9, α₂ = 1 - 0.2 = 0.8
• Compute ᾱ₂ = α₁ × α₂ = 0.9 × 0.8 = 0.72
• x_t = √0.72 × [1, 2] + √0.28 × [0.5, -0.5]
• x_t = 0.8485 × [1, 2] + 0.5292 × [0.5, -0.5]
• x_t = [0.8485, 1.6970] + [0.2646, -0.2646] = [1.1131, 1.4325]

Backward Process Calculation:
• ᾱ₁ = 0.9, ᾱ₂ = 0.72
• Posterior variance: β̃ = 0.2 × (1 - 0.9)/(1 - 0.72) = 0.0714
• Coefficient for x_t: 1/√0.8 = 1.118
• Coefficient for predicted noise: 0.2/√0.28 = 0.378
• μ = 1.118 × (x_t - 0.378 × predicted_noise)
• x_{t-1} = μ + √0.0714 × [0, 0] = [1.0332, 1.8129]
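The arithmetic in this example can be checked numerically; a quick NumPy sanity check (using the fact that predicted_noise equals the forward noise here):

```python
import numpy as np

betas = np.array([0.1, 0.2])
alphas = 1 - betas              # [0.9, 0.8]
alpha_bar = np.cumprod(alphas)  # [0.9, 0.72]

x0 = np.array([1.0, 2.0])
eps = np.array([0.5, -0.5])     # forward noise == predicted noise in this example

# Forward jump to t = 2
x_t = np.sqrt(alpha_bar[1]) * x0 + np.sqrt(1 - alpha_bar[1]) * eps
print(np.round(x_t, 4))         # [1.1131 1.4325]

# Posterior mean of the reverse step (backward noise is zero here)
mu = (x_t - betas[1] / np.sqrt(1 - alpha_bar[1]) * eps) / np.sqrt(alphas[1])
print(np.round(mu, 4))          # [1.0332 1.8129]
```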
x_0 = [1, 0, -1]
betas = [0.1]
timestep = 1
forward_noise = [0.2, 0.3, -0.2]
predicted_noise = [0.2, 0.3, -0.2]
backward_noise = null

Output: {"x_t": [1.0119, 0.0949, -1.0119], "x_t_minus_1": [1, 0, -1]}

Forward Process (Single Step):
• α₁ = 1 - 0.1 = 0.9, thus ᾱ₁ = 0.9
• x_t = √0.9 × [1, 0, -1] + √0.1 × [0.2, 0.3, -0.2]
• x_t = 0.9487 × [1, 0, -1] + 0.3162 × [0.2, 0.3, -0.2]
• x_t = [0.9487, 0, -0.9487] + [0.0632, 0.0949, -0.0632]
• x_t = [1.0119, 0.0949, -1.0119]

Backward Process (Final Step t=1):
• At t=1, this is the final denoising step
• backward_noise is null, meaning no stochastic noise is added
• With perfect noise prediction, we recover x₀ exactly
• x_{t-1} = [1, 0, -1] (original clean data)
x_0 = [0.5, 1.5, 2.5]
betas = [0.01, 0.02, 0.03]
timestep = 3
forward_noise = [0.1, -0.1, 0.2]
predicted_noise = [0.1, -0.1, 0.2]
backward_noise = [0, 0, 0]

Output: {"x_t": [0.5093, 1.4309, 2.4738], "x_t_minus_1": [0.5046, 1.4654, 2.4867]}

Forward Process with Small Betas:
• α₁ = 0.99, α₂ = 0.98, α₃ = 0.97
• ᾱ₃ = 0.99 × 0.98 × 0.97 = 0.9412
• x_t = √0.9412 × [0.5, 1.5, 2.5] + √0.0588 × [0.1, -0.1, 0.2]
• x_t = 0.9701 × [0.5, 1.5, 2.5] + 0.2425 × [0.1, -0.1, 0.2]
• x_t = [0.4851, 1.4552, 2.4253] + [0.0242, -0.0243, 0.0485]
• x_t = [0.5093, 1.4309, 2.4738]

Backward Process:
• With small beta values, each denoising step makes only subtle adjustments
• ᾱ₂ = 0.99 × 0.98 = 0.9702
• Posterior variance: β̃ = 0.03 × (1 - 0.9702)/(1 - 0.9412) = 0.0152
• With zero backward noise, the denoising step is deterministic
• x_{t-1} = [0.5046, 1.4654, 2.4867]
Constraints