In deep learning optimization, the learning rate schedule is one of the most critical hyperparameters affecting training dynamics and final model performance. A well-designed schedule helps models converge faster, escape sharp minima, and achieve better generalization.
The Warmup + Cosine Annealing schedule has become one of the most popular and effective learning rate strategies in modern deep learning. It combines two distinct phases:
Phase 1: Linear Warmup (Steps 0 to W-1) During the initial warmup phase, the learning rate starts from 0 and increases linearly to the maximum learning rate (lr_max). This gradual ramp-up prevents the model from making large, potentially destabilizing updates when the weights are still randomly initialized. The warmup helps the optimizer "explore" the loss landscape carefully before committing to aggressive updates.
For step t in the warmup phase (where t ranges from 0 to W-1), the learning rate is:
$$lr(t) = \frac{t}{W} \times lr_{max}$$
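The warmup ramp can be sanity-checked directly (a minimal sketch; the helper name `warmup_lr` is illustrative, not part of the required interface):

```python
def warmup_lr(t, W, lr_max):
    # Linear ramp: step t (0 <= t < W) gets fraction t/W of lr_max.
    return (t / W) * lr_max

# With W = 3 and lr_max = 1.0, the first three steps are 0.0, 0.3333, 0.6667.
print([round(warmup_lr(t, 3, 1.0), 4) for t in range(3)])
```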
Phase 2: Cosine Annealing Decay (Steps W to T-1) After warmup completes at step W, the learning rate smoothly decays following a cosine curve from lr_max down to lr_min. The cosine decay provides a gentle, non-linear reduction that spends more time at moderate learning rates—helping the model refine its weights before settling into a local minimum.
For step t in the decay phase (where t ranges from W to T-1), the cosine annealing formula is:
$$lr(t) = lr_{min} + \frac{1}{2}(lr_{max} - lr_{min})\left(1 + \cos\left(\frac{\pi \cdot (t - W)}{T - W}\right)\right)$$
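The decay formula translates directly into code (a sketch; the helper name `cosine_lr` is illustrative). Note that the progress fraction (t − W)/(T − W) reaches at most (T − W − 1)/(T − W) < 1, so the schedule approaches but never hits lr_min:

```python
import math

def cosine_lr(t, T, W, lr_min, lr_max):
    # Cosine decay for W <= t < T; progress runs over T - W intervals,
    # so the final step stops just short of lr_min.
    progress = (t - W) / (T - W)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

# First worked example below: T = 10, W = 3 -> step 4 uses cos(pi * 1/7).
print(round(cosine_lr(4, 10, 3, 0.0, 1.0), 4))  # 0.9505
```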
Your Task: Implement a function that generates the complete learning rate schedule for T training steps. The function should return a list containing the learning rate at each step, where each value is rounded to 4 decimal places.
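The two phases above can be combined into one function (a reference sketch; the name `lr_schedule` and the default for `lr_min` are illustrative choices, not mandated by the problem):

```python
import math

def lr_schedule(T, W, lr_max, lr_min=0.0):
    """Return the learning rate for each of T steps, rounded to 4 decimals."""
    lrs = []
    for t in range(T):
        if t < W:
            # Phase 1: linear warmup from 0 toward lr_max.
            lr = (t / W) * lr_max
        else:
            # Phase 2: cosine decay from lr_max toward lr_min.
            progress = (t - W) / (T - W)
            lr = lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))
        lrs.append(round(lr, 4))
    return lrs

# First worked example: T = 10, W = 3, lr_max = 1.0, lr_min = 0.0
print(lr_schedule(10, 3, 1.0, 0.0))
```

Running this reproduces the expected output of the first example: [0.0, 0.3333, 0.6667, 1.0, 0.9505, 0.8117, 0.6113, 0.3887, 0.1883, 0.0495].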
Notes:
Example 1:
Input: T = 10, W = 3, lr_max = 1.0, lr_min = 0.0
Output: [0.0, 0.3333, 0.6667, 1.0, 0.9505, 0.8117, 0.6113, 0.3887, 0.1883, 0.0495]
Explanation:
Warmup Phase (Steps 0-2):
• Step 0: lr = (0/3) × 1.0 = 0.0
• Step 1: lr = (1/3) × 1.0 = 0.3333
• Step 2: lr = (2/3) × 1.0 = 0.6667
Transition Point (Step 3): • Step 3: lr = lr_max = 1.0 (warmup complete)
Cosine Decay Phase (Steps 4-9): The learning rate follows a cosine curve from 1.0 down toward 0.0, with the decay progress measured over T − W = 7 intervals:
• Step 4: lr = 0.5 × (1 + cos(π × 1/7)) = 0.9505
• Step 5: lr = 0.5 × (1 + cos(π × 2/7)) = 0.8117
• Step 6: lr = 0.5 × (1 + cos(π × 3/7)) = 0.6113
• Step 7: lr = 0.5 × (1 + cos(π × 4/7)) = 0.3887
• Step 8: lr = 0.5 × (1 + cos(π × 5/7)) = 0.1883
• Step 9: lr = 0.5 × (1 + cos(π × 6/7)) = 0.0495
Example 2:
Input: T = 5, W = 2, lr_max = 0.1, lr_min = 0.01
Output: [0.0, 0.05, 0.1, 0.0775, 0.0325]
Explanation:
Warmup Phase (Steps 0-1):
• Step 0: lr = (0/2) × 0.1 = 0.0
• Step 1: lr = (1/2) × 0.1 = 0.05
Transition Point (Step 2): • Step 2: lr = lr_max = 0.1
Cosine Decay Phase (Steps 3-4): The learning rate decays from 0.1 toward 0.01 following a cosine curve, with progress measured over T − W = 3 intervals:
• Step 3: lr = 0.01 + 0.045 × (1 + cos(π × 1/3)) = 0.0775
• Step 4: lr = 0.01 + 0.045 × (1 + cos(π × 2/3)) = 0.0325
Note that the final value doesn't reach exactly 0.01 because of the discrete step nature of the schedule.
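This gap can be seen directly: the last decay step uses progress (T − W − 1)/(T − W), which is strictly less than 1 (a small sketch using the second example's values):

```python
import math

T, W, lr_max, lr_min = 5, 2, 0.1, 0.01
# Last step t = T - 1 = 4: progress = 2/3 rather than 1, so the cosine
# evaluates to cos(2*pi/3) = -0.5 instead of cos(pi) = -1, leaving lr above lr_min.
progress = (T - 1 - W) / (T - W)
lr_last = lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))
print(round(lr_last, 4))  # 0.0325, versus lr_min = 0.01
```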
Example 3:
Input: T = 8, W = 4, lr_max = 0.5, lr_min = 0.1
Output: [0.0, 0.125, 0.25, 0.375, 0.5, 0.4414, 0.3, 0.1586]
Explanation:
Warmup Phase (Steps 0-3): The learning rate increases linearly from 0 to 0.5 over 4 steps:
• Step 0: lr = 0.0
• Step 1: lr = (1/4) × 0.5 = 0.125
• Step 2: lr = (2/4) × 0.5 = 0.25
• Step 3: lr = (3/4) × 0.5 = 0.375
Transition Point (Step 4): • Step 4: lr = lr_max = 0.5
Cosine Decay Phase (Steps 5-7): The learning rate follows cosine decay from 0.5 toward 0.1, with progress measured over T − W = 4 intervals:
• Step 5: lr = 0.1 + 0.2 × (1 + cos(π × 1/4)) = 0.4414
• Step 6: lr = 0.1 + 0.2 × (1 + cos(π × 2/4)) = 0.3 (midpoint of the decay range)
• Step 7: lr = 0.1 + 0.2 × (1 + cos(π × 3/4)) = 0.1586 (approaching lr_min)
Constraints