In deep learning optimization, the learning rate is one of the most critical hyperparameters: it determines how quickly or slowly a model converges during training. A static learning rate often leads to suboptimal results: too high a rate causes oscillation or divergence, while too low a rate results in painfully slow convergence.
Learning rate scheduling provides an elegant solution by dynamically adjusting the learning rate throughout training. Among the various scheduling strategies, cosine annealing (also known as cosine decay) has emerged as one of the most effective and widely adopted approaches in modern deep learning.
The cosine annealing schedule smoothly decreases the learning rate following a cosine curve from the initial learning rate $$\eta_{\text{max}}$$ to a minimum learning rate $$\eta_{\text{min}}$$ over $$T_{\text{max}}$$ epochs:
$$\eta_t = \eta_{\text{min}} + \frac{1}{2}(\eta_{\text{max}} - \eta_{\text{min}})\left(1 + \cos\left(\frac{t \cdot \pi}{T_{\text{max}}}\right)\right)$$
Where:
- $$\eta_t$$ is the learning rate at epoch $$t$$
- $$\eta_{\text{max}}$$ is the initial (maximum) learning rate
- $$\eta_{\text{min}}$$ is the minimum learning rate
- $$T_{\text{max}}$$ is the number of epochs over which the decay occurs

This schedule has three appealing properties:

Smooth Decay: The cosine function provides a gradual transition, starting with slow decay at the beginning, accelerating in the middle, and slowing down again near the end.
Bounded Range: The learning rate always remains within $$[\eta_{\text{min}}, \eta_{\text{max}}]$$.
Natural Convergence: The slow decay near the minimum allows the model to fine-tune and settle into a good local minimum.
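As a quick sanity check, the formula can be evaluated directly across a full cycle (a minimal sketch; the function name and default values are illustrative, matching the examples below):

```python
import math

def cosine_lr(epoch, eta_max=0.1, eta_min=0.001, T_max=10):
    """eta_t = eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * t / T_max))."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * epoch / T_max))

# The decay is slow near both ends of the cycle and fastest in the middle.
for epoch in range(11):
    print(f"epoch {epoch:2d}: lr = {cosine_lr(epoch):.6f}")
```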
Implement a Python class CosineDecayScheduler that schedules the learning rate using cosine annealing:
- The __init__ method should accept initial_lr (float), T_max (int), and min_lr (float) as parameters
- The get_lr(self, epoch: int) method should return the learning rate for the given epoch
- Use the math module for trigonometric functions

Example 1:
Input:
initial_lr = 0.1
T_max = 10
min_lr = 0.001
epoch = 0

Output: 0.1

Explanation: At epoch 0 (the starting point), the cosine term equals cos(0) = 1, so the formula yields:
η = 0.001 + 0.5 × (0.1 - 0.001) × (1 + 1) = 0.001 + 0.5 × 0.099 × 2 = 0.001 + 0.099 = 0.1
The learning rate starts at the initial value of 0.1.
Example 2:
Input:
initial_lr = 0.1
T_max = 10
min_lr = 0.001
epoch = 5

Output: 0.0505

Explanation: At the halfway point (epoch 5 of 10), the cosine term equals cos(π/2) = 0, yielding:
η = 0.001 + 0.5 × (0.1 - 0.001) × (1 + 0) = 0.001 + 0.0495 = 0.0505
The learning rate is exactly at the midpoint between the initial and minimum values.
Example 3:
Input:
initial_lr = 0.1
T_max = 10
min_lr = 0.001
epoch = 10

Output: 0.001

Explanation: At T_max (epoch 10), the cosine term equals cos(π) = -1, giving:
η = 0.001 + 0.5 × (0.1 - 0.001) × (1 + (-1)) = 0.001 + 0.5 × 0.099 × 0 = 0.001
The learning rate reaches its minimum value of 0.001, completing one full cosine decay cycle.
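The behavior traced through the examples above could be implemented along these lines (a sketch of one possible solution, not the only valid one):

```python
import math

class CosineDecayScheduler:
    """Cosine annealing learning rate scheduler."""

    def __init__(self, initial_lr: float, T_max: int, min_lr: float):
        self.initial_lr = initial_lr  # eta_max, the starting learning rate
        self.T_max = T_max            # length of the decay cycle in epochs
        self.min_lr = min_lr          # eta_min, the floor of the schedule

    def get_lr(self, epoch: int) -> float:
        # eta_t = eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * t / T_max))
        cosine = math.cos(math.pi * epoch / self.T_max)
        return self.min_lr + 0.5 * (self.initial_lr - self.min_lr) * (1 + cosine)

scheduler = CosineDecayScheduler(initial_lr=0.1, T_max=10, min_lr=0.001)
print(scheduler.get_lr(0))   # starts at initial_lr
print(scheduler.get_lr(5))   # midpoint between initial_lr and min_lr
print(scheduler.get_lr(10))  # reaches min_lr at the end of the cycle
```

Because the schedule is a pure function of the epoch, get_lr carries no mutable state and can be queried for any epoch in any order.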
Constraints