In neural network training and inference, temperature is a hyperparameter that controls the sharpness of probability distributions. When applied to a softmax function, the temperature τ modifies the output distribution:
$$\text{softmax}_\tau(z_i) = \frac{e^{z_i / \tau}}{\sum_j e^{z_j / \tau}}$$
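As a concrete illustration, the temperature-scaled softmax can be sketched as follows (a minimal sketch; the function name and the use of NumPy are my own choices, not part of the problem):

```python
import numpy as np

def softmax_with_temperature(z, tau=1.0):
    """Softmax over logits z: tau < 1 sharpens the distribution, tau > 1 flattens it."""
    scaled = np.asarray(z, dtype=float) / tau
    scaled -= scaled.max()        # subtract the max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()
```

At a very low temperature nearly all probability mass lands on the largest logit; at a very high temperature the output approaches a uniform distribution.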
Why Dynamic Temperature Scheduling Matters:
Rather than using a fixed temperature throughout training or generation, many state-of-the-art techniques dynamically adjust temperature over time. This approach is used in settings such as knowledge distillation, simulated annealing, and sampling for text generation.
Your Task:
Implement a temperature scheduler that computes the current temperature value at any training step. Support four common scheduling strategies:
1. Linear Decay: $$T(t) = T_0 - (T_0 - T_{min}) \cdot \frac{t}{T_{total}}$$
The temperature decreases at a constant rate from the initial value to the final value.
2. Exponential Decay: $$T(t) = T_0 \cdot \left(\frac{T_{min}}{T_0}\right)^{\frac{t}{T_{total}}}$$
The temperature decreases multiplicatively, with faster initial decay that gradually slows.
3. Cosine Annealing: $$T(t) = T_{min} + \frac{1}{2}(T_0 - T_{min}) \cdot \left(1 + \cos\left(\frac{\pi \cdot t}{T_{total}}\right)\right)$$
A smooth, wave-like decay inspired by learning rate schedules, popular in transformer training.
4. Constant: $$T(t) = T_0$$
The temperature remains unchanged throughout training (baseline for comparison).
Return the computed temperature value rounded to 2 decimal places.
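A minimal sketch of such a scheduler, directly following the four formulas above (function and parameter names are illustrative, not prescribed by the problem):

```python
import math

def temperature_at_step(schedule, initial, final, step, total):
    """Return the temperature at `step`, rounded to 2 decimal places."""
    progress = step / total
    if schedule == 'linear':
        # T(t) = T0 - (T0 - Tmin) * t / T_total
        temp = initial - (initial - final) * progress
    elif schedule == 'exponential':
        # T(t) = T0 * (Tmin / T0)^(t / T_total)
        temp = initial * (final / initial) ** progress
    elif schedule == 'cosine':
        # T(t) = Tmin + 0.5 * (T0 - Tmin) * (1 + cos(pi * t / T_total))
        temp = final + 0.5 * (initial - final) * (1 + math.cos(math.pi * progress))
    elif schedule == 'constant':
        temp = initial
    else:
        raise ValueError(f"unknown schedule: {schedule}")
    return round(temp, 2)
```

For example, `temperature_at_step('linear', 2.0, 0.1, 500, 1000)` evaluates to 1.05, matching the first worked example below.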
schedule = 'linear'
initial = 2.0
step = 500
total = 1000
final = 0.1

Expected output: 1.05

Using linear interpolation:
• Progress = step / total = 500 / 1000 = 0.5
• Temperature = initial - (initial - final) × progress
• Temperature = 2.0 - (2.0 - 0.1) × 0.5
• Temperature = 2.0 - 1.9 × 0.5 = 2.0 - 0.95 = 1.05
At the midpoint of training, the temperature is exactly halfway between the initial (2.0) and final (0.1) values.
schedule = 'exponential'
initial = 1.0
step = 500
total = 1000
final = 0.01

Expected output: 0.1

Using exponential decay:
• Progress = step / total = 500 / 1000 = 0.5
• Decay ratio = final / initial = 0.01 / 1.0 = 0.01
• Temperature = initial × (decay_ratio)^progress
• Temperature = 1.0 × (0.01)^0.5 = 1.0 × 0.1 = 0.1
Exponential decay at the midpoint gives the geometric mean of initial and final values.
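That geometric-mean property is easy to check directly (a standalone snippet; the variable names are illustrative):

```python
import math

initial, final = 1.0, 0.01
midpoint = initial * (final / initial) ** 0.5   # exponential decay at t = total / 2
geometric_mean = math.sqrt(initial * final)     # sqrt(T0 * Tmin)
assert math.isclose(midpoint, geometric_mean)   # both equal 0.1
```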
schedule = 'cosine'
initial = 2.0
step = 500
total = 1000
final = 0.1

Expected output: 1.05

Using cosine annealing:
• Progress = π × step / total = π × 500 / 1000 = π / 2
• cos(π / 2) = 0
• Temperature = final + 0.5 × (initial - final) × (1 + cos(progress))
• Temperature = 0.1 + 0.5 × (2.0 - 0.1) × (1 + 0)
• Temperature = 0.1 + 0.5 × 1.9 × 1 = 0.1 + 0.95 = 1.05
At the midpoint, cosine annealing coincides with linear decay, but the curves differ elsewhere.
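The midpoint coincidence can be verified with a standalone arithmetic check (variable names are illustrative):

```python
import math

initial, final, progress = 2.0, 0.1, 0.5
linear = initial - (initial - final) * progress
cosine = final + 0.5 * (initial - final) * (1 + math.cos(math.pi * progress))
assert math.isclose(linear, cosine)  # both 1.05: cos(pi/2) = 0 halves the range
```

Away from the midpoint the two curves diverge: cosine annealing stays above the linear schedule early in training and below it late, because the cosine flattens near its endpoints.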
Constraints