In modern deep learning, training efficiency is paramount. Large-scale neural networks can have billions of parameters, making memory bandwidth and computational throughput critical bottlenecks. Hybrid Precision Computation is a powerful technique that strategically combines different numerical precisions—using lower precision (float16 or bfloat16) for speed-critical operations while maintaining higher precision (float32) where numerical stability is essential.
The core insight is that neural network forward passes are robust to reduced precision because the values involved are typically within a reasonable range, and small rounding errors don't significantly affect the outcome. However, gradients during backpropagation can be very small (especially in deep networks), risking underflow to zero in reduced precision formats.
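This forward-pass robustness is easy to check directly. Here is a small NumPy experiment (the shapes and random seed are arbitrary choices for illustration) comparing the same matrix multiply in float32 and float16:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)
w = rng.standard_normal((8, 3)).astype(np.float32)

# The same matmul in full and half precision.
full = x @ w
half = (x.astype(np.float16) @ w.astype(np.float16)).astype(np.float32)

# The relative error from running the forward pass in float16 stays small.
rel_err = np.linalg.norm(full - half) / np.linalg.norm(full)
print(rel_err)  # typically on the order of 1e-4 to 1e-3
```

Gradients enjoy no such guarantee: their magnitudes can sit far below float16's representable range, which is the problem loss scaling solves.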
Loss Scaling addresses this challenge elegantly:
1. Multiply the loss by a large constant (the loss scale) before backpropagation; by the chain rule, every gradient is scaled by the same factor.
2. Run the backward pass with these scaled gradients, which are now large enough to stay representable in float16.
3. Divide the gradients by the loss scale in float32 before applying the weight update.
This approach preserves representable gradient magnitudes in float16 range while keeping the final gradient values accurate in float32.
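The scale-then-unscale round trip can be demonstrated with a gradient value small enough to underflow float16 (the magnitudes below are illustrative, not from any particular model):

```python
import numpy as np

scale = np.float32(1024.0)
true_grad = np.float32(1e-8)  # below float16's smallest subnormal (~6e-8)

naive = np.float16(true_grad)           # flushes to 0.0: the gradient is lost
scaled = np.float16(true_grad * scale)  # ~1.02e-5, representable in float16
recovered = np.float32(scaled) / scale  # unscale in float32: ~1e-8 again
print(float(naive), float(recovered))
```

Without scaling the gradient silently becomes zero; with it, the float32 unscaling step recovers a value very close to the original.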
Your Task:
Implement a HybridPrecisionTrainer class in Python using NumPy that performs hybrid precision training:
__init__(self, loss_scale=1024.0): Initialize with a loss scaling factor
forward(self, weights, inputs, targets): Cast weights and inputs to float16, compute the predictions and the loss, and return the loss multiplied by loss_scale
backward(self, gradients): Divide the scaled gradients by loss_scale in float32 and return the unscaled gradients
Example 1:
weights = [0.5, -0.3]
inputs = [[1.0, 2.0], [3.0, 4.0]]
targets = [1.0, 0.0]
gradients = [512.0, -256.0]
loss_scale = 1024.0
method = "combined"

Output: {"loss": 665.0, "gradients": [0.5, -0.25]}

Explanation:
Forward Pass: predictions are computed from the float16 weights and inputs, and the resulting loss is multiplied by loss_scale = 1024.0 before being returned.
Backward Pass: each scaled gradient is divided by loss_scale = 1024.0 in float32: 512.0 / 1024.0 = 0.5 and -256.0 / 1024.0 = -0.25.
Example 2:
weights = [1.0, 1.0]
inputs = [[1.0, 0.0], [0.0, 1.0]]
targets = [1.0, 1.0]
loss_scale = 1024.0
method = "forward"

Output: 0.0

Explanation:
Forward Pass: the predictions [1.0, 1.0] exactly match the targets, so the loss is zero, and it remains zero after multiplying by loss_scale.
Example 3:
gradients = [1024.0, 2048.0, 512.0]
loss_scale = 1024.0
method = "backward"

Output: [1.0, 2.0, 0.5]

Explanation:
Backward Pass: each scaled gradient is divided by loss_scale = 1024.0 in float32: 1024.0 / 1024.0 = 1.0, 2048.0 / 1024.0 = 2.0, and 512.0 / 1024.0 = 0.5.
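Putting the pieces together, here is a minimal sketch of one possible solution. The backward logic is fully determined by the examples; the exact loss formula is an assumption (a mean-squared-error loss on float16 predictions), so the forward rounding behavior may differ slightly from the reference output in Example 1:

```python
import numpy as np

class HybridPrecisionTrainer:
    def __init__(self, loss_scale=1024.0):
        self.loss_scale = np.float32(loss_scale)

    def forward(self, weights, inputs, targets):
        # Speed-critical math runs in float16.
        w = np.asarray(weights, dtype=np.float16)
        x = np.asarray(inputs, dtype=np.float16)
        t = np.asarray(targets, dtype=np.float16)
        predictions = x @ w
        # Assumed loss: mean squared error, accumulated in float32,
        # then scaled before backpropagation.
        loss = np.mean((predictions - t) ** 2, dtype=np.float32)
        return float(loss * self.loss_scale)

    def backward(self, gradients):
        # Unscale in float32 so small gradient values stay accurate.
        g = np.asarray(gradients, dtype=np.float32)
        return (g / self.loss_scale).tolist()
```

With this sketch, backward([512.0, -256.0]) yields [0.5, -0.25] as in Example 1, and the forward call of Example 2 returns 0.0, since the predictions match the targets exactly.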
Constraints