During the training of deep neural networks, gradient magnitudes can become excessively large, leading to unstable weight updates and potential divergence of the optimization process. This phenomenon, commonly known as exploding gradients, is particularly prevalent in deep architectures, recurrent neural networks, and models with long dependency chains.
To combat this issue, practitioners employ bounded gradient constraints — a technique that restricts each gradient element to lie within a predetermined symmetric interval. This ensures that no single gradient component can dominate the update step, maintaining stable and controlled learning dynamics.
The Bounding Mechanism:
Given an array of gradients G and a bounding threshold τ (clip_value), each element gᵢ is transformed according to the following rule:
$$g'_i = \begin{cases} \tau & \text{if } g_i > \tau \\ -\tau & \text{if } g_i < -\tau \\ g_i & \text{otherwise} \end{cases}$$
This element-wise operation ensures that all gradient values fall within the closed interval [-τ, τ], effectively capping extreme values while preserving gradients that are already within acceptable bounds.
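The piecewise rule above can be sketched as a small pure-Python function (a minimal sketch; the name `clip_by_value` is our own, not prescribed by the problem):

```python
def clip_by_value(gradients, clip_value):
    """Bound each gradient element to the closed interval [-clip_value, clip_value]."""
    clipped = []
    for g in gradients:
        if g > clip_value:        # above the upper bound: cap at +tau
            clipped.append(clip_value)
        elif g < -clip_value:     # below the lower bound: cap at -tau
            clipped.append(-clip_value)
        else:                     # already within bounds: keep as-is
            clipped.append(g)
    return clipped
```

Each branch corresponds directly to one case of the formula, so the loop body is a literal transcription of the bounding rule.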
Your Task:
Implement a function that applies bounded gradient constraints to an input array. The function should:
• Accept a gradient array and a bounding threshold clip_value (τ)
• Replace every element greater than τ with τ, and every element less than -τ with -τ
• Leave all other elements unchanged and preserve the shape of the input
Example 1:
Input: gradients = [1.0, -2.0, 3.0, -0.5, 0.2], clip_value = 1.5
Output: [1.0, -1.5, 1.5, -0.5, 0.2]

Applying the bounding constraint with τ = 1.5:
• 1.0 is within [-1.5, 1.5] → remains 1.0
• -2.0 is below -1.5 → bounded to -1.5
• 3.0 is above 1.5 → bounded to 1.5
• -0.5 is within [-1.5, 1.5] → remains -0.5
• 0.2 is within [-1.5, 1.5] → remains 0.2

The resulting constrained gradient array is [1.0, -1.5, 1.5, -0.5, 0.2].
Example 2:
Input: gradients = [0.5, -0.3, 0.8, -0.1, 0.0], clip_value = 1
Output: [0.5, -0.3, 0.8, -0.1, 0.0]

All gradient values are already within the range [-1, 1], so no bounding is applied and the output matches the input exactly:
• 0.5 ∈ [-1, 1] ✓
• -0.3 ∈ [-1, 1] ✓
• 0.8 ∈ [-1, 1] ✓
• -0.1 ∈ [-1, 1] ✓
• 0.0 ∈ [-1, 1] ✓

This demonstrates that the bounded gradient constraint is a no-op when gradients are well-behaved.
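The no-op behavior can be verified directly: when every element already lies within [-τ, τ], clipping returns the input unchanged (sketch assuming NumPy):

```python
import numpy as np

g = np.array([0.5, -0.3, 0.8, -0.1, 0.0])
# With tau = 1, no element exceeds the bounds, so nothing changes
assert np.array_equal(np.clip(g, -1.0, 1.0), g)
```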
Example 3:
Input: gradients = [[1.5, -2.5, 0.5], [-1.0, 3.0, -0.2]], clip_value = 2
Output: [[1.5, -2.0, 0.5], [-1.0, 2.0, -0.2]]

For a 2D gradient matrix with τ = 2, the bounding is applied element-wise across all dimensions:

Row 1:
• 1.5 is within [-2, 2] → remains 1.5
• -2.5 is below -2 → bounded to -2.0
• 0.5 is within [-2, 2] → remains 0.5

Row 2:
• -1.0 is within [-2, 2] → remains -1.0
• 3.0 is above 2 → bounded to 2.0
• -0.2 is within [-2, 2] → remains -0.2

The function preserves the original 2×3 shape while constraining extreme values.
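The shape-preserving, element-wise behavior on a 2D matrix can be checked the same way, since `np.clip` operates on arrays of any dimensionality (sketch assuming NumPy):

```python
import numpy as np

G = np.array([[1.5, -2.5, 0.5],
              [-1.0, 3.0, -0.2]])
# Clip element-wise into [-2, 2]; the 2x3 shape is unchanged
clipped = np.clip(G, -2.0, 2.0)
# → array([[ 1.5, -2. ,  0.5],
#          [-1. ,  2. , -0.2]])
assert clipped.shape == G.shape
```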
Constraints: