In deep neural networks, residual learning with skip connections has revolutionized how we train very deep architectures. The core insight behind residual networks (ResNets) is that instead of learning a direct mapping from input to output, the network learns a residual function that represents the difference between the desired output and the input.
A residual unit consists of:
1. A first linear transformation (weight matrix w1)
2. A ReLU activation
3. A second linear transformation (weight matrix w2)
4. A skip connection that adds the original input back in
5. A final ReLU activation
The mathematical formulation is:
$$\mathbf{y} = \text{ReLU}(F(\mathbf{x}, \{W_i\}) + \mathbf{x})$$
Where:
- $\mathbf{x}$ is the input to the residual unit
- $F(\mathbf{x}, \{W_i\})$ is the residual function with weights $\{W_i\}$ (here, two linear transformations with a ReLU between them)
- $\mathbf{y}$ is the output after adding the skip connection and applying the final ReLU
Why Skip Connections Matter: The identity path lets gradients flow directly to earlier layers, which mitigates the vanishing-gradient problem in very deep networks. It also makes it easy for a unit to learn an identity mapping: if the residual function outputs zero, the input simply passes through unchanged.
Your Task: Implement a function that computes a simple residual unit using NumPy. The unit should:
1. Apply a first linear transformation: h1 = w1 @ x
2. Apply a ReLU activation: a1 = ReLU(h1)
3. Apply a second linear transformation: h2 = w2 @ a1
4. Add the skip connection: sum = h2 + x
5. Apply a final ReLU to produce the output
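The steps above can be sketched in NumPy as follows (the function name `residual_unit` is illustrative, not a required signature):

```python
import numpy as np

def residual_unit(x, w1, w2):
    """Compute ReLU(w2 @ ReLU(w1 @ x) + x) for a 1-D input vector x."""
    x = np.asarray(x, dtype=float)
    h1 = np.asarray(w1, dtype=float) @ x   # step 1: first linear transformation
    a1 = np.maximum(h1, 0.0)               # step 2: first ReLU
    h2 = np.asarray(w2, dtype=float) @ a1  # step 3: second linear transformation
    s = h2 + x                             # step 4: skip connection adds the input
    return np.maximum(s, 0.0)              # step 5: final ReLU

print(residual_unit([1.0, 2.0],
                    [[1.0, 0.0], [0.0, 1.0]],
                    [[0.5, 0.0], [0.0, 0.5]]))  # [1.5 3. ]
```

Note that the skip connection requires w2 @ a1 to have the same shape as x, which holds here because both weight matrices are square.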
x = [1.0, 2.0]
w1 = [[1.0, 0.0], [0.0, 1.0]]
w2 = [[0.5, 0.0], [0.0, 0.5]]

Expected output: [1.5, 3.0]

Let's trace through the residual unit step by step:
Step 1: First linear transformation h₁ = w1 @ x = [[1.0, 0.0], [0.0, 1.0]] @ [1.0, 2.0] = [1.0, 2.0]
Step 2: First ReLU activation a₁ = ReLU(h₁) = ReLU([1.0, 2.0]) = [1.0, 2.0] (All values are positive, so ReLU keeps them unchanged)
Step 3: Second linear transformation h₂ = w2 @ a₁ = [[0.5, 0.0], [0.0, 0.5]] @ [1.0, 2.0] = [0.5, 1.0]
Step 4: Skip connection (add original input) sum = h₂ + x = [0.5, 1.0] + [1.0, 2.0] = [1.5, 3.0]
Step 5: Final ReLU activation output = ReLU([1.5, 3.0]) = [1.5, 3.0]
The final output is [1.5, 3.0].
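The trace above can be checked directly with NumPy:

```python
import numpy as np

x = np.array([1.0, 2.0])
w1 = np.array([[1.0, 0.0], [0.0, 1.0]])
w2 = np.array([[0.5, 0.0], [0.0, 0.5]])

h1 = w1 @ x                   # step 1: [1.0, 2.0]
a1 = np.maximum(h1, 0.0)      # step 2: [1.0, 2.0] (all positive, unchanged)
h2 = w2 @ a1                  # step 3: [0.5, 1.0]
s = h2 + x                    # step 4: [1.5, 3.0]
out = np.maximum(s, 0.0)      # step 5: [1.5, 3.0]
print(out)                    # [1.5 3. ]
```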
x = [1.0, 0.0, -1.0]
w1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
w2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

Expected output: [2.0, 0.0, 0.0]

With identity weight matrices and a mixed-sign input:
Step 1: First linear transformation h₁ = w1 @ x = I₃ @ [1.0, 0.0, -1.0] = [1.0, 0.0, -1.0] (Identity matrix preserves the input)
Step 2: First ReLU activation a₁ = ReLU([1.0, 0.0, -1.0]) = [1.0, 0.0, 0.0] (ReLU zeros out the negative value -1.0)
Step 3: Second linear transformation h₂ = w2 @ a₁ = I₃ @ [1.0, 0.0, 0.0] = [1.0, 0.0, 0.0]
Step 4: Skip connection sum = h₂ + x = [1.0, 0.0, 0.0] + [1.0, 0.0, -1.0] = [2.0, 0.0, -1.0]
Step 5: Final ReLU activation output = ReLU([2.0, 0.0, -1.0]) = [2.0, 0.0, 0.0]
The final output is [2.0, 0.0, 0.0], demonstrating how ReLU clips negative values at multiple stages.
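This example reduces to a compact NumPy check, since both weight matrices are the identity:

```python
import numpy as np

x = np.array([1.0, 0.0, -1.0])
I3 = np.eye(3)                      # w1 and w2 are both identity matrices

a1 = np.maximum(I3 @ x, 0.0)        # [1.0, 0.0, 0.0] -- the -1.0 is clipped
h2 = I3 @ a1                        # [1.0, 0.0, 0.0]
out = np.maximum(h2 + x, 0.0)       # ReLU([2.0, 0.0, -1.0]) = [2.0, 0.0, 0.0]
print(out)                          # [2. 0. 0.]
```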
x = [2.0, -3.0]
w1 = [[1.0, 0.0], [0.0, 1.0]]
w2 = [[0.5, 0.0], [0.0, 0.5]]

Expected output: [3.0, 0.0]

Starting with a partially negative input:
Step 1: First linear transformation h₁ = w1 @ x = [[1.0, 0.0], [0.0, 1.0]] @ [2.0, -3.0] = [2.0, -3.0]
Step 2: First ReLU activation a₁ = ReLU([2.0, -3.0]) = [2.0, 0.0] (The negative value -3.0 is clipped to 0)
Step 3: Second linear transformation h₂ = w2 @ a₁ = [[0.5, 0.0], [0.0, 0.5]] @ [2.0, 0.0] = [1.0, 0.0]
Step 4: Skip connection sum = h₂ + x = [1.0, 0.0] + [2.0, -3.0] = [3.0, -3.0]
Step 5: Final ReLU activation output = ReLU([3.0, -3.0]) = [3.0, 0.0]
The final output is [3.0, 0.0]. The second element becomes 0 because, although the skip connection adds back the original -3.0, the final ReLU clips the negative sum to 0.
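The same trace in NumPy, writing the diagonal weight matrices via `np.eye` for brevity:

```python
import numpy as np

x = np.array([2.0, -3.0])
w1 = np.eye(2)                       # [[1.0, 0.0], [0.0, 1.0]]
w2 = 0.5 * np.eye(2)                 # [[0.5, 0.0], [0.0, 0.5]]

a1 = np.maximum(w1 @ x, 0.0)         # [2.0, 0.0] -- the -3.0 is clipped
s = w2 @ a1 + x                      # [1.0, 0.0] + [2.0, -3.0] = [3.0, -3.0]
out = np.maximum(s, 0.0)             # [3.0, 0.0]
print(out)                           # [3. 0.]
```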
Constraints