At the heart of every modern deep learning framework lies an automatic differentiation engine (commonly called autograd). This powerful mechanism automatically computes gradients of complex mathematical expressions—enabling neural networks to learn through backpropagation without requiring engineers to manually derive and implement gradient formulas.
In this problem, you will build a simplified version of such an engine by implementing a Value class that wraps scalar values and tracks the operations performed on them. Your implementation must support three core operations:
When two Value objects are added, the result is a new Value containing their sum. During backpropagation, the gradient flows equally to both operands:
$$\frac{\partial (a + b)}{\partial a} = 1, \quad \frac{\partial (a + b)}{\partial b} = 1$$
When two Value objects are multiplied, the result is their product. The gradients follow the product rule:
$$\frac{\partial (a \cdot b)}{\partial a} = b, \quad \frac{\partial (a \cdot b)}{\partial b} = a$$
The Rectified Linear Unit activation function returns the input if it's positive, or zero otherwise:
$$\text{ReLU}(x) = \max(0, x)$$
The gradient of ReLU is:
$$\frac{\partial \text{ReLU}(x)}{\partial x} = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x \leq 0 \end{cases}$$
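The forward and backward rules for ReLU can be sketched as plain helper functions (a minimal illustration; the function names are ours, not part of the required interface):

```python
def relu(x):
    """Forward pass: max(0, x)."""
    return x if x > 0 else 0.0

def relu_grad(x):
    """Local derivative of ReLU with respect to its input."""
    return 1.0 if x > 0 else 0.0

print(relu(-28))       # 0.0
print(relu_grad(-28))  # 0.0
print(relu_grad(3))    # 1.0
```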
Your Value class must implement a backward() method that performs backpropagation from the output node back through the entire computational graph, visiting nodes in reverse topological order and applying the chain rule at each one.
The Chain Rule:
If $y = f(g(x))$, then
$$\frac{dy}{dx} = \frac{dy}{dg} \cdot \frac{dg}{dx}$$
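A concrete numeric check of the chain rule (the example values here are ours): take $y = \text{ReLU}(a \cdot b)$ at $a = 3$, $b = 4$, and multiply the local derivatives:

```python
# Chain rule for y = ReLU(g) with g = a * b, at a = 3, b = 4.
a, b = 3.0, 4.0
g = a * b                      # inner function: g = a*b = 12
y = max(0.0, g)                # outer function: y = ReLU(g)

dy_dg = 1.0 if g > 0 else 0.0  # local derivative of ReLU (g = 12 > 0, so 1)
dg_da = b                      # product rule: d(a*b)/da = b
dy_da = dy_dg * dg_da          # chain rule: dy/da = dy/dg * dg/da
print(dy_da)                   # 4.0
```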
Your Task:
Implement the Value class with __init__, __add__, __mul__, relu, and backward methods. After calling backward() on the final output, each Value object's grad attribute should contain the gradient of the output with respect to that value.
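One possible shape for such a class is sketched below. This is not the only valid implementation, and internal names like `_backward` and `_prev` are our own choices rather than part of the required interface:

```python
class Value:
    """Scalar value that records the operations that produced it."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None  # closes over the local gradient rule
        self._prev = set(_children)    # parent nodes in the graph

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            self.grad += out.grad   # d(a+b)/da = 1
            other.grad += out.grad  # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad  # d(a*b)/db = a
        out._backward = _backward
        return out

    def relu(self):
        out = Value(self.data if self.data > 0 else 0.0, (self,))

        def _backward():
            self.grad += (1.0 if self.data > 0 else 0.0) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply local rules in reverse.
        topo, visited = [], set()

        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)

        self.grad = 1.0  # seed: d(output)/d(output) = 1
        for v in reversed(topo):
            v._backward()

a, b = Value(2.0), Value(5.0)
out = a * b
out.backward()
print(a.grad, b.grad)  # 5.0 2.0
```

Note that gradients are accumulated with `+=` rather than assigned, so a value used in several places in the graph collects contributions from every path.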
values = {'a': 2, 'b': -3, 'c': 10}
expression = "(a + b * c).relu()"

Expected output: {'a': {'data': 2, 'grad': 0.0}, 'b': {'data': -3, 'grad': 0.0}, 'c': {'data': 10, 'grad': 0.0}}

Let's trace through the computation:
Forward Pass:
• b * c = (-3) × 10 = -30
• a + (b * c) = 2 + (-30) = -28
• ReLU(-28) = max(0, -28) = 0
Backward Pass: Starting with output gradient = 1.0:
• The ReLU input is -28 ≤ 0, so the local ReLU gradient is 0
• This means zero gradient flows backward to all inputs
• Therefore: grad(a) = 0, grad(b) = 0, grad(c) = 0
The ReLU "kills" the gradient because the input was negative, blocking all gradient flow to the upstream variables.
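The zero gradients can be verified numerically with a finite-difference check, with no Value class needed (`eps` is an arbitrary small step we chose):

```python
def f(a, b, c):
    # The example expression (a + b*c).relu(), as a plain function.
    return max(0.0, a + b * c)

eps = 1e-6
base = f(2, -3, 10)  # 0.0, since a + b*c = -28
grad_a = (f(2 + eps, -3, 10) - base) / eps
grad_b = (f(2, -3 + eps, 10) - base) / eps
grad_c = (f(2, -3, 10 + eps) - base) / eps
print(grad_a, grad_b, grad_c)  # 0.0 0.0 0.0
```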
values = {'a': 3, 'b': 4}
expression = "a + b"

Expected output: {'a': {'data': 3, 'grad': 1.0}, 'b': {'data': 4, 'grad': 1.0}}

Forward Pass:
• a + b = 3 + 4 = 7
Backward Pass: Starting with output gradient = 1.0:
• For addition: ∂(a+b)/∂a = 1 and ∂(a+b)/∂b = 1
• Applying the chain rule: grad(a) = 1.0 × 1 = 1.0, grad(b) = 1.0 × 1 = 1.0
Both inputs contribute equally to the sum, so they receive equal gradients.
values = {'a': 2, 'b': 5}
expression = "a * b"

Expected output: {'a': {'data': 2, 'grad': 5.0}, 'b': {'data': 5, 'grad': 2.0}}

Forward Pass:
• a * b = 2 × 5 = 10
Backward Pass: Starting with output gradient = 1.0:
• For multiplication: ∂(ab)/∂a = b = 5 and ∂(ab)/∂b = a = 2
• Applying the chain rule: grad(a) = 1.0 × 5 = 5.0, grad(b) = 1.0 × 2 = 2.0
The gradient of each variable in a product equals the value of the other variable. This reflects how much the output changes when each input changes by a small amount.
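This "small change" interpretation can be confirmed with a finite-difference approximation (again with an `eps` step of our choosing):

```python
def f(a, b):
    return a * b

eps = 1e-6
da = (f(2 + eps, 5) - f(2, 5)) / eps  # ≈ 5, matching grad(a) = b
db = (f(2, 5 + eps) - f(2, 5)) / eps  # ≈ 2, matching grad(b) = a
print(round(da, 3), round(db, 3))  # 5.0 2.0
```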
Constraints