Parameter-Efficient Fine-Tuning with Low-Rank Decomposition
In modern deep learning, large pretrained models contain billions of parameters that are expensive to fully fine-tune for specific downstream tasks. Low-Rank Adaptation (LRA) is an elegant parameter-efficient fine-tuning technique that addresses this challenge by keeping the pretrained weights frozen and learning a low-rank decomposition of the weight updates.
The key insight behind LRA is that the weight updates during fine-tuning often lie in a low-rank subspace, meaning we can represent them efficiently using two smaller matrices instead of a full-sized update matrix. This dramatically reduces the number of trainable parameters while maintaining model quality.
Mathematical Formulation:
For a pretrained weight matrix W of dimensions (d_in × d_out), instead of computing:
$$h = x \cdot W_{fine-tuned}$$
LRA computes:
$$h = x \cdot W + \frac{\alpha}{r} \cdot (x \cdot B \cdot A)$$
Where:
W is the frozen pretrained weight matrix of shape (d_in × d_out); B (shape d_in × r) and A (shape r × d_out) are the trainable low-rank factors; r is the rank of the adaptation, chosen with r ≪ min(d_in, d_out); and α is a constant scaling hyperparameter.
The term α/r normalizes the adaptation to prevent it from dominating the pretrained weights, ensuring stable training and predictable behavior across different rank choices.
Why This Matters:
The product B · A represents a low-rank approximation to what would otherwise be a full (d_in × d_out) update matrix. By choosing a small rank r, the number of trainable parameters reduces from d_in × d_out to (d_in × r) + (r × d_out), which can be orders of magnitude smaller.
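To make the savings concrete, here is a quick calculation for illustrative dimensions (d_in = d_out = 4096 and r = 8 are assumed values for the sake of the example, not taken from this problem):

```python
d_in, d_out, r = 4096, 4096, 8  # illustrative sizes, not from the problem

full_update = d_in * d_out       # parameters in a dense update matrix
low_rank = d_in * r + r * d_out  # parameters in B (d_in x r) plus A (r x d_out)

print(full_update)             # 16777216
print(low_rank)                # 65536
print(full_update // low_rank) # 256
```

At these sizes the low-rank factorization trains 256× fewer parameters than a full update matrix.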
Your Task:
Implement the forward pass of a Low-Rank Adaptation layer. Given an input matrix x, frozen pretrained weights W, low-rank matrices B and A, and a scaling factor α, compute the output by combining the frozen path with the scaled low-rank adaptation path. The rank r should be inferred from the dimensions of matrices B and A (specifically, the number of columns in B or equivalently the number of rows in A).
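One possible sketch of such a forward pass in NumPy (the function name `lra_forward` and the choice of NumPy are assumptions of this sketch, not requirements of the task):

```python
import numpy as np

def lra_forward(x, W, B, A, alpha):
    """Forward pass of a low-rank adaptation layer.

    x: (batch, d_in) input
    W: (d_in, d_out) frozen pretrained weights
    B: (d_in, r) and A: (r, d_out) trainable low-rank factors
    alpha: scaling hyperparameter; the adaptation is scaled by alpha / r
    """
    x, W, B, A = (np.asarray(m, dtype=float) for m in (x, W, B, A))
    r = B.shape[1]          # rank inferred from the number of columns of B
    frozen = x @ W          # frozen pretrained path
    adapted = (x @ B) @ A   # low-rank path, multiplied left to right
    return frozen + (alpha / r) * adapted
```

Computing `(x @ B)` first keeps the intermediate result at width r rather than materializing the full (d_in × d_out) product B @ A.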
x = [[1.0, 2.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [1.0]]
A = [[0.5, 0.5]]
alpha = 2.0
Expected output: [[4.0, 5.0]]
Step-by-step computation:
Frozen path (x @ W): [[1.0, 2.0]] @ [[1.0, 0.0], [0.0, 1.0]] = [[1.0, 2.0]]
Low-rank path (x @ B @ A): [[1.0, 2.0]] @ [[1.0], [1.0]] = [[3.0]], then [[3.0]] @ [[0.5, 0.5]] = [[1.5, 1.5]]
Determine rank r: B has 1 column, so r = 1
Compute scaling factor: alpha / r = 2.0 / 1 = 2.0
Combine paths: [[1.0, 2.0]] + 2.0 × [[1.5, 1.5]] = [[1.0, 2.0]] + [[3.0, 3.0]] = [[4.0, 5.0]]
x = [[1.0, 0.0], [0.0, 1.0]]
W = [[2.0, 1.0], [1.0, 2.0]]
B = [[0.5, 0.5], [0.5, 0.5]]
A = [[1.0, 0.0], [0.0, 1.0]]
alpha = 1.0
Expected output: [[2.25, 1.25], [1.25, 2.25]]
Step-by-step computation for batch input:
Frozen path (x @ W): since x is the identity, x @ W = W = [[2.0, 1.0], [1.0, 2.0]]
Low-rank path (x @ B @ A): x @ B = B = [[0.5, 0.5], [0.5, 0.5]], and multiplying by the identity A leaves it unchanged: [[0.5, 0.5], [0.5, 0.5]]
Determine rank r: B has 2 columns, so r = 2
Compute scaling factor: alpha / r = 1.0 / 2 = 0.5
Combine paths: [[2.0, 1.0], [1.0, 2.0]] + 0.5 × [[0.5, 0.5], [0.5, 0.5]] = [[2.25, 1.25], [1.25, 2.25]]
x = [[2.0]]
W = [[3.0]]
B = [[1.0]]
A = [[1.0]]
alpha = 4.0
Expected output: [[14.0]]
Minimal 1D example:
Frozen path: x @ W = [[2.0]] @ [[3.0]] = [[6.0]]
Low-rank path: x @ B @ A = [[2.0]] @ [[1.0]] @ [[1.0]] = [[2.0]]
Rank: r = 1 (B has 1 column)
Scaling: alpha / r = 4.0 / 1 = 4.0
Output: [[6.0]] + 4.0 × [[2.0]] = [[6.0]] + [[8.0]] = [[14.0]]
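All three worked examples above can be checked with a few lines of NumPy (a self-contained sketch; the helper name `forward` is my own choice):

```python
import numpy as np

def forward(x, W, B, A, alpha):
    # h = x @ W + (alpha / r) * (x @ B @ A), with r = number of columns of B
    x, W, B, A = (np.asarray(m, dtype=float) for m in (x, W, B, A))
    return x @ W + (alpha / B.shape[1]) * (x @ B @ A)

# Example 1
print(forward([[1.0, 2.0]], [[1.0, 0.0], [0.0, 1.0]],
              [[1.0], [1.0]], [[0.5, 0.5]], 2.0))        # [[4. 5.]]

# Example 2 (batch input)
print(forward([[1.0, 0.0], [0.0, 1.0]], [[2.0, 1.0], [1.0, 2.0]],
              [[0.5, 0.5], [0.5, 0.5]], [[1.0, 0.0], [0.0, 1.0]], 1.0))
# [[2.25 1.25]
#  [1.25 2.25]]

# Example 3 (minimal 1x1 case)
print(forward([[2.0]], [[3.0]], [[1.0]], [[1.0]], 4.0))  # [[14.]]
```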
This example demonstrates how a larger alpha value amplifies the contribution of the low-rank adaptation, allowing it to significantly modify the output from the frozen pretrained weights.
Constraints