In computer vision and video analysis, quantifying the accuracy of motion estimation algorithms is essential for evaluating performance. The End-Point Error (EPE) is a widely adopted metric that measures the Euclidean distance between predicted and actual displacement vectors at each pixel location.
Given a predicted motion vector field pred and a ground-truth motion vector field gt, both represented as 3D arrays with shape (H, W, 2) where H is the height, W is the width, and the third dimension contains the horizontal and vertical displacement components (dx, dy), your task is to compute the mean EPE across the entire field.
End-Point Error Definition:
For each pixel at position (i, j), the EPE is calculated as the Euclidean distance between the predicted and ground-truth vectors:
$$EPE_{i,j} = \sqrt{(pred_{i,j,0} - gt_{i,j,0})^2 + (pred_{i,j,1} - gt_{i,j,1})^2}$$
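The per-pixel formula can be sketched in plain Python as follows. This is a minimal illustration, assuming the fields are nested lists as in the examples; the helper name `epe_map` is hypothetical and not part of the problem statement.

```python
import math

def epe_map(pred, gt):
    """Return an H x W nested list of per-pixel end-point errors.

    Hypothetical helper for illustration: pred and gt are assumed to be
    nested lists of shape (H, W, 2) holding (dx, dy) per pixel.
    """
    return [
        # math.hypot(a, b) computes sqrt(a*a + b*b), i.e. the Euclidean norm
        [math.hypot(p[0] - g[0], p[1] - g[1]) for p, g in zip(pred_row, gt_row)]
        for pred_row, gt_row in zip(pred, gt)
    ]

pred = [[[3, 4], [1, 0]], [[0, 1], [5, 12]]]
gt = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]
print(epe_map(pred, gt))  # [[5.0, 1.0], [1.0, 13.0]]
```

Averaging every entry of this map gives the unmasked mean EPE.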
Optional Features:
Validity Masking: An optional binary mask with shape (H, W) can specify which pixels are valid for evaluation. Pixels where the mask value is 0 (or equivalently falsy) should be excluded from the calculation. Only pixels where the mask value is 1 (or truthy) contribute to the mean EPE.
Outlier Clipping: An optional max_flow parameter can be used to clip individual EPE values. If a pixel's EPE exceeds max_flow, it should be clamped to max_flow before averaging.
Invalid Value Handling: Your implementation must gracefully handle invalid numeric values such as NaN or ±Infinity in the input arrays. Any pixel whose predicted or ground-truth vector contains such a value should be excluded from the calculation, just like a masked-out pixel.
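One way the three optional features could fit together is sketched below. It is not a reference solution: the function name `mean_epe` and its signature are assumptions, and so is the choice to return 0.0 when no valid pixel remains, since the text above does not specify that case.

```python
import math

def mean_epe(pred, gt, mask=None, max_flow=None):
    """Mean end-point error over valid pixels (illustrative sketch).

    Assumptions not stated in the problem text: nested-list inputs of
    shape (H, W, 2), and a return value of 0.0 when every pixel is
    masked out or invalid.
    """
    total = 0.0
    count = 0
    for i, (pred_row, gt_row) in enumerate(zip(pred, gt)):
        for j, (p, g) in enumerate(zip(pred_row, gt_row)):
            # Validity masking: skip pixels whose mask value is falsy.
            if mask is not None and not mask[i][j]:
                continue
            # Invalid value handling: skip pixels containing NaN or ±Infinity.
            if not all(math.isfinite(v) for v in (p[0], p[1], g[0], g[1])):
                continue
            err = math.hypot(p[0] - g[0], p[1] - g[1])
            # Outlier clipping: clamp the per-pixel error before averaging.
            if max_flow is not None:
                err = min(err, max_flow)
            total += err
            count += 1
    return total / count if count else 0.0

pred = [[[3, 4], [1, 0]], [[0, 1], [5, 12]]]
gt = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]
mask = [[1, 0], [0, 1]]
print(mean_epe(pred, gt, mask=mask))               # 9.0
print(mean_epe(pred, gt, mask=mask, max_flow=10))  # 7.5
```

Filtering happens before clipping here, so an excluded pixel never contributes to either the sum or the count.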
Your Task:
Write a Python function that computes the mean End-Point Error between predicted and ground-truth motion vector fields. The function should support the optional validity mask, max_flow clipping, and invalid-value handling described above.
Example:
pred = [[[1, 0], [0, 1]], [[-1, 0], [0, -1]]]
gt = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]
Output: 1.0
The motion field has 4 pixels (2×2). At each pixel, the predicted vector differs from the ground-truth by exactly 1 unit in one direction:
• Pixel (0,0): pred = (1, 0), gt = (0, 0) → EPE = √(1² + 0²) = 1.0
• Pixel (0,1): pred = (0, 1), gt = (0, 0) → EPE = √(0² + 1²) = 1.0
• Pixel (1,0): pred = (-1, 0), gt = (0, 0) → EPE = √(1² + 0²) = 1.0
• Pixel (1,1): pred = (0, -1), gt = (0, 0) → EPE = √(0² + 1²) = 1.0
Mean EPE = (1.0 + 1.0 + 1.0 + 1.0) / 4 = 1.0
Example:
pred = [[[2, 0], [0, 0]], [[0, 0], [0, 2]]]
gt = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]
Output: 1.0
Computing the EPE for each pixel in this 2×2 motion field:
• Pixel (0,0): pred = (2, 0), gt = (0, 0) → EPE = √(2² + 0²) = 2.0
• Pixel (0,1): pred = (0, 0), gt = (0, 0) → EPE = √(0² + 0²) = 0.0
• Pixel (1,0): pred = (0, 0), gt = (0, 0) → EPE = √(0² + 0²) = 0.0
• Pixel (1,1): pred = (0, 2), gt = (0, 0) → EPE = √(0² + 2²) = 2.0
Mean EPE = (2.0 + 0.0 + 0.0 + 2.0) / 4 = 1.0
Example:
pred = [[[3, 4], [1, 0]], [[0, 1], [5, 12]]]
gt = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]
mask = [[1, 0], [0, 1]]
Output: 9.0
With a validity mask applied, only pixels where mask = 1 contribute to the calculation:
• Pixel (0,0): mask = 1, pred = (3, 4), gt = (0, 0) → EPE = √(3² + 4²) = √25 = 5.0 ✓
• Pixel (0,1): mask = 0 → excluded from calculation
• Pixel (1,0): mask = 0 → excluded from calculation
• Pixel (1,1): mask = 1, pred = (5, 12), gt = (0, 0) → EPE = √(5² + 12²) = √169 = 13.0 ✓
Mean EPE (valid pixels only) = (5.0 + 13.0) / 2 = 9.0
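The masked average above can be reproduced with a few lines of plain Python. This is only a sketch of the arithmetic for this one example; the variable names `valid_errors` and `masked_mean` are illustrative.

```python
import math

pred = [[[3, 4], [1, 0]], [[0, 1], [5, 12]]]
gt = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]
mask = [[1, 0], [0, 1]]

# Collect the EPE of every pixel whose mask value is truthy.
valid_errors = [
    math.hypot(p[0] - g[0], p[1] - g[1])
    for pred_row, gt_row, mask_row in zip(pred, gt, mask)
    for p, g, m in zip(pred_row, gt_row, mask_row)
    if m
]

# Divide by the number of valid pixels, not by H * W.
masked_mean = sum(valid_errors) / len(valid_errors)
print(valid_errors, masked_mean)  # [5.0, 13.0] 9.0
```

Dividing by the count of valid pixels rather than the full 2×2 size is what yields 9.0 instead of 4.5.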
Constraints