Motion Vector Endpoint Error Metric

MEDIUM · 20 pts

In computer vision and video analysis, quantifying the accuracy of motion estimation algorithms is essential for evaluating performance. The End-Point Error (EPE) is a widely adopted metric that measures the Euclidean distance between predicted and actual displacement vectors at each pixel location.

Given a predicted motion vector field pred and a ground-truth motion vector field gt, both represented as 3D arrays with shape (H, W, 2) where H is the height, W is the width, and the third dimension contains the horizontal and vertical displacement components (dx, dy), your task is to compute the mean EPE across the entire field.

End-Point Error Definition:

For each pixel at position (i, j), the EPE is calculated as the Euclidean distance between the predicted and ground-truth vectors:

$$EPE_{i,j} = \sqrt{(pred_{i,j,0} - gt_{i,j,0})^2 + (pred_{i,j,1} - gt_{i,j,1})^2}$$
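As an illustration of the formula above, the per-pixel EPE can be computed in one vectorized step. This is a minimal sketch that assumes NumPy is available; the helper name `per_pixel_epe` is hypothetical and not part of the problem statement.

```python
import numpy as np

def per_pixel_epe(pred, gt):
    """Return an (H, W) array of Euclidean distances between flow vectors."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    # Taking the norm over the last axis collapses the (dx, dy) difference
    # into a single distance per pixel.
    return np.linalg.norm(pred - gt, axis=-1)

epe = per_pixel_epe([[[3, 4], [1, 0]]], [[[0, 0], [0, 0]]])
print(epe)  # [[5. 1.]]
```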

Optional Features:

Validity Masking: An optional binary mask with shape (H, W) can specify which pixels are valid for evaluation. Pixels where the mask value is 0 (or equivalently falsy) should be excluded from the calculation. Only pixels where the mask value is 1 (or truthy) contribute to the mean EPE.
Outlier Clipping: An optional max_flow parameter can be used to clip individual EPE values. If a pixel's EPE exceeds max_flow, it should be clamped to max_flow before averaging.
Invalid Value Handling: Your implementation must gracefully handle invalid numeric values such as NaN or ±Infinity in the input arrays. Any pixel whose predicted or ground-truth vector contains such a value should be excluded from the calculation, in the same way as masked-out pixels.
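The three optional features can be combined into a single validity array before any averaging takes place. The sketch below is one possible arrangement, assuming NumPy and well-formed inputs (no shape checks); the helper name `filtered_epe_values` is hypothetical, and treating a pixel as invalid when any of its four components is non-finite is an interpretation of the rule above.

```python
import numpy as np

def filtered_epe_values(pred, gt, mask=None, max_flow=None):
    """Return the 1-D array of EPE values that survive masking and filtering."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    # Assumption: a pixel is usable only if every component of both
    # vectors is finite (no NaN, no +/-Infinity).
    valid = np.isfinite(pred).all(axis=-1) & np.isfinite(gt).all(axis=-1)
    if mask is not None:
        # Truthy mask entries keep the pixel, falsy entries drop it.
        valid &= np.asarray(mask).astype(bool)
    # Boolean indexing with an (H, W) array yields an (N, 2) array of vectors.
    epe = np.linalg.norm(pred[valid] - gt[valid], axis=-1)
    if max_flow is not None:
        # Clamp each surviving EPE value to max_flow.
        epe = np.minimum(epe, max_flow)
    return epe

nan = float("nan")
pred = [[[3, 4], [nan, 0]], [[0, 1], [5, 12]]]
gt = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]
values = filtered_epe_values(pred, gt, mask=[[1, 1], [0, 1]], max_flow=10)
print(values, values.mean())  # [ 5. 10.] 7.5
```

Here pixel (0,1) is dropped because of the NaN, pixel (1,0) is dropped by the mask, and the EPE of 13.0 at pixel (1,1) is clamped to 10.0.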

Your Task:

Write a Python function that computes the mean End-Point Error between predicted and ground-truth motion vector fields. The function should:

Accept Python lists or NumPy arrays as input
Support optional validity masking to exclude certain pixels
Support optional outlier clipping via the max_flow parameter
Filter out NaN and ±Infinity values automatically
Return -1 if inputs are malformed, dimensions are incompatible, or no valid pixels remain after all filtering
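The requirements listed above could be met along the following lines. This is a sketch, not a reference solution: the function name `mean_epe` and its signature are assumptions (the statement names only the parameters `pred`, `gt`, `mask`, and `max_flow`), and returning -1 for a non-positive or non-finite `max_flow` is one reading of "malformed input".

```python
import numpy as np

def mean_epe(pred, gt, mask=None, max_flow=None):
    """Mean End-Point Error, or -1 for malformed input or no valid pixels."""
    try:
        pred = np.asarray(pred, dtype=np.float64)
        gt = np.asarray(gt, dtype=np.float64)
    except (TypeError, ValueError):
        return -1  # ragged or non-numeric input
    # Both fields must be (H, W, 2) and identical in shape.
    if pred.ndim != 3 or pred.shape[-1] != 2 or pred.shape != gt.shape:
        return -1
    # Exclude pixels containing NaN or +/-Infinity in either field.
    valid = np.isfinite(pred).all(axis=-1) & np.isfinite(gt).all(axis=-1)
    if mask is not None:
        try:
            mask = np.asarray(mask)
        except (TypeError, ValueError):
            return -1
        if mask.shape != pred.shape[:2]:
            return -1
        valid &= mask.astype(bool)
    if max_flow is not None:
        try:
            max_flow = float(max_flow)
        except (TypeError, ValueError):
            return -1
        # Assumption: an invalid clipping threshold counts as malformed input.
        if not np.isfinite(max_flow) or max_flow <= 0:
            return -1
    if not valid.any():
        return -1
    epe = np.linalg.norm(pred[valid] - gt[valid], axis=-1)
    if max_flow is not None:
        epe = np.minimum(epe, max_flow)
    return float(epe.mean())

print(mean_epe([[[1, 0], [0, 1]], [[-1, 0], [0, -1]]],
               [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]))  # 1.0
```

Validation runs before any arithmetic so that every failure path returns -1 rather than raising, which matches the "return -1" requirement without relying on exception handling around the computation itself.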

Example

Input

pred = [[[1, 0], [0, 1]], [[-1, 0], [0, -1]]]
gt = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]

Output

1.0

Explanation

The motion field has 4 pixels (2×2). At each pixel, the predicted vector differs from the ground-truth by exactly 1 unit in one direction:

• Pixel (0,0): pred = (1, 0), gt = (0, 0) → EPE = √(1² + 0²) = 1.0
• Pixel (0,1): pred = (0, 1), gt = (0, 0) → EPE = √(0² + 1²) = 1.0
• Pixel (1,0): pred = (-1, 0), gt = (0, 0) → EPE = √((-1)² + 0²) = 1.0
• Pixel (1,1): pred = (0, -1), gt = (0, 0) → EPE = √(0² + (-1)²) = 1.0

Mean EPE = (1.0 + 1.0 + 1.0 + 1.0) / 4 = 1.0

Example

Input

pred = [[[2, 0], [0, 0]], [[0, 0], [0, 2]]]
gt = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]

Output

1.0

Explanation

Computing the EPE for each pixel in this 2×2 motion field:

• Pixel (0,0): pred = (2, 0), gt = (0, 0) → EPE = √(2² + 0²) = 2.0
• Pixel (0,1): pred = (0, 0), gt = (0, 0) → EPE = √(0² + 0²) = 0.0
• Pixel (1,0): pred = (0, 0), gt = (0, 0) → EPE = √(0² + 0²) = 0.0
• Pixel (1,1): pred = (0, 2), gt = (0, 0) → EPE = √(0² + 2²) = 2.0

Mean EPE = (2.0 + 0.0 + 0.0 + 2.0) / 4 = 1.0

Example

Input

pred = [[[3, 4], [1, 0]], [[0, 1], [5, 12]]]
gt = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]
mask = [[1, 0], [0, 1]]

Output

9.0

Explanation

With a validity mask applied, only pixels where mask = 1 contribute to the calculation:

• Pixel (0,0): mask = 1, pred = (3, 4), gt = (0, 0) → EPE = √(3² + 4²) = √25 = 5.0 ✓
• Pixel (0,1): mask = 0 → excluded from calculation
• Pixel (1,0): mask = 0 → excluded from calculation
• Pixel (1,1): mask = 1, pred = (5, 12), gt = (0, 0) → EPE = √(5² + 12²) = √169 = 13.0 ✓

Mean EPE (valid pixels only) = (5.0 + 13.0) / 2 = 9.0
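The arithmetic of this masked example can be checked by hand with the standard library alone. The loop below is only a verification sketch for this specific input (it does no validation or NaN filtering), and the variable names `errors` and `mean_epe_value` are illustrative.

```python
import math

pred = [[[3, 4], [1, 0]], [[0, 1], [5, 12]]]
gt = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]
mask = [[1, 0], [0, 1]]

errors = []
for i, row in enumerate(pred):
    for j, (dx, dy) in enumerate(row):
        if mask[i][j]:  # skip pixels the mask marks as invalid
            gx, gy = gt[i][j]
            errors.append(math.hypot(dx - gx, dy - gy))

mean_epe_value = sum(errors) / len(errors)
print(errors, mean_epe_value)  # [5.0, 13.0] 9.0
```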

Constraints

1 ≤ H, W ≤ 1000 (spatial dimensions of the motion field)
The third dimension must always be exactly 2 (representing dx, dy)
-10⁶ ≤ motion vector values ≤ 10⁶
Mask values should be interpretable as binary (0 or 1)
max_flow, if provided, must be a positive finite number
Both pred and gt must have identical shapes
If mask is provided, it must have shape (H, W) matching the spatial dimensions
Input arrays may contain NaN or ±Infinity values which must be handled gracefully
