In production machine learning systems, monitoring inference performance is mission-critical for maintaining service quality, detecting model degradation, and ensuring that service-level agreements (SLAs) are met. Operations teams rely on real-time dashboards that display key performance indicators derived from inference latency measurements.
Given a collection of inference latency measurements (in milliseconds) from a deployed ML model, your task is to compute the following essential monitoring statistics:
1. Throughput (requests per second): The theoretical maximum number of inference requests that can be processed per second, assuming single-threaded sequential processing. This is calculated as:
$$\text{Throughput} = \frac{1000}{\text{Average Latency (ms)}}$$
2. Average Latency: The arithmetic mean of all latency measurements, providing a general sense of typical response time:
$$\text{Average Latency} = \frac{1}{n} \sum_{i=1}^{n} \text{latency}_i$$
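As a quick sketch of these two formulas on sample data (the values are taken from the first example below; variable names are illustrative):

```python
latencies_ms = [10, 20, 30, 40, 50]  # sample measurements in milliseconds

avg_latency_ms = sum(latencies_ms) / len(latencies_ms)  # 150 / 5 = 30.0
throughput_per_sec = 1000 / avg_latency_ms              # 1000 / 30.0 ≈ 33.33
```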
3. Percentile Latencies (p50, p95, p99): Percentiles are crucial for understanding the tail latency distribution—the experience of the slowest requests that often impacts user satisfaction the most.
For percentile calculations, use linear interpolation between adjacent values when the percentile falls between two data points.
Percentile Calculation with Linear Interpolation:
For a sorted array of n latencies and a target percentile p (expressed as a decimal, e.g., 0.95 for p95):

$$\text{rank} = p \times (n - 1), \qquad i = \lfloor \text{rank} \rfloor, \qquad f = \text{rank} - i$$

$$\text{percentile} = \text{data}[i] + f \times (\text{data}[i+1] - \text{data}[i])$$

If f = 0, the rank lands exactly on an index and the percentile is simply data[i] with no interpolation.
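One concrete reading of this procedure as a small Python helper (the name `percentile_interp` is illustrative, not mandated by the problem):

```python
def percentile_interp(sorted_data, p):
    """Linear-interpolation percentile over an already-sorted list.

    p is a decimal (0.95 for p95), following the rank = p * (n - 1)
    convention used throughout this problem.
    """
    rank = p * (len(sorted_data) - 1)
    i = int(rank)   # index of the lower neighbour
    f = rank - i    # fractional distance toward the upper neighbour
    if f == 0 or i + 1 == len(sorted_data):
        return float(sorted_data[i])
    return sorted_data[i] + f * (sorted_data[i + 1] - sorted_data[i])
```

For example, `round(percentile_interp([10, 20, 30, 40, 50], 0.95), 2)` gives `48.0`, matching the worked example below.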
Your Task: Implement a function that takes a list of latency measurements and returns a dictionary containing all computed statistics. If the input list is empty, return an empty dictionary.
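The full task can be sketched as follows. This is a minimal sketch, not a reference solution: the function name is illustrative, and the two-decimal rounding is inferred from the sample outputs (e.g., 33.33 for a raw 33.333…).

```python
def compute_inference_stats(latencies_ms):
    """Return monitoring statistics for a list of latencies in ms.

    Assumptions from the problem statement: throughput = 1000 / average
    latency, percentiles use linear interpolation over rank = p * (n - 1),
    and an empty input yields an empty dictionary.
    """
    if not latencies_ms:
        return {}

    data = sorted(latencies_ms)
    n = len(data)
    avg = sum(data) / n

    def pct(p):
        rank = p * (n - 1)
        i = int(rank)       # lower index
        f = rank - i        # fractional part
        if i + 1 < n:
            return data[i] + f * (data[i + 1] - data[i])
        return float(data[i])

    return {
        "throughput_per_sec": round(1000 / avg, 2),
        "avg_latency_ms": round(avg, 2),
        "p50_ms": round(pct(0.50), 2),
        "p95_ms": round(pct(0.95), 2),
        "p99_ms": round(pct(0.99), 2),
    }
```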
Important Notes:
• Return a dictionary with the keys "throughput_per_sec", "avg_latency_ms", "p50_ms", "p95_ms", and "p99_ms".
• Round each reported value to two decimal places (e.g., a raw throughput of 33.333… is reported as 33.33).
• If the input list is empty, return an empty dictionary.

Example 1:
Input: latencies_ms = [10, 20, 30, 40, 50]
Output: {"throughput_per_sec": 33.33, "avg_latency_ms": 30.0, "p50_ms": 30.0, "p95_ms": 48.0, "p99_ms": 49.6}
Explanation: With 5 latency measurements [10, 20, 30, 40, 50]:
Average Latency: (10 + 20 + 30 + 40 + 50) / 5 = 150 / 5 = 30.0 ms
Throughput: 1000 / 30.0 = 33.33 requests/second
Percentile Calculations (n = 5, so n - 1 = 4):
• p50: rank = 0.50 × 4 = 2.0 → data[2] = 30.0 ms (exact index, no interpolation)
• p95: rank = 0.95 × 4 = 3.8 → i = 3, f = 0.8
Interpolation: data[3] + 0.8 × (data[4] - data[3]) = 40 + 0.8 × 10 = 48.0 ms
• p99: rank = 0.99 × 4 = 3.96 → i = 3, f = 0.96
Interpolation: 40 + 0.96 × (50 - 40) = 40 + 9.6 = 49.6 ms
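These hand calculations can be cross-checked with Python's standard library: `statistics.quantiles` with `method="inclusive"` applies the same rank = p × (n − 1) linear interpolation.

```python
from statistics import quantiles

data = [10, 20, 30, 40, 50]

# n=100 yields the 99 percentile cut points p1..p99;
# method="inclusive" matches the rank = p * (n - 1) convention.
cuts = quantiles(data, n=100, method="inclusive")
p50, p95, p99 = cuts[49], cuts[94], cuts[98]  # 30.0, 48.0, 49.6
```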
Example 2:
Input: latencies_ms = [5.0, 10.0, 15.0]
Output: {"throughput_per_sec": 100.0, "avg_latency_ms": 10.0, "p50_ms": 10.0, "p95_ms": 14.5, "p99_ms": 14.9}
Explanation: With 3 latency measurements [5.0, 10.0, 15.0]:
Average Latency: (5.0 + 10.0 + 15.0) / 3 = 30.0 / 3 = 10.0 ms
Throughput: 1000 / 10.0 = 100.0 requests/second
Percentile Calculations (n = 3, so n - 1 = 2):
• p50: rank = 0.50 × 2 = 1.0 → data[1] = 10.0 ms (exact index, no interpolation)
• p95: rank = 0.95 × 2 = 1.9 → i = 1, f = 0.9
Interpolation: 10.0 + 0.9 × (15.0 - 10.0) = 10.0 + 4.5 = 14.5 ms
• p99: rank = 0.99 × 2 = 1.98 → i = 1, f = 0.98
Interpolation: 10.0 + 0.98 × (15.0 - 10.0) = 10.0 + 4.9 = 14.9 ms
Example 3:
Input: latencies_ms = [25.0, 25.0, 25.0, 25.0]
Output: {"throughput_per_sec": 40.0, "avg_latency_ms": 25.0, "p50_ms": 25.0, "p95_ms": 25.0, "p99_ms": 25.0}
Explanation: With 4 identical latency measurements [25.0, 25.0, 25.0, 25.0]:
Average Latency: Since all values are the same, the average is 25.0 ms
Throughput: 1000 / 25.0 = 40.0 requests/second
Percentile Calculations: When all values are identical, every percentile equals that value (25.0 ms). The interpolation formula still works correctly, since any interpolation between 25.0 and 25.0 yields 25.0.
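A quick numerical check of the constant-value case (a sketch; variable names are illustrative):

```python
data = [25.0, 25.0, 25.0, 25.0]

# p99: rank = 0.99 * 3 = 2.97 → i = 2, f = 0.97, but the neighbours are
# equal, so f * (data[3] - data[2]) is 0.0 and the result is 25.0.
rank = 0.99 * (len(data) - 1)
i = int(rank)
f = rank - i
p99 = data[i] + f * (data[i + 1] - data[i])  # 25.0
```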
This scenario represents highly consistent model performance—a desirable characteristic in production systems where predictable latency is valued.
Constraints