In production machine learning systems, Service Level Agreement (SLA) monitoring is a critical component of MLOps infrastructure. When deploying ML models as real-time inference endpoints, engineering teams establish performance guarantees that must be continuously tracked and reported.
Your ML inference service handles thousands of prediction requests per minute. Each request is logged with its response latency and final status. As part of the observability pipeline, you need to compute the essential SLA compliance metrics that the operations team relies on to monitor service health.
Input Specification:
You are given a list of request logs, where each entry is a dictionary containing:
- status: A string indicating the request outcome, one of 'success', 'error', or 'timeout'
- latency_ms: A float representing the time taken to process the request, in milliseconds

You are also given a latency threshold (in milliseconds) that defines the maximum acceptable response time for SLA compliance.
Required Metrics:
Your function should compute and return a dictionary with the following three metrics:
Latency SLA Compliance: The percentage of successful requests that completed within the specified latency threshold. This measures how well the service is performing for requests that didn't fail.
Error Rate: The percentage of all requests that resulted in either an 'error' or a 'timeout'. This quantifies the service's reliability.
Overall SLA Compliance: The percentage of all requests that both succeeded AND met the latency threshold. This is the most comprehensive metric, reflecting true end-to-end service quality.
Special Cases:
- If the request list is empty, return an empty dictionary {}.
- If there are no successful requests, latency_sla_compliance should be 0.0.

Key Insight:
These metrics drive critical production decisions. High latency compliance with low overall compliance indicates an error problem. Low latency compliance with low error rates suggests resource constraints. Tracking these independently enables precise diagnosis and remediation.
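Putting the definitions above together, here is a minimal sketch. The function name compute_sla_metrics, rounding to two decimal places, and the inclusive <= threshold comparison are assumptions consistent with the stated special cases:

```python
def compute_sla_metrics(requests, latency_threshold_ms):
    """Sketch of the three SLA metrics; names and rounding are assumptions."""
    total = len(requests)
    if total == 0:
        return {}  # no traffic in this monitoring window

    successes = [r for r in requests if r['status'] == 'success']
    # <= is assumed: a request exactly at the threshold counts as compliant.
    fast = [r for r in successes if r['latency_ms'] <= latency_threshold_ms]
    failures = total - len(successes)

    # Guard against division by zero when every request failed.
    latency_sla = (len(fast) / len(successes) * 100) if successes else 0.0

    return {
        'latency_sla_compliance': round(latency_sla, 2),
        'error_rate': round(failures / total * 100, 2),
        'overall_sla_compliance': round(len(fast) / total * 100, 2),
    }
```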
Example 1:

Input:

requests = [
{'status': 'success', 'latency_ms': 50},
{'status': 'success', 'latency_ms': 80},
{'status': 'success', 'latency_ms': 120},
{'status': 'error', 'latency_ms': 30},
{'status': 'timeout', 'latency_ms': 5000}
]
latency_threshold_ms = 100.0

Output:

{'latency_sla_compliance': 66.67, 'error_rate': 40.0, 'overall_sla_compliance': 40.0}

Explanation:

Total requests: 5
Successful requests: 3 (with latencies 50ms, 80ms, and 120ms)
Failed requests: 2 (1 error + 1 timeout)
Latency SLA Compliance: Of the 3 successful requests, 2 completed within 100ms (50ms and 80ms). Calculation: (2 / 3) × 100 = 66.67%
Error Rate: 2 requests failed out of 5 total. Calculation: (2 / 5) × 100 = 40.0%
Overall SLA Compliance: Only 2 requests out of 5 both succeeded AND met the latency threshold. Calculation: (2 / 5) × 100 = 40.0%
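The arithmetic in the walkthrough above can be reproduced directly (again assuming an inclusive <= threshold comparison and rounding to two decimals):

```python
requests = [
    {'status': 'success', 'latency_ms': 50},
    {'status': 'success', 'latency_ms': 80},
    {'status': 'success', 'latency_ms': 120},
    {'status': 'error', 'latency_ms': 30},
    {'status': 'timeout', 'latency_ms': 5000},
]
threshold = 100.0

successes = [r for r in requests if r['status'] == 'success']
fast = [r for r in successes if r['latency_ms'] <= threshold]

latency_sla = round(len(fast) / len(successes) * 100, 2)            # (2 / 3) * 100
error_rate = round((len(requests) - len(successes)) / len(requests) * 100, 2)  # (2 / 5) * 100
overall = round(len(fast) / len(requests) * 100, 2)                 # (2 / 5) * 100

print(latency_sla, error_rate, overall)  # 66.67 40.0 40.0
```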
Example 2:

Input:

requests = [
{'status': 'success', 'latency_ms': 10},
{'status': 'success', 'latency_ms': 20},
{'status': 'success', 'latency_ms': 30}
]
latency_threshold_ms = 50.0

Output:

{'latency_sla_compliance': 100.0, 'error_rate': 0.0, 'overall_sla_compliance': 100.0}

Explanation:

Total requests: 3
Successful requests: 3 (all requests succeeded)
Failed requests: 0 (no errors or timeouts)
Latency SLA Compliance: All 3 successful requests (10ms, 20ms, 30ms) completed within the 50ms threshold. Calculation: (3 / 3) × 100 = 100.0%
Error Rate: No requests failed. Calculation: (0 / 3) × 100 = 0.0%
Overall SLA Compliance: All 3 requests succeeded AND met the latency threshold. Calculation: (3 / 3) × 100 = 100.0%
This represents an ideal scenario where the service is performing optimally.
Example 3:

Input:

requests = []
latency_threshold_ms = 100.0

Output:

{}

Explanation:

When the request list is empty, there is no data to compute metrics from. The function returns an empty dictionary to indicate that no meaningful statistics can be derived. This edge case is important for handling monitoring windows with no traffic or during system initialization.
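The empty-window behavior described above amounts to a guard at the top of the function, checked before any ratio is computed. Here metrics_for_window is an illustrative name, and the normal computation is omitted:

```python
def metrics_for_window(requests, latency_threshold_ms):
    # Return early before any division, so an empty monitoring window
    # never raises ZeroDivisionError.
    if not requests:
        return {}
    ...  # normal metric computation would follow here
```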
Constraints: