In production machine learning systems, Service Level Agreement (SLA) monitoring is a critical component of MLOps infrastructure. When deploying ML models as real-time inference endpoints, engineering teams establish performance guarantees that must be continuously tracked and reported.
Your ML inference service handles thousands of prediction requests per minute. Each request is logged with its response latency and final status. As part of the observability pipeline, you need to compute the essential SLA compliance metrics that the operations team relies on to monitor service health.
Input Specification:
You are given a list of request logs, where each entry is a dictionary containing:
- status: A string indicating the request outcome, one of 'success', 'error', or 'timeout'
- latency_ms: A float representing the time taken to process the request, in milliseconds

You are also given a latency threshold (in milliseconds) that defines the maximum acceptable response time for SLA compliance.
Required Metrics:
Your function should compute and return a dictionary with the following three metrics:
Latency SLA Compliance: The percentage of successful requests that completed within the specified latency threshold. This measures how well the service is performing for requests that didn't fail.
Error Rate: The percentage of all requests that resulted in either an 'error' or a 'timeout'. This quantifies the service's reliability.
Overall SLA Compliance: The percentage of all requests that both succeeded AND met the latency threshold. This is the most comprehensive metric, reflecting true end-to-end service quality.
Special Cases:
- If the request list is empty, return an empty dictionary {}.
- If there are no successful requests, latency_sla_compliance should be 0.0.

Key Insight:
These metrics drive critical production decisions. High latency compliance with low overall compliance indicates an error problem. Low latency compliance with low error rates suggests resource constraints. Tracking these independently enables precise diagnosis and remediation.
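Putting the definitions above together, here is a minimal sketch. The function name compute_sla_metrics, rounding to two decimal places, and the inclusive <= threshold comparison are assumptions consistent with the stated special cases:

```python
def compute_sla_metrics(requests, latency_threshold_ms):
    """Sketch of the three SLA metrics; names and rounding are assumptions."""
    total = len(requests)
    if total == 0:
        return {}  # no traffic in this monitoring window

    successes = [r for r in requests if r['status'] == 'success']
    # <= is assumed: a request exactly at the threshold counts as compliant.
    fast = [r for r in successes if r['latency_ms'] <= latency_threshold_ms]
    failures = total - len(successes)

    # Guard against division by zero when every request failed.
    latency_sla = (len(fast) / len(successes) * 100) if successes else 0.0

    return {
        'latency_sla_compliance': round(latency_sla, 2),
        'error_rate': round(failures / total * 100, 2),
        'overall_sla_compliance': round(len(fast) / total * 100, 2),
    }
```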
Example 1:

Input:

requests = [
{'status': 'success', 'latency_ms': 50},
{'status': 'success', 'latency_ms': 80},
{'status': 'success', 'latency_ms': 120},
{'status': 'error', 'latency_ms': 30},
{'status': 'timeout', 'latency_ms': 5000}
]
latency_threshold_ms = 100.0

Output:

{'latency_sla_compliance': 66.67, 'error_rate': 40.0, 'overall_sla_compliance': 40.0}

Explanation:

Total requests: 5
Successful requests: 3 (with latencies 50ms, 80ms, and 120ms)
Failed requests: 2 (1 error + 1 timeout)
Latency SLA Compliance: Of the 3 successful requests, 2 completed within 100ms (50ms and 80ms). Calculation: (2 / 3) × 100 = 66.67%
Error Rate: 2 requests failed out of 5 total. Calculation: (2 / 5) × 100 = 40.0%
Overall SLA Compliance: Only 2 requests out of 5 both succeeded AND met the latency threshold. Calculation: (2 / 5) × 100 = 40.0%
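The arithmetic in the walkthrough above can be reproduced directly (again assuming an inclusive <= threshold comparison and rounding to two decimals):

```python
requests = [
    {'status': 'success', 'latency_ms': 50},
    {'status': 'success', 'latency_ms': 80},
    {'status': 'success', 'latency_ms': 120},
    {'status': 'error', 'latency_ms': 30},
    {'status': 'timeout', 'latency_ms': 5000},
]
threshold = 100.0

successes = [r for r in requests if r['status'] == 'success']
fast = [r for r in successes if r['latency_ms'] <= threshold]

latency_sla = round(len(fast) / len(successes) * 100, 2)            # (2 / 3) * 100
error_rate = round((len(requests) - len(successes)) / len(requests) * 100, 2)  # (2 / 5) * 100
overall = round(len(fast) / len(requests) * 100, 2)                 # (2 / 5) * 100

print(latency_sla, error_rate, overall)  # 66.67 40.0 40.0
```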
Example 2:

Input:

requests = [
{'status': 'success', 'latency_ms': 10},
{'status': 'success', 'latency_ms': 20},
{'status': 'success', 'latency_ms': 30}
]
latency_threshold_ms = 50.0

Output:

{'latency_sla_compliance': 100.0, 'error_rate': 0.0, 'overall_sla_compliance': 100.0}

Explanation:

Total requests: 3
Successful requests: 3 (all requests succeeded)
Failed requests: 0 (no errors or timeouts)
Latency SLA Compliance: All 3 successful requests (10ms, 20ms, 30ms) completed within the 50ms threshold. Calculation: (3 / 3) × 100 = 100.0%
Error Rate: No requests failed. Calculation: (0 / 3) × 100 = 0.0%
Overall SLA Compliance: All 3 requests succeeded AND met the latency threshold. Calculation: (3 / 3) × 100 = 100.0%
This represents an ideal scenario where the service is performing optimally.
Example 3:

Input:

requests = []
latency_threshold_ms = 100.0

Output:

{}

Explanation:

When the request list is empty, there is no data to compute metrics from. The function returns an empty dictionary to indicate that no meaningful statistics can be derived. This edge case is important for handling monitoring windows with no traffic or during system initialization.
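The empty-window behavior described above amounts to a guard at the top of the function, checked before any ratio is computed. Here metrics_for_window is an illustrative name, and the normal computation is omitted:

```python
def metrics_for_window(requests, latency_threshold_ms):
    # Return early before any division, so an empty monitoring window
    # never raises ZeroDivisionError.
    if not requests:
        return {}
    ...  # normal metric computation would follow here
```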
Constraints: