In production machine learning systems, processing individual inference requests one at a time is highly inefficient. Request batching is a critical optimization technique that groups multiple incoming requests together, enabling hardware accelerators (GPUs, TPUs) to leverage their parallel processing capabilities and dramatically improve throughput.
Your task is to implement an intelligent batching mechanism for ML model inference that balances two competing constraints:
Batch Size Constraint: A batch should not exceed a specified maximum number of requests, ensuring memory constraints are respected and batch processing remains efficient.
Latency Constraint: The elapsed time since the first request in the current batch should not exceed a specified wait time threshold, ensuring no request experiences excessive queuing delays.
Batching Logic: When a new request arrives, it should be added to the current batch if both constraints remain satisfied. If adding the request would violate either constraint, the current batch must be finalized and processed, and a new batch should begin with the incoming request.
Important Implementation Details:
• Requests may arrive out of chronological order; sort them by timestamp before applying the batching constraints.
• The wait-time clock for a batch starts at the timestamp of its first request.
• Each finalized batch is a tuple of (request IDs, feature vectors, timestamp of the batch's last request).
• After all requests have been consumed, any partially filled batch is finalized as well.
This batching strategy is fundamental to building high-performance ML serving infrastructure and relates directly to reducing per-request latency while maximizing hardware utilization.
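The logic described above can be sketched in Python as follows. This is a minimal reference sketch; the function name `batch_requests` and the internal helper are illustrative, not a required API.

```python
def batch_requests(requests, max_batch_size, max_wait_time):
    """Group requests into batches, respecting size and wait-time limits.

    Returns a list of (ids, features, last_timestamp) tuples, matching
    the expected-output format shown in the examples.
    """
    # Establish temporal ordering first: requests may arrive out of order.
    ordered = sorted(requests, key=lambda r: r['timestamp'])

    batches = []
    current = []  # requests in the batch being built

    def finalize():
        batches.append((
            [r['id'] for r in current],
            [r['features'] for r in current],
            current[-1]['timestamp'],  # timestamp of the batch's last request
        ))

    for req in ordered:
        if current:
            too_big = len(current) + 1 > max_batch_size
            too_late = req['timestamp'] - current[0]['timestamp'] > max_wait_time
            if too_big or too_late:
                finalize()       # adding req would violate a constraint,
                current = []     # so close the batch and start a new one
        current.append(req)

    if current:
        finalize()  # flush the final, possibly partial, batch
    return batches
```

Note that the batch is finalized lazily, when the next request arrives and would violate a constraint; this yields the same output as eager finalization because a batch's recorded timestamp is that of its last member.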
requests = [{'id': 1, 'timestamp': 0.0, 'features': [1.0, 2.0]}, {'id': 2, 'timestamp': 0.1, 'features': [3.0, 4.0]}, {'id': 3, 'timestamp': 0.2, 'features': [5.0, 6.0]}, {'id': 4, 'timestamp': 0.3, 'features': [7.0, 8.0]}]
max_batch_size = 2
max_wait_time = 10.0

Expected output:
[([1, 2], [[1.0, 2.0], [3.0, 4.0]], 0.1), ([3, 4], [[5.0, 6.0], [7.0, 8.0]], 0.3)]

With max_batch_size=2 and max_wait_time=10.0 seconds:
• Batch 1: Request 1 (t=0.0) starts the batch. Request 2 (t=0.1) arrives within the wait window and doesn't exceed batch size, so it joins. Batch reaches max size (2), so it's processed at t=0.1.
• Batch 2: Request 3 (t=0.2) starts a new batch. Request 4 (t=0.3) joins without violating constraints. Batch reaches max size, processed at t=0.3.
The wait time constraint (10.0s) is never triggered since all inter-arrival gaps are ≤0.3 seconds.
requests = [{'id': 1, 'timestamp': 0.0, 'features': [1.0, 2.0]}, {'id': 2, 'timestamp': 0.05, 'features': [3.0, 4.0]}, {'id': 3, 'timestamp': 0.15, 'features': [5.0, 6.0]}, {'id': 4, 'timestamp': 0.2, 'features': [7.0, 8.0]}]
max_batch_size = 10
max_wait_time = 0.1

Expected output:
[([1, 2], [[1.0, 2.0], [3.0, 4.0]], 0.05), ([3, 4], [[5.0, 6.0], [7.0, 8.0]], 0.2)]

With max_batch_size=10 (high) and max_wait_time=0.1 seconds (tight):
• Batch 1: Request 1 (t=0.0) starts the batch. Request 2 (t=0.05) arrives; elapsed time is 0.05s ≤ 0.1s, so it joins. Request 3 (t=0.15) would push the elapsed time to 0.15s > 0.1s, so the batch is finalized with requests 1 and 2. Its recorded timestamp is 0.05, the timestamp of its last request.
• Batch 2: Request 3 (t=0.15) starts a new batch. Request 4 (t=0.2) arrives with elapsed time 0.05s ≤ 0.1s, so it joins. With no further requests, this batch is finalized with timestamp 0.2.
Here, the wait time constraint drives batch formation, not batch size.
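The admission decision in both examples reduces to a single predicate over the two constraints. A sketch, where the helper name `would_violate` is illustrative:

```python
def would_violate(current_batch, request, max_batch_size, max_wait_time):
    """True if adding `request` to `current_batch` would break either constraint."""
    if len(current_batch) + 1 > max_batch_size:
        return True  # size constraint: batch would exceed the maximum
    # Latency constraint: elapsed time is measured from the batch's first request.
    elapsed = request['timestamp'] - current_batch[0]['timestamp']
    return elapsed > max_wait_time
```

When this predicate returns True, the current batch is finalized and the incoming request seeds a new one.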
requests = [{'id': 3, 'timestamp': 0.2, 'features': [5.0, 6.0]}, {'id': 1, 'timestamp': 0.0, 'features': [1.0, 2.0]}, {'id': 4, 'timestamp': 0.3, 'features': [7.0, 8.0]}, {'id': 2, 'timestamp': 0.1, 'features': [3.0, 4.0]}]
max_batch_size = 2
max_wait_time = 10.0

Expected output:
[([1, 2], [[1.0, 2.0], [3.0, 4.0]], 0.1), ([3, 4], [[5.0, 6.0], [7.0, 8.0]], 0.3)]

The requests arrive in non-chronological order (IDs: 3, 1, 4, 2), but they must be processed by timestamp:
• First, sort by timestamp: Request 1 (t=0.0), Request 2 (t=0.1), Request 3 (t=0.2), Request 4 (t=0.3)
• Then apply the same batching logic: Requests 1 & 2 form Batch 1, Requests 3 & 4 form Batch 2.
This demonstrates that the batching algorithm must first establish temporal ordering before applying batch constraints.
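The ordering step above is a simple key-based sort. A sketch using this example's input (field names follow the request dictionaries shown above):

```python
requests = [
    {'id': 3, 'timestamp': 0.2, 'features': [5.0, 6.0]},
    {'id': 1, 'timestamp': 0.0, 'features': [1.0, 2.0]},
    {'id': 4, 'timestamp': 0.3, 'features': [7.0, 8.0]},
    {'id': 2, 'timestamp': 0.1, 'features': [3.0, 4.0]},
]
# Sort by arrival timestamp before applying the batching constraints.
ordered = sorted(requests, key=lambda r: r['timestamp'])
print([r['id'] for r in ordered])  # → [1, 2, 3, 4]
```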
Constraints