Dataset Batch Generator for Training Pipelines

EASY · 10 pts

In machine learning, training models on large datasets requires efficient data handling. One of the most fundamental techniques is batch processing, where the dataset is divided into smaller, fixed-size groups called batches (the last one may be smaller). Instead of loading the entire dataset into memory at once, batches allow incremental processing, which enables training on datasets that would otherwise exceed available memory.

A batch generator (or batch iterator) is a utility function that sequentially partitions a dataset into fixed-size chunks. This technique is essential for:

• Mini-batch gradient descent: Updates model weights using gradients computed from small subsets of data
• Memory efficiency: Processes large datasets that don't fit in RAM
• Regularization effect: Introduces beneficial noise through varying batch compositions
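The memory-efficiency point can be illustrated with a lazy generator. The following is a minimal sketch, not the required solution: the name `iter_batches` is hypothetical, and it assumes only that the input supports `len()` and slicing (which holds for Python lists and numpy arrays alike).

```python
def iter_batches(data, batch_size):
    """Lazily yield consecutive slices of `data`, each at most `batch_size` long."""
    # Step through the start index of every batch: 0, batch_size, 2*batch_size, ...
    for start in range(0, len(data), batch_size):
        # Slicing past the end is safe, so the final batch is simply shorter.
        yield data[start:start + batch_size]


samples = list(range(7))
for batch in iter_batches(samples, 3):
    print(batch)
# prints [0, 1, 2], then [3, 4, 5], then [6]
```

Because the generator produces one slice at a time, only the current batch needs to be held by the consumer at any moment.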

The Batching Process:

Given a feature matrix X with n samples and an optional target vector y of length n, the batch generator divides the data into ⌈n / batch_size⌉ sequential groups:

• Full batches: Every batch except possibly the last contains exactly batch_size samples
• Final batch: The last batch contains the remaining n mod batch_size samples when n is not evenly divisible by batch_size
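The batch count and the size of each batch follow directly from n and batch_size. A small sketch of that arithmetic is shown here; the helper name `batch_sizes` is an assumption introduced for illustration only.

```python
import math


def batch_sizes(n, batch_size):
    """Return the number of samples in each sequential batch."""
    # Number of batches is the ceiling of n / batch_size.
    num_batches = math.ceil(n / batch_size)
    # Each batch is full unless fewer than batch_size samples remain.
    return [min(batch_size, n - i * batch_size) for i in range(num_batches)]


print(batch_sizes(5, 2))  # [2, 2, 1]
print(batch_sizes(4, 2))  # [2, 2]
```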

Your Task:

Implement a function that generates sequential batches from a numpy array X and an optional numpy array y. The function should:

• Divide the input data into chunks of the specified batch_size
• If y is provided, return batches as pairs [X_batch, y_batch]
• If y is None, return batches containing only X_batch
• Handle the final batch gracefully when the dataset size is not perfectly divisible by the batch size
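One possible shape for such a function is sketched below. The name `batch_iterator`, the parameter order, and the default batch_size of 64 are assumptions, since the required signature is not stated here. The sketch relies only on `len()` and slicing along the first axis, so it accepts numpy arrays as well as the plain lists used in the demonstration.

```python
def batch_iterator(X, y=None, batch_size=64):
    """Split X (and y, if given) into sequential batches.

    Hypothetical signature; the default batch_size is an assumption.
    """
    n_samples = len(X)  # for a 2D numpy array this is the number of rows
    batches = []
    for start in range(0, n_samples, batch_size):
        # Clamp the end index so the final batch holds only the leftovers.
        end = min(start + batch_size, n_samples)
        if y is not None:
            batches.append([X[start:end], y[start:end]])
        else:
            batches.append(X[start:end])
    return batches


X = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
y = [1, 2, 3, 4, 5]
for X_batch, y_batch in batch_iterator(X, y, batch_size=2):
    print(X_batch, y_batch)
```

When numpy arrays are passed in, each batch is an array slice rather than a nested list. Note the explicit `y is not None` check: testing the truthiness of a multi-element numpy array raises an error.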

Example

Input

X = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
y = [1, 2, 3, 4, 5]
batch_size = 2

Output

[[[[1, 2], [3, 4]], [1, 2]], [[[5, 6], [7, 8]], [3, 4]], [[[9, 10]], [5]]]

Explanation

The feature matrix X contains 5 samples with 2 features each, paired with a target vector y of 5 labels.

• Batch 1: First 2 samples → X_batch: [[1, 2], [3, 4]], y_batch: [1, 2]
• Batch 2: Next 2 samples → X_batch: [[5, 6], [7, 8]], y_batch: [3, 4]
• Batch 3: Remaining 1 sample → X_batch: [[9, 10]], y_batch: [5]

Since y is provided, each batch is returned as a [X_batch, y_batch] pair. The last batch contains only 1 sample because 5 is not evenly divisible by 2.

Example

Input

X = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
batch_size = 2

Output

[[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]

Explanation

The feature matrix X contains 4 samples with 3 features each, and no target vector y is provided.

• Batch 1: First 2 samples → [[1, 2, 3], [4, 5, 6]]
• Batch 2: Next 2 samples → [[7, 8, 9], [10, 11, 12]]

Since y is None, each batch contains only the X_batch array. The dataset divides evenly into 2 batches of size 2.

Example

Input

X = [[1, 1], [2, 2], [3, 3], [4, 4]]
y = [10, 20, 30, 40]
batch_size = 2

Output

[[[[1, 1], [2, 2]], [10, 20]], [[[3, 3], [4, 4]], [30, 40]]]

Explanation

The feature matrix X contains 4 samples, and the target vector y contains 4 corresponding labels.

• Batch 1: Samples 0-1 → X_batch: [[1, 1], [2, 2]], y_batch: [10, 20]
• Batch 2: Samples 2-3 → X_batch: [[3, 3], [4, 4]], y_batch: [30, 40]

Both batches contain exactly 2 samples since 4 is evenly divisible by 2.


Constraints

1 ≤ n (number of samples) ≤ 10,000
1 ≤ m (number of features) ≤ 1,000
1 ≤ batch_size ≤ n
-10⁶ ≤ X[i][j] ≤ 10⁶
-10⁶ ≤ y[i] ≤ 10⁶ (when y is provided)
If y is provided, len(y) will always equal the number of rows in X
X will always be a valid 2D numpy array with consistent row lengths
