Synchronized Data Permutation (Easy)

In machine learning workflows, the pairing between feature data and labels must survive every data manipulation step. When training models, we often randomize the order of samples so that the model does not learn spurious patterns from sequential ordering, to reduce bias from sorted data, and to produce well-mixed batches for stochastic training.

The Synchronized Permutation Challenge: Given a feature matrix X of dimensions (n_samples × n_features) where each row represents a distinct data sample, and a corresponding label vector y of length n_samples where each element represents the label for the corresponding row in X, your task is to implement a function that randomly reorders both arrays while preserving the pairing between each sample (row of X) and its associated label (element of y).

Why Correspondence Matters: If we shuffle X and y independently, we would break the fundamental relationship between features and labels, effectively corrupting our dataset. For example, if sample [1, 2] has label 3, after synchronized shuffling, wherever [1, 2] appears in the new X, the value 3 must appear at the same index in the new y.
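The idea can be sketched in a few lines of NumPy. This is a minimal illustration rather than the reference solution: the toy arrays and variable names are assumptions, and the labels are chosen to equal the row sums (so [1, 2] has label 3, as in the example above) only to make the pairing easy to check.

```python
import numpy as np

# Toy data (illustrative only): row i of X is paired with y[i].
X = np.array([[1, 2], [3, 4], [5, 6]])
y = np.array([3, 7, 11])  # each label equals its row's sum

rng = np.random.RandomState(0)  # assumed generator, any seed works here

# Synchronized shuffle: draw ONE permutation of the row indices
# and apply it to both arrays.
idx = rng.permutation(len(X))
X_sync, y_sync = X[idx], y[idx]

# Every shuffled row still sits next to its own label.
pairs_intact = bool(np.all(X_sync.sum(axis=1) == y_sync))
print(pairs_intact)  # True by construction, whatever idx turns out to be
```

Drawing two separate permutations, one for X and one for y, offers no such guarantee.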

Reproducibility with Seeds: Machine learning experiments must be reproducible. An optional seed parameter makes the shuffling operation deterministic: running the function with the same seed, the same number of samples, and the same random number generator always produces the same permutation. This is essential for debugging, experiment tracking, and scientific reproducibility.
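A short sketch of seeded determinism follows. The helper name `draw_permutation` is hypothetical, and the choice of NumPy's legacy `np.random.RandomState` is an assumption; the problem statement does not say which generator the reference solution uses.

```python
import numpy as np

def draw_permutation(n, seed=None):
    """Return a random permutation of range(n) (hypothetical helper)."""
    # A local generator keeps NumPy's global random state untouched.
    # With seed=None the generator is seeded from fresh entropy.
    rng = np.random.RandomState(seed)
    return rng.permutation(n)

first = draw_permutation(10, seed=42)
second = draw_permutation(10, seed=42)

same = np.array_equal(first, second)
print(same)  # True: same seed and generator give the same permutation
```

Calls without a seed are not expected to repeat from run to run.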

Your Task: Implement a Python function that performs synchronized random permutation of paired data arrays. The function should:

• Accept a 2D numpy array X (feature matrix) and a 1D numpy array y (labels)
• Accept an optional seed parameter for reproducible shuffling
• Return a tuple containing the shuffled X and shuffled y with preserved sample-label correspondence
• When a seed is provided, produce deterministic results
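One possible shape for such a function is sketched below. The name `shuffle_data`, the length check, and the use of `np.random.RandomState` are assumptions made for illustration. Because the generator used by the reference solution is not stated, this sketch is not guaranteed to reproduce the exact permutations quoted in the examples.

```python
import numpy as np

def shuffle_data(X, y, seed=None):
    """Shuffle the rows of X and the entries of y with one shared permutation."""
    X = np.asarray(X)
    y = np.asarray(y)
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and y must contain the same number of samples")

    # Assumption: a local legacy generator, so global NumPy state is untouched.
    rng = np.random.RandomState(seed)
    idx = rng.permutation(X.shape[0])

    # Integer-array indexing returns copies, so the inputs stay unmodified.
    return X[idx], y[idx]

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([1, 2, 3, 4])
X_s, y_s = shuffle_data(X, y, seed=42)
```

Indexing both arrays with the same `idx` is what keeps each row next to its label; the seed only controls which `idx` is drawn.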

With seed=42 and the 4-sample input X = [[1,2], [3,4], [5,6], [7,8]], y = [1, 2, 3, 4], the random permutation indices generated are [1, 3, 0, 2]. This means:

• Original index 1 → new index 0: X[1]=[3,4] and y[1]=2 move to position 0
• Original index 3 → new index 1: X[3]=[7,8] and y[3]=4 move to position 1
• Original index 0 → new index 2: X[0]=[1,2] and y[0]=1 move to position 2
• Original index 2 → new index 3: X[2]=[5,6] and y[2]=3 move to position 3

Notice that each sample maintains its correct label: [3,4] still has label 2, [7,8] still has label 4, etc.
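The mapping above can be reproduced by applying the quoted indices directly. In this sketch the permutation is hard-coded rather than drawn from a random number generator, so it shows only the indexing step; the input arrays are the ones implied by the values listed above.

```python
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([1, 2, 3, 4])

# Permutation indices quoted in the example (hard-coded, not sampled).
idx = np.array([1, 3, 0, 2])

# Row k of the result is the original row idx[k]; same rule for y.
X_shuffled = X[idx]
y_shuffled = y[idx]

print(X_shuffled.tolist())  # [[3, 4], [7, 8], [1, 2], [5, 6]]
print(y_shuffled.tolist())  # [2, 4, 1, 3]
```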

With seed=123 for this 3-sample dataset, the random permutation happens to generate indices [0, 1, 2], which is the original order. This is a valid shuffle outcome: random does not mean the order must change. It means every permutation, including the identity, is equally likely (1 in 3! = 6 for three samples). The correspondence is trivially preserved since no reordering occurred.

With seed=7, the permutation reorders the 5 samples while maintaining correspondence:

• Sample [1,0] with label 0 remains at position 0
• Sample [0,0] with label 0 moves from position 3 to position 1
• Sample [1,1] with label 1 remains at position 2
• Sample [0,1] with label 1 moves from position 1 to position 3
• Sample [2,2] with label 1 remains at position 4

The pairing integrity is preserved throughout the shuffling operation.
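Pairing integrity can also be checked programmatically. The sketch below is an illustration under assumptions: the helper `pairing_preserved` is hypothetical, and the input arrays and the index order [0, 3, 2, 1, 4] are reconstructed from the positions described in the list above rather than taken from a stated input.

```python
import numpy as np

def pairing_preserved(X, y, X_new, y_new):
    """Hypothetical check: the (row, label) pairs are the same before and after."""
    before = sorted((tuple(r), l) for r, l in zip(X.tolist(), y.tolist()))
    after = sorted((tuple(r), l) for r, l in zip(X_new.tolist(), y_new.tolist()))
    return before == after

# Inputs reconstructed from the described positions (assumption).
X = np.array([[1, 0], [0, 1], [1, 1], [0, 0], [2, 2]])
y = np.array([0, 1, 1, 0, 1])
idx = np.array([0, 3, 2, 1, 4])

ok = pairing_preserved(X, y, X[idx], y[idx])

# Counter-example: y reordered with a different permutation than X.
broken = pairing_preserved(X, y, X[idx], y[np.array([1, 0, 2, 3, 4])])

print(ok, broken)  # True False
```

Comparing sorted pairs ignores order, so the check passes for any synchronized permutation and fails once a row ends up next to another sample's label.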