Loading content...
Feature scaling is a critical preprocessing step in machine learning that transforms numerical features to a common scale without distorting differences in the ranges of values. Raw features often have vastly different magnitudes—for instance, age might range from 0-100 while income ranges from 0-1,000,000. This discrepancy can cause significant problems for many machine learning algorithms.
This problem focuses on implementing two of the most widely-used normalization techniques:
Standardization transforms each feature to have a mean of 0 and a standard deviation of 1. For each value in a feature column, the transformation is computed as:
$$z = \frac{x - \mu}{\sigma}$$
Where:
This technique is particularly useful when:
Min-max normalization rescales each feature to a fixed range of [0, 1], where the minimum value maps to 0 and the maximum value maps to 1. The formula is:
$$x_{scaled} = \frac{x - x_{min}}{x_{max} - x_{min}}$$
Where:
This technique is preferred when:
Write a Python function that takes a 2D NumPy array representing a dataset (where each row is a data sample and each column represents a feature). Apply both scaling techniques column-wise and return the results as two separate 2D arrays. All values should be rounded to 4 decimal places.
Important: Apply scaling independently to each feature column, computing statistics (mean, std, min, max) for each column separately.
data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]][[[-1.2247, -1.2247], [0.0, 0.0], [1.2247, 1.2247]], [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]]For Column 1 (values: 1, 3, 5): • Mean = 3, Std = 1.633 • Standardization: (1-3)/1.633 = -1.2247, (3-3)/1.633 = 0, (5-3)/1.633 = 1.2247 • Min = 1, Max = 5 • Min-Max: (1-1)/(5-1) = 0, (3-1)/(5-1) = 0.5, (5-1)/(5-1) = 1
For Column 2 (values: 2, 4, 6): • Mean = 4, Std = 1.633 • Standardization: (2-4)/1.633 = -1.2247, (4-4)/1.633 = 0, (6-4)/1.633 = 1.2247 • Min = 2, Max = 6 • Min-Max: (2-2)/(6-2) = 0, (4-2)/(6-2) = 0.5, (6-2)/(6-2) = 1
The first output array shows standardized values, the second shows min-max scaled values.
data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]][[[-1.2247, -1.2247, -1.2247], [0.0, 0.0, 0.0], [1.2247, 1.2247, 1.2247]], [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5], [1.0, 1.0, 1.0]]]Each column follows a uniform arithmetic progression with equal spacing. The standardization produces z-scores of approximately -1.2247, 0, and 1.2247 for each column. The min-max normalization maps the three evenly-spaced values to 0, 0.5, and 1 respectively.
data = [[-1.0, 0.0, 1.0], [2.0, 3.0, 4.0]][[[-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]], [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]]With only two samples per column, the standardization produces z-scores of exactly -1 and 1 (one standard deviation below and above the mean). The min-max normalization maps the smaller value to 0 and the larger value to 1 for each feature column.
Constraints