In machine learning and data preprocessing, many algorithms require numerical input and cannot handle categorical data directly. One of the most widely used techniques for converting categorical labels into a numerical form that such algorithms accept is binary indicator encoding (also commonly called one-hot encoding or dummy variable encoding).
This transformation converts each categorical value into a binary vector where exactly one element is set to 1.0 (indicating the presence of that category) and all other elements are set to 0.0. The position of the 1.0 corresponds to the integer value of the original label.
Mathematical Definition:
Given an input array x of length n (the number of samples) whose entries are integer labels in the range [0, k-1], where k is the number of categories, the encoding produces a matrix E of dimensions n × k such that:
$$E_{ij} = \begin{cases} 1.0 & \text{if } x_i = j \\ 0.0 & \text{otherwise} \end{cases}$$
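The definition can be translated almost literally into numpy by comparing every label $x_i$ against every column index $j$ at once. The sketch below assumes x is a 1-D integer numpy array; the helper name `indicator_matrix` and its parameter `k` are hypothetical, chosen only for illustration.

```python
import numpy as np


def indicator_matrix(x: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: E[i, j] = 1.0 if x[i] == j else 0.0."""
    cols = np.arange(k)  # column indices j = 0, ..., k-1
    # x[:, None] has shape (n, 1) and cols[None, :] has shape (1, k);
    # broadcasting the comparison yields an n x k boolean matrix.
    return (x[:, None] == cols[None, :]).astype(float)


E = indicator_matrix(np.array([0, 2, 1]), k=3)
print(E.tolist())  # [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```

Each row contains exactly one 1.0 as long as every label lies in [0, k-1].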
Optional Column Count:
In some scenarios, you may want to specify the number of columns explicitly (e.g., when encoding a subset of data that doesn't contain all possible categories). The n_col parameter allows you to control the width of the output matrix. If not provided, the number of columns should be automatically determined as one more than the maximum value in the input array: max(x) + 1.
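A minimal sketch of how the optional width might be handled is shown below. It assumes a hypothetical function name `to_indicator`, accepts any 1-D sequence of non-negative integers, and fills the matrix with numpy integer-array indexing rather than a Python loop. It is one possible approach, not a reference solution.

```python
from typing import Optional

import numpy as np


def to_indicator(x, n_col: Optional[int] = None) -> np.ndarray:
    """Hypothetical helper: encode 1-D integer labels as an n x n_col float matrix."""
    x = np.asarray(x, dtype=int)
    if n_col is None:
        # Default width: one column for every value from 0 up to the largest label.
        n_col = int(x.max()) + 1
    n = x.shape[0]
    encoded = np.zeros((n, n_col), dtype=float)
    # Integer-array indexing sets row i, column x[i] to 1.0 for all samples at once.
    encoded[np.arange(n), x] = 1.0
    return encoded


print(to_indicator([1, 3]).shape)           # (2, 4): width inferred as max(x) + 1
print(to_indicator([1, 3], n_col=6).shape)  # (2, 6): width set explicitly
```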
Your Task:
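Write a Python function that converts a 1D numpy array of non-negative integer class labels into its binary indicator matrix representation. The function should take the label array x and an optional n_col argument (defaulting to max(x) + 1) and return an n × n_col matrix of floats as defined above.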
Example 1:
Input: x = np.array([0, 1, 2, 1, 0])
Output: [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
Explanation: The input array contains the values 0, 1, and 2. Since the maximum value is 2, the output has 3 columns (indices 0, 1, 2).
• Element 0 → [1.0, 0.0, 0.0] (1.0 at index 0)
• Element 1 → [0.0, 1.0, 0.0] (1.0 at index 1)
• Element 2 → [0.0, 0.0, 1.0] (1.0 at index 2)
• Element 1 → [0.0, 1.0, 0.0] (1.0 at index 1)
• Element 0 → [1.0, 0.0, 0.0] (1.0 at index 0)
Each label is converted to a row vector where only the position matching the label value contains 1.0.
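Another common way to reproduce the first example is to note that row j of a k × k identity matrix is exactly the indicator vector for label j. The sketch below, which assumes k is inferred as max(x) + 1, selects one identity row per sample with `np.eye`.

```python
import numpy as np

x = np.array([0, 1, 2, 1, 0])
k = int(x.max()) + 1  # 3 columns, matching the first example
# Indexing the identity matrix with the label array picks row x[i] for each sample i.
encoded = np.eye(k)[x]
print(encoded.tolist())
# [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
```

This version is compact but materializes a k × k identity matrix first, which can be wasteful when k is very large; filling a zero matrix by index avoids that.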
Example 2:
Input: x = np.array([0, 1, 2]), n_col = 4
Output: [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
Explanation: Although the maximum value in the input is 2, we explicitly request 4 columns.
• Element 0 → [1.0, 0.0, 0.0, 0.0] (1.0 at index 0)
• Element 1 → [0.0, 1.0, 0.0, 0.0] (1.0 at index 1)
• Element 2 → [0.0, 0.0, 1.0, 0.0] (1.0 at index 2)
The 4th column (index 3) remains all zeros because no input has the value 3. This is useful when you know there are more categories than are present in this particular batch.
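The problem statement does not say what should happen when a label does not fit in the requested width. The sketch below assumes such input should be rejected with a `ValueError`; this matters because numpy silently wraps negative indices. The function name `encode_with_width` is hypothetical.

```python
import numpy as np


def encode_with_width(x, n_col: int) -> np.ndarray:
    """Hypothetical helper: encode labels into exactly n_col columns."""
    x = np.asarray(x, dtype=int)
    # Assumption: labels outside [0, n_col - 1] are treated as an error
    # instead of being wrapped (negative) or raising a bare IndexError.
    if x.size and (x.min() < 0 or x.max() >= n_col):
        raise ValueError("labels must lie in [0, n_col - 1]")
    encoded = np.zeros((x.size, n_col))
    encoded[np.arange(x.size), x] = 1.0
    return encoded


E = encode_with_width(np.array([0, 1, 2]), n_col=4)
print(E.tolist())      # [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
print(E[:, 3].tolist())  # [0.0, 0.0, 0.0]: the unused category column stays empty
```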
Example 3:
Input: x = np.array([2]), n_col = 5
Output: [[0.0, 0.0, 1.0, 0.0, 0.0]]
Explanation: A single element with value 2, encoded into a 5-column format:
• Element 2 → [0.0, 0.0, 1.0, 0.0, 0.0] (1.0 at index 2)
The result is a 1×5 matrix with the third position (index 2) set to 1.0.
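The third example highlights that even a single label should produce a 2-D result. The sketch below (using `np.eye` row selection as one possible approach) shows that indexing with a length-1 array keeps the row axis, whereas indexing with a bare integer would return a 1-D vector instead.

```python
import numpy as np

x = np.array([2])
n_col = 5
encoded = np.eye(n_col)[x]  # a length-1 index array keeps the row axis
print(encoded.shape)        # (1, 5)
print(encoded.tolist())     # [[0.0, 0.0, 1.0, 0.0, 0.0]]

# By contrast, a bare integer index drops the row axis.
print(np.eye(n_col)[2].shape)  # (5,)
```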
Constraints: