Categorical Binary Encoding

EASY · 10 pts

In machine learning and data preprocessing, many algorithms require numerical input and cannot handle categorical data directly. One of the most widely used techniques for converting categorical labels into a numerical format is binary indicator encoding (also commonly referred to as one-hot encoding or dummy variable encoding).

This transformation converts each categorical value into a binary vector where exactly one element is set to 1.0 (indicating the presence of that category) and all other elements are set to 0.0. The position of the 1.0 corresponds to the integer value of the original label.

Mathematical Definition:

Given an input array x of n integer labels (n is the number of samples), each in the range [0, k-1], where k is the number of distinct categories, the encoding produces a matrix E of dimensions n × k such that:

$$E_{ij} = \begin{cases} 1.0 & \text{if } x_i = j \\ 0.0 & \text{otherwise} \end{cases}$$
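
As a minimal sketch of this definition, the comparison x_i = j can be evaluated for every pair (i, j) at once with NumPy broadcasting. The name `indicator_matrix` is hypothetical and used only for illustration, and k is passed in explicitly here rather than inferred.

```python
import numpy as np

def indicator_matrix(x, k):
    """Hypothetical helper: E[i, j] = 1.0 if x[i] == j, else 0.0."""
    x = np.asarray(x)
    # x[:, None] has shape (n, 1) and np.arange(k) has shape (k,);
    # broadcasting the comparison yields an (n, k) boolean matrix.
    return (x[:, None] == np.arange(k)).astype(float)

E = indicator_matrix(np.array([0, 1, 2, 1, 0]), k=3)
print(E.shape)  # (5, 3)
```

Each row of E sums to 1.0 because every label matches exactly one column index.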

Optional Column Count:

In some scenarios, you may want to specify the number of columns explicitly (e.g., when encoding a subset of data that doesn't contain all possible categories). The n_col parameter allows you to control the width of the output matrix. If not provided, the number of columns should be automatically determined as one more than the maximum value in the input array: max(x) + 1.
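
One possible way to handle the optional width is sketched below. The function name `encode_labels` is an assumption (the problem statement does not name the function), and the sketch assumes the input already satisfies the stated constraints.

```python
import numpy as np

def encode_labels(x, n_col=None):
    """Hypothetical sketch: binary indicator matrix with optional width."""
    x = np.asarray(x, dtype=int)
    if n_col is None:
        # Default width: one more than the largest label present.
        n_col = int(x.max()) + 1
    encoded = np.zeros((x.shape[0], n_col), dtype=float)
    # Row i gets a 1.0 in column x[i]; all other entries stay 0.0.
    encoded[np.arange(x.shape[0]), x] = 1.0
    return encoded

default_width = encode_labels(np.array([0, 1, 2]))
explicit_width = encode_labels(np.array([0, 1, 2]), n_col=4)
print(default_width.shape, explicit_width.shape)  # (3, 3) (3, 4)
```

Preallocating zeros and assigning through integer indexing keeps memory use at n × n_col floats and avoids any Python-level loop.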

Your Task:

Write a Python function that converts a 1D numpy array of non-negative integer class labels into their binary indicator matrix representation. The function should:

• Return a 2D numpy array in which each row corresponds to one input element
• Set each row to a binary vector with a single 1.0 at the index given by the input value
• Optionally accept a specified number of columns for the output matrix

Example

Input

x = np.array([0, 1, 2, 1, 0])

Output

[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]

Explanation

The input array contains values 0, 1, and 2. Since the maximum value is 2, the output has 3 columns (indices 0, 1, 2).

• Element 0 → [1.0, 0.0, 0.0] (1.0 at index 0)
• Element 1 → [0.0, 1.0, 0.0] (1.0 at index 1)
• Element 2 → [0.0, 0.0, 1.0] (1.0 at index 2)
• Element 1 → [0.0, 1.0, 0.0] (1.0 at index 1)
• Element 0 → [1.0, 0.0, 0.0] (1.0 at index 0)

Each label is converted to a row vector where only the position matching the label value contains 1.0.

Example

Input

x = np.array([0, 1, 2])
n_col = 4

Output

[[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

Explanation

Although the maximum value in the input is 2, we explicitly request 4 columns.

• Element 0 → [1.0, 0.0, 0.0, 0.0] (1.0 at index 0)
• Element 1 → [0.0, 1.0, 0.0, 0.0] (1.0 at index 1)
• Element 2 → [0.0, 0.0, 1.0, 0.0] (1.0 at index 2)

The 4th column (index 3) remains all zeros since no input has value 3. This is useful when you know that more categories exist than are present in this particular batch.
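
To illustrate the batch scenario, the sketch below encodes two hypothetical batches separately. It swaps in a different technique from plain zero-filling: indexing the rows of an identity matrix built with `np.eye`. The helper name `encode` and the batch values are assumptions made for this example.

```python
import numpy as np

def encode(x, n_col=None):
    """Hypothetical helper: row x[i] of the identity matrix encodes label x[i]."""
    x = np.asarray(x, dtype=int)
    if n_col is None:
        n_col = int(x.max()) + 1
    return np.eye(n_col, dtype=float)[x]

batch_a = np.array([0, 1, 2])  # label 3 happens to be absent here
batch_b = np.array([3, 1])

# Without n_col, each batch infers its own width, so the widths disagree.
widths_default = (encode(batch_a).shape[1], encode(batch_b).shape[1])

# With a shared n_col, both batches have the same width and can be stacked.
stacked = np.vstack([encode(batch_a, n_col=4), encode(batch_b, n_col=4)])

print(widths_default)  # (3, 4)
print(stacked.shape)   # (5, 4)
```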

Example

Input

x = np.array([2])
n_col = 5

Output

[[0.0, 0.0, 1.0, 0.0, 0.0]]

Explanation

A single element with value 2 is encoded into a 5-column format:

• Element 2 → [0.0, 0.0, 1.0, 0.0, 0.0] (1.0 at index 2)

The result is a 1×5 matrix with the third position (index 2) set to 1.0.

Constraints

1 ≤ length of x ≤ 10,000
0 ≤ x[i] ≤ 1,000 for all elements in x
If n_col is provided: n_col > max(x) (to ensure valid encoding)
If n_col is not provided: the number of columns defaults to max(x) + 1
All elements in x are non-negative integers
The output matrix should contain float values (1.0 and 0.0)
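
The constraints above could be checked before encoding with a small guard such as the following. This is only a sketch: `check_inputs` is a hypothetical name, and the problem does not state whether a solution is expected to validate its input at all.

```python
import numpy as np

def check_inputs(x, n_col=None):
    """Hypothetical guard: raise ValueError if a stated constraint is violated."""
    x = np.asarray(x)
    if x.ndim != 1 or not (1 <= x.size <= 10_000):
        raise ValueError("x must be 1D with between 1 and 10,000 elements")
    if not np.issubdtype(x.dtype, np.integer):
        raise ValueError("x must contain integers")
    if x.min() < 0 or x.max() > 1_000:
        raise ValueError("every label must lie in [0, 1000]")
    if n_col is not None and n_col <= x.max():
        raise ValueError("n_col must be greater than max(x)")
    return x

checked = check_inputs(np.array([0, 1, 2]), n_col=4)  # passes silently
```

An n_col equal to max(x) is rejected because the largest label would then have no column to occupy.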
