Hyperdimensional Computing (HDC) is an emerging computational paradigm inspired by how the human brain processes information. At its core, HDC represents data as high-dimensional vectors (typically thousands of dimensions) called hypervectors. These hypervectors have remarkable mathematical properties that enable efficient and robust symbolic reasoning.
In this problem, you will implement a fundamental HDC operation: encoding a structured data record (row) into a single composite hypervector. This encoding preserves the semantic relationships between feature names and their values while creating a holistic representation suitable for machine learning tasks.
Each element (feature name or value) is mapped to a unique bipolar hypervector (containing only -1 or +1 values). These base hypervectors are generated deterministically using random seeds.
Binding creates a unique representation for a key-value pair. It uses element-wise multiplication:
$$\text{bound}_{i} = \text{name\_hv}_{i} \times \text{value\_hv}_{i}$$
The bound hypervector is quasi-orthogonal to both the name and value hypervectors, effectively creating a distinct "slot" for each feature-value association.
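As a small sketch of this operation (the vectors here are arbitrary random examples, not part of the problem's test data), binding in NumPy is a single element-wise product. Because every element is ±1, binding is also its own inverse: multiplying the bound vector by the name hypervector recovers the value hypervector.

```python
import numpy as np

# Two random bipolar hypervectors (values in {-1, +1}); the seed is arbitrary
rng = np.random.default_rng(0)
dim = 10_000
name_hv = np.where(rng.random(dim) < 0.5, -1, 1)
value_hv = np.where(rng.random(dim) < 0.5, -1, 1)

# Binding: element-wise multiplication
bound = name_hv * value_hv

# Binding is self-inverse: name_hv * bound recovers value_hv exactly
assert np.array_equal(name_hv * bound, value_hv)

# Quasi-orthogonality: the normalized dot product with either operand is near 0
print(abs(bound @ name_hv) / dim)  # small, close to 0
```

The near-zero dot product is what makes each bound pair behave as an independent "slot" inside the bundle.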
Bundling combines multiple hypervectors into a single composite representation using element-wise summation followed by bipolar normalization:
$$\text{composite}_{i} = \text{sign}\left(\sum_{k=1}^{K} \text{bound}^{(k)}_{i}\right)$$
Where the sign function maps non-negative values to +1 and negative values to -1.
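A minimal sketch of bundling in NumPy (the small example vectors are illustrative only): since `np.sign` maps 0 to 0 rather than +1, `np.where` is used to implement the non-negative → +1 rule.

```python
import numpy as np

# Two bound hypervectors (a small dim for readability)
bound_1 = np.array([1, -1,  1, -1])
bound_2 = np.array([1,  1, -1, -1])

# Element-wise sum, then bipolar normalization.
# Ties (sum == 0) go to +1 per the non-negative rule.
total = bound_1 + bound_2          # [ 2  0  0 -2]
composite = np.where(total >= 0, 1, -1)
print(composite)  # [ 1  1  1 -1]
```

Note the two tied positions: the sums are 0, and the non-negative rule sends both to +1.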
Implement the function encode_row_hypervector(row, dim, random_seeds) that:
1. For each feature in the row:
   - Generates the feature-name hypervector using random_seeds[feature_name] as the seed
   - Generates the value hypervector (using hash(value) + seed as the combined seed)
   - Binds the name and value hypervectors via element-wise multiplication
2. Bundles all bound hypervectors into a single composite hypervector
3. Returns the final composite hypervector as a numpy array of shape (dim,)
To generate a deterministic bipolar hypervector of dimension dim with a given seed:
1. Set numpy random seed to the given seed value
2. Generate dim random values from uniform distribution [0, 1)
3. Map each value to bipolar: value < 0.5 → -1, value ≥ 0.5 → +1
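The generation recipe above plus the bind/bundle pipeline can be sketched end to end as follows. This is a minimal sketch, not a reference solution: `np.random.seed` only accepts values in [0, 2**32), so the combined seed is reduced modulo 2**32 here (an assumption, not part of the specification), and Python's built-in `hash` of strings is salted per process unless `PYTHONHASHSEED` is fixed, so outputs are only reproducible within one run.

```python
import numpy as np

def generate_hypervector(dim, seed):
    """Deterministic bipolar hypervector: uniform [0, 1) draws mapped to ±1."""
    # np.random.seed requires a value in [0, 2**32); the modulo is an assumption
    np.random.seed(seed % (2**32))
    values = np.random.random(dim)
    return np.where(values < 0.5, -1, 1)

def encode_row_hypervector(row, dim, random_seeds):
    """Encode a dict of feature -> value into one composite bipolar hypervector."""
    bound_hvs = []
    for feature_name, value in row.items():
        seed = random_seeds[feature_name]
        name_hv = generate_hypervector(dim, seed)
        value_hv = generate_hypervector(dim, hash(value) + seed)
        bound_hvs.append(name_hv * value_hv)   # bind: element-wise product
    total = np.sum(bound_hvs, axis=0)          # bundle: element-wise sum
    return np.where(total >= 0, 1, -1)         # bipolar normalization
```

Within a single process the encoding is fully deterministic: the same row, dimension, and seeds always yield the same hypervector.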
row = {'FeatureA': 'value1', 'FeatureB': 'value2'}
dim = 5
random_seeds = {'FeatureA': 42, 'FeatureB': 7}

Expected output: [1, -1, 1, -1, 1]

Step-by-step encoding process:
1. Process 'FeatureA':
2. Process 'FeatureB':
3. Bundle all bound hypervectors:
Result: [1, -1, 1, -1, 1] — a compact 5-dimensional representation encoding both features and their values.
row = {'Color': 'Red'}
dim = 8
random_seeds = {'Color': 100}

Expected output: [1, -1, -1, -1, -1, -1, 1, 1]

Single feature encoding:
With only one feature, the encoding process is straightforward:
1. Generate hypervector for "Color" (seed 100)
2. Generate hypervector for "Red" (seed = hash("Red") + 100)
3. Bind the name and value hypervectors via element-wise multiplication
4. No bundling required since there's only one bound hypervector
5. Normalize to bipolar values
The result directly captures the association between the feature "Color" and its value "Red" in an 8-dimensional hypervector.
row = {'Height': '170', 'Weight': '65', 'Age': '25'}
dim = 10
random_seeds = {'Height': 1, 'Weight': 2, 'Age': 3}

Expected output: [-1, 1, -1, -1, 1, -1, 1, 1, 1, -1]

Multi-feature encoding with bundling:
1. Process each feature-value pair:
2. Bundle the three bound hypervectors:
3. Result: A 10-dimensional bipolar hypervector that holistically represents this data record with height 170, weight 65, and age 25.
Key insight: The bundled representation preserves approximate similarity — records with similar feature values will produce similar hypervectors, enabling efficient nearest-neighbor searches and classification.
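This similarity claim can be checked with a small experiment (the vectors below are random stand-ins for bound feature-value pairs, not the problem's encoding): two bundles that share two of their three components remain measurably closer to each other than to an unrelated bundle.

```python
import numpy as np

rng = np.random.default_rng(42)
dim = 10_000

def rand_hv():
    # Random bipolar hypervector, standing in for a bound feature-value pair
    return np.where(rng.random(dim) < 0.5, -1, 1)

a, b, c, d = rand_hv(), rand_hv(), rand_hv(), rand_hv()

rec1 = np.where(a + b + c >= 0, 1, -1)   # bundle of {a, b, c}
rec2 = np.where(a + b + d >= 0, 1, -1)   # shares a and b with rec1
rec3 = np.where(rand_hv() + rand_hv() + rand_hv() >= 0, 1, -1)  # unrelated

sim_related = rec1 @ rec2 / dim      # clearly above 0 (about 0.5 in expectation)
sim_unrelated = rec1 @ rec3 / dim    # near 0
print(sim_related, sim_unrelated)
```

This graceful degradation of similarity with partial overlap is what makes bundled hypervectors usable for nearest-neighbor search.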
Constraints