In modern sequence-to-sequence models like Transformers, understanding the order of tokens in a sequence is crucial for tasks like language understanding, translation, and generation. Unlike recurrent networks that inherently process sequences step-by-step, Transformers process all tokens simultaneously through self-attention mechanisms. This creates a fundamental challenge: how does the model know which position each token occupies in the sequence?
The solution is sinusoidal position embeddings—a deterministic encoding scheme that injects positional information into the model without requiring any learned parameters. This technique was introduced in the seminal "Attention Is All You Need" paper and remains foundational to understanding modern AI architectures.
Position embeddings assign a unique, continuous vector to each position in a sequence. The sinusoidal approach uses sine and cosine functions at varying frequencies to create these vectors. The mathematical elegance lies in the fact that for any fixed offset k, the encoding PE(pos + k) can be expressed as a linear function of PE(pos), which makes it easy for the model to attend to relative positions.
For a position pos in the sequence and dimension index i within the embedding vector of size d_model, the encoding is computed as:
$$PE_{(pos, 2i)} = \sin\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$
$$PE_{(pos, 2i+1)} = \cos\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$
Where:
- pos is the position index in the sequence (0-based),
- i indexes pairs of dimensions, so 2i and 2i+1 are the even and odd dimension indices,
- d_model is the embedding dimensionality.
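The two formulas above can be implemented directly. Below is a minimal NumPy sketch (the function name and use of NumPy are assumptions; the problem does not prescribe a specific library):

```python
import numpy as np

def sinusoidal_position_embeddings(sequence_length, embedding_dim):
    """Return sinusoidal position embeddings of shape (1, sequence_length, embedding_dim)."""
    if sequence_length == 0:
        return -1  # no positions to encode (see the edge-case example below)
    positions = np.arange(sequence_length)[:, np.newaxis]         # shape (seq, 1)
    two_i = np.arange(0, embedding_dim, 2)[np.newaxis, :]         # even indices 2i, shape (1, d/2)
    angles = positions / np.power(10000.0, two_i / embedding_dim)  # pos / 10000^(2i/d_model)
    pe = np.zeros((sequence_length, embedding_dim))
    pe[:, 0::2] = np.sin(angles)  # sine on even dimensions
    pe[:, 1::2] = np.cos(angles)  # cosine on odd dimensions
    return pe[np.newaxis, :, :]   # prepend the batch dimension
```

Note that each angle is shared by a sine/cosine pair, so only d_model/2 distinct frequencies are computed.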
Implement a function that generates sinusoidal position embeddings for a given sequence length and embedding dimensionality. Your function should:
- apply sine to even dimension indices and cosine to odd dimension indices, using the formulas above,
- return the embeddings with a leading batch dimension, i.e. shape (1, sequence_length, embedding_dim),
- return -1 when sequence_length is 0, since there are no positions to encode.
This implementation forms the backbone of positional understanding in transformer-based models and is essential knowledge for anyone working with modern NLP and AI systems.
Example 1

Input: sequence_length = 2, embedding_dim = 8
Output: [[[0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0], [0.84130859375, 0.54052734375, 0.099853515625, 0.9951171875, 0.01000213623046875, 1.0, 0.0010004043579101562, 1.0]]]

For an 8-dimensional embedding with 2 positions:
Position 0 (all zeros for sine, all ones for cosine): [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
Position 1 (varying frequencies): [sin(1), cos(1), sin(0.1), cos(0.1), sin(0.01), cos(0.01), sin(0.001), cos(0.001)]
The output shape is (1, 2, 8), representing a batch of 1 with 2 positions and 8 embedding dimensions.
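Position 1 of this example can be reproduced by hand with standard-library math alone (float64 digits differ slightly from the reduced-precision values shown in the output above):

```python
import math

# Recompute the Position 1 row of the 8-dimensional example.
d_model = 8
pos = 1
row = []
for two_i in range(0, d_model, 2):
    angle = pos / (10000 ** (two_i / d_model))  # 1, 0.1, 0.01, 0.001
    row.extend([math.sin(angle), math.cos(angle)])
print([round(v, 4) for v in row])
# -> [0.8415, 0.5403, 0.0998, 0.995, 0.01, 1.0, 0.001, 1.0]
```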
Example 2

Input: sequence_length = 3, embedding_dim = 4
Output: [[[0.0, 1.0, 0.0, 1.0], [0.84130859375, 0.54052734375, 0.01000213623046875, 1.0], [0.9091796875, -0.416259765625, 0.0200042724609375, 1.0]]]

For a 4-dimensional embedding with 3 positions:
Position 0: [sin(0), cos(0), sin(0), cos(0)] = [0.0, 1.0, 0.0, 1.0]
Position 1: [sin(1), cos(1), sin(0.01), cos(0.01)] ≈ [0.8415, 0.5403, 0.0100, 1.0]
Position 2: [sin(2), cos(2), sin(0.02), cos(0.02)] ≈ [0.9093, -0.4161, 0.0200, 1.0]
Notice how higher frequency components (indices 0,1) change rapidly between positions while lower frequency components (indices 2,3) change slowly. This multi-scale encoding helps the model distinguish both nearby and distant positions.
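This multi-scale behavior can be made concrete by computing each sine channel's wavelength: the angle pos / 10000^(2i/d_model) completes one full cycle every 2·pi·10000^(2i/d_model) positions. A short sketch (using d_model = 8 to match the first example):

```python
import numpy as np

# Wavelength of each sine channel for d_model = 8: low indices oscillate
# quickly across positions, high indices vary slowly.
d_model = 8
for two_i in range(0, d_model, 2):
    wavelength = 2 * np.pi * 10000 ** (two_i / d_model)
    print(f"channel {two_i}: wavelength ≈ {wavelength:.1f} positions")
# channel 0 cycles every ~6.3 positions, while channel 6 takes ~6283 positions,
# so together the channels distinguish both nearby and distant positions.
```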
Example 3

Input: sequence_length = 0, embedding_dim = 8
Output: -1

When sequence_length is 0, there are no positions to encode. The function returns -1 to indicate invalid input, as generating embeddings for zero positions is undefined.
Constraints