The immense representational capacity of Transformer architectures has largely been focused on semantic embedding space. The rotation manifold employed by Rotary Positional Embeddings (RoPE), however, has typically been treated as a fixed, discrete structure, leaving a significant dimension of expressivity within the attention mechanism unexplored. This paper argues that treating this rotation space as a learnable, signal-conditioned dimension, akin to the introduction of the imaginary axis in complex numbers, unlocks a new degree of freedom for attention-based models.
The Untapped Orthogonal Dimension: Rotation as Dynamic Relation
Traditionally, token embeddings capture the semantic 'what' of a token. The proposed framework posits that the rotation manifold can encode the dynamic 'how': a token's relationship across time, position, and context. This orthogonal dimension, separate from semantic meaning, offers a powerful new avenue for representation learning. The paper introduces SIREN-RoPE as a concrete instantiation, populating the rotation dimension with continuous timestamps, cyclical temporal patterns, and categorical metadata through a dual-branch Sinusoidal Representation Network (SIREN).
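To make the idea concrete, the sketch below shows one way a small sinusoidal network could map a continuous signal (such as a normalized timestamp) to rotation angles that are then applied to query/key features in the standard RoPE fashion. This is a minimal, single-branch illustration under stated assumptions, not the authors' SIREN-RoPE implementation: the layer sizes, the omega_0 frequency scale, the `SirenAngleNet` and `apply_rotary` names, and the omission of the second branch for cyclical and categorical inputs are all simplifications for exposition.

```python
# Minimal sketch: signal-conditioned rotation angles applied RoPE-style.
# All names, sizes, and the single-branch structure are illustrative assumptions.
import torch
import torch.nn as nn


class SirenAngleNet(nn.Module):
    """Small sinusoidal MLP mapping a scalar conditioning signal (e.g. a
    normalized timestamp) to head_dim // 2 rotation angles, one per 2-D plane."""

    def __init__(self, head_dim: int, hidden: int = 64, omega_0: float = 30.0):
        super().__init__()
        self.omega_0 = omega_0
        self.fc1 = nn.Linear(1, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, head_dim // 2)

    def forward(self, signal: torch.Tensor) -> torch.Tensor:
        # signal: (..., 1) continuous conditioning value per token
        h = torch.sin(self.omega_0 * self.fc1(signal))
        h = torch.sin(self.omega_0 * self.fc2(h))
        return self.out(h)  # (..., head_dim // 2) angles in radians


def apply_rotary(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    """Rotate consecutive (even, odd) feature pairs of x by the given angles,
    as standard RoPE does, but with learned, signal-conditioned angles."""
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    cos, sin = angles.cos(), angles.sin()
    rot_even = x_even * cos - x_odd * sin
    rot_odd = x_even * sin + x_odd * cos
    # Re-interleave the rotated pairs back into the original feature layout.
    return torch.stack((rot_even, rot_odd), dim=-1).flatten(-2)


# Usage: condition rotations on per-token timestamps instead of integer positions.
head_dim, seq_len = 64, 8
angle_net = SirenAngleNet(head_dim)
timestamps = torch.rand(seq_len, 1)          # continuous "when" signal per token
q = torch.randn(seq_len, head_dim)           # query vectors for one attention head
q_rot = apply_rotary(q, angle_net(timestamps))
```

In this reading, the described dual-branch design would add a second branch whose inputs are cyclical encodings and embedded categorical metadata, with the two branches' outputs combined into the final per-plane angles; the rotation application itself remains identical to standard RoPE.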