The immense representational capacity of Transformer architectures has largely been focused on semantic embedding space. The rotation manifold employed by Rotary Positional Embeddings (RoPE), however, has typically been treated as a fixed, discrete structure, leaving a significant dimension of expressivity within the attention mechanism unexplored. This paper argues that treating this rotation space as a learnable, signal-conditioned dimension, akin to the introduction of the imaginary axis in complex numbers, unlocks a new degree of freedom for attention-based models.
The Untapped Orthogonal Dimension: Rotation as Dynamic Relation
Traditionally, token embeddings capture the semantic 'what' of a token. The proposed framework posits that the rotation manifold can encode the dynamic 'how': a token's relationship across time, position, and context. This orthogonal dimension, separate from semantic meaning, offers a powerful new avenue for representation learning. The paper introduces SIREN-RoPE as a concrete instantiation, populating the rotation dimension with continuous timestamps, cyclical temporal patterns, and categorical metadata through a dual-branch Sinusoidal Representation Network (SIREN).
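To make the idea concrete, the sketch below shows one way a small sinusoidal network could map a continuous signal (such as a normalized timestamp) to rotation angles that are then applied to query/key features in the standard RoPE fashion. This is a minimal, single-branch illustration under stated assumptions, not the authors' SIREN-RoPE implementation: the layer sizes, the omega_0 frequency scale, the `SirenAngleNet` and `apply_rotary` names, and the omission of the second branch for cyclical and categorical inputs are all simplifications for exposition.

```python
# Minimal sketch: signal-conditioned rotation angles applied RoPE-style.
# All names, sizes, and the single-branch structure are illustrative assumptions.
import torch
import torch.nn as nn


class SirenAngleNet(nn.Module):
    """Small sinusoidal MLP mapping a scalar conditioning signal (e.g. a
    normalized timestamp) to head_dim // 2 rotation angles, one per 2-D plane."""

    def __init__(self, head_dim: int, hidden: int = 64, omega_0: float = 30.0):
        super().__init__()
        self.omega_0 = omega_0
        self.fc1 = nn.Linear(1, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, head_dim // 2)

    def forward(self, signal: torch.Tensor) -> torch.Tensor:
        # signal: (..., 1) continuous conditioning value per token
        h = torch.sin(self.omega_0 * self.fc1(signal))
        h = torch.sin(self.omega_0 * self.fc2(h))
        return self.out(h)  # (..., head_dim // 2) angles in radians


def apply_rotary(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    """Rotate consecutive (even, odd) feature pairs of x by the given angles,
    as standard RoPE does, but with learned, signal-conditioned angles."""
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    cos, sin = angles.cos(), angles.sin()
    rot_even = x_even * cos - x_odd * sin
    rot_odd = x_even * sin + x_odd * cos
    # Re-interleave the rotated pairs back into the original feature layout.
    return torch.stack((rot_even, rot_odd), dim=-1).flatten(-2)


# Usage: condition rotations on per-token timestamps instead of integer positions.
head_dim, seq_len = 64, 8
angle_net = SirenAngleNet(head_dim)
timestamps = torch.rand(seq_len, 1)          # continuous "when" signal per token
q = torch.randn(seq_len, head_dim)           # query vectors for one attention head
q_rot = apply_rotary(q, angle_net(timestamps))
```

In this reading, the described dual-branch design would add a second branch whose inputs are cyclical encodings and embedded categorical metadata, with the two branches' outputs combined into the final per-plane angles; the rotation application itself remains identical to standard RoPE.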