#Attention Mechanisms

2 articles with this tag

Unlocking Transformer Potential Beyond Semantics

Researchers propose SIREN-RoPE, which opens up a novel 'rotation space' in Transformers for dynamic relational encoding and yields consistent performance gains with minimal overhead.

11 days ago

AI Research

MoDA: Unlocking LLM Depth Scaling

Mixture-of-Depths Attention (MoDA) tackles signal degradation in LLMs by enabling cross-layer attention, improving performance with minimal overhead.

about 2 months ago