Steering LRMs Beyond Output Degradation

A new probe-based method, FPCG, distinguishes prediction features from detection features to steer large reasoning models precisely with minimal degradation of output quality.


Deployed large reasoning models (LRMs) frequently exhibit unpredictable behaviors, a challenge that test-time steering methods attempt to address. However, existing approaches often degrade output quality because they rely on internal features that detect already generated text rather than features that predict future behavior.

Visual TL;DR: LRM output degradation is the problem with existing steering methods. Their flaw is conflating detection with prediction. The solution is activation probes, which demonstrate prediction of future behavior and enable the FPCG method, which in turn achieves minimal quality degradation.

  1. LRM Output Degradation: deployed large reasoning models often exhibit unpredictable behaviors
  2. Existing Steering Methods: rely on internal features detecting already generated text
  3. Detection vs Prediction: distinguishing features that signal existing vs future behavior
  4. Activation Probes: trained to forecast future behavior likelihoods from intermediate steps
  5. Predicting Future Behavior: probes predict the most probable behavior with 64% to 91% accuracy
  6. FPCG Method: future probe controlled generation enables precise steering
  7. Minimal Quality Degradation: precise steering of large reasoning models with minimal output quality degradation

Unmasking Prediction Features for Control

The core contribution of Kortukov, Komorowski, and colleagues in their arXiv preprint is a distinction between detection features and prediction features within LRM hidden states. Prior steering techniques inadvertently relied on features that signal behavior already present in the text, which prove to be poor indicators of future behavior. The paper introduces activation probes trained to forecast the likelihood of future behaviors from intermediate reasoning steps. These probes predict the most probable behavior with accuracy ranging from 64% to 91%, revealing a distinct set of internal prediction features.
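To make the idea of an activation probe concrete, here is a minimal sketch, not the authors' implementation. It assumes a binary logistic probe trained on synthetic vectors that stand in for hidden states, with labels marking whether a behavior appears later in the trace. The names `train_probe`, `probe_score`, and `make_synthetic_data` are hypothetical, and the real probes may be multi-class and differently trained.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def probe_score(weights, bias, activation):
    # Probability that the behavior occurs LATER, given the current activation.
    return sigmoid(sum(w * a for w, a in zip(weights, activation)) + bias)


def train_probe(activations, future_labels, lr=0.1, epochs=200):
    # Full-batch gradient descent on the logistic loss.
    dim, n = len(activations[0]), len(activations)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(activations, future_labels):
            err = probe_score(weights, bias, x) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def make_synthetic_data(n, dim, seed):
    # Assumption: stand-in "hidden states" that shift along a fixed direction
    # when the future behavior label is 1, and the opposite way when it is 0.
    rng = random.Random(seed)
    direction = [1.0 if i % 2 == 0 else -1.0 for i in range(dim)]
    data, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        shift = 1.0 if y == 1 else -1.0
        data.append([rng.gauss(0.0, 1.0) + shift * d for d in direction])
        labels.append(y)
    return data, labels


def accuracy(weights, bias, activations, labels):
    hits = sum((probe_score(weights, bias, x) > 0.5) == (y == 1)
               for x, y in zip(activations, labels))
    return hits / len(labels)


if __name__ == "__main__":
    train_x, train_y = make_synthetic_data(200, 8, seed=0)
    w, b = train_probe(train_x, train_y)
    test_x, test_y = make_synthetic_data(100, 8, seed=1)
    print("held-out accuracy:", accuracy(w, b, test_x, test_y))
```

In a real setting the synthetic vectors would be replaced by activations captured at intermediate reasoning steps, and the label would come from behavior observed after the step at which the activation was recorded. That timing is what would make the probe a predictor rather than a detector.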


Future Probe Controlled Generation: Precision Steering

Building on these prediction features, the authors propose Future Probe Controlled Generation (FPCG). This text-level steering method samples multiple candidate sentences and selects the best one according to a probe's prediction of future behavior likelihood. FPCG steers large reasoning models precisely with little degradation in output quality, an improvement over previous methods. It also works in steering scenarios where activation steering methods fail, which suggests robustness and broader applicability.
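The sample-then-select loop described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's code: `fpcg_step`, `fpcg_generate`, `toy_sampler`, `toy_probe`, and `TOY_SCORES` are hypothetical names, the sampler stands in for the model's sentence generator, and the toy probe is a lookup table rather than a probe over hidden states.

```python
import random
from typing import Callable, List

Sampler = Callable[[List[str]], str]   # prefix sentences -> one candidate sentence
Probe = Callable[[List[str]], float]   # sentences so far -> P(target behavior later)


def fpcg_step(prefix: List[str], sample_sentence: Sampler,
              future_probe: Probe, num_candidates: int = 4) -> str:
    # Sample several candidate next sentences, then keep the one whose
    # continuation the probe rates most likely to lead to the target behavior.
    candidates = [sample_sentence(prefix) for _ in range(num_candidates)]
    return max(candidates, key=lambda s: future_probe(prefix + [s]))


def fpcg_generate(prompt: str, sample_sentence: Sampler, future_probe: Probe,
                  num_sentences: int = 3, num_candidates: int = 4) -> List[str]:
    text = [prompt]
    for _ in range(num_sentences):
        text.append(fpcg_step(text, sample_sentence, future_probe, num_candidates))
    return text


# Toy stand-ins (assumptions for illustration only).
TOY_SCORES = {
    "Let me double-check the result.": 0.9,
    "I will guess the answer.": 0.2,
    "Moving on.": 0.5,
}
_rng = random.Random(0)


def toy_sampler(prefix: List[str]) -> str:
    # Stand-in for sampling one sentence from the model given the prefix.
    return _rng.choice(list(TOY_SCORES))


def toy_probe(text: List[str]) -> float:
    # Stand-in for a probe; a real one would read the model's activations.
    return TOY_SCORES.get(text[-1], 0.5)


if __name__ == "__main__":
    for sentence in fpcg_generate("Solve 17 * 23.", toy_sampler, toy_probe):
        print(sentence)
```

Because selection happens over whole sentences the model itself produced, the output stays on the model's own distribution, which is consistent with the article's claim of little quality degradation. The cost is roughly `num_candidates` times more sampling per sentence.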

© 2026 StartupHub.ai. All rights reserved.