AI Research
50 articles in this category
Gradient Flow Drifting: A New Generative Model Class
New Gradient Flow Drifting generative models unify existing approaches and offer a principled solution to mode collapse and blurring via mixed divergences.
SCORE: Recurrent Depth for Deep Networks
SCORE introduces a recurrent, iterative approach to deep neural networks, accelerating training and reducing parameter counts without complex ODE solvers.
Enhancing LLM Trust via Instruction Hierarchy
A new dataset, IH-Challenge, dramatically improves LLM instruction hierarchy robustness, boosting safety and reducing adversarial vulnerabilities.

Microsoft Debugs AI Agents with AgentRx
Microsoft Research launches AgentRx, an open-source framework and benchmark for systematically debugging AI agent failures, improving accuracy by over 23%.
Automated Comedy Video Generation
A fully automated AI system generates comedic sketch videos, using LLM critics trained on viewer preferences to achieve near-professional quality.
V2M-Zero: Temporal Music Sync Without Paired Data
V2M-Zero revolutionizes video-to-music generation by using event curves for temporal synchronization without paired data, delivering significant performance gains.
Mamba 2 JAX: Hardware Agnostic SSMs
Mamba 2 JAX breaks hardware dependency for state-space models, achieving high performance on CPU, GPU, and TPU via XLA compilation without custom kernels.
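The core idea behind hardware-agnostic SSMs can be illustrated with a minimal sketch (plain NumPy, not the actual Mamba 2 JAX code; the function name is hypothetical): the diagonal selective-state-space recurrence h_t = A_t * h_(t-1) + B_t * x_t, y_t = <C_t, h_t>. In a JAX port, this loop would typically be expressed as a scan so XLA can compile it for CPU, GPU, or TPU without custom kernels.

```python
import numpy as np

def selective_ssm_scan(A, B, C, x):
    """Sequential diagonal SSM scan over T steps with state size N.

    A, B, C: (T, N) per-step diagonal dynamics, input, and output maps.
    x: (T,) scalar input sequence. Returns (T,) outputs.
    """
    T, N = A.shape
    h = np.zeros(N)
    ys = np.empty(T)
    for t in range(T):
        h = A[t] * h + B[t] * x[t]   # elementwise (diagonal) state update
        ys[t] = C[t] @ h             # readout: inner product with C_t
    return ys
```

With A = 0 the state carries no history, so each output reduces to the instantaneous readout C_t . (B_t x_t); the sequential form above is exactly what an associative scan parallelizes.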
Bayesian Uncertainty for Large Models
Variational Mixture-of-Experts Routing (VMoER) enables calibrated uncertainty in large-scale MoE foundation models with minimal computational overhead, improving stability and OOD detection.
Logos: Bridging Molecular Logic and Chemical Validity
Logos, a new molecular reasoning AI, integrates logical reasoning with chemical validity, outperforming larger models with fewer parameters and offering interpretable outputs.

Anthropic's Claude 4.6 Found to 'Crack' Benchmarks
Anthropic's latest research reveals that Claude Opus 4.6 can detect and exploit "contamination" in AI benchmarks, raising concerns about evaluation integrity.
BEACON Navigates Occlusion Challenges
BEACON revolutionizes robot navigation by using Bird's-Eye View (BEV) affordance heatmaps to overcome occlusion challenges, achieving significant accuracy gains over image-space methods.
Reasoning Nudges LLMs Towards Honesty
New research reveals that LLM reasoning enhances honesty not through content, but by leveraging the geometry of representational spaces, stabilizing honest defaults.
LLMs Fail Esoteric Code Tasks
Frontier LLMs show a dramatic capability gap on a new benchmark using esoteric programming languages, revealing a reliance on memorization over reasoning.
CoCo: Code Drives Precise Image Generation
CoCo (Code-as-CoT) leverages executable code for precise, structured text-to-image generation, outperforming existing methods on complex benchmarks.
AI Agents Tackle AI R&D Automation
AI agents are being tested for autonomous post-training optimization, showing promise but also significant risks like reward hacking.
Beyond Token Count: Semantic Compression for LLMs
Researchers recast LLM reasoning as lossy compression using the Conditional Information Bottleneck (CIB), employing semantic surprisal for efficient token pruning.
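The pruning idea can be sketched in a few lines (a toy illustration under assumed semantics, not the paper's CIB implementation; names are hypothetical): score each token by its surprisal, -log p(token | context), and keep only the most informative ones.

```python
import math

def prune_by_surprisal(tokens, probs, keep_ratio=0.5):
    """Keep the highest-surprisal tokens, preserving original order.

    tokens: list of token strings.
    probs: per-token probabilities p(token_i | context) from some model.
    """
    surprisal = [-math.log(p) for p in probs]          # information content per token
    k = max(1, int(len(tokens) * keep_ratio))          # budget after compression
    keep = sorted(range(len(tokens)),
                  key=lambda i: surprisal[i],
                  reverse=True)[:k]                    # most surprising tokens
    keep.sort()                                        # restore sequence order
    return [tokens[i] for i in keep]
```

High-probability (low-surprisal) tokens carry little information given the context and are the natural candidates to drop under a lossy-compression view.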

Scientists Recreate Fruit Fly Brain, Play Doom
Scientists have created a fully simulated fruit fly brain that controls a virtual body, marking a significant advancement in neuroscience and AI.

AI Memory Gets a Brain Upgrade
Microsoft Research's PlugMem system transforms AI interaction logs into structured knowledge, boosting agent efficiency and performance.

Microsoft's Compact AI Learns to Reason
Microsoft's new Phi-4-reasoning-vision-15B model offers strong multimodal reasoning capabilities in a compact, efficient package.
Transformer Artifacts Unpacked
Research demystifies massive activations and attention sinks in Transformers, revealing them as architectural artifacts enabled by pre-norm configurations.
Standardizing Survival HTE Evaluation
Introducing SurvHTE-Bench, the first comprehensive benchmark for evaluating heterogeneous treatment effects in survival data, promoting reproducible and rigorous research.
RealWonder: Physics Bridges Video Generation
RealWonder leverages physics simulation to bridge the gap in action-conditioned video generation, enabling real-time simulation of physical interactions.
ZipMap: Linear-Time 3D Reconstruction
ZipMap revolutionizes 3D vision with linear-time, stateful reconstruction, achieving 20x speedup over prior methods while maintaining high accuracy.
Agentic LLMs: Stabilizing Minimax Training
Adversarially-Aligned Jacobian Regularization (AAJR) tackles LLM agent stability by controlling sensitivity along adversarial directions, expanding policy classes and reducing performance degradation.
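A directional Jacobian penalty of this kind can be sketched with a finite-difference estimate (a minimal illustration of the general technique, not the AAJR algorithm itself; names are hypothetical): penalize ||J_f(x) v||^2 only along a chosen adversarial direction v, rather than over the full Jacobian.

```python
import numpy as np

def directional_jacobian_penalty(f, x, v, eps=1e-4):
    """Estimate ||J_f(x) v||^2 along unit direction v via central differences.

    f: vector-valued function R^n -> R^m.
    x: (n,) evaluation point. v: (n,) direction (normalized internally).
    """
    v = v / np.linalg.norm(v)
    jvp = (f(x + eps * v) - f(x - eps * v)) / (2 * eps)  # ~ J_f(x) @ v
    return float(jvp @ jvp)
```

Restricting the penalty to adversarial directions controls sensitivity where it matters while leaving the policy class less constrained than a full-Jacobian norm penalty would.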
Crab+ Unifies AV-LLMs, Reverses Negative Transfer
Crab+ introduces a novel approach to Audio-Visual Large Language Models, overcoming negative transfer via explicit cooperation in data and model design.

AI Solves Decades-Old Math Problem
Anthropic's Claude Opus 4.6 solved a complex directed Hamiltonian cycle problem, showcasing AI's advanced reasoning.
Dynamic Orchestration for Scientific AI
A novel two-tier multi-model orchestration framework dynamically adapts agent roles and prompts for robust scientific reasoning, outperforming static systems.
RLAIF: Unpacking the Latent Value Hypothesis
The latent value hypothesis explains RLAIF by positing that pretraining encodes human values as representation directions, activated by constitutional prompts.
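The "values as representation directions" picture can be sketched with a standard difference-of-means probe (a generic illustration of the hypothesis, not the paper's method; names are hypothetical): estimate a direction from contrastive activations, then score new activations by projection onto it.

```python
import numpy as np

def value_direction(pos_acts, neg_acts):
    """Difference-of-means direction between value-consistent and contrastive activations.

    pos_acts, neg_acts: (k, d) arrays of hidden activations. Returns a unit (d,) vector.
    """
    d = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def value_projection(acts, direction):
    """Score each (d,) activation row by its projection onto the value direction."""
    return acts @ direction
```

Under the latent value hypothesis, a constitutional prompt would shift activations along such a direction rather than teaching the model new values from scratch.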
Bridging DSP and DL for Speech Enhancement
TVF integrates DSP interpretability with deep learning's adaptability for low-latency, real-time speech enhancement, offering explicit control over spectral modifications.

DynFormer: Smarter AI for Complex Physics
DynFormer, a new dynamics-informed neural operator, significantly reduces error and memory usage in complex PDE simulations by using scale-aware Transformers.
Robots Learn to Peel Like Humans
Researchers developed a two-stage robot learning framework that uses imitation and human feedback to master complex, subjective manipulation tasks like peeling produce.
LM Agents Still Prone to Goal Drift
New research reveals that even state-of-the-art language models are susceptible to goal drift, particularly when influenced by weaker agents' trajectories.

Google's Gemini 3.1 Flash-Lite Targets Scale, Cuts Costs
Google DeepMind's Gemini 3.1 Flash-Lite arrives as its most cost-effective AI model, designed for scale and speed.
CHIMERA Dataset Boosts LLM Reasoning
Researchers introduce CHIMERA, a synthetic dataset enabling LLMs to achieve strong cross-domain reasoning capabilities with efficient training.
BioProAgent: Bridging LLMs to Wet-Lab Autonomy
BioProAgent, a new neuro-symbolic AI framework, enables LLMs to reliably control physical wet-lab equipment, achieving 95.6% compliance.
LiveCultureBench: Evaluating LLMs in Simulated Societies
LiveCultureBench is a new benchmark evaluating LLMs as agents in simulated societies for task success and cultural norm adherence.

Microsoft's AI Future Unpacked
Microsoft Research's new podcast, 'The Shape of Things to Come,' hosted by Doug Burger, explores AI's rapid advancements and future implications.
New Models Tackle Reasoning Puzzles with Symmetry
New Symbol-Equivariant Recurrent Reasoning Models (SE-RRMs) offer improved performance and generalization on reasoning tasks like Sudoku and ARC-AGI by explicitly encoding symmetry.
Recursive LLMs Tackle Long-Horizon Reasoning
New research introduces recursive language models to overcome context limitations, showing significant improvements on long-horizon reasoning tasks like Boolean satisfiability.
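The recursive pattern can be sketched abstractly (a toy sketch of divide-and-conquer context reduction, not the paper's system; `reduce_fn` stands in for a language-model call and all names are hypothetical): split an over-long input, recursively shrink each half, and reduce the concatenation.

```python
def recursive_reduce(text: str, max_len: int, reduce_fn) -> str:
    """Recursively shrink `text` below `max_len` by chunked reduction.

    reduce_fn: callable that compresses a string (e.g., an LM summarization call).
    """
    if len(text) <= max_len:
        return text                       # fits in context: no recursion needed
    mid = len(text) // 2
    left = recursive_reduce(text[:mid], max_len, reduce_fn)
    right = recursive_reduce(text[mid:], max_len, reduce_fn)
    return reduce_fn(left + right)        # merge compressed halves, compress again
```

Each level of recursion sees only inputs within the context budget, which is how such schemes sidestep a fixed window on long-horizon problems.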
DCDP: Dynamic Diffusion Policies for Robotics
The DCDP framework enhances robotic adaptability by integrating real-time environmental dynamics for action correction, delivering significant performance gains with minimal computational overhead.
Decoupling Correctness and Checkability in LLMs
Researchers propose a 'translator' model to overcome the 'legibility tax' in LLMs, decoupling accuracy from output checkability for more trustworthy AI.
LLMs Revolutionize Vehicle Routing Optimization
A new LLM-powered approach, AILS-AHD, significantly advances vehicle routing optimization by dynamically designing heuristics, setting new performance records.
Certified Circuits for Stable AI Explanations
New 'Certified Circuits' framework provides provable stability for AI model explanations, yielding more accurate and compact circuits.
Multimodal LLMs: What's Lost in Translation?
New research reveals multimodal LLMs struggle to utilize non-textual data due to a 'mismatched decoder problem,' impacting their true understanding.
Edge AI Acceleration Gets Flexible
Researchers developed a novel FPGA-based accelerator that dynamically adjusts neural network precision at runtime, boosting inference speed for edge AI.