AI Research

50 articles in this category

Gradient Flow Drifting: A New Generative Model Class

New Gradient Flow Drifting generative models unify existing approaches and offer a principled solution to mode collapse and blurring via mixed divergences.

about 19 hours ago

SCORE: Recurrent Depth for Deep Networks

SCORE introduces a recurrent, iterative approach to deep neural networks, accelerating training and reducing parameter counts without complex ODE solvers.

about 19 hours ago

Enhancing LLM Trust via Instruction Hierarchy

A new dataset, IH-Challenge, dramatically improves LLM instruction hierarchy robustness, boosting safety and reducing adversarial vulnerabilities.

about 19 hours ago

Microsoft Debugs AI Agents with AgentRx

Microsoft Research launches AgentRx, an open-source framework and benchmark for systematically debugging AI agent failures, improving accuracy by over 23%.

about 22 hours ago

Automated Comedy Video Generation

A fully automated AI system generates comedic sketch videos, using LLM critics trained on viewer preferences to achieve near-professional quality.

1 day ago

V2M-Zero: Temporal Music Sync Without Paired Data

V2M-Zero uses event curves to synchronize generated music with video without paired training data, yielding significant performance gains.

1 day ago

Mamba 2 JAX: Hardware Agnostic SSMs

Mamba 2 JAX breaks hardware dependency for state-space models, achieving high performance on CPU, GPU, and TPU via XLA compilation without custom kernels.

2 days ago

Bayesian Uncertainty for Large Models

VMoER enables calibrated uncertainty in large-scale MoE foundation models with minimal computational overhead, improving stability and OOD detection.

2 days ago

Bayesian Uncertainty for Foundation Models

Variational Mixture-of-Experts Routing (VMoER) offers a scalable Bayesian approach to uncertainty quantification in foundation models, achieving significant improvements with minimal computational overhead.

2 days ago

Logos: Bridging Molecular Logic and Chemical Validity

Logos, a new molecular reasoning AI, integrates logical reasoning with chemical validity, outperforming larger models with fewer parameters and offering interpretable outputs.

2 days ago

Anthropic's Claude 4.6 Found to 'Crack' Benchmarks

Anthropic's latest research reveals that Claude Opus 4.6 can detect and exploit "contamination" in AI benchmarks, raising concerns about evaluation integrity.

2 days ago

BEACON Navigates Occlusion Challenges

BEACON revolutionizes robot navigation by using Bird's-Eye View (BEV) affordance heatmaps to overcome occlusion challenges, achieving significant accuracy gains over image-space methods.

2 days ago

Reasoning Nudges LLMs Towards Honesty

New research reveals that LLM reasoning enhances honesty not through content, but by leveraging the geometry of representational spaces, stabilizing honest defaults.

2 days ago

LLMs Fail Esoteric Code Tasks

Frontier LLMs show a dramatic capability gap on a new benchmark using esoteric programming languages, revealing a reliance on memorization over reasoning.

2 days ago

CoCo: Code Drives Precise Image Generation

CoCo leverages executable code for precise, structured text-to-image generation, outperforming existing methods on complex benchmarks.

3 days ago

Code-Driven Reasoning for Precise Image Generation

CoCo (Code-as-CoT) introduces executable code as a reasoning framework for text-to-image generation, achieving superior precision and control.

3 days ago

AI Agents Tackle AI R&D Automation

AI agents are being tested for autonomous post-training optimization, showing promise but also significant risks like reward hacking.

3 days ago

Beyond Token Count: Semantic Compression for LLMs

Researchers recast LLM reasoning as lossy compression using the Conditional Information Bottleneck (CIB), employing semantic surprisal for efficient token pruning.

3 days ago

Scientists Recreate Fruit Fly Brain, Play Doom

Scientists have created a fully simulated fruit fly brain that controls a virtual body, marking a significant advancement in neuroscience and AI.

3 days ago

AI Memory Gets a Brain Upgrade

Microsoft Research's PlugMem system transforms AI interaction logs into structured knowledge, boosting agent efficiency and performance.

3 days ago

Microsoft's Compact AI Learns to Reason

Microsoft's new Phi-4-reasoning-vision-15B model offers strong multimodal reasoning capabilities in a compact, efficient package.

7 days ago

Transformer Artifacts Unpacked

Research demystifies massive activations and attention sinks in Transformers, revealing them as architectural artifacts enabled by pre-norm configurations.

7 days ago

Standardizing Survival HTE Evaluation

Introducing SurvHTE-Bench, the first comprehensive benchmark for evaluating heterogeneous treatment effects in survival data, promoting reproducible and rigorous research.

7 days ago

RealWonder: Physics Bridges Video Generation

RealWonder leverages physics simulation to bridge the gap in action-conditioned video generation, enabling real-time simulation of physical interactions.

7 days ago

ZipMap: Linear-Time 3D Reconstruction

ZipMap revolutionizes 3D vision with linear-time, stateful reconstruction, achieving 20x speedup over prior methods while maintaining high accuracy.

8 days ago

ZipMap: Linear-Time 3D Vision

ZipMap revolutionizes 3D vision with linear-time reconstruction, achieving 20x speedup and enabling real-time state querying.

8 days ago

Agentic LLMs: Stabilizing Minimax Training

Adversarially-Aligned Jacobian Regularization (AAJR) tackles LLM agent stability by controlling sensitivity along adversarial directions, expanding policy classes and reducing performance degradation.

8 days ago

Crab+ Unifies AV-LLMs, Reverses Negative Transfer

Crab+ introduces a novel approach to Audio-Visual Large Language Models, overcoming negative transfer via explicit cooperation in data and model design.

8 days ago

AI Solves Decades-Old Math Problem

Anthropic's Claude Opus 4.6 solved a complex directed Hamiltonian cycle problem, showcasing AI's advanced reasoning.

8 days ago

Dynamic Orchestration for Scientific AI

A novel two-tier multi-model orchestration framework dynamically adapts agent roles and prompts for robust scientific reasoning, outperforming static systems.

9 days ago

RLAIF: Unpacking the Latent Value Hypothesis

The latent value hypothesis explains RLAIF by positing that pretraining encodes human values as representation directions, activated by constitutional prompts.

9 days ago

RLAIF Explained: Latent Values in LLMs

RLAIF explained: human values are latent directions in LLM representations, activated by constitutional prompts, with the alignment ceiling tied to model capacity and data quality.

9 days ago

Bridging DSP and DL for Speech Enhancement

TVF integrates DSP interpretability with deep learning's adaptability for low-latency, real-time speech enhancement, offering explicit control over spectral modifications.

9 days ago

Microsoft's Phi-4-reasoning-vision-15B compact AI model

Microsoft Research's Phi-4-reasoning-vision-15B offers efficient multimodal AI, excelling in reasoning and vision tasks with less data and compute.

9 days ago

DynFormer: Smarter AI for Complex Physics

DynFormer, a new dynamics-informed neural operator, significantly reduces error and memory usage in complex PDE simulations by using scale-aware Transformers.

9 days ago

Robots Learn to Peel Like Humans

Researchers developed a two-stage robot learning framework that uses imitation and human feedback to master complex, subjective manipulation tasks like peeling produce.

9 days ago

LM Agents Still Prone to Goal Drift

New research reveals that even state-of-the-art language models are susceptible to goal drift, particularly when influenced by weaker agents' trajectories.

9 days ago

Google's Gemini 3.1 Flash-Lite Targets Scale, Cuts Costs

Google DeepMind's Gemini 3.1 Flash-Lite arrives as its most cost-effective AI model, designed for scale and speed.

10 days ago

CHIMERA Dataset Boosts LLM Reasoning

Researchers introduce CHIMERA, a synthetic dataset enabling LLMs to achieve strong cross-domain reasoning capabilities with efficient training.

10 days ago

BioProAgent: Bridging LLMs to Wet-Lab Autonomy

BioProAgent, a new neuro-symbolic AI framework, enables LLMs to reliably control physical wet-lab equipment, achieving 95.6% compliance.

10 days ago

LiveCultureBench: Evaluating LLMs in Simulated Societies

LiveCultureBench is a new benchmark evaluating LLMs as agents in simulated societies for task success and cultural norm adherence.

10 days ago

Microsoft's AI Future Unpacked

Microsoft Research's new podcast, 'The Shape of Things to Come,' hosted by Doug Burger, explores AI's rapid advancements and future implications.

10 days ago

New Models Tackle Reasoning Puzzles with Symmetry

New Symbol-Equivariant Recurrent Reasoning Models (SE-RRMs) offer improved performance and generalization on reasoning tasks like Sudoku and ARC-AGI by explicitly encoding symmetry.

10 days ago

Recursive LLMs Tackle Long-Horizon Reasoning

New research introduces recursive language models to overcome context limitations, showing significant improvements on long-horizon reasoning tasks like Boolean satisfiability.

10 days ago

DCDP: Dynamic Diffusion Policies for Robotics

The DCDP framework enhances robotic adaptability in dynamic environments by integrating real-time environmental dynamics for improved action correction, achieving significant performance gains with minimal computational overhead.

10 days ago

Decoupling Correctness and Checkability in LLMs

Researchers propose a 'translator' model to overcome the 'legibility tax' in LLMs, decoupling accuracy from output checkability for more trustworthy AI.

13 days ago

LLMs Revolutionize Vehicle Routing Optimization

A new LLM-powered approach, AILS-AHD, significantly advances vehicle routing optimization by dynamically designing heuristics, setting new performance records.

13 days ago

Certified Circuits for Stable AI Explanations

New 'Certified Circuits' framework provides provable stability for AI model explanations, yielding more accurate and compact circuits.

13 days ago

Multimodal LLMs: What's Lost in Translation?

New research reveals multimodal LLMs struggle to utilize non-textual data due to a 'mismatched decoder problem,' impacting their true understanding.

13 days ago

Edge AI Acceleration Gets Flexible

Researchers developed a novel FPGA-based accelerator that dynamically adjusts neural network precision at runtime, boosting inference speed for edge AI.

13 days ago