#Reinforcement Learning

45 articles with this tag

Agent-Designing Agents Emerge
AI Research

Memento-Skills introduces an agent-designing agent that autonomously creates and refines specialized LLM agents through skill evolution, bypassing core LLM retraining.

about 5 hours ago
OS-Themis: Scalable Rewards for GUI Agents
AI Research

OS-Themis revolutionizes GUI agent training with a scalable, milestone-based critic framework and OGRBench, achieving significant performance uplifts.

1 day ago
OS-Themis: Scalable Rewards for Robust RL
AI Research

OS-Themis, a new multi-agent critic framework, revolutionizes GUI agent training by providing scalable, accurate rewards through milestone decomposition and evidence auditing.

1 day ago
Enhancing LLM Trust via Instruction Hierarchy
AI Research

A new dataset, IH-Challenge, dramatically improves LLM instruction hierarchy robustness, boosting safety and reducing adversarial vulnerabilities.

9 days ago
Databricks Buys Quotient AI
Technology

Databricks acquires Quotient AI to enhance AI agent reliability and performance in production environments, integrating its evaluation technology into key products.

10 days ago
Databricks' KARL Cuts Agent Costs
Technology

Databricks' new KARL AI agent drastically cuts costs and latency for enterprise knowledge tasks using custom reinforcement learning.

16 days ago
RLAIF: Unpacking the Latent Value Hypothesis
AI Research

The latent value hypothesis explains RLAIF by positing that pretraining encodes human values as representation directions, activated by constitutional prompts.

17 days ago
RLAIF Explained: Latent Values in LLMs
AI Research

RLAIF explained: human values are latent directions in LLM representations, activated by constitutional prompts, with the alignment ceiling tied to model capacity and data quality.

17 days ago
AI Governance: Optimization's Normative Limits
AI Research

A new paper on arXiv argues that optimization-based AI systems, including LLMs trained with RLHF, are formally incapable of normative governance because of inherent structural limitations.

21 days ago
AI Agents Learn to Cooperate Without Rules
AI Research

Google researchers propose a simpler way for AI agents to cooperate: train them against diverse opponents, leveraging in-context learning to drive mutual cooperation through 'extortion' dynamics.

about 1 month ago
AI Learns Faster by Predicting the Future
AI Research

AI learns faster with Predictive Inverse Dynamics Models (PIDMs) by forecasting future states, making imitation learning more data-efficient than traditional methods.

about 1 month ago
RL Fixes Overfitting in AI Radiology Reports
AI Research

Microsoft Research’s UniRG framework uses reinforcement learning guided by clinical error signals to achieve state-of-the-art performance in AI radiology reports.

about 2 months ago
Argos Framework Delivers Grounded AI Reasoning
AI Research

Argos is an agentic verification framework that fundamentally changes reinforcement learning by rewarding models only for grounded reasoning based on verifiable evidence.

2 months ago
DeepMind, OpenAI Vets Launch humans& for Human-Centered AI
Funding Round

A new frontier AI lab, humans&, launched by veterans of DeepMind and OpenAI, aims to pivot the industry toward truly human-centered AI focused on collaboration and trust.

2 months ago
Uniqueness-Aware RL stops LLMs from getting lazy
AI Research

Uniqueness-Aware RL prevents LLMs from converging on a single solution path by explicitly rewarding correct answers that employ rare problem-solving strategies.

2 months ago
Poolside’s Full-Stack Bet: Building AGI Agents from Data Centers to Code Completion
AI Video

3 months ago
ChatGPT prompt injection is so bad they built an AI attacker
Technology

3 months ago
LLM Agent Reinforcement Learning Gets Practical
AI Research

3 months ago
OpenAI Unveils Agent RFT: Revolutionizing AI with Self-Improving Tool-Using Models
AI Video

3 months ago
AI Model Confessions: A New Honesty Layer
AI Research

4 months ago
NVIDIA Autonomous Driving AI Gains Human-Like Reasoning
AI Research

4 months ago
DR Tulu deep research: Open AI closes proprietary gap
AI Research

4 months ago
AlphaProof system proves its worth at the Math Olympiad
AI Research

4 months ago
OpenAI’s Agent RFT: Boosting Autonomous AI Performance Through Tailored Reinforcement Learning
AI Video

4 months ago
Kimi Linear promises to beat full attention with less memory
AI Research

5 months ago
Microsoft's ECHO Language Model learns from failure
AI Research

5 months ago
Cognition's New Bet on Fast Context Retrieval
AI Research

5 months ago
Reinforcement Fine-Tuning: Osmosis AI fine-tunes agents past FMs
Startup News

5 months ago
LoRA vs full fine-tuning: The debate is over
Startup News

6 months ago
New Fast GPU Weight Transfer Syncs Trillion-Parameter AI in 1.3s
Funding Round

Updating the brains of a massive AI model used to be a sluggish affair, often taking minutes to sync new knowledge from a training cluster to a live inference c...

6 months ago
OpenAI's Leap Towards Reasoning and Automated Discovery with GPT-5
AI Video

6 months ago
AI's Dual Realities: Hallucinations, Augmentation, and the Micro-Model Frontier
AI Video

6 months ago
Reinforcement Fine-Tuning: Elevating AI Reasoning with Grader-Driven Optimization
AI Video

6 months ago
OpenAI's Hallucination Breakthrough: A Feature, Not a Bug, and How to Fix It
AI Video

6 months ago
OpenAI’s Open-Weight GPT-OSS Challenges AI Landscape
AI Video

8 months ago
OpenAI’s AI Achieves Gold at International Math Olympiad, Unveiling Path to General Reasoning
AI Video

8 months ago
AI's Predictable Ascent: Scaling Laws Reshape the Path to Human-Level Intelligence
AI Video

8 months ago
DeepSeek's Reasoning Leap Reshapes AI Scaling Paradigms
AI Video

8 months ago
OpenAI’s New ChatGPT Agent Unifies AI Capabilities
AI Video

8 months ago
CollabLLM: Microsoft Boosts LLM AI Collaboration
Artificial Intelligence

8 months ago
Waymo's AI Shift to Generative Learning for Autonomous Adaptation
Artificial Intelligence

9 months ago
Mistral AI's New Reasoning LLMs and Over $1 Billion in Funding
Funding Round

9 months ago
Aampe Has $18M to Deploy 100 Million AI Agents with Reinforcement Learning
Funding Round

Their agents manage on the order of 15-200 billion decisions every week that determine product surface interactions.

Each AI agent learns and adapts in real time, helping its user manage their attention and make complex choices in a world of material and content abundance.

over 1 year ago
ETH Zurich Creates Deep Reinforcement Learning Based Robot that Plays Labyrinth Marble Game
Press Release

over 2 years ago