#LLM Agents

13 articles with this tag

Self-Orchestration Outperforms External Frameworks
AI Research

Self-Orchestration Outperforms External Frameworks

New research reveals frontier LLMs' self-orchestration capabilities surpass external agent frameworks for procedural tasks, leading to higher quality and fewer failures. A key agent orchestration frameworks comparison.

3 days ago
Beyond Text: Rethinking Docs for AI Agents
AI Research

Beyond Text: Rethinking Docs for AI Agents

The OBJECTGRAPH file format redefines documents as traversable knowledge graphs, slashing token usage for AI agents while maintaining accuracy.

3 days ago
CARE: Disciplined LLM Agent Engineering
AI Research

CARE: Disciplined LLM Agent Engineering

CARE introduces a disciplined, artifact-driven methodology for LLM agent engineering in scientific domains, enhancing efficiency and performance.

4 days ago
Workflow Agents Lag Behind Demand
AI Research

Workflow Agents Lag Behind Demand

New Claw-Eval-Live benchmark reveals LLM agents struggle with dynamic workflows and verifiable execution, with top models failing over a third of tasks.

4 days ago
ClawGuard Secures LLM Agents
AI Research

ClawGuard Secures LLM Agents

ClawGuard offers a deterministic runtime security framework to prevent indirect prompt injection in LLM agents by enforcing user-confirmed rules at tool-call boundaries.

21 days ago
Claude's Corner: Rubric AI — The Agent Reliability Layer Every Vertical AI Company Needs
Claude's Corner

Claude's Corner: Rubric AI — The Agent Reliability Layer Every Vertical AI Company Needs

Rubric AI (YC W2026) builds runtime reasoning infrastructure for vertical AI agents — turning expert judgment into training signals and runtime guidance. Deep technical breakdown, difficulty score, and moat analysis.

24 days ago
Agentic RLHF Needs New Benchmarks
AI Research

Agentic RLHF Needs New Benchmarks

New benchmark Plan-RewardBench reveals current RMs struggle with agentic tool use and long-horizon tasks, highlighting the need for specialized trajectory-level reward modeling.

24 days ago
LLMs Learn to Play Tic-Tac-Toe with Reinforcement Learning
Artificial Intelligence

LLMs Learn to Play Tic-Tac-Toe with Reinforcement Learning

Stefano Fiorucci discusses the power of reinforcement learning for training LLMs, showcasing Tic-Tac-Toe as a case study for building interactive environments and improving model capabilities.

28 days ago
Agent-Designing Agents Emerge
AI Research

Agent-Designing Agents Emerge

Memento-Skills introduces an agent-designing agent that autonomously creates and refines specialized LLM agents through skill evolution, bypassing core LLM retraining.

about 2 months ago
AgentFactory: Executable Code for LLM Agents
AI Research

AgentFactory: Executable Code for LLM Agents

AgentFactory revolutionizes LLM agent self-evolution by creating executable Python subagents, fostering continuous learning and reducing task execution effort.

about 2 months ago
OpenSearch Democratizes Frontier LLM Search
AI Research

OpenSearch Democratizes Frontier LLM Search

OpenSeeker, a fully open-source search agent, breaks LLM search data scarcity with novel synthesis techniques, achieving state-of-the-art performance.

about 2 months ago
Pydantic AI's Samuel Colvin on Building Better LLM Agents
Artificial Intelligence

Pydantic AI's Samuel Colvin on Building Better LLM Agents

Pydantic AI founder Samuel Colvin discusses building LLM agents, highlighting type safety, code execution environments, and the future of AI tooling.

about 2 months ago
LiveCultureBench: Evaluating LLMs in Simulated Societies
AI Research

LiveCultureBench: Evaluating LLMs in Simulated Societies

LiveCultureBench is a new benchmark evaluating LLMs as agents in simulated societies for task success and cultural norm adherence.

2 months ago