#AI Agents
50 articles with this tag

Validating AI Agents: Beyond Rigid Tests
GitHub Blog explores how to validate AI agent behavior when correctness isn't deterministic, moving beyond rigid scripts to focus on essential outcomes.

OpenAI: AI Advantage Compounds for Frontier Firms
OpenAI's B2B Signals report shows frontier firms are pulling ahead with deeper, more complex AI use and agentic workflows, not just higher message volume.

Superlinked's Filip Makraduli on Small Model Inference Infrastructure
Filip Makraduli of Superlinked discusses the critical need for robust small model inference infrastructure, highlighting Superlinked's open-source solution.

Anthropic Unleashes Finance AI Agents
Anthropic launches ten new AI agent templates for financial services, integrating with Microsoft 365 and expanding data access to automate key workflows.

OpenAI, PwC Team on Finance AI Agents
OpenAI and PwC are collaborating to build AI agents for finance departments, using OpenAI's internal finance team as a testbed for new automation and decision-making tools.

AI Agents: CLI vs. MCP for Tool Selection
IBM's Martin Keen explains how AI agents use CLI commands or the more descriptive Model Context Protocol (MCP) to select and utilize tools, highlighting the benefits of structured data for AI.

AI Engineers: Context is the New Code
Patrick Debois outlines the 'Context Development Lifecycle' for AI agents, emphasizing that 'context is the new code' and detailing the process from generation to observation.

AI Agent: From Simple Setup to Life OS
Radek Sienkiewicz of VelvetShark details the evolution of his personal AI agent, from a simple tool to a life-managing infrastructure, highlighting key principles for builders.

Chatbase CEO on AI Chatbots and Bootstrapping Growth
Chatbase CEO Yasser Elsaid shares his playbook for bootstrapping an AI agent business to $1M ARR, emphasizing self-serve, content marketing, and iterative product development.

AI Agents are Taking Over More Than Just Coding
AI agents are increasingly handling tasks beyond coding, from design to research and data management, boosting productivity and enabling 'tiny teams'.

Cursor's Agent Harness Gets Smarter
Cursor is meticulously refining its AI agent harness, focusing on dynamic context, rigorous evaluation, and model-specific customization to boost software development capabilities.

Stripe Unleashes AI Agents for Commerce
Stripe is expanding its Agentic Commerce Suite and forging partnerships with Meta and Google to enable businesses to sell directly through AI agents.

AI Agents Failures & How To Stop Them
Danilo Campagna from PostHog discusses common LLM code generation failures and strategies for improvement, focusing on context, architecture, and human error.

Cursor's AI Agents Get Worktree Boost
David Gomes of Cursor detailed the integration of Git worktrees into AI agents, enabling isolated task execution and reducing code complexity.

Sakana AI, SMBC Automate Deal Proposals
Sakana AI and SMBC Group have launched an AI application that uses multiple agents to automate proposal generation for wholesale banking clients.

OpenAI Builds Workspace Agents for Seamless Teamwork
OpenAI's latest workspace agents can automate complex tasks, integrate with tools, and boost team productivity. Learn about the 'Meeting Prep' and 'Software Review' agents.

AI Agents Are Changing E-commerce
AI agents are reshaping e-commerce by automating product research and purchases, offering efficiency gains and personalized experiences for consumers.

Stripe Projects Powers AI Database Creation
Databricks partners with Stripe Projects to enable AI agents to autonomously provision Neon Postgres databases, streamlining app development.

AI Agents Need a New Foundation
AI agents are ready, but most enterprise architectures aren't. Databricks argues for a foundational shift to transactional data infrastructure for true AI value.

Cursor SDK Lets Developers Build Agents
Cursor's new SDK allows developers to build and deploy custom AI coding agents, abstracting infrastructure complexity and offering flexible deployment options.

Public Sector AI Fraud Fight Gets Real
Public sector agencies are adopting AI to combat rising fraud, but require integrated data, governance, and workflows for effective implementation.

Healthcare Data: From Months to Minutes
Databricks and Redox cut clinical data integration times from months to minutes with natural language prompts and subsecond data streaming.

AWS and OpenAI Forge Deeper AI Partnership
AWS and OpenAI are deepening their AI collaboration, integrating OpenAI's frontier models into AWS Bedrock and launching new managed agent capabilities, signaling a major push to meet enterprise demand for advanced AI.

OpenAI Models Hit AWS Bedrock
AWS and OpenAI expand partnership, bringing OpenAI's latest models and Codex to Amazon Bedrock, alongside new managed agent capabilities.

Building Better AI Agents: The Eval Platform Challenge
Phil Hetzel of Braintrust discusses the challenges and best practices for building effective evaluation platforms for AI agents, emphasizing a systems-level approach.

AI Agents Lack Identity, Risking Enterprise Trust
Enterprises are struggling with the AI agent identity problem, a critical gap in governance and accountability that hinders trust and adoption.

AI Agents Need Human Oversight, Not Just Code
Experts Steven Sinofsky, Martin Casado, and Aaron Levie discuss the complexities of AI agent integration in enterprises, emphasizing the need for human oversight and adaptable solutions.

Kane CLI bridges AI code to verified browser actions
TestMu AI launches Kane CLI, a terminal-based tool enabling AI agents and developers to verify browser automation, aiming to close the loop between code generation and execution.

Killing the Builder's Tax for AI Apps
Tech leaders are cutting development costs and speeding up AI deployment by unifying data and applications on a single platform.

Choco Supercharges Food Supply Chains with OpenAI
Choco taps OpenAI's AI to automate millions of food orders, slashing manual work and boosting efficiency across global supply chains.

AI Agents: Beyond Chatbots with Open Source
Cedric Clyburn from Red Hat explores the evolution from chatbots to AI agents, detailing the architecture of OpenCopilot and its security implications.

Agentic AI Needs Smarter Guardrails
LangGuard's agentic workflow governance engine, powered by Databricks Lakebase, provides critical runtime control for enterprise AI deployments.

Matt Carey on AI Agents and Cloudflare's API
AI Engineer Matt Carey discusses how APIs are empowering AI agents, moving towards shared services and the future of Cloudflare's MCP.

AgentCraft: Gaming the AI Agent Workflow
Ido Salomon unveils AgentCraft, a platform that visualizes AI agent workflows using game-like interfaces, fostering human-AI collaboration and task management.

IBM Experts on Building With AI Agents
IBM experts Katie McDonald and Brianne Zavala discuss the strategic choices for implementing AI agents: build, reuse, or hybrid. They stress the importance of orchestration.

Google Cloud's AI Vision: Full Stack Approach to Agents
Google Cloud's Riyaz Habibibhai details the company's full-stack AI strategy, focusing on interoperability, governance, and simplified adoption of AI agents.

OpenAI Agents Automate Weekly Metrics Reporting
OpenAI demonstrates how its agents, like 'Tally', can automate weekly product metrics reporting by integrating with Google Sheets and ChatGPT.

OpenAI's Trove Agent Automates Third-Party Risk Analysis
OpenAI unveils Trove, a custom GPT agent that automates third-party risk analysis using web search, Google Drive, and Google Docs.

OpenAI Unveils Codex: Your AI Work Assistant
OpenAI introduces Codex, an AI agent designed to execute delegated tasks, integrate tools, and automate workflows, differentiating itself from ChatGPT.

OpenAI Unveils GPT-5.5
OpenAI launches GPT-5.5, boasting enhanced intelligence, autonomy, and speed for complex tasks, alongside advanced safety features.

OpenAI's GPT-5.5: A New Era of AI Agents
OpenAI's GPT-5.5 demo reveals an AI agent capable of complex, multi-application tasks, from coding to presentation creation.

Glif Raises Seed Funding
a16z leads seed funding for Glif, an AI creative super agent designed to unify fragmented generative AI tools into a single, directorial interface.

Kitze's AI Agent Journey: From "Life OS" to "Wolf"
AI creator Kitze reflects on his decade-long quest for a 'Life OS' powered by AI agents, detailing his journey from early to-do apps to self-hosted solutions and his vision for the future of human-computer interaction.

Garry Tan on Building with AI Agents
Garry Tan, CEO of Y Combinator, discusses the emerging era of AI agents in software development, showcasing his gstack tool and the 'conductor pattern' for efficient, AI-assisted development.

OpenAI's Slate Agent Streamlines Software Review
OpenAI introduces Slate, an AI agent designed to streamline software reviews and reduce tool sprawl in businesses.

Databricks Activates Documents with AI Agents
Databricks introduces a multi-agent workflow using AI/BI Genie and Agent Bricks to automate document data extraction and activation.

OpenAI Agents Automate Weekly Product Metrics Reporting
OpenAI agents can now automate the generation of weekly product metrics reports, integrating with Google Drive and using custom skills developed with ChatGPT.

OpenAI Unveils 'Trove': AI for Third-Party Risk Analysis
OpenAI unveils Trove, a custom ChatGPT agent designed to automate third-party risk analysis, streamlining vendor due diligence with AI-powered reporting.

OpenAI's Spark Automates Sales Tasks
OpenAI's Spark allows users to build custom AI agents, demonstrated with a sales agent automating lead research, outreach, and follow-ups.

ChatGPT Agents Automate Product Feedback Triage
Nikhil Vastravasi demonstrates how to build a 'Scout' ChatGPT agent to automate product feedback analysis and issue routing into Linear.