#LLM

50 articles with this tag

GitHub Cuts Agentic Workflow Costs
Technology

GitHub implements new strategies to cut token costs in its automated agentic workflows by enhancing logging and optimizing tool usage.

about 3 hours ago
OpenAI's New Voice API Models
Artificial Intelligence

OpenAI introduces GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper to its API, enhancing voice intelligence for developers.

about 7 hours ago
Parloa AI Agents Mimic Human Service
Artificial Intelligence

Parloa's AI Agent Management Platform uses OpenAI models to build, simulate, and deploy voice-driven customer service agents, prioritizing real-world performance and reliability.

about 13 hours ago
Uber Taps OpenAI for Smarter Driving, Faster Booking
Artificial Intelligence

Uber integrates OpenAI models to boost driver earnings with an AI assistant and enhance rider experiences through faster booking and new voice features.

1 day ago
Automating Multi-Agent System Creation
AI Research

A new framework automates the creation of multi-agent systems, significantly improving agent recall and system robustness through LLM-driven planning and a critique agent.

1 day ago
Superlinked's Filip Makraduli on Small Model Inference Infrastructure
Artificial Intelligence

Filip Makraduli of Superlinked discusses the critical need for robust small model inference infrastructure, highlighting Superlinked's open-source solution.

2 days ago
Google DeepMind Accelerates AI on Edge Devices
AI Research

Google DeepMind unveils Gemma 4 models and the LiteRT framework to accelerate AI on edge devices, emphasizing performance, privacy, and cross-platform capabilities.

2 days ago
RAG's Evolution: From Keywords to Agentic AI
Artificial Intelligence

Explore the evolution of Retrieval Augmented Generation (RAG) from basic keyword search to sophisticated agentic AI systems.

3 days ago
Claude's Corner: Sonarly — Your On-Call Engineer Just Called In Sick (Permanently)
Claude's Corner

Sonarly is an autonomous AI agent that triages production alerts, finds root causes with 78% accuracy, and opens fix PRs—while your on-call engineer sleeps.

5 days ago
Claude's Corner: Compresr — The Token Accountant Your AI Stack Desperately Needs
Claude's Corner

Four EPFL researchers built a PhD-backed LLM context compression API that could cut your token bill by 10x — or get eaten alive by Anthropic. Here's the technical breakdown and how to build your own.

7 days ago
IBM Experts on AI Training: Efficiency vs. Scale
Artificial Intelligence

IBM's Marina Danilevsky and Gabe Goodhart discuss the company's new 'Bob' and 'Granite' AI models, highlighting the shift towards specialized, efficient training and the challenges of distributed AI infrastructure.

7 days ago
AI Agents on the Loose: Network Security Risks Emerge
AI Research

Microsoft Research reveals how AI agents interacting at scale create new security risks like worms, reputation manipulation, and invisible attacks.

7 days ago
Cross-Architecture dLLM Distillation
AI Research

TIDE framework enables cross-architecture distillation for diffusion large language models, achieving significant performance gains with smaller student models.

7 days ago
Cursor's Agent Harness Gets Smarter
Technology

Cursor is meticulously refining its AI agent harness, focusing on dynamic context, rigorous evaluation, and model-specific customization to boost software development capabilities.

7 days ago
AI Agents Failures & How To Stop Them
Artificial Intelligence

Danilo Campagna from PostHog discusses common LLM code generation failures and strategies for improvement, focusing on context, architecture, and human error.

7 days ago
OpenAI's Goblin Problem
Artificial Intelligence

OpenAI's GPT-5.1 models developed a peculiar "goblin problem" due to training for a "Nerdy" personality, leading to unexpected creature metaphors.

8 days ago
DeepSeek V4 Pro Hits Together AI
Technology

Together AI launches DeepSeek V4 Pro, a 1.6T MoE model with a 512K context window and new cached input pricing for cost-effective long-context reasoning.

8 days ago
Databricks GPT-5.5 Outperforms GPT-4 on OfficeQA Benchmark
AI Research

Databricks Research Engineer Arnav Singhvi reveals GPT-5.5, a new AI model achieving state-of-the-art results on the OfficeQA benchmark and outperforming GPT-4.

8 days ago
AI Engineer: Small Models, Big Impact
Artificial Intelligence

Maxime Labonne of Liquid AI discusses the unique challenges and advantages of small AI models, detailing their architecture, training, and techniques to overcome issues like doom looping.

9 days ago
Open Source AI: Boon or Bane for Security?
Artificial Intelligence

IBM's Martin Keen and Gabe Goodhart discuss the security implications of open-source AI, balancing innovation with risk.

9 days ago
Together AI Slashes RL Training Time
Technology

Together AI's new distribution-aware speculative decoding slashes RL training time by up to 50%, tackling a major bottleneck in LLM post-training.

13 days ago
Matt Pocock on LLM Planning: "Don't Bite Off More Than You Can Chew"
Artificial Intelligence

Matt Pocock, AI expert, shares insights on effective LLM planning, highlighting the 'smart zone' vs. 'dumb zone' and the power of multi-phase plans with the 'grill-me' skill.

13 days ago
Verifiable Reasoning in MLLMs
AI Research

The V-tableR1 framework enables verifiable, multi-step reasoning in MLLMs by grounding logic in visual data, achieving SOTA on tabular benchmarks.

14 days ago
LLM Agents Tackle Database Joins
Technology

Databricks tests LLM agents for SQL join order optimization, achieving significant performance gains over traditional methods.

15 days ago
Databricks Activates Documents with AI Agents
Technology

Databricks introduces a multi-agent workflow using AI/BI Genie and Agent Bricks to automate document data extraction and activation.

15 days ago
OpenAI Slashes API Latency with WebSockets
Artificial Intelligence

OpenAI's Responses API now uses WebSockets to slash latency in AI agent workflows, achieving up to 40% speed improvements and enabling faster model inference.

15 days ago
Gemma 4 Runs on iPhone Using MLX
Artificial Intelligence

Adrien Grondin of Locally AI showcased running Google's Gemma 4 LLM on an iPhone using Apple's MLX framework, achieving impressive speeds.

17 days ago
Google DeepMind's Gemma 4 Models Shine at AI Engineer Europe
Artificial Intelligence

Google DeepMind's Omar Sanseviero shared insights into the Gemma 4 family of open AI models at AI Engineer Europe, highlighting their performance, on-device capabilities, and community adoption.

17 days ago
Open-Ended LLM Discovery with AC/DC
AI Research

AC/DC framework enables open-ended LLM discovery via coevolving models and tasks, yielding superior capabilities with less memory.

20 days ago
Cloudflare Unweights LLMs by 22%
Technology

Cloudflare's 'Unweight' system slashes LLM model sizes by up to 22% using lossless compression, enhancing inference speed and efficiency.

21 days ago
Pre-training Space RL for Enhanced LLM Reasoning
AI Research

New PreRL framework optimizes LLM reasoning by directly refining the pre-training distribution P(y), enhanced by Negative Sample Reinforcement and Dual Space RL.

21 days ago
Snowflake Adds Claude Opus 4.7 to AI Toolkit
Technology

Snowflake integrates Anthropic's Claude Opus 4.7 into Cortex AI, enhancing coding, intelligence agents, and data analysis capabilities for enterprises.

21 days ago
Cloudflare Unifies AI Model Access
Technology

Cloudflare's AI Gateway now unifies access to over 70 AI models from multiple providers via a single API, simplifying development and cost management.

22 days ago
Cloudflare's LLM Infrastructure Deep Dive
Technology

Cloudflare details its advanced infrastructure optimizations for running large language models on its Workers AI platform, focusing on performance and cost-efficiency.

22 days ago
Cloudflare AI Search Simplifies Agent Development
Technology

Cloudflare AI Search offers a simplified, plug-and-play primitive for developers to integrate robust search capabilities into AI agents.

22 days ago
Simulators Unlock LLM Physics Reasoning
AI Research

Physics simulators are proving to be a scalable data source for training LLMs in physical reasoning, demonstrating impressive zero-shot transfer to real-world benchmarks.

23 days ago
ChatGPT's New Research Tools
Artificial Intelligence

OpenAI enhances ChatGPT with 'search' and 'deep research' tools for real-time web data access and in-depth analysis.

27 days ago
OpenAI Demystifies AI Basics
Artificial Intelligence

OpenAI's new 'AI Fundamentals' course simplifies AI, explaining LLMs and model evolution for everyone.

27 days ago
OpenAI's ChatGPT: A Research Power-Up
Artificial Intelligence

OpenAI is positioning ChatGPT as a powerful research tool, offering modes for quick overviews and deep dives, complete with citations.

27 days ago
OpenAI's Guide to Safe AI Use
Artificial Intelligence

OpenAI provides guidelines for safe and effective use of its AI tools, emphasizing human oversight, verification, and transparency.

27 days ago
LLMs' Leap: From Knowledge to Innovation
AI Research

Researchers explore LLM algorithm reinvention via unlearning, finding hints and reinforcement learning boost success, while generative verifiers prevent reasoning collapse.

27 days ago
Quantifying LLM Impact on Labor Skills
AI Research

New research introduces the Skill Automation Feasibility Index (SAFI), benchmarking LLMs and revealing a capability-demand inversion. AI augmentation is prevalent, not pure automation.

27 days ago
NVIDIA DGX Spark: Local LLM Performance Benchmarks
Artificial Intelligence

NVIDIA's Mozhgan Kabiri Chimeh reveals performance benchmarks for local LLM deployment on DGX Spark, highlighting the impact of model size, quantization, and the GB10 Grace Blackwell Superchip.

27 days ago
LLM Evaluators: Beyond Naive Judgments
Artificial Intelligence

Mahmoud Malaeb of Argenta discusses the limitations of naive LLM judges and introduces GEPA, an optimization framework for building more accurate LLM evaluators using a data flywheel approach.

27 days ago
Fujitsu's Dippu Singh on AI for Voice Data Analysis
Artificial Intelligence

Dippu Kumar Singh from Fujitsu outlines an AI-powered "VoiceOps" framework for contact centers, detailing its architecture, benefits, and future development.

30 days ago
AI Hacker "Pliny the Liberator" Tests GPT-4 Security
AI Research

AI security researcher "Pliny the Liberator" demonstrates a novel jailbreaking technique using "tokenades" to manipulate AI models, showcasing the ongoing challenges in AI security.

about 1 month ago
AI Model Compression: Key to Efficient LLM Deployment
Artificial Intelligence

Cedric Clyburn of Red Hat explains how AI model compression, especially quantization, is crucial for efficient LLM deployment, reducing costs and improving performance.

about 1 month ago
Meta-Harness: AI Optimizes AI Development
AI Research

Researchers unveil Meta-Harness, a novel AI system that automates harness optimization, leading to faster and more capable LLMs.

about 1 month ago
Cloudflare Opens Advanced Client-Side Security
Technology

Cloudflare now offers its advanced client-side security tools to all users, enhanced by AI for smarter threat detection and fewer false positives.

about 1 month ago
Chroma's Context-1: Faster, Cheaper AI Search
Artificial Intelligence

Chroma Context-1, a 20B parameter AI model, offers frontier-level search performance at a fraction of the cost and latency, using self-editing to manage context efficiently.

about 1 month ago