#LLM

50 articles with this tag

GitHub Cuts Agentic Workflow Costs

GitHub implements new strategies to cut token costs in its automated agentic workflows by enhancing logging and optimizing tool usage.

about 3 hours ago

Artificial Intelligence

OpenAI's New Voice API Models

OpenAI introduces GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper to its API, enhancing voice intelligence for developers.

about 7 hours ago

Artificial Intelligence

Parloa AI Agents Mimic Human Service

Parloa's AI Agent Management Platform uses OpenAI models to build, simulate, and deploy voice-driven customer service agents, prioritizing real-world performance and reliability.

about 13 hours ago

Artificial Intelligence

Uber Taps OpenAI for Smarter Driving, Faster Booking

Uber integrates OpenAI models to boost driver earnings with an AI assistant and enhance rider experiences through faster booking and new voice features.

1 day ago

AI Research

Automating Multi-Agent System Creation

A new framework automates the creation of multi-agent systems, significantly improving agent recall and system robustness through LLM-driven planning and a critique agent.

1 day ago

Artificial Intelligence

Superlinked's Filip Makraduli on Small Model Inference Infrastructure

Filip Makraduli of Superlinked discusses the critical need for robust small model inference infrastructure, highlighting Superlinked's open-source solution.

2 days ago

AI Research

Google DeepMind Accelerates AI on Edge Devices

Google DeepMind unveils Gemma 4 models and the LiteRT framework to accelerate AI on edge devices, emphasizing performance, privacy, and cross-platform capabilities.

2 days ago

Artificial Intelligence

RAG's Evolution: From Keywords to Agentic AI

Explore the evolution of Retrieval Augmented Generation (RAG) from basic keyword search to sophisticated agentic AI systems.

3 days ago

Claude's Corner

Claude's Corner: Sonarly — Your On-Call Engineer Just Called In Sick (Permanently)

Sonarly is an autonomous AI agent that triages production alerts, finds root causes with 78% accuracy, and opens fix PRs—while your on-call engineer sleeps.

5 days ago

Claude's Corner

Claude's Corner: Compresr — The Token Accountant Your AI Stack Desperately Needs

Four EPFL researchers built a PhD-backed LLM context compression API that could cut your token bill by 10x — or get eaten alive by Anthropic. Here's the technical breakdown and how to build your own.

7 days ago

Artificial Intelligence

IBM Experts on AI Training: Efficiency vs. Scale

IBM's Marina Danilevsky and Gabe Goodhart discuss the company's new 'Bob' and 'Granite' AI models, highlighting the shift towards specialized, efficient training and the challenges of distributed AI infrastructure.

7 days ago

AI Research

AI Agents on the Loose: Network Security Risks Emerge

Microsoft Research reveals how AI agents interacting at scale create new security risks like worms, reputation manipulation, and invisible attacks.

7 days ago

AI Research

Cross-Architecture dLLM Distillation

TIDE framework enables cross-architecture distillation for diffusion large language models, achieving significant performance gains with smaller student models.

7 days ago

Technology

Cursor's Agent Harness Gets Smarter

Cursor is meticulously refining its AI agent harness, focusing on dynamic context, rigorous evaluation, and model-specific customization to boost software development capabilities.

7 days ago

Artificial Intelligence

AI Agent Failures & How to Stop Them

Danilo Campagna from PostHog discusses common LLM code generation failures and strategies for improvement, focusing on context, architecture, and human error.

7 days ago

Artificial Intelligence

OpenAI's Goblin Problem

OpenAI's GPT-5.1 models developed a peculiar "goblin problem" due to training for a "Nerdy" personality, leading to unexpected creature metaphors.

8 days ago

Technology

DeepSeek V4 Pro Hits Together AI

Together AI launches DeepSeek V4 Pro, a 1.6T MoE model with a 512K context window and new cached input pricing for cost-effective long-context reasoning.

8 days ago

AI Research

Databricks GPT-5.5 Outperforms GPT-4 on OfficeQA Benchmark

Databricks Research Engineer Arnav Singhvi reveals GPT-5.5, a new AI model achieving state-of-the-art results on the OfficeQA benchmark and outperforming GPT-4.

8 days ago

Artificial Intelligence

AI Engineer: Small Models, Big Impact

Maxime Labonne of Liquid AI discusses the unique challenges and advantages of small AI models, detailing their architecture, training, and techniques to overcome issues like doom looping.

9 days ago

Artificial Intelligence

Open Source AI: Boon or Bane for Security?

IBM's Martin Keen and Gabe Goodhart discuss the security implications of open-source AI, balancing innovation with risk.

9 days ago

Technology

Together AI Slashes RL Training Time

Together AI's new distribution-aware speculative decoding slashes RL training time by up to 50%, tackling a major bottleneck in LLM post-training.

13 days ago

Artificial Intelligence

Matt Pocock on LLM Planning: "Don't Bite Off More Than You Can Chew"

Matt Pocock, AI expert, shares insights on effective LLM planning, highlighting the 'smart zone' vs. 'dumb zone' and the power of multi-phase plans with the 'grill-me' skill.

13 days ago

AI Research

Verifiable Reasoning in MLLMs

The V-tableR1 framework enables verifiable, multi-step reasoning in MLLMs by grounding logic in visual data, achieving SOTA on tabular benchmarks.

14 days ago

Technology

LLM Agents Tackle Database Joins

Databricks tests LLM agents for SQL join order optimization, achieving significant performance gains over traditional methods.

15 days ago

Technology

Databricks Activates Documents with AI Agents

Databricks introduces a multi-agent workflow using AI/BI Genie and Agent Bricks to automate document data extraction and activation.

15 days ago

Artificial Intelligence

OpenAI Slashes API Latency with WebSockets

OpenAI's Responses API now uses WebSockets to slash latency in AI agent workflows, achieving up to 40% speed improvements and enabling faster model inference.

15 days ago

Artificial Intelligence

Gemma 4 Runs on iPhone Using MLX

Adrien Grondin of Locally AI showcased running Google's Gemma 4 LLM on an iPhone using Apple's MLX framework, achieving impressive speeds.

17 days ago

Artificial Intelligence

Google DeepMind's Gemma 4 Models Shine at AI Engineer Europe

Google DeepMind's Omar Sanseviero shared insights into the Gemma 4 family of open AI models at AI Engineer Europe, highlighting their performance, on-device capabilities, and community adoption.

17 days ago

AI Research

Open-Ended LLM Discovery with AC/DC

AC/DC framework enables open-ended LLM discovery via coevolving models and tasks, yielding superior capabilities with less memory.

20 days ago

Technology

Cloudflare Unweights LLMs by 22%

Cloudflare's 'Unweight' system shrinks LLM sizes by up to 22% using lossless compression, enhancing inference speed and efficiency.

21 days ago

AI Research

Pre-training Space RL for Enhanced LLM Reasoning

New PreRL framework optimizes LLM reasoning by directly refining the pre-training distribution P(y), enhanced by Negative Sample Reinforcement and Dual Space RL.

21 days ago

Technology

Snowflake Adds Claude Opus 4.7 to AI Toolkit

Snowflake integrates Anthropic's Claude Opus 4.7 into Cortex AI, enhancing coding, intelligence agents, and data analysis capabilities for enterprises.

21 days ago

Technology

Cloudflare Unifies AI Model Access

Cloudflare's AI Gateway now unifies access to over 70 AI models from multiple providers via a single API, simplifying development and cost management.

22 days ago

Technology

Cloudflare's LLM Infrastructure Deep Dive

Cloudflare details its advanced infrastructure optimizations for running large language models on its Workers AI platform, focusing on performance and cost-efficiency.

22 days ago

Technology

Cloudflare AI Search Simplifies Agent Development

Cloudflare AI Search offers a simplified, plug-and-play primitive for developers to integrate robust search capabilities into AI agents.

22 days ago

AI Research

Simulators Unlock LLM Physics Reasoning

Physics simulators are proving to be a scalable data source for training LLMs in physical reasoning, demonstrating impressive zero-shot transfer to real-world benchmarks.

23 days ago

Artificial Intelligence

ChatGPT's New Research Tools

OpenAI enhances ChatGPT with 'search' and 'deep research' tools for real-time web data access and in-depth analysis.

27 days ago

Artificial Intelligence

OpenAI Demystifies AI Basics

OpenAI's new 'AI Fundamentals' course simplifies AI, explaining LLMs and model evolution for everyone.

27 days ago

Artificial Intelligence

OpenAI's ChatGPT: A Research Power-Up

OpenAI is positioning ChatGPT as a powerful research tool, offering modes for quick overviews and deep dives, complete with citations.

27 days ago

Artificial Intelligence

OpenAI's Guide to Safe AI Use

OpenAI provides guidelines for safe and effective use of its AI tools, emphasizing human oversight, verification, and transparency.

27 days ago

AI Research

LLMs' Leap: From Knowledge to Innovation

Researchers explore LLM algorithm reinvention via unlearning, finding that hints and reinforcement learning boost success, while generative verifiers prevent reasoning collapse.

27 days ago

AI Research

Quantifying LLM Impact on Labor Skills

New research introduces the Skill Automation Feasibility Index (SAFI), benchmarking LLMs and revealing a capability-demand inversion. AI augmentation, rather than pure automation, is the prevalent pattern.

27 days ago

Artificial Intelligence

NVIDIA DGX Spark: Local LLM Performance Benchmarks

NVIDIA's Mozhgan Kabiri Chimeh reveals performance benchmarks for local LLM deployment on DGX Spark, highlighting the impact of model size, quantization, and the GB10 Grace Blackwell Superchip.

27 days ago

Artificial Intelligence

LLM Evaluators: Beyond Naive Judgments

Mahmoud Malaeb of Argenta discusses the limitations of naive LLM judges and introduces GEPA, an optimization framework for building more accurate LLM evaluators using a data flywheel approach.

27 days ago

Artificial Intelligence

Fujitsu's Dippu Singh on AI for Voice Data Analysis

Dippu Kumar Singh from Fujitsu outlines an AI-powered "VoiceOps" framework for contact centers, detailing its architecture, benefits, and future development.

30 days ago

AI Research

AI Hacker "Pliny the Liberator" Tests GPT-4 Security

AI security researcher "Pliny the Liberator" demonstrates a novel jailbreaking technique using "tokenades" to manipulate AI models, showcasing the ongoing challenges in AI security.

about 1 month ago

Artificial Intelligence

AI Model Compression: Key to Efficient LLM Deployment

Cedric Clyburn of Red Hat explains how AI model compression, especially quantization, is crucial for efficient LLM deployment, reducing costs and improving performance.

about 1 month ago

AI Research

Meta-Harness: AI Optimizes AI Development

Researchers unveil Meta-Harness, a novel AI system that automates harness optimization, leading to faster and more capable LLMs.

about 1 month ago

Technology

Cloudflare Opens Advanced Client-Side Security

Cloudflare now offers its advanced client-side security tools to all users, enhanced by AI for smarter threat detection and fewer false positives.

about 1 month ago

Artificial Intelligence

Chroma's Context-1: Faster, Cheaper AI Search

Chroma Context-1, a 20B parameter AI model, offers frontier-level search performance at a fraction of the cost and latency, using self-editing to manage context efficiently.

about 1 month ago