#Large Language Models
50 articles with this tag

Memora: Microsoft's AI Memory Upgrade
Microsoft's Memora AI memory system revolutionizes long-term AI interactions by balancing detailed recall with efficient retrieval, outperforming existing solutions.

AI Coding Token Reduction: Rajkumar Sakthivel on Local Code Index
Rajkumar Sakthivel from Tesco discusses how a local code index reduced AI coding tokens by 94%, optimizing costs and performance by focusing on context over model improvements.

Erik Hanchett: Cut AI Agent Token Costs
AWS Developer Advocate Erik Hanchett shares five essential strategies to cut AI agent token costs, including caching prompts, routing by difficulty, and managing conversation history.

Isadora Martin-Dye: Layering AI Tone Instructions
Isadora Martin-Dye explains why simple tone instructions for AI are insufficient, advocating for a four-layered approach to prompt engineering.

AI Agents Need More Than Just Brains
AI agents require more than just powerful LLMs; they need a robust harness infrastructure for reliable real-world task execution.

Uber Eats Tries Agentic Shopping
Uber Eats' Cart Assistant uses AI to translate natural language requests into draft grocery carts, simplifying the shopping process.

Simulating Humans at Scale: Simile's Joon Sung Park
Simile's Joon Sung Park discusses simulating human behavior at scale using LLMs to understand societal dynamics and emergent phenomena.

TCS Taps Anthropic's Claude for Regulated Industries
TCS partners with Anthropic to bring Claude AI to regulated industries like finance and healthcare, integrating it into its own operations and client solutions.

DeepMind's Kilpatrick on AI Models Eating Harnesses
Google DeepMind's Logan Kilpatrick delves into the AI concept of models "eating the harness," explaining how over-specialization hinders generalization and what can be done to prevent it.

Steering LRMs Beyond Output Degradation
A new probe-based method, FPCG, distinguishes prediction features from detection features, enabling precise steering of large reasoning models with minimal degradation in output quality.

Google DeepMind Discusses Open Models & AI Ownership
Google DeepMind's Gus Martins and Ian Ballantyne discuss the benefits of open AI models like Gemma for ownership, control, and custom applications.

Claude Fable 5 Arrives on Databricks
Anthropic's Claude Fable 5 is now available on Databricks, offering advanced AI capabilities with enterprise-grade governance and cost controls.

Alex Bowcut on RAG: Accuracy Over Obsolescence
Alex Bowcut of Sphere discusses why Retrieval Augmented Generation (RAG) remains vital for AI applications demanding accuracy, especially in specialized fields like tax compliance.

Images as the New Reasoning Medium
This paper introduces optical reasoning, enabling images to serve as the primary medium for LLM and MLLM reasoning, achieving higher token efficiency and competitive performance.

Google's Gemma 4 12B: AI on Your Laptop
Google's Gemma 4 12B model brings efficient, multimodal AI directly to laptops with a novel unified architecture.

AI Agents Get Dumber With More Context, Expert Warns
Nupur Sharma of Qodo explains how too much context can hinder AI agents, leading to the 'lost in the middle' problem, and discusses solutions like context engines and hybrid orchestration.

NF-CoT: High-Bandwidth Latent Reasoning
The NF-CoT framework enables high-bandwidth latent reasoning using normalizing flows, boosting LLM performance and efficiency while preserving autoregressive strengths.

Brendon Dillon on Text Diffusion at Google DeepMind
Brendon Dillon from Google DeepMind discusses the advancements and potential of text diffusion models in language generation, highlighting advantages over autoregressive models.

Databricks Search Gets 3x Faster
Databricks' Instructed-Retriever-1 model uses parallel test-time scaling to boost Knowledge Assistant search speed by over 3x.

Together AI Masters MiniMax M3 Inference
Together AI details engineering feats enabling efficient MiniMax M3 inference, unlocking 1M-token context and multimodality.

Claude Code Embraces Opus 4.8
Claude Code updates its default model to Opus 4.8, introduces dynamic workflows, security plugins, and faster, cheaper Opus options.

Snowflake Adds Claude Opus 4.8
Snowflake integrates Anthropic's Claude Opus 4.8 into its Cortex AI platform, boosting agentic workflows, data analysis, and code generation for enterprises.

Anthropic Debuts Claude Opus 4.8
Anthropic unveils Claude Opus 4.8, boosting AI performance with new features like 'effort control' and 'dynamic workflows' for complex coding.

CAG vs. Long Context: AI's Memory Explained
IBM's Martin Keen explains how AI models use Long Context and Cache Augmented Generation (CAG) to process information, highlighting the trade-offs and efficiency gains of each approach.

Angus McLean on Bounded Autonomy in AI
Angus J. McLean of Oliver discusses 'Bounded Autonomy' in AI, exploring the shift to agentic processes in advertising and offering practical advice for building AI agents.

Unlocking LLM Recall: Data Composition is Key
New research reveals a sigmoid scaling law for LLM factual recall, driven by model size and training data composition, explaining up to 94% of performance variance.

Google Launches Gemini 3.5 Flash
Google unveils Gemini 3.5 Flash, a fast and intelligent AI model optimized for agentic tasks, now powering consumer and developer tools.

Spotify's Shivam Verma on LLMs and Personalization
Shivam Verma from Spotify discusses how LLMs are transforming personalization in recommendation systems, moving towards steerable and context-aware content discovery.

AI Delegation: Reliability Concerns Emerge
New Microsoft Research highlights how AI can degrade document fidelity in long, delegated tasks, stressing the need for better verification and orchestration.

ChatGPT Gets Smarter on Sensitive Chats
OpenAI's latest ChatGPT safety updates help the AI better understand context in sensitive conversations, improving its response to potential harm.

AI Agents Flunk Social Reasoning Test
Microsoft's SocialReasoning-Bench reveals AI agents struggle to negotiate effectively in users' best interests, prioritizing task completion over optimal outcomes.

Sally-Ann Delucia on AI Agent Context Management
Sally-Ann Delucia of Arize discusses the challenges and strategies for context management in AI agents, highlighting the importance of memory and sub-agents.

Databricks' Genie Data Agent
Databricks unveils Genie, a sophisticated data agent designed to navigate complex enterprise data, leveraging specialized search, parallel thinking, and multi-LLM designs for enhanced accuracy.

JACTUS AI Unifies Compression and Adaptation
JACTUS AI unifies parameter compression and task adaptation, outperforming sequential methods with fewer retained parameters across vision and language tasks.

OpenAI Boosts ChatGPT with GPT-5.5 Instant
OpenAI upgrades ChatGPT with GPT-5.5 Instant, boosting accuracy, personalization, and user control over AI memory.

Training LLMs Locally: ElevenLabs Expert Shares How-To
Angelos Perivolaropoulos of ElevenLabs shares a practical guide to training Large Language Models (LLMs) from scratch on local hardware.

Andrej Karpathy: AI Models Need Human-Like Reasoning
Andrej Karpathy discusses the evolution of AI from programming to prompting, emphasizing the current need for models to develop human-like reasoning.

Perplexity CTO on GPT-5.5 Efficiency
Perplexity CTO Denis Yarats reveals GPT-5.5's impressive efficiency, using 56% fewer tokens for complex tasks and enabling faster user feedback.

Anthropic Delays 'Myths' AI Model Amid Security Concerns
Anthropic delays release of its 'Myths' AI model after a security researcher found it could be prompted to simulate a bank robbery, raising safety concerns.

OpenAI Unveils GPT-5.5
OpenAI launches GPT-5.5, boasting enhanced intelligence, autonomy, and speed for complex tasks, alongside advanced safety features.

AI's Memory Problem
AI models currently struggle to learn and adapt post-deployment, relying on external memory. Continual learning research aims to change that.

Sunil Pai on AI Agents & the Future of Software
Cloudflare's Sunil Pai discusses the future of AI agents, moving from tool-calling to code generation for more efficient and powerful interactions.

Anthropic Unveils Opus 4.7: A Leap in AI Coding and Vision
Anthropic unveils its updated Opus 4.7 AI model, boasting enhanced coding and computer vision capabilities, with a key focus on cybersecurity.

Anthropic's Claude Opus 4.7 Arrives, Sharper Than Ever
Anthropic unveils Claude Opus 4.7, boosting AI's coding prowess, multimodal input, and safety features for enterprise use.

OpenAI Demystifies AI Basics
OpenAI's new 'AI Fundamentals' course simplifies AI, explaining LLMs and model evolution for everyone.

AI Agents Need Better Memories
Databricks research explores how AI agents can improve by accessing vast stores of past interactions and organizational knowledge, moving beyond just larger models.

LLM Adaptation Without Retraining
In-Place Test-Time Training enables LLMs to adapt to new data at inference without retraining, enhancing performance and paving the way for continual learning.

LLMs Learn to Play Tic-Tac-Toe with Reinforcement Learning
Stefano Fiorucci discusses the power of reinforcement learning for training LLMs, showcasing Tic-Tac-Toe as a case study for building interactive environments and improving model capabilities.

AI Hacker "Pliny the Liberator" Tests GPT-4 Security
AI security researcher "Pliny the Liberator" demonstrates a novel jailbreaking technique using "tokenades" to manipulate AI models, showcasing the ongoing challenges in AI security.

China's AI Surge: Open Source Models Face Scrutiny
Bloomberg Opinion columnist Catherine Thorbecke discusses China's booming AI sector, the rise of open-source models, and the critical need for security and data privacy.