#LLM
50 articles with this tag

Databricks Tackles Code Complexity for AI Assistants
Databricks details how AST-based chunking and MLflow evaluation improve AI assistants' understanding of complex codebases.

Sakana AI Maps Social Media 'Cognitive Warfare'
Sakana AI's new system analyzes social media for state-sponsored 'cognitive warfare,' processing millions of posts to uncover narratives and generate hypotheses.

Mazda's GenAI Leap in Service Ops
Mazda built a governed GenAI assistant on Databricks Lakehouse in 8 weeks to improve technical service operations, integrating RAG and Unity Catalog.

Data Agents Need Context Layer
AI data agents are failing due to a lack of context. A new 'context layer' aims to provide the necessary business understanding for agents to function effectively.

OpenAI shrinks GPT-5.4 for speed
OpenAI unveils GPT-5.4 mini and nano, smaller, faster AI models optimized for coding and subagent tasks, offering improved efficiency and lower costs.

AI's Consciousness Debate
Vishal Misra and Martin Casado discuss LLM functionality, the path to AGI, and the role of data in AI development.

Perplexity's Agent API Unifies LLM Access
Perplexity's new Agent API offers a unified interface to multiple LLM providers, simplifying development with integrated search and tools.

OpenAI Buys Promptfoo
OpenAI is acquiring AI security platform Promptfoo to enhance the security, safety, and evaluation features within its Frontier platform for AI coworkers.

OpenAI Tames AI Chaos with Instruction Hierarchy
OpenAI's new IH-Challenge dataset trains AI models to prioritize instructions, enhancing safety and mitigating risks like prompt injection.

IBM's Martin Keen on LLM Context Windows
IBM's Martin Keen explains how larger context windows in LLMs simplify deployments and improve reasoning by reducing reliance on complex RAG systems.

Supermemory CEO on AI Memory: "We need to get this right"
Supermemory CEO Dhravya Shah discusses the evolution of AI memory, the company's innovative approach to personalizing AI experiences, and the critical importance of getting memory systems right for the future of AI.

AI Agents Now Do Overnight Research
An automated system uses AI agents to conduct overnight LLM training experiments, modifying code and iterating on models autonomously.

Databricks Automates PII Discovery with LLMs
Databricks deploys LogSentinel, an LLM-powered system, to automate PII detection and data governance across its platform, slashing review times and enhancing security.

OpenAI GPT-5.4: The Unified AI Powerhouse?
OpenAI unveils GPT-5.4, its most advanced AI model for professional tasks, boasting enhanced reasoning, coding, and computer interaction capabilities.

Databricks Streamlines Lakehouse Migrations
Databricks updates Lakebridge with AI-powered SQL conversion and enhanced assessment tools to simplify data warehouse migrations.

OpenAI Details GPT-5.4 Thinking Safety
OpenAI details safety measures for its new GPT-5.4 Thinking model, with a focus on high-capability cybersecurity risks.

ChatGPT 5.4 Demonstrates Enhanced Contextual Understanding
ChatGPT 5.4 demonstrates advanced contextual understanding, interpreting emotional cues and location data to provide tailored volunteer recommendations in NYC.

Databricks' KARL Cuts Agent Costs
Databricks' new KARL AI agent drastically cuts costs and latency for enterprise knowledge tasks using custom reinforcement learning.

Exa Deep: Search Agents Get Smarter
Exa Deep gets a major upgrade, functioning as an intelligent agent for complex searches with faster speeds, lower costs, and structured, grounded results.

Microsoft's Phi-4-reasoning-vision-15B compact AI model
Microsoft Research's Phi-4-reasoning-vision-15B offers efficient multimodal AI, excelling in reasoning and vision tasks with less data and compute.

Google's Interactions API Evolves Gemini
Google's new Interactions API for Gemini models offers a unified interface for complex AI tasks, supporting multimodal inputs, agents, and tool integration.

CHIMERA Dataset Boosts LLM Reasoning
Researchers introduce CHIMERA, a synthetic dataset enabling LLMs to achieve strong cross-domain reasoning capabilities with efficient training.

LLMs Lost in Transmission: Why Global Reasoning Fails
A new paper reveals that transformer LLMs struggle with complex global reasoning because of limited 'effective bandwidth,' a limitation that Chain of Thought can address.

Claude Sonnet 4.6 Ups the AI Ante
Anthropic's Claude Sonnet 4.6 launches with major upgrades in coding, reasoning, and computer use, plus a 1M token context window.

OpenClaw v2 Enhances Agent Interactions
OpenClaw Components v2 rolls out enhanced Discord interactions, nested sub-agents, and a broad range of security fixes for AI agent platforms.

PicoClaw: AI on a Shoestring Budget
PicoClaw, an ultra-lightweight AI assistant in Go, runs on $10 hardware with <10MB RAM, boasting AI-driven development and broad portability.

Karpathy's microGPT: AI's minimalist masterpiece
Andrej Karpathy's microGPT is a minimalist, dependency-free Python implementation of a GPT language model, designed as an educational art project to showcase core AI mechanics.

Any-LLM Integrates Go for Unified Model Access
Mozilla.ai releases any-llm-go, a library enabling unified access to diverse LLMs via a single Go API, simplifying integration and provider switching.

OpenAI Ads: The Inevitable Future of AI
OpenAI's introduction of ads for free users signals an inevitable monetization strategy for mass AI accessibility, mirroring the internet's ad-supported model.

Cracking OpenAI's Training Data Secrets
A novel emoji-based technique allows researchers to infer the composition of OpenAI's training data, suggesting the inclusion of reasoning traces.

Prompt Caching: Turbocharging AI Transformers
Prompt caching dramatically reduces LLM latency and costs by storing and reusing intermediate computations, making AI transformers faster for applications like chatbots.

OpenClaw Sparks App Extinction Fears
OpenClaw's system-native AI agents are poised to disrupt the software landscape, potentially rendering many existing apps obsolete by offering direct system control and emergent problem-solving.

Multilingual LLM Guardrails Tested
Researchers tested how LLM guardrails perform across languages and policy phrasings, revealing significant variations that impact AI safety assessments.

Google Explores Specific AI Overviews Website Controls
Google is exploring dedicated controls that allow websites to specifically opt out of Search generative AI features, including AI Overviews, driven by CMA regulatory pressure.

Mozilla AI Future: The Open Source Counter-Manifesto
Mozilla’s 2025/26 report confirms major investment in open source AI and privacy tech, positioning the organization against centralized Big Tech control.

Arm AI infrastructure is now the cloud default
The architectural shift to Arm AI infrastructure is accelerating, positioning it as the foundational layer for next-generation, energy-efficient AI workloads in the cloud.

Agentforce Slashes AI Latency by 70%
Agentforce achieved a 70% AI latency reduction by rearchitecting its agent runtime, consolidating four sequential LLM calls down to two and deploying specialized SLMs.

Make videos with Claude Code: Remotion AI video makes production code from plain prompts
The Remotion AI video integration with Claude Code shifts video production from manual editing to scalable, deterministic code generation via natural language instructions.

The Humans in the Loop AI Model is the Only Way to Scale
The Humans in the Loop AI framework ensures that while AI handles repetitive tasks and data, human experts provide the final layer of judgment, empathy, and strategic oversight.

Gemini for education: Google democratizes AI access
Google is strategically democratizing access to Gemini for education, embedding advanced AI capabilities into the core Workspace platform at no additional cost.

Gemini in Chrome: Google’s AI Strategy Locks Down the Classroom
The introduction of Gemini in Chrome fundamentally changes how digital content is consumed and taught by embedding generative AI directly into the browser.

X's 'For You' Feed is Now Grok AI News, Ditching Old Rules
X’s core recommendation system has eliminated all hand-engineered features, relying instead on a Grok-based transformer model for content ranking.

Brex’s Multi-Agent Network Replaces Dashboards with Executive Assistants

OpenAI’s ChatGPT ads are here, and the trust claims are thin
OpenAI is moving aggressively to monetize its massive user base, but the introduction of ChatGPT ads immediately clashes with its stated principles of trust and independence.

Cloudflare acquires Astro, betting on content and AI monetization
Cloudflare is making a dual strategic investment, acquiring the Astro web framework and the Human Native AI data marketplace to control the full content lifecycle from creation to AI monetization.

Anthropic Economic Index: AI Speeds Up Complex Tasks, But Deskills
Anthropic's new Economic Index shows that while Claude excels at complex, high-skill tasks, this capability could lead to a net deskilling effect across many professions.

No-Code AI Agents Get Reliable Consistency Controls
The Agentforce Builder platform now enables the creation of reliable no-code AI agents using plain language and deterministic conditional steps.

TranslateGemma models redefine open source translation efficiency
The new TranslateGemma models achieve state-of-the-art translation quality while requiring significantly fewer parameters than previous open source baselines.

Agentic time-series forecasting: Context is the new data.
MoiraiAgent shifts time-series forecasting from static numerical models to an intelligent, agentic framework that reasons over real-world context and selects optimal expert models.