#Large Language Models
50 articles with this tag

Tiiny AI Pocket Lab Hits $1M on Kickstarter
Tiiny AI's Pocket Lab, a personal AI supercomputer, raised over $1 million in five hours on Kickstarter, signaling demand for local AI processing.

IBM's Martin Keen on LLM Context Windows
IBM's Martin Keen explains how larger context windows in LLMs simplify deployments and improve reasoning by reducing reliance on complex RAG systems.

Microsoft's Compact AI Learns to Reason
Microsoft's new Phi-4-reasoning-vision-15B model offers strong multimodal reasoning capabilities in a compact, efficient package.
Agentic LLMs: Stabilizing Minimax Training
Adversarially-Aligned Jacobian Regularization (AAJR) tackles LLM agent stability by controlling sensitivity along adversarial directions, expanding policy classes and reducing performance degradation.
RLAIF Explained: Latent Values in LLMs
RLAIF explained: human values are latent directions in LLM representations, activated by constitutional prompts, with the alignment ceiling tied to model capacity and data quality.

OpenAI GPT-5.4 Launch Amid Intensifying AI Race
OpenAI is reportedly fast-tracking the launch of GPT-5.4, a new AI model, in response to rapid advancements from competitors like Anthropic.
OpenAI's GPT-5.3 Instant Promises Smoother AI Chat
OpenAI's GPT-5.3 Instant aims for more natural and efficient AI conversations, enhancing web searches and reducing conversational dead ends.

OpenAI's GPT-4.5 Enhances Web Search Integration
OpenAI researcher Josh discusses how GPT-4.5's web search integration is becoming more natural, conversational, and context-aware.
Recursive LLMs Tackle Long-Horizon Reasoning
New research introduces recursive language models to overcome context limitations, showing significant improvements on long-horizon reasoning tasks like Boolean satisfiability.
Decoupling Correctness and Checkability in LLMs
Researchers propose a 'translator' model to overcome the 'legibility tax' in LLMs, decoupling accuracy from output checkability for more trustworthy AI.
LLMs Revolutionize Vehicle Routing Optimization
A new LLM-powered approach, AILS-AHD, significantly advances vehicle routing optimization by dynamically designing heuristics, setting new performance records.
Multimodal LLMs: What's Lost in Translation?
New research reveals multimodal LLMs struggle to utilize non-textual data due to a 'mismatched decoder problem,' impacting their true understanding.

OpenClaw Agents: The Future of AI Autonomy?
OpenClaw Agents, powered by advanced reasoning LLMs, are poised to redefine AI autonomy and potentially disrupt current application paradigms.

OpenAI Lands $110B, Valued at $730B
OpenAI has announced a massive $110 billion funding round at a $730 billion pre-money valuation, backed by Amazon, NVIDIA, and SoftBank.

Etched Secures $500M for AI Chip Battle
Google alum Reiner Pope's startup, Etched, raises $500M to develop specialized AI chips designed to compete with Nvidia.

Intuit Taps Anthropic for AI Partnership
Intuit's stock saw a modest gain following its multi-year partnership with Anthropic, aimed at integrating custom AI agents for businesses and consumers.

Arcee Trinity Large Breaks Cover
Arcee.ai unveils Trinity Large, a 400B-parameter Mixture-of-Experts model engineered for inference efficiency and enterprise long-context use, alongside smaller variants.

Governing Agentic AI by 2026
As agentic AI trends accelerate towards 2026, robust governance frameworks encompassing identity, policy, and enforcement are crucial for safe and ethical autonomous AI deployment.

GPT-OSS-Puzzle-88B: Faster AI, Same Brains
GPT-OSS-Puzzle-88B offers substantial inference speedups for large language models without sacrificing accuracy, utilizing techniques like MoE pruning and window attention.

AI Societies' Safety Problem
Self-evolving AI societies face an impossible trilemma: achieving continuous learning, isolation, and safety alignment simultaneously.
Testing AI Guardrails Across Languages
Researchers tested context-aware AI guardrails across English and Farsi in humanitarian scenarios, finding nuanced performance differences and highlighting the need for language-specific safety evaluations.

AI Coding Tests Flawed by Infrastructure Noise
The infrastructure powering AI coding tests can significantly inflate or deflate model scores, potentially masking true capabilities and misleading deployment decisions.

Claude Opus 4.6: Smarter, Faster, and Longer Context
Anthropic's Claude Opus 4.6 launches with a 1M token context window, enhanced coding, and state-of-the-art benchmark performance.

Uniqueness-Aware RL Stops LLMs from Getting Lazy
Uniqueness-Aware RL prevents LLMs from converging on a single solution path by explicitly rewarding correct answers that employ rare problem-solving strategies.

AI’s Dual Reality: Safety Theater and the Autonomous Arms Race to AGI
“I worry a lot about the unknowns.” This sentiment, expressed by Anthropic CEO Dario Amodei, encapsulates the pervasive anxiety defining the current era of a...

NeuroDiscoveryBench Sets New Standard for Neuroscience AI Benchmarks

A Philosopher's Lens on AI's Evolving Consciousness

Anthropic Unveils Advanced APIs for Agentic AI Development
Claude.ai: Amplifying Human-AI Collaboration Through Intelligent Context and Customization
The evolving landscape of artificial intelligence increasingly points towards a future where AI serves not just as a tool, but as an integrated thinking partner...


Claude.ai's Projects Feature Elevates Enterprise AI Interaction

GPT-5.1: The Art and Science of Intelligent Personalities
"Part of the art here is figuring out how to pull out these quirks in the model that can come across as personality without breaking steerability.

Building Cursor Composer – Lee Robinson, Cursor

Claude's Research Feature Redefines Information Synthesis for Elite Professionals

OpenAI's Future Hinges on Enterprise Adoption and Sustained Funding

Meta's AI Investment Pays Off: A Clear Return Amidst the Tech Race

How OpenAI Builds for 800 Million Weekly Users: Model Specialization and Fine-Tuning

Claude's Agent Skills Unlock Granular AI Expertise

Agentic AI Rewrites the Rules for Real-Time Sports Fan Engagement

Claude Opus 4.5 Unlocks Advanced Reasoning and Efficiency
Anthropic’s latest demonstration of Claude Opus 4.5 tackling a multi-layered puzzle game reveals a profound evolution in how large language models interact with...

Context Engineering: The Graph-Powered Evolution of AI Context

The Shifting Sands of AI Supremacy: ChatGPT's Lightning Bolt Meets Gemini's Insane Leap

Gemini's Ascent: Google's Existential Challenge to OpenAI

Anthropic's Opus 4.5: Redefining AI Capabilities and Efficiency
Claude Opus 4.5 Delivers Actionable Outputs for Complex Business Tasks
The era of generative AI merely producing drafts is rapidly receding, as evidenced by the latest demonstration of Claude Opus 4.5.

