#Machine Learning
50 articles with this tag
AI Governance: Control, Not Code, Drives Success
Enterprise AI success hinges on robust governance, focusing on control and trust rather than just code, as Databricks leaders explain.

Microsoft Debugs AI Agents with AgentRx
Microsoft Research launches AgentRx, an open-source framework and benchmark for systematically debugging AI agent failures, improving accuracy by over 23%.
Databricks Serverless Simplifies Data Ops
Databricks serverless compute automates infrastructure management, boosting performance and cutting costs for data engineering workflows.
V2M-Zero: Temporal Music Sync Without Paired Data
V2M-Zero uses event curves to synchronize generated music with video without paired training data, reporting significant performance gains.
Bayesian Uncertainty for Foundation Models
Variational Mixture-of-Experts Routing (VMoER) offers a scalable Bayesian approach to uncertainty quantification in foundation models, achieving significant improvements with minimal computational overhead.
Logos: Bridging Molecular Logic and Chemical Validity
Logos, a new molecular reasoning AI, integrates logical reasoning with chemical validity, outperforming larger models with fewer parameters and offering interpretable outputs.

Databricks CEO on AI Agents and Market Trends
Databricks CEO Ali Ghodsi discusses the launch of 'Genie Code,' an AI agent for non-technical users, and the acquisition of Quotient AI to enhance AI monitoring.
OpenAI Gives Models Computer Brains
OpenAI's Responses API now integrates a computer environment, empowering AI agents with tools, file systems, and secure network access for complex workflows.
Wayfair Taps OpenAI for Catalog and Support Overhaul
Wayfair integrates OpenAI's models into its core operations, boosting product catalog accuracy and supplier support efficiency.

AI for Climate: Priya Dhawale on Data & Solutions
MIT's Priya Dhawale discusses AI's role in climate solutions, the energy cost of AI, and the need for democratization in the field.
Databricks Buys Quotient AI
Databricks acquires Quotient AI to enhance AI agent reliability and performance in production environments, integrating its evaluation technology into key products.
Databricks' Genie Code: AI for Data Work
Databricks launches Genie Code, an AI agent designed to automate and optimize complex data workflows, promising to double success rates over traditional coding agents.
Databricks Unleashes Genie Code AI
Databricks launches Genie Code, an AI agent designed to automate data tasks and significantly improve success rates in data science.
Reasoning Nudges LLMs Towards Honesty
New research reveals that LLM reasoning enhances honesty not through content, but by leveraging the geometry of representational spaces, stabilizing honest defaults.

GitHub Copilot SDK: Execution is the New Interface
GitHub's new SDK allows developers to embed AI execution and agentic workflows directly into their applications, moving beyond simple text generation.
Beyond Token Count: Semantic Compression for LLMs
Researchers recast LLM reasoning as lossy compression using the Conditional Information Bottleneck (CIB), employing semantic surprisal for efficient token pruning.
OpenAI Tames AI Chaos with Instruction Hierarchy
OpenAI's new IH-Challenge dataset trains AI models to prioritize instructions, enhancing safety and mitigating risks like prompt injection.

Snowflake Targets Manufacturing with AI
Snowflake is integrating AI into its data cloud to offer manufacturers actionable insights for optimizing operations and improving quality control.

AI Memory Gets a Brain Upgrade
Microsoft Research's PlugMem system transforms AI interaction logs into structured knowledge, boosting agent efficiency and performance.

AI Agents Need Humans: The HITL Advantage
IBM AI Engineer Anna Gutowska explains why human intervention in AI agents is critical for preventing subtle errors and ensuring safe, effective deployment.

LeCun Starts $1B AI Firm
Yann LeCun launches Advanced Machine Intelligence (AMI Labs) with $1.03B seed funding to build AI systems grounded in 'world models'.

AI Agents Now Do Overnight Research
An automated system uses AI agents to conduct overnight LLM training experiments, modifying code and iterating on models autonomously.
Databricks Automates PII Discovery with LLMs
Databricks deploys LogSentinel, an LLM-powered system, to automate PII detection and data governance across its platform, slashing review times and enhancing security.

Microsoft's Compact AI Learns to Reason
Microsoft's new Phi-4-reasoning-vision-15B model offers strong multimodal reasoning capabilities in a compact, efficient package.
Balyasny's AI Engine
Balyasny Asset Management built a powerful AI research engine using OpenAI models, slashing analysis times and boosting investment team confidence.

AI Agents: Memory, Ownership, and the Future
AI experts Chris Hay and Aaron Baughman discuss the evolution of AI agents, focusing on memory, open vs. closed systems, and the future of agent-based AI.
Standardizing Survival HTE Evaluation
Introducing SurvHTE-Bench, the first comprehensive benchmark for evaluating heterogeneous treatment effects in survival data, promoting reproducible and rigorous research.

Copilot Code Review Hits 60 Million
GitHub's AI code review tool has processed over 60 million reviews, evolving to provide high-signal feedback that accelerates development.
OpenAI Unveils GPT-5.4 for Pro Work
OpenAI releases GPT-5.4, its most advanced model for professional tasks, integrating enhanced reasoning, coding, and computer-use capabilities.
AI Reasoning Flaws Are a Safety Feature
AI models' inability to control their "chains of thought" when monitored is a positive for AI safety, preventing them from easily deceiving oversight systems.
Databricks' KARL Cuts Agent Costs
Databricks' new KARL AI agent drastically cuts costs and latency for enterprise knowledge tasks using custom reinforcement learning.

Microsoft's Phi-4-reasoning-vision-15B Compact AI Model
Microsoft Research's Phi-4-reasoning-vision-15B offers efficient multimodal AI, excelling in reasoning and vision tasks with less data and compute.
DynFormer: Smarter AI for Complex Physics
DynFormer, a new dynamics-informed neural operator, significantly reduces error and memory usage in complex PDE simulations by using scale-aware Transformers.
Robots Learn to Peel Like Humans
Researchers developed a two-stage robot learning framework that uses imitation and human feedback to master complex, subjective manipulation tasks like peeling produce.
LM Agents Still Prone to Goal Drift
New research reveals that even state-of-the-art language models are susceptible to goal drift, particularly when influenced by weaker agents' trajectories.

AI Steals AI's Own Secrets: Distillation Attacks
New research reveals how 'distillation attacks' can steal proprietary AI models, creating significant intellectual property and security risks for businesses.

Google's Interactions API Evolves Gemini
Google's new Interactions API for Gemini models offers a unified interface for complex AI tasks, supporting multimodal inputs, agents, and tool integration.

Google's Gemini 3.1 Flash-Lite Targets Scale, Cuts Costs
Google DeepMind's Gemini 3.1 Flash-Lite arrives as its most cost-effective AI model, designed for scale and speed.
CHIMERA Dataset Boosts LLM Reasoning
Researchers introduce CHIMERA, a synthetic dataset enabling LLMs to achieve strong cross-domain reasoning capabilities with efficient training.
New Models Tackle Reasoning Puzzles with Symmetry
New Symbol-Equivariant Recurrent Reasoning Models (SE-RRMs) offer improved performance and generalization on reasoning tasks like Sudoku and ARC-AGI by explicitly encoding symmetry.
Recursive LLMs Tackle Long-Horizon Reasoning
New research introduces recursive language models to overcome context limitations, showing significant improvements on long-horizon reasoning tasks like Boolean satisfiability.
DCDP: Dynamic Diffusion Policies for Robotics
The DCDP framework enhances robotic adaptability in dynamic environments by integrating real-time environmental dynamics for improved action correction, achieving significant performance gains with minimal computational overhead.
Spark Ditches Dual Engines for Real-Time Mode
Databricks' new Real-Time Mode for Spark aims to deliver sub-second streaming speeds, eliminating the need for separate processing engines.
Decoupling Correctness and Checkability in LLMs
Researchers propose a 'translator' model to overcome the 'legibility tax' in LLMs, decoupling accuracy from output checkability for more trustworthy AI.
LLMs Revolutionize Vehicle Routing Optimization
A new LLM-powered approach, AILS-AHD, significantly advances vehicle routing optimization by dynamically designing heuristics, setting new performance records.
Certified Circuits for Stable AI Explanations
New 'Certified Circuits' framework provides provable stability for AI model explanations, yielding more accurate and compact circuits.
Edge AI Acceleration Gets Flexible
Researchers developed a novel FPGA-based accelerator that dynamically adjusts neural network precision at runtime, boosting inference speed for edge AI.
AI Drives Safely Without Expert Data
Researchers introduce Risk-aware World Model Predictive Control (RaWMPC), enabling autonomous driving without expert data by predicting and avoiding risks.
AI Governance: Optimization's Normative Limits
A new arXiv paper argues that optimization-based AI systems, including RLHF-trained LLMs, are formally incapable of normative governance due to inherent structural limitations.
Predicting Transformer Training Instability
Researchers introduce RKSP, a method to predict transformer training divergence from a single forward pass, and KSS, a technique to actively prevent it, saving compute and enabling higher learning rates.