LLM Agents Tackle Database Joins

Databricks tests LLM agents for SQL join order optimization, achieving significant performance gains over traditional methods.

Databricks is exploring the potential of Large Language Model (LLM) agents to solve one of the most persistent challenges in database performance: join order optimization. This research, detailed on the Databricks blog, applies frontier LLMs to a problem that has long vexed traditional query optimizers.

The core issue is the combinatorial explosion of possible execution plans as the number of tables in a SQL query grows. Traditional systems rely on heuristics and cardinality estimators, which can badly misestimate intermediate result sizes, leading to inefficient execution plans. This is where LLM agents aim to step in, acting as data-driven DBAs.

The Join Order Dilemma

Consider a query joining multiple tables like Actors, Movies, and Companies. The order in which these tables are joined significantly impacts performance. For instance, finding movies starring Scarlett Johansson first and then filtering for Sony productions might be faster or slower than the reverse, depending entirely on the data distribution.
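A toy sketch with hypothetical data makes the trade-off concrete: both orders return the same rows, but the size of the intermediate result they materialize differs by orders of magnitude, and which one is cheaper depends entirely on the data.

```python
# Hypothetical toy data: one very selective actor filter, one broad company filter.
actors = [("Scarlett Johansson", "M1"), ("Scarlett Johansson", "M2")]  # 2 rows
movies = [("M%d" % i, "Sony" if i < 500 else "Other") for i in range(10_000)]

def join_actor_first():
    # Filter by actor first: a 2-row intermediate result probes the movies.
    wanted = {movie for _, movie in actors}
    return [(m, c) for m, c in movies if m in wanted and c == "Sony"]

def join_company_first():
    # Filter by company first: a 500-row intermediate result is materialized.
    sony = [(m, c) for m, c in movies if c == "Sony"]
    wanted = {movie for _, movie in actors}
    return [(m, c) for m, c in sony if m in wanted]

assert join_actor_first() == join_company_first()  # same answer, different cost
```

An optimizer that misestimates the selectivity of either predicate can easily pick the expensive order.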

Estimating the optimal order is notoriously difficult, especially as analytics queries can involve dozens of tables. Current optimizers use cardinality estimators, cost models, and search procedures, a complex stack requiring extensive engineering.

Agentic Optimization: A New Approach

Human experts can often diagnose and fix poor join orders, a process involving iterative testing and analysis. The Databricks prototype aims to automate this manual tuning with an LLM agent.

This agent isn't integrated into the high-speed query execution path. Instead, it works as an offline experimenter: it calls a tool that actually executes candidate join orders and returns their runtimes and subplan sizes, with each run capped at the original query's runtime.

The agent iterates up to 50 times, balancing exploration of new plans with exploitation of promising ones. Structured outputs ensure only valid join reorderings are generated.

Promising Experimental Results

Tested on the Join Order Benchmark (JOB) with a scaled IMDb dataset, the LLM agent prototype demonstrated significant improvements. It improved query latency by a geometric mean speedup of 1.288x, beating the standard Databricks optimizer on 80% of queries.
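The geometric mean is the standard way to aggregate per-query speedups, since it treats a 2x gain and a 2x loss symmetrically. A minimal sketch of the metric, with hypothetical latencies:

```python
import math

def geomean_speedup(baseline_ms, agent_ms):
    # Geometric mean of per-query speedup ratios (baseline / agent).
    ratios = [b / a for b, a in zip(baseline_ms, agent_ms)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical latencies (ms) for three queries: baseline vs. agent plan.
print(geomean_speedup([120, 300, 90], [100, 150, 95]))
```

A value of 1.288 means the agent's plans were, in this averaged sense, about 29% faster across the benchmark.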

The gains were particularly notable in the tail of the distribution, with P90 query latency dropping by 41%. This showcases the potential for LLM agents to improve SQL query optimization.

One notable success involved a query with difficult LIKE predicates, where the agent discovered a far more efficient plan than the default optimizer.

Future Questions and Directions

This research opens several avenues for future work, including expanding the agent's tools to include cardinality queries and proactive optimization triggers.

The goal is to harness LLMs' generalizability to enhance data systems themselves.

© 2026 StartupHub.ai. All rights reserved.