GitHub's AI Analyst

GitHub's Qubot, an AI agent powered by Copilot, democratizes data access for employees, allowing natural language queries of complex datasets.

[Image: Illustration of the Qubot AI agent interface within GitHub's ecosystem. Caption: Qubot streamlines data analysis for GitHub employees using AI. Credit: GitHub Blog]

GitHub has built an internal data analytics agent, Qubot, on top of GitHub Copilot so that any employee can query the company's data warehouse in plain language. The goal is to make data genuinely self-serve, a long-standing industry challenge that AI agents are now better placed to address.

Visual TL;DR. GitHub Qubot addresses the Data Access Challenge. GitHub Qubot uses Natural Language Queries. Natural Language Queries enable Democratized Data Access. GitHub Qubot includes a Context Layer and an Evaluation Framework. Democratized Data Access leads to Fast Answers. Fast Answers support Ad-hoc Inquiries.

  1. Data Access Challenge: traditional data access requires deep knowledge of models and query languages
  2. GitHub Qubot: AI agent powered by GitHub Copilot for data analytics
  3. Natural Language Queries: allows employees to ask exploratory questions in plain English
  4. Democratized Data Access: enables any employee to query vast data warehouse
  5. Context Layer: provides relevant information to the AI for better answers
  6. Evaluation Framework: ensures accuracy and reliability of AI-generated answers
  7. Fast Answers: receive answers to complex data questions within seconds
  8. Ad-hoc Inquiries: designed for exploratory questions, not replacing dashboards

Traditionally, accessing and understanding product telemetry required deep knowledge of data models, query languages, and validation processes, often necessitating support from dedicated data analysts. Qubot bypasses these hurdles, enabling 'Hubbers' (GitHub employees) to ask exploratory questions and receive answers within seconds.

Qubot is designed for ad-hoc inquiries, not as a replacement for dashboards or reporting tools. Examples include questions like "Which user cohort shows the highest retention on this feature?" or "What product drove the most metric movement last week?"

How Qubot Works

The architecture comprises three core components: a user interface, a context layer, and a query engine.

User Interface

Qubot is accessible via Slack, Visual Studio Code, and the Copilot CLI. The Slack integration requires no setup and allows for collaborative refinement of queries directly within threads. Results are also saved as markdown reports in pull requests for easier reference and potential integration into dashboards.
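As a rough illustration of the markdown reports mentioned above, the sketch below renders a question, the query behind it, and the result rows into one markdown document. The function name, the section layout, and the sample table and column names are all assumptions made for this example; the article does not describe the actual report format.

```python
FENCE = "`" * 3  # built indirectly so the example nests cleanly inside a document


def render_report(question, query, columns, rows):
    """Render one question, its query, and its result rows as markdown text.

    Hypothetical helper: the layout is an assumption, not Qubot's real format.
    """
    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    body = ["| " + " | ".join(str(value) for value in row) + " |" for row in rows]
    lines = [
        f"# {question}",
        "",
        "## Query",
        "",
        f"{FENCE}sql",
        query.strip(),
        FENCE,
        "",
        "## Result",
        "",
        header,
        divider,
        *body,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder question, query, and values, used only to show the output shape.
    print(
        render_report(
            "Which user cohort shows the highest retention on this feature?",
            "SELECT cohort, retention FROM example_table",
            ["cohort", "retention"],
            [["cohort_a", 0.5], ["cohort_b", 0.25]],
        )
    )
```

Keeping the query next to the result lets a reviewer on the pull request check how a number was produced before it is reused elsewhere.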

For developers preferring a more integrated workflow, Qubot functions as a plugin within VS Code and the Copilot CLI, alongside other custom agents and tools.

Context Layer

GitHub's data warehouse is organized into bronze (raw events), silver (conformed facts/dimensions), and gold (curated datasets) layers. The context layer is federated and tailored to each data type.

For bronze data, product teams provide telemetry context, schema, and metadata. Silver data includes query examples, usage guidance, and mandatory filters maintained by the data and analytics team. Gold data features business rules and metric definitions from dataset owners. ETL pipelines systematically enrich this context with additional signals.
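One way to picture this per-layer context is as a small record attached to each dataset. The sketch below is only a mental model: the `DatasetContext` class, its field names, and the sample dataset, owner, and filter are hypothetical and do not come from GitHub's implementation.

```python
from dataclasses import dataclass, field

# Tier names taken from the article; everything else here is illustrative.
TIERS = ("bronze", "silver", "gold")


@dataclass
class DatasetContext:
    """Context attached to one dataset (hypothetical structure)."""

    dataset: str
    tier: str
    owner: str
    notes: list = field(default_factory=list)
    mandatory_filters: list = field(default_factory=list)

    def __post_init__(self):
        if self.tier not in TIERS:
            raise ValueError(f"unknown tier: {self.tier!r}")


def render_context(ctx):
    """Flatten one dataset's context into plain text an agent could read."""
    lines = [f"Dataset: {ctx.dataset} ({ctx.tier}, owned by {ctx.owner})"]
    lines += [f"- note: {note}" for note in ctx.notes]
    lines += [f"- always filter on: {flt}" for flt in ctx.mandatory_filters]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder dataset, owner, and filter, for illustration only.
    example = DatasetContext(
        dataset="example_events",
        tier="silver",
        owner="data-and-analytics",
        notes=["see the query examples before joining to dimensions"],
        mandatory_filters=["is_test_account = false"],
    )
    print(render_context(example))
```

Validating the tier at construction time means a mislabeled contribution fails early instead of silently reaching the agent.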

At runtime, Qubot fetches this context from the context layer via the GitHub MCP Server. A dedicated context agent continuously enriches this knowledge, drawing primarily on markdown documentation stored across repositories. Contributors can add context either through standardized templates or by pointing to an existing repository.

Evaluation Framework

Every modification to the context layer or agent configuration is evaluated before deployment. When new knowledge is added, a pull request triggers an offline evaluation framework that measures response accuracy and latency and detects regressions.

The benchmarking framework includes curated test cases with known answers and ground-truth SQL, automated orchestration via the GitHub CLI for running multiple trials, and a reporting script for aggregating metrics like completion rate, accuracy, and duration.
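The aggregation step of such a reporting script could look roughly like the sketch below. It is an assumption-laden illustration, not GitHub's script: the `Trial` fields, the choice to count incomplete trials as incorrect, and the 0.02 regression tolerance are all invented for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    """One run of one test case (hypothetical record shape)."""

    case_id: str
    completed: bool    # did the agent return an answer at all?
    correct: bool      # did the answer match the known answer?
    duration_s: float  # wall-clock time for the run, in seconds


def summarize(trials):
    """Aggregate trials into completion rate, accuracy, and mean duration.

    Assumption: an incomplete trial counts as incorrect, so accuracy is
    measured over all trials, not only the completed ones.
    """
    if not trials:
        raise ValueError("no trials to summarize")
    completed = [t for t in trials if t.completed]
    return {
        "completion_rate": len(completed) / len(trials),
        "accuracy": sum(t.correct for t in completed) / len(trials),
        "mean_duration_s": mean(t.duration_s for t in trials),
    }


def regressions(baseline, candidate, tolerance=0.02):
    """Name the rate metrics where candidate trails baseline by more than tolerance."""
    return [
        name
        for name in ("completion_rate", "accuracy")
        if candidate[name] < baseline[name] - tolerance
    ]
```

Running several trials per test case matters because agent output is not deterministic; a single pass or fail says little about how often a change helps or hurts.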

Query Engine

Qubot connects to Kusto and Trino, GitHub's primary query engines, via MCP servers. GitHub built a custom MCP server for Trino and deployed a local version of the Fabric RTI MCP Server for Kusto.

Kusto excels at fast, exploratory queries on recent event data. Trino handles complex joins and historical analysis. Qubot defaults to Kusto but seamlessly switches to Trino when a query necessitates it, abstracting this complexity from the user.
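The article does not say how Qubot decides when to switch engines. Purely as an illustration of a default-with-fallback rule, the sketch below routes on two hypothetical signals, how far back a query looks and how many joins it needs; the thresholds and the `QueryPlan` shape are assumptions, and the real agent may decide in an entirely different way.

```python
from dataclasses import dataclass
from enum import Enum


class Engine(Enum):
    KUSTO = "kusto"
    TRINO = "trino"


# Hypothetical thresholds; the article gives no numbers.
RECENT_WINDOW_DAYS = 30
MAX_KUSTO_JOINS = 1


@dataclass
class QueryPlan:
    """Coarse description of what a question needs (illustrative only)."""

    lookback_days: int
    join_count: int


def choose_engine(plan):
    """Default to Kusto; fall back to Trino for long lookbacks or many joins."""
    needs_history = plan.lookback_days > RECENT_WINDOW_DAYS
    needs_complex_joins = plan.join_count > MAX_KUSTO_JOINS
    if needs_history or needs_complex_joins:
        return Engine.TRINO
    return Engine.KUSTO
```

For example, `choose_engine(QueryPlan(lookback_days=7, join_count=0))` returns `Engine.KUSTO`, while a one-year lookback or a three-way join returns `Engine.TRINO`.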

Adoption and Learnings

Qubot has seen significant adoption, with hundreds of users running thousands of queries. This has dramatically reduced the volume of basic questions directed to data and analytics Slack channels, empowering employees to explore data more autonomously.

The tool also gives employees who previously hesitated to engage with the data warehouse access to the information they need for decision-making. Offering Qubot through Slack, the Copilot CLI, and VS Code lets technically proficient Hubbers work in their preferred tools while giving everyone else an accessible entry point.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.