Abed Matini on Hybrid RAG, SQL, and UI Telemetry

Abed Matini from Ogilvy discusses hybrid RAG, SQL, and UI telemetry for efficient document chatbots, highlighting challenges and solutions.

Jun 28 at 10:03 PM · 8 min read

[Image: presentation slide titled "Bypassing the Multimodal Tax: Framework-Free Hybrid RAG, Raw SQL RRF, and Live UI Telemetry" with speaker Abed Matini. Credit: AI Engineer]

Abed Matini, Senior Backend Developer at Ogilvy, recently presented "Bypassing the Multimodal Tax: Hybrid RAG, SQL RRF & UI Telemetry" at AI Engineer World's Fair 2026. The talk covered the challenges of building robust and efficient Retrieval-Augmented Generation (RAG) systems, particularly those dealing with multimodal data and complex user queries, along with the solutions he proposes.

Understanding the Multimodal Tax

Matini began by defining the "multimodal tax" as the inherent cost and complexity associated with processing and integrating various data formats, such as text, images, and structured data, within a single RAG system. He highlighted two primary problems that document chatbots face: the cost of reading documents multiple times and the scattering of search capabilities across too many tools.

He elaborated on the cost factor, explaining that cloud vision APIs can charge between 500 and 1,000 tokens per page just to convert a PDF into text. This cost is incurred before a user even asks a question, and it grows with document length. At those rates, a 200-page manual would consume roughly 100,000 to 200,000 tokens, with tables proving particularly difficult and costly to process accurately.
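Working from the per-page figures quoted above, the ingest cost can be estimated in a few lines. This is only a back-of-the-envelope sketch: the function name is hypothetical, and it assumes cost scales linearly with page count, which the talk summary does not state explicitly.

```python
def ingest_token_range(
    pages: int, low_per_page: int = 500, high_per_page: int = 1000
) -> tuple[int, int]:
    """Return a (low, high) token estimate for converting a PDF to text.

    Assumption: cost is linear in the number of pages.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * low_per_page, pages * high_per_page


low, high = ingest_token_range(200)
print(f"200 pages: {low:,} to {high:,} tokens")
# 200 pages: 100,000 to 200,000 tokens
```

The estimate covers only the PDF-to-text step; any tokens spent later on answering questions come on top of it.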

The "Search Split" Problem and Hybrid RAG Solutions

The second major challenge Matini addressed was the "search split across too many tools." He noted that good answers often require a combination of semantic understanding (via embeddings) and exact keyword matching. Teams therefore tend to run a vector database for semantic search and a separate keyword search engine, with complex wrapper code to combine the results. When results are wrong, this separation makes the cause hard to locate, and fixing it can mean extensive changes to configuration files that have little to do with the query itself.

To combat these issues, Matini proposed a framework-free hybrid RAG approach. His talk focused on three elements: parsing documents locally, using a single Postgres database, and running hybrid search in plain Python. The goal is to parse documents efficiently, extract meaningful chunks, and then apply both keyword and semantic search within one unified system.

Optimizing Data Ingest: Parse First, Prompt Later

Matini discussed optimizing the data ingest pipeline with a "parse first, prompt later" strategy. He presented a comparison of "vision-first" versus "structural-first" approaches to data processing.

Vision-first: This approach involves using cloud vision APIs to process documents, often incurring high token costs per page. The quality risk here includes layout hallucinations and potential table breakage, leading to less accurate data extraction.
Structural-first: This method focuses on parsing the document's inherent structure, such as headings, paragraphs, and sentences, to create cleaner chunks and more cost-effective prompts. This approach also reduces the risk of hallucinations and improves the overall accuracy of the extracted information.

Matini emphasized that when dealing with large documents, especially those with complex layouts or tables, a structural-first approach is generally more efficient and reliable.
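A structural-first chunker can be sketched in plain Python. The example below is illustrative rather than the speaker's implementation: it assumes a local parser has already produced text in which headings start with `#` and blocks are separated by blank lines, and the names `Chunk`, `chunk_by_structure`, and `max_chars` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    heading: str
    text: str


def chunk_by_structure(document: str, max_chars: int = 500) -> list[Chunk]:
    """Split on headings, then pack whole paragraphs into chunks.

    Chunks stay within max_chars unless a single paragraph is longer,
    in which case that paragraph becomes its own oversize chunk.
    """
    chunks: list[Chunk] = []
    heading = ""
    paragraphs: list[str] = []

    def flush() -> None:
        buffer = ""
        for para in paragraphs:
            # Start a new chunk if adding this paragraph would exceed the budget.
            if buffer and len(buffer) + len(para) + 2 > max_chars:
                chunks.append(Chunk(heading, buffer))
                buffer = ""
            buffer = f"{buffer}\n\n{para}" if buffer else para
        if buffer:
            chunks.append(Chunk(heading, buffer))
        paragraphs.clear()

    for block in document.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        if block.startswith("#"):
            flush()  # close the previous section before starting a new one
            first_line, _, rest = block.partition("\n")
            heading = first_line.lstrip("#").strip()
            if rest.strip():
                paragraphs.append(rest.strip())
        else:
            paragraphs.append(block)
    flush()
    return chunks


sample = "# Intro\n\nFirst paragraph.\n\nSecond paragraph.\n\n## Setup\n\nInstall it."
for chunk in chunk_by_structure(sample):
    print(chunk.heading, "->", repr(chunk.text))
```

Because every chunk carries its section heading, the heading can later be stored as metadata or prepended to the prompt, and paragraphs are never cut mid-sentence.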

Database Schema and Search Strategies

He then presented a PostgreSQL schema designed to hold both vectors and text, with tables for document chunks, their metadata, and their embeddings. He specifically highlighted PostgreSQL's `pgvector` extension for efficient vector similarity search, which lets dense (semantic) search run alongside sparse (keyword) search in the same database.
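A schema along these lines might look like the following sketch. It is not the schema shown in the talk: the table and column names, the 384-dimension embedding size, the single-table layout, and the use of PostgreSQL's built-in full-text search (`tsvector`) for the keyword side are all assumptions made for illustration. The generated column requires PostgreSQL 12 or later, and the HNSW index requires `pgvector` 0.5.0 or later.

```python
EMBEDDING_DIM = 384  # assumption: depends on the embedding model in use

# DDL statements, run in order. All names here are hypothetical.
SCHEMA_STATEMENTS = [
    "CREATE EXTENSION IF NOT EXISTS vector",
    f"""
    CREATE TABLE IF NOT EXISTS chunks (
        id          bigserial PRIMARY KEY,
        doc_name    text NOT NULL,
        heading     text,
        content     text NOT NULL,
        embedding   vector({EMBEDDING_DIM}),
        content_tsv tsvector
            GENERATED ALWAYS AS (to_tsvector('english', content)) STORED
    )
    """,
    # Dense (semantic) index: approximate nearest neighbour on cosine distance.
    "CREATE INDEX IF NOT EXISTS chunks_embedding_idx "
    "ON chunks USING hnsw (embedding vector_cosine_ops)",
    # Sparse (keyword) index: inverted index over the full-text column.
    "CREATE INDEX IF NOT EXISTS chunks_tsv_idx "
    "ON chunks USING gin (content_tsv)",
]


def create_schema(conn) -> None:
    """Run the DDL on any DB-API style connection (for example, psycopg)."""
    cur = conn.cursor()
    try:
        for statement in SCHEMA_STATEMENTS:
            cur.execute(statement)
    finally:
        cur.close()
    conn.commit()
```

Keeping the text, the embedding, and the full-text column in one row means a single database can serve both search modes without wrapper code that synchronizes two systems.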

Matini demonstrated how to create indices for both dense and sparse searches, explaining that these indices are what make retrieval fast. He also touched upon Reciprocal Rank Fusion (RRF), which merges the ranked lists returned by different search strategies by favoring items that rank highly in more than one list, and which can improve the relevance of the final results.
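The scoring rule behind RRF is short enough to show in full. The sketch below is a generic plain-Python version, not the speaker's code; the talk's title suggests the fusion was written in raw SQL, which is not reproduced here. Each item scores the sum of 1 / (k + rank) over the lists it appears in. The constant k = 60 is a commonly used default, not a value taken from the talk, and the chunk IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked lists of IDs; each list is ordered best-first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense = ["c3", "c1", "c7"]   # hypothetical semantic-search results
sparse = ["c1", "c9", "c3"]  # hypothetical keyword-search results
fused = reciprocal_rank_fusion([dense, sparse])
print([doc_id for doc_id, _ in fused])
# ['c1', 'c3', 'c9', 'c7']
```

Here `c1` and `c3` rise to the top because they appear in both lists. Since RRF uses only ranks, the raw similarity and keyword scores never need to be normalized against each other.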

The Ogilvy Stack and Live Demos

The presentation showcased the tools used in their stack, including Python, Docker, FastAPI, React, PostgreSQL, Ollama, and LangChain. Matini stressed the "deliberately boring on purpose" nature of their stack, highlighting a preference for mature, reliable technologies over cutting-edge, unproven ones. He also pointed out that the stack uses no cloud vision service for data ingestion, reinforcing the cost-efficiency of their local processing approach.

Matini concluded by demonstrating a live pipeline, showing how documents are parsed, chunked, embedded, and then queried. He walked through the process of uploading a document, extracting its text, and then using the RAG system to retrieve relevant information based on user queries. The live demo illustrated how the system can handle various document types and provide accurate, context-aware answers, showcasing the practical application of the discussed concepts.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.
