An overview of the pipeline powering LinkedIn's semantic search (source: LinkedIn Engineering).

LinkedIn's AI Search Upgrade

LinkedIn is using LLMs for semantic search, changing how users find jobs and people by interpreting intent rather than matching keywords.

May 22 at 1:23 AM · 9 min read

LinkedIn is overhauling its search infrastructure with large language models (LLMs) to deliver a more intuitive and personalized experience. This shift, detailed on the LinkedIn Engineering blog, aims to move beyond simple keyword matching towards understanding user intent through natural language processing.

Visual TL;DR: keyword search limitations lead to LLM semantic search, which uses embedding-based retrieval and LLM judges for relevance. Both improve query understanding, which enables intuitive job and people search and, in turn, personalized results.

Keyword Search Limitations: traditional keyword matching struggles to understand user intent
LLM Semantic Search: leveraging large language models for deeper understanding of queries
Embedding-Based Retrieval: representing queries and content in vector space for similarity
LLM Judges for Relevance: using LLMs to evaluate and rank search result quality
Improved Query Understanding: interpreting natural language to infer user goals and preferences
Intuitive Job/People Search: users find jobs and people more effectively
Personalized Results: search results better align with career ambitions

The company has introduced AI Job Search and AI-powered People Search, features that interpret queries semantically. Instead of relying on exact word matches, these tools infer user goals and preferences, overcoming vocabulary gaps to better align search results with how professionals articulate their career ambitions.

The upgrade rebuilds LinkedIn's search stack around LLMs, making retrieval more flexible and accurate for queries written in natural language.

Semantic Search Infrastructure at Scale

At its core, LinkedIn's semantic search employs a multi-stage process. User queries are first processed by a query understanding module, which generates embeddings. These embeddings are then used for embedding-based retrieval (EBR) on GPUs to identify a broad set of candidate documents.

A subsequent ranking stage refines these candidates using a Cross-Encoder Small Language Model (SLM). This model, running on SGLang, combines query, job, and member features to score relevance and engagement.

To maintain efficiency at scale, the ranking pipeline incorporates score caching, a ranking-depth controller, and traffic shaping. These optimizations aim to reduce latency and improve result quality for millions of real-time queries.
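As a rough illustration of two of these optimizations, the sketch below pairs a small least-recently-used score cache with a load-aware depth controller. It is not LinkedIn's implementation: the class `ScoreCache`, the function `ranking_depth`, the cache policy, and the depth limits of 50 and 500 are all assumptions made for the example.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class ScoreCache:
    """Tiny LRU cache for (query, document) relevance scores (hypothetical)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._store: OrderedDict = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key: Hashable, scorer: Callable[[], float]) -> float:
        """Return the cached score, calling the expensive scorer only on a miss."""
        if key in self._store:
            self._store.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._store[key]
        self.misses += 1
        score = scorer()
        self._store[key] = score
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the least recently used entry
        return score


def ranking_depth(load: float, min_depth: int = 50, max_depth: int = 500) -> int:
    """Assumed policy: rank fewer candidates as system load (0.0 to 1.0) rises."""
    load = min(max(load, 0.0), 1.0)  # clamp out-of-range load readings
    return round(max_depth - load * (max_depth - min_depth))


cache = ScoreCache(capacity=2)
first = cache.get_or_compute(("ml engineer", "job-1"), lambda: 0.87)
second = cache.get_or_compute(("ml engineer", "job-1"), lambda: 0.10)  # served from cache
print(first, second, cache.hits, ranking_depth(0.5))
```

The second lookup never reaches the scorer, which is the point of caching an expensive cross-encoder call. The depth controller trades result quality for latency by shrinking the candidate list under load.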

The features and job representations fed into the SLM are generated via a hybrid inference pipeline, combining large-scale offline processing with a low-latency nearline system. Embeddings and summaries are stored for on-demand retrieval.

An auction layer then balances user relevance, engagement, and business metrics to produce the final set of results.

Measuring Relevance with LLM Judges

LinkedIn uses LLM judges to measure search relevance at a scale that manual evaluation cannot match.

These judges are aligned with product managers' judgments through iterative feedback, and they grade millions of query-document pairs daily. They also generate the labeled data needed to train the retrieval and ranking systems.

The development of these LLM judges begins with clear product policies and high-quality "golden" grades from product managers. These grades serve as ground truth, refined through regular calibration sessions among product managers to ensure consistency.

To build comprehensive datasets, queries are categorized, and stratified samples of query-document pairs are graded by product managers. This meticulous process ensures the LLM judges accurately reflect desired search outcomes.
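Stratified sampling of this kind can be sketched in a few lines. The example assumes each pair carries a query-category label and that the same number of pairs is drawn per category. The function name `stratified_sample`, the tuple layout, and the category names are hypothetical, and the article does not say how LinkedIn sizes each stratum.

```python
import random
from collections import defaultdict


def stratified_sample(pairs, per_category, seed=0):
    """Draw up to `per_category` (category, query, doc_id) tuples from each category."""
    rng = random.Random(seed)  # seeded so the grading set is reproducible
    by_category = defaultdict(list)
    for pair in pairs:
        by_category[pair[0]].append(pair)

    sample = []
    for category in sorted(by_category):  # fixed order keeps output deterministic
        items = by_category[category]
        k = min(per_category, len(items))  # small categories contribute all they have
        sample.extend(rng.sample(items, k))
    return sample


pairs = [("title", f"query-{i}", f"doc-{i}") for i in range(100)]
pairs += [("skill", f"query-s{i}", f"doc-s{i}") for i in range(3)]
to_grade = stratified_sample(pairs, per_category=5)
print(len(to_grade))
```

Sampling per category keeps rare query types in the grading set instead of letting the most common type dominate.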

While state-of-the-art LLMs provide high-quality judgments, their throughput is insufficient for LinkedIn's needs. To scale, these large models are distilled into smaller, 8B-parameter evaluator LLMs. Through supervised fine-tuning, these distilled models achieve massive efficiency gains while maintaining high agreement with human judgment, verified via Kappa scores.
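The article does not say which kappa statistic is used. Assuming Cohen's kappa for two raters (a human grader and the distilled judge), agreement is observed agreement corrected for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch, with made-up labels:

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length label sequences."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label sequences must be non-empty and the same length")
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters gave the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    p_expected = sum(counts_a[l] * counts_b[l] for l in labels) / (n * n)
    if p_expected == 1.0:  # both raters used one identical label throughout
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


human = [1, 1, 1, 0, 0, 0, 1, 0]   # illustrative golden grades
judge = [1, 1, 0, 0, 0, 0, 1, 1]   # illustrative distilled-judge grades
print(cohens_kappa(human, judge))
```

Here the raters agree on 6 of 8 items (p_o = 0.75) and chance agreement is 0.5, so kappa is 0.5. Raw accuracy alone would overstate agreement when one label dominates, which is why a chance-corrected statistic is the usual choice.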

The scalable LLM judge enables continuous relevance measurement of the search system. It is used to monitor relevance over time, to evaluate A/B tests for the ranking and retrieval subsystems, and to produce the labels from which student ranking and retrieval models are distilled.

Embedding-Based Retrieval

The retrieval stage identifies a broad set of potential results efficiently. LinkedIn's system is built on GPU-enabled embedding-based retrieval (EBR).
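The core operation behind embedding-based retrieval is a nearest-neighbor lookup in vector space. The sketch below uses an exhaustive cosine-similarity scan in plain Python as a stand-in for the GPU retrieval the article describes. The function names, the toy two-dimensional vectors, and the document ids are illustrative assumptions.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(query_vec, doc_vecs, k):
    """Return the k (doc_id, cosine similarity) pairs most similar to the query."""
    q = normalize(query_vec)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(vec))), doc_id)
        for doc_id, vec in doc_vecs.items()
    )
    # nlargest keeps only k items in memory instead of sorting every document.
    return [(doc_id, score) for score, doc_id in heapq.nlargest(k, scored)]


docs = {"job-a": [1.0, 0.0], "job-b": [0.0, 1.0], "job-c": [1.0, 1.0]}
print(top_k([1.0, 0.1], docs, k=2))
```

At production scale this scan would be replaced by batched GPU matrix operations or an approximate index, but the ranking criterion is the same.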

An open-source LLM embedding model was fine-tuned to encode queries and jobs into dense vectors. Training utilized millions of real query-job pairs, with relevance labels provided by the LLM judges.

This EBR model demonstrates a practical path for deploying LLM components in high-scale, real-time search systems.

The model uses a dual-tower architecture, projecting queries and jobs into a shared semantic space. Training employs a combination of contrastive InfoNCE loss and margin-based ranking loss, enhanced with hard positives and negatives mined from LLM-judged data.
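The two loss terms can be written out on toy vectors. This sketch assumes unit-length embeddings (so a dot product is a cosine similarity), in-batch negatives for the InfoNCE term, and one mined hard negative per query for the margin term. The weighting `alpha`, the temperature, and the margin are placeholder values; the article does not say how the losses are combined.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(queries, positives, temperature=0.05):
    """Mean InfoNCE loss; positives[i] matches queries[i], other rows act as negatives."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, d) / temperature for d in positives]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]  # equals -log softmax of the true pair
    return total / len(queries)


def margin_ranking(pos_score, neg_score, margin=0.2):
    """Hinge loss: zero once the positive beats the negative by at least `margin`."""
    return max(0.0, margin - pos_score + neg_score)


def combined_loss(queries, positives, hard_negatives, alpha=0.5):
    """Assumed combination: InfoNCE plus alpha times the mean margin loss."""
    rank = sum(
        margin_ranking(dot(q, p), dot(q, n))
        for q, p, n in zip(queries, positives, hard_negatives)
    ) / len(queries)
    return info_nce(queries, positives) + alpha * rank


queries = [[1.0, 0.0], [0.0, 1.0]]
jobs = [[1.0, 0.0], [0.0, 1.0]]            # jobs[i] is the relevant job for queries[i]
hard_negatives = [[0.0, 1.0], [1.0, 0.0]]  # plausible but irrelevant jobs
print(combined_loss(queries, jobs, hard_negatives))
```

With perfectly aligned pairs, as here, both terms are close to zero. Swapping the positives would push the InfoNCE term toward the logit gap divided by the temperature, which is what drives the two towers to agree during training.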

Multiple evaluation pipelines, including counterfactual log analysis and offline KNN simulations, are used to assess the model's performance before integration into the live serving stack.
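The article does not name the metrics these pipelines report. One common choice for an offline KNN simulation is recall@k: the share of judged-relevant documents that appear in the top k retrieved results. The sketch below assumes that metric, and the document ids are invented for the example.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant ids found in the first k retrieved ids."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    found = set(retrieved[:k]) & set(relevant)
    return len(found) / len(relevant)


def mean_recall_at_k(runs, k):
    """Average recall@k over (retrieved list, relevant set) pairs, one per query."""
    scores = [recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs]
    return sum(scores) / len(scores)


runs = [
    (["job-1", "job-2", "job-3"], {"job-1", "job-3", "job-9"}),  # 2 of 3 relevant found
    (["job-4", "job-5", "job-6"], {"job-5"}),                    # 1 of 1 relevant found
]
print(mean_recall_at_k(runs, k=3))
```

Running a candidate model and the current model over the same logged queries and comparing this number is one way to screen a model before it reaches live traffic.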

Query Understanding and Ranking

A unified LLM-based understanding layer interprets user intent from free-text queries and converts it into structured signals for both job and people search.

Fine-tuned models, ranging from 1.5B to 4B parameters, meet LinkedIn's latency requirements while delivering high-precision outputs. This layer replaces multiple previous components with a single, robust model.

An intelligent routing layer classifies query types and performs safety checks, directing queries to either LLM-powered semantic interpretation or efficient keyword retrieval.
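The routing decision can be pictured as a small function that returns a destination for each query. The sketch below is a toy stand-in, not the production classifier: the token-count heuristic, the `BLOCKLIST` terms, and the extra `REJECTED` outcome for failed safety checks are all assumptions, since the article does not describe how queries are classified or what happens when a safety check fails.

```python
from enum import Enum


class Route(Enum):
    KEYWORD = "keyword"    # efficient keyword retrieval
    SEMANTIC = "semantic"  # LLM-powered semantic interpretation
    REJECTED = "rejected"  # assumed outcome when a safety check fails


BLOCKLIST = frozenset({"password", "ssn"})  # hypothetical unsafe terms


def route_query(query: str, max_keyword_tokens: int = 2) -> Route:
    """Toy router: safety check first, then short queries go to keyword search."""
    tokens = query.lower().split()
    if not tokens or any(token in BLOCKLIST for token in tokens):
        return Route.REJECTED
    if len(tokens) <= max_keyword_tokens:
        return Route.KEYWORD
    return Route.SEMANTIC


for q in ["software engineer", "remote roles for people who enjoy building data pipelines"]:
    print(q, "->", route_query(q).value)
```

Sending short, exact queries down the cheaper keyword path reserves LLM capacity for the natural-language queries that benefit from it.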

The ranking module uses a Small Language Model (SLM) to estimate the relevance of retrieved jobs to a user's query. For job search, this involves structured job attributes; for people search, it uses member profile information.

Structured prompts guide the SLM to determine match relevance, producing logits that are further processed for final ranking.
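One way such logits can be turned into a ranking score is to read the model's logits for a "yes" and a "no" answer token and take the two-way softmax probability of "yes". The article does not specify the prompt format or the post-processing, so the template in `build_prompt`, the yes/no framing, and the logit values below are assumptions for illustration only.

```python
import math


def build_prompt(query: str, job_title: str, job_location: str) -> str:
    """Hypothetical structured prompt asking for a yes/no relevance verdict."""
    return (
        f"Query: {query}\n"
        f"Job title: {job_title}\n"
        f"Job location: {job_location}\n"
        "Is this job relevant to the query? Answer yes or no:"
    )


def relevance_from_logits(yes_logit: float, no_logit: float) -> float:
    """P(yes) under a softmax over the two answer tokens, i.e. sigmoid(yes - no)."""
    diff = yes_logit - no_logit
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)  # avoids overflow when diff is a large negative number
    return e / (1.0 + e)


def rank_jobs(logits_by_job):
    """Order job ids by descending relevance probability."""
    return sorted(
        logits_by_job,
        key=lambda job_id: relevance_from_logits(*logits_by_job[job_id]),
        reverse=True,
    )


print(build_prompt("entry level data analyst", "Junior Data Analyst", "Remote"))
print(rank_jobs({"job-1": (0.2, 1.0), "job-2": (2.0, -1.0)}))
```

Reading logits directly gives a continuous score from a single forward pass, so the model does not need to generate any text for each query-document pair.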

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.
