LinkedIn has overhauled its Sales Navigator search system, cutting down the time it takes to process data and deliver fresh prospect insights. The engineering team detailed how they moved the core data manipulation pipeline from the older MapReduce framework to Apache Spark, implementing a series of targeted optimizations.
This initiative focused on the search system that underpins critical Sales Navigator features like Lead Search, Relationship Explorer, and Lead Recommendations. The goal was to accelerate the delivery of updated search results, a crucial factor for sales professionals making timely decisions.
The complex data manipulation pipeline, comprising over 100 individual Spark jobs, was a prime candidate for optimization. Engineers reduced its total execution time from six to seven hours to roughly three hours.
Under the Hood: Sales Navigator's Search Architecture
The search system operates in three tiers: offline, nearline, and serving. The offline tier handles large-scale batch processing, relying heavily on Spark to transform raw data into immutable base indexes. The nearline tier captures real-time updates, building a live index that is periodically flushed to disk.
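To make the division of labor between the tiers concrete, here is a minimal, single-process sketch in Python. It is an illustration under stated assumptions, not LinkedIn's implementation. Every name in it (`build_base_index`, `LiveIndex`, `search`, `flush_every`) is hypothetical, the whitespace tokenization is a stand-in, and the way the serving function merges the two indexes is an assumption, since the description above does not say how serving combines them.

```python
import json
import os
import tempfile
from types import MappingProxyType


def build_base_index(raw_records):
    """Offline tier (sketch): batch-build an immutable term -> doc-id index."""
    index = {}
    for doc_id, text in raw_records:
        for term in text.lower().split():  # naive tokenization, an assumption
            index.setdefault(term, set()).add(doc_id)
    # Wrap in a read-only view with frozen posting sets so callers cannot mutate it.
    return MappingProxyType({t: frozenset(ids) for t, ids in index.items()})


class LiveIndex:
    """Nearline tier (sketch): in-memory index of recent updates.

    A snapshot is written to disk after every `flush_every` updates
    (a hypothetical policy; the real flush trigger is not described).
    """

    def __init__(self, flush_dir, flush_every=2):
        self.flush_dir = flush_dir
        self.flush_every = flush_every
        self.postings = {}
        self.pending = 0
        self.flush_count = 0

    def apply_update(self, doc_id, text):
        for term in text.lower().split():
            self.postings.setdefault(term, set()).add(doc_id)
        self.pending += 1
        if self.pending >= self.flush_every:
            self.flush()

    def flush(self):
        """Write a JSON snapshot of the live index and reset the update counter."""
        path = os.path.join(self.flush_dir, f"live-{self.flush_count}.json")
        with open(path, "w") as f:
            json.dump({t: sorted(ids) for t, ids in self.postings.items()}, f)
        self.flush_count += 1
        self.pending = 0
        return path


def search(term, base_index, live_index):
    """Serving tier (assumed behavior): union of base and live postings."""
    term = term.lower()
    base_hits = base_index.get(term, frozenset())
    live_hits = live_index.postings.get(term, set())
    return set(base_hits) | live_hits


if __name__ == "__main__":
    base = build_base_index([(1, "VP Sales"), (2, "Sales Engineer")])
    with tempfile.TemporaryDirectory() as flush_dir:
        live = LiveIndex(flush_dir)
        live.apply_update(3, "Sales Director")
        print(sorted(search("sales", base, live)))
```

In this sketch the base index never changes after it is built, so fresh data only becomes searchable through the live index until the next offline rebuild. That dependency is why a shorter batch pipeline translates directly into fresher base indexes.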