58 Million Keys: The HashMap Freeze

A HashMap resize at 58.7 million keys triggered a 15-second freeze on LinkedIn's Feed platform due to kernel lock contention. Pre-allocation fixed it.

May 22 at 1:22 AM · 10 min read

Figure: The technical chain reaction that led to a system-wide freeze, from HashMap resizing to kernel lock contention. (Credit: LinkedIn Engineering)

Visual TL;DR. Feed Availability Drops leads to FishDB Engine. FishDB Engine leads to Memory Spikes. Memory Spikes leads to Off-CPU Profiling. Off-CPU Profiling leads to HashMap Resize. HashMap Resize triggers Kernel Lock Contention. Kernel Lock Contention causes System Freeze. System Freeze fixed by Pre-allocation Fix.

Feed Availability Drops: intermittent availability drops on LinkedIn's Feed platform
FishDB Engine: Rust-based FishDB engine powering the Feed Retrieval platform
Memory Spikes: elusive issue initially linked to unexpected memory spikes
Off-CPU Profiling: building the trap using off-CPU profiling tools
HashMap Resize: a single HashMap resizing event at 58.7 million keys
Kernel Lock Contention: triggered cascade of kernel-level lock contentions
System Freeze: freezing the system's entire asynchronous runtime for 15 seconds
Pre-allocation Fix: pre-allocation of HashMap capacity resolved the issue


LinkedIn's Feed, serving over a billion members, experienced intermittent availability drops due to a critical infrastructure component freezing for up to 15 seconds. The Feed Retrieval platform, powered by the Rust-based FishDB engine, saw entire shards breach their Service Level Objectives (SLOs) without clear logs or reproducible triggers.

The elusive issue, affecting different shards sporadically for brief periods, was eventually traced to a single HashMap resizing event. This resize, occurring at approximately 58.7 million keys, triggered a cascade of kernel-level lock contentions, ultimately freezing the system's entire asynchronous runtime. The fix, a single line of code, belied the complex investigation that uncovered the root cause. This incident highlights critical challenges in memory allocation at scale.

FishDB and the Feed's Foundation

FishDB, the storage and retrieval layer for LinkedIn's Feed, is built in Rust with jemalloc as its memory allocator and Tokio as its async runtime. It maintains several in-memory index structures for low-latency retrieval.

The document reference index, a HashMap mapping primary keys to document references, was central to the problem. At the time of the incident, this map held roughly 56 to 59 million entries per shard, consuming about 1.75 GB of memory.

The Mystery: Elusive Availability Drops

FishDB repeatedly breached its availability SLO for about a minute at a time. Each outage was brief, resolved on its own, and left almost no trace behind.

The outages were ephemeral, lasting only 10-15 seconds, making them impossible to catch with conventional monitoring. During these freezes, the application produced zero logs, and health checks went unanswered, creating the illusion of a complete process pause.

The sporadic nature and lack of discernible triggers like deployment changes or traffic spikes complicated the investigation.

The First Clue: Memory Spikes

Correlation analysis revealed a critical pattern: every availability drop coincided with a significant spike in Resident Set Size (RSS) memory. RSS would momentarily jump about 4 GB above baseline, then settle to a persistent ~2 GB increase.
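
For readers who want to watch for the same signal, here is a minimal sketch of sampling a process's RSS on Linux. It is not LinkedIn's tooling; the function name `parse_vm_rss_kb` is made up for illustration, and it relies only on the `VmRSS` line that Linux exposes in `/proc/<pid>/status`.

```rust
use std::fs;

/// Extracts the `VmRSS` value (in kB) from the text of `/proc/<pid>/status`.
/// Hypothetical helper for illustration only.
fn parse_vm_rss_kb(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("VmRSS:"))
        .and_then(|rest| rest.split_whitespace().next())
        .and_then(|value| value.parse().ok())
}

fn main() {
    // Linux only: on other platforms the file is missing and we just report that.
    match fs::read_to_string("/proc/self/status") {
        Ok(text) => match parse_vm_rss_kb(&text) {
            Some(kb) => println!("current RSS: {} MiB", kb / 1024),
            None => println!("VmRSS line not found"),
        },
        Err(err) => println!("could not read /proc/self/status: {err}"),
    }
}
```

Sampling this value every second and comparing it with a rolling baseline would be one way to spot a multi-gigabyte jump like the one described above.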

This simultaneous spiking across all hosts in an affected shard ruled out individual hot queries or traffic issues, pointing to a systemic, data-driven problem.

Eliminating Possibilities

Before resorting to advanced profiling, common culprits were systematically ruled out.

CPU throttling was negligible, with ample container headroom confirmed via cgroup metrics.

Linux kernel memory defragmentation stalls were also ruled out, as allocstall counters in /proc/vmstat remained at zero.
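
A check of this kind can be sketched in a few lines. The helper below is an illustrative assumption, not the team's script: it sums every counter in `/proc/vmstat` whose name starts with `allocstall`, which covers both the single `allocstall` line of older kernels and the per-zone counters (such as `allocstall_normal`) of newer ones.

```rust
use std::fs;

/// Sums every `allocstall*` counter found in the text of `/proc/vmstat`.
/// Hypothetical helper for illustration only.
fn total_allocstalls(vmstat: &str) -> u64 {
    vmstat
        .lines()
        .filter_map(|line| {
            // Each line has the form "<name> <value>".
            let mut parts = line.split_whitespace();
            let name = parts.next()?;
            let value = parts.next()?.parse::<u64>().ok()?;
            name.starts_with("allocstall").then_some(value)
        })
        .sum()
}

fn main() {
    match fs::read_to_string("/proc/vmstat") {
        Ok(text) => println!("allocstall total: {}", total_allocstalls(&text)),
        Err(err) => println!("could not read /proc/vmstat: {err}"),
    }
}
```

A total that stays at zero across a freeze is the observation the investigators relied on to rule this cause out.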

The Envoy proxy sidecar was cleared; application-layer metrics from FishDB and transport-layer metrics from Envoy showed simultaneous dips, confirming the application itself was freezing.

Memory-mapped file I/O from RocksDB was also excluded, as mmap-based read/write operations were disabled in the configuration.

Building the Trap: Off-CPU Profiling

With conventional methods exhausted, deeper OS and runtime-level analysis was required.

Since application threads were frozen, not actively working, traditional CPU profiling was insufficient. Off-CPU profiling, which tracks what threads are waiting on, became necessary.

The challenge was the ephemeral nature of the freezes. A novel automated profiling script was developed to capture an off-CPU profile the instant a freeze was detected.

This script continuously pinged a FishDB endpoint. A timed-out health check triggered an eBPF-based tool (offcputime) to record kernel stack traces of blocked threads for the freeze's duration.
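
The original script is not published, so the following is only a sketch of the idea under stated assumptions. The address, the `/health` request, the PID, and the timeouts are placeholders. The `offcputime` flags used (`-K` for kernel stacks, `-f` for folded output, `-p` for a PID, and a trailing duration in seconds) are those of the bcc tool; the binary name differs between distributions, and running it normally requires root.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpStream};
use std::process::Command;
use std::thread::sleep;
use std::time::Duration;

/// True if the endpoint sends back at least one byte within `timeout`.
/// A bare TCP connect is not enough: the kernel completes the handshake
/// even while the process is frozen, so we insist on an actual reply.
fn is_responsive(addr: SocketAddr, timeout: Duration) -> bool {
    let Ok(mut stream) = TcpStream::connect_timeout(&addr, timeout) else {
        return false;
    };
    if stream.set_read_timeout(Some(timeout)).is_err() {
        return false;
    }
    // Hypothetical health request; the real endpoint is not described in the article.
    if stream.write_all(b"GET /health HTTP/1.0\r\n\r\n").is_err() {
        return false;
    }
    let mut first_byte = [0u8; 1];
    matches!(stream.read(&mut first_byte), Ok(n) if n > 0)
}

/// Builds (but does not run) an `offcputime` invocation for one process.
fn offcputime_command(pid: u32, seconds: u64) -> Command {
    let mut cmd = Command::new("offcputime"); // binary name varies by distro
    cmd.args(["-K", "-f", "-p"])
        .arg(pid.to_string())
        .arg(seconds.to_string());
    cmd
}

fn main() {
    // All values below are placeholders for illustration.
    let addr: SocketAddr = "127.0.0.1:8080".parse().expect("valid socket address");
    let pid = 1234;
    let timeout = Duration::from_millis(500);

    for attempt in 1..=5 {
        if !is_responsive(addr, timeout) {
            println!("probe {attempt} timed out, starting profiler");
            match offcputime_command(pid, 15).status() {
                Ok(status) => println!("profiler exited with {status}"),
                Err(err) => println!("could not start profiler: {err}"),
            }
            break;
        }
        sleep(Duration::from_secs(1));
    }
}
```

Requiring a reply rather than a successful connect matters here, because a frozen process can still appear reachable at the TCP level.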

Deployed across numerous hosts, this setup successfully captured a profile during a live freeze, providing the crucial breakthrough.

The Eureka Moment: mmap_lock Contention

The captured off-CPU profile revealed three distinct kernel stack trace patterns.

Multiple threads were blocked acquiring a write lock in the `mmap` path, specifically within `rwsem_down_write_slowpath`. This indicated a large memory allocation requiring exclusive access to the process-wide `mmap_lock` (also known as `mmap_sem` or VMA semaphore).

Other threads were blocked in `rwsem_down_read_slowpath`, waiting to acquire the same lock in read mode. Call paths traced through `madvise`, used by jemalloc to purge unused memory pages, and `do_user_addr_fault`, which handles page faults during normal memory access.

The mechanism became clear: the `mmap_lock` protects virtual memory area (VMA) data structures. A large `mmap` allocation (triggered by jemalloc needing more memory than its free lists could provide) acquired the write lock. While this lock was held, all other memory operations, including jemalloc's purging via `madvise` and page fault handling by Tokio worker threads, were blocked because they required the lock in read mode.

With Tokio's worker threads blocked on this kernel lock, no asynchronous tasks could be processed, resulting in a complete application freeze.
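
The blocking pattern can be imitated in user space. The sketch below is an analogy only, not kernel code: a `std::sync::RwLock` stands in for `mmap_lock`, one thread holding the write guard stands in for the large `mmap` call, and the reader threads stand in for page faults and `madvise` calls. The function name and the durations are arbitrary.

```rust
use std::sync::{mpsc, Arc, RwLock};
use std::thread;
use std::time::{Duration, Instant};

/// Holds a write lock for `hold` and returns how long each of `readers`
/// reader threads had to wait for a read lock in the meantime.
fn reader_stalls(readers: usize, hold: Duration) -> Vec<Duration> {
    let lock = Arc::new(RwLock::new(()));
    let (locked_tx, locked_rx) = mpsc::channel();

    // Writer: stands in for the large allocation holding the lock exclusively.
    let writer = {
        let lock = Arc::clone(&lock);
        thread::spawn(move || {
            let _guard = lock.write().unwrap();
            locked_tx.send(()).unwrap(); // signal that the lock is now held
            thread::sleep(hold);
        })
    };
    locked_rx.recv().unwrap();

    // Readers: stand in for worker threads that only need shared access.
    let handles: Vec<_> = (0..readers)
        .map(|_| {
            let lock = Arc::clone(&lock);
            thread::spawn(move || {
                let start = Instant::now();
                let _guard = lock.read().unwrap();
                start.elapsed()
            })
        })
        .collect();

    let stalls = handles.into_iter().map(|h| h.join().unwrap()).collect();
    writer.join().unwrap();
    stalls
}

fn main() {
    for (i, stall) in reader_stalls(4, Duration::from_millis(200)).iter().enumerate() {
        println!("reader {i} waited {stall:?}");
    }
}
```

Every reader stalls for roughly as long as the writer holds the lock, which is the same shape as the freeze: the work the readers want to do is cheap, but none of it can start.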

The Root Cause: The Magic Number 58,720,256

The investigation zeroed in on the document reference index HashMap. Rust's standard HashMap doubles its capacity when it runs out of room.
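
The growth behavior is easy to observe at small scale. This sketch inserts keys into an empty map and records each new value reported by `capacity()`. The exact steps are an implementation detail of the standard library and may differ between Rust versions, so no specific numbers are claimed here.

```rust
use std::collections::HashMap;

/// Inserts `n` keys into an empty map and records every distinct
/// capacity value the map reports along the way.
fn capacity_steps(n: u64) -> Vec<usize> {
    let mut map: HashMap<u64, u64> = HashMap::new();
    let mut steps = vec![map.capacity()];
    for key in 0..n {
        map.insert(key, key);
        // A changed capacity means the table was reallocated and rehashed.
        if map.capacity() != *steps.last().unwrap() {
            steps.push(map.capacity());
        }
    }
    steps
}

fn main() {
    println!("capacity steps: {:?}", capacity_steps(1_000));
}
```

Each step in the printed list marks one resize. At a thousand keys these are invisible; at tens of millions of keys a single step moves gigabytes.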

The map grew by 2-3 million keys daily. When it surpassed 58,720,256 entries, it triggered a resize, doubling its capacity to 117,440,512 entries.
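
The article does not explain where 58,720,256 comes from. One reading that matches both figures, offered here as an assumption about the standard library's internals rather than something the source states, is a table with a power-of-two number of buckets and a maximum load factor of 7/8:

```rust
/// Maximum number of entries a table with `buckets` slots holds before it
/// must grow, ASSUMING a 7/8 maximum load factor (not stated in the article).
fn capacity_for_buckets(buckets: u64) -> u64 {
    buckets / 8 * 7
}

fn main() {
    let before = capacity_for_buckets(1 << 26);
    let after = capacity_for_buckets(1 << 27);
    println!("2^26 buckets -> {before} entries"); // 58720256
    println!("2^27 buckets -> {after} entries"); // 117440512
}
```

Under that assumption, the resize is the step from 2^26 to 2^27 buckets, and the "magic number" is simply the last entry that fits in the smaller table.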

Each entry consumed ~32 bytes. The resize required allocating a new buffer (~3.5 GB) while keeping the old one (~1.75 GB) in memory temporarily. This dual-buffer state caused the observed ~4 GB momentary RSS spike.

The subsequent freeing of the old buffer led to the persistent ~2 GB RSS increase. This resize event occurred only once per host's lifecycle at this specific capacity boundary.
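
The buffer sizes can be checked with back-of-the-envelope arithmetic. The sketch below takes the ~32 bytes per entry at face value and reads the article's "GB" as binary gigabytes (GiB), which is an assumption, though it reproduces the quoted 1.75 and 3.5 figures exactly.

```rust
const ENTRY_BYTES: u64 = 32; // approximate per-entry size quoted in the article
const GIB: f64 = (1u64 << 30) as f64;

/// Size in GiB of a buffer sized for `entries` entries.
fn buffer_gib(entries: u64) -> f64 {
    (entries * ENTRY_BYTES) as f64 / GIB
}

fn main() {
    let old = buffer_gib(58_720_256);
    let new = buffer_gib(117_440_512);
    println!("old buffer: {old} GiB");
    println!("new buffer: {new} GiB");
    println!("both alive during the copy: {} GiB", old + new);
    println!("net growth once the old buffer is freed: {} GiB", new - old);
}
```

The new buffer alone is 3.5 GiB (about 3.76 GB) on top of a baseline that already includes the old one, in line with the ~4 GB momentary spike. The net growth of 1.75 GiB (about 1.88 GB) is in line with the persistent ~2 GB increase.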

The DocRefIndexKeyCount metric confirmed this: each RSS spike and availability breach coincided with the HashMap reaching ~58.7 million keys.

Hosts within a shard, ingesting similar data, reached this threshold nearly simultaneously, explaining the cascading freeze pattern observed across the shard.

The Fix

The solution was deceptively simple: pre-allocate the HashMap with sufficient capacity at startup.

Using `HashMap::with_capacity(base_index_size * 3)` ensures the map has ample room from the outset, preventing the mid-operation resize at the critical 58,720,256-key boundary.
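
A scaled-down sketch of the same pattern follows. The `u64` key and value types and the tiny `base_index_size` are stand-ins, since the real index types and sizes are not given in the article; only the `with_capacity(base_index_size * 3)` call comes from the source.

```rust
use std::collections::HashMap;

/// Builds an index pre-sized for three times the expected base size.
/// Key and value types are placeholders for illustration.
fn preallocated_index(base_index_size: usize) -> HashMap<u64, u64> {
    HashMap::with_capacity(base_index_size * 3)
}

fn main() {
    let base_index_size = 10_000; // tiny stand-in for the real index size
    let mut index = preallocated_index(base_index_size);
    let initial = index.capacity();

    // Fill the map up to the requested capacity.
    for key in 0..(base_index_size as u64 * 3) {
        index.insert(key, key);
    }

    // The capacity never changed, so no resize happened while inserting.
    assert_eq!(index.capacity(), initial);
    println!("capacity stayed at {initial} for {} entries", index.len());
}
```

`with_capacity` guarantees room for at least the requested number of entries without reallocating, so the one expensive allocation happens at startup, before the service takes traffic.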

The trade-off was an acceptable ~3 GB increase in RSS at startup, a minor cost compared to periodic 15-second application freezes.

Post-deployment monitoring confirmed the fix: as shards crossed the 58.7 million key threshold, there were zero SLO impacts, no RSS spikes, and no availability breaches.

Conclusion

A single line of pre-allocation code resolved a cascading failure that reached from a user-space data structure, through the kernel's virtual memory subsystem, to the async runtime.

This incident underscores the importance of anticipating memory allocation needs for large data structures, especially in high-memory services, and leveraging APIs like Rust's `HashMap::with_capacity()` to proactively manage capacity and avoid costly runtime resizes.

© 2026 StartupHub.ai. All rights reserved.
