Go Stack Allocation Boosts Uber CPU Efficiency

Uber engineers tuned Go stack allocation, cutting the CPU cost of stack growth in one critical service from nearly 10% to under 1% by pre-allocating larger static stacks and disabling adaptive stack sizing and stack shrinking.

[Figure: Visualizing the impact of Go stack allocation tuning on CPU performance. Credit: Uber Engineering]

Uber is squeezing more performance out of its massive Go deployment. The ride-sharing giant has detailed how tweaking Go's stack allocation mechanisms slashed CPU usage by as much as 10% in key services, translating to significant operational cost reductions across millions of cores. This breakthrough, reported by Uber Engineering, underscores the critical role of granular optimizations for large-scale cloud infrastructure.

Visual TL;DR. Go Stack Expansion causes CPU Usage. CPU Usage impacts Uber's Go Deployment. Uber's Go Deployment applies Static Pre-allocation. Static Pre-allocation via Tuning Runtime. Tuning Runtime leads to CPU Efficiency Boost. CPU Efficiency Boost results in Cost Reduction. Uber's Go Deployment highlights Granular Optimizations.

  1. Go Stack Expansion: goroutines dynamically doubling stack size and copying data
  2. CPU Usage: repeated stack growth consumes significant CPU cycles at scale
  3. Uber's Go Deployment: massive scale where efficiency gains mean millions of dollars
  4. Static Pre-allocation: starting goroutine stacks at a larger fixed size, with adaptive sizing and shrinking disabled, for critical services
  5. Tuning Runtime: optimizing Go's stack allocation mechanisms for performance
  6. CPU Efficiency Boost: cutting CPU usage by up to 10% in key services
  7. Cost Reduction: translating to significant operational cost reductions across millions of cores
  8. Granular Optimizations: critical role for large-scale cloud infrastructure

Go's runtime runs code in goroutines, which start with a much smaller stack (2KB) than a typical OS thread (usually on the order of megabytes). When a goroutine's stack fills up, the runtime allocates a new stack twice the size and copies the old contents over, which costs CPU cycles. At Uber's scale, where a 1% efficiency gain equates to millions of dollars, minimizing this stack expansion is a priority.
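To make the doubling behavior concrete, here is a minimal sketch of the arithmetic. It is a simplified model, not the runtime's actual code: the function name `growthSteps` is hypothetical, and the copy cost is an upper bound that assumes the whole old stack is in use each time it is copied.

```go
package main

import "fmt"

// growthSteps models a stack that starts at initial bytes and doubles until
// it reaches at least needed bytes. It returns the number of doublings and an
// upper bound on the bytes copied (one full copy of the old stack per growth).
// This is an illustrative model, not the Go runtime's implementation.
func growthSteps(initial, needed int) (steps, copied int) {
	if initial <= 0 {
		return 0, 0 // avoid looping forever on a nonsensical starting size
	}
	size := initial
	for size < needed {
		copied += size // old stack contents are moved to the new, larger stack
		size *= 2
		steps++
	}
	return steps, copied
}

func main() {
	steps, copied := growthSteps(2<<10, 32<<10)
	fmt.Printf("doublings: %d, bytes copied (upper bound): %d\n", steps, copied)
	// prints: doublings: 4, bytes copied (upper bound): 30720
}
```

Under this model, a goroutine that ends up needing 32KB pays for four separate growth events, and every new goroutine with the same call depth pays again.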


The Stack Expansion Problem

Go 1.19 introduced adaptive stack sizing, which sets the initial stack size of new goroutines based on historical average stack usage, but that was not enough for all workloads. Uber found that repeated stack growth remained a significant CPU drain in some services.

In one service, nearly 10% of CPU time was attributed to stack growth, even though ample memory was available.
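The pattern behind this kind of overhead can be reproduced in isolation. The sketch below is not Uber's workload: `recurse` and `runWorkload` are hypothetical names, and the frame padding and recursion depth are arbitrary values chosen so that each short-lived goroutine is likely to outgrow its small initial stack.

```go
package main

import (
	"fmt"
	"sync"
)

// recurse returns depth. The pad array is intended to make each frame a few
// hundred bytes, so a few hundred levels need far more than a 2KB stack.
//
//go:noinline
func recurse(depth int) int {
	var pad [256]byte
	pad[0] = byte(depth)
	if depth == 0 {
		return 0
	}
	// pad[0]-byte(depth) is always 0; it only keeps pad in use.
	return recurse(depth-1) + 1 + int(pad[0]-byte(depth))
}

// runWorkload starts many short-lived goroutines that each recurse deeply,
// so each one begins on a small stack and has to grow it along the way.
func runWorkload(goroutines, depth int) int {
	var wg sync.WaitGroup
	results := make([]int, goroutines)
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = recurse(depth) // each goroutine writes its own slot
		}(i)
	}
	wg.Wait()
	total := 0
	for _, r := range results {
		total += r
	}
	return total
}

func main() {
	fmt.Println("total depth:", runWorkload(1000, 500))
}
```

Running a program like this under a CPU profiler should show time spent in runtime functions such as `runtime.newstack` and `runtime.copystack`, which is where stack growth cost typically appears in a profile.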

Tuning the Runtime

Uber explored two main paths: goroutine pooling and customizing the Go runtime. Goroutine pooling requires substantial code changes and introduces its own overhead. Instead, Uber opted to modify the Go runtime directly.
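For context, goroutine pooling generally looks like the following minimal sketch. This is a generic worker pool, not Uber's code; the `Pool` type and its methods are hypothetical names. The idea is that long-lived workers keep the stacks they have already grown instead of starting each task on a fresh small stack.

```go
package main

import (
	"fmt"
	"sync"
)

// Pool runs submitted tasks on a fixed set of long-lived goroutines, so a
// worker whose stack has already grown can reuse it for later tasks.
type Pool struct {
	tasks chan func()
	wg    sync.WaitGroup
}

// NewPool starts the given number of worker goroutines.
func NewPool(workers int) *Pool {
	p := &Pool{tasks: make(chan func())}
	p.wg.Add(workers)
	for i := 0; i < workers; i++ {
		go func() {
			defer p.wg.Done()
			for task := range p.tasks {
				task()
			}
		}()
	}
	return p
}

// Submit blocks until a worker is free to take the task.
func (p *Pool) Submit(task func()) { p.tasks <- task }

// Close stops accepting tasks and waits for in-flight tasks to finish.
func (p *Pool) Close() {
	close(p.tasks)
	p.wg.Wait()
}

func main() {
	p := NewPool(4)
	results := make(chan int, 100)
	for i := 1; i <= 100; i++ {
		n := i
		p.Submit(func() { results <- n })
	}
	p.Close()
	close(results)
	sum := 0
	for v := range results {
		sum += v
	}
	fmt.Println(sum) // prints 5050
}
```

The sketch also hints at the costs mentioned above: every `go f()` call site has to be rewritten to go through the pool, and each task now pays for a channel handoff. The runtime may also still shrink an idle worker's stack during garbage collection, so pooling alone does not guarantee that growth is avoided.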

By disabling Go's adaptive stack sizing and stack shrinking features, and then pre-allocating goroutine stacks to a static, larger size, they aimed to eliminate the costly dynamic expansion process. This involved patching the Go source code to expose and control internal variables related to stack size.

The team developed an analysis tool that inspects Go binaries to determine the actual stack usage of functions. This allowed them to statically set optimal stack sizes, preventing runtime overrides and minimizing the performance impact.
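The article does not describe how the tool works internally, so the following is only a sketch of one plausible approach under stated assumptions: per-function frame sizes and a call graph are already known, the graph has no recursion, and the chosen size is rounded up to a power of two. The `fn` type, `maxStack`, `roundUpPow2`, and all frame sizes are hypothetical.

```go
package main

import "fmt"

// fn is a hypothetical record for one function: its frame size in bytes and
// the functions it calls. A real tool would have to extract this from the
// compiled binary; here the values are made up for illustration.
type fn struct {
	frame   int
	callees []string
}

// maxStack returns the worst-case stack bytes needed to run name, assuming
// an acyclic call graph (no recursion) with all call targets known.
func maxStack(graph map[string]fn, name string, memo map[string]int) int {
	if v, ok := memo[name]; ok {
		return v
	}
	f := graph[name]
	deepest := 0
	for _, c := range f.callees {
		if d := maxStack(graph, c, memo); d > deepest {
			deepest = d
		}
	}
	memo[name] = f.frame + deepest
	return memo[name]
}

// roundUpPow2 rounds n up to a power-of-two size, starting from 2KB.
func roundUpPow2(n int) int {
	size := 2048
	for size < n {
		size *= 2
	}
	return size
}

func main() {
	graph := map[string]fn{
		"handler":   {frame: 512, callees: []string{"decode", "process"}},
		"decode":    {frame: 4096},
		"process":   {frame: 1024, callees: []string{"serialize"}},
		"serialize": {frame: 9000},
	}
	need := maxStack(graph, "handler", map[string]int{})
	fmt.Println(need, roundUpPow2(need)) // prints 10536 16384
}
```

In this made-up graph the deepest path is handler, process, serialize, so a 16KB starting stack would cover it without any growth. Recursion, indirect calls, and interface dispatch would need extra handling that this sketch leaves out.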

Impact and Future

The results were dramatic. One service saw its stack growth CPU cost drop from nearly 10% to under 1% after increasing the stack size from the default 2KB to 32KB. Memory usage increased, but remained well within container limits.
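The memory side of this trade-off can be estimated with simple arithmetic. In the sketch below, the 2KB and 32KB sizes come from the article, while the goroutine count of 100,000 is an assumption picked purely for illustration, and `extraStackBytes` is a hypothetical helper.

```go
package main

import "fmt"

// extraStackBytes estimates the additional memory used when every goroutine
// starts with newSize bytes of stack instead of oldSize. It is an upper
// bound: goroutines that would have grown anyway cost less than this.
func extraStackBytes(goroutines, oldSize, newSize int64) int64 {
	return goroutines * (newSize - oldSize)
}

func main() {
	const goroutines = 100_000 // assumed count, not a figure from the article
	extra := extraStackBytes(goroutines, 2<<10, 32<<10)
	fmt.Printf("extra stack memory: %.2f GiB\n", float64(extra)/(1<<30))
	// prints: extra stack memory: 2.86 GiB
}
```

An estimate like this shows why the approach suits services with spare memory headroom: the cost scales linearly with the number of live goroutines.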

Uber plans to expand this optimization across more services, focusing on those with high CPU usage from stack growth and relatively low memory footprints. This approach, while requiring internal runtime modifications, demonstrates the potential for significant gains by optimizing core language features for specific operational demands.

This effort shows that even mature platforms like Go can benefit from deep, specialized tuning at extreme scale. The focus on low-level CPU efficiency is reminiscent of other efforts, such as mimalloc, Microsoft's memory allocator (see "mimalloc: Microsoft's Speed Boost for Apps").

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.