GitHub Enterprise Server HA Search Overhaul

GitHub revamps GitHub Enterprise Server's search for high availability using Elasticsearch's Cross Cluster Replication, simplifying management and boosting resilience.

Diagram illustrating GitHub Enterprise Server's new high availability search architecture.
Image credit: GitHub Blog

GitHub has fundamentally rebuilt the search architecture underpinning GitHub Enterprise Server (GHES) to achieve robust high availability. This overhaul addresses long-standing complexities related to search index maintenance and upgrade stability, aiming to free up administrator time and enhance user experience. The changes are detailed in a recent post on the GitHub Blog.

Search is a critical component of GitHub, powering everything from repository exploration to issue tracking and release pages. Historically, maintaining search indexes in GHES, especially in high availability (HA) configurations, presented significant challenges. Administrators often faced issues with index corruption or lock-ups during maintenance and upgrades if specific procedures weren't followed meticulously.

Previous HA setups for GHES relied on a leader/follower pattern, with Elasticsearch as the search engine. This architecture, while functional, complicated data replication between primary and replica nodes. Because Elasticsearch clusters do not support a true primary/replica model, GitHub engineers had to build workarounds, which eventually introduced problems of their own, including potential lock-ups.

The core of the solution lies in adopting Elasticsearch's Cross Cluster Replication (CCR) feature. This allows GHES to operate multiple, independent single-node Elasticsearch clusters. CCR then natively handles the replication of durably persisted data from these nodes. This shift eliminates the problematic clustering across primary and replica servers, ensuring that critical data is never stranded on read-only nodes.
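To make the idea of independent clusters concrete, here is a minimal sketch of how a follower cluster is typically told about a leader cluster in Elasticsearch before CCR can be used: a persistent `cluster.remote.<alias>.seeds` setting sent to `PUT /_cluster/settings`. The alias `ghes-primary` and the host name are hypothetical, and GHES's own internal configuration is not described in the source, so treat this as an illustration of the stock Elasticsearch mechanism rather than GitHub's implementation.

```python
import json


def remote_cluster_settings(alias: str, seed_host: str, transport_port: int = 9300) -> dict:
    """Build the body for PUT /_cluster/settings on the follower cluster.

    `alias` is the name the follower will use for the leader cluster
    (hypothetical here); the seed is the leader's transport address.
    """
    return {
        "persistent": {
            "cluster": {
                "remote": {
                    alias: {"seeds": [f"{seed_host}:{transport_port}"]}
                }
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical alias and host, for illustration only.
    body = remote_cluster_settings("ghes-primary", "primary.example.internal")
    print(json.dumps(body, indent=2))
```

The function only builds the request body; sending it is left to whatever HTTP client the environment already uses.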

Implementing CCR required custom engineering to manage the index lifecycle, including failover, deletion, and upgrades, as Elasticsearch's built-in auto-follow API only applies to newly created indexes. GitHub developed specific workflows to bootstrap CCR for existing indexes and set up auto-follow for future data, ensuring a seamless transition.
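The two-part workflow described above (follow existing indexes explicitly, then register an auto-follow pattern for future ones) could be sketched roughly as follows. The sketch only plans requests against Elasticsearch's documented CCR endpoints, `PUT /<index>/_ccr/follow` and `PUT /_ccr/auto_follow/<name>`, and does not send them. The function name, the pattern name `ghes-auto-follow`, the index names, and the choice to skip dot-prefixed indexes are all assumptions made for illustration; the source does not show GitHub's actual workflow code.

```python
from typing import Iterable, List, NamedTuple


class Request(NamedTuple):
    method: str
    path: str
    body: dict


def plan_ccr_bootstrap(
    leader_indexes: Iterable[str],
    already_following: Iterable[str],
    remote_cluster: str,
    leader_patterns: Iterable[str],
    pattern_name: str = "ghes-auto-follow",  # hypothetical name
) -> List[Request]:
    """Return the requests a follower cluster would need to issue.

    Existing leader indexes get an explicit follow request, because
    auto-follow only applies to indexes created after the pattern exists.
    """
    following = set(already_following)
    plan: List[Request] = []
    for index in sorted(leader_indexes):
        # Assumption: dot-prefixed (system/hidden) indexes are not replicated.
        if index.startswith(".") or index in following:
            continue
        plan.append(Request(
            "PUT",
            f"/{index}/_ccr/follow",
            {"remote_cluster": remote_cluster, "leader_index": index},
        ))
    # Register auto-follow last so future indexes are picked up automatically.
    plan.append(Request(
        "PUT",
        f"/_ccr/auto_follow/{pattern_name}",
        {
            "remote_cluster": remote_cluster,
            "leader_index_patterns": list(leader_patterns),
            "follow_index_pattern": "{{leader_index}}",
        },
    ))
    return plan


if __name__ == "__main__":
    # Hypothetical index names and remote cluster alias.
    for req in plan_ccr_bootstrap(
        ["issues-1", "repos-1", ".internal"], ["repos-1"], "ghes-primary", ["issues-*", "repos-*"]
    ):
        print(req.method, req.path, req.body)
```

Keeping the planning step separate from the network calls makes the logic easy to test and re-run: an index that is already being followed is simply skipped on the next pass.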

To adopt this new HA search mode, organizations should contact GitHub support to obtain the necessary license. After applying the license and configuring GHES with <code>ghe-config app.elasticsearch.ccr true</code>, a cluster configuration apply or upgrade to version 3.19.1 will migrate the installation to the new replication method. This process consolidates data, breaks cross-node clustering, and initiates CCR, though the migration time will vary based on instance size.