System Design — The Google Cluster Architecture

3 min readOct 15, 2023

This post is a reading note of paper Web Search for a Planet: The Google Cluster Architecture. Most of the content are copied directly from the paper.

Introduction

The work done by search engines is computationally intensive. Fortunately, search engines can benefit from request-level parallelism. The application is essentially stateless thus requires little synchronization. This allows the system to process requests independently by allocating the work to different servers.

The paper highlighted two basic insights:

Provide reliability in software rather than in server-class hardware.
Tailor the design for best aggregate request throughput, not peak server response time, since we can manage response times by parallelizing individual requests.

The following general strategy is used in many places in the architecture. This strategy is well-known now.

randomly distributing data/documents into smaller shards
having multiple server replicas responsible for handling each shard
routing requests through a load balancer
providing reliability at software level, by replicating services across many different machines and automatically detecting and handling failures

Serving a Google Query

When a user enters a query to Google, the user’s browser first performs a domain name system (DNS) lookup to map www.google.com to a particular IP address.

In this step, a DNS-based load-balancing system selects a cluster by accounting for the user’s geographic proximity to each physical cluster.

2. The user’s browser then sends a HTTP request to one of the clusters, and thereafter, the processing of that query is entirely local to that cluster.

Query execution consists of two major phases:

The index servers consult an inverted index that maps each query word to a matching list of documents. The index servers then determine a set of relevant documents by intersecting the hit lists of the individual query words, and they compute a relevance score for each document. This relevance score determines the order of results on the output page.
The first step produces an ordered list of document identifiers. The second phase involves taking this list of doc Ids and computing the actual title and uniform resource locator of these documents, along with a query specific document summary. Document servers handle this job, fetching each document from disk to extract the title and the keyword-in-context snippet.

When all phases are complete, a Google web server generates the appropriate HTML for the output page and returns it to the user’s browser.

This process also follows a general pattern: split, process and combine.

Overall Architecture

There are three main components in this architecture

The main activity in the index server consists of

decoding compressed information in the inverted index, and
finding matches against a set of documents that could satisfy a query.

The main activity in the document servers is fetching documents from disk.

Note that index servers typically have less disk space than document servers because the former have a more CPU-intensive workload.

Index Related Architecture

The overall index is partitioned so that a single query can use multiple processors. Index is divided into multiple shards and each shard has a pool of machines as the support. A request will be routed to one of the machines in the pool.

Originally published at https://ryansblog.xyz.

System Design — The Google Cluster Architecture

Introduction

Serving a Google Query

Overall Architecture

Written by Ryan