How does a distributed hash table work?

First, what is a hash table? Usually, a hash table is an implementation of a map. That is, a hash table maps keys to values. For example, a hash table h might store the map { "jim": "male", "alice": "female", "bob": "male" }. Maps provide an API, importantly including update(k,v) and lookup(k).

However, unlike a normal hash table, a distributed hash table does not really represent an arbitrary map. Instead, it represents a content-addressable set, meaning it provides this different API: insert(v), and lookup(h(v)). Notice that the API assumes a specific hash function h, such as SHA-256.

Unlike keys in a map, items in a content-addressable set are immutable; the ordering of insert operations does not matter. This greatly simplifies a distributed implementation, where operations usually have a partial order. Unlike mutable maps, the results of lookup can be verified: h(lookup(h(k))) == h(k). This too greatly simplifies a distributed implementation, where the structure must be shared with untrusted parties.

The hash table implements these operations efficiently by grouping keys into buckets with an underlying hash function. The hash table might have 8 buckets in memory, numbered 0 through 7, and a hash function that says hash("jim") = 3, hash("alice") = 3, and hash("bob") = 7. In other words, the hash function, applied to a key, tells us the key’s approximate location in memory. If the hash function gives a good estimate of the memory location, and direct memory lookups are fast, then lookups and insertions will be fast.

Notice I said “approximate” location, and “location estimate”. Hash functions have collisions, e.g. "jim" and "alice" both hash to bucket 3. There are many ways to deal with collisions. For example, open addressing says, “if bucket 3 is already full, try bucket 4, then 5, and so on”. Then when the table becomes “full”, we do an expensive hash table resize, moving every key to a new, bigger table.

Suppose our hash table must be accessible by many machines. Our hash table is in the memory of one machine, which must serve all those requests. At some point, we may not be able to buy a machine big enough to serve all requests, or to store all that data. And if this machine dies, the entire hash table dies with it. We need more machines! How do we put our hash table on those multiple machines?

One way is to clone our hash table, so that every machine has a copy. When a node wishes to look up a value, it can look in its local copy. When a machine wishes to insert a value at a key, it broadcasts this to all other nodes. This scheme provides redundancy, and distributes lookups over all copies: with n nodes, each node processes 1/n lookups. But all nodes still store all data, and all nodes still process all insert operations.

Worse, this scheme leads to inconsistencies, because different nodes can process updates in different orders. There are many ways to fix this, such as leader election and consensus algorithms. But DHTs fix this.

To fix this, they can elect a “leader” machine. To do an insert, it’s sent to the leader, and the leader then broadcasts the update to all other nodes. This fixes inconsistency, but re-introduces centralization. The leader node must be powerful enough to process all updates, and if the leader node dies, who will continue the work? A new leader must be “elected”. There are many ways to do this election, but election is a topic for another day, and it doesn’t fix our two big problems: reducing storage and operations per node.

To reduce storage per node, and number of operations per node, each node must store a smaller subset of the data.

For example, each key can be stored on one node, instead of on all nodes. This reduces each node’s storage to 1/n, and its number of processed inserts to 1/n. How do we determine a key’s owner? We use a hash function again, but the hash function, instead of giving us the number of a bucket in memory, gives us machine addresses. For example, our hash could give us hash("jim") = "234.127.23.56", hash("alice") = "244.32.167.165", and hash("bob") = "23.45.231.12". To look up the value for "jim", we don’t do a memory lookup at bucket 3; we instead send the lookup request to the address 234.127.23.56, which then does the memory lookup and responds. Under this scheme, with n nodes, each node stores 1/n keys, processes 1/n lookups, and processes 1/n inserts. Our hash table scales to the moon! But this brings new problems.

Our hash can’t really give us random IP addresses. Most of those addresses won’t be machines serving the hash table. Instead, let’s give each node the list of all nodes, e.g. nodes = ["234.127.23.56", "244.32.167.165", "23.45.231.12"]. Then our hash can index into this list, modulo the number of nodes, e.g. if hash("jim") = 72452 then we go to node nodes[72452 % 3] = "23.45.231.12".

By reducing storage per node, we have removed redundancy. There is a fundamental tension between increased redundancy and reduced number of operations per node. We must find a compromise. Let’s say instead of a key having a single owner, each key is owned by two nodes. For example, we could store key k on node hash(k) % num_nodes and on node (hash(k)+1) % num_nodes. The work required by each node is now num_nodes/n, but this is only a constant factor increase.

When an insert happens at key k, this update must be sent to node hash(k) % num_nodes and to node (hash(k)+1) % num_nodes. But now our inconsistency problem is back! These owners of k can receive updates in different orders. To fix this, we can fall back to our leader/slave relationship. Each key must have a defined leader to whom we send updates, and two slaves which are issued updates from the leader.

But remember that a leader/slave relationship requires a re-election algorithm. It’s difficult to think about how re-election works per-key.

But this scheme does not help reduce memory consumption per machine.

Which IP addresses are nodes in the DHT?
Can we trust responses from other nodes?

Notice that if a node stores a key, that node must process all updates for that key, so each key demands both memory and CPU.

Tagged #distributed-hash-table, #hash-table, #data-structures, #algorithms, #distributed-systems, #programming, #computer-science.

More by Jim

What does the dot do in JavaScript?

foo.bar, foo.bar(), or foo.bar = baz - what do they mean? A deep dive into prototypical inheritance and getters/setters. 2020-11-01

Smear phishing: a new Android vulnerability

Trick Android to display an SMS as coming from any contact. Convincing phishing vuln, but still unpatched. 2020-08-06

A probabilistic pub quiz for nerds

A “true or false” quiz where you respond with your confidence level, and the optimal strategy is to report your true belief. 2020-04-26

Time is running out to catch COVID-19

Simulation shows it’s rational to deliberately infect yourself with COVID-19 early on to get treatment, but after healthcare capacity is exceeded, it’s better to avoid infection. Includes interactive parameters and visualizations. 2020-03-14

The inception bar: a new phishing method

A new phishing technique that displays a fake URL bar in Chrome for mobile. A key innovation is the “scroll jail” that traps the user in a fake browser. 2019-04-27

The hacker hype cycle

I got started with simple web development, but because enamored with increasingly esoteric programming concepts, leading to a “trough of hipster technologies” before returning to more productive work. 2019-03-23

Project C-43: the lost origins of asymmetric crypto

Bob invents asymmetric cryptography by playing loud white noise to obscure Alice’s message, which he can cancel out but an eavesdropper cannot. This idea, published in 1944 by Walter Koenig Jr., is the forgotten origin of asymmetric crypto. 2019-02-16

How Hacker News stays interesting

Hacker News buried my post on conspiracy theories in my family due to overheated discussion, not censorship. Moderation keeps the site focused on interesting technical content. 2019-01-26

My parents are Flat-Earthers

For decades, my parents have been working up to Flat-Earther beliefs. From Egyptology to Jehovah’s Witnesses to theories that human built the Moon billions of years in the future. Surprisingly, it doesn’t affect their successful lives very much. For me, it’s a fun family pastime. 2019-01-20

The dots do matter: how to scam a Gmail user

Gmail’s “dots don’t matter” feature lets scammers create an account on, say, Netflix, with your email address but different dots. Results in convincing phishing emails. 2018-04-07

The sorry state of OpenSSL usability

OpenSSL’s inadequate documentation, confusing key formats, and deprecated interfaces make it difficult to use, despite its importance. 2017-12-02

I hate telephones

I hate telephones. Some rational reasons: lack of authentication, no spam filtering, forced synchronous communication. But also just a visceral fear. 2017-11-08

The Three Ts of Time, Thought and Typing: measuring cost on the web

Businesses often tout “free” services, but the real costs come in terms of time, thought, and typing required from users. Reducing these “Three Ts” is key to improving sign-up flows and increasing conversions. 2017-10-26

Granddad died today

Granddad died. The unspoken practice of death-by-dehydration in the NHS. The Liverpool Care Pathway. Assisted dying in the UK. The importance of planning in end-of-life care. 2017-05-19

How do I call a program in C, setting up standard pipes?

A C function to create a new process, set up its standard input/output/error pipes, and return a struct containing the process ID and pipe file descriptors. 2017-02-17

Your syntax highlighter is wrong

Syntax highlighters make value judgments about code. Most highlighters judge that comments are cruft, and try to hide them. Most diff viewers judge that code deletions are bad. 2014-05-11

Want to build a fantastic product using LLMs? I work at Granola where we're building the future IDE for knowledge work. Come and work with us! Read more or get in touch!

This page copyright James Fisher 2017. Content is not associated with my employer. Found an error? Edit this page.

How does a distributed hash table work?

Similar posts

More by Jim