What is the Metropolis algorithm?

We could estimate Bob’s lunch tomorrow by counting the previous lunches you’ve seen him eating in the cafeteria:

Lunch	Count
Apple	2
Banana	3

Now we want to simulate Bob’s future lunches. Assume Bob randomly picks a lunch each morning, with the relative proportions you’ve observed. Then we can simulate Bob’s lunch with:

type Meal = "Apple" | "Banana";

function sample(): Meal {
  return Math.random() < 2 / 5 ? "Apple" : "Banana";
}

$\tfrac25$ of the samples will be Apple, and $\tfrac35$ will be Banana.

The Metropolis algorithm gives a different way to get samples with those correct proportions. It looks like this:

function nextMeal(currentMeal: Meal): Meal {
  if (currentMeal === "Apple") {
    return "Banana";
  } else {
    if (Math.random() < 2 / 3) {
      return "Apple";
    } else {
      return "Banana";
    }
  }
}

class Chain {
  currentMeal: Meal;

  constructor() {
    this.currentMeal = "Apple";
  }

  sample(): Meal {
    const current = this.currentMeal;
    this.currentMeal = nextMeal(current);
    return current;
  }
}

Yes: this algorithm is more complicated and performs worse! But the underlying technique can help us sample from more complex distributions. So let’s see what it’s doing here, and why it works at all.

Instead of each sample call being independent, the Metropolis algorithm initializes a Chain which maintains a “current meal”. Each call to sample sets the next meal.

Here’s one run of one chain:

const chain = new Chain();
for (let i = 0; i < 10; i++) {
  console.log(`Day ${i}: `, chain.sample());
}

Day 0:  Apple
Day 1:  Banana
Day 2:  Apple
Day 3:  Banana
Day 4:  Banana
Day 5:  Apple
Day 6:  Banana
Day 7:  Banana
Day 8:  Apple
Day 9:  Banana

Notice Bob’s first meal is always Apple. And if Bob’s current meal is Apple, the next meal is always Banana, so Bob’s second meal is always Banana. Here are two problems with the Metropolis algorithm:

The first few samples are not in the correct distribution.
Samples are dependent on the previous samples.

Let’s run $10{,}000$ chains, and then log the distribution of apples on each day:

const chains: Chain[] = [];
for (let i = 0; i < 10000; i++) chains.push(new Chain());

for (let i = 0; i < 10; i++) {
  const samples = chains.map((chain) => chain.sample());
  const numApples = samples.filter((sample) => sample === "Apple").length;
  console.log(`Num apples on day ${i}:`, numApples);
}

Num apples on day 0: 0
Num apples on day 1: 6673
Num apples on day 2: 2202
Num apples on day 3: 5186
Num apples on day 4: 3184
Num apples on day 5: 4563
Num apples on day 6: 3600
Num apples on day 7: 4255
Num apples on day 8: 3846
Num apples on day 9: 4041

By day $9$, approximately $4{,}000$ of the $10{,}000$ chains have apples. This is the correct proportion of $\tfrac25$. But the number bounces around before reaching this equilibrium.

This is called convergence. To analyze this more mathematically, we can instead simulate the distribution of chains for the current meal, and calculate the distribution for the next meal:

type MealDist = [number, number];
function nextMealDist(currentMealDist: MealDist): MealDist {
  const [numApples, numBananas] = currentMealDist;

  // Consider those eating apples today. What will they eat tomorrow?
  let a2a = numApples * 0; // None will eat apples.
  let a2b = numApples * 1; // All will eat bananas.

  // Then consider those eating bananas today. What will they eat tomorrow?
  let b2a = numBananas * (2 / 3); // Two-thirds will eat apples.
  let b2b = numBananas * (1 / 3); // The rest will eat bananas.

  const numApplesTomorrow = a2a + b2a;
  const numBananasTomorrow = a2b + b2b;

  return [numApplesTomorrow, numBananasTomorrow];
}

// To start, all 10,000 chains are eating apples.
let currentMealDist: MealDist = [10_000, 0];

for (let i = 0; i < 10; i++) {
  currentMealDist = nextMealDist(currentMealDist);
  console.log(`Num apples on day ${i}: ${currentMealDist[0].toFixed(3)}`);
}

Again, we see that by day $9$, around $\tfrac25$ of the chains are eating apples:

Num apples on day 0: 0.000
Num apples on day 1: 6666.667
Num apples on day 2: 2222.222
Num apples on day 3: 5185.185
Num apples on day 4: 3209.877
Num apples on day 5: 4526.749
Num apples on day 6: 3648.834
Num apples on day 7: 4234.111
Num apples on day 8: 3843.926
Num apples on day 9: 4104.049

Actually, we don’t need to start the distribution at [10_000, 0]. We can just start with [1, 0]. This is then a probability distribution, because the sum of the two numbers is $1$. Then the output is the probability distribution of the each meal:

let currentMealDist: MealDist = [1, 0];

for (let i = 0; i < 10; i++) {
  currentMealDist = nextMealDist(currentMealDist);
  console.log(
    `Probability of apples on day ${i}: ${currentMealDist[0].toFixed(3)}`,
  );
}

Probability of apples on day 0: 0.000
Probability of apples on day 1: 0.667
Probability of apples on day 2: 0.222
Probability of apples on day 3: 0.519
Probability of apples on day 4: 0.321
Probability of apples on day 5: 0.453
Probability of apples on day 6: 0.365
Probability of apples on day 7: 0.423
Probability of apples on day 8: 0.384
Probability of apples on day 9: 0.410

After 93 days, the probability of apples reaches a stable state, at least in 64-bit floating-point. And that stable state is the correct distribution: $\tfrac25$ apple and $\tfrac35$ banana.

Because the first samples are not in the correct distribution, it’s common to discard them. This is called burn-in.

The precise claim of the Metroplis algorithm is: the correct distribution is a stable distribution. To prove this, evaluate nextMealDist([2/5, 3/5]), and you’ll see that it’s [2/5, 3/5]. How many will eat Apple for the next meal? None of the $\tfrac25$ currently eating apples will eat apples tomorrow. Of the $\tfrac35$ currently eating bananas, $\tfrac23$ will eat apples tomorrow. for a total of $\tfrac35 \times \tfrac23 = \tfrac25$. And so the probability of $\tfrac25$ is maintained.

The example algorithm above was hard-coded to generate the stable state $\tfrac25$ and $\tfrac35$. But time passes, after which we’ve counted $3$ apple meals, and $6$ banana meals. Let’s update the algorithm to work with any counts:

// Our observed counts.
// We want to generate more meals in this proportion.
const A = 3; // Count of apples
const B = 6; // Count of bananas

function nextMeal(currentMeal: Meal): Meal {
  if (currentMeal === "Apple") {
    return "Banana";
  } else {
    if (Math.random() < A / B) {
      return "Apple";
    } else {
      return "Banana";
    }
  }
}

Let’s show that, for any counts $A$ and $B$, this converges to the correct probability distribution, $\tfrac{A}{A+B}$ and $\tfrac{B}{A+B}$.

\[ \begin{aligned} \texttt{numApples} &= \tfrac{A}{A+B} \\ \texttt{numBananas} &= \tfrac{B}{A+B} \\ \\ \texttt{a2a} &= 0 \\ \texttt{b2a} &= \texttt{numBananas} \times \tfrac{A}{B} \\ &= \tfrac{B}{A+B} \times \tfrac{A}{B} \\ &= \tfrac{BA}{(A+B)B} \\ &= \tfrac{A}{A+B} \\ \\ \texttt{numApplesTomorrow} &= \texttt{a2a} + \texttt{b2a} \\ &= 0 + \tfrac{A}{A+B} \\ &= \tfrac{A}{A+B} \\ \end{aligned} \]

So far, we’ve only observed Bob eating Apple or Banana. But then one day Bob’s in the cafeteria eating Chips! We need to handle more states. We can record our observed frequencies with a function f:

function f(meal: Meal): number {
  return {
    Apple: 3,
    Banana: 6,
    Chips: 1,
  }[meal];
}

The true Metropolis sample algorithm actually starts by proposing a new meal. Then it decides whether to change to that meal, or eat the current meal again. Here’s a proposal function that picks from possible meals with uniform probability:

function proposeMeal(): Meal {
  const meals: Meal[] = ["Apple", "Banana", "Chips"];
  const i = Math.floor(Math.random() * 3);
  return meals[i]!;
}

Then the true nextMeal function looks like:

function nextMeal(currentMeal: Meal): Meal {
  const proposedMeal = proposeMeal();

  const proposedMealFreq = f(proposedMeal);
  const currentMealFreq = f(currentMeal);

  // The key line!
  const transitionProb = Math.min(proposedMealFreq / currentMealFreq);

  if (Math.random() < transitionProb) {
    return proposedMeal;
  } else {
    return currentMeal;
  }
}

This works, but why does it work? The key point in the proof is that, for any two states $A$ and $B$ in the steady state, the probability mass transferred from $A$ to $B$ is the same as the probability mass transferred from $B$ to $A$.

Let’s prove that. If we’re in steady state, every state $S$ has mass proportional to $f(S)$. For simplicity, just say the mass at $S$ is $f(S)$. Without loss of generality, let’s assume $f(A) \leq f(B)$.

How much mass is transferred from state $A$ to $B$? With our uniform proposal function, $\tfrac1N^{th}$ of the mass at $A$ is proposed to move to $B$. The probability of accepting this proposal is $\text{min}(1,\tfrac{f(B)}{f(A)})$. Since $f(A) \leq f(B)$, this is $1$, i.e. the proposal is always accepted. So the mass moving from $A$ to $B$ is $\tfrac{f(A)}{N}$.

How much mass is transferred from state $B$ to $A$? Again, $\tfrac1N^{th}$ of the mass at $B$ is proposed to move to $A$. The probability of accepting this proposal is $\text{min}(1,\tfrac{f(A)}{f(B)})$. Since $f(A) \leq f(B)$, this is $\tfrac{f(A)}{f(B)}$. So the mass moving from $B$ to $A$ is $\tfrac{f(B)}{N} \times \tfrac{f(A)}{f(B)} = \tfrac{f(A)}{N}$.

The same amount of mass, $\tfrac{f(A)}{N}$, is transferred from $A$ to $B$ as from $B$ to $A$. This condition is called detailed balance, and it implies that we are in a steady state.

The proposeMeal function above just picks a meal uniformly at random. But we want to propose meals in proportion to their frequency. The only requirement is that the proposal function is symmetric: the probability of proposing $A$ from $B$ is the same as the probability of proposing $B$ from $A. (Try to prove that this results in detailed balance.)

So far, we’ve been using discrete states. But the Metropolis algorithm is most useful for continuous distributions. Here’s a weird distribution over the real numbers:

function sinFreq(x: number): number {
  if (0 < x && x < Math.PI * 2) {
    return Math.abs(Math.sin(x));
  }
  return 0;
}

We can sample from this distribution using the Metropolis algorithm in its full generality:

class Chain<State> {
  constructor(
    // Initial state
    private state: State,

    // Function to calculate the frequency of any state
    private f: (state: State) => number,

    // Function to propose a new state - must be symmetric
    private propose: (state: State) => State,

    // Number of initial samples to discard
    burnIn = 1000,
  ) {
    for (let i = 0; i < burnIn; i++) {
      this.sample();
    }
  }

  sample(): State {
    const current = this.state;
    const proposed = this.propose(current);
    const prob = Math.min(1, this.f(proposed) / this.f(current));
    const next = Math.random() < prob ? proposed : current;
    this.state = next;
    return current;
  }
}

function propose(currentState: number): number {
  return currentState + (Math.random() - 0.5);
}

const chain = new Chain(0.5, sinFreq, propose);

const numSamples = 1000000;

const buckets: Record<number, number> = {};
for (let i = 0; i < numSamples; i++) {
  const sample = chain.sample();
  const bucket = Math.floor(sample * 10);
  buckets[bucket] = (buckets[bucket] ?? 0) + 1;
}

for (const bucket in buckets) {
  buckets[bucket] /= numSamples;
}

for (const bucket in buckets) {
  const len = Math.floor(buckets[bucket]! * 1000);
  console.log("#".repeat(len));
}

Sure enough, here’s that weird lumpy distribution:

#
###
######
########
##########
#############
##############
################
##################
####################
######################
#######################
########################
########################
#########################
#########################
#########################
#########################
########################
#######################
######################
#####################
###################
##################
################
##############
############
#########
#######
####
##

##
#####
#######
#########
###########
##############
###############
#################
###################
#####################
######################
#######################
########################
########################
#########################
########################
########################
########################
#######################
######################
#####################
###################
#################
################
##############
############
##########
#######
#####
###

Tagged #metropolis-algorithm, #mcmc, #monte-carlo, #markov-chain, #probability, #statistics, #math, #programming.

More by Jim

What does the dot do in JavaScript?

foo.bar, foo.bar(), or foo.bar = baz - what do they mean? A deep dive into prototypical inheritance and getters/setters. 2020-11-01

Smear phishing: a new Android vulnerability

Trick Android to display an SMS as coming from any contact. Convincing phishing vuln, but still unpatched. 2020-08-06

A probabilistic pub quiz for nerds

A “true or false” quiz where you respond with your confidence level, and the optimal strategy is to report your true belief. 2020-04-26

Time is running out to catch COVID-19

Simulation shows it’s rational to deliberately infect yourself with COVID-19 early on to get treatment, but after healthcare capacity is exceeded, it’s better to avoid infection. Includes interactive parameters and visualizations. 2020-03-14

The inception bar: a new phishing method

A new phishing technique that displays a fake URL bar in Chrome for mobile. A key innovation is the “scroll jail” that traps the user in a fake browser. 2019-04-27

The hacker hype cycle

I got started with simple web development, but because enamored with increasingly esoteric programming concepts, leading to a “trough of hipster technologies” before returning to more productive work. 2019-03-23

Project C-43: the lost origins of asymmetric crypto

Bob invents asymmetric cryptography by playing loud white noise to obscure Alice’s message, which he can cancel out but an eavesdropper cannot. This idea, published in 1944 by Walter Koenig Jr., is the forgotten origin of asymmetric crypto. 2019-02-16

How Hacker News stays interesting

Hacker News buried my post on conspiracy theories in my family due to overheated discussion, not censorship. Moderation keeps the site focused on interesting technical content. 2019-01-26

My parents are Flat-Earthers

For decades, my parents have been working up to Flat-Earther beliefs. From Egyptology to Jehovah’s Witnesses to theories that human built the Moon billions of years in the future. Surprisingly, it doesn’t affect their successful lives very much. For me, it’s a fun family pastime. 2019-01-20

The dots do matter: how to scam a Gmail user

Gmail’s “dots don’t matter” feature lets scammers create an account on, say, Netflix, with your email address but different dots. Results in convincing phishing emails. 2018-04-07

The sorry state of OpenSSL usability

OpenSSL’s inadequate documentation, confusing key formats, and deprecated interfaces make it difficult to use, despite its importance. 2017-12-02

I hate telephones

I hate telephones. Some rational reasons: lack of authentication, no spam filtering, forced synchronous communication. But also just a visceral fear. 2017-11-08

The Three Ts of Time, Thought and Typing: measuring cost on the web

Businesses often tout “free” services, but the real costs come in terms of time, thought, and typing required from users. Reducing these “Three Ts” is key to improving sign-up flows and increasing conversions. 2017-10-26

Granddad died today

Granddad died. The unspoken practice of death-by-dehydration in the NHS. The Liverpool Care Pathway. Assisted dying in the UK. The importance of planning in end-of-life care. 2017-05-19

How do I call a program in C, setting up standard pipes?

A C function to create a new process, set up its standard input/output/error pipes, and return a struct containing the process ID and pipe file descriptors. 2017-02-17

Your syntax highlighter is wrong

Syntax highlighters make value judgments about code. Most highlighters judge that comments are cruft, and try to hide them. Most diff viewers judge that code deletions are bad. 2014-05-11

Want to build a fantastic product using LLMs? I work at Granola where we're building the future IDE for knowledge work. Come and work with us! Read more or get in touch!

This page copyright James Fisher 2024. Content is not associated with my employer. Found an error? Edit this page.

What is the Metropolis algorithm?

Similar posts

More by Jim