How does a Morris approximate counter work?

If you want to count to a very big number, then your standard 32-bit or 64-bit types will limit you. The ordinary solution is to use more memory: switch 128 bits, or to a “bignum” type. But what if you don’t have the space to store this number, and you actually don’t care about the exact value of the counter, only an approximation? In this case, you could use an “approximate counter”, invented by Robert Morris:

#include <math.h>
#include <stdio.h>
#include <stdint.h>
#include <limits.h>
#include <stdlib.h>

typedef uint8_t morris_t;
morris_t morris_new(uint64_t v) { return round(log2(v + 2)); }
uint64_t morris_estimate(morris_t c) { return (2 << c) - 2; }
morris_t morris_increment(morris_t c) { return (rand() % (2<<c)) ? c : c+1; }

Here’s an example of usage which re-creates the unix tool wc -c, counting the number of characters from stdin:

#include "morris.c"
#include <unistd.h>
int guard(int n, char* err) { if (n == -1) { perror(err); exit(1); } return n; }
int main(void) {
  sranddev();
  char buf[1024];
  morris_t bytes_seen = morris_new(0);
  for (;;) {
    ssize_t bytes_read = guard(read(0, buf, sizeof(buf)), "could not read stdin");
    if (bytes_read == 0) break;
    while (bytes_read--) bytes_seen = morris_increment(bytes_seen);
  }
  printf("     %llu\n", morris_estimate(bytes_seen));
  return 0;
}

$ cc wc.c
$ cat wc.c | wc -c    # first, the exact answer ...
     471
$ cat wc.c | ./a.out  # the estimated answer - a bit off, but in the ballpark
     254
$ cat wc.c | ./a.out  # try it again - a different answer, also in the ballpark
     510

But how does the Morris counter work? Forget counting for the moment, and concentrate on just storing large numbers. The floating-point standard IEEE 754 shows how to do this: it stores numbers in the form sig * 2 ^ exp. The 32-bit float type dedicates 8 bits to the exponent, exp. A float can represent much larger numbers than an int; the price it pays for this is accuracy. Our Morris counter is very similar: its 8 bits are just like the 8-bit exponent part of a float. Just like the float, the Morris counter can represent very large numbers, but it sacrifices accuracy.

To initialize a Morris counter with a value, we find the exponent, using round(log2(v)). For example, instead of storing the exact value 32, instead store the exponent 5, since 2 ^ 5 = 32. To get the actual counter, calculate pow(2, 5), or 2 << 5 in C.

Then, to increment the counter, do so probabilistically. If the counter currently stores 5, this represents a range of 2 ^ 32 possible values. The counter should only be incremented if it represents the maximum value in that range. The probability of this is 1/32, and so we increment with this probability.

Actually, the implementation I showed is slightly different: the implementation has a mysterious + 2 in the initialization, and a mysterious - 2 in the estimate function. Where does this constant come from? Honestly, I don’t know. Here’s a comprehensive analysis by Flajolet.

(I learned about this at Redis Day London 2018, in a talk that Elena Kolevska gave about Redis cache eviction. You can see the Redis implementation here.)

Tagged #c, #programming.

More by Jim

What does the dot do in JavaScript?

foo.bar, foo.bar(), or foo.bar = baz - what do they mean? A deep dive into prototypical inheritance and getters/setters. 2020-11-01

Smear phishing: a new Android vulnerability

Trick Android to display an SMS as coming from any contact. Convincing phishing vuln, but still unpatched. 2020-08-06

A probabilistic pub quiz for nerds

A “true or false” quiz where you respond with your confidence level, and the optimal strategy is to report your true belief. 2020-04-26

Time is running out to catch COVID-19

Simulation shows it’s rational to deliberately infect yourself with COVID-19 early on to get treatment, but after healthcare capacity is exceeded, it’s better to avoid infection. Includes interactive parameters and visualizations. 2020-03-14

The inception bar: a new phishing method

A new phishing technique that displays a fake URL bar in Chrome for mobile. A key innovation is the “scroll jail” that traps the user in a fake browser. 2019-04-27

The hacker hype cycle

I got started with simple web development, but because enamored with increasingly esoteric programming concepts, leading to a “trough of hipster technologies” before returning to more productive work. 2019-03-23

Project C-43: the lost origins of asymmetric crypto

Bob invents asymmetric cryptography by playing loud white noise to obscure Alice’s message, which he can cancel out but an eavesdropper cannot. This idea, published in 1944 by Walter Koenig Jr., is the forgotten origin of asymmetric crypto. 2019-02-16

How Hacker News stays interesting

Hacker News buried my post on conspiracy theories in my family due to overheated discussion, not censorship. Moderation keeps the site focused on interesting technical content. 2019-01-26

My parents are Flat-Earthers

For decades, my parents have been working up to Flat-Earther beliefs. From Egyptology to Jehovah’s Witnesses to theories that human built the Moon billions of years in the future. Surprisingly, it doesn’t affect their successful lives very much. For me, it’s a fun family pastime. 2019-01-20

The dots do matter: how to scam a Gmail user

Gmail’s “dots don’t matter” feature lets scammers create an account on, say, Netflix, with your email address but different dots. Results in convincing phishing emails. 2018-04-07

The sorry state of OpenSSL usability

OpenSSL’s inadequate documentation, confusing key formats, and deprecated interfaces make it difficult to use, despite its importance. 2017-12-02

I hate telephones

I hate telephones. Some rational reasons: lack of authentication, no spam filtering, forced synchronous communication. But also just a visceral fear. 2017-11-08

The Three Ts of Time, Thought and Typing: measuring cost on the web

Businesses often tout “free” services, but the real costs come in terms of time, thought, and typing required from users. Reducing these “Three Ts” is key to improving sign-up flows and increasing conversions. 2017-10-26

Granddad died today

Granddad died. The unspoken practice of death-by-dehydration in the NHS. The Liverpool Care Pathway. Assisted dying in the UK. The importance of planning in end-of-life care. 2017-05-19

How do I call a program in C, setting up standard pipes?

A C function to create a new process, set up its standard input/output/error pipes, and return a struct containing the process ID and pipe file descriptors. 2017-02-17

Your syntax highlighter is wrong

Syntax highlighters make value judgments about code. Most highlighters judge that comments are cruft, and try to hide them. Most diff viewers judge that code deletions are bad. 2014-05-11

Want to build a fantastic product using LLMs? I work at Granola where we're building the future IDE for knowledge work. Come and work with us! Read more or get in touch!

This page copyright James Fisher 2018. Content is not associated with my employer. Found an error? Edit this page.

How does a Morris approximate counter work?

Similar posts

More by Jim