How does a Morris approximate counter work?
If you want to count to a very big number,
then your standard 32-bit or 64-bit types
will limit you.
The ordinary solution is to use more memory:
switch 128 bits, or to a “bignum” type.
But what if you don’t have the space to store this number,
and you actually don’t care about the exact value of the counter,
only an approximation?
In this case,
you could use an “approximate counter”, invented by Robert Morris:
#include <math.h>
#include <stdio.h>
#include <stdint.h>
#include <limits.h>
#include <stdlib.h>
typedef uint8_t morris_t;
morris_t morris_new(uint64_t v) { return round(log2(v + 2)); }
uint64_t morris_estimate(morris_t c) { return (2 << c) - 2; }
morris_t morris_increment(morris_t c) { return (rand() % (2<<c)) ? c : c+1; }
Here’s an example of usage which re-creates the unix tool wc -c
,
counting the number of characters from stdin:
#include "morris.c"
#include <unistd.h>
int guard(int n, char* err) { if (n == -1) { perror(err); exit(1); } return n; }
int main(void) {
sranddev();
char buf[1024];
morris_t bytes_seen = morris_new(0);
for (;;) {
ssize_t bytes_read = guard(read(0, buf, sizeof(buf)), "could not read stdin");
if (bytes_read == 0) break;
while (bytes_read--) bytes_seen = morris_increment(bytes_seen);
}
printf(" %llu\n", morris_estimate(bytes_seen));
return 0;
}
$ cc wc.c
$ cat wc.c | wc -c
471
$ cat wc.c | ./a.out
254
$ cat wc.c | ./a.out
510
But how does the Morris counter work?
Forget counting for the moment, and concentrate on just storing large numbers.
The floating-point standard IEEE 754 shows how to do this:
it stores numbers in the form sig * 2 ^ exp.
The 32-bit float
type dedicates 8 bits to the exponent, exp.
A float
can represent much larger numbers than an int
;
the price it pays for this is accuracy.
Our Morris counter is very similar:
its 8 bits are just like the 8-bit exponent part of a float
.
Just like the float
, the Morris counter can represent very large numbers,
but it sacrifices accuracy.
To initialize a Morris counter with a value,
we find the exponent, using round(log2(v))
.
For example, instead of storing the exact value 32,
instead store the exponent 5, since 2 ^ 5 = 32.
To get the actual counter, calculate pow(2, 5)
, or 2 << 5
in C.
Then, to increment the counter,
do so probabilistically.
If the counter currently stores 5
,
this represents a range of 2 ^ 32 possible values.
The counter should only be incremented if it represents the maximum value in that range.
The probability of this is 1/32,
and so we increment with this probability.
Actually, the implementation I showed is slightly different:
the implementation has a mysterious + 2
in the initialization,
and a mysterious - 2
in the estimate function.
Where does this constant come from?
Honestly, I don’t know.
Here’s a comprehensive analysis by Flajolet.
(I learned about this at Redis Day London 2018,
in a talk that Elena Kolevska
gave about Redis cache eviction.
You can see the Redis implementation here.)
Similar posts
More by Jim
What does the dot do in JavaScript?
foo.bar
, foo.bar()
, or foo.bar = baz
- what do they mean? A deep dive into prototypical inheritance and getters/setters. 2020-11-01
Smear phishing: a new Android vulnerability
Trick Android to display an SMS as coming from any contact. Convincing phishing vuln, but still unpatched. 2020-08-06
A probabilistic pub quiz for nerds
A “true or false” quiz where you respond with your confidence level, and the optimal strategy is to report your true belief. 2020-04-26
Time is running out to catch COVID-19
Simulation shows it’s rational to deliberately infect yourself with COVID-19 early on to get treatment, but after healthcare capacity is exceeded, it’s better to avoid infection. Includes interactive parameters and visualizations. 2020-03-14
The inception bar: a new phishing method
A new phishing technique that displays a fake URL bar in Chrome for mobile. A key innovation is the “scroll jail” that traps the user in a fake browser. 2019-04-27
The hacker hype cycle
I got started with simple web development, but because enamored with increasingly esoteric programming concepts, leading to a “trough of hipster technologies” before returning to more productive work. 2019-03-23
Project C-43: the lost origins of asymmetric crypto
Bob invents asymmetric cryptography by playing loud white noise to obscure Alice’s message, which he can cancel out but an eavesdropper cannot. This idea, published in 1944 by Walter Koenig Jr., is the forgotten origin of asymmetric crypto. 2019-02-16
How Hacker News stays interesting
Hacker News buried my post on conspiracy theories in my family due to overheated discussion, not censorship. Moderation keeps the site focused on interesting technical content. 2019-01-26
My parents are Flat-Earthers
For decades, my parents have been working up to Flat-Earther beliefs. From Egyptology to Jehovah’s Witnesses to theories that human built the Moon billions of years in the future. Surprisingly, it doesn’t affect their successful lives very much. For me, it’s a fun family pastime. 2019-01-20
The dots do matter: how to scam a Gmail user
Gmail’s “dots don’t matter” feature lets scammers create an account on, say, Netflix, with your email address but different dots. Results in convincing phishing emails. 2018-04-07
The sorry state of OpenSSL usability
OpenSSL’s inadequate documentation, confusing key formats, and deprecated interfaces make it difficult to use, despite its importance. 2017-12-02
I hate telephones
I hate telephones. Some rational reasons: lack of authentication, no spam filtering, forced synchronous communication. But also just a visceral fear. 2017-11-08
The Three Ts of Time, Thought and Typing: measuring cost on the web
Businesses often tout “free” services, but the real costs come in terms of time, thought, and typing required from users. Reducing these “Three Ts” is key to improving sign-up flows and increasing conversions. 2017-10-26
Granddad died today
Granddad died. The unspoken practice of death-by-dehydration in the NHS. The Liverpool Care Pathway. Assisted dying in the UK. The importance of planning in end-of-life care. 2017-05-19
How do I call a program in C, setting up standard pipes?
A C function to create a new process, set up its standard input/output/error pipes, and return a struct containing the process ID and pipe file descriptors. 2017-02-17
Your syntax highlighter is wrong
Syntax highlighters make value judgments about code. Most highlighters judge that comments are cruft, and try to hide them. Most diff viewers judge that code deletions are bad. 2014-05-11
Want to build a fantastic product using LLMs? I work at
Granola where we're building the future IDE for knowledge work. Come and work with us!
Read more or
get in touch! This page copyright James Fisher 2018. Content is not associated with my employer. Found an error? Edit this page.