Summarize metrics with random deletion

You have a metric for which you have a result every second. You can’t keep this granularity forever; it would be too big. Standard solution: produce e.g. hourly logs with summaries, e.g. min, max, mean, p50, p99. My suggested alternative: just keep the original data points, but randomly delete some. You can then run any aggregation over them when required.
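Here is a minimal sketch of the idea in Python (my own illustration, not from the post; the function name `thin`, the `keep_fraction` parameter, and the made-up latency readings are all hypothetical): keep each per-second reading independently with some fixed probability, so the survivors are an unbiased uniform sample of the original stream.

```python
import random

def thin(readings, keep_fraction=0.01):
    """Randomly delete readings, keeping roughly `keep_fraction` of them.

    Each reading survives independently with probability `keep_fraction`,
    so the retained points are a uniform random sample of the originals.
    """
    return [r for r in readings if random.random() < keep_fraction]

# Example: an hour of per-second latency readings (made up for illustration),
# thinned down to roughly 1% of the original volume.
hour_of_readings = [random.expovariate(1 / 20) for _ in range(3600)]
kept = thin(hour_of_readings)
print(f"kept {len(kept)} of {len(hour_of_readings)} readings")
```

Unlike a fixed set of hourly summaries, the thinned points still support any later aggregation: a different percentile, a histogram, or a re-bucketing into ten-minute windows.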

How does random deletion affect expected percentiles?
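One way to get a feel for this is a quick simulation (again my own sketch, under assumed exponentially distributed latencies and a hypothetical `percentile` helper): compute percentiles over a full day of per-second readings and over a ~1% uniform random sample, then compare. Because the sample is uniform, the sample percentiles are consistent estimators of the true ones, though the extreme tail (p99 and beyond) gets noisier as fewer points survive out there.

```python
import random

def percentile(xs, p):
    """Crude percentile: the value at the nearest position in the sorted list."""
    xs = sorted(xs)
    k = max(0, min(len(xs) - 1, round(p / 100 * (len(xs) - 1))))
    return xs[k]

random.seed(0)
full = [random.expovariate(1 / 20) for _ in range(3600 * 24)]  # a day of per-second readings
sample = [x for x in full if random.random() < 0.01]           # keep ~1% at random

for p in (50, 90, 99):
    print(f"p{p}: full={percentile(full, p):8.2f}  sample={percentile(sample, p):8.2f}")
```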
