Learn more about Russian war crimes in Ukraine.

Run-length encoding in C

Run-length encoding is a “compression” scheme which works well on inputs with lots of consecutive repeated characters, e.g. aaaabbbaaaaaaaa. We run-length encode this string as (a,3), (b,3), (a,8). This is decoded by taking each pair (c,n) and outputting c n times. To run-length encode a bytestring, we can represent each pair with two bytes, cn. This restricts us to a max of 255 bytes repeated per pair; if we reach the max, we can use another pair to continue. The following C program implements this. ./rle encode encodes its stdin on stdout; and ./rle decode does the inverse. An example of usage:

$ cc -o rle rle.c
$ echo "Hello world..." | ./rle encode | hexdump -C
00000000  48 01 65 01 6c 02 6f 01  20 01 77 01 6f 01 72 01  |H.e.l.o. .w.o.r.|
00000010  6c 01 64 01 2e 03 0a 01                           |l.d.....|
$ echo "Hello world..." | ./rle encode | ./rle decode
Hello world...

Here’s the program:

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
int guard(char* err, int x) { if (x == -1) { perror(err); exit(1); } return x; }
void put(uint8_t c) { guard("could not write byte", write(1, &c, 1)); }
int get(uint8_t* buf) { return guard("could not read byte", read(0, buf, 1)); }
void encode(void) {
  uint8_t c;
  if (get(&c) == 0) return;
  uint8_t n = 1;
  uint8_t new_c;
  while (get(&new_c) != 0) {
    if (new_c == c) {
      if (n == UINT8_MAX) { put(n); n = 1; put(c); }
      else { n++; }
    } else {
      put(n); put(new_c);
      c = new_c; n = 1;
void decode(void) {
  uint8_t c; uint8_t n;
  while (get(&c) != 0) {
    for (int i = 0; i < n; i++) put(c);
void usage(char* pname) { fprintf(stderr, "Usage: %s (encode|decode)\n", pname); exit(1); }
int main(int argc, char* argv[]) {
  if (argc < 2) usage(argv[0]);
  else if (strcmp(argv[1], "encode") == 0) encode();
  else if (strcmp(argv[1], "decode") == 0) decode();
  else usage(argv[0]);
  return 0;

What can computers do? What are the limits of mathematics? And just how busy can a busy beaver be? This year, I’m writing Busy Beavers, a unique interactive book on computability theory. You and I will take a practical and modern approach to answering these questions — or at least learning why some questions are unanswerable!

It’s only $19, and you can get 50% off if you find the discount code ... Not quite. Hackers use the console!

After months of secret toil, I and Andrew Carr released Everyday Data Science, a unique interactive online course! You’ll make the perfect glass of lemonade using Thompson sampling. You’ll lose weight with differential equations. And you might just qualify for the Olympics with a bit of statistics!

It’s $29, but you can get 50% off if you find the discount code ... Not quite. Hackers use the console!

More by Jim

Tagged . All content copyright James Fisher 2018. This post is not associated with my employer. Found an error? Edit this page.