Run-length encoding in C

Question 1.5 of Cracking the Coding Interview:

Implement a method to perform basic string compression using the counts of repeated characters. For example, the string aabcccccaaa would become a2b1c5a3. If the “compressed” string would not become smaller than the original string, your method should return the original string. You can assume the string has only upper and lower case letters (a-z).

The question doesn’t name it, but this is run-length encoding. (The exception for longer strings is not traditional run-length encoding, however. It’s also a rather ugly API.)

Here’s a solution in C. An interesting property of the C implementation is that it needs to calculate the length of the returned string before it actually generates that string, in order to malloc a string of the appropriate size. The algorithm consists of two passes which look very similar, but where the second pass actually adds characters to the string, the first pass just increments a string length counter.

A handy API for this is snprintf(NULL, 0, ...), which returns the length of the string that sprintf would have printed if given a real string to print to.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>

int run_length(char* s) {
  char c = s[0];
  if (c == '\0') return 0;
  int length = 1;
  for(; s[length] == c; length++);
  return length;
}

int strlen_i(int i) {
  return snprintf(NULL, 0, "%i", i);
}

int rle_len(char* s) {
  int len = 0;
  int rl = run_length(s);
  while (rl > 0) {
    len += 1 + strlen_i(rl);
    s += rl;
    rl = run_length(s);
  }
  return len;
}

char* rle(char* s) {
  int old_len = strlen(s);
  int new_len = rle_len(s);

  if (old_len <= new_len) {
    return strdup(s); // provides free()able copy
  }

  char* out = malloc((new_len+1) * sizeof(char));

  int out_next = 0;
  int rl = run_length(s);
  while (rl > 0) {
    out_next += sprintf(&out[out_next], "%c%i", s[0], rl);
    s += rl;
    rl = run_length(s);
  }
  out[out_next] = '\0';

  return out;
}

void test_rle_len(char* input, int expected) {
  assert(rle_len(input) == expected);
}

void test_rle(char* input, char* expected) {
  char* output = rle(input);
  assert(strcmp(output, expected) == 0);
  free(output);
}

int main(int argc, char** argv) {
  test_rle_len("aabcccccaaa", 8);
  test_rle_len("aaaaaaaaaabbbbbbbbbbbbbbbbbbbb", 6);
  test_rle_len("", 0);
  test_rle("aabcccccaaa", "a2b1c5a3");
  test_rle("aaaaaaaaaabbbbbbbbbbbbbbbbbbbb", "a10b20");
  test_rle("", "");
  test_rle("abcd", "abcd");
  return 0;
}

This implementation is constant-time in the length of the input, which is optimal.

I just released Vidrio, a free app for macOS and Windows to make your screen-sharing awesomely holographic. Vidrio shows your webcam video on your screen, just like a mirror. Then you just share or record your screen with Zoom, QuickTime, or any other app. Vidrio makes your presentations effortlessly engaging, showing your gestures, gazes, and expressions. #1 on Product Hunt. Available for macOS and Windows.

With Vidrio

With generic competitor

More by Jim

Tagged #ctci, #programming, #c. All content copyright James Fisher 2020. This post is not associated with my employer. Found an error? Edit this page.