How to percent-encode strings in-place

Question 1.4 of Cracking the Coding Interview:

Write a method to replace all spaces in a string with "%20". You may assume that the string has sufficient space at the end of the string to hold the additional characters, and that you are given the “true” length of the string.

The question doesn’t state it, but this is a simplification of percent-encoding, but where we only replace the space character.

The solution is trickier than it sounds, because the question is asking us to update the string in-place. The replacement string, "%20", is longer than the string to replace, " ". Iterating from the start of the string doesn’t work well: you either have to over-write the next characters in the string (which is just wrong), or you have to make space by shifting the remaining characters to the right (which is inefficient).

The solution is to start from the end of the string, where we’re told we have space. So in a first pass, we find the index that will be the new end of the string. Then in a second pass, we work from there backwards.

Here’s a solution in C.

#include <stdlib.h>
#include <string.h>
#include <assert.h>

void replace_spaces(char* s, int true_length) {
  int new_true_length = 0;
  for (int i = 0; i < true_length; i++) {
    new_true_length += ((s[i] == ' ') ? 3 : 1);
  }

  int end = true_length - 1;
  int new_end = new_true_length - 1;

  for (; end >= 0; end--) {
    if (s[end] == ' ') {
      s[new_end] = '0';
      s[new_end-1] = '2';
      s[new_end-2] = '%';
       new_end -= 3;
    } else {
      s[new_end] = s[end];
      new_end--;
    }
  }
}

void test(char* input, int true_length, char* expected) {
  char* input2 = strdup(input);
  replace_spaces(input2, true_length);
  assert(strcmp(input2, expected) == 0);
  free(input2);
}

int main(int argc, char** argv) {
  test("hello", 5, "hello");
  test("hello  ", 5, "hello  ");
  test("hello world  ", 11, "hello%20world");
  test("      ", 2, "%20%20");
  test("", 0, "");
  return 0;
}

This runs in linear time, This is optimal, because we at least have to look at every character, which is already linear time.

Get updates on Twitter


More by Jim

Tagged #ctci, #programming, #c. All content copyright James Fisher 2020. This post is not associated with my employer. Found an error? Edit this page.