When in doubt, don’t blur it out

Yesterday, The Guardian published an article about a victim, with a photo of a letter that had been sent to them. To preserve the victim’s privacy, the address on the letter had been blurred. However, I was able to recover the entire address, down to the superscript “th” in the street number! The Guardian doxxed the victim they were writing about.

Blurring is often used to redact sensitive content. There’s apparently even a phrase, “if in doubt, blur it out”. But, counterintuitively, blur can be completely inverted to recover the original image! I won’t show you The Guardian’s example; instead, here’s an example I created:

The example shows the original, then a blurred version, then the version I recovered from the blurred one. I used a tool called SmartDeblur, whose author, Vladimir Yuzhikov, has a great blog post on how it works. But it’s complicated, so below I look at a simpler model of how deblurring can work.

Consider a blur function that works on one-dimensional images. Each pixel b[i] in the output blurred image is generated by taking the average of three pixels in the source image: the corresponding pixel s[i] in the source, and its two nearest neighbors s[i-1] and s[i+1]. This is a one-dimensional equivalent of a “bokeh” or lens blur, which averages all the pixels in a circle. This is the blur type that The Guardian used. To deal with the edges, we say that out-of-bounds pixels in the source are white.
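To make the model concrete, here is a minimal sketch of that blur for any odd window size n (the description above is the n = 3 case). It assumes pixel values from 0 (black) to 1 (white), the same encoding the deblur code later in this post uses for out-of-bounds pixels; the function name `blur` is my own.

```javascript
// Sketch of the one-dimensional blur described above (names are illustrative).
// Assumed encoding: 0 = black, 1 = white. n is the odd window size.
function blur(n, source) {
  if (n % 2 !== 1) throw new Error("n must be odd");
  const m = (n - 1) / 2; // pixels on each side of the center
  // Out-of-bounds source pixels count as white.
  const pixel = (i) => (i < 0 || i >= source.length ? 1 : source[i]);
  const out = [];
  for (let i = 0; i < source.length; i++) {
    let sum = 0;
    for (let j = i - m; j <= i + m; j++) sum += pixel(j);
    out.push(sum / n); // b[i] is the mean of s[i-m] .. s[i+m]
  }
  return out;
}

console.log(blur(3, [1, 0, 0, 1])); // roughly [0.667, 0.333, 0.333, 0.667]
```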

Given the blurred output from this function, can we recover the original? Yes, as long as we guess the border of the source image. Let’s work from the left-hand side of the image. We know s[-1] is white, because it’s out of bounds. Let’s assume s[0] is white; this is our border guess. Since b[0] = (s[-1] + s[0] + s[1])/3, and s[1] is the only unknown in it, we can recover s[1] from b[0]. We march left to right to recover the rest, using s[i] = 3*b[i-1] - s[i-1] - s[i-2].
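Here is that n = 3 recovery as a short sketch. `deblur3` is a hypothetical name, white is assumed to be 1 and black 0, and the blurred values for the source [1, 0, 0, 1] are worked out by hand in the comments so the snippet runs on its own.

```javascript
// Sketch of the n = 3 recovery (deblur3 is an illustrative name).
// Assumed encoding: 0 = black, 1 = white.
// From b[i] = (s[i-1] + s[i] + s[i+1]) / 3 we get
// s[i] = 3*b[i-1] - s[i-1] - s[i-2].
function deblur3(blurred, guessS0 = 1) {
  const s = [guessS0]; // s[0] is the border guess (white by default)
  const at = (i) => (i < 0 ? 1 : s[i]); // s[-1] is out of bounds, so white
  for (let i = 1; i < blurred.length; i++) {
    s[i] = 3 * blurred[i - 1] - at(i - 1) - at(i - 2);
  }
  return s;
}

// Blur of the source [1, 0, 0, 1], worked out by hand
// (out-of-bounds pixels are white):
// b[0] = (1+1+0)/3, b[1] = (1+0+0)/3, b[2] = (0+0+1)/3, b[3] = (0+1+1)/3
const b = [2 / 3, 1 / 3, 1 / 3, 2 / 3];
console.log(deblur3(b)); // approximately [1, 0, 0, 1]
```

Calling `deblur3(b, 0)` instead returns a different, wrong reconstruction, which is why the border guess matters.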

This model generalizes to any blur size. We just have to guess a wider border: if each blurred pixel comes from n=7 input pixels, we must guess a 3-pixel border (in general, (n-1)/2 pixels for odd n). Here’s the general algorithm in JavaScript:

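function deblur(n, borderGuess, blurred) {
  // n is the (odd) blur width; m is the number of pixels on each side.
  if (n % 2 !== 1) throw new Error("n must be odd");
  const m = (n-1)/2;
  const out = [];
  // The first m source pixels can't be solved for, so we use our guess.
  for (let i = 0; i < m; i++) out[i] = borderGuess[i];
  for (let i = m; i < blurred.length; i++) {
    // blurred[i-m] averages source pixels i-n+1 .. i, so s[i] is
    // n*blurred[i-m] minus the other n-1 pixels in that window.
    out[i] = n*blurred[i-m];
    for (let j = (i-n)+1; j < i; j++) {
      out[i] -= (
        j < 0 ? 1 :                               // out of bounds: white (1)
        j < borderGuess.length ? borderGuess[j] : // guessed border pixel
        out[j]                                    // previously recovered pixel
      );
    }
  }
  return out;
}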
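The loop in the code above comes from rearranging the blur’s definition. Writing n = 2m + 1 for the (odd) blur width, each blurred pixel is an average of 2m + 1 source pixels, so the rightmost source pixel in the window can be solved for:

```latex
b[i] = \frac{1}{n} \sum_{k=-m}^{m} s[i+k]
\quad\Longrightarrow\quad
s[i+m] = n\,b[i] - \sum_{k=-m}^{m-1} s[i+k]
\quad\Longrightarrow\quad
s[i] = n\,b[i-m] - \sum_{j=i-n+1}^{i-1} s[j]
```

The last form is what `out[i] = n*blurred[i-m]` and the inner `j` loop compute. At the first step, i = m, the sum runs from s[-m] to s[m-1]: the m out-of-bounds pixels are known to be white, leaving s[0] through s[m-1] to guess. For n = 3 (m = 1) it reduces to s[i] = 3*b[i-1] - s[i-1] - s[i-2].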

The SmartDeblur tool is designed for real-world, arbitrary photos. But we can probably recover a much better image if we know that the source is text! Usually, blurred text is given as part of a larger unblurred image, from which we can make very strong assumptions about the blurred source. For instance, we can be confident that the border of the source is white. We can assume the source is black and white, rather than greyscale. In the extreme, we could assume that the text is 12-point Times New Roman, and recover the source text by generating characters that minimize error. A demo of this would be a fun future blog post ...
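One way to use the black-and-white assumption is sketched below. This is my own illustration, not how SmartDeblur works: a variant of `deblur` (the name `deblurBinary` is hypothetical) that takes the source border to be white and snaps each recovered pixel to 0 or 1 before later steps use it. Real blurred images store rounded pixel values, and the exact recurrence carries each rounding error forward into later pixels; snapping discards the error at each step, provided the computed value lands within one half of the true pixel.

```javascript
// Hypothetical variant of deblur for black-and-white sources.
// Assumptions: 0 = black, 1 = white, n is odd, out-of-bounds pixels are
// white, and the first m = (n-1)/2 source pixels (the border) are white.
function deblurBinary(n, blurred) {
  if (n % 2 !== 1) throw new Error("n must be odd");
  const m = (n - 1) / 2;
  const out = new Array(m).fill(1); // border guess: all white
  const at = (j) => (j < 0 ? 1 : out[j]); // out of bounds is white
  for (let i = m; i < blurred.length; i++) {
    let value = n * blurred[i - m];
    for (let j = i - n + 1; j < i; j++) value -= at(j);
    out[i] = value < 0.5 ? 0 : 1; // snap to black or white
  }
  return out;
}

// Blur of [1, 0, 1, 0, 0, 1, 1] with n = 3, rounded to two decimals.
console.log(deblurBinary(3, [0.67, 0.67, 0.33, 0.33, 0.33, 0.67, 1]));
// returns [1, 0, 1, 0, 0, 1, 1]
```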

Tagged #security, #image-processing, #programming.

👋 I'm Jim, a full-stack product engineer.

This page copyright James Fisher 2020. Content is not associated with my employer.