
Using BodyPix segmentation in a WebGL shader

In the previous post I showed how to run BodyPix on a video stream, displaying the segmentation using the library’s convenience functions. But if you want to use the segmentation as part of your WebGL rendering pipeline, you need to access the segmentation from your shader. In this post, I demo a pixel shader that sets the alpha channel of a canvas based on a BodyPix segmentation. The demo shows your webcam feed in the bottom-right corner of this page with alpha-transparency taken from BodyPix.

A call to net.segmentPerson returns a promise that resolves to something like this:

{
  allPoses: [...],
  data: Uint8Array(307200) [...],
  height: 480,
  width: 640,
}
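
As a reminder, here’s roughly how such an object is obtained. This sketch assumes video is a playing webcam <video> element, and that we’re inside an async function:

const net = await bodyPix.load();
const segmentation = await net.segmentPerson(video);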

There is one byte for each pixel: note 640*480 == 307200. The bytes are in row-major order, so pixel (x,y) is at index y*640 + x, where (0,0) is the top-left of the image. Each byte is 1 where a person was detected, and 0 elsewhere. For example, here’s a silly debugging function that renders the segmentation in the console:

function renderSegmentation(segmentation) {
  let s = "";
  const xStride = Math.max(1, Math.floor(segmentation.width/30));   // ~30 wide
  const yStride = xStride*2; // chars are ~twice as tall as they are wide
  for (let y = 0; y < segmentation.height; y += yStride) {
    for (let x = 0; x < segmentation.width; x += xStride) {
      s += segmentation.data[segmentation.width*y + x] === 1 ? "X" : " ";
    }
    s += "\n";
  }
  console.log(s);
}
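
To watch the segmentation update live in the console, run it in a loop. A minimal sketch, reusing net and video from the snippet above:

setInterval(async () => {
  renderSegmentation(await net.segmentPerson(video));
}, 500);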

It will give you output like this if you wave at the camera:

            XXXXX    XX
           XXXXXXX  XXXXX
           XXXXXXX  XXXXXX
           XXXXXXX   XXXXX
            XXXXX     XXXXX
           XXXXXX      XXXXX
       XXXXXXXXXXXXXXX   XXXX
   XXXXXXXXXXXXXXXXXXXXX XXXXX
  XXXXXXXXXXXXXXXXXXXXXXXXXXXXX

To access this data in a WebGL shader, we need to get it into a texture using gl.texImage2D. When you pass an array to gl.texImage2D, you tell it which format to interpret it as. One possible format is gl.ALPHA, which has one byte per pixel, the same as the format BodyPix gives us. This byte is interpreted as the alpha channel when the texture is sampled by a shader. Here’s how to load the segmentation data into a texture:

gl.texImage2D(
  gl.TEXTURE_2D,        // target
  0,                    // level
  gl.ALPHA,             // internalformat
  segmentation.width,   // width
  segmentation.height,  // height
  0,                    // border, "Must be 0"
  gl.ALPHA,             // format, "must be the same as internalformat"
  gl.UNSIGNED_BYTE,     // type of data below
  segmentation.data     // pixels
);
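
One gotcha: this assumes the texture has already been created and configured. Since 640x480 is not a power of two, WebGL 1 requires clamped wrapping and non-mipmap filtering for it. A sketch of that setup, where maskTex is my name for the mask texture:

const maskTex = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, maskTex);
// Non-power-of-two textures in WebGL 1 must clamp, and must not use mipmaps
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
// Rows of a one-byte-per-pixel upload may not be 4-byte aligned.
// (A 640-pixel row happens to be, but this is safe for any width.)
gl.pixelStorei(gl.UNPACK_ALIGNMENT, 1);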

Unfortunately, the byte values given by BodyPix are 0 and 1, rather than the ideal 0 and 255. A mask byte of 1 is normalized to an alpha of 1/255 when sampled, so we can correct for this by multiplying by 255 in our fragment shader:

precision mediump float;

uniform sampler2D frame;  // the current video frame
uniform sampler2D mask;   // the BodyPix segmentation

uniform float texWidth;
uniform float texHeight;

void main(void) {
  // gl_FragCoord has its origin at the bottom-left; flip Y to match
  // the top-left origin of the video frame and segmentation.
  vec2 texCoord = vec2(gl_FragCoord.x/texWidth, 1.0 - (gl_FragCoord.y/texHeight));
  // Mask bytes are 0 or 1, which sample as alpha 0.0 or 1.0/255.0;
  // multiplying by 255.0 rescales to fully transparent or fully opaque.
  gl_FragColor = vec4(texture2D(frame, texCoord).rgb, texture2D(mask, texCoord).a * 255.);
}
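
To drive this shader, bind the two textures to texture units and fill in the uniforms before drawing. A sketch, where program, frameTex and maskTex are my names for the compiled shader program and the two textures, and which assumes a full-screen quad is already set up:

gl.useProgram(program);
gl.activeTexture(gl.TEXTURE0);
gl.bindTexture(gl.TEXTURE_2D, frameTex);  // the current video frame
gl.activeTexture(gl.TEXTURE1);
gl.bindTexture(gl.TEXTURE_2D, maskTex);   // the segmentation mask
gl.uniform1i(gl.getUniformLocation(program, "frame"), 0);
gl.uniform1i(gl.getUniformLocation(program, "mask"), 1);
gl.uniform1f(gl.getUniformLocation(program, "texWidth"), segmentation.width);
gl.uniform1f(gl.getUniformLocation(program, "texHeight"), segmentation.height);
gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4);

Note also that browsers composite WebGL canvases as premultiplied alpha by default, so for the page to show through cleanly you may want to request the context with getContext("webgl", { premultipliedAlpha: false }), or premultiply the color by the alpha in the shader.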

Here’s what I get when I run the demo against my own webcam feed:

As you can see, BodyPix still has a number of quality issues. In priority order:

  1. BodyPix doesn’t realize my body extends beyond the bottom of the image. It might be possible to improve this by fudging the input or output.
  2. It’s really bad at recognizing fingers. It might be possible to improve this by running Handpose on the detected palms.

Tagged #programming, #web, #webgl, #machine-learning.
