Running BodyPix on a video stream

In a previous post, I briefly showed the BodyPix API for segmenting a person in an image. In this post, I apply it to the video stream from your webcam.

This uses fairly low-quality settings for segmentation, and on my machine it segments around 7 frames per second. That’s not good enough for realtime video. To make the performance acceptable, we must separate the render loop from the segmentation loop: the segmentation loop samples frames from the stream less frequently than the render loop, and each segmentation is re-used across multiple renders. Here’s the core program flow:

// Grab the elements, start the webcam stream, and load the model.
const webcamEl = document.getElementById('webcam');
const canvasEl = document.getElementById('canvas');
const stream = await navigator.mediaDevices.getUserMedia({ video: { facingMode: 'user' } });
webcamEl.srcObject = stream;
webcamEl.play();
const net = await bodyPix.load(...);

// The most recent segmentation mask, shared between the two loops.
let mask = null;

// The render loop runs for every video frame, re-using the latest mask.
function renderLoop(now, metadata) {
  canvasEl.width = metadata.width;
  canvasEl.height = metadata.height;
  if (mask) bodyPix.drawMask(canvasEl, webcamEl, mask, ...);
  webcamEl.requestVideoFrameCallback(renderLoop);
}
webcamEl.requestVideoFrameCallback(renderLoop);

// The segmentation loop only requests the next frame after the current
// segmentation completes, so it naturally samples frames only as fast
// as the model can process them.
async function segmentLoop(now, metadata) {
  webcamEl.width = metadata.width;
  webcamEl.height = metadata.height;
  const segmentation = await net.segmentPerson(webcamEl, ...);
  mask = bodyPix.toMask(segmentation);
  webcamEl.requestVideoFrameCallback(segmentLoop);
}
webcamEl.requestVideoFrameCallback(segmentLoop);
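
To get a number like the 7 frames per second quoted above, one way to measure segmentation throughput (a sketch of my own, not part of the demo) is to count completed segmentations and log the rate once per second:

// A sketch for measuring segmentation throughput: count completed
// segmentations and report the rate once per second.
let segmentationsThisSecond = 0;
setInterval(() => {
  console.log(`${segmentationsThisSecond} segmentations/second`);
  segmentationsThisSecond = 0;
}, 1000);
// In segmentLoop, increment segmentationsThisSecond after each
// net.segmentPerson call completes.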

One oddity is that we must set the width and height properties of the <video> element explicitly; otherwise, TensorFlow complains. In truth, I don’t really know what the width and height properties of a <video> element mean in general, or how they interact with the size of each frame of the video (which can vary during the video stream!). I’ll cover this in a future post.
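
That said, the browser does expose the stream’s intrinsic dimensions as videoWidth and videoHeight once metadata has loaded, so one alternative (a sketch of mine, not what the code above does) is to set the element’s width and height once, up front:

// An alternative sketch: set the element's width/height once, from the
// stream's intrinsic dimensions, when the browser has parsed the metadata.
webcamEl.addEventListener('loadedmetadata', () => {
  webcamEl.width = webcamEl.videoWidth;   // intrinsic frame width, in pixels
  webcamEl.height = webcamEl.videoHeight; // intrinsic frame height, in pixels
});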

We shouldn’t really use bodyPix.drawMask for rendering. This is just an afterthought helper function from the library, and doesn’t provide what we need for background replacement, or running any other effects. In a future post, I’ll show how to feed the segmentation mask into a custom WebGL renderer.
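
In the meantime, here’s a sketch of what manual compositing could look like with the 2D canvas API. It assumes toMask’s default colors (person pixels transparent, background pixels opaque), and renderFrame is a hypothetical stand-in for the drawMask call above:

// A sketch of background replacement with the 2D canvas API, assuming
// toMask's defaults: person pixels transparent, background pixels opaque.
const ctx = canvasEl.getContext('2d');
const maskCanvas = document.createElement('canvas'); // offscreen scratch canvas

function renderFrame() { // hypothetical stand-in for the drawMask call
  // putImageData ignores compositing modes, so first copy the mask
  // ImageData onto the offscreen canvas.
  maskCanvas.width = mask.width;
  maskCanvas.height = mask.height;
  maskCanvas.getContext('2d').putImageData(mask, 0, 0);

  ctx.drawImage(webcamEl, 0, 0, canvasEl.width, canvasEl.height);   // current frame
  ctx.globalCompositeOperation = 'destination-out';                 // erase where mask is opaque
  ctx.drawImage(maskCanvas, 0, 0, canvasEl.width, canvasEl.height); // removes the background
  ctx.globalCompositeOperation = 'destination-over';                // paint behind what remains
  ctx.fillStyle = '#222';
  ctx.fillRect(0, 0, canvasEl.width, canvasEl.height);              // the replacement background
  ctx.globalCompositeOperation = 'source-over';                     // restore the default
}

The two globalCompositeOperation modes do the work here: destination-out erases the background region from the video frame, and destination-over paints the replacement color behind the remaining person pixels.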
