Running BodyPix on a video stream

In a previous post, I briefly showed the BodyPix API for segmenting a person in an image. In this post, I apply it to the video stream from your webcam.


The demo uses fairly low-quality segmentation settings. Even so, on my machine I only get around 7 segmented frames per second, which is not good enough for real-time video. To make performance acceptable, we must separate the render loop from the segmentation loop. The segmentation loop samples frames from the stream less frequently than the render loop, and each segmentation result is reused across multiple renders. Here’s the core program flow:

// Grab the webcam stream and show it in the <video> element
const webcamEl = document.getElementById('webcam');
const canvasEl = document.getElementById('canvas');
const stream = await navigator.mediaDevices.getUserMedia({ video: { facingMode: 'user' } });
webcamEl.srcObject = stream;
webcamEl.play();
const net = await bodyPix.load(...);

// The latest segmentation mask, shared between the two loops
let mask = null;

// The render loop runs once per video frame, reusing the latest mask
function renderLoop(now, metadata) {
  canvasEl.width = metadata.width;
  canvasEl.height = metadata.height;
  if (mask) bodyPix.drawMask(canvasEl, webcamEl, mask, ...);
  webcamEl.requestVideoFrameCallback(renderLoop);
}
webcamEl.requestVideoFrameCallback(renderLoop);

// The segmentation loop only re-schedules itself after segmentation
// completes, so it naturally skips frames while segmentation is busy
async function segmentLoop(now, metadata) {
  webcamEl.width = metadata.width;
  webcamEl.height = metadata.height;
  const segmentation = await net.segmentPerson(webcamEl, ...);
  mask = bodyPix.toMask(segmentation);
  webcamEl.requestVideoFrameCallback(segmentLoop);
}
webcamEl.requestVideoFrameCallback(segmentLoop);

One oddity is that we must set the width and height properties of the <video> element explicitly; otherwise, TensorFlow.js complains. In truth, I don’t really know what the width and height properties of a <video> element mean in general, or how they interact with the size of each frame of the video (which can vary during the video stream!). I’ll cover this in a future post.
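One way to keep those properties in sync is a small helper like the following (a sketch; syncVideoSize is my own name, not a BodyPix API, and the exact requirement may depend on the TensorFlow.js version):

```javascript
// Mirror the video's intrinsic frame size onto the element's width/height
// attributes, which TensorFlow.js reads when converting the element to a
// tensor. The intrinsic size is only known once metadata has loaded, so
// call this from a 'loadedmetadata' listener, e.g.:
//   webcamEl.addEventListener('loadedmetadata', () => syncVideoSize(webcamEl));
function syncVideoSize(videoEl) {
  videoEl.width = videoEl.videoWidth;
  videoEl.height = videoEl.videoHeight;
}
```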

We shouldn’t really use bodyPix.drawMask for rendering. It’s just a convenience helper from the library, and doesn’t give us what we need for background replacement or any other effects. In a future post, I’ll show how to feed the segmentation mask into a custom WebGL renderer.
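To see what such a renderer must do, here’s a CPU sketch of background replacement (compositeFrame is a hypothetical helper, not part of BodyPix). It performs the same per-pixel blend that a WebGL fragment shader would do, assuming a mask whose alpha channel is 255 where the person is; note that bodyPix.toMask lets you choose the foreground and background colors, so you can produce a mask matching this convention:

```javascript
// Blend a video frame over a replacement background, pixel by pixel,
// using the mask's alpha channel (assumed 255 = person, 0 = background).
// All three inputs are RGBA byte arrays of equal length, like the .data
// of an ImageData. This is slow on the CPU; a shader does it in parallel.
function compositeFrame(frame, background, mask) {
  const out = new Uint8ClampedArray(frame.length);
  for (let i = 0; i < frame.length; i += 4) {
    const alpha = mask[i + 3] / 255; // "person-ness" of this pixel
    for (let c = 0; c < 3; c++) {
      out[i + c] = alpha * frame[i + c] + (1 - alpha) * background[i + c];
    }
    out[i + 3] = 255; // the composited output is fully opaque
  }
  return out;
}
```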


Tagged #programming, #web, #machinelearning. All content copyright James Fisher 2020. This post is not associated with my employer.