Copyright © 2017 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and document use rules apply.
This specification extends the Media Capture and Streams specification [ GETUSERMEDIA ] to allow a depth-only stream or combined depth+color stream to be requested from the web platform using APIs familiar to web authors.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This extension specification defines a new media type and constrainable property per the Extensibility guidelines of the Media Capture and Streams specification [ GETUSERMEDIA ]. Horizontal reviews and feedback from early implementations of this specification are encouraged.
This document was published by the Device and Sensors Working Group and the Web Real-Time Communications Working Group as an Editor's Draft. If you wish to make comments regarding this document, please send them to public-media-capture@w3.org ( subscribe , archives ). All comments are welcome.
Publication as an Editor's Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by groups operating under the 5 February 2004 W3C Patent Policy . W3C maintains a public list of any patent disclosures (Device and Sensors Working Group) and a public list of any patent disclosures (Web Real-Time Communications Working Group) made in connection with the deliverables of each group; these pages also include instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy .
This document is governed by the 1 March 2017 W3C Process Document .
Depth cameras are increasingly being integrated into devices such as phones, tablets, and laptops. Depth cameras provide a depth map , which conveys the distance information between points on an object's surface and the camera. With depth information, web content and applications can be enhanced by, for example, the use of hand gestures as an input mechanism, or by creating 3D models of real-world objects that can interact and integrate with the web platform. Concrete applications of this technology include more immersive gaming experiences, more accessible 3D video conferences, and augmented reality, to name a few.
To bring depth capability to the web platform, this specification extends the MediaStream interface [ GETUSERMEDIA ] to enable it to also contain depth-based MediaStreamTracks. A depth-based MediaStreamTrack, referred to as a depth stream track, represents an abstraction of a stream of frames that can each be converted to objects which contain an array of pixel data, where each pixel represents the distance between the camera and the objects in the scene for that point in the array. A MediaStream object that contains one or more depth stream tracks is referred to as a depth-only stream or depth+color stream.
Depth cameras usually produce 16-bit depth values per pixel, so this specification defines a 16-bit grayscale representation of a depth map .
This specification attempts to address the Use Cases and Requirements for accessing a depth stream from a depth camera. See also the Examples section for concrete usage examples.
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key word MUST is to be interpreted as described in [ RFC2119 ].
This specification defines conformance criteria that apply to a single product: the user agent that implements the interfaces that it contains.
Implementations that use ECMAScript to implement the APIs defined in this specification must implement them in a manner consistent with the ECMAScript Bindings defined in the Web IDL specification [ WEBIDL ], as this specification uses that specification and terminology.
The MediaStreamTrack and MediaStream interfaces this specification extends are defined in [ GETUSERMEDIA ].
The Constraints, MediaTrackSettings, MediaTrackConstraints, MediaTrackSupportedConstraints, MediaTrackCapabilities, and MediaTrackConstraintSet dictionaries this specification extends are defined in [ GETUSERMEDIA ].
The getUserMedia() and getSettings() methods and the NavigatorUserMediaSuccessCallback callback are defined in [ GETUSERMEDIA ].
The concepts muted, disabled, and overconstrained as applied to MediaStreamTrack are defined in [ GETUSERMEDIA ].
The terms source and consumer are defined in [ GETUSERMEDIA ].
The MediaDeviceKind enumeration is defined in [ GETUSERMEDIA ].
The video element and ImageData (and its data attribute and Canvas Pixel ArrayBuffer), VideoTrack, HTMLMediaElement (and its srcObject attribute), HTMLVideoElement interfaces and the CanvasImageSource enum are defined in [ HTML ].
The terms media data , media provider object , assigned media provider object , and the concept potentially playing are defined in [ HTML ].
The term permission and the permission name "camera" are defined in [ PERMISSIONS ].
The DataView, Uint8ClampedArray, and Uint16Array buffer source types are defined in [ WEBIDL ].
The meaning of a dictionary member being present or not present is defined in [ WEBIDL ].
The term depth+color stream means a MediaStream object that contains one or more MediaStreamTrack objects whose videoKind of Settings is "depth" (depth stream track) and one or more MediaStreamTrack objects whose videoKind of Settings is "color" (color stream track).
The term depth-only stream means a MediaStream object that contains one or more MediaStreamTrack objects whose videoKind of Settings is "depth" (depth stream track) only.
The term color-only stream means a MediaStream object that contains one or more MediaStreamTrack objects whose videoKind of Settings is "color" (color stream track) only, and optionally of kind "audio".
The term depth stream track means a MediaStreamTrack object whose videoKind of Settings is "depth". It represents a media stream track whose source is a depth camera.
The term color stream track means a MediaStreamTrack object whose videoKind of Settings is "color". It represents a media stream track whose source is a color camera.
A depth map is an abstract representation of a frame of a depth stream track. A depth map is an image that contains information relating to the distance of the surfaces of scene objects from a viewpoint. A depth map consists of pixels referred to as depth map values. The invalid depth map value is 0; it indicates that the user agent is unable to acquire depth information for the given pixel for any reason.
A depth map has an associated near value which is a double. It represents the minimum range in meters.
A depth map has an associated far value which is a double. It represents the maximum range in meters.
A depth map has an associated horizontal focal length which is a double. It represents the horizontal focal length of the depth camera, in pixels.
A depth map has an associated vertical focal length which is a double. It represents the vertical focal length of the depth camera, in pixels.
A depth map has an associated principal point , specified by principal point x and principal point y coordinates which are double. It is a concept defined in the pinhole camera model; a projection of perspective center to the image plane.
A depth map has an associated transformation from depth to video, which is a transformation matrix represented by a Transformation dictionary. It is used to translate a position in the depth camera's 3D coordinate system to the 3D coordinate system of the RGB video stream's camera (identified by videoDeviceId). After projecting depth 2D pixel coordinates to 3D space, we use this matrix to transform depth camera 3D space coordinates to the RGB video camera's 3D space.
Both depth and color cameras usually introduce significant distortion caused by the camera and lens. While in some cases the effects are not noticeable, these distortions cause errors in image analysis. To map depth map pixel values to corresponding color video track pixels, we use two DistortionCoefficients dictionaries: deprojection distortion coefficients and projection distortion coefficients.
Deprojection distortion coefficients are used to compensate for camera distortion when deprojecting 2D pixel coordinates to 3D space coordinates. Projection distortion coefficients are used in the opposite case, when projecting camera 3D space points to pixels. A single track does not have both sets of coefficients specified. The most common scenario is that the depth track has deprojection distortion coefficients or that the color video track has projection distortion coefficients. For the details, see the algorithm to map depth pixels to color pixels.
The data type of a depth map is 16-bit unsigned integer. The algorithm to convert the depth map value to grayscale , given a depth map value d , is as follows:
The rules to convert using range linear are as given in the following formula:
\[ d_n = \frac{d - near}{far - near} \]
\[ d_{16bit} = \lfloor d_n \cdot 65535 \rfloor \]
The depth measurement d (in meter units) is recovered by solving the rules to convert using range linear for d as follows:
\[ d_n = \frac{d_{16bit}}{65535} \]
\[ d = (d_n \cdot (far - near)) + near \]
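The following non-normative JavaScript sketch illustrates both conversions; the near and far parameters are assumed to hold the depth map's near value and far value in meters.

// Convert a depth measurement d (in meters) to its 16-bit representation
// using the range linear rules, and recover an approximation of d from it.
function depthToUint16(d, near, far) {
  const dn = (d - near) / (far - near);
  return Math.floor(dn * 65535);
}
function uint16ToDepth(d16bit, near, far) {
  const dn = d16bit / 65535;
  return dn * (far - near) + near;
}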
MediaTrackSupportedConstraints dictionary
partial dictionary MediaTrackSupportedConstraints {
boolean videoKind = true;
boolean depthNear = true;
boolean depthFar = true;
boolean focalLengthX = true;
boolean focalLengthY = true;
boolean principalPointX = true;
boolean principalPointY = true;
boolean deprojectionDistortionCoefficients = false;
boolean projectionDistortionCoefficients = false;
boolean depthToVideoTransform = false;
};
MediaTrackCapabilities dictionary
partial dictionary MediaTrackCapabilities {
DOMString videoKind;
(double or DoubleRange) depthNear;
(double or DoubleRange) depthFar;
(double or DoubleRange) focalLengthX;
(double or DoubleRange) focalLengthY;
(double or DoubleRange) principalPointX;
(double or DoubleRange) principalPointY;
boolean deprojectionDistortionCoefficients;
boolean projectionDistortionCoefficients;
boolean depthToVideoTransform;
};
MediaTrackConstraintSet dictionary
partial dictionary MediaTrackConstraintSet {
ConstrainDOMString videoKind;
ConstrainDouble depthNear;
ConstrainDouble depthFar;
ConstrainDouble focalLengthX;
ConstrainDouble focalLengthY;
ConstrainDouble principalPointX;
ConstrainDouble principalPointY;
ConstrainBoolean deprojectionDistortionCoefficients;
ConstrainBoolean projectionDistortionCoefficients;
ConstrainBoolean depthToVideoTransform;
};
MediaTrackSettings dictionary
partial dictionary MediaTrackSettings {
  DOMString videoKind;
  double depthNear;
  double depthFar;
  double focalLengthX;
  double focalLengthY;
  double principalPointX;
  double principalPointY;
  DistortionCoefficients deprojectionDistortionCoefficients;
  DistortionCoefficients projectionDistortionCoefficients;
  Transformation depthToVideoTransform;
};

dictionary DistortionCoefficients {
  double k1;
  double k2;
  double p1;
  double p2;
  double k3;
};

dictionary Transformation {
  Float32Array transformationMatrix;
  DOMString videoDeviceId;
};
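As a non-normative illustration, the extended settings could be read from a depth stream track roughly as follows; depthTrack is an assumed variable holding a MediaStreamTrack whose videoKind is "depth".

// Read the extended settings of a depth stream track (illustrative sketch).
const settings = depthTrack.getSettings();
console.log(settings.videoKind);                                  // "depth"
console.log(settings.depthNear, settings.depthFar);               // range, in meters
console.log(settings.focalLengthX, settings.focalLengthY);        // focal lengths, in pixels
console.log(settings.principalPointX, settings.principalPointY);  // principal point, in pixels
console.log(settings.depthToVideoTransform);                      // { transformationMatrix, videoDeviceId }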
The DistortionCoefficients dictionary has the k1, k2, p1, p2 and k3 dictionary members that represent the deprojection distortion coefficients or projection distortion coefficients. k1, k2 and k3 are radial distortion coefficients while p1 and p2 are tangential distortion coefficients. Radial distortion coefficients and tangential distortion coefficients are used when there is a need to deproject a depth value to 3D space or to project a 3D value to 2D video frame coordinates. See the algorithm to map depth pixels to color pixels and the Brown-Conrady distortion model implementation in the 3D point cloud rendering example GLSL shader.
The Transformation dictionary has the transformationMatrix dictionary member that is a 16-element array that defines the transformation matrix from the depth map camera's 3D coordinate system to the video track camera's 3D coordinate system.
The first four elements of the array correspond to the first matrix row, followed by the four elements of the second matrix row, and so on. It is in a format suitable for use with WebGL's uniformMatrix4fv.
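For example, the matrix can be passed to a mat4 uniform roughly as in the following non-normative sketch, which assumes a WebGL context gl, a linked program, and settings obtained from the depth track's getSettings(); the uniform name matches the u_depth_to_color uniform used in the 3D point cloud rendering example.

// Pass the 16-element transformation matrix to a GLSL mat4 uniform (sketch).
const location = gl.getUniformLocation(program, "u_depth_to_color");
gl.uniformMatrix4fv(location, false,
                    settings.depthToVideoTransform.transformationMatrix);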
The videoDeviceId dictionary member represents the deviceId of the video camera that the depth stream must be synchronized with. The value of videoDeviceId can be used as a deviceId constraint in [ GETUSERMEDIA ] to get the corresponding video and audio streams.
The following constrainable properties are defined to apply only to video MediaStreamTrack objects:
| Property Name | Values | Notes |
|---|---|---|
| MediaTrackSupportedConstraints.videoKind, MediaTrackCapabilities.videoKind, MediaTrackConstraintSet.videoKind, MediaTrackSettings.videoKind | ConstrainDOMString | This string should be one of the members of VideoKindEnum. The members describe the kind of video that the camera can capture. Note that getConstraints may not return exactly the same string for strings not in this enum. This preserves the possibility of using a future version of WebIDL enum for this property. |
enum VideoKindEnum {
"color",
"depth"
};
| Enumeration description | |
|---|---|
| color | The source is capturing color images. |
| depth | The source is capturing depth maps. |
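As a non-normative example, a page could request a depth-only stream using the videoKind constraint along the following lines (error handling omitted; the code is assumed to run inside an async function).

// Request a depth stream track using the videoKind constraint (sketch).
const stream = await navigator.mediaDevices.getUserMedia({
  video: { videoKind: { exact: "depth" } }
});
const [depthTrack] = stream.getVideoTracks();
console.log(depthTrack.getSettings().videoKind);  // "depth"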
The MediaStream consumer for the depth-only stream and depth+color stream is the video element [ HTML ]. If a MediaStreamTrack whose videoKind of Settings is "depth" is muted or disabled, it MUST render frames as if all the pixels were 0.
This section is non-normative.
A color stream track and a depth stream track can be combined into one depth+color stream. The rendering of the two tracks is intended to be synchronized, the resolution of the two tracks is intended to be the same, and the coordination of the two tracks is intended to be calibrated. These are not hard requirements, since it might not be possible to synchronize tracks from different sources.
This approach is simple to use but comes with the following caveats: it might not be supported by the implementation, and the resolutions of the two tracks are intended to be the same, which can require downsampling and degrade quality. The alternative approach is for a web developer to implement the algorithm to map depth pixels to color pixels. See the 3D point cloud rendering example code.
The following constrainable properties are defined to apply only to depth stream track s:
| Property Name | Values | Notes |
|---|---|---|
| MediaTrackSupportedConstraints.depthNear, MediaTrackCapabilities.depthNear, MediaTrackConstraintSet.depthNear, MediaTrackSettings.depthNear | ConstrainDouble | The near value, in meters. |
| MediaTrackSupportedConstraints.depthFar, MediaTrackCapabilities.depthFar, MediaTrackConstraintSet.depthFar, MediaTrackSettings.depthFar | ConstrainDouble | The far value, in meters. |
| MediaTrackSupportedConstraints.focalLengthX, MediaTrackCapabilities.focalLengthX, MediaTrackConstraintSet.focalLengthX, MediaTrackSettings.focalLengthX | ConstrainDouble | The horizontal focal length, in pixels. |
| MediaTrackSupportedConstraints.focalLengthY, MediaTrackCapabilities.focalLengthY, MediaTrackConstraintSet.focalLengthY, MediaTrackSettings.focalLengthY | ConstrainDouble | The vertical focal length, in pixels. |
| MediaTrackSupportedConstraints.principalPointX, MediaTrackCapabilities.principalPointX, MediaTrackConstraintSet.principalPointX, MediaTrackSettings.principalPointX | ConstrainDouble | The principal point x coordinate, in pixels. |
| MediaTrackSupportedConstraints.principalPointY, MediaTrackCapabilities.principalPointY, MediaTrackConstraintSet.principalPointY, MediaTrackSettings.principalPointY | ConstrainDouble | The principal point y coordinate, in pixels. |
| MediaTrackSupportedConstraints.deprojectionDistortionCoefficients, MediaTrackCapabilities.deprojectionDistortionCoefficients, MediaTrackConstraintSet.deprojectionDistortionCoefficients, MediaTrackSettings.deprojectionDistortionCoefficients | ConstrainDOMDictionary | The depth map's deprojection distortion coefficients, used when deprojecting from 2D pixel coordinates to 3D space. |
| MediaTrackSupportedConstraints.projectionDistortionCoefficients, MediaTrackCapabilities.projectionDistortionCoefficients, MediaTrackConstraintSet.projectionDistortionCoefficients, MediaTrackSettings.projectionDistortionCoefficients | ConstrainDOMDictionary | The depth map's projection distortion coefficients, used when projecting from 3D space to 2D pixel coordinates. |
| MediaTrackSupportedConstraints.depthToVideoTransform, MediaTrackCapabilities.depthToVideoTransform, MediaTrackConstraintSet.depthToVideoTransform, MediaTrackSettings.depthToVideoTransform | ConstrainDOMDictionary | The depth map camera's transformation from depth to video camera 3D coordinate system. |
The depthNear and depthFar constrainable properties, when set, allow the implementation to pick the best depth camera mode optimized for the range [near, far] and help minimize the error introduced by the lossy conversion from the depth value d to a quantized d16bit and back to an approximation of the depth value d.
If the depthFar property's value is less than the depthNear property's value, the depth stream track is overconstrained.
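For example, an application tracking close-range hand gestures might constrain the range as in the following non-normative sketch (the 0.2 m and 1.5 m values are illustrative; a range with depthFar less than depthNear would make the track overconstrained).

// Ask the implementation to optimize the depth camera mode for a short range.
await depthTrack.applyConstraints({
  depthNear: 0.2,  // meters
  depthFar: 1.5    // meters
});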
If the near value , far value , horizontal focal length or vertical focal length is fixed due to a hardware or software limitation, the corresponding constrainable property's value MUST be set to the value reported by the underlying implementation. (For example, the focal lengths of the lens may be fixed, or the underlying platform may not expose the focal length information.)
WebGLRenderingContext interface
This section is non-normative.
There are several use cases that are a good fit to be, at least partially, implemented on the GPU, such as motion recognition, pattern recognition, background removal, and 3D point cloud rendering.
This section explains which APIs can be used for some of these mentioned use-cases; the concrete examples are provided in the Examples section.
A video element whose source is a MediaStream object containing a depth stream track may be uploaded to a WebGL texture of format RGBA or RED and type FLOAT. See the specification [ WEBGL ] and the upload to float texture example code.
For each pixel of this WebGL texture, the R component represents the normalized 16-bit value following the formula: \( d_{float} = \frac{d_{16bit}}{65535.0} \)
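A minimal upload sketch follows; it assumes a WebGL 2 context gl and a video element whose srcObject is a depth-only stream (the normative details are in the upload to float texture example).

// Upload the current depth video frame to a single-channel float texture.
const texture = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, texture);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
// Each texel's R component holds d_float = d_16bit / 65535.0.
gl.texImage2D(gl.TEXTURE_2D, 0, gl.R32F, gl.RED, gl.FLOAT, video);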
This section is non-normative.
Here we list some of the possible approaches.
The performance of the synchronous readPixels from float texture example in current implementations suffices for some of the use cases, because there is no rendering to the float texture bound to a named framebuffer.
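A sketch of such a synchronous read-back follows, assuming the float texture is attached as the color attachment of a framebuffer fb and that RGBA/FLOAT reads are supported by the implementation (width and height are assumed variables).

// Read the float depth texture back to the CPU (sketch).
gl.bindFramebuffer(gl.FRAMEBUFFER, fb);
const pixels = new Float32Array(width * height * 4);
gl.readPixels(0, 0, width, height, gl.RGBA, gl.FLOAT, pixels);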
This section is non-normative.
The algorithms presented in this section explain how a web developer can map depth and color pixels. Concrete example on how to do the mapping is provided in example vertex shader used for 3D point cloud rendering .
When rendering, we want to position a color value from the color video frame at the corresponding depth map value, or at the 3D point in space defined by the depth map value. We use deprojection distortion coefficients to compensate for camera distortion when deprojecting 2D pixel coordinates to 3D space coordinates, and projection distortion coefficients in the opposite case, when projecting camera 3D space points to pixels.
The algorithm to map depth pixels to color pixels is as follows:
The algorithm to deproject a depth map value to a point in the depth camera's 3D coordinate system is as follows:
Let dx and dy be 2D coordinates, in pixels, of a pixel in depth map .
Let dz be depth map value of the same pixel in the depth map .
Let fx and fy be depth map 's horizontal focal length and vertical focal length respectively.
Let cx and cy be depth map 's principal point 2D coordinates.
Let 3D coordinates (Xd, Yd, Zd) be the output of this step - a 3D point in depth camera's 3D coordinate system.
\[ p_x = \frac{d_x - c_x}{f_x} \]
\[ p_y = \frac{d_y - c_y}{f_y} \]
If deprojection distortion coefficients are not present, 3D coordinates (Xd, Yd, Zd) in depth camera space are calculated as:
\[ X_d = d_z \cdot p_x \]
\[ Y_d = d_z \cdot p_y \]
\[ Z_d = d_z \]
If deprojection distortion coefficients are present, 3D coordinates (Xd, Yd, Zd) in depth camera space are calculated as:
\[ r_2 = p_x^2 + p_y^2 \]
\[ r = 1 + k_1 \cdot r_2 + k_2 \cdot r_2^2 + k_3 \cdot r_2^3 \]
\[ X_d = d_z \cdot \left(p_x \cdot r + 2 \cdot p_1 \cdot p_x \cdot p_y + p_2 \cdot (r_2 + 2 \cdot p_x^2)\right) \]
\[ Y_d = d_z \cdot \left(p_y \cdot r + 2 \cdot p_2 \cdot p_x \cdot p_y + p_1 \cdot (r_2 + 2 \cdot p_y^2)\right) \]
\[ Z_d = d_z \]
See the depth_deproject function in the 3D point cloud rendering example.
The result of the deproject depth map value to point in depth camera step, the 3D point (Xd, Yd, Zd), is in the depth camera's 3D coordinate system. To transform the coordinates of the same point in space to the color camera's 3D coordinate system, we multiply the transformation from depth to video matrix by the (Xd, Yd, Zd) 3D point vector.
Let (Xc, Yc, Zc) be the output of this step: the 3D coordinates of the depth map value projected into color camera 3D space.
Let M be the transformation matrix defined in the depth map's depthToVideoTransform field.
To multiply a 4×4 matrix by a 3-element vector, we extend the 3D vector by one element to a 4-dimensional vector. After multiplication, we use the vector's x, y and z coordinates as the result.
\[ \begin{pmatrix} X_c \\ Y_c \\ Z_c \end{pmatrix} = \left( M \times \begin{pmatrix} X_d \\ Y_d \\ Z_d \\ 1 \end{pmatrix} \right).xyz \]
In the 3D point cloud rendering example, this is done by: vec4 color_point = u_depth_to_color * vec4(depth_point, 1.0);
To project from color camera 3D coordinates to 2D pixel coordinates we use the corresponding color track's MediaTrackSettings. We get the color track using the depth map's Transformation.videoDeviceId; it represents the deviceId of the target color video device that should be used as a constraint in a [ GETUSERMEDIA ] call to get the corresponding color video stream track. After that, we use the color track's getSettings() to access its MediaTrackSettings.
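A non-normative sketch of this step, assuming depthTrack is a depth stream track and the code runs inside an async function:

// Use the depth track's videoDeviceId to obtain the matching color track,
// then read its intrinsics for the projection step below.
const { videoDeviceId } = depthTrack.getSettings().depthToVideoTransform;
const colorStream = await navigator.mediaDevices.getUserMedia({
  video: { deviceId: { exact: videoDeviceId } }
});
const colorSettings = colorStream.getVideoTracks()[0].getSettings();
// colorSettings.focalLengthX/Y and principalPointX/Y give fxc, fyc, cxc, cyc.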
Let \( f_{xc} \) and \( f_{yc} \) be the color track's horizontal focal length and vertical focal length respectively.
Let \( c_{xc} \) and \( c_{yc} \) be the color track's principal point 2D coordinates.
The result of this step is the 2D coordinate of the pixel in the color video frame (x, y).
If projection distortion coefficients are present, the position of the pixel in the color frame image (x, y) is calculated as:
\[ r_{2c} = (X_c)^2 + (Y_c)^2 \]
\[ r = 1 + k_1 \cdot r_{2c} + k_2 \cdot r_{2c}^2 + k_3 \cdot r_{2c}^3 \]
\[ p_{xc} = r \cdot \frac{X_c}{Z_c} \]
\[ p_{yc} = r \cdot \frac{Y_c}{Z_c} \]
\[ x = \left(p_{xc} + 2 \cdot p_1 \cdot p_{xc} \cdot p_{yc} + p_2 \cdot (r_{2c} + 2 \cdot p_{xc}^2)\right) \cdot f_{xc} + c_{xc} \]
\[ y = \left(p_{yc} + 2 \cdot p_2 \cdot p_{xc} \cdot p_{yc} + p_1 \cdot (r_{2c} + 2 \cdot p_{yc}^2)\right) \cdot f_{yc} + c_{yc} \]
If projection distortion coefficients are not present, the position of the pixel in the color frame image (x, y) is calculated as:
\[ p_{xc} = \frac{X_c}{Z_c} \]
\[ p_{yc} = \frac{Y_c}{Z_c} \]
\[ x = p_{xc} \cdot f_{xc} + c_{xc} \]
\[ y = p_{yc} \cdot f_{yc} + c_{yc} \]
See the color_project function in the 3D point cloud rendering example.
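For reference, the following non-normative JavaScript sketch chains the steps above for the common case where no distortion coefficients are present; depthSettings and colorSettings are assumed to be the two tracks' MediaTrackSettings, and M the 16-element row-major transformationMatrix described earlier.

// Map a depth pixel (dx, dy) with depth map value dz (meters) to a color pixel.
function mapDepthPixelToColorPixel(dx, dy, dz, depthSettings, colorSettings, M) {
  // Deproject to a 3D point in the depth camera's coordinate system.
  const px = (dx - depthSettings.principalPointX) / depthSettings.focalLengthX;
  const py = (dy - depthSettings.principalPointY) / depthSettings.focalLengthY;
  const Xd = dz * px, Yd = dz * py, Zd = dz;
  // Transform into the color camera's coordinate system: M * [Xd, Yd, Zd, 1].
  const Xc = M[0] * Xd + M[1] * Yd + M[2]  * Zd + M[3];
  const Yc = M[4] * Xd + M[5] * Yd + M[6]  * Zd + M[7];
  const Zc = M[8] * Xd + M[9] * Yd + M[10] * Zd + M[11];
  // Project onto the color frame.
  const x = (Xc / Zc) * colorSettings.focalLengthX + colorSettings.principalPointX;
  const y = (Yc / Zc) * colorSettings.focalLengthY + colorSettings.principalPointY;
  return { x, y };
}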
This section is non-normative.
Use the following to check whether readPixels to gl.RED or gl.RGBA float is supported:

gl.getParameter(gl.IMPLEMENTATION_COLOR_READ_FORMAT);
This section is non-normative.
The privacy and security considerations discussed in [ GETUSERMEDIA ] apply to this extension specification.
Thanks to everyone who contributed to the Use Cases and Requirements , sent feedback and comments. Special thanks to Ningxin Hu for experimental implementations, as well as to the Project Tango for their experiments.