
How it works

This page explains how AR 51 turns camera video into usable 3D motion — the data path from the cameras to your application. AR 51 is markerless: nothing is worn by the performer. The pipeline has three stages — Capture → Compute → Consume.

The AR 51 markerless pipeline: Capture with 9 MP / 120 FPS cameras, Compute on the GPU vision server, Consume in Mocap Studio and game engines.

1. Capture

AR 51's MindVision cameras (9 MP, 120 FPS, higher frame rates supported) ring the capture volume and stream synchronized video to the server. The cameras are hardware-synced so every frame across the array shares a timestamp — this is what lets the next stage fuse views correctly.
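To illustrate why a shared timestamp matters, here is a minimal sketch of grouping frames into complete multi-view sets before fusion. The `Frame` type, the `group_by_timestamp` helper, and the microsecond unit are assumptions made for this example; they are not AR 51 types or APIs, and the actual server may handle synchronization differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Hypothetical stand-in for one camera image; not an AR 51 type."""
    camera_id: str
    timestamp_us: int  # shared hardware-sync timestamp (microseconds assumed)


def group_by_timestamp(frames, camera_count):
    """Return {timestamp: [frames]} for timestamps seen by every camera."""
    buckets = defaultdict(list)
    for frame in frames:
        buckets[frame.timestamp_us].append(frame)
    # Keep only complete sets: one frame from each camera in the array.
    return {
        ts: group
        for ts, group in buckets.items()
        if len({f.camera_id for f in group}) == camera_count
    }


# Illustrative data: three cameras at 120 FPS (about 8333 us apart).
frames = [
    Frame("cam0", 0), Frame("cam1", 0), Frame("cam2", 0),
    Frame("cam0", 8333), Frame("cam1", 8333),  # cam2's frame is missing
]
complete = group_by_timestamp(frames, camera_count=3)
```

Only the first timestamp has a frame from all three cameras, so it is the only set that would be handed to a fusion step in this sketch.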

→ See hardware overview and room & camera setup.

2. Compute

The computer-vision server (CVS) runs GPU pose estimation, fusing all camera views into 3D data many times per second. Per frame it produces:

  • Skeletons and hands for every tracked person
  • Tracked objects you've registered (props, tools)
  • Camera poses from calibration, so all output shares one coordinate space
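The per-frame output listed above can be pictured as a small data model. The sketch below is an assumption for illustration only: the class and field names (`FrameResult`, `Person`, `TrackedObject`, `CameraPose`, `Vec3`) are hypothetical and do not reflect the real CVS message schema, which is described in the SDK & API architecture pages.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Vec3:
    x: float
    y: float
    z: float


@dataclass
class Person:
    entity_id: str                              # persistent identity
    skeleton: dict                              # joint name -> Vec3
    hands: dict = field(default_factory=dict)   # hand joint name -> Vec3


@dataclass
class TrackedObject:
    name: str          # a registered prop or tool
    position: Vec3


@dataclass
class CameraPose:
    camera_id: str
    position: Vec3     # from calibration, in the shared coordinate space


@dataclass
class FrameResult:
    """Hypothetical container for everything produced for one frame."""
    timestamp_us: int
    people: list = field(default_factory=list)
    objects: list = field(default_factory=list)
    cameras: list = field(default_factory=list)


def distance(a, b):
    """Euclidean distance between two points in the shared space."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


result = FrameResult(
    timestamp_us=0,
    people=[Person("e-1", {"head": Vec3(0.0, 1.7, 0.0)})],
    objects=[TrackedObject("prop", Vec3(3.0, 1.7, 4.0))],
    cameras=[CameraPose("cam0", Vec3(0.0, 2.5, -5.0))],
)
head_to_prop = distance(result.people[0].skeleton["head"],
                        result.objects[0].position)
```

Because people, objects, and cameras share one coordinate space, a distance such as `head_to_prop` can be computed directly without any per-camera transform.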

It tracks multiple people simultaneously and re-identifies them across frames via a persistent EntityId, so a person keeps their identity after leaving and re-entering the volume.
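A consumer can rely on that persistence by keying its own state on the EntityId. The sketch below uses made-up frame data and a hypothetical `frames_seen` helper; the identifier format ("e-1") is an assumption, not the actual EntityId format.

```python
from collections import defaultdict

# Each item: (frame_index, EntityIds visible in that frame). Illustrative data.
frames = [
    (0, ["e-1", "e-2"]),
    (1, ["e-1", "e-2"]),
    (2, ["e-2"]),          # e-1 has left the volume
    (3, ["e-1", "e-2"]),   # e-1 re-enters and keeps the same id
]


def frames_seen(frames):
    """Map each EntityId to the list of frame indices where it appeared."""
    seen = defaultdict(list)
    for index, entity_ids in frames:
        for entity_id in entity_ids:
            seen[entity_id].append(index)
    return dict(seen)


history = frames_seen(frames)
```

Here the history for "e-1" has a gap at frame 2 but remains a single record, which is the behavior a persistent identity makes possible.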

→ Fusion depends on a one-time camera calibration; identity handling is covered in entity identification.

3. Consume

The 3D output is consumed in two ways.

Mocap Studio — visualize the capture, record takes, and export (FBX and other formats).

SDKs and APIs — stream the data live into your own application over gRPC. Available clients:

| SDK / API | Language | Typical use |
|---|---|---|
| Unity SDK | C# | Unity games/apps, VR, virtual production |
| Unreal SDK | C++ / Blueprint | Unreal projects, LiveLink, RenderStream |
| .NET | C# | Headless / desktop consumers |
| C++ | C++ | Native integrations |
| Python (PyCvs) | Python | Research, data pipelines, ML |

Clients don't hard-code addresses: they discover services through the OMS registry and connect. See Connecting a client.
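The discover-then-connect pattern can be sketched with an in-memory stand-in for a registry. Everything here is hypothetical: the `Registry` class, the service name "cvs", the address, and the port are assumptions chosen for illustration and are not the OMS API. For the real calls, see Connecting a client.

```python
class Registry:
    """In-memory stand-in for a discovery service; not the OMS API."""

    def __init__(self):
        self._services = {}

    def register(self, name, host, port):
        """Record where a named service can be reached."""
        self._services[name] = (host, port)

    def lookup(self, name):
        """Return (host, port) for a service, or raise LookupError."""
        try:
            return self._services[name]
        except KeyError:
            raise LookupError(f"service {name!r} is not registered") from None


def resolve_target(registry, name):
    """Build a "host:port" target string from a registry lookup."""
    host, port = registry.lookup(name)
    return f"{host}:{port}"


registry = Registry()
registry.register("cvs", "192.0.2.10", 50051)  # documentation-range address
target = resolve_target(registry, "cvs")
# With the grpcio package, a target string like this could then be passed to
# grpc.insecure_channel(target); that step is left out to keep this runnable
# without a server.
```

The client code only knows the service name; the address lives in the registry, so moving the server does not require changing the client.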

The pieces, in a sentence each

| Term | What it is |
|---|---|
| CVS | Computer-vision server: runs pose estimation and produces the 3D motion from camera video. |
| OMS | The registration/discovery service that lets components find each other. |
| DGS | Shared scene & spatial anchors for multi-user / VR sessions. |
| EntityId / PersonId | Persistent vs. per-session identities for tracked people. |

Full definitions in the glossary.

Where to go next

  • Quickstart — go from a running system to your first capture.
  • SDK & API → Architecture — service topology and the data model.
  • Connecting a client — discover services and open a stream.