User Plane

AR4-MK3 Digital Twin Design

Date: 2026-03-02
Status: Approved (v1)
Epic: auraison-5z3 (Digital Twins)
Related: auraison-eh1 (Cosmos-Reason2), auraison-oys (Cosmos-Predict2), auraison-i6l (Cosmos-Transfer2.5), auraison-2a5 (Pydantic AI decoupling)


Relationship to the digital twins design

| Document | Purpose |
| --- | --- |
| docs/user-plane/design.mdx | Canonical user plane design |
| docs/user-plane/digital-twins.mdx | Digital Twins subsystem design: schema, TwinAgent, TurtleBot reference asset |
| This document | AR4-MK3 as second reference asset; layered plane decomposition; schema extensions for 6-DOF arms |

This document extends the base digital twins design for the AR4-MK3 6-DOF robotic arm. It does not redefine the architecture — read digital-twins.mdx first. What this document adds: critique of two external proposals, the planes × layers decomposition, AR4-specific schema extensions (firmware_versions, joint vector columns), capability metadata for gripper variants, and the 6-step VLA runtime reasoning loop mapped to Auraison planes.


Problem

The AR4-MK3 is a 6-DOF open-source robotic arm (Annin Robotics) with a Teensy 4.1 controller, optional auxiliary Arduino boards, multiple end-effector variants (pneumatic/servo grippers), and an optional 7th axis. Its hardware variants and coupled firmware, software, and sketch versions exercise parts of the Auraison digital twin framework that a mobile base does not, which makes it a good second reference asset alongside TurtleBot.

Two external architecture proposals were evaluated:

  1. A C4-style monolithic twin platform (ChatGPT) — vertically integrated, standalone backend
  2. An Open Physical-AI Stack — layered, decoupled, NVIDIA-as-plugin philosophy

Neither maps cleanly to the Auraison four-plane architecture. This design document:

  • Critiques both proposals
  • Maps the AR4 digital twin to the Auraison planes
  • Introduces a layered decomposition within each plane — reconciling the Open Stack's layer model with our plane separation
  • Extends the existing twins/ schema for industrial arm concerns

Critique of external proposals

ChatGPT C4 design — structural problems

The C4 design proposes a monolithic "Digital Twin Backend" containing State Sync, Model Services, Programs, Calibration, and a "Simulation Runtime". When mapped to Auraison:

  1. Conflates plane concerns. State sync is user-plane (real-time, Ray worker writes), model services are data-plane (persistent schema), orchestration is control-plane (TwinAgent). The monolith puts components with fundamentally different latency and consistency requirements in one container.

  2. No persistent world model. The "Event Log" is mentioned but not designed. Our lakehouse twins accumulate state across jobs. The C4 design treats telemetry as logging, not memory.

  3. No learned world model. The "Shadow Executor" replays programs deterministically. Our Cosmos stack (Predict2 → Transfer2.5 → Reason2) generates visual predictions from learned models — fundamentally different from program replay.

  4. No agent architecture. Who orchestrates the twin lifecycle? No equivalent of TwinAgent. Operations are implicit.

  5. Missing data plane entirely. No lakehouse, no persistent schema, no query layer.

  6. "Twin UI" as separate container doesn't scale. Our Next.js dashboard is a control-plane surface that renders any asset type.

What it gets right: Variant handling (gripper types, extra axis) as a first-class concern. Version coupling (firmware/software/sketch). ROS 2 as a first-class integration path.

Open Physical-AI Stack — better foundation, incomplete mapping

The Open Stack's core principle — "simulation and control must be independent from intelligence" — is correct and maps to our plane separation. Its five layers (World, Control, AI Runtime, World Model, Memory) are a useful decomposition.

Gaps when mapped to Auraison:

  1. No control plane. The Open Stack has no orchestration layer. Who dispatches VLA inference jobs? Who manages the twin lifecycle? Who handles experiment tracking?

  2. No management plane. No billing, no quotas, no access control.

  3. "Memory" is underspecified. "Store trajectories, failures, sensor traces" is correct but needs a concrete schema (our twins/ Parquet tables).

  4. Policy Server placement ambiguous. The Open Stack shows it as a peer of VLA and World Model, but doesn't specify compute placement. In Auraison, the Policy Server runs on torch.dev.gpu as a Ray Serve endpoint — user plane, not control plane.

  5. MoveIt2 as safety layer is correctly identified but not placed in any plane. It belongs in the user plane alongside ros2_control.


Architecture: layered planes

The key insight from this design: each plane contains multiple layers. The Open Stack's layers map into the planes as rows. The planes remain the primary separation (different latency, consistency, failure domains). The layers provide internal structure within each plane.

Diagram 1: Planes (columns) × Layers (rows)

Planes (columns) by layers (rows). User Plane column stacks Layer A — World (Gazebo Harmonic · AR4 Teensy; physics truth · /joint_states · camera) → Layer B — Robot Control (ROS 2 Jazzy · ros2_control · MoveIt2; trajectory exec · IK · collision check) → Layer C — AI Runtime (vLLM inference via Zenoh queryable; AR4: LeRobot VLA ACT → Pi0 → GR00T) → Layer D — World Model (Cosmos-Predict2 · Transfer2.5 · Cosmos-Reason2). Control Plane column: Orchestration (TwinAgent · AgentOps subsystem) → API (FastAPI /api/v1/twins, /api/v1/inference) → UI (Next.js dashboard, twin state viewer); Orchestration also → Store (Postgres · Redis). Data Plane column: Twin Memory (twins/ Parquet tables; state · sensors · events) → Lakehouse (DuckDB · DuckLake); Version Registry (firmware_versions table; software ↔ sketch coupling) → Lakehouse; SDG Datasets (Cosmos-augmented training data) → Lakehouse. Management Plane (v2) column: Billing (per-asset GPU hours), Quotas (asset limits), Observability (Logfire traces · W&B), Access Control (per-twin ACLs). Cross-plane edges: Layer D — World Model sends predicted snapshots to Twin Memory; Layer A — World sends in-job writes to Twin Memory; Orchestration dispatches jobs to Layer C — AI Runtime and does sync / query to Lakehouse; API spawns agents into Orchestration; Billing governs (v2, dashed) the API; Observability sends traces (dashed) to Orchestration.

Editable Mermaid source: images/ar4-digital-twin-planes-layers.mermaid.md

Diagram 2: Layers mapped to KubeRay clusters

AR4 layers mapped to KubeRay clusters. ros.dev.gpu RayCluster: Layer A (Gazebo · AR4 Teensy), Layer B (ros2_control · MoveIt2), Layer D partial (Cosmos-Reason2). torch.dev.gpu RayCluster: Layer C (vLLM · LeRobot VLA), Layer D partial (Cosmos-Predict2), Layer D partial (Cosmos-Transfer2.5). Data Plane — Lakehouse: Layer E (twins/ tables · firmware_versions). Edges: Layer A sends /joint_states and /camera to Layer B; Layer B sends observation to Layer C; Layer C sends proposed action to Cosmos-Predict2; Cosmos-Predict2 sends synthetic video to Cosmos-Transfer2.5; Cosmos-Transfer2.5 sends photorealistic to Cosmos-Reason2; Cosmos-Reason2 sends go/no-go to Layer B; Layer A writes in-job writes to Layer E; Cosmos-Transfer2.5 writes predicted snapshots to Layer E.

Editable Mermaid source: images/ar4-digital-twin-layers-clusters.mermaid.md

Layer mapping table

| Open Stack Layer | Auraison Plane | Cluster | Components |
| --- | --- | --- | --- |
| A — World | User plane | ros.dev.gpu | Gazebo Harmonic, AR4 Teensy serial bridge, /joint_states, camera topics |
| B — Control | User plane | ros.dev.gpu | ros2_control (trajectory, PID), MoveIt2 (IK, collision, constraints) |
| C — AI Runtime | User plane | torch.dev.gpu | vLLM inference via Zenoh queryable; AR4 impl: LeRobot (lerobot-ros / AnninAR4), VLA model (ACT → Pi0 → GR00T) |
| D — World Model | User plane | split | Cosmos-Predict2 + Transfer2.5 (torch.dev.gpu), Cosmos-Reason2 (ros.dev.gpu) |
| E — Memory | Data plane | | twins/ Parquet tables, DuckDB |
| (none) — Orchestration | Control plane | | TwinAgent, FastAPI, AgentOps, InferenceAgent (v2) |
| (none) — Governance | Management plane | | Billing, quotas, observability (v2) |

Key principles preserved:

  • NVIDIA Cosmos models are plugins in the user plane, not infrastructure.
  • Layer C is a generic inference serving layer (vLLM + Zenoh queryable), not tied to any specific VLA framework. Each reference application plugs in its own model backend: AR4 uses LeRobot (ACT → Pi0 → GR00T), turtlebot-maze uses Cosmos stack, counter-uas uses perception/tracking models. The platform does not constrain the AI runtime.
  • VLA backends are swappable without touching ROS 2 or the control plane.
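
The swappable-backend principle can be pictured as a small interface that the serving layer depends on. The following is a minimal sketch, not the platform's actual API: `VLABackend`, `Observation`, `Action`, `StubACTBackend`, and `serve_once` are hypothetical names introduced here for illustration and do not come from LeRobot, vLLM, Zenoh, or the Auraison codebase.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class Observation:
    """Hypothetical observation shape: image bytes, joint vector, task text."""
    image: bytes
    joints: Sequence[float]
    task: str


@dataclass(frozen=True)
class Action:
    """Hypothetical high-level action proposed by a VLA model."""
    description: str


class VLABackend(Protocol):
    """Anything that turns an observation into a proposed action."""
    name: str

    def propose(self, obs: Observation) -> Action: ...


class StubACTBackend:
    """Stand-in for an ACT policy; returns a canned action for illustration."""
    name = "act-stub"

    def propose(self, obs: Observation) -> Action:
        return Action(description=f"noop for task: {obs.task}")


def serve_once(backend: VLABackend, obs: Observation) -> Action:
    # The serving layer depends only on the protocol, so replacing the
    # backend does not require changes to ROS 2 or control-plane code.
    return backend.propose(obs)
```

Under this assumption, moving from ACT to Pi0 or GR00T would mean supplying another class with the same `propose` method, while callers of `serve_once` stay unchanged.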

AR4 as second reference asset

The AR4-MK3 is registered in the existing twins/assets table alongside TurtleBot. It does not need a new architecture — it needs AR4-specific capability metadata and schema extensions.

Asset registration

TwinAgent.create_twin(
  asset_id="ar4-mk3-01",
  asset_type="robot",
  urdf_path="user-plane/ar4/urdf/ar4_mk3.urdf",
  metadata={
    "manufacturer": "Annin Robotics",
    "model": "AR4-MK3",
    "dof": 6,
    "controller": "teensy_4.1",
    "gripper_type": "servo",       # or "pneumatic"
    "extra_axis": False,
    "firmware_version": "4.2.0",
    "software_version": "6.3",
    "aux_sketch_version": None
  }
)

Capability model

The ChatGPT design correctly identifies variant handling as critical. The AR4's variants (gripper type, extra axis) are encoded in twins/assets.metadata as a capability model:

{
  "capabilities": {
    "gripper": {"type": "servo", "io_pins": [12, 13], "state_machine": "open_close"},
    "extra_axis": {"enabled": false, "range_deg": null, "steps_per_deg": null},
    "controller": {"type": "teensy_4.1", "protocol": "serial", "baud": 115200},
    "aux_board": {"type": null, "sketch_version": null}
  }
}

This is configuration-driven behavior, not code branching. The TwinAgent reads capabilities to determine which sensors to expect, which state machine governs the gripper, and whether a 7th axis exists.
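
As an illustration of configuration-driven behavior, the sketch below derives two expectations from the metadata instead of branching on asset type. It is a sketch under assumptions: the helper names are hypothetical, and it assumes that an enabled extra axis adds one entry to the joint vector and that only servo grippers report a continuous position.

```python
from typing import Any


def expected_joint_count(metadata: dict[str, Any]) -> int:
    """Base DOF plus one when the optional extra axis is enabled (assumption)."""
    extra_axis = metadata.get("capabilities", {}).get("extra_axis", {})
    return int(metadata["dof"]) + (1 if extra_axis.get("enabled") else 0)


def gripper_reports_position(metadata: dict[str, Any]) -> bool:
    """Servo grippers report a continuous position; pneumatic ones do not."""
    gripper = metadata.get("capabilities", {}).get("gripper", {})
    return gripper.get("type") == "servo"


# Metadata combining the registration fields and the capability model above.
ar4_metadata = {
    "dof": 6,
    "capabilities": {
        "gripper": {"type": "servo", "io_pins": [12, 13], "state_machine": "open_close"},
        "extra_axis": {"enabled": False, "range_deg": None, "steps_per_deg": None},
    },
}

print(expected_joint_count(ar4_metadata))      # 6
print(gripper_reports_position(ar4_metadata))  # True
```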


Schema extensions

New table: twins/firmware_versions

Tracks firmware/software/sketch version history per asset. Current version lives in assets.metadata; this table provides the audit trail.

| Column | Type | Description |
| --- | --- | --- |
| version_id | VARCHAR PK | UUID |
| asset_id | VARCHAR FK → assets | |
| firmware_version | VARCHAR | Teensy sketch version |
| software_version | VARCHAR | AR4 desktop control software version |
| aux_sketch_version | VARCHAR | NULL if no aux board |
| ros2_driver_version | VARCHAR | NULL if not using ROS 2 |
| validated | BOOLEAN | True if versions are known-compatible |
| recorded_at | TIMESTAMP | |
| recorded_by | VARCHAR | Agent or operator |

Extended twins/state_snapshots for 6-DOF arm

The existing state_snapshots schema uses position_x/y/z + quaternion for mobile robots. For a 6-DOF arm, we additionally need the joint vector:

| Column | Type | Description |
| --- | --- | --- |
| joint_positions | DOUBLE[] | Array of joint angles (radians), length = DOF |
| joint_velocities | DOUBLE[] | Array of joint velocities (rad/s) |
| joint_torques | DOUBLE[] | Array of estimated torques (Nm), NULL if not available |
| gripper_state | VARCHAR | One of: open, closed, moving, unknown |
| gripper_position | DOUBLE | 0.0 (closed) to 1.0 (open) for servo; NULL for pneumatic |
| end_effector_pose | JSON | {x, y, z, qx, qy, qz, qw} in world frame (FK-derived) |
| moveit_plan_id | VARCHAR | MoveIt2 trajectory ID that produced this motion, NULL if manual |

These columns are added to the existing table. For TurtleBot (mobile base), joint_positions is NULL. For AR4 (fixed-base arm), position_x/y/z are NULL because the base does not move. Nullable columns let one schema serve both asset types, so no separate tables are needed.
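
The nullable-column approach can be sketched as a single record type with a per-asset consistency check. This is an illustrative sketch only: `StateSnapshot` mirrors a subset of the columns above, `validate_arm_snapshot` is a hypothetical helper, and the rule that joint_velocities must also have length DOF is an assumption (the table states the length only for joint_positions).

```python
from dataclasses import dataclass
from typing import Optional

GRIPPER_STATES = {"open", "closed", "moving", "unknown"}


@dataclass
class StateSnapshot:
    asset_id: str
    # Mobile-base columns (NULL for a fixed-base arm).
    position_x: Optional[float] = None
    position_y: Optional[float] = None
    position_z: Optional[float] = None
    # Arm columns (NULL for a mobile base).
    joint_positions: Optional[list[float]] = None
    joint_velocities: Optional[list[float]] = None
    gripper_state: Optional[str] = None
    gripper_position: Optional[float] = None


def validate_arm_snapshot(snap: StateSnapshot, dof: int) -> list[str]:
    """Return a list of problems; an empty list means the row looks consistent."""
    problems = []
    if snap.joint_positions is None:
        problems.append("joint_positions is required for an arm")
    elif len(snap.joint_positions) != dof:
        problems.append(f"expected {dof} joint positions, got {len(snap.joint_positions)}")
    if snap.joint_velocities is not None and len(snap.joint_velocities) != dof:
        problems.append("joint_velocities length does not match DOF")
    if snap.gripper_state is not None and snap.gripper_state not in GRIPPER_STATES:
        problems.append(f"unknown gripper_state: {snap.gripper_state}")
    return problems
```

A TurtleBot row would simply leave the arm fields at None and fill position_x/y/z, so both asset types share one table shape.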

Extended twins/events event types

AR4-specific event types:

arm.homed               — startup homing procedure completed
arm.calibrated          — calibration offsets recorded
arm.estop               — emergency stop triggered
arm.limit_reached       — joint limit hit (payload: {joint, limit_type, value})
gripper.opened          — gripper opened
gripper.closed          — gripper closed
program.loaded          — motion program loaded (payload: {program_id, version})
program.executed        — motion program execution completed
program.diverged        — predicted vs actual trajectory divergence flagged
firmware.updated        — firmware version changed
moveit.plan_generated   — MoveIt2 generated a trajectory plan
moveit.collision_check  — collision check result (payload: {passed, obstacles})
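
Three of the event types above declare payload fields. A writer could check those fields before emitting an event, roughly as sketched below. The mapping is taken from the list above; the function name and the choice to treat the listed fields as required are assumptions.

```python
# Payload fields named in the event type list above.
REQUIRED_PAYLOAD_KEYS = {
    "arm.limit_reached": {"joint", "limit_type", "value"},
    "program.loaded": {"program_id", "version"},
    "moveit.collision_check": {"passed", "obstacles"},
}


def missing_payload_keys(event_type: str, payload: dict) -> set[str]:
    """Return the declared payload fields that are absent from `payload`."""
    required = REQUIRED_PAYLOAD_KEYS.get(event_type, set())
    return required - set(payload)
```

Event types without a declared payload, such as arm.homed, pass through with an empty result.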

Runtime reasoning loop (AR4 on Auraison)

The Open Stack's 6-step loop mapped to Auraison planes:

Step 1 — Perception (User plane, ros.dev.gpu)
  ROS 2 topics: /joint_states, /camera/rgb, /camera/depth
  Ray worker on ros.dev.gpu subscribes via Zenoh bridge
  In-job writes: state_snapshots + sensor_readings → lakehouse (data plane)

Step 2 — Observation formatting (User plane, torch.dev.gpu)
  vLLM inference endpoint (via Zenoh queryable) receives observation:
    obs = {image: rgb, joints: q, task: instruction}

Step 3 — VLA proposes action (User plane, torch.dev.gpu)
  VLA model (LeRobot ACT / Pi0 / GR00T) outputs: "move end-effector +3cm, close gripper"

Step 4 — World model evaluates (User plane, split)
  Cosmos-Predict2 (torch.dev.gpu): current frame + action → predicted trajectory video
  Cosmos-Transfer2.5 (torch.dev.gpu): synthetic → photorealistic
  Cosmos-Reason2 (ros.dev.gpu): feasibility evaluation
  → Predicted snapshots written to data plane (source=predicted)

Step 5 — MoveIt validates (User plane, ros.dev.gpu)
  MoveIt2: collision check, IK, trajectory generation
  → moveit.plan_generated event written to data plane

Step 6 — Execute (User plane, ros.dev.gpu)
  ros2_control: trajectory execution via Teensy serial bridge
  → Observed state_snapshots written to data plane (source=ros_job)

Post-job — Reconciliation (Control plane)
  TwinAgent.sync_twin("ar4-mk3-01", job_id):
    Compare predicted vs observed snapshots
    Flag divergences as program.diverged events
    Update firmware_versions if changed
    Set reconciled=True on validated snapshots
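
The post-job reconciliation step can be sketched as a comparison of predicted and observed snapshots. Everything below is an assumption made for illustration: the design does not specify how snapshots are paired, what quantity is compared, or what tolerance applies. This sketch pairs snapshots by a hypothetical `step` index, compares joint positions, and uses an arbitrary 0.05 rad tolerance.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    step: int                      # hypothetical pairing key
    joint_positions: list[float]   # radians
    source: str                    # "predicted" or "ros_job"
    reconciled: bool = False


def reconcile(predicted: list[Snapshot], observed: list[Snapshot],
              tolerance_rad: float = 0.05) -> list[dict]:
    """Flag divergences as program.diverged events; mark the rest reconciled."""
    observed_by_step = {snap.step: snap for snap in observed}
    events = []
    for pred in predicted:
        obs = observed_by_step.get(pred.step)
        if obs is None:
            continue  # nothing observed for this step, so nothing to compare
        max_error = max(
            (abs(p - o) for p, o in zip(pred.joint_positions, obs.joint_positions)),
            default=0.0,
        )
        if max_error > tolerance_rad:
            events.append({
                "event_type": "program.diverged",
                "payload": {"step": pred.step, "max_joint_error_rad": max_error},
            })
        else:
            obs.reconciled = True
    return events
```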

Data flow diagram

AR4 data flow. User Plane — ros.dev.gpu: Gazebo Harmonic + AR4 URDF, Physical AR4 (Teensy serial), ROS 2 Jazzy (ros2_control · MoveIt2), Cosmos-Reason2 (feasibility). User Plane — torch.dev.gpu: vLLM Inference (Zenoh queryable), LeRobot VLA (ACT → Pi0 → GR00T), Cosmos-Predict2, Cosmos-Transfer2.5. Control Plane: TwinAgent, PolicyAgent, FastAPI /api/v1/twins. Data Plane: Lakehouse (twins/ tables), firmware_versions. Edges: Gazebo sends /joint_states and /camera to ROS 2 Jazzy; Physical AR4 sends serial to ROS 2 Jazzy; ROS 2 Jazzy sends observation to vLLM Inference; vLLM Inference feeds LeRobot VLA; LeRobot VLA sends proposed action to Cosmos-Predict2; Cosmos-Predict2 sends synthetic video to Cosmos-Transfer2.5; Cosmos-Transfer2.5 sends photorealistic to Cosmos-Reason2; Cosmos-Reason2 sends go/no-go to ROS 2 Jazzy; ROS 2 Jazzy writes in-job writes to Lakehouse; Cosmos-Transfer2.5 writes predicted snapshots to Lakehouse; TwinAgent does post-job sync to Lakehouse and version audit to firmware_versions; FastAPI spawns TwinAgent and dispatches inference to vLLM Inference.

Editable Mermaid source: images/ar4-digital-twin-data-flow.mermaid.md


AR4-specific concerns

Teensy serial bridge

The AR4's Teensy 4.1 communicates via serial USB. In our architecture, this is a ROS 2 hardware interface plugin in ros2_control — same pattern as any ROS 2 robot. The Teensy bridge runs on ros.dev.gpu as part of the ROS 2 stack, not as a separate container.

MoveIt2 as safety layer

The Open Stack correctly identifies MoveIt2 as the critical safety layer between VLA intent and physical execution. VLA outputs high-level actions ("move gripper here"); MoveIt2 translates these into safe trajectories with collision checking and joint limit enforcement. This is Layer B in the user plane — it never leaves ros.dev.gpu.

Version coupling validation

On every job start, the TwinAgent validates that the firmware/software/sketch versions recorded in twins/assets.metadata match the versions reported by the Teensy controller. Mismatches are flagged as firmware.updated events and require re-validation before the job proceeds.
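
The job-start check amounts to a field-by-field comparison. A minimal sketch follows, assuming the versions reported by the controller have already been collected into a dictionary; how they are read from the Teensy is not covered here, and `version_mismatches` is a hypothetical helper name.

```python
VERSION_FIELDS = ("firmware_version", "software_version", "aux_sketch_version")


def version_mismatches(recorded: dict, reported: dict) -> dict:
    """Map each mismatching field to a (recorded, reported) pair."""
    return {
        field: (recorded.get(field), reported.get(field))
        for field in VERSION_FIELDS
        if recorded.get(field) != reported.get(field)
    }


recorded = {"firmware_version": "4.2.0", "software_version": "6.3", "aux_sketch_version": None}
reported = {"firmware_version": "4.3.0", "software_version": "6.3", "aux_sketch_version": None}
print(version_mismatches(recorded, reported))
# {'firmware_version': ('4.2.0', '4.3.0')}
```

A non-empty result would correspond to a firmware.updated event and a blocked job until re-validation.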

Gripper state machine

Pneumatic and servo grippers have different state machines:

  • Pneumatic: binary (open/closed), controlled by digital IO pins
  • Servo: continuous (0.0–1.0 position), controlled by PWM

The capabilities.gripper.state_machine field in assets.metadata determines which state machine governs gripper_state and gripper_position in state_snapshots.
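
The two state machines could populate gripper_state and gripper_position roughly as sketched below. This is a sketch under assumptions: the function names, the 0.02 at-target tolerance, and the rule that a servo at rest above the tolerance counts as open are all hypothetical and not taken from the design.

```python
from typing import Optional, Tuple

AT_TARGET_TOLERANCE = 0.02  # assumed tolerance, not from the design


def pneumatic_columns(valve_open: Optional[bool]) -> Tuple[str, Optional[float]]:
    """Binary gripper: state only, gripper_position stays NULL."""
    if valve_open is None:
        return ("unknown", None)
    return ("open" if valve_open else "closed", None)


def servo_columns(position: Optional[float], target: float) -> Tuple[str, Optional[float]]:
    """Continuous gripper: 'moving' until the position is near the target."""
    if position is None:
        return ("unknown", None)
    clamped = min(1.0, max(0.0, position))
    if abs(clamped - target) > AT_TARGET_TOLERANCE:
        return ("moving", clamped)
    state = "closed" if clamped <= AT_TARGET_TOLERANCE else "open"
    return (state, clamped)
```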


Evolution path

v1   — AR4 registered as second reference asset; URDF in Gazebo; in-job writes + post-job
       reconciliation; firmware_versions table; variant handling via capabilities metadata
v1.5 — vLLM inference on torch.dev.gpu via Zenoh queryable; LeRobot VLA (ACT); MoveIt2
       safety layer; Cosmos Predict → Transfer → Reason → Execute loop for AR4
       Redis hot-cache for real-time joint state (6-DOF at 100+ Hz)
v2   — Pi0 / GR00T as VLA backend (swappable via vLLM + Zenoh abstraction)
       Cosmos post-trained on AR4 manipulation datasets
       MoveIt2 collision checks feed Reason2 feasibility scoring
       Program repository: versioned motion programs with provenance
       Pydantic AI TwinAgent + InferenceAgent (control plane migration)

Files to create / modify

user-plane/ar4/
  urdf/ar4_mk3.urdf              AR4-MK3 URDF model
  config/ar4_controllers.yaml     ros2_control configuration
  config/ar4_moveit.yaml          MoveIt2 configuration

control-plane/backend/
  agents/twin_agent.py            Extend for firmware_versions + capabilities
  agents/inference_agent.py       New (v2): dispatches inference jobs via vLLM + Zenoh
  api/twins.py                    Extend for /predict endpoint + firmware validation
  models/twin.py                  Extend for AR4 capability model

data-plane/
  schema/twins/firmware_versions/ Schema definition for new table