
AR4-MK3 Digital Twin — Design Document

Date: 2026-03-02
Status: Approved (v1)
Epic: auraison-5z3 (Digital Twins)
Related: auraison-eh1 (Cosmos-Reason2), auraison-oys (Cosmos-Predict2), auraison-i6l (Cosmos-Transfer2.5), auraison-2a5 (Pydantic AI decoupling)


Problem

The AR4-MK3 is a 6-DOF open-source robotic arm (Annin Robotics) with a Teensy 4.1 controller, optional auxiliary Arduino boards, multiple end-effector variants (pneumatic/servo grippers), and an optional 7th axis. It is an ideal second reference asset for the Auraison digital twin framework alongside TurtleBot.

Two external architecture proposals were evaluated:

  1. A C4-style monolithic twin platform (ChatGPT) — vertically integrated, standalone backend
  2. An Open Physical-AI Stack — layered, decoupled, NVIDIA-as-plugin philosophy

Neither maps cleanly to the Auraison four-plane architecture. This design document:

  • Critiques both proposals
  • Maps the AR4 digital twin to the Auraison planes
  • Introduces a layered decomposition within each plane — reconciling the Open Stack's layer model with our plane separation
  • Extends the existing twins/ schema for industrial arm concerns

Critique of external proposals

ChatGPT C4 design — structural problems

The C4 design proposes a monolithic "Digital Twin Backend" containing State Sync, Model Services, Programs, Calibration, and a "Simulation Runtime". When mapped to Auraison:

  1. Conflates plane concerns. State sync is user-plane (real-time, Ray worker writes), model services are data-plane (persistent schema), orchestration is control-plane (TwinAgent). The monolith puts components with fundamentally different latency and consistency requirements in one container.

  2. No persistent world model. The "Event Log" is mentioned but not designed. Our lakehouse twins accumulate state across jobs. The C4 design treats telemetry as logging, not memory.

  3. No learned world model. The "Shadow Executor" replays programs deterministically. Our Cosmos stack (Predict2 → Transfer2.5 → Reason2) generates visual predictions from learned models — fundamentally different from program replay.

  4. No agent architecture. Who orchestrates the twin lifecycle? No equivalent of TwinAgent. Operations are implicit.

  5. Missing data plane entirely. No lakehouse, no persistent schema, no query layer.

  6. "Twin UI" as separate container doesn't scale. Our Next.js dashboard is a control-plane surface that renders any asset type.

What it gets right: variant handling (gripper types, extra axis) as a first-class concern; version coupling (firmware/software/sketch); ROS 2 as a first-class integration path.

Open Physical-AI Stack — better foundation, incomplete mapping

The Open Stack's core principle — "simulation and control must be independent from intelligence" — is correct and maps to our plane separation. Its five layers (World, Control, AI Runtime, World Model, Memory) are a useful decomposition.

Gaps when mapped to Auraison:

  1. No control plane. The Open Stack has no orchestration layer. Who dispatches VLA inference jobs? Who manages the twin lifecycle? Who handles experiment tracking?

  2. No management plane. No billing, no quotas, no access control.

  3. "Memory" is underspecified. "Store trajectories, failures, sensor traces" is correct but needs a concrete schema (our twins/ Parquet tables).

  4. Policy Server placement ambiguous. The Open Stack shows it as a peer of VLA and World Model, but doesn't specify compute placement. In Auraison, the Policy Server runs on torch.dev.gpu as a Ray Serve endpoint — user plane, not control plane.

  5. MoveIt2 as safety layer is correctly identified but not placed in any plane. It belongs in the user plane alongside ros2_control.


Architecture: layered planes

The key insight from this design: each plane contains multiple layers. The Open Stack's layers map into the planes as rows. The planes remain the primary separation (different latency, consistency, failure domains). The layers provide internal structure within each plane.

Diagram 1: Planes (columns) × Layers (rows)

Diagram 2: Layers mapped to KubeRay clusters

Layer mapping table

Open Stack Layer        | Auraison Plane   | Cluster       | Components
A — World               | User plane       | ros.dev.gpu   | Gazebo Harmonic, AR4 Teensy serial bridge, /joint_states, camera topics
B — Control             | User plane       | ros.dev.gpu   | ros2_control (trajectory, PID), MoveIt2 (IK, collision, constraints)
C — AI Runtime          | User plane       | torch.dev.gpu | Policy Server (Ray Serve), VLA model (OpenVLA / GR00T), swappable
D — World Model         | User plane       | split         | Cosmos-Predict2 + Transfer2.5 (torch.dev.gpu), Cosmos-Reason2 (ros.dev.gpu)
E — Memory              | Data plane       | —             | twins/ Parquet tables, DuckDB, MinIO
(none) — Orchestration  | Control plane    | —             | TwinAgent, PolicyAgent, FastAPI, AgentOps
(none) — Governance     | Management plane | —             | Billing, quotas, observability (v2)

Key principle preserved: NVIDIA Cosmos models are plugins in the user plane, not infrastructure. The Policy Server abstraction means VLA backends are swappable (OpenVLA → GR00T → custom) without touching ROS 2 or the control plane.
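The swappable-backend claim can be made concrete with a small sketch. Nothing below is the shipped Policy Server; the `VLABackend` protocol, class names, and the action dict shape are illustrative assumptions.

```python
from typing import Any, Protocol

class VLABackend(Protocol):
    """Hypothetical interface every VLA backend implements."""
    def predict_action(self, obs: dict[str, Any]) -> dict[str, Any]: ...

class OpenVLABackend:
    def predict_action(self, obs: dict[str, Any]) -> dict[str, Any]:
        # Placeholder: a real backend would run model inference here.
        return {"delta_xyz": [0.0, 0.0, 0.03], "gripper": "close"}

class PolicyServer:
    """Sketch of the Ray Serve endpoint's core: the backend is injected,
    so swapping OpenVLA for GR00T touches configuration, not ROS 2."""
    def __init__(self, backend: VLABackend):
        self.backend = backend

    def infer(self, obs: dict[str, Any]) -> dict[str, Any]:
        return self.backend.predict_action(obs)

server = PolicyServer(OpenVLABackend())
action = server.infer({"image": None, "joints": [0.0] * 6, "task": "pick up the cube"})
```

Swapping backends is then a one-line change at construction time, which is the whole point of keeping Cosmos and VLA models as plugins rather than infrastructure.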


AR4 as second reference asset

The AR4-MK3 is registered in the existing twins/assets table alongside TurtleBot. It does not need a new architecture — it needs AR4-specific capability metadata and schema extensions.

Asset registration

TwinAgent.create_twin(
    asset_id="ar4-mk3-01",
    asset_type="robot",
    urdf_path="user-plane/ar4/urdf/ar4_mk3.urdf",
    metadata={
        "manufacturer": "Annin Robotics",
        "model": "AR4-MK3",
        "dof": 6,
        "controller": "teensy_4.1",
        "gripper_type": "servo",  # or "pneumatic"
        "extra_axis": False,
        "firmware_version": "4.2.0",
        "software_version": "6.3",
        "aux_sketch_version": None,
    },
)

Capability model

The ChatGPT design correctly identifies variant handling as critical. The AR4's variants (gripper type, extra axis) are encoded in twins/assets.metadata as a capability model:

{
  "capabilities": {
    "gripper":    {"type": "servo", "io_pins": [12, 13], "state_machine": "open_close"},
    "extra_axis": {"enabled": false, "range_deg": null, "steps_per_deg": null},
    "controller": {"type": "teensy_4.1", "protocol": "serial", "baud": 115200},
    "aux_board":  {"type": null, "sketch_version": null}
  }
}

This is configuration-driven behavior, not code branching. The TwinAgent reads capabilities to determine which sensors to expect, which state machine governs the gripper, and whether a 7th axis exists.
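A hedged sketch of that capability-driven dispatch: the helper names below are illustrative assumptions, not the shipped TwinAgent API, but the dict mirrors the capability model above.

```python
# Capability model as stored in twins/assets.metadata (from the JSON above).
capabilities = {
    "gripper": {"type": "servo", "io_pins": [12, 13], "state_machine": "open_close"},
    "extra_axis": {"enabled": False, "range_deg": None, "steps_per_deg": None},
    "controller": {"type": "teensy_4.1", "protocol": "serial", "baud": 115200},
    "aux_board": {"type": None, "sketch_version": None},
}

def expected_joint_count(base_dof: int, caps: dict) -> int:
    # An enabled 7th axis adds one entry to the expected /joint_states vector.
    return base_dof + (1 if caps["extra_axis"]["enabled"] else 0)

def expects_aux_sensors(caps: dict) -> bool:
    # Only poll auxiliary-board topics when an aux board is configured.
    return caps["aux_board"]["type"] is not None
```

Behavior follows the data, so adding a new AR4 variant means registering new metadata, not forking code paths.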


Schema extensions

New table: twins/firmware_versions

Tracks firmware/software/sketch version history per asset. Current version lives in assets.metadata; this table provides the audit trail.

Column              | Type       | Description
version_id          | VARCHAR PK | UUID
asset_id            | VARCHAR    | FK → assets
firmware_version    | VARCHAR    | Teensy sketch version
software_version    | VARCHAR    | AR4 desktop control software version
aux_sketch_version  | VARCHAR    | NULL if no aux board
ros2_driver_version | VARCHAR    | NULL if not using ROS 2
validated           | BOOLEAN    | True if versions are known-compatible
recorded_at         | TIMESTAMP  | When the version set was recorded
recorded_by         | VARCHAR    | Agent or operator

Extended twins/state_snapshots for 6-DOF arm

The existing state_snapshots schema uses position_x/y/z + quaternion for mobile robots. For a 6-DOF arm, we additionally need the joint vector:

Column            | Type     | Description
joint_positions   | DOUBLE[] | Array of joint angles (radians), length = DOF
joint_velocities  | DOUBLE[] | Array of joint velocities (rad/s)
joint_torques     | DOUBLE[] | Array of estimated torques (Nm), NULL if not available
gripper_state     | VARCHAR  | One of: open, closed, moving, unknown
gripper_position  | DOUBLE   | 0.0 (closed) to 1.0 (open) for servo; NULL for pneumatic
end_effector_pose | JSON     | {x, y, z, qx, qy, qz, qw} in world frame (FK-derived)
moveit_plan_id    | VARCHAR  | MoveIt2 trajectory ID that produced this motion; NULL if manual

These columns are added to the existing table. For TurtleBot (mobile base), joint_positions is NULL. For AR4 (arm), position_x/y/z is NULL (the base doesn't move). The schema accommodates both via nullable columns — no separate tables needed.
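Two illustrative rows make the nullable-column scheme concrete. The field names follow the columns above; treating dicts as table rows is just for the sketch.

```python
# TurtleBot (mobile base): pose columns set, joint columns NULL.
turtlebot_snapshot = {
    "asset_id": "turtlebot-01",
    "position_x": 1.2, "position_y": 0.4, "position_z": 0.0,
    "joint_positions": None,
    "gripper_state": None,
}

# AR4 (fixed-base arm): joint vector set, base-pose columns NULL.
ar4_snapshot = {
    "asset_id": "ar4-mk3-01",
    "position_x": None, "position_y": None, "position_z": None,
    "joint_positions": [0.0, -0.5, 1.1, 0.0, 0.7, 0.0],  # 6-DOF, radians
    "gripper_state": "closed",
}

def snapshot_kind(row: dict) -> str:
    # One schema, two asset shapes: discriminate on which columns are populated.
    return "arm" if row["joint_positions"] is not None else "mobile_base"
```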

Extended twins/events event types

AR4-specific event types:

arm.homed               — startup homing procedure completed
arm.calibrated          — calibration offsets recorded
arm.estop               — emergency stop triggered
arm.limit_reached       — joint limit hit (payload: {joint, limit_type, value})
gripper.opened          — gripper opened
gripper.closed          — gripper closed
program.loaded          — motion program loaded (payload: {program_id, version})
program.executed        — motion program execution completed
program.diverged        — predicted vs actual trajectory divergence flagged
firmware.updated        — firmware version changed
moveit.plan_generated   — MoveIt2 generated a trajectory plan
moveit.collision_check  — collision check result (payload: {passed, obstacles})
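As a sketch, an `arm.limit_reached` event row might be assembled like this. The payload shape follows the list above; the other field names (`recorded_at`, the serialized-payload convention) are assumptions about the twins/events schema.

```python
import json
from datetime import datetime, timezone

def make_event(asset_id: str, event_type: str, payload: dict) -> dict:
    # Payload is serialized so heterogeneous event types share one column.
    return {
        "asset_id": asset_id,
        "event_type": event_type,
        "payload": json.dumps(payload),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }

event = make_event(
    "ar4-mk3-01",
    "arm.limit_reached",
    {"joint": "J3", "limit_type": "soft_max", "value": 2.97},  # radians, illustrative
)
```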

Runtime reasoning loop (AR4 on Auraison)

The Open Stack's 6-step loop mapped to Auraison planes:

Step 1 — Perception (User plane, ros.dev.gpu)
ROS 2 topics: /joint_states, /camera/rgb, /camera/depth
Ray worker on ros.dev.gpu subscribes via Zenoh bridge
In-job writes: state_snapshots + sensor_readings → MinIO (data plane)

Step 2 — Observation formatting (User plane, torch.dev.gpu)
Policy Server (Ray Serve) receives observation:
obs = {image: rgb, joints: q, task: instruction}

Step 3 — VLA proposes action (User plane, torch.dev.gpu)
VLA model (OpenVLA / GR00T) outputs: "move end-effector +3cm, close gripper"

Step 4 — World model evaluates (User plane, split)
Cosmos-Predict2 (torch.dev.gpu): current frame + action → predicted trajectory video
Cosmos-Transfer2.5 (torch.dev.gpu): synthetic → photorealistic
Cosmos-Reason2 (ros.dev.gpu): feasibility evaluation
→ Predicted snapshots written to data plane (source=predicted)

Step 5 — MoveIt validates (User plane, ros.dev.gpu)
MoveIt2: collision check, IK, trajectory generation
→ moveit.plan_generated event written to data plane

Step 6 — Execute (User plane, ros.dev.gpu)
ros2_control: trajectory execution via Teensy serial bridge
→ Observed state_snapshots written to data plane (source=ros_job)

Post-job — Reconciliation (Control plane)
TwinAgent.sync_twin("ar4-mk3-01", job_id):
Compare predicted vs observed snapshots
Flag divergences as program.diverged events
Update firmware_versions if changed
Set reconciled=True on validated snapshots
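The predicted-vs-observed comparison at the heart of reconciliation can be sketched as follows. The divergence metric (max per-joint angular error) and the threshold are illustrative assumptions, not the shipped TwinAgent logic.

```python
def joint_divergence(predicted: list[float], observed: list[float]) -> float:
    # Worst-case per-joint angular error between two joint vectors (radians).
    return max(abs(p - o) for p, o in zip(predicted, observed))

def flag_divergences(pred_snaps: list[list[float]],
                     obs_snaps: list[list[float]],
                     threshold_rad: float = 0.05) -> list[dict]:
    # Compare snapshots step by step; emit a program.diverged event per breach.
    events = []
    for step, (pred, obs) in enumerate(zip(pred_snaps, obs_snaps)):
        d = joint_divergence(pred, obs)
        if d > threshold_rad:
            events.append({"event_type": "program.diverged",
                           "payload": {"step": step, "max_error_rad": round(d, 4)}})
    return events

pred = [[0.0, 0.1, 0.2], [0.0, 0.2, 0.4]]
obs = [[0.0, 0.1, 0.2], [0.0, 0.2, 0.55]]
events = flag_divergences(pred, obs)
```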

Data flow diagram


AR4-specific concerns

Teensy serial bridge

The AR4's Teensy 4.1 communicates via serial USB. In our architecture, this is a ROS 2 hardware interface plugin in ros2_control — same pattern as any ROS 2 robot. The Teensy bridge runs on ros.dev.gpu as part of the ROS 2 stack, not as a separate container.

MoveIt2 as safety layer

The Open Stack correctly identifies MoveIt2 as the critical safety layer between VLA intent and physical execution. VLA outputs high-level actions ("move gripper here"); MoveIt2 translates these into safe trajectories with collision checking and joint limit enforcement. This is Layer B in the user plane — it never leaves ros.dev.gpu.

Version coupling validation

On every job start, the TwinAgent validates that the firmware/software/sketch versions recorded in twins/assets.metadata match the versions reported by the Teensy controller. Mismatches are flagged as firmware.updated events and require re-validation before the job proceeds.
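A minimal sketch of that pre-job check, assuming the metadata field names from the asset registration above; the `reported` dict stands in for whatever the Teensy controller answers over serial, and the function name is hypothetical.

```python
def validate_versions(metadata: dict, reported: dict) -> list[str]:
    """Return the mismatched version fields (empty list = validated)."""
    mismatches = []
    for field in ("firmware_version", "software_version", "aux_sketch_version"):
        if metadata.get(field) != reported.get(field):
            mismatches.append(field)
    return mismatches

metadata = {"firmware_version": "4.2.0", "software_version": "6.3",
            "aux_sketch_version": None}
reported = {"firmware_version": "4.2.1", "software_version": "6.3",
            "aux_sketch_version": None}

mismatches = validate_versions(metadata, reported)
# Any non-empty result would be flagged as a firmware.updated event and
# block the job until the new version set is re-validated.
```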

Gripper state machine

Pneumatic and servo grippers have different state machines:

  • Pneumatic: binary (open/closed), controlled by digital IO pins
  • Servo: continuous (0.0–1.0 position), controlled by PWM

The capabilities.gripper.state_machine field in assets.metadata determines which state machine governs gripper_state and gripper_position in state_snapshots.
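A sketch of the two state machines; the servo open/close thresholds are illustrative assumptions, and the function names are not the shipped API.

```python
def servo_gripper_state(position: float) -> dict:
    # Servo: continuous 0.0 (closed) .. 1.0 (open), driven by PWM.
    # Thresholds for "settled" states are illustrative.
    if position >= 0.95:
        state = "open"
    elif position <= 0.05:
        state = "closed"
    else:
        state = "moving"
    return {"gripper_state": state, "gripper_position": position}

def pneumatic_gripper_state(io_high: bool) -> dict:
    # Pneumatic: binary digital IO; gripper_position stays NULL.
    return {"gripper_state": "open" if io_high else "closed",
            "gripper_position": None}
```

Either function yields the `gripper_state` / `gripper_position` pair written to state_snapshots, so downstream consumers never branch on gripper type.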


Evolution path

v1   — AR4 registered as second reference asset; URDF in Gazebo; in-job writes + post-job
       reconciliation; firmware_versions table; variant handling via capabilities metadata
v1.5 — Policy Server on torch.dev.gpu (Ray Serve); VLA inference (OpenVLA); MoveIt2 safety
       layer; Cosmos Predict → Transfer → Reason → Execute loop for AR4;
       Redis hot-cache for real-time joint state (6-DOF at 100 Hz+)
v2   — GR00T as VLA backend (swappable via Policy Server abstraction);
       Cosmos post-trained on AR4 manipulation datasets;
       MoveIt2 collision checks feed Reason2 feasibility scoring;
       program repository: versioned motion programs with provenance;
       Pydantic AI TwinAgent + PolicyAgent (control plane migration)

Files to create / modify

user-plane/ar4/
  urdf/ar4_mk3.urdf                  AR4-MK3 URDF model
  config/ar4_controllers.yaml        ros2_control configuration
  config/ar4_moveit.yaml             MoveIt2 configuration

control-plane/backend/
  agents/twin_agent.py               Extend for firmware_versions + capabilities
  agents/policy_agent.py             New: dispatches VLA inference jobs to Policy Server
  api/twins.py                       Extend for /predict endpoint + firmware validation
  models/twin.py                     Extend for AR4 capability model

data-plane/
  schema/twins/firmware_versions/    Schema definition for new table