AR4-MK3 Digital Twin Design
Date: 2026-03-02 Status: Approved (v1) Epic: auraison-5z3 (Digital Twins) Related: auraison-eh1 (Cosmos-Reason2), auraison-oys (Cosmos-Predict2), auraison-i6l (Cosmos-Transfer2.5), auraison-2a5 (Pydantic AI decoupling)
Relationship to the digital twins design
| Document | Purpose |
|---|---|
| docs/user-plane/design.mdx | Canonical user plane design |
| docs/user-plane/digital-twins.mdx | Digital Twins subsystem design: schema, TwinAgent, TurtleBot reference asset |
| This document | AR4-MK3 as second reference asset; layered plane decomposition; schema extensions for 6-DOF arms |
This document extends the base digital twins design for the AR4-MK3 6-DOF robotic arm. It does not redefine the architecture — read digital-twins.mdx first. What this document adds: critique of two external proposals, the planes × layers decomposition, AR4-specific schema extensions (firmware_versions, joint vector columns), capability metadata for gripper variants, and the 6-step VLA runtime reasoning loop mapped to Auraison planes.
Problem
The AR4-MK3 is a 6-DOF open-source robotic arm (Annin Robotics) with a Teensy 4.1 controller, optional auxiliary Arduino boards, multiple end-effector variants (pneumatic/servo grippers), and an optional 7th axis. It is an ideal second reference asset for the Auraison digital twin framework alongside TurtleBot.
Two external architecture proposals were evaluated:
- A C4-style monolithic twin platform (ChatGPT) — vertically integrated, standalone backend
- An Open Physical-AI Stack — layered, decoupled, NVIDIA-as-plugin philosophy
Neither maps cleanly to the Auraison four-plane architecture. This design document:
- Critiques both proposals
- Maps the AR4 digital twin to the Auraison planes
- Introduces a layered decomposition within each plane — reconciling the Open Stack's layer model with our plane separation
- Extends the existing twins/ schema for industrial arm concerns
Critique of external proposals
ChatGPT C4 design — structural problems
The C4 design proposes a monolithic "Digital Twin Backend" containing State Sync, Model Services, Programs, Calibration, and a "Simulation Runtime". When mapped to Auraison:
- Conflates plane concerns. State sync is user-plane (real-time, Ray worker writes), model services are data-plane (persistent schema), orchestration is control-plane (TwinAgent). The monolith puts components with fundamentally different latency and consistency requirements in one container.
- No persistent world model. The "Event Log" is mentioned but not designed. Our lakehouse twins accumulate state across jobs. The C4 design treats telemetry as logging, not memory.
- No learned world model. The "Shadow Executor" replays programs deterministically. Our Cosmos stack (Predict2 → Transfer2.5 → Reason2) generates visual predictions from learned models, which is fundamentally different from program replay.
- No agent architecture. Who orchestrates the twin lifecycle? No equivalent of TwinAgent. Operations are implicit.
- Missing data plane entirely. No lakehouse, no persistent schema, no query layer.
- "Twin UI" as separate container doesn't scale. Our Next.js dashboard is a control-plane surface that renders any asset type.
What it gets right: Variant handling (gripper types, extra axis) as a first-class concern. Version coupling (firmware/software/sketch). ROS 2 as a first-class integration path.
Open Physical-AI Stack — better foundation, incomplete mapping
The Open Stack's core principle — "simulation and control must be independent from intelligence" — is correct and maps to our plane separation. Its five layers (World, Control, AI Runtime, World Model, Memory) are a useful decomposition.
Gaps when mapped to Auraison:
- No control plane. The Open Stack has no orchestration layer. Who dispatches VLA inference jobs? Who manages the twin lifecycle? Who handles experiment tracking?
- No management plane. No billing, no quotas, no access control.
- "Memory" is underspecified. "Store trajectories, failures, sensor traces" is correct but needs a concrete schema (our twins/ Parquet tables).
- Policy Server placement ambiguous. The Open Stack shows it as a peer of VLA and World Model, but doesn't specify compute placement. In Auraison, the Policy Server runs on torch.dev.gpu as a Ray Serve endpoint (user plane, not control plane).
- MoveIt2 as safety layer is correctly identified but not placed in any plane. It belongs in the user plane alongside ros2_control.
Architecture: layered planes
The key insight from this design: each plane contains multiple layers. The Open Stack's layers map into the planes as rows. The planes remain the primary separation (different latency, consistency, failure domains). The layers provide internal structure within each plane.
Diagram 1: Planes (columns) × Layers (rows)
Editable Mermaid source: images/ar4-digital-twin-planes-layers.mermaid.md
Diagram 2: Layers mapped to KubeRay clusters
Editable Mermaid source: images/ar4-digital-twin-layers-clusters.mermaid.md
Layer mapping table
| Open Stack Layer | Auraison Plane | Cluster | Components |
|---|---|---|---|
| A — World | User plane | ros.dev.gpu | Gazebo Harmonic, AR4 Teensy serial bridge, /joint_states, camera topics |
| B — Control | User plane | ros.dev.gpu | ros2_control (trajectory, PID), MoveIt2 (IK, collision, constraints) |
| C — AI Runtime | User plane | torch.dev.gpu | vLLM inference via Zenoh queryable; AR4 impl: LeRobot (lerobot-ros / AnninAR4), VLA model (ACT → Pi0 → GR00T) |
| D — World Model | User plane | split | Cosmos-Predict2 + Transfer2.5 (torch.dev.gpu), Cosmos-Reason2 (ros.dev.gpu) |
| E — Memory | Data plane | — | twins/ Parquet tables, DuckDB |
| (none) — Orchestration | Control plane | — | TwinAgent, FastAPI, AgentOps, InferenceAgent (v2) |
| (none) — Governance | Management plane | — | Billing, quotas, observability (v2) |
Key principles preserved:
- NVIDIA Cosmos models are plugins in the user plane, not infrastructure.
- Layer C is a generic inference serving layer (vLLM + Zenoh queryable), not tied to any specific VLA framework. Each reference application plugs in its own model backend: AR4 uses LeRobot (ACT → Pi0 → GR00T), turtlebot-maze uses the Cosmos stack, and counter-uas uses perception/tracking models. The platform does not constrain the AI runtime.
- VLA backends are swappable without touching ROS 2 or the control plane.
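The swappable-backend principle can be illustrated with a small sketch. Everything here is hypothetical: `PolicyBackend`, `ConstantBackend`, and `InferenceLayer` are names invented for illustration, not part of Auraison, LeRobot, or vLLM. The point is only that callers select a backend by name and never import it directly.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


class PolicyBackend(Protocol):
    """Hypothetical interface that a Layer C model backend would implement."""

    name: str

    def predict(self, observation: dict) -> Sequence[float]: ...


@dataclass
class ConstantBackend:
    """Stand-in backend returning a fixed action vector (illustration only)."""

    name: str
    action: tuple

    def predict(self, observation: dict) -> Sequence[float]:
        return self.action


class InferenceLayer:
    """Hypothetical registry: callers pick a backend by name."""

    def __init__(self) -> None:
        self._backends: dict = {}

    def register(self, backend: PolicyBackend) -> None:
        self._backends[backend.name] = backend

    def infer(self, backend_name: str, observation: dict) -> list:
        # The caller never touches the backend class itself.
        return list(self._backends[backend_name].predict(observation))


layer = InferenceLayer()
layer.register(ConstantBackend("act", (0.0,) * 6))
layer.register(ConstantBackend("pi0", (0.1,) * 6))
```

Switching from `"act"` to `"pi0"` changes only the string passed to `infer`; nothing on the ROS 2 or control-plane side of this sketch would need to change.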
AR4 as second reference asset
The AR4-MK3 is registered in the existing twins/assets table alongside TurtleBot. It does
not need a new architecture — it needs AR4-specific capability metadata and schema extensions.
Asset registration
Capability model
The ChatGPT design correctly identifies variant handling as critical. The AR4's variants
(gripper type, extra axis) are encoded in twins/assets.metadata as a capability model:
This is configuration-driven behavior, not code branching. The TwinAgent reads capabilities to determine which sensors to expect, which state machine governs the gripper, and whether a 7th axis exists.
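As a rough sketch of what configuration-driven behavior could look like, the snippet below derives expectations from a metadata payload. Only the `capabilities.gripper.state_machine` path comes from this document; the `dof`, `type`, and `extra_axis.present` keys, the helper names, and the assumption that a 7th axis extends the joint vector by one are all hypothetical.

```python
import copy
import json

# Hypothetical assets.metadata payload for a servo-gripper AR4.
AR4_SERVO = json.loads("""
{
  "capabilities": {
    "dof": 6,
    "gripper": {"type": "servo", "state_machine": "servo"},
    "extra_axis": {"present": false}
  }
}
""")

# Variant: pneumatic gripper plus the optional 7th axis.
AR4_PNEUMATIC_7 = copy.deepcopy(AR4_SERVO)
AR4_PNEUMATIC_7["capabilities"]["gripper"] = {
    "type": "pneumatic",
    "state_machine": "pneumatic",
}
AR4_PNEUMATIC_7["capabilities"]["extra_axis"]["present"] = True


def expected_joint_count(metadata: dict) -> int:
    """Joints to expect in a snapshot (assumes the 7th axis adds one)."""
    caps = metadata["capabilities"]
    extra = 1 if caps.get("extra_axis", {}).get("present") else 0
    return caps["dof"] + extra


def reports_gripper_position(metadata: dict) -> bool:
    """Only a servo gripper reports a continuous position."""
    return metadata["capabilities"]["gripper"]["state_machine"] == "servo"
```

Both variants go through the same two functions; the difference lives entirely in the metadata.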
Schema extensions
New table: twins/firmware_versions
Tracks firmware/software/sketch version history per asset. Current version lives in
assets.metadata; this table provides the audit trail.
| Column | Type | Description |
|---|---|---|
| version_id | VARCHAR PK | UUID |
| asset_id | VARCHAR FK → assets | |
| firmware_version | VARCHAR | Teensy sketch version |
| software_version | VARCHAR | AR4 desktop control software version |
| aux_sketch_version | VARCHAR | NULL if no aux board |
| ros2_driver_version | VARCHAR | NULL if not using ROS 2 |
| validated | BOOLEAN | True if versions are known-compatible |
| recorded_at | TIMESTAMP | |
| recorded_by | VARCHAR | Agent or operator |
Extended twins/state_snapshots for 6-DOF arm
The existing state_snapshots schema uses position_x/y/z + quaternion for mobile robots.
For a 6-DOF arm, we additionally need the joint vector:
| Column | Type | Description |
|---|---|---|
| joint_positions | DOUBLE[] | Array of joint angles (radians), length = DOF |
| joint_velocities | DOUBLE[] | Array of joint velocities (rad/s) |
| joint_torques | DOUBLE[] | Array of estimated torques (Nm), NULL if not available |
| gripper_state | VARCHAR | open / closed / moving / unknown |
| gripper_position | DOUBLE | 0.0 (closed) to 1.0 (open) for servo; NULL for pneumatic |
| end_effector_pose | JSON | {x, y, z, qx, qy, qz, qw} in world frame (FK-derived) |
| moveit_plan_id | VARCHAR | MoveIt2 trajectory ID that produced this motion, NULL if manual |
These columns are added to the existing table. For TurtleBot (mobile base), joint_positions
is NULL. For AR4 (arm), position_x/y/z is NULL (the base doesn't move). The schema
accommodates both via nullable columns — no separate tables needed.
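The nullable-column approach can be sketched with a single record type shared by both asset kinds. This is an illustrative model, not the actual table definition: it shows only a subset of the columns, and `StateSnapshot` and `validate_arm_snapshot` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional

GRIPPER_STATES = {"open", "closed", "moving", "unknown"}


@dataclass
class StateSnapshot:
    """Subset of state_snapshots columns; anything unset stays NULL (None)."""

    asset_id: str
    position_x: Optional[float] = None
    position_y: Optional[float] = None
    position_z: Optional[float] = None
    joint_positions: Optional[List[float]] = None
    joint_velocities: Optional[List[float]] = None
    joint_torques: Optional[List[float]] = None
    gripper_state: Optional[str] = None
    gripper_position: Optional[float] = None


def validate_arm_snapshot(snap: StateSnapshot, dof: int) -> List[str]:
    """Return human-readable problems; an empty list means the row looks sane."""
    problems = []
    if snap.joint_positions is None or len(snap.joint_positions) != dof:
        problems.append("joint_positions must have length DOF")
    if snap.joint_velocities is not None and len(snap.joint_velocities) != dof:
        problems.append("joint_velocities must have length DOF")
    if snap.gripper_state not in GRIPPER_STATES:
        problems.append("gripper_state is not a known value")
    if snap.gripper_position is not None and not 0.0 <= snap.gripper_position <= 1.0:
        problems.append("gripper_position must be within 0.0 to 1.0")
    return problems


# Mobile base: pose columns set, joint columns left NULL.
turtlebot = StateSnapshot(asset_id="turtlebot", position_x=1.0, position_y=2.0, position_z=0.0)

# Arm: joint columns set, base position left NULL.
ar4 = StateSnapshot(
    asset_id="ar4",
    joint_positions=[0.0] * 6,
    joint_velocities=[0.0] * 6,
    gripper_state="open",
    gripper_position=1.0,
)
```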
Extended twins/events event types
AR4-specific event types:
Runtime reasoning loop (AR4 on Auraison)
The Open Stack's 6-step loop mapped to Auraison planes:
Data flow diagram
Editable Mermaid source: images/ar4-digital-twin-data-flow.mermaid.md
AR4-specific concerns
Teensy serial bridge
The AR4's Teensy 4.1 communicates via serial USB. In our architecture, this is a ROS 2
hardware interface plugin in ros2_control — same pattern as any ROS 2 robot. The Teensy
bridge runs on ros.dev.gpu as part of the ROS 2 stack, not as a separate container.
MoveIt2 as safety layer
The Open Stack correctly identifies MoveIt2 as the critical safety layer between VLA intent
and physical execution. VLA outputs high-level actions ("move gripper here"); MoveIt2
translates these into safe trajectories with collision checking and joint limit enforcement.
This is Layer B in the user plane — it never leaves ros.dev.gpu.
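To make the joint limit part of that gate concrete, here is a deliberately simplified check in plain Python. It is not MoveIt2's API and does no collision checking; `violations` is a hypothetical helper, and the limits are placeholders rather than the AR4's real joint limits.

```python
import math
from typing import List, Sequence, Tuple

# Placeholder limits in radians. These are NOT the AR4's real joint limits.
PLACEHOLDER_LIMITS: List[Tuple[float, float]] = [(-math.pi, math.pi)] * 6


def violations(
    trajectory: Sequence[Sequence[float]],
    limits: Sequence[Tuple[float, float]],
) -> List[Tuple[int, int]]:
    """Return (waypoint_index, joint_index) pairs that fall outside the limits."""
    out = []
    for w, waypoint in enumerate(trajectory):
        if len(waypoint) != len(limits):
            raise ValueError(f"waypoint {w} has {len(waypoint)} joints, expected {len(limits)}")
        for j, (q, (lo, hi)) in enumerate(zip(waypoint, limits)):
            if not lo <= q <= hi:
                out.append((w, j))
    return out


proposed = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 0.0, 0.0, 0.0],  # joint 2 exceeds the placeholder limit
]
rejected = violations(proposed, PLACEHOLDER_LIMITS)
```

A trajectory with any entry in `rejected` would be refused before reaching the hardware interface in this sketch.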
Version coupling validation
On every job start, the TwinAgent validates that the firmware/software/sketch versions
recorded in twins/assets.metadata match the versions reported by the Teensy controller.
Mismatches are flagged as firmware.updated events and require re-validation before the
job proceeds.
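A minimal sketch of that pre-job check follows. The four version fields mirror the twins/firmware_versions columns and the `firmware.updated` event type comes from the text above; the `VersionSet` type, the function names, and the event payload shape are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class VersionSet:
    firmware_version: str
    software_version: str
    aux_sketch_version: Optional[str] = None   # None if no aux board
    ros2_driver_version: Optional[str] = None  # None if not using ROS 2


def version_mismatches(recorded: VersionSet, reported: VersionSet) -> Dict[str, Tuple]:
    """Map each differing field to (recorded, reported)."""
    rec, rep = asdict(recorded), asdict(reported)
    return {k: (rec[k], rep[k]) for k in rec if rec[k] != rep[k]}


def check_before_job(asset_id: str, recorded: VersionSet, reported: VersionSet) -> Optional[dict]:
    """Return None when versions agree, else a hypothetical event payload."""
    diff = version_mismatches(recorded, reported)
    if not diff:
        return None
    return {
        "asset_id": asset_id,
        "event_type": "firmware.updated",
        "mismatches": diff,
        "requires_revalidation": True,
    }


recorded = VersionSet(firmware_version="1.0", software_version="2.0")
reported = VersionSet(firmware_version="1.1", software_version="2.0")
event = check_before_job("ar4", recorded, reported)
```

In this sketch a non-`None` result would block the job until an operator or agent records a newly validated row.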
Gripper state machine
Pneumatic and servo grippers have different state machines:
- Pneumatic: binary (open/closed), controlled by digital IO pins
- Servo: continuous (0.0–1.0 position), controlled by PWM
The capabilities.gripper.state_machine field in assets.metadata determines which state
machine governs gripper_state and gripper_position in state_snapshots.
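The dispatch on `capabilities.gripper.state_machine` might look like the sketch below. The function names, the reading parameters, the pin polarity (high means open), and the tolerance used to call a stopped servo gripper "closed" are all assumptions for illustration.

```python
from typing import Optional, Tuple

CLOSED_TOLERANCE = 0.02  # assumed threshold below which a servo counts as closed


def pneumatic_state(io_pin_high: bool) -> Tuple[str, Optional[float]]:
    """Binary gripper: no continuous position, so gripper_position is NULL."""
    return ("open" if io_pin_high else "closed", None)


def servo_state(position: float, moving: bool) -> Tuple[str, Optional[float]]:
    """Continuous gripper: position in 0.0 (closed) to 1.0 (open)."""
    if not 0.0 <= position <= 1.0:
        return ("unknown", None)
    if moving:
        return ("moving", position)
    state = "closed" if position <= CLOSED_TOLERANCE else "open"
    return (state, position)


STATE_MACHINES = {"pneumatic": pneumatic_state, "servo": servo_state}


def gripper_columns(metadata: dict, **reading) -> Tuple[str, Optional[float]]:
    """Pick the state machine named in metadata and compute both columns."""
    kind = metadata["capabilities"]["gripper"]["state_machine"]
    return STATE_MACHINES[kind](**reading)


pneumatic_meta = {"capabilities": {"gripper": {"state_machine": "pneumatic"}}}
servo_meta = {"capabilities": {"gripper": {"state_machine": "servo"}}}
```

Each call returns the `(gripper_state, gripper_position)` pair for one snapshot row, for example `gripper_columns(servo_meta, position=0.5, moving=False)`.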