Digital Twins Design
Date: 2026-03-02 Status: Approved (v1) Epic: auraison-5z3
Relationship to the user plane design
| Document | Purpose |
|---|---|
| docs/user-plane/design.mdx | Canonical user plane design: KubeRay environments, reference applications, interfaces |
| This document | Digital Twins subsystem design: persistent world model spanning user plane, data plane, and control plane |
| docs/user-plane/ar4-digital-twin.mdx | AR4-MK3 as second reference asset; layered plane decomposition; schema extensions |
Digital twins are a cross-plane feature: the user plane writes live state, the data plane stores it, and the control plane (TwinAgent subprocess) reconciles and queries it. This document specifies all three sides of that interaction. The TurtleBot on ros.dev.gpu is the v1 reference asset.
Problem
Agentic workloads in the user plane produce rich runtime state — robot pose, sensor readings, navigation events, perception outputs — but this state is ephemeral. It lives inside a RayJob for its lifetime and is lost when the job completes. The control plane has no persistent, queryable model of the physical world its agents are acting on.
A digital twin is a persistent, structured representation of a physical asset that accumulates state over time. For Auraison, twins are the bridge between the transient execution world of the user plane and the durable memory world of the data plane. They enable:
- Historical replay: reconstruct what the robot was doing during any past job
- Causal analysis: correlate agent decisions with physical state at decision time
- Predictive modelling: feed twin state into Cosmos-Predict2 + Cosmos-Transfer2.5 to forecast photorealistic future states
- Agent memory: let control-plane agents read world state without polling live sensors
- Reasoning substrate: serve twin state snapshots as visual context to Cosmos-Reason2 for physics-grounded feasibility evaluation
Goals
- Persist physical asset state (pose, sensor readings, events) in the data-plane lakehouse
- Provide a TwinAgent subprocess for control-plane agents to create, sync, query, and retire twins
- Expose twin state via a FastAPI router consistent with existing API conventions
- Demonstrate end-to-end with the TurtleBot reference asset on ros.dev.gpu
Non-goals (v1)
- Real-time sub-second twin state (deferred to v1.5 with Redis hot-cache)
- Multi-tenant asset isolation (management plane — v2)
- Cosmos-Predict2 / Cosmos-Transfer2.5 predicted twin state snapshots (v1.5)
- Physics simulation driven from twin state (v1.5 / v2)
- Visualisation UI in the Next.js dashboard (v2)
Architecture
Digital twins span three planes:
Editable Mermaid source: images/digital-twins-cross-plane-architecture.mermaid.md
TwinAgent
TwinAgent is a claude -p subprocess in control-plane/backend/agents/twin_agent.py,
following the same pattern as LakehouseAgent and ClusterAgent.
Tool scope: Bash(duckdb *), Bash(python *), Read
Operations exposed to the control plane:
| Operation | Description |
|---|---|
| create_twin(asset_id, asset_type, urdf_path) | Register a new asset; create lakehouse tables if absent |
| sync_twin(twin_id, job_id) | Post-job reconciliation: read ROS bag / W&B outputs; append validated snapshots |
| query_twin(twin_id, query) | DuckDB query over twin tables; returns Arrow/pandas result |
| get_twin_state(twin_id, at=None) | Latest state snapshot, or point-in-time if at provided |
| annotate_twin(twin_id, annotation) | Append an annotation record |
| predict_twin(twin_id, action, horizon_s) | Call Cosmos-Predict2 → Cosmos-Transfer2.5 with latest observed state; store predicted snapshots (source=predicted/sim2real) |
| retire_twin(twin_id) | Mark asset as retired; preserve history |
Python wrapper interface:
Each function constructs a prompt and calls run_agent() from agents/base.py with
ALLOWED_TOOLS = "Bash(duckdb *),Bash(python *),Read".
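A minimal sketch of how two of these wrappers might look. The run_agent() signature used here (a prompt plus an allowed_tools keyword) is an assumption, since agents/base.py is not reproduced in this document; a stub stands in for it so the example runs on its own, and the prompt wording is purely illustrative.

```python
from typing import Callable, Optional

ALLOWED_TOOLS = "Bash(duckdb *),Bash(python *),Read"

# Hypothetical type: the real run_agent() in agents/base.py may differ.
AgentRunner = Callable[..., str]


def _stub_run_agent(prompt: str, allowed_tools: str) -> str:
    """Stand-in for run_agent(); echoes what would be handed to the subprocess."""
    return f"[tools={allowed_tools}] {prompt}"


def sync_twin(twin_id: str, job_id: str,
              run_agent: AgentRunner = _stub_run_agent) -> str:
    """Build the reconciliation prompt and pass it to the agent runner."""
    prompt = (
        f"Reconcile twin {twin_id} for job {job_id}: read the in-job "
        "partitions, validate them, and append the validated snapshots."
    )
    return run_agent(prompt, allowed_tools=ALLOWED_TOOLS)


def get_twin_state(twin_id: str, at: Optional[str] = None,
                   run_agent: AgentRunner = _stub_run_agent) -> str:
    """Ask for the latest snapshot, or the one at or before `at`."""
    when = f"at or before {at}" if at else "latest"
    prompt = f"Return the state snapshot for twin {twin_id} ({when})."
    return run_agent(prompt, allowed_tools=ALLOWED_TOOLS)
```

Passing the runner as a parameter keeps the wrappers testable without spawning a subprocess; the real module would presumably import run_agent directly.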
Schema
Seven Parquet tables under the twins/ prefix in the lakehouse. DuckDB reads them via the existing
DuckLake configuration.
twins/assets
Asset registry — one row per physical asset.
| Column | Type | Description |
|---|---|---|
| asset_id | VARCHAR PK | Stable identifier (e.g. turtlebot-01) |
| asset_type | VARCHAR | robot \| drone \| sensor_node \| custom |
| display_name | VARCHAR | Human-readable name |
| status | VARCHAR | active \| retired |
| created_at | TIMESTAMP | |
| retired_at | TIMESTAMP | NULL if active |
| metadata | JSON | Arbitrary key-value pairs |
twins/urdf_assets
URDF / CAD model references, versioned.
| Column | Type | Description |
|---|---|---|
| urdf_id | VARCHAR PK | |
| asset_id | VARCHAR FK → assets | |
| version | VARCHAR | Semantic version string |
| urdf_path | VARCHAR | Path in the lakehouse or local repo |
| format | VARCHAR | urdf \| xacro \| sdf \| obj |
| uploaded_at | TIMESTAMP | |
| checksum | VARCHAR | SHA-256 of file |
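The checksum column can be populated with the standard library alone. The helper below is an illustrative sketch (the name urdf_checksum and the chunk size are assumptions, not part of the design); it streams the file so large mesh-bearing models are not loaded into memory at once.

```python
import hashlib
from pathlib import Path


def urdf_checksum(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex-encoded SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```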
twins/state_snapshots
Point-in-time pose snapshots.
| Column | Type | Description |
|---|---|---|
| snapshot_id | VARCHAR PK | UUID |
| asset_id | VARCHAR FK → assets | |
| job_id | VARCHAR FK → twin_jobs | |
| timestamp | TIMESTAMP | Time of observation |
| source | VARCHAR | ros_job \| manual \| simulation \| predicted \| sim2real |
| position_x | DOUBLE | Metres, world frame |
| position_y | DOUBLE | |
| position_z | DOUBLE | |
| orientation_qx | DOUBLE | Quaternion |
| orientation_qy | DOUBLE | |
| orientation_qz | DOUBLE | |
| orientation_qw | DOUBLE | |
| linear_velocity | DOUBLE | m/s |
| angular_velocity | DOUBLE | rad/s |
| reconciled | BOOLEAN | True after post-job TwinAgent validation |
| cosmos_model | VARCHAR | NULL for observed; predict2 \| transfer2.5 for generated snapshots |
| predicted_from_snapshot_id | VARCHAR FK → state_snapshots | Seed snapshot used by Cosmos-Predict2; NULL for observed |
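The point-in-time lookup behind get_twin_state(twin_id, at=None) reduces to "newest row at or before a timestamp". The sketch below illustrates that query shape using SQLite as a stand-in, purely so it runs with the standard library; in the design the query would go through DuckDB over the Parquet table, where a similar statement should apply. The helper name, the column subset, and the sample rows are illustrative assumptions.

```python
import sqlite3
from typing import Optional, Tuple


def latest_snapshot(conn: sqlite3.Connection, asset_id: str,
                    at: Optional[str] = None) -> Optional[Tuple]:
    """Newest snapshot for asset_id, optionally restricted to timestamp <= at."""
    sql = (
        "SELECT snapshot_id, timestamp, position_x, position_y "
        "FROM state_snapshots "
        "WHERE asset_id = ? AND (? IS NULL OR timestamp <= ?) "
        "ORDER BY timestamp DESC LIMIT 1"
    )
    return conn.execute(sql, (asset_id, at, at)).fetchone()


# In-memory stand-in for twins/state_snapshots (subset of columns).
# ISO-8601 strings compare in chronological order, so text ordering suffices here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE state_snapshots ("
    "snapshot_id TEXT, asset_id TEXT, timestamp TEXT, "
    "position_x REAL, position_y REAL)"
)
conn.executemany(
    "INSERT INTO state_snapshots VALUES (?, ?, ?, ?, ?)",
    [
        ("s1", "turtlebot-01", "2026-03-02T10:00:00", 0.0, 0.0),
        ("s2", "turtlebot-01", "2026-03-02T10:05:00", 1.5, 0.2),
        ("s3", "turtlebot-01", "2026-03-02T10:10:00", 3.0, 0.4),
    ],
)
```

Calling latest_snapshot(conn, "turtlebot-01") returns the newest row, while passing at selects the last row observed up to that instant; a timestamp earlier than every row yields None.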
twins/sensor_readings
Time-series sensor data (separate from pose to keep state_snapshots lean).
| Column | Type | Description |
|---|---|---|
| reading_id | VARCHAR PK | UUID |
| asset_id | VARCHAR FK → assets | |
| job_id | VARCHAR FK → twin_jobs | |
| sensor_type | VARCHAR | imu \| lidar \| camera \| gps \| odometry |
| timestamp | TIMESTAMP | |
| payload | JSON | Sensor-specific structured data |
| raw_path | VARCHAR | Path to raw file in the lakehouse (e.g. ROS bag slice) |
twins/events
Discrete twin lifecycle and runtime events.
| Column | Type | Description |
|---|---|---|
| event_id | VARCHAR PK | UUID |
| asset_id | VARCHAR FK → assets | |
| job_id | VARCHAR FK → twin_jobs | NULL for lifecycle events |
| event_type | VARCHAR | twin.created \| twin.synced \| twin.retired \| nav.goal_set \| nav.goal_reached \| nav.obstacle_detected \| … |
| timestamp | TIMESTAMP | |
| actor | VARCHAR | Agent or system that generated the event |
| payload | JSON | Event-specific data |
twins/twin_jobs
Link table: maps each twin to the jobs that produced its state.
| Column | Type | Description |
|---|---|---|
| twin_job_id | VARCHAR PK | UUID |
| twin_id | VARCHAR FK → assets | |
| job_id | VARCHAR | Control-plane job UUID |
| ray_job_id | VARCHAR | Ray job ID |
| environment | VARCHAR | torch.dev.gpu \| ros.dev.gpu |
| started_at | TIMESTAMP | |
| completed_at | TIMESTAMP | |
| wandb_run_id | VARCHAR | NULL if not tracked |
| sync_status | VARCHAR | pending \| synced \| failed |
twins/annotations
Human or agent annotations on twin state.
| Column | Type | Description |
|---|---|---|
| annotation_id | VARCHAR PK | UUID |
| asset_id | VARCHAR FK → assets | |
| snapshot_id | VARCHAR FK → state_snapshots | NULL for job-level annotations |
| author | VARCHAR | Agent name or user email |
| annotation_type | VARCHAR | label \| anomaly_flag \| note \| review |
| content | VARCHAR | Free text or JSON |
| created_at | TIMESTAMP | |
Data flow — v1
In-job writes (Ray worker → lakehouse)
During a ros.dev.gpu RayJob the worker writes live data directly to the lakehouse:
The worker uses the lakehouse S3 endpoint configured in data-plane/ (same credentials as
the LakehouseAgent) and writes to a job-specific partition:
twins/state_snapshots/job_id=\{job_id\}/part-0.parquet.
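As a rough illustration of the in-job write path, the sketch below builds the job-specific partition key and one row shaped like twins/state_snapshots. The helper names are hypothetical, and the Parquet serialisation and S3 upload are deliberately left out because the worker's actual client library is not specified here.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, Tuple


def snapshot_partition_key(job_id: str, part: int = 0) -> str:
    """Object key of the job-specific partition under the twins/ prefix."""
    return f"twins/state_snapshots/job_id={job_id}/part-{part}.parquet"


def make_snapshot_row(
    asset_id: str,
    job_id: str,
    position: Tuple[float, float, float],
    orientation: Tuple[float, float, float, float],
    linear_velocity: float = 0.0,
    angular_velocity: float = 0.0,
) -> Dict[str, Any]:
    """One observed row matching the state_snapshots columns."""
    x, y, z = position
    qx, qy, qz, qw = orientation
    return {
        "snapshot_id": str(uuid.uuid4()),
        "asset_id": asset_id,
        "job_id": job_id,
        "timestamp": datetime.now(timezone.utc),
        "source": "ros_job",
        "position_x": x,
        "position_y": y,
        "position_z": z,
        "orientation_qx": qx,
        "orientation_qy": qy,
        "orientation_qz": qz,
        "orientation_qw": qw,
        "linear_velocity": linear_velocity,
        "angular_velocity": angular_velocity,
        "reconciled": False,  # set to True only by post-job reconciliation
        "cosmos_model": None,  # observed data, not generated
        "predicted_from_snapshot_id": None,
    }
```

A worker would typically buffer such rows and flush them to the partition key in batches, since Parquet files are written whole rather than appended row by row.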
Post-job reconciliation (TwinAgent)
When the Ray worker completes, the control plane calls sync_twin(twin_id, job_id).
The TwinAgent subprocess:
- Reads in-job Parquet partitions from the lakehouse
- Reads the W&B run (if linked) for additional metrics
- Validates schema integrity; flags anomalies in events
- Sets reconciled=True on validated snapshots
- Merges job partition into main table (or leaves it partitioned; DuckDB handles both)
- Appends a twin.synced event record
- Updates twin_jobs.sync_status = 'synced'
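The validation and bookkeeping steps above can be sketched over in-memory rows. Everything specific here is an assumption made for illustration: the validity rule (a unit-length orientation quaternion), the twin.anomaly event type, and the function names. Reading from the lakehouse and W&B, and the partition merge, are omitted.

```python
import math
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, List


def _is_valid(snap: Dict[str, Any], tol: float = 1e-3) -> bool:
    """Assumed rule: all four quaternion components present and unit norm."""
    try:
        q = [float(snap[f"orientation_q{axis}"]) for axis in "xyzw"]
    except (KeyError, TypeError, ValueError):
        return False
    return abs(math.sqrt(sum(c * c for c in q)) - 1.0) <= tol


def _event(twin_job: Dict[str, Any], event_type: str, actor: str,
           payload: Dict[str, Any]) -> Dict[str, Any]:
    """Build a row shaped like twins/events."""
    return {
        "event_id": str(uuid.uuid4()),
        "asset_id": twin_job["twin_id"],
        "job_id": twin_job["job_id"],
        "event_type": event_type,
        "timestamp": datetime.now(timezone.utc),
        "actor": actor,
        "payload": payload,
    }


def reconcile(snapshots: List[Dict[str, Any]], twin_job: Dict[str, Any],
              actor: str = "TwinAgent") -> List[Dict[str, Any]]:
    """Mark valid snapshots, record anomalies, and close out the twin job."""
    events = []
    for snap in snapshots:
        if _is_valid(snap):
            snap["reconciled"] = True
        else:
            # Hypothetical event type; not listed in the events schema.
            events.append(_event(twin_job, "twin.anomaly", actor,
                                 {"snapshot_id": snap.get("snapshot_id")}))
    validated = sum(bool(s.get("reconciled")) for s in snapshots)
    events.append(_event(twin_job, "twin.synced", actor, {"validated": validated}))
    twin_job["sync_status"] = "synced"
    return events
```

Invalid snapshots keep reconciled=False rather than being dropped, so history is preserved and downstream queries can filter on the flag.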
API
New router: control-plane/backend/api/twins.py, mounted at /api/v1/twins.
Pydantic models in control-plane/backend/models/twin.py:
Twin, TwinCreate, TwinState, TwinSyncRequest, TwinAnnotation.
Reference asset: TurtleBot
The v1 end-to-end demo twins turtlebot-01 on ros.dev.gpu:
Evolution path
Files to create
main.py — mount twins router alongside existing jobs, clusters, experiments, lakehouse.