Auraison — Four-Plane Architecture
Date: 2026-02-23
Updated: 2026-03-02
Status: Approved
Overview
Auraison is structured as four planes, following the SDN / telecom separation pattern. Three vertical planes (user, control, management) handle execution, orchestration, and governance, respectively. The data plane sits horizontally and serves all three. Each plane has different latency, consistency, and availability requirements, summarized in the table below.
| Plane | What runs here | Latency / consistency | Failure consequence |
|---|---|---|---|
| User plane | Customer agents: VLA, Nav2, behavior trees, YOLOv8, SLAM; Cosmos-Reason2 (physical reasoning), Cosmos-Predict2 (world model), Cosmos-Transfer2.5 (sim2real) | Real-time (ms), stateful per-session | Agent stops; robot halts |
| Control plane | Job dispatch, cluster management, experiment tracking, agent lifecycle governance | Seconds, eventually consistent | Degraded visibility; user plane continues |
| Data plane | Lakehouse (DuckDB + DuckLake + MinIO), embeddings, event log | Seconds, eventually consistent | Queries fail; ingestion queued; agents lose context |
| Management plane | Billing, tenancy, quotas, user management | Minutes, strongly consistent | No new deployments; running agents unaffected |
The control plane includes an agent operations (AgentOps) subsystem that governs agent
behaviour at runtime: execution scheduling, backpressure, guardrails, and trace collection.
It is implemented as the package control-plane/backend/agentops/ within the control
plane, not as a separate architectural layer.
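As a rough illustration of what backpressure in such a subsystem could look like, the sketch below caps the number of running jobs and bounds the wait queue, rejecting new work once both are full. It is a minimal sketch under stated assumptions: `AdmissionController`, its fields, and the status strings are hypothetical names, not taken from the agentops package.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AdmissionController:
    """Hypothetical backpressure gate: caps running jobs and bounds the wait queue."""

    max_running: int
    max_queued: int
    running: set = field(default_factory=set)
    queued: deque = field(default_factory=deque)

    def submit(self, job_id: str) -> str:
        """Return 'running', 'queued', or 'rejected' for a newly submitted job."""
        if len(self.running) < self.max_running:
            self.running.add(job_id)
            return "running"
        if len(self.queued) < self.max_queued:
            self.queued.append(job_id)
            return "queued"
        # Both limits reached: push back on the caller instead of buffering without bound.
        return "rejected"

    def complete(self, job_id: str) -> Optional[str]:
        """Mark a job finished and promote the oldest queued job, if there is one."""
        self.running.discard(job_id)
        if self.queued and len(self.running) < self.max_running:
            promoted = self.queued.popleft()
            self.running.add(promoted)
            return promoted
        return None
```

With `max_running=1` and `max_queued=1`, a third concurrent submission is rejected rather than queued, which is the property that keeps a burst of agent requests from overwhelming the scheduler.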
First principle: User plane failures must not cascade to the control plane, and control plane outages must not halt running agents.
System context (C4 Level 1)
Repository layout
auraison/
├── control-plane/ FastAPI API + Claude Code agent layer + AgentOps subsystem + Next.js UI
├── user-plane/ Agentic workloads: VLA, ROS 2, multi-agent (KubeRay)
├── data-plane/ Lakehouse: DuckDB + DuckLake + MinIO (migrated from aegean-ai/lakehouse)
├── management-plane/ Billing, tenancy, quotas (v2)
├── docs/
│ ├── architecture/ System-level design docs (this directory)
│ ├── plans/ Plane-specific design docs
│ └── decisions/ Cross-cutting ADRs
└── docker-compose.yml Local dev infra (Postgres + Redis)
Communication between planes
See docs/plans/2026-02-23-auraison-control-plane-design.md §"Communication between planes"
for the current v1 contract (subprocess + webhook) and the v1.5/v2 evolution path
(Redis Streams → NATS + Kafka).
A dedicated cross-plane communication design doc is tracked in beads issue auraison-eco.
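The actual v1 contract is defined in the linked design doc. Purely as an illustration of the synchronous subprocess style of dispatch, the sketch below runs a job to completion and builds a result record of the kind a webhook might carry. The function name `dispatch_job`, the payload fields, and the status values are assumptions made for this example; the HTTP call itself is left out.

```python
import json
import subprocess
import sys
from datetime import datetime, timezone


def dispatch_job(job_id: str, argv: list[str], timeout_s: float = 60.0) -> dict:
    """Run a job synchronously and return a hypothetical result payload."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
        status = "succeeded" if proc.returncode == 0 else "failed"
        exit_code, stdout = proc.returncode, proc.stdout
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before the exception propagates.
        status, exit_code, stdout = "timed_out", None, ""
    return {
        "job_id": job_id,
        "status": status,
        "exit_code": exit_code,
        "stdout_tail": stdout[-2000:],  # keep the record small
        "finished_at": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    # Example job: a trivial Python one-liner standing in for a real workload.
    payload = dispatch_job("demo-1", [sys.executable, "-c", "print('hello')"])
    print(json.dumps(payload, indent=2))
```

Because the call blocks until the child exits, the dispatcher is tied up for the duration of the job, which is one plausible motivation for moving to a stream-based transport in later versions.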
Reference application: turtlebot-maze
turtlebot-maze is the canonical user-plane application. It demonstrates Claude Code and
ros-mcp-server performing real-time robot control on ros.dev.gpu, and is extended in v1.5
with the Cosmos model stack:
Claude Code /navigate skill
→ ros-mcp-server (MCP over rosbridge WebSocket :9090)
→ ROS 2 Nav2 action server
→ TurtleBot navigation
Predict → Transfer → Reason → Execute loop (v1.5):
Cosmos-Predict2 (torch.dev.gpu): current frame + action → synthetic trajectory
→ Cosmos-Transfer2.5 (torch.dev.gpu): synthetic → photorealistic
→ Cosmos-Reason2 (ros.dev.gpu): feasibility evaluation → go / no-go
→ Nav2 goal dispatched or behavior tree selects alternative
The control plane manages the lifecycles of the ros.dev.gpu and torch.dev.gpu RayClusters
and handles experiment bookkeeping. It does not control the robot in real time; that is
ros-mcp-server's domain.
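The Predict → Transfer → Reason → Execute loop above can be sketched as a gating function. This is a minimal sketch, not the project's implementation: the three model calls are passed in as plain callables standing in for Cosmos-Predict2, Cosmos-Transfer2.5, and Cosmos-Reason2, and trying candidate actions in order is an assumed simplification of "behavior tree selects alternative". All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Decision:
    action: Optional[str]  # chosen action, or None if every candidate was rejected
    rejected: List[str]    # candidates that failed the feasibility gate, in order


def plan_step(
    frame: object,
    candidates: Sequence[str],
    predict: Callable[[object, str], object],  # stand-in for Cosmos-Predict2
    transfer: Callable[[object], object],      # stand-in for Cosmos-Transfer2.5
    reason: Callable[[object], bool],          # stand-in for Cosmos-Reason2 (go / no-go)
) -> Decision:
    """Return the first candidate action whose simulated outcome passes the gate."""
    rejected: List[str] = []
    for action in candidates:
        synthetic = predict(frame, action)  # current frame + action -> synthetic trajectory
        photoreal = transfer(synthetic)     # synthetic -> photorealistic
        if reason(photoreal):               # feasibility evaluation
            return Decision(action=action, rejected=rejected)
        rejected.append(action)             # no-go: fall through to the next alternative
    return Decision(action=None, rejected=rejected)
```

In this sketch a returned `action` corresponds to dispatching a Nav2 goal, while `action=None` means no candidate passed and the caller has to decide how to recover.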
Evolution path
- v1: Control plane + user plane operational; data plane migrated to monorepo; synchronous subprocess dispatch
- v1.5: AgentOps subsystem in control plane: execution scheduler, backpressure, trace collector; Redis Streams
  - Cosmos-Reason2 (ros.dev.gpu): physical reasoning + actuation feasibility gating
  - Cosmos-Predict2 (torch.dev.gpu): world model inference for pre-execution trajectory simulation
  - Cosmos-Transfer2.5 (torch.dev.gpu): sim2real augmentation; SDG pipeline → lakehouse datasets
  - Predict → Transfer → Reason → Execute loop for turtlebot-maze reference application
- v2: NATS (control messages) + Kafka (audit/telemetry); Pydantic AI runtime agents; management plane; data plane RAG
  - Digital Twins: persistent world model in lakehouse; predicted twin state from Cosmos-Predict2
  - Cosmos models post-trained on turtlebot-maze ROS bag recordings
- v3: World-model-driven agent governance; VLA training pipeline over SDG lakehouse datasets; feature store