Auraison — Four-Plane Architecture

Date: 2026-02-23
Updated: 2026-03-02
Status: Approved

Overview

Auraison is structured as four planes following the SDN / telecom separation pattern. Three vertical planes (user, control, management) handle execution, orchestration, and governance respectively. The data plane sits horizontally, serving all three. The planes have fundamentally different latency, consistency, and availability requirements.

Plane	What runs here	Latency / consistency	Failure consequence
User plane	Customer agents: VLA, Nav2, behavior trees, YOLOv8, SLAM; Cosmos-Reason2 (physical reasoning), Cosmos-Predict2 (world model), Cosmos-Transfer2.5 (sim2real)	Real-time (ms), stateful per-session	Agent stops; robot halts
Control plane	Job dispatch, cluster management, experiment tracking, agent lifecycle governance	Seconds, eventually consistent	Degraded visibility; user plane continues
Data plane	Lakehouse (DuckDB + DuckLake + MinIO), embeddings, event log	Seconds, eventually consistent	Queries fail; ingestion queued; agents lose context
Management plane	Billing, tenancy, quotas, user management	Minutes, strongly consistent	No new deployments; running agents unaffected

The control plane includes an agent operations subsystem (execution scheduling, backpressure, guardrails, trace collection) that governs agent behaviour at runtime. This is implemented as control-plane/backend/agentops/ — a package within the control plane, not a separate architectural layer.
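As an illustration of the backpressure responsibility named above, the following is a minimal sketch of an admission gate that caps concurrent agent executions. The class name `BackpressureGate` and the `max_in_flight` parameter are hypothetical; they are not taken from the agentops package, whose actual interface is not described here.

```python
import threading


class BackpressureGate:
    """Hypothetical admission gate: allow at most `max_in_flight` executions."""

    def __init__(self, max_in_flight: int) -> None:
        if max_in_flight < 1:
            raise ValueError("max_in_flight must be at least 1")
        self._max = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_admit(self) -> bool:
        """Reserve a slot if one is free; return False instead of blocking."""
        with self._lock:
            if self._in_flight >= self._max:
                return False
            self._in_flight += 1
            return True

    def release(self) -> None:
        """Free a slot when an execution finishes (no-op if none are held)."""
        with self._lock:
            if self._in_flight > 0:
                self._in_flight -= 1

    @property
    def in_flight(self) -> int:
        with self._lock:
            return self._in_flight
```

Rejecting rather than blocking keeps the scheduler responsive under load; a caller that receives `False` could queue the request or return an error to the client, depending on policy.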

First principle: User plane failures must not cascade to the control plane, and control plane outages must not halt running agents.
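One way to read the second half of this principle is that status reporting from an agent to the control plane should be best-effort. The sketch below assumes a hypothetical `send` callable that delivers an event to the control plane; if it raises, the event is buffered locally and the agent keeps running. None of these names come from the Auraison codebase.

```python
from typing import Callable, Iterable


def report_best_effort(
    send: Callable[[dict], None], event: dict, buffer: list
) -> bool:
    """Try to deliver an event; on any failure, buffer it and carry on."""
    try:
        send(event)
        return True
    except Exception:
        # Deliberately broad: a control-plane outage must never stop the agent.
        buffer.append(event)
        return False


def run_agent_steps(
    steps: Iterable[Callable[[], None]], send: Callable[[dict], None]
) -> tuple:
    """Run every step, reporting after each; return (completed, undelivered)."""
    undelivered: list = []
    completed = 0
    for index, step in enumerate(steps):
        step()  # the real-time work itself, independent of reporting
        completed += 1
        report_best_effort(send, {"step": index, "status": "done"}, undelivered)
    return completed, undelivered
```

With a `send` that always raises, every step still completes and the undelivered events remain available for replay once the control plane is reachable again.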

System context (C4 Level 1)

Repository layout

auraison/
├── control-plane/     FastAPI API + Claude Code agent layer + AgentOps subsystem + Next.js UI
├── user-plane/        Agentic workloads: VLA, ROS 2, multi-agent (KubeRay)
├── data-plane/        Lakehouse: DuckDB + DuckLake + MinIO (migrated from aegean-ai/lakehouse)
├── management-plane/  Billing, tenancy, quotas (v2)
├── docs/
│   ├── architecture/  System-level design docs (this directory)
│   ├── plans/         Plane-specific design docs
│   └── decisions/     Cross-cutting ADRs
└── docker-compose.yml Local dev infra (Postgres + Redis)

Communication between planes

See docs/plans/2026-02-23-auraison-control-plane-design.md §"Communication between planes" for the current v1 contract (subprocess + webhook) and the v1.5/v2 evolution path (Redis Streams → NATS + Kafka).
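For orientation only, a subprocess-plus-webhook contract could look roughly like the sketch below. The authoritative v1 contract lives in the design doc referenced above; the function name `dispatch_job`, the payload fields, and the injected `notify` callable (standing in for an HTTP POST to a webhook) are all assumptions made for illustration.

```python
import subprocess
import sys
from typing import Callable, Sequence


def dispatch_job(
    job_id: str,
    cmd: Sequence[str],
    notify: Callable[[dict], None],
    timeout_s: float = 60.0,
) -> dict:
    """Run `cmd` synchronously, then hand a result payload to `notify`."""
    try:
        proc = subprocess.run(
            list(cmd), capture_output=True, text=True, timeout=timeout_s
        )
        result = {
            "job_id": job_id,
            "status": "succeeded" if proc.returncode == 0 else "failed",
            "returncode": proc.returncode,
            "stdout": proc.stdout,
        }
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising on timeout.
        result = {
            "job_id": job_id,
            "status": "timeout",
            "returncode": None,
            "stdout": "",
        }
    notify(result)  # hypothetical webhook delivery, injected by the caller
    return result


if __name__ == "__main__":
    # Usage: run a trivial job and print the payload instead of POSTing it.
    dispatch_job("demo", [sys.executable, "-c", "print('ok')"], print)
```

Injecting `notify` keeps the sketch free of network dependencies; a real caller could pass a function that POSTs the payload as JSON.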

A dedicated cross-plane communication design doc is tracked in beads issue auraison-eco.

Reference application: turtlebot-maze

turtlebot-maze is the canonical user-plane application. It demonstrates Claude Code and ros-mcp-server performing real-time robot control on ros.dev.gpu, extended in v1.5 with the Cosmos model stack:

Claude Code /navigate skill
  → ros-mcp-server (MCP over rosbridge WebSocket :9090)
  → ROS 2 Nav2 action server
  → TurtleBot navigation

Predict → Transfer → Reason → Execute loop (v1.5):
  Cosmos-Predict2 (torch.dev.gpu): current frame + action → synthetic trajectory
  → Cosmos-Transfer2.5 (torch.dev.gpu): synthetic → photorealistic
  → Cosmos-Reason2 (ros.dev.gpu): feasibility evaluation → go / no-go
  → Nav2 goal dispatched or behavior tree selects alternative

The control plane manages the ros.dev.gpu and torch.dev.gpu RayCluster lifecycles and experiment bookkeeping. The control plane does not control the robot in real time; that is ros-mcp-server's domain.
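The Predict → Transfer → Reason → Execute loop shown above can be sketched as a single gating function. This is a minimal sketch under stated assumptions: the five callables are hypothetical stand-ins for Cosmos-Predict2, Cosmos-Transfer2.5, Cosmos-Reason2, the Nav2 goal dispatch, and the behavior tree fallback. It does not reflect the actual model or ROS 2 APIs.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class LoopResult:
    dispatched: bool  # True if the Nav2 goal was sent
    verdict: str      # "go" or "no-go"


def predict_transfer_reason_execute(
    frame: Any,
    action: Any,
    predict: Callable[[Any, Any], Any],         # stand-in for Cosmos-Predict2
    transfer: Callable[[Any], Any],             # stand-in for Cosmos-Transfer2.5
    reason: Callable[[Any], bool],              # stand-in for Cosmos-Reason2
    dispatch_goal: Callable[[Any], None],       # stand-in for Nav2 goal dispatch
    select_alternative: Callable[[Any], None],  # stand-in for behavior tree fallback
) -> LoopResult:
    """Simulate the action, make it photorealistic, and gate on feasibility."""
    synthetic = predict(frame, action)   # current frame + action -> trajectory
    photoreal = transfer(synthetic)      # synthetic -> photorealistic
    if reason(photoreal):                # feasibility evaluation
        dispatch_goal(action)
        return LoopResult(dispatched=True, verdict="go")
    select_alternative(action)
    return LoopResult(dispatched=False, verdict="no-go")
```

Passing the stages in as callables keeps the gating logic testable without GPUs or a running ROS 2 graph.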

Evolution path

v1   — Control plane + user plane operational; data plane migrated to monorepo; synchronous subprocess dispatch
v1.5 — AgentOps subsystem in control plane: execution scheduler, backpressure, trace collector; Redis Streams
       Cosmos-Reason2 (ros.dev.gpu): physical reasoning + actuation feasibility gating
       Cosmos-Predict2 (torch.dev.gpu): world model inference for pre-execution trajectory simulation
       Cosmos-Transfer2.5 (torch.dev.gpu): sim2real augmentation; SDG pipeline → lakehouse datasets
       Predict → Transfer → Reason → Execute loop for turtlebot-maze reference application
v2   — NATS (control messages) + Kafka (audit/telemetry); Pydantic AI runtime agents; management plane; data plane RAG
       Digital Twins: persistent world model in lakehouse; predicted twin state from Cosmos-Predict2
       Cosmos models post-trained on turtlebot-maze ROS bag recordings
v3   — World-model-driven agent governance; VLA training pipeline over SDG lakehouse datasets; feature store
