Auraison — Four-Plane Architecture

Date: 2026-02-23
Updated: 2026-03-02
Status: Approved


Overview

Auraison is structured as four planes following the SDN / telecom separation pattern. Three vertical planes (user, control, management) handle execution, orchestration, and governance respectively. The data plane sits horizontally, serving all three. The planes have fundamentally different latency, consistency, and availability requirements.

| Plane | What runs here | Latency / consistency | Failure consequence |
| --- | --- | --- | --- |
| User plane | Customer agents: VLA, Nav2, behavior trees, YOLOv8, SLAM; Cosmos-Reason2 (physical reasoning), Cosmos-Predict2 (world model), Cosmos-Transfer2.5 (sim2real) | Real-time (ms), stateful per-session | Agent stops; robot halts |
| Control plane | Job dispatch, cluster management, experiment tracking, agent lifecycle governance | Seconds, eventually consistent | Degraded visibility; user plane continues |
| Data plane | Lakehouse (DuckDB + DuckLake + MinIO), embeddings, event log | Seconds, eventually consistent | Queries fail; ingestion queued; agents lose context |
| Management plane | Billing, tenancy, quotas, user management | Minutes, strongly consistent | No new deployments; running agents unaffected |

The control plane includes an agent operations (AgentOps) subsystem that governs agent behavior at runtime through execution scheduling, backpressure, guardrails, and trace collection. It is implemented as the package control-plane/backend/agentops/ within the control plane, not as a separate architectural layer.
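
As a rough illustration of what execution scheduling with backpressure could look like, the sketch below uses a bounded FIFO queue that rejects new jobs once it is full instead of accepting work without limit. The class name `ExecutionScheduler`, the `max_pending` parameter, and the string job IDs are hypothetical; none of them are taken from the agentops package.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExecutionScheduler:
    """Hypothetical bounded scheduler; not the actual agentops implementation."""

    max_pending: int = 2
    _pending: deque = field(default_factory=deque)

    def submit(self, job_id: str) -> bool:
        """Accept the job if there is capacity, otherwise signal backpressure."""
        if len(self._pending) >= self.max_pending:
            return False  # caller is expected to retry later or shed load
        self._pending.append(job_id)
        return True

    def next_job(self) -> Optional[str]:
        """Return the oldest pending job (FIFO), or None when the queue is empty."""
        return self._pending.popleft() if self._pending else None


scheduler = ExecutionScheduler(max_pending=2)
results = [scheduler.submit(job) for job in ("job-a", "job-b", "job-c")]
```

Rejecting at submission time keeps the pressure signal at the control-plane boundary, so a burst of job requests slows down callers rather than running agents.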

First principle: User plane failures must not cascade to the control plane, and control plane outages must not halt running agents.


System context (C4 Level 1)


Repository layout

auraison/
├── control-plane/        FastAPI API + Claude Code agent layer + AgentOps subsystem + Next.js UI
├── user-plane/           Agentic workloads: VLA, ROS 2, multi-agent (KubeRay)
├── data-plane/           Lakehouse: DuckDB + DuckLake + MinIO (migrated from aegean-ai/lakehouse)
├── management-plane/     Billing, tenancy, quotas (v2)
├── docs/
│   ├── architecture/     System-level design docs (this directory)
│   ├── plans/            Plane-specific design docs
│   └── decisions/        Cross-cutting ADRs
└── docker-compose.yml    Local dev infra (Postgres + Redis)

Communication between planes

See docs/plans/2026-02-23-auraison-control-plane-design.md §"Communication between planes" for the current v1 contract (subprocess + webhook) and the v1.5/v2 evolution path (Redis Streams → NATS + Kafka).
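
The referenced design doc is the authority on the v1 contract. Purely as an illustration of synchronous subprocess dispatch, the sketch below runs a job as a child process with a timeout and converts any failure into a status record instead of raising, so a failing job does not take the dispatcher down with it. `JobResult`, `dispatch`, the status strings, and the 30-second default are assumptions made for this example, and the webhook half of the contract is not shown.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class JobResult:
    """Hypothetical record of a finished job; field names are illustrative."""

    job_id: str
    status: str  # "succeeded", "failed", or "timed_out"
    output: str


def dispatch(job_id: str, argv: list, timeout_s: float = 30.0) -> JobResult:
    """Run argv synchronously and report the outcome without raising."""
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run; record the timeout and move on.
        return JobResult(job_id, "timed_out", "")
    status = "succeeded" if proc.returncode == 0 else "failed"
    return JobResult(job_id, status, proc.stdout.strip())


# Stand-in jobs: tiny Python one-liners instead of real user-plane workloads.
ok = dispatch("job-1", [sys.executable, "-c", "print('done')"])
bad = dispatch("job-2", [sys.executable, "-c", "raise SystemExit(3)"])
```

Here `ok.status` is `"succeeded"` and `bad.status` is `"failed"`; in both cases the caller receives a value it can log or forward rather than an exception.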

A dedicated cross-plane communication design doc is tracked in beads issue auraison-eco.


Reference application: turtlebot-maze

turtlebot-maze is the canonical user-plane application. It demonstrates Claude Code and ros-mcp-server performing real-time robot control on ros.dev.gpu, and is extended in v1.5 with the Cosmos model stack:

Claude Code /navigate skill
  → ros-mcp-server (MCP over rosbridge WebSocket :9090)
  → ROS 2 Nav2 action server
  → TurtleBot navigation

Predict → Transfer → Reason → Execute loop (v1.5):
  Cosmos-Predict2 (torch.dev.gpu): current frame + action → synthetic trajectory
  → Cosmos-Transfer2.5 (torch.dev.gpu): synthetic → photorealistic
  → Cosmos-Reason2 (ros.dev.gpu): feasibility evaluation → go / no-go
  → Nav2 goal dispatched or behavior tree selects alternative

The control plane manages the lifecycles of the ros.dev.gpu and torch.dev.gpu RayClusters and handles experiment bookkeeping. It does not control the robot in real time; that is ros-mcp-server's domain.
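
The control flow of the Predict → Transfer → Reason → Execute loop can be sketched as a single function that takes each stage as a callable. This is a minimal sketch under stated assumptions: `plan_step`, `Verdict`, and every parameter name are hypothetical, the stages are passed in as plain functions rather than real Cosmos or Nav2 clients, and whether a rejected action's alternative is itself re-evaluated is not specified in this document.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    """Hypothetical go / no-go result from the reasoning stage."""

    feasible: bool
    reason: str


def plan_step(
    frame: bytes,
    action: str,
    predict: Callable[[bytes, str], list],
    transfer: Callable[[list], list],
    reason: Callable[[list], Verdict],
    send_nav2_goal: Callable[[str], None],
    select_alternative: Callable[[str], str],
) -> str:
    """Run one pass of the loop and return the action that was chosen."""
    synthetic = predict(frame, action)   # Predict: frame + action -> synthetic trajectory
    photoreal = transfer(synthetic)      # Transfer: synthetic -> photorealistic
    verdict = reason(photoreal)          # Reason: feasibility evaluation
    if verdict.feasible:
        send_nav2_goal(action)           # Execute: go, dispatch the goal
        return action
    return select_alternative(action)    # No-go: fall back to an alternative


# Stub stages stand in for the real models and the Nav2 client.
sent = []
chosen = plan_step(
    frame=b"",
    action="forward",
    predict=lambda f, a: [a],
    transfer=lambda trajectory: trajectory,
    reason=lambda trajectory: Verdict(feasible=False, reason="obstacle"),
    send_nav2_goal=sent.append,
    select_alternative=lambda a: "rotate_left",
)
```

With the stub reasoner returning a no-go, no goal is dispatched and the fallback action is returned. Passing the stages as callables keeps the gating logic testable without GPUs or a running ROS 2 graph.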


Evolution path

v1   — Control plane + user plane operational; data plane migrated to monorepo; synchronous subprocess dispatch
v1.5 — AgentOps subsystem in control plane: execution scheduler, backpressure, trace collector; Redis Streams
       Cosmos-Reason2 (ros.dev.gpu): physical reasoning + actuation feasibility gating
       Cosmos-Predict2 (torch.dev.gpu): world model inference for pre-execution trajectory simulation
       Cosmos-Transfer2.5 (torch.dev.gpu): sim2real augmentation; SDG pipeline → lakehouse datasets
       Predict → Transfer → Reason → Execute loop for turtlebot-maze reference application
v2 — NATS (control messages) + Kafka (audit/telemetry); Pydantic AI runtime agents; management plane; data plane RAG
       Digital Twins: persistent world model in lakehouse; predicted twin state from Cosmos-Predict2
       Cosmos models post-trained on turtlebot-maze ROS bag recordings
v3 — World-model-driven agent governance; VLA training pipeline over SDG lakehouse datasets; feature store