System Architecture
Status: Proposed (RFC) · Updated: 2026-06-15
Auraison is an agentic platform built around two cooperating agent runtimes. Each runtime's agent harness is configurable. The runtime is the abstraction (its contract and its place in the domain model), while the harness that implements it is a deployment choice. The two runtimes are:
- an edge runtime: one or more application-local agents, each domain-specific and frequently running in parallel (typically one per reference application), doing latency-sensitive work close to the application. The current working harness is pi.dev — a minimal, extensible harness that runs Claude and other providers through typed primitives (skills, tools, session state, events); every edge action is a Pi intent executed against one of four operational domains.
- a central runtime: a single, durable, stateful agent that centralizes the functions worth solving once for the whole fleet — it owns the shared orchestration state and exposes cross-cutting services (model routing, planning, optimization) to the edge agents. Its current working harness is a Claude Managed Agent (CMA) integrated with Cloudflare developer tooling — the Agent ADK (Agents SDK) and Dynamic Workers.
Because harnesses are pluggable behind each runtime's contract, either side can be re-hosted (a different edge harness, a different central agent) without changing the domain model. The runtime contract (§1.1) remains the system's ontology; the four domains are projections and adapters of it.
Editable diagram source: images/pi-runtime-c4.c4.yaml (co-located with the rendered SVG) — rendered with the c4-diagram skill (c4svg).
The Governance domain governs the central runtime. The central runtime (§1.2) coordinates the edge runtimes and holds the shared, stateful core of the Orchestration and Governance domains. Each edge runtime is the hub for three domains: Memory is a derived projection of its event log, while Execution and Orchestration are driven peers that it sends intents to and observes events from. The sections below define the runtimes and each domain.
1. Agent runtimes
Auraison runs a dual-runtime model: an edge runtime (many domain-specific agents) plus one central runtime. Each runtime's agent harness is configurable — the runtime contract is fixed; the harness implementing it is a deployment choice, so a runtime can be re-hosted on a different harness without disturbing the domain model.
1.1 Edge runtime (per application)
Current working harness: pi.dev. Pi is a CLI/SDK agent harness built from a small core: Skills (loaded on demand for context efficiency), Extensions (TypeScript modules that add tools, commands, and event hooks), Session Trees (branching, replayable run history), and embed modes (TUI, print/JSON, RPC, SDK). It supports multiple providers and can switch between them mid-session. By design it ships no MCP, sub-agents, permission dialogs, or background processes; those are built as extensions. The platform runs multiple Pi agents concurrently, typically one per reference application (turtlebot-maze, ar4-physical-ai, DEA, …), each scoped to its domain and its own session tree.
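As a rough mental model of what "branching, replayable run history" means, a session tree can be pictured as nodes with parent pointers: forking is adding a second child to a node, and replaying a branch is walking from a leaf back to the root. This is a hypothetical data structure for illustration only; `SessionNode` and its methods are not Pi's on-disk format or SDK API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SessionNode:
    """One entry in a hypothetical session tree (a message, tool call, or result)."""

    node_id: str
    content: str
    # Parent/children links are excluded from repr/eq to avoid recursing through the tree.
    parent: SessionNode | None = field(default=None, repr=False, compare=False)
    children: list[SessionNode] = field(default_factory=list, repr=False, compare=False)

    def branch(self, node_id: str, content: str) -> SessionNode:
        """Append a child; calling this twice on the same node forks the history."""
        child = SessionNode(node_id, content, parent=self)
        self.children.append(child)
        return child

    def replay_path(self) -> list[str]:
        """Return the contents from the root down to this node, in order."""
        path: list[str] = []
        node: SessionNode | None = self
        while node is not None:
            path.append(node.content)
            node = node.parent
        return list(reversed(path))


# Usage: fork a run after the planning step and replay each branch independently.
root = SessionNode("n0", "intent: explore maze")
plan = root.branch("n1", "plan: follow left wall")
branch_a = plan.branch("n2a", "tool: nav2 goal A")
branch_b = plan.branch("n2b", "tool: nav2 goal B")
```

Both branches share the root and plan nodes, so replaying either one reproduces the common prefix without copying it.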
Runtime contract
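The platform owns a typed contract over Pi's primitives. The contract consists of six schemas (Intent, Skill, ToolCall, RunState, AgentEvent, ProjectionEvent), defined before any transport or infrastructure is chosen, plus two extension points (policy hooks and projection adapters) that act on them: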
| Primitive | Backed by (Pi) | Typed schema / role |
|---|---|---|
| Intent | a command / skill invocation | intent_id, type, params, policy_ctx, targets[] |
| Skill | Pi Skill (instructions + tools) | platform agents are packaged as Skills |
| ToolCall | Pi tool | tool, input_schema, output_schema, result |
| RunState | Pi Session Tree | run_id, intent_id, tree_ref, status |
| AgentEvent | Pi events (Extension hooks) | event_id, run_id, kind, payload, ts |
| ProjectionEvent | derived from the AgentEvent / Session log | event_id, domain, projection, payload, ts |
| Policy hooks | Pi Extensions | guardrail / quota / tenancy checks |
| Projection adapters | Pi RPC / SDK + Extensions | drive intents into domains; ingest events back |
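A minimal sketch of the six schemas as frozen Python dataclasses, using exactly the field names from the table. The field types, the `Domain` and `RunStatus` values, and the use of a tuple for `targets` are assumptions; the table fixes only the field names. `Skill` gets hypothetical fields (`name`, `instructions`, `tools`) because the table defines it as a packaging unit rather than a field list. None of this is Pi's own API.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Any


class Domain(str, Enum):
    EXECUTION = "execution"
    ORCHESTRATION = "orchestration"
    MEMORY = "memory"
    GOVERNANCE = "governance"


class RunStatus(str, Enum):  # assumed lifecycle values
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass(frozen=True)
class Intent:
    intent_id: str
    type: str
    params: dict[str, Any]
    policy_ctx: dict[str, Any]
    targets: tuple[Domain, ...]  # targets[] in the table; a tuple keeps it immutable


@dataclass(frozen=True)
class Skill:  # hypothetical fields
    name: str
    instructions: str
    tools: tuple[str, ...] = ()


@dataclass(frozen=True)
class ToolCall:
    tool: str
    input_schema: dict[str, Any]
    output_schema: dict[str, Any]
    result: Any = None


@dataclass(frozen=True)
class RunState:
    run_id: str
    intent_id: str
    tree_ref: str  # reference into the Pi Session Tree
    status: RunStatus


@dataclass(frozen=True)
class AgentEvent:
    event_id: str
    run_id: str
    kind: str
    payload: dict[str, Any]
    ts: float  # assumed: Unix seconds


@dataclass(frozen=True)
class ProjectionEvent:
    event_id: str
    domain: Domain
    projection: str
    payload: dict[str, Any]
    ts: float


# Usage: an intent targeting the Execution domain and the run it starts.
intent = Intent("i-1", "navigate", {"goal": "exit"}, {"tenant": "lab"}, (Domain.EXECUTION,))
run = RunState("r-1", intent.intent_id, "tree/r-1", RunStatus.RUNNING)
```

Keeping these as plain typed records, independent of any transport, is what lets the transport decision (SYS-018) follow the contract rather than shape it.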
Design principles
- The runtime is the ontology; the four domains are projections and adapters, not the system's structure.
- Define the contract (the six schemas) before selecting any transport or streaming infrastructure.
- Operations such as scheduling, guardrails, and tool bridges are runtime extensions, not a separate architectural layer.
Adoption
- Define the six typed schemas as the runtime contract.
- Package existing Claude Code / Pydantic-AI agents as Pi Skills.
- Build orchestration (scheduling, backpressure, guardrails, MCP bridges, sub-agents) as Pi Extensions.
- Keep ROS / Ray / vLLM / Zenoh as execution adapters.
Rollout:
- v1: runtime + execution operational, synchronous dispatch.
- v1.5: runtime operations + Cosmos perception loop.
- v2: full runtime contract, governance domain, foundation VLAs.
- v3: runtime-driven governance, VLA training over lakehouse datasets.
Requirements. SYS-001: the system runs two cooperating runtimes — an edge runtime and a central runtime, each with a configurable harness; the four domains are adapters/projections of the runtime contract. SYS-008: the current edge harness runs on Pi (multi-provider; Claude via the Anthropic provider). SYS-015: the runtime exposes the six typed primitives, defined before transport. SYS-018: transport/streaming selection follows the contract. SYS-006: evolution follows v1 → v1.5 → v2 → v3.
1.2 Central runtime (shared)
Current working harness: a Claude Managed Agent (CMA) integrated with Cloudflare developer tooling — the Agent ADK (Agents SDK) for the agent surface and Dynamic Workers for on-demand compute. The edge runtimes are deliberately thin: anything that benefits from a single, shared, durable view is hoisted into this central agent. It is stateful (Durable-Object-backed), globally reachable, and long-lived, so it can hold the shared orchestration state the fleet coordinates around — which agents exist, what each is working on, shared plans, and budgets — without any single edge agent owning it.
Note — Cloudflare provides a self-managed environment for Claude Managed Agents. The agent loop runs on the Anthropic platform, while Cloudflare provides the runtime — sandboxes, egress control, browser access, email, and custom tools — that the agent's actions execute in. So the central runtime's harness is split: Anthropic owns the reasoning loop; Cloudflare owns the environment those actions touch.
Open design item: the CMA agent loop is event-based. We need to understand how it emits and consumes events and reconcile that model with the platform's AgentEvent / ProjectionEvent contract (§1.1), so that central-runtime and edge-runtime events share one model. Reference: "Build a production-ready agent with Claude Managed Agents".
Functions centralized here (incrementally; each is opt-in for the edge agents):
- Model gateway (first). A Cloudflare AI Gateway fronts all model traffic, giving one place for observability, caching, rate limiting, retries, and provider fallback. pi.dev consumes it natively as a provider, configured with CLOUDFLARE_API_KEY, CLOUDFLARE_ACCOUNT_ID, and CLOUDFLARE_GATEWAY_ID, and routes through the OpenAI / Anthropic / Workers AI providers, plus OpenAI-compatible custom providers for self-hosted models such as local vLLM (see pi.dev providers). The local vLLM ↔ AI Gateway wiring is tracked in AURA-705.
- Per-invocation model selection / optimization. Pick the right model and decoding params for each call (cost / latency / quality routing) rather than hard-coding one model per agent.
- Global & PDDL task planning. Cross-agent planning, including symbolic PDDL task planning, where a global plan is decomposed into intents dispatched to the relevant local agents.
- Input / context optimization. Optimize the prompt and retrieved context handed to each agent (compression, selection, formatting) before invocation.
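Per-invocation model selection can be sketched as a constrained choice: among candidate models that meet a call's latency budget and quality floor, take the cheapest. The model names, prices, latencies, and quality scores below are invented placeholders, and "cheapest feasible" is one simple routing policy among many, not a committed design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str             # hypothetical model identifier routed via the gateway
    usd_per_mtok: float   # assumed blended price per million tokens
    p50_latency_ms: int   # assumed median latency
    quality: float        # assumed offline eval score in [0, 1]


@dataclass(frozen=True)
class CallRequirements:
    max_latency_ms: int
    min_quality: float


def select_model(candidates: list[ModelProfile], req: CallRequirements) -> ModelProfile:
    """Return the cheapest candidate that satisfies the latency and quality bounds."""
    feasible = [
        m for m in candidates
        if m.p50_latency_ms <= req.max_latency_ms and m.quality >= req.min_quality
    ]
    if not feasible:
        raise LookupError("no candidate model meets the call's requirements")
    # Ties on price are broken by lower latency.
    return min(feasible, key=lambda m: (m.usd_per_mtok, m.p50_latency_ms))


# Placeholder catalog: a self-hosted model plus two cloud tiers.
CATALOG = [
    ModelProfile("local-vllm-small", 0.0, 120, 0.62),
    ModelProfile("cloud-fast", 1.0, 400, 0.78),
    ModelProfile("cloud-frontier", 15.0, 1500, 0.93),
]

print(select_model(CATALOG, CallRequirements(max_latency_ms=500, min_quality=0.7)).name)  # cloud-fast
```

Raising when nothing is feasible, rather than silently picking the best available model, leaves the relax-or-fail decision to the caller.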
The boundary rule: latency-sensitive, domain-specific reasoning stays at the edge; globally shared, "solve-once" concerns move central. An edge agent must keep functioning (degraded) if the central runtime is unreachable — the central runtime coordinates and optimizes, it is not on the hard real-time control path.
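The degradation rule can be sketched as a call to the central runtime wrapped in a short timeout with a local fallback, so an unreachable central agent costs the edge at most the timeout instead of blocking it. `ask_central` and `local_plan` are hypothetical stand-ins, and the 0.2 s budget is an arbitrary placeholder.

```python
import asyncio
from typing import Awaitable, Callable


async def plan_with_fallback(
    ask_central: Callable[[], Awaitable[str]],
    local_plan: Callable[[], str],
    timeout_s: float = 0.2,
) -> tuple[str, bool]:
    """Try the central runtime first; fall back to a local plan on timeout or error.

    Returns (plan, degraded) so the caller can record that it ran without central help.
    """
    try:
        return await asyncio.wait_for(ask_central(), timeout=timeout_s), False
    except (asyncio.TimeoutError, ConnectionError):
        return local_plan(), True


async def unreachable_central() -> str:
    # Simulates a central runtime that never answers within the budget.
    await asyncio.sleep(10)
    return "central plan"


async def healthy_central() -> str:
    return "central plan"


def local_heuristic() -> str:
    return "local plan"


print(asyncio.run(plan_with_fallback(unreachable_central, local_heuristic)))  # ('local plan', True)
```

Returning the `degraded` flag instead of hiding the fallback would let the edge agent emit it as an AgentEvent, keeping the loss of central optimization visible.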
Requirements. SYS-019: two cooperating runtimes — an edge runtime (N domain-specific agents) plus one central runtime; each runtime's harness is configurable (current: pi.dev at the edge; a Claude Managed Agent with Cloudflare Agent ADK / Dynamic Workers centrally). SYS-020: the central runtime owns shared orchestration state and hosts the centralized cross-cutting functions. SYS-021: the first centralized function is the Cloudflare AI Gateway, consumed by the edge harness as a provider (see AURA-705); model selection/optimization, global/PDDL planning, and input optimization follow. SYS-022: edge agents degrade gracefully and stay operational when the central runtime is unavailable (it is off the hard real-time path).
1.3 Locality model — edge vs central is not purely topological
"Edge" and "central" are not just physical locations. Each runtime is better characterized by where each concern lives across several independent locality dimensions — and because harnesses are configurable, a given deployment can push some dimensions central and keep others at the edge. The split is a gradient across these dimensions, not a single boundary.
| Dimension | Question | Edge runtime | Central runtime |
|---|---|---|---|
| Inference locality | Where does the model run? (local GPU, edge GPU, cloud API, Workers AI, remote vLLM) | local/edge GPU or remote vLLM (torch.dev.gpu / ros.dev.gpu); cloud API reachable via the gateway | cloud API (Anthropic, for the CMA loop) + Workers AI / routed providers via the AI Gateway |
| Control-plane locality | Where does the agent loop run? (browser, laptop, server, Cloudflare Worker/DO, Kubernetes) | on the application host — robot, server, or Kubernetes (KubeRay) | the CMA loop runs on the Anthropic platform; its actions execute in the Cloudflare runtime (Worker / Durable Object) |
| State locality | Where is memory/session state? (local files, Durable Objects, Postgres, Redis, vector DB) | local session-tree files + the data-plane lakehouse (Postgres catalog) | Cloudflare Durable Objects (shared, global) |
| Tool locality | Where are tools executed? (laptop, SaaS backend, robot, browser, server) | close to hardware — robot / ROS nodes, local server, vLLM | the Cloudflare-provided environment — sandboxes, browser, email, custom Worker tools |
| Data locality | Where is the retrieved/private data? (laptop, cloud DB, S3/R2, lab server) | lab server + lakehouse (RustFS local, R2) | cloud DB / R2 (centralized) |
| User locality | Where is the user relative to the model/API/tool endpoints? | co-located / same LAN as the robot and GPU | remote — reaches the central agent over the network (global edge) |
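Because each dimension is placed independently, a deployment is better described by one placement per dimension than by a single edge/central flag. A small sketch: the `LocalityProfile` type, its two-valued `Placement`, and the example deployment are hypothetical simplifications. Real placements, such as "cloud API via the gateway", are richer than edge versus central.

```python
from dataclasses import dataclass, fields
from enum import Enum


class Placement(Enum):
    EDGE = "edge"
    CENTRAL = "central"


@dataclass(frozen=True)
class LocalityProfile:
    """One placement per locality dimension from the table above."""

    inference: Placement
    control_plane: Placement
    state: Placement
    tools: Placement
    data: Placement
    user: Placement

    def central_fraction(self) -> float:
        """Share of dimensions placed centrally: 0.0 is all-edge, 1.0 is all-central."""
        values = [getattr(self, f.name) for f in fields(self)]
        return sum(v is Placement.CENTRAL for v in values) / len(values)


# Hypothetical lab robot: inference routed to a cloud model, everything else local.
E, C = Placement.EDGE, Placement.CENTRAL
lab_robot = LocalityProfile(inference=C, control_plane=E, state=E, tools=E, data=E, user=E)
central_agent = LocalityProfile(C, C, C, C, C, C)
```

`lab_robot.central_fraction()` is 1/6, which is the gradient described above: a deployment that is "edge" overall can still push a single dimension central.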
2. Execution domain
A driven peer: where Pi-driven work runs — ROS, Ray, VLA inference, notebooks, and
simulations on the torch.dev.gpu and ros.dev.gpu clusters. Pi sends intents in and
observes telemetry/events out; the domain holds its own authoritative state. Real-time
(ms), stateful per session; on failure an agent stops or a robot halts.
Reference applications
Robotics apps are independent repos under aegean-ai, deployed onto KubeRay clusters.
- turtlebot-maze: navigation via Pi + ros-mcp-server on ros.dev.gpu. v1.5 adds the Cosmos loop: Predict2 (world model) → Transfer2.5 (sim2real) → Reason2 (feasibility) → Nav2 execute.
- ar4-physical-ai: VLA manipulation for the AR4 arm on LeRobot + lerobot-ros, Zenoh middleware, Gazebo Harmonic sim. VLA path: ACT → cross-embodiment transfer → Pi0/GR00T.
- Deep Evidence Agent (aegean-ai/dea): multi-agent evidence-grounded reasoning over engineering artifacts (Planner / Researcher / Critic / Synthesizer) with GraphRAG retrieval.
- counter-uas (v2): VisDrone aerial perception + UE5 simulation + GRID hardware; detection/tracking, not VLA or navigation.
| Concern | turtlebot-maze | ar4-physical-ai | Deep Evidence Agent | counter-uas (v2) |
|---|---|---|---|---|
| Robot framework | Nav2 + behavior trees | MoveIt2 + ros2_control | — | GRID platform |
| Model | Cosmos stack | LeRobot VLA (ACT → Pi0) | Multi-agent LLM | Detection/tracking |
| Middleware | DDS | Zenoh + DDS bridge | runtime API | Zenoh + DDS bridge |
| Sim | Gazebo | Gazebo Harmonic | — | Unreal Engine 5 |
| Data pipeline | ROS bag → lakehouse | LeRobot → HF Hub | artifacts → lakehouse | VisDrone/UE5 → lakehouse |
Middleware: Zenoh and vLLM
Zenoh is transport (pub/sub/query, microsecond wire latency, DDS bridge); vLLM is compute (GPU inference). Robot nodes reach inference through a Zenoh queryable: a node issues a get on the key ai/inference/vla with its observation as the payload, and a thin adapter declared as the queryable on that key forwards it to vLLM's async engine and replies with the result. ROS 2 nodes therefore need no HTTP and no knowledge of where vLLM runs. This realizes the dual-speed pattern: System 2 (vLLM on torch.dev.gpu) for planning, System 1 (VLA action heads on ros.dev.gpu) for real-time control. Evaluate rmw_zenoh as a full DDS replacement once it leaves experimental status.
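The queryable adapter pattern can be illustrated without a live Zenoh session. In the sketch below, `InProcessBus` is a hypothetical stand-in that mimics only the shape of the get/queryable interaction: a requester queries a key, and whichever handler is declared on that key replies. Its method names echo Zenoh's, but the signatures are simplified and are not zenoh-python's API. `fake_vla_policy` stands in for vLLM's async engine. The point is that the robot-side caller knows only the key, never the inference endpoint.

```python
import asyncio
from typing import Awaitable, Callable

Handler = Callable[[dict], Awaitable[dict]]


class InProcessBus:
    """Hypothetical stand-in for a Zenoh session: exact-key queryables only."""

    def __init__(self) -> None:
        self._queryables: dict[str, Handler] = {}

    def declare_queryable(self, key: str, handler: Handler) -> None:
        self._queryables[key] = handler

    async def get(self, key: str, payload: dict) -> dict:
        handler = self._queryables.get(key)
        if handler is None:
            raise LookupError(f"no queryable declared on {key!r}")
        return await handler(payload)


async def fake_vla_policy(obs: dict) -> dict:
    # Stand-in for a call into vLLM's async engine on torch.dev.gpu.
    await asyncio.sleep(0)
    return {"action": "forward" if obs.get("clear_ahead") else "turn_left"}


def install_inference_adapter(bus: InProcessBus, infer: Handler) -> None:
    """The thin adapter: bind the inference backend to the shared key."""
    bus.declare_queryable("ai/inference/vla", infer)


async def ros_node_step(bus: InProcessBus, obs: dict) -> str:
    # The robot node needs only the key, not the backend's host or protocol.
    reply = await bus.get("ai/inference/vla", obs)
    return reply["action"]


bus = InProcessBus()
install_inference_adapter(bus, fake_vla_policy)
print(asyncio.run(ros_node_step(bus, {"clear_ahead": True})))  # forward
```

Swapping `fake_vla_policy` for a different backend changes nothing on the robot side, which is the decoupling the paragraph above relies on.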
Requirements. SYS-003: four reference applications. SYS-004: Zenoh is the standard non-ROS transport. SYS-005: dual-speed inference (System 2 planning / System 1 control).
3. Orchestration domain
A driven peer: scheduling, run lifecycle, job dispatch, backpressure, guardrails, and health. In the dual-runtime model the shared, stateful core of this domain is realized by the central runtime (§1.2) — it holds cross-agent orchestration state and runs the centralized planning/routing functions; per-agent control still runs as harness extensions inside each edge runtime. Seconds, eventually consistent; on failure visibility degrades but running agents continue.
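Backpressure, one of the duties listed above, can be sketched as a bounded queue between the dispatcher and the executors. When executors fall behind, `put` suspends the producer instead of letting work pile up without limit. A minimal asyncio sketch: the integer job IDs, the simulated work, and the capacity of 2 are placeholders chosen only for illustration.

```python
import asyncio


async def producer(queue: asyncio.Queue, n: int, high_water: list[int]) -> None:
    for job_id in range(n):
        await queue.put(job_id)  # suspends while the queue is full: this is the backpressure
        high_water.append(queue.qsize())
    await queue.put(None)  # sentinel: no more jobs


async def consumer(queue: asyncio.Queue, done: list[int]) -> None:
    while (job_id := await queue.get()) is not None:
        await asyncio.sleep(0.001)  # simulate slow execution
        done.append(job_id)


async def run_pipeline(n: int = 10, capacity: int = 2) -> tuple[list[int], int]:
    """Dispatch n jobs through a bounded queue; return completed jobs and peak depth."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=capacity)
    high_water: list[int] = []
    done: list[int] = []
    await asyncio.gather(producer(queue, n, high_water), consumer(queue, done))
    return done, max(high_water)
```

The peak queue depth never exceeds `capacity`, however slow the consumer is; the cost of a slow executor shows up as producer wait time rather than unbounded memory.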
First principle: execution failures must not cascade to orchestration, and orchestration outages must not halt running agents.
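One way to honor the first half of this principle in an orchestration loop is to treat every dispatched run as a bulkhead: an exception from the execution side is caught at the dispatch boundary and recorded as a status, instead of propagating into the scheduler. A minimal asyncio sketch; the job functions and status strings are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable

Job = Callable[[], Awaitable[str]]


async def run_isolated(run_id: str, job: Job, status: dict[str, str]) -> None:
    """Run one job and record its outcome without letting failures escape."""
    status[run_id] = "running"
    try:
        await job()
        status[run_id] = "succeeded"
    except Exception as exc:  # the execution failure stays inside this run
        status[run_id] = f"failed: {type(exc).__name__}"


async def dispatch_all(jobs: dict[str, Job]) -> dict[str, str]:
    status: dict[str, str] = {}
    await asyncio.gather(*(run_isolated(rid, job, status) for rid, job in jobs.items()))
    return status


async def ok_job() -> str:
    return "done"


async def crashing_job() -> str:
    raise RuntimeError("robot driver crashed")
```

Catching `Exception` rather than `BaseException` leaves `asyncio.CancelledError` alone, so the orchestrator can still cancel runs deliberately.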
Requirements. SYS-002: failure isolation between execution and orchestration. SYS-016: control operations (scheduling, backpressure, guardrails, MCP bridges, sub-agent orchestration) are implemented as Pi Extensions/Skills.
Deferred infrastructure (to be re-decided only after the runtime contract is defined): control-event streaming (SYS-009 control tier, SYS-010 StatusEvents-as-streams, SYS-012 streaming over a persistent MCP connection) is frozen, because these were selected before the contract existed.
4. Memory domain
A derived projection: a read model rebuilt from the AgentEvent / Session log. Built on DuckDB + DuckLake (PostgreSQL catalog) over object storage (RustFS local + Cloudflare R2 for exposed buckets). Holds traces, datasets, digital twins, retrieval indexes, and event history. Seconds, eventually consistent; on failure queries fail and agents lose context, but execution continues.
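Because Memory is derived, its read model can always be rebuilt by folding the event log from the start. A minimal in-memory sketch using the AgentEvent fields from §1.1. The event kinds (`run_started`, `tool_called`, `run_finished`) and the per-run summary shape are hypothetical. In the real domain the fold's output would land in DuckLake tables rather than a Python dict.

```python
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass(frozen=True)
class AgentEvent:
    event_id: str
    run_id: str
    kind: str
    payload: dict[str, Any]
    ts: float


@dataclass
class RunSummary:
    status: str = "unknown"
    tool_calls: int = 0
    last_ts: float = 0.0


def project_runs(log: Iterable[AgentEvent]) -> dict[str, RunSummary]:
    """Fold the event log into a per-run read model: same log in, same view out."""
    view: dict[str, RunSummary] = {}
    seen: set[str] = set()
    for ev in sorted(log, key=lambda e: e.ts):
        if ev.event_id in seen:  # tolerate redelivered events
            continue
        seen.add(ev.event_id)
        summary = view.setdefault(ev.run_id, RunSummary())
        if ev.kind == "run_started":
            summary.status = "running"
        elif ev.kind == "tool_called":
            summary.tool_calls += 1
        elif ev.kind == "run_finished":
            summary.status = ev.payload.get("status", "finished")
        summary.last_ts = ev.ts
    return view


LOG = [
    AgentEvent("e1", "r1", "run_started", {}, 1.0),
    AgentEvent("e2", "r1", "tool_called", {"tool": "nav2_goal"}, 2.0),
    AgentEvent("e2", "r1", "tool_called", {"tool": "nav2_goal"}, 2.0),  # duplicate delivery
    AgentEvent("e3", "r1", "run_finished", {"status": "succeeded"}, 3.0),
]
```

Deduplicating on `event_id` keeps the projection deterministic under at-least-once delivery, which is why losing the read model costs context but never correctness: it can be rebuilt from the log.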
Requirements. SYS-007: DuckDB + DuckLake over RustFS/R2. SYS-017: Memory is derived from the AgentEvent / Session log (unlike the driven Execution and Orchestration domains).
Deferred infrastructure: analytics/audit streaming (SYS-011 trace/audit stream, SYS-013 Flink stream processing) is frozen — subsumed by the AgentEvent log plus this projection until the contract is settled.
5. Governance domain
Governs the runtime via policy hooks: tenancy, quotas, policy, billing, and audit (v2). Minutes, strongly consistent; on failure no new deployments occur but running agents are unaffected. Policy decisions enter the runtime as Pi Extension hooks on intents.
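Policy hooks can be pictured as an ordered chain of checks that every intent passes through before dispatch. The first denial wins and carries a reason for the audit trail. This is a sketch under stated assumptions: the hook signature, the `tenant` key in `policy_ctx`, the `gpu_hours` key in `params`, and the quota figure are hypothetical, and in the platform each hook would be a Pi Extension rather than a Python function.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Intent:  # subset of the §1.1 Intent schema
    intent_id: str
    type: str
    params: dict[str, Any]
    policy_ctx: dict[str, Any]


# A hook returns None to allow the intent, or a denial reason string.
PolicyHook = Callable[[Intent], Optional[str]]


def tenancy_hook(allowed_tenants: set[str]) -> PolicyHook:
    def check(intent: Intent) -> Optional[str]:
        tenant = intent.policy_ctx.get("tenant")
        return None if tenant in allowed_tenants else f"unknown tenant {tenant!r}"
    return check


def quota_hook(remaining_gpu_hours: dict[str, float]) -> PolicyHook:
    def check(intent: Intent) -> Optional[str]:
        tenant = intent.policy_ctx.get("tenant")
        needed = float(intent.params.get("gpu_hours", 0.0))
        left = remaining_gpu_hours.get(tenant, 0.0)
        return None if needed <= left else f"quota exceeded: need {needed}, have {left}"
    return check


def evaluate(intent: Intent, hooks: list[PolicyHook]) -> tuple[bool, Optional[str]]:
    """Run hooks in order; the first denial short-circuits the chain."""
    for hook in hooks:
        reason = hook(intent)
        if reason is not None:
            return False, reason
    return True, None


HOOKS = [tenancy_hook({"lab"}), quota_hook({"lab": 4.0})]
```

Ordering tenancy before quota means an unknown tenant is rejected before any quota lookup, so the audit reason names the more fundamental failure.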
Requirements. Governance requirements are v2 (MP- prefix in the management-plane design
doc), traced to SYS-001/SYS-017.
Deferred infrastructure: multi-tenant topic isolation (SYS-014, Apache Pulsar) is frozen until the contract and tenancy model are settled.