Auraison — Data Plane Design
Date: 2026-02-23
Status: Approved (v1 — migrated from aegean-ai/lakehouse to data-plane/)
Problem
The original three-plane model (user / control / management) governs control flow cleanly but does not
govern data flow. As agentic workloads grow, a structural gap emerges: the lakehouse
(currently in the separate aegean-ai/lakehouse repo) is used simultaneously as:
- Storage for user-plane outputs (perception data, job results, telemetry)
- Memory substrate for control-plane agents (job history, cluster failure patterns)
- Training corpus for model fine-tuning (VLA, classification models)
- Observability archive (agent traces, experiment results)
No single plane owns this. The control plane's LakehouseAgent reaches into it; the user
plane writes to it; the management plane governs access to it. The lakehouse is not a
single-plane component — it spans planes, and needs its own architectural treatment.
The deeper issue: in agentic systems, the data flow direction is reversed relative to traditional software.
Traditional: logic → data
Agentic: data → reasoning → action
Data becomes the substrate of cognition. The lakehouse is not analytics infrastructure — it is the persistent world model of the system. It needs a dedicated plane.
Definition
The data plane governs the movement, storage, transformation, and accessibility of data across the entire system. It is orthogonal to both reasoning (control plane) and execution (user plane).
The data plane sits horizontally — all other planes interact with it.
Goals
- Provide a unified persistent storage substrate for all planes
- Formalise the `LakehouseAgent` as the control-plane API boundary to the data plane
- Define ingestion pipelines from the user plane (structured, versioned, lineaged)
- Enable semantic retrieval for control-plane agents (RAG over job history and agent traces)
- Support world-model snapshots for AgentOps checkpointing and causal replay
- Serve as training data substrate for VLA and ML model fine-tuning (v3)
- Consolidate `aegean-ai/lakehouse` into this monorepo under `data-plane/`
Non-goals
- Real-time message passing — that is Zenoh / NATS / DDS (transport, not storage)
- Governance policy definition — that is the management plane
- Agent reasoning or query planning — that is the control plane
- Job execution — that is the user plane
Migration: aegean-ai/lakehouse → data-plane/
The aegean-ai/lakehouse repo contained the Python package scaffold, DuckLake schema
reference, infrastructure config (MinIO + PostgreSQL), tests, and design docs. It has been
consolidated into this monorepo.
Migration completed
| What | Source | Destination |
|---|---|---|
| Design docs | aegean-ai/lakehouse/docs/plans/ | docs/plans/ (this monorepo) |
| Python package scaffold | aegean-ai/lakehouse/ | data-plane/ |
| Infrastructure | aegean-ai/lakehouse/docker-compose.yml | data-plane/docker-compose.yml |
| Tests | aegean-ai/lakehouse/tests/ | data-plane/tests/ |
| CLAUDE.md | aegean-ai/lakehouse/CLAUDE.md | data-plane/CLAUDE.md |
LakehouseAgent update
`LakehouseAgent` was updated from the dbt CLI (an incorrect assumption) to `duckdb` and `python -m lakehouse`, the correct tooling for DuckDB + DuckLake:
# Before:
ALLOWED_TOOLS = "Bash(dbt *),Read,Edit"
# After:
ALLOWED_TOOLS = "Bash(duckdb *),Bash(python *),Bash(docker *),Read,Edit"
DATA_PLANE_DIR = REPO_ROOT / "data-plane"
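To illustrate how an `ALLOWED_TOOLS` string like the one above could be enforced, here is a minimal sketch using fnmatch-style glob matching. The pattern list is taken from the doc; the matching helper itself is hypothetical, not the actual agent runtime:

```python
from fnmatch import fnmatch

# Pattern list as configured for the updated LakehouseAgent (from the doc).
# The matching logic below is an illustrative sketch, not the real runtime.
ALLOWED_TOOLS = "Bash(duckdb *),Bash(python *),Bash(docker *),Read,Edit"

def bash_command_allowed(command: str, allowed_tools: str = ALLOWED_TOOLS) -> bool:
    """Return True if `command` matches any Bash(<glob>) pattern in the allowlist."""
    for tool in allowed_tools.split(","):
        if tool.startswith("Bash(") and tool.endswith(")"):
            pattern = tool[len("Bash("):-1]  # e.g. "duckdb *"
            if fnmatch(command, pattern):
                return True
    return False

print(bash_command_allowed("duckdb data.db -c 'SELECT 1'"))  # True
print(bash_command_allowed("dbt run"))                        # False
```

Note that under this scheme the old `Bash(dbt *)` pattern no longer matches, which is the point of the update.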
Remaining step
Archive aegean-ai/lakehouse on GitHub (mark read-only) with a README redirect to this
monorepo. This is a manual GitHub operation.
Architecture
Storage layers
| Layer | Technology | Contents | Access pattern |
|---|---|---|---|
| Lakehouse | DuckDB + DuckLake (PostgreSQL catalog) + MinIO S3 | Structured outputs, experiment results, job history | SQL (DuckDB in-process), Parquet partitions in MinIO |
| Object store | MinIO (local) / S3 / Cloudflare R2 | Raw files: fMP4 chunks, GeoParquet, point clouds, model checkpoints | Blob read/write via s3fs |
| Embeddings store | pgvector / Chroma (v2) | Dense vectors for semantic retrieval (agent traces, docs) | ANN search |
| Feature store | Feast / custom Parquet (v3) | Structured ML features for VLA and classification models | Batch + online |
| Event log | Append-only Parquet partitions in MinIO landing/ | Agent traces, Zenoh event recordings, ROS bag metadata | Append write, batch read |
| World model snapshots | DuckLake snapshots (v2) | Point-in-time environment state for AgentOps | Write on checkpoint, read on replay |
DuckDB + DuckLake as the query and catalog layer
DuckDB is the in-process analytical query engine. DuckLake is the transactional catalog:
ATTACH 'ducklake:postgresql://...' exposes tables whose metadata lives in PostgreSQL and
whose data lives in MinIO as Parquet fragments. The full DuckLake schema (174 catalog tables)
is in data-plane/tests/ducklake-schema.sql.
Key catalog tables:
- `experiments` — experiment registry (id, project, description, created_at)
- `simulation_runs` — per-simulator run records with S3 prefix, status, config
- `ducklake_table`, `ducklake_data_file` — DuckLake's own catalog metadata
The LakehouseAgent (Bash(duckdb *), Bash(python *), Read, Edit) is the control plane's
operator interface: it runs DuckDB queries, inspects the catalog, and calls
python -m lakehouse commands. It is not a transformation pipeline — it is a catalog
operator and query runner.
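As a sketch of the SQL such an operator issues, the following builds the ATTACH statement and a catalog inspection query as strings. Executing them requires DuckDB with the ducklake extension connected to the real PostgreSQL catalog, so this sketch stops at statement construction; the DSN and alias are placeholders:

```python
# Sketch only: builds the SQL a LakehouseAgent-style operator would run via
# `duckdb`. The DSN and the `lake` alias are placeholder assumptions.
def ducklake_attach_sql(catalog_dsn: str, alias: str = "lake") -> str:
    """ATTACH a DuckLake catalog: metadata in PostgreSQL, data as Parquet in MinIO."""
    return f"ATTACH 'ducklake:{catalog_dsn}' AS {alias};"

def list_data_files_sql(alias: str = "lake") -> str:
    """Inspect DuckLake's own catalog metadata (ducklake_data_file, per this doc)."""
    return f"SELECT * FROM {alias}.ducklake_data_file LIMIT 10;"

print(ducklake_attach_sql("postgresql://catalog-host/ducklake"))
# ATTACH 'ducklake:postgresql://catalog-host/ducklake' AS lake;
```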
Data flow
User plane → data plane (ingestion)
User-plane workers (Ray jobs on torch.dev.gpu and ros.dev.gpu) write outputs to the
data plane on job completion:
In v1, ingestion is manual (Ray worker writes output files; LakehouseAgent registers them
via DuckDB). In v1.5, the ingestion API is a lightweight FastAPI endpoint in
data-plane/lakehouse/ called directly by Ray workers on job completion.
Control plane → data plane (reads)
Control-plane agents read from the data plane in two modes:
- Structured query (current): `LakehouseAgent` runs DuckDB queries over DuckLake and reads Parquet files via DuckDB in the agent subprocess
- Semantic retrieval (v2): control-plane agents call an embeddings query endpoint to retrieve relevant agent traces, job history, or world-model state as context
# v1: LakehouseAgent reads directly
"Run SELECT * FROM job_outcomes WHERE status = 'failed' ORDER BY completed_at DESC LIMIT 10"
# v2: semantic retrieval endpoint
GET /data/retrieve?query="notebook jobs that failed on torch.dev.gpu last week"&top_k=5
→ [{job_id, summary, outcome, wandb_run_id}]
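Since the v2 retrieval interface is query-string based, a client-side request can be sketched with stdlib URL encoding. The endpoint path and parameter names come from the interface above; the host is a placeholder assumption:

```python
from urllib.parse import urlencode

def build_retrieve_url(query: str, top_k: int = 5,
                       base: str = "http://data-plane.local") -> str:
    """Encode a natural-language request for the (v2) /data/retrieve endpoint.

    Endpoint path and parameters follow the interface in this doc; the base
    host is a placeholder.
    """
    params = urlencode({"query": query, "top_k": top_k})
    return f"{base}/data/retrieve?{params}"

url = build_retrieve_url("notebook jobs that failed on torch.dev.gpu last week")
print(url)
```

`urlencode` handles the spaces and punctuation in the natural-language query, which would otherwise produce an invalid URL.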
AgentOps → data plane (snapshots)
The control plane's AgentOps subsystem writes world-model snapshots to the data plane on checkpoint events and at the end of each agent invocation. These snapshots are the foundation for causal replay and VLA training data.
WorldModelSnapshot {
snapshot_id: UUID
intent_id: UUID
timestamp: ISO 8601
agents_active: [{role, status, tool_call_count}]
user_plane: {torch: {jobs_in_flight, gpu_util}, ros: {jobs_in_flight, gpu_util}}
causal_chain: [{intent_id, job_id, outcome}]
}
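The snapshot record above maps naturally onto a serialisable dataclass. A minimal sketch follows — field names mirror the schema in the doc, while the JSON encoding and default factories are assumptions:

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from uuid import uuid4

# Sketch of WorldModelSnapshot as a serialisable record. Field names mirror
# the schema in this doc; JSON encoding and defaults are assumptions.
@dataclass
class WorldModelSnapshot:
    intent_id: str
    agents_active: list[dict]   # [{role, status, tool_call_count}]
    user_plane: dict            # {torch: {...}, ros: {...}}
    causal_chain: list[dict]    # [{intent_id, job_id, outcome}]
    snapshot_id: str = field(default_factory=lambda: str(uuid4()))
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def to_json(self) -> str:
        return json.dumps(asdict(self))

snap = WorldModelSnapshot(
    intent_id=str(uuid4()),
    agents_active=[{"role": "LakehouseAgent", "status": "idle", "tool_call_count": 3}],
    user_plane={"torch": {"jobs_in_flight": 2, "gpu_util": 0.8},
                "ros": {"jobs_in_flight": 1, "gpu_util": 0.4}},
    causal_chain=[],
)
print(snap.to_json())
```

Keeping snapshots as flat JSON-serialisable records is what lets them land in the event log as ordinary rows and be replayed later without custom decoders.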
The lakehouse as persistent world model
In agentic robotics systems, the lakehouse accumulates not just analytics data but the episodic and semantic memory of the entire system:
| Memory type | Contents | Enables |
|---|---|---|
| Episodic | Job history, cluster failure events, navigation trial outcomes | Agent reasoning over "what happened before" |
| Semantic | Learned cluster failure patterns, experiment regression signatures | Agent pattern matching without re-executing |
| Procedural | Successful job submission sequences, DuckLake catalog dependency chains | Agent skill recall |
| Perceptual | GeoParquet outputs, YOLOv8 detections, SLAM maps | VLA fine-tuning, world-model grounding |
For the turtlebot-maze reference application:
- Every Nav2 navigation trial → appended to `event_log.ros_navigation_trials`
- Every YOLOv8 detection → appended to `event_log.object_detections`
- Every world-model snapshot at decision point → stored in `world_model.snapshots`
- Aggregate: accumulated robot experience becomes a fine-tuning corpus for VLA models
This is the convergence point between the data plane and world-model-based VLA research: the lakehouse is the persistent world model.
Interfaces
Ingestion API (data plane ← user plane)
POST /data/ingest
Body: {table: str, records: list[dict], schema_version: str, job_id: UUID, tenant_id: UUID}
→ 201 {partition_id, row_count}
POST /data/snapshots
Body: WorldModelSnapshot
→ 201 {snapshot_id}
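Server-side validation of the ingest body can be sketched against the field list above. The field names come from the interface spec; the validation rules themselves are an illustrative assumption:

```python
from uuid import UUID

# Field names from the POST /data/ingest spec in this doc; the validation
# rules below are an illustrative sketch, not the actual endpoint.
REQUIRED_FIELDS = {"table", "records", "schema_version", "job_id", "tenant_id"}

def validate_ingest_body(body: dict) -> list[str]:
    """Return a list of validation errors for a POST /data/ingest body."""
    errors = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - body.keys())]
    if "records" in body and not isinstance(body["records"], list):
        errors.append("records must be a list")
    for key in ("job_id", "tenant_id"):
        if key in body:
            try:
                UUID(str(body[key]))
            except ValueError:
                errors.append(f"{key} is not a valid UUID")
    return errors
```

An empty error list maps to the `201 {partition_id, row_count}` success path; anything else would be rejected before a partition is written.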
Query API (data plane → control plane)
GET /data/query?sql=<duckdb_sql>&tenant_id=<uuid>
→ {columns: [...], rows: [...]}
GET /data/retrieve?query=<natural_language>&top_k=<n>&tenant_id=<uuid> (v2)
→ [{id, text, score, metadata}]
Policy interface (management plane → data plane)
PUT /data/policy
Body: {tenant_id, retention_days, allowed_tables: [...], max_storage_gb: float}
→ 200
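How the data plane might act on `retention_days` from the policy body can be sketched as a sweep over date-partitioned data. The date-partitioned layout is an assumption; `retention_days` comes from the `PUT /data/policy` body above:

```python
from datetime import date, timedelta

def expired_partitions(partition_dates: list[date], retention_days: int,
                       today: date) -> list[date]:
    """Partitions older than a tenant's retention window (PUT /data/policy).

    A date-partitioned layout is assumed; retention_days comes from the
    policy interface in this doc.
    """
    cutoff = today - timedelta(days=retention_days)
    return [d for d in partition_dates if d < cutoff]

print(expired_partitions(
    [date(2026, 1, 1), date(2026, 2, 20)], retention_days=30, today=date(2026, 2, 23)))
# [datetime.date(2026, 1, 1)]
```

This also illustrates the plane split below: the management plane sets `retention_days`; the data plane runs the sweep.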
Critical distinction: data plane vs management plane
A common mistake is placing the lakehouse under the management plane. They are separate:
| Data Plane | Management Plane |
|---|---|
| Stores and transforms data | Governs how data is stored |
| Serves queries | Defines query access policies |
| Manages schemas and lineage | Manages access rights and retention |
| Enables agent reasoning | Enforces compliance |
The management plane governs the data plane. It does not own it.
Evolution path
v1 — Data plane implicit: lakehouse in aegean-ai/lakehouse; LakehouseAgent reaches across repos
v1.5 — Migrate to data-plane/ in monorepo; formalise ingestion API; structured AgentEvent log
v2 — Embeddings store + RAG retrieval endpoint for control-plane agents
  - World-model snapshots (AgentOps → data plane)
  - Schema lineage visible in management-plane dashboard
v3 — VLA training pipeline: accumulated perception data → fine-tuning loop
  - Feature store for online inference (real-time VLA feature serving)
  - Data plane exposes MCP server: agents query job history and world state via tool calls
See also:
- docs/plans/2026-02-23-aiops-control-plane-design.md §"AgentOps Subsystem" — world-model snapshots, checkpointing
- docs/plans/2026-02-23-auraison-control-plane-design.md — LakehouseAgent, agent memory
- docs/plans/2026-02-23-auraison-management-plane-design.md — retention policy, RBAC
- docs/plans/2026-02-23-auraison-user-plane-design.md — ingestion producers (Ray workers)