Data Plane — COCO-Caption Demo Implementation Plan (Experiment #0)
For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
Relationship to the data plane design
This is the implementation plan for Experiment #0 of the Auraison data plane.
| Document | Purpose |
|---|---|
| docs/data-plane/design.mdx | Canonical data plane design: four-plane model, requirements DP-001 to DP-042 |
| docs/data-plane/coco-demo-design.mdx | Architectural decisions for this demo (read before implementing) |
| This document | Step-by-step implementation plan for data-plane/lakehouse/ and Experiment #0 |
Goal: Build the lakehouse/ Python package and the COCO-Caption Experiment #0 demo (script + notebook) that exercises every layer of the full architecture at 1,000-sample scale.
Architecture: A layered Python package (catalog → sync → query → stream → tools → visualize) backed by DuckDB/DuckLake for catalog management, the lakehouse for object storage, and HF datasets for the streaming egress interface. Pydantic-AI registers the query and sample functions as agent tools (a v2 preview; in v1 the LakehouseAgent Claude Code subprocess covers this role). W&B is an optional, loosely coupled adapter: without W&B, everything still works. Rerun visualizes spatial and temporal samples.
Tech Stack: Python 3.12, DuckDB 1.3+, DuckLake (DuckDB extension), s3fs, huggingface-hub, datasets, pydantic-ai, wandb, rerun-sdk, moto (test mocking), pytest
Prerequisites
Docker Compose services must be running for integration tests:
Unit tests use moto (S3 mock) and a tmp file catalog — no Docker needed.
Task dependency graph
Editable Mermaid source: images/coco-demo-plan-task-dependency-graph.mermaid.md
Task 1: Add Dependencies
Files:
- Modify: pyproject.toml
Step 1: Add the six new runtime dependencies
Edit the dependencies list in pyproject.toml to add after "duckdb>=1.3.0":
Step 2: Sync the environment
Expected: lock file updated, packages installed with no errors.
Step 3: Verify key imports
Expected: all imports OK
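The original import command is not reproduced here. As a stand-in, the following hypothetical script performs the same check and prints the expected message. The module names are assumptions derived from the Tech Stack list: rerun-sdk imports as `rerun`, and huggingface-hub imports as `huggingface_hub`.

```python
"""Hypothetical check for Task 1, Step 3: confirm each dependency imports."""
import importlib

# Import names assumed from the Tech Stack list: the rerun-sdk package imports
# as `rerun` and huggingface-hub as `huggingface_hub`.
MODULES = ["duckdb", "s3fs", "huggingface_hub", "datasets", "pydantic_ai", "wandb", "rerun"]


def check_imports(modules: list[str]) -> list[str]:
    """Return the names of modules that fail to import."""
    failed = []
    for name in modules:
        try:
            importlib.import_module(name)
        except Exception:  # broken installs can raise more than ImportError
            failed.append(name)
    return failed


if __name__ == "__main__":
    missing = check_imports(MODULES)
    print("all imports OK" if not missing else f"missing: {', '.join(missing)}")
```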
Step 4: Commit
Task 2: Package Skeleton
Files:
- Create: lakehouse/__init__.py
- Create: lakehouse/catalog.py
- Create: lakehouse/sync.py
- Create: lakehouse/query.py
- Create: lakehouse/stream.py
- Create: lakehouse/tools.py
- Create: lakehouse/visualize.py
- Create: tests/test_catalog.py
- Create: tests/test_sync.py
- Create: tests/test_query.py
- Create: tests/test_stream.py
- Create: tests/test_tools.py
- Create: tests/test_visualize.py
- Create: experiments/__init__.py
- Create: notebooks/ (directory only)
Step 1: Create directory structure
Step 2: Create lakehouse/__init__.py
Step 3: Create all other files as empty stubs
Each file should contain only:
Step 4: Create experiments/__init__.py as empty.
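Steps 1, 3 and 4 can be scripted. This is a hypothetical helper, not part of the plan.

- It takes the stub content as a parameter, because the exact stub text from Step 3 is not reproduced here.
- It deliberately skips lakehouse/__init__.py, which Step 2 fills in by hand.

```python
"""Hypothetical helper for Task 2: lay out the skeleton listed under Files."""
from pathlib import Path

# lakehouse/__init__.py is left out on purpose: Step 2 gives it real content.
MODULES = ("catalog", "sync", "query", "stream", "tools", "visualize")
STUB_FILES = [f"lakehouse/{m}.py" for m in MODULES] + [f"tests/test_{m}.py" for m in MODULES]
EMPTY_FILES = ["experiments/__init__.py"]
DIRECTORIES = ["notebooks"]


def create_skeleton(root: Path, stub_text: str) -> list[Path]:
    """Create missing directories and files under root; never overwrite existing files."""
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    created = []
    for rel in STUB_FILES + EMPTY_FILES:
        path = root / rel
        if path.exists():
            continue  # keep any work already in place
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("" if rel in EMPTY_FILES else stub_text)
        created.append(path)
    return created
```

Because existing files are never overwritten, the helper is safe to re-run.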
Step 5: Verify package is importable
Expected: package OK
Step 6: Commit
Task 3: Catalog — Experiment Schema and DuckLake Attach
Files:
- Modify: lakehouse/catalog.py
- Modify: tests/test_catalog.py
What this builds
LakehouseCatalog wraps a DuckDB connection with a DuckLake catalog attached. It owns the experiments and simulation_runs tables and provides typed methods for registering and updating records.
Step 1: Write failing tests in tests/test_catalog.py
Step 2: Run to verify they fail
Expected: ImportError or AttributeError — LakehouseCatalog not implemented.
Step 3: Implement lakehouse/catalog.py
Step 4: Run tests and verify they pass
Expected: 5 tests pass.
Step 5: Commit
Task 4: Sync — HF Hub to the lakehouse
Files:
- Modify: lakehouse/sync.py
- Modify: tests/test_sync.py
What this builds
sync_from_hf() downloads Parquet files from HF Hub and uploads them to a lakehouse S3 bucket. Tests use moto to mock S3 so Docker is not required.
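One way to split sync_from_hf() into a download half and an upload half is sketched below. The function names other than sync_from_hf, and all parameters, are hypothetical.

- The download uses huggingface_hub's `snapshot_download` with `allow_patterns` so that only Parquet files are fetched.
- The upload accepts any fsspec filesystem (for example an `s3fs.S3FileSystem`) and only calls its `put_file()` method.
- This separation lets the upload logic be tested with a fake or mocked filesystem.

```python
"""Sketch of sync_from_hf(): HF Hub Parquet files -> lakehouse S3 prefix (assumed design)."""
from pathlib import Path


def download_parquet(repo_id: str, local_dir: str) -> Path:
    """Fetch only the Parquet files of a dataset repo from HF Hub."""
    from huggingface_hub import snapshot_download  # lazy import: upload tests do not need it

    path = snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",
        allow_patterns=["*.parquet"],
        local_dir=local_dir,
    )
    return Path(path)


def upload_parquet(local_root: Path, fs, dest_prefix: str) -> list[str]:
    """Copy every *.parquet under local_root to dest_prefix, keeping relative paths.

    `fs` is any fsspec-style filesystem; only put_file(lpath, rpath) is used.
    """
    uploaded = []
    for src in sorted(local_root.rglob("*.parquet")):
        dest = f"{dest_prefix.rstrip('/')}/{src.relative_to(local_root).as_posix()}"
        fs.put_file(str(src), dest)
        uploaded.append(dest)
    return uploaded


def sync_from_hf(repo_id: str, fs, dest_prefix: str, local_dir: str) -> list[str]:
    """Download a dataset's Parquet files, then upload them under dest_prefix."""
    return upload_parquet(download_parquet(repo_id, local_dir), fs, dest_prefix)
```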
Step 1: Write failing tests in tests/test_sync.py
Step 2: Run to verify they fail
Expected: ImportError — sync_from_hf not implemented.
Step 3: Implement lakehouse/sync.py
Step 4: Run tests and verify they pass
Expected: 2 tests pass.
Step 5: Commit
Task 5: Query — Typed DuckDB Queries Over the Catalog
Files:
- Modify: lakehouse/query.py
- Modify: tests/test_query.py
What this builds
LakehouseQuery wraps a DuckDB connection with S3 secrets configured and exposes typed query methods. Tests build a small in-memory Parquet fixture via DuckDB without touching the lakehouse.
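The "typed query methods" idea can be illustrated as follows. A parameterized query runs, and each row is mapped onto a dataclass by column name using the cursor's `description`. Assumptions and scope:

- `CaptionRow` and `fetch_typed` are hypothetical names.
- S3 secret setup is out of scope for this sketch.
- Both duckdb and sqlite3 expose `execute(sql, params)`, `description` and `fetchall()`, so the mapping logic is backend-neutral.

```python
"""Sketch of a typed-result helper for LakehouseQuery (names and columns are assumptions)."""
from dataclasses import dataclass, fields
from typing import Any, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class CaptionRow:
    image_id: int
    caption: str


def fetch_typed(conn: Any, sql: str, params: list, model: type[T]) -> list[T]:
    """Run a parameterized query and build one `model` instance per row.

    With DuckDB, sql might read "SELECT image_id, caption FROM read_parquet('<s3 glob>') LIMIT ?".
    """
    cur = conn.execute(sql, params)
    columns = [d[0] for d in cur.description]
    expected = {f.name for f in fields(model)}
    if set(columns) != expected:
        raise ValueError(f"columns {columns} do not match {model.__name__} fields {sorted(expected)}")
    return [model(**dict(zip(columns, row))) for row in cur.fetchall()]
```

Failing loudly on a column mismatch keeps schema drift in the Parquet files from silently producing half-filled records.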
Step 1: Write failing tests in tests/test_query.py
Step 2: Run to verify they fail
Expected: ImportError.
Step 3: Implement lakehouse/query.py
Step 4: Run tests and verify they pass
Expected: 6 tests pass.
Step 5: Commit
Task 6: Stream — HF IterableDataset Egress
Files:
- Modify: lakehouse/stream.py
- Modify: tests/test_stream.py
What this builds
as_iterable_dataset() wraps a LakehouseQuery result in an HF IterableDataset, enabling training code to consume lakehouse data with the standard datasets API. Uses fetch_arrow_reader() for zero-copy streaming.
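The generator at the core of as_iterable_dataset() might look like the sketch below. It takes a factory that opens a fresh stream of Arrow record batches and yields one dict per row.

- A factory is used because a pyarrow `RecordBatchReader` can be consumed only once, while training iterates the dataset once per epoch.
- The commented wiring uses DuckDB's `fetch_arrow_reader(batch_size)` and `IterableDataset.from_generator`. It is an untested assumption, not the plan's code.

```python
"""Sketch of the row generator behind as_iterable_dataset() (assumed structure)."""
from collections.abc import Callable, Iterable, Iterator
from typing import Any


def iter_rows(open_reader: Callable[[], Iterable[Any]]) -> Iterator[dict]:
    """Yield one dict per row from a freshly opened stream of record batches.

    open_reader must return a new reader on every call, since a reader is single-use.
    """
    for batch in open_reader():
        # RecordBatch.to_pylist() converts one batch at a time, so memory stays
        # bounded by the batch size rather than the full result.
        yield from batch.to_pylist()


# Possible wiring (not executed here; assumes duckdb and datasets are installed):
#   from datasets import IterableDataset
#   open_reader = lambda: con.sql("SELECT * FROM captions").fetch_arrow_reader(batch_size=256)
#   ds = IterableDataset.from_generator(iter_rows, gen_kwargs={"open_reader": open_reader})
```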
Step 1: Write failing tests in tests/test_stream.py
Step 2: Run to verify they fail
Expected: ImportError.
Step 3: Implement lakehouse/stream.py
Step 4: Run tests and verify they pass
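Expected: 6 tests pass. Note: test_epoch_reshuffle_differs can fail by chance only if two epochs produce the same ordering of its 100 items. If the test compares full orderings of uniformly shuffled data, that probability is 1/100! (about 1e-158). So re-run it once if it fails, and treat a repeated failure as a bug in how the epoch feeds the shuffle seed.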
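The reshuffle property that test checks can be illustrated with a stdlib-only sketch. The permutation is deterministic for a given (seed, epoch) pair but changes when the epoch changes.

- HF's `IterableDataset.set_epoch()` serves the same purpose. This code does not reproduce its exact seeding scheme.
- `epoch_order` is a hypothetical name.
- The `(seed, epoch)` pair is hashed as a string so that, for example, seed 0 at epoch 1 does not collide with seed 1 at epoch 0.

```python
"""Sketch: deterministic per-epoch shuffling keyed on (seed, epoch)."""
import math
import random


def epoch_order(n: int, seed: int, epoch: int) -> list[int]:
    """Return a permutation of range(n) that is fixed for a given (seed, epoch)."""
    order = list(range(n))
    random.Random(f"{seed}:{epoch}").shuffle(order)
    return order


# Chance that two independent uniform shuffles of 100 items coincide.
P_COLLISION = 1 / math.factorial(100)
```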
Step 5: Commit
Task 7: Tools — Pydantic-AI Agent Tool Definitions
Files:
- Modify: lakehouse/tools.py
- Modify: tests/test_tools.py
What this builds
build_lakehouse_agent() returns a configured Pydantic-AI Agent with lakehouse query, sample, and quality-check tools registered. The same Pydantic models that define tool schemas will later generate the MCP server schema (Phase 2).
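A sketch of plain tool functions and their registration follows.

- The tool names, the `run_sql(sql, params)` callable and the `build_lakehouse_agent(model, run_sql)` signature are all hypothetical.
- Registration uses pydantic-ai's `Agent` and its `tool_plain` method. Pydantic-AI derives tool schemas from type hints and docstrings.
- Table and column names cannot be bound as SQL parameters, so they are whitelisted by pattern.

```python
"""Sketch of lakehouse agent tools; tool names, SQL and signatures are hypothetical."""
import re
from collections.abc import Callable
from typing import Any

RunSQL = Callable[[str, list], list[dict[str, Any]]]
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _ident(name: str) -> str:
    """Reject anything that is not a bare SQL identifier."""
    if not _IDENT.match(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return name


def make_tools(run_sql: RunSQL) -> list[Callable]:
    def sample_rows(table: str, n: int = 5) -> list[dict[str, Any]]:
        """Return n random rows from a lakehouse table."""
        return run_sql(f"SELECT * FROM {_ident(table)} ORDER BY random() LIMIT ?", [n])

    def count_nulls(table: str, column: str) -> int:
        """Quality check: count NULL values in one column of a lakehouse table."""
        rows = run_sql(
            f"SELECT count(*) AS n FROM {_ident(table)} WHERE {_ident(column)} IS NULL", []
        )
        return rows[0]["n"]

    return [sample_rows, count_nulls]


def build_lakehouse_agent(model: str, run_sql: RunSQL):
    from pydantic_ai import Agent  # lazy import: only needed when building the agent

    agent = Agent(model, system_prompt="Answer questions about the lakehouse using the tools.")
    for tool in make_tools(run_sql):
        agent.tool_plain(tool)  # plain tools take no RunContext argument
    return agent
```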
Step 1: Write failing tests in tests/test_tools.py
Step 2: Run to verify they fail
Expected: ImportError.
Step 3: Add pytest-asyncio to dependencies
Add to pyproject.toml:
Run: uv sync
Also add a pytest configuration section, in pytest.ini or pyproject.toml:
Step 4: Implement lakehouse/tools.py
Step 5: Run tests and verify they pass
Expected: 3 tests pass.
Step 6: Commit
Task 8: Visualize — Rerun and W&B Routing
Files:
- Modify: lakehouse/visualize.py
- Modify: tests/test_visualize.py
What this builds
visualize(data, backend="auto") inspects the Arrow schema and routes to Rerun (spatial columns present) or W&B (scalar time-series). Tests mock both backends so no running Rerun viewer or W&B account is needed.
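The routing decision can be isolated as a pure function.

- The set of "spatial" column names and the `"print"` fallback are hypothetical simplifications. The plan's version inspects the full Arrow schema, and column names could come from a pyarrow table's `schema.names`.
- W&B availability is detected with `importlib.util.find_spec`, so nothing is imported just to decide.

```python
"""Sketch of visualize()'s backend routing; column names and fallback are assumptions."""
import importlib.util
from collections.abc import Iterable

# Hypothetical names that mark a column as spatial (routes to Rerun).
SPATIAL_COLUMNS = {"bbox", "points", "pose", "position", "image"}


def wandb_installed() -> bool:
    return importlib.util.find_spec("wandb") is not None


def choose_backend(columns: Iterable[str], backend: str = "auto", wandb_available: bool = False) -> str:
    if backend != "auto":
        return backend  # an explicit choice always wins
    if SPATIAL_COLUMNS & {c.lower() for c in columns}:
        return "rerun"
    # W&B is optional, so scalar data degrades to a plain printout without it.
    return "wandb" if wandb_available else "print"
```

Keeping the decision pure means the tests can cover every routing branch without mocking Rerun or W&B at all.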
Step 1: Write failing tests in tests/test_visualize.py
Step 2: Run to verify they fail
Expected: ImportError.
Step 3: Implement lakehouse/visualize.py
Step 4: Run tests and verify they pass
Expected: 6 tests pass.
Step 5: Commit
Task 9: Full Unit Test Suite
Step 1: Run all unit tests together
Expected: all tests pass (28 in total if each task's count holds: 5 + 2 + 6 + 6 + 3 + 6 from Tasks 3 to 8).
Step 2: If any failures, fix before continuing
Do not proceed to Task 10 with failing tests.
Step 3: Commit any fixes
Task 10: COCO Demo Script
Files:
- Create: experiments/coco_demo.py
What this builds
An end-to-end runnable script that exercises all lakehouse layers in sequence. It requires the Docker Compose services to be running.
Step 1: Create experiments/coco_demo.py
Step 2: Run the script (requires Docker Compose up)
Expected: each of the 9 steps prints either its output or a yellow "skipped" notice. No unhandled exceptions.
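The "output or yellow skipped notice" behavior suggests a small step runner. The sketch below assumes steps signal an optional dependency being unavailable by raising a dedicated exception.

- `StepSkipped`, `run_steps` and the ANSI yellow escape are hypothetical choices, not the script's confirmed design.
- Any other exception propagates, so a real failure still surfaces as an unhandled exception.

```python
"""Sketch of a step runner for coco_demo.py; names and skip mechanism are assumptions."""
from collections.abc import Callable

YELLOW, RESET = "\033[33m", "\033[0m"


class StepSkipped(Exception):
    """Raised by a step when an optional dependency (e.g. W&B) is unavailable."""


def run_steps(steps: list[tuple[str, Callable[[], object]]]) -> dict[str, str]:
    results = {}
    for i, (name, fn) in enumerate(steps, start=1):
        print(f"[{i}/{len(steps)}] {name}")
        try:
            output = fn()
        except StepSkipped as reason:
            print(f"{YELLOW}  skipped: {reason}{RESET}")
            results[name] = "skipped"
        else:
            print(f"  {output}")
            results[name] = "ok"
    return results
```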
Step 3: Commit
Task 11: COCO Demo Notebook
Files:
- Create: notebooks/coco_demo.ipynb
Step 1: Create the notebook
Run in the repo root:
Alternatively, create the notebook manually in JupyterLab and run cells in order. The key requirement is that all cells execute without error.
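The original generation command is not reproduced here. As an alternative, a notebook can be written from a list of cell sources using only the standard library.

- The sketch emits a minimal nbformat 4.5 document, in which code cells carry an `id`.
- The cell contents passed in are up to the caller. The names used in the test below are placeholders.

```python
"""Sketch: write a minimal nbformat v4.5 notebook from code-cell sources (hypothetical helper)."""
import json
import uuid
from pathlib import Path


def write_notebook(path: Path, cell_sources: list[str]) -> None:
    cells = [
        {
            "cell_type": "code",
            "id": uuid.uuid4().hex[:8],  # nbformat 4.5 requires a per-cell id
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": src,
        }
        for src in cell_sources
    ]
    notebook = {
        "cells": cells,
        "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"}},
        "nbformat": 4,
        "nbformat_minor": 5,
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(notebook, indent=1) + "\n")
```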
Step 2: Verify notebook runs clean
Expected: no CellExecutionError.
Step 3: Commit
Task 12: Final Integration Test
Step 1: Run the full test suite
Expected: all tests pass.
Step 2: Run the demo script end-to-end
Expected: 9 steps complete, no unhandled exceptions.
Step 3: Final commit
What Is NOT in This Plan (Deferred to Phase 2+)
| Feature | Phase |
|---|---|
| MCP server (mcp_server.py) | 2 |
| Zenoh ingest subscriber | 3 |
| K > 1 parallel simulators | 4 |
| NATS/JetStream control events | 4 |
| Ray Data distributed egress | 5 |
| load_dataset("lakehouse", ...) custom DatasetBuilder | 5 |