Deep Evidence Agents (DEA)

DEA is a multi-agent system for safety- and mission-critical engineering organizations (telecom, aerospace, automotive, medical devices). It turns scattered engineering artifacts (requirements, design docs, code, tests, standards) into a traceable, auditable knowledge base that supports evidence-grounded reasoning. It uses Microsoft GraphRAG for graph-based retrieval-augmented generation.

Repo: aegean-ai/dea (private)
Environment: torch.dev.gpu
Status: Repo created, GraphRAG indexing planned
Beads: auraison-c5l (closed), auraison-1i3 (open)


Overview

DEA is a research paper analysis application powered by Microsoft GraphRAG. It builds a knowledge graph from a corpus of academic papers and enables evidence-based reasoning through local and global search over the graph.

Data pipeline

Google Drive (Paperpile)
  |  rclone sync every 8h (auraison-sus)
  v
landing/paperpile/          3,545 PDFs, 12.1 GiB, 16 topic folders
  |  PDF extraction (PyMuPDF/marker)
  v
warehouse/paperpile/        Parquet: text + metadata per paper
  |  GraphRAG indexer
  v
Knowledge graph             Entities, relationships, communities
  |  DEA query engine
  v
Local + global search       Evidence-based answers with citations
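
As a quick sanity check on the first stage of the pipeline, the landing zone can be summarized with a short script. The following is a minimal sketch, not part of the DEA codebase: the function name summarize_landing is hypothetical, and it assumes that each top-level directory under landing/paperpile/ is one topic folder and that papers are files ending in .pdf.

```python
from collections import Counter
from pathlib import Path


def summarize_landing(root):
    """Count PDFs, total size, and PDFs per top-level topic folder under root.

    Hypothetical helper: assumes the first path component below root
    is the topic folder.
    """
    root = Path(root)
    per_topic = Counter()
    total_bytes = 0
    if root.is_dir():
        for path in root.rglob("*"):
            # Only regular files with a .pdf suffix (case-insensitive) count.
            if not path.is_file() or path.suffix.lower() != ".pdf":
                continue
            rel = path.relative_to(root)
            # PDFs sitting directly under root are grouped as "(root)".
            topic = rel.parts[0] if len(rel.parts) > 1 else "(root)"
            per_topic[topic] += 1
            total_bytes += path.stat().st_size
    return {
        "pdf_count": sum(per_topic.values()),
        "total_bytes": total_bytes,
        "total_gib": total_bytes / 2**30,
        "topics": dict(per_topic),
    }
```

Calling summarize_landing("landing/paperpile") after an rclone sync gives a PDF count, a size in GiB, and a topic-folder count that can be compared against the figures in the diagram above.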

Topic folders

The Paperpile corpus covers: AI, ML (DNN architectures, classical ML, continual learning), Info Theory, Wireless/MIMO, Mathematics, Networking, Architecture, Algorithms, Blockchain, Business-Startups, Energy, Yield Management, Published Papers, and reference textbooks.

Platform services

  • Data plane: landing/paperpile/ (raw PDFs), warehouse/paperpile/ (extracted Parquet)
  • User plane: GraphRAG indexing as a Ray Job on torch.dev.gpu
  • Control plane: LakehouseAgent for querying the corpus catalog
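
The user-plane item above could be sketched with the Ray Jobs Python SDK roughly as follows. This is an illustration under stated assumptions, not the project's actual job definition: the helper names, the metadata label, and the project directory are hypothetical, and the entrypoint assumes the `graphrag index --root <dir>` command form of recent GraphRAG releases, which may differ from the version pinned in the submodule.

```python
import shlex


def build_index_job(project_root):
    """Build keyword arguments for JobSubmissionClient.submit_job.

    Hypothetical helper; project_root is the GraphRAG project directory.
    """
    return {
        # Assumed CLI form; adjust to the GraphRAG version actually in use.
        "entrypoint": f"graphrag index --root {shlex.quote(str(project_root))}",
        # Free-form string labels attached to the job (illustrative value).
        "metadata": {"job": "graphrag-index"},
    }


def submit_index_job(address, project_root):
    """Submit the indexing job to a Ray cluster and return its submission ID."""
    # Imported lazily so build_index_job works without Ray installed.
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(address)
    return client.submit_job(**build_index_job(project_root))
```

Keeping the argument builder separate from the submission call makes the job definition easy to inspect or unit-test without a running cluster. The address passed to submit_index_job would be the Ray dashboard URL of the torch.dev.gpu environment, which is not specified here.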

Dependencies

  • Microsoft GraphRAG (git submodule in aegean-ai/dea)
  • rclone sync cron for continuous paper ingestion
  • PDF text extraction pipeline (planned)
