Deep Evidence Agents (DEA)

Repo: aegean-ai/dea (private)
Environment: torch.dev.gpu
Status: Repo created, GraphRAG indexing planned
Beads: auraison-c5l (closed), auraison-1i3 (open)


Overview

DEA is a research paper analysis application powered by Microsoft GraphRAG. It builds a knowledge graph from a corpus of academic papers and enables evidence-based reasoning through local and global search over the graph.
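
To make the distinction between the two search modes concrete, here is a toy sketch in plain Python. It does not use GraphRAG's API. The entities, relations, paper IDs, and community names are all invented for illustration. The idea it shows is that local search starts from a specific entity and collects its neighbouring evidence, while global search answers from per-community summaries.

```python
# Toy knowledge graph: (source entity, relation, target entity, citing paper).
# Every value below is made up for illustration; none comes from the DEA corpus.
EDGES = [
    ("MIMO", "improves", "channel capacity", "paper-A"),
    ("channel capacity", "bounded_by", "Shannon limit", "paper-B"),
    ("continual learning", "suffers_from", "catastrophic forgetting", "paper-C"),
]

# Hypothetical community assignment, as a graph clustering step might produce.
COMMUNITIES = {
    "wireless": {"MIMO", "channel capacity", "Shannon limit"},
    "ml": {"continual learning", "catastrophic forgetting"},
}


def local_search(entity):
    """Entity-centric lookup: every edge touching `entity`, with its citation."""
    return [edge for edge in EDGES if entity in (edge[0], edge[2])]


def global_search():
    """Corpus-wide view: one small summary per community."""
    summary = {}
    for name, members in COMMUNITIES.items():
        inside = [e for e in EDGES if e[0] in members and e[2] in members]
        summary[name] = {
            "edges": len(inside),
            "papers": sorted({e[3] for e in inside}),
        }
    return summary
```

In this sketch, `local_search("channel capacity")` returns the two edges that mention that entity, each carrying the paper that supports it, and `global_search()` returns one entry per community. A real deployment would replace both functions with the GraphRAG query engine.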

Data pipeline

Google Drive (Paperpile)
    |  rclone sync every 8h (auraison-sus)
    v
landing/paperpile/        3,545 PDFs, 12.1 GiB, 16 topic folders
    |  PDF extraction (PyMuPDF/marker)
    v
warehouse/paperpile/      Parquet: text + metadata per paper
    |  GraphRAG indexer
    v
Knowledge graph           Entities, relationships, communities
    |  DEA query engine
    v
Local + global search     Evidence-based answers with citations
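
The extraction stage (landing to warehouse) could look roughly like the sketch below. It assumes each PDF sits under `landing/paperpile/<topic folder>/`, and the record fields (`path`, `topic`, `file_stem`, `n_chars`, `text`) are hypothetical, since the actual Parquet schema is not defined yet. The text extractor is passed in as a parameter so that PyMuPDF or marker can be swapped without touching the record logic.

```python
from pathlib import Path


def extract_text_pymupdf(pdf_path):
    """Extract plain text with PyMuPDF (imported lazily; requires `pymupdf`)."""
    import fitz  # PyMuPDF

    with fitz.open(pdf_path) as doc:
        return "\n".join(page.get_text() for page in doc)


def paper_record(pdf_path, landing_root, extract=extract_text_pymupdf):
    """Build one row of text + metadata for a single paper.

    Field names are illustrative assumptions, not the project's schema.
    """
    pdf_path = Path(pdf_path)
    rel = pdf_path.relative_to(landing_root)
    # Assumption: the first path component under the landing root is the topic folder.
    topic = rel.parts[0] if len(rel.parts) > 1 else ""
    text = extract(pdf_path)
    return {
        "path": rel.as_posix(),
        "topic": topic,
        "file_stem": pdf_path.stem,
        "n_chars": len(text),
        "text": text,
    }
```

A list of such records could then be written to `warehouse/paperpile/` as Parquet, for example with `pandas.DataFrame(records).to_parquet(...)`, assuming pandas and a Parquet engine such as pyarrow are installed.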

Topic folders

The Paperpile corpus covers: AI, ML (DNN architectures, classical ML, continual learning), Info Theory, Wireless/MIMO, Mathematics, Networking, Architecture, Algorithms, Blockchain, Business-Startups, Energy, Yield Management, Published Papers, and reference textbooks.

Platform services

  • Data plane: landing/paperpile/ (raw PDFs), warehouse/paperpile/ (extracted Parquet)
  • User plane: GraphRAG indexing as a Ray Job on torch.dev.gpu
  • Control plane: LakehouseAgent for querying the corpus catalog
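
As a rough sketch of the user-plane item above, GraphRAG indexing could be submitted through Ray's job submission API along these lines. The project directory and the cluster address are hypothetical placeholders. The entrypoint assumes the `graphrag index --root` CLI form is available in the pinned GraphRAG version, which should be checked against the submodule.

```python
def graphrag_job_spec(project_root):
    """Describe a Ray Job that runs the GraphRAG indexer.

    `project_root` is uploaded as the job's working directory, so the
    entrypoint refers to it as "." once the job is running.
    """
    return {
        # Assumption: the installed GraphRAG exposes `graphrag index --root`.
        "entrypoint": "graphrag index --root .",
        "runtime_env": {"working_dir": project_root},
    }


def submit(spec, address="http://127.0.0.1:8265"):
    """Submit the job; `address` is a placeholder for the Ray dashboard URL."""
    from ray.job_submission import JobSubmissionClient  # requires `ray`

    client = JobSubmissionClient(address)
    return client.submit_job(
        entrypoint=spec["entrypoint"],
        runtime_env=spec["runtime_env"],
    )
```

Keeping the spec separate from the submission call lets the job description be inspected or logged without a running cluster.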

Dependencies

  • Microsoft GraphRAG (git submodule in aegean-ai/dea)
  • rclone sync cron for continuous paper ingestion
  • PDF text extraction pipeline (planned)
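
For the rclone sync cron dependency, the 8-hour schedule can be written as a crontab line. The helper below is a minimal sketch that generates one. The remote name `gdrive:Paperpile` is a hypothetical placeholder for whatever remote is configured in rclone, and the actual job definition (auraison-sus) may differ.

```python
import shlex


def rclone_cron_line(remote, dest, every_hours=8):
    """Return a crontab line that runs `rclone sync` every `every_hours` hours."""
    if not 1 <= every_hours <= 24 or 24 % every_hours != 0:
        # A step of */N in the hour field is only evenly spaced when N divides 24.
        raise ValueError("every_hours must divide 24")
    command = shlex.join(["rclone", "sync", remote, dest])
    # Minute 0 of every N-th hour; for N=8 that is 00:00, 08:00, and 16:00.
    return f"0 */{every_hours} * * * {command}"


line = rclone_cron_line("gdrive:Paperpile", "landing/paperpile/")
```

Note that `rclone sync` makes the destination match the source, so papers deleted from Google Drive are also removed from `landing/paperpile/` on the next run.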