Deep Evidence Agents (DEA)
- Repo: aegean-ai/dea (private)
- Environment: torch.dev.gpu
- Status: Repo created, GraphRAG indexing planned
- Beads: auraison-c5l (closed), auraison-1i3 (open)
Overview
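DEA is a research-paper analysis application built on Microsoft GraphRAG. It builds a knowledge graph from a corpus of academic papers and supports evidence-based reasoning through two GraphRAG search modes: local search, which answers questions about specific entities and their neighborhood in the graph, and global search, which answers corpus-wide questions from community summaries.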
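As a rough illustration of the two search modes, the sketch below builds command lines for a local and a global query. It assumes a recent GraphRAG release that exposes the `graphrag query` CLI with `--root`, `--method`, and `--query` options (older releases used a `python -m graphrag.query` form instead). The project root `./dea-index` and both questions are hypothetical; nothing here reflects how DEA's query engine is actually wired.

```python
import shlex

# Search modes named in this document; GraphRAG also ships others.
SEARCH_METHODS = ("local", "global")


def graphrag_query_argv(root, method, question):
    """Build argv for a GraphRAG CLI query (assumed `graphrag query` form)."""
    if method not in SEARCH_METHODS:
        raise ValueError(f"method must be one of {SEARCH_METHODS}, got {method!r}")
    return [
        "graphrag", "query",
        "--root", str(root),
        "--method", method,
        "--query", question,
    ]


# Hypothetical project root and questions, for illustration only.
local_cmd = graphrag_query_argv(
    "./dea-index", "local", "Which papers study MIMO channel capacity?"
)
global_cmd = graphrag_query_argv(
    "./dea-index", "global", "What are the main research themes in the corpus?"
)

# Print shell-quoted commands; nothing is executed here.
print(shlex.join(local_cmd))
print(shlex.join(global_cmd))
```

Either list can be passed to `subprocess.run(argv, check=True)` once an index exists. Building an argv list rather than a single shell string avoids quoting problems with free-text questions.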
Data pipeline
Google Drive (Paperpile)
    |  rclone sync every 8h (auraison-sus)
    v
landing/paperpile/      3,545 PDFs, 12.1 GiB, 16 topic folders
    |  PDF extraction (PyMuPDF/marker)
    v
warehouse/paperpile/    Parquet: text + metadata per paper
    |  GraphRAG indexer
    v
Knowledge graph         Entities, relationships, communities
    |  DEA query engine
    v
Local + global search   Evidence-based answers with citations
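The landing-zone figures in the diagram (PDF count, total size, topic folder count) can be re-derived after each sync with a short inventory pass. The following is a minimal sketch using only the standard library; it assumes that each top-level directory under the landing root is one topic folder, and the function names are illustrative rather than part of the DEA codebase.

```python
from collections import defaultdict
from pathlib import Path


def inventory_landing_zone(root):
    """Count PDFs and bytes per top-level topic folder under `root`."""
    root = Path(root)
    stats = defaultdict(lambda: {"pdfs": 0, "bytes": 0})
    for pdf in root.rglob("*"):
        # Match the extension case-insensitively; skip directories.
        if not pdf.is_file() or pdf.suffix.lower() != ".pdf":
            continue
        rel = pdf.relative_to(root)
        # PDFs sitting directly in the root get a placeholder topic,
        # which is counted as a folder by summarize() below.
        topic = rel.parts[0] if len(rel.parts) > 1 else "(root)"
        stats[topic]["pdfs"] += 1
        stats[topic]["bytes"] += pdf.stat().st_size
    return dict(stats)


def summarize(stats):
    """Return (topic folder count, total PDFs, total size in GiB)."""
    total_pdfs = sum(s["pdfs"] for s in stats.values())
    total_bytes = sum(s["bytes"] for s in stats.values())
    return len(stats), total_pdfs, total_bytes / 2**30
```

Calling `summarize(inventory_landing_zone("landing/paperpile"))` returns a tuple that can be compared against the numbers quoted in the diagram.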
Topic folders
The Paperpile corpus covers: AI, ML (DNN architectures, classical ML, continual learning), Info Theory, Wireless/MIMO, Mathematics, Networking, Architecture, Algorithms, Blockchain, Business-Startups, Energy, Yield Management, Published Papers, and reference textbooks.
Platform services
- Data plane: landing/paperpile/ (raw PDFs), warehouse/paperpile/ (extracted Parquet)
- User plane: GraphRAG indexing as a Ray Job on torch.dev.gpu
- Control plane: LakehouseAgent for querying the corpus catalog
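One way the user-plane step could look is sketched below: the indexing command is wrapped as a Ray Job and submitted through the Ray Jobs Python SDK. This is an assumption-laden illustration, not the project's actual job definition. It assumes the `graphrag index --root` CLI form of recent GraphRAG releases; the cluster address, project root, and working directory are hypothetical arguments supplied by the caller.

```python
import shlex


def indexing_entrypoint(project_root):
    """Shell command the Ray Job would run (assumed `graphrag index` CLI)."""
    return shlex.join(["graphrag", "index", "--root", str(project_root)])


def submit_indexing_job(ray_address, project_root, working_dir):
    """Submit the indexing command through the Ray Jobs API.

    Ray is imported lazily so the rest of this sketch loads without it.
    """
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(ray_address)
    # Returns the submission ID, usable with client.get_job_status().
    return client.submit_job(
        entrypoint=indexing_entrypoint(project_root),
        runtime_env={"working_dir": working_dir},
    )
```

For example, `submit_indexing_job("http://127.0.0.1:8265", "./dea-index", "./")` would target a Ray dashboard on its default port; the real address for torch.dev.gpu is not given in this document.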
Dependencies
- Microsoft GraphRAG (git submodule in aegean-ai/dea)
- rclone sync cron job (every 8h, auraison-sus) for ongoing paper ingestion
- PDF text extraction pipeline (planned)
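Since the extraction pipeline is still planned, the following is only a sketch of one possible shape for it: each landed PDF becomes one record of text plus metadata, and records are written to Parquet. The `PaperRecord` fields are hypothetical, PyMuPDF is used as the default extractor (marker could be swapped in through the `extract` parameter), and both PyMuPDF and pyarrow are imported lazily so the record-building logic does not depend on them.

```python
import hashlib
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class PaperRecord:
    """One warehouse row; field names are hypothetical."""
    paper_id: str   # SHA-256 of the PDF bytes, stable across renames
    topic: str      # top-level topic folder under the landing root
    filename: str
    n_chars: int
    text: str


def extract_text_pymupdf(pdf_path):
    """Default extractor; PyMuPDF is imported lazily so it stays optional."""
    import fitz  # PyMuPDF

    with fitz.open(str(pdf_path)) as doc:
        return "\n".join(page.get_text() for page in doc)


def build_record(pdf_path, landing_root, extract=extract_text_pymupdf):
    """Turn one landed PDF into a PaperRecord using the given extractor."""
    pdf_path, landing_root = Path(pdf_path), Path(landing_root)
    rel = pdf_path.relative_to(landing_root)
    text = extract(pdf_path)
    return PaperRecord(
        paper_id=hashlib.sha256(pdf_path.read_bytes()).hexdigest(),
        topic=rel.parts[0] if len(rel.parts) > 1 else "",
        filename=pdf_path.name,
        n_chars=len(text),
        text=text,
    )


def write_parquet(records, out_path):
    """Write records to one Parquet file; pyarrow is imported lazily."""
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.Table.from_pylist([asdict(r) for r in records])
    pq.write_table(table, out_path)
```

Passing the extractor as a parameter keeps the choice between PyMuPDF and marker open and lets the record logic be tested with a stub extractor.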