Auraison - Research Agent PRD
Product Overview
Goal
Build an AI agent as a browser sidecar that turns any pile of PDFs, visited web pages, notes and videos into a focused task assistant, able to:
- Help the user answer natural-language queries by grounding responses in their content across documents, modalities and temporal constraints,
- Help the user learn new topics by presenting a plan (curriculum) of reading materials curated for incremental learning and comprehension,
- Help the user switch contexts effectively by structuring content into thematic areas, topics and projects,
- Help the user research new ideas by acting as a deep researcher able to perform scientific discovery [@Space2025-mq] on their behalf,
- Help the user distill their knowledge and share the context with other users' agents that are part of a project team,
- Learn from the user's activities and self-improve over time.
Our goal is to support the following knowledge workers:
- Researchers and scientists that need to consume vast amounts of literature to understand new research directions.
- Enterprise business users that structure their work around internal processes and traceable Line of Business (LoB) tools. These users make business decisions by distilling information from others within or across organizational boundaries, and depend on both information consistency and explainability all the way down to individual numbers.
- Engineers that make design decisions based on engineering guidelines and processes. These users depend on accurate capture of requirements and routinely need to perform verification tasks, tracing deliverables such as code, engineering diagrams or other design documents back to the requirements that drove those design decisions.
- Students that need tutoring assistance as part of a course and need to be best prepared to face exams that test their knowledge.
One common denominator in the life of all knowledge workers is how to handle change.
- No matter the job function, new discoveries change previous hypotheses, processes change to improve business decision making, new customer feedback changes previous product requirements, and exam results change assumptions about the quality of our knowledge.
- All knowledge workers reconcile information. A supply chain director in the pharmaceutical industry, for example, must reconcile demand planning and external manufacturing, each with its own pile of documents or LoB tool reports. Decision making is typically done in meetings with peers where new information is naturally introduced: no worker knows everything, everywhere, all at once, either by nature or by design in regulated enterprises.
In the following we qualify how change is captured by our system in a way that leads to adaptation, and therefore robustness, in the agent's ability to reason: generating quality information, suggesting actions to humans in the loop, or making autonomous decisions.
Core Value Proposition
Auraison agents achieve their goals using three key innovations.
- Grounded – reasoning that is grounded in the user's content and the temporal dynamics of the user's project context, i.e., what the user reads, highlights, and watches over time within that context. The reasoning is assisted by a Knowledge Multi-Graph (KMG), which expands on KGs that capture semantic relationships between concepts to relationships of any kind, such as hierarchical relationships, temporal dynamics, and so on.
  - The KMG also stores pointers to the agent's reasoning process from previous interactions, allowing the system to recall and refine long-term memories into a short-term memory bank (see shared memory architecture). The restoration is based on a dynamic association of subgraphs to the current query.
- Explainable – every answer, suggestion, and research step is backed by concrete segments (page/timestamp) and graph paths.
- RL-ready – treats your behavior (highlights, dwell time, accepted suggestions) as structured signals that can tune future behavior.
Differentiators
| # | Differentiator | What It Does | Why It’s Different |
|---|---|---|---|
| 1 | Multi-agency | Knowledge store is a toolbox for agents. Retrieval exposed as clean tools, not tangled internals. | Competing tools aren’t designed as first-class backends for multi-agent or tool-based systems. |
| 2 | Not just literature → any modality | Handles research papers and OEM service bulletins, internal manuals, textbooks, training PDFs, etc. Projects can group any arbitrary pile of documents into a coherent knowledge universe. | Others are primarily tuned for academic papers / standard literature workflows. |
| 3 | Segment-centric | Suggests sections, paragraphs, figures, timestamps, not only whole documents. Lets you “jump to the part that matters” in a paper or lecture. | Most tools recommend or surface entire documents, not fine-grained segments. |
| 4 | Knowledge Graph at the core | Builds a personal, project-local knowledge graph of topics, artifacts, segments, citations, authors, missions, user-defined links. Retrieval is graph-augmented from day one. | Others are primarily list/tag/folder-based with optional embeddings, not graph-first. |
| 5 | Personalized graph & recommendations | Citation/topic graphs are project-local and user-specific. Suggestions are driven by what you actually read, highlight, and accept/ignore. | Typical tools lean on global citation graphs or generic recs, not per-user project graphs. |
| 6 | Personalization through RL | Logs highlights, reading behavior, dwell time, accepted/ignored suggestions, mission progress as structured events for reward modeling, RL-style policy tuning, and better planning. | Others may track “reads” or highlights, but don’t treat them as a full RL-ready event stream. |
| 7 | State-of-mind tracking | Remembers recent artifacts/topics, trains of thought, missions in progress, powering a “state-of-mind bar” and browser/desktop integrations. | Most tools remember last opened doc at best, not your broader cognitive/work state. |
| 8 | Local-first with pluggable AI | All core features run entirely local (local DB, embeddings, LLM via Ollama or similar). Cloud APIs (OpenAI, etc.) are opt-in and swappable. | Others are heavily cloud-centric and/or tied to specific proprietary AI backends. |
Primary Personas & Use Cases
- Researcher / Scientist
  - Manages hundreds of papers + notes across multiple projects.
  - Needs:
    - “Where did I see this equation/idea?”
    - “What should I read next for Topic X?”
    - Project-local citation neighborhoods.
- Enterprise Worker
  - Works with piles of internal PDFs, manuals, service bulletins, training docs.
  - Needs:
    - “Which bulletin proves this repair is under warranty?”
    - “Which documents relate to this component?”
  - Auraison acts as a private assistant over company PDFs.
- Student
  - Reads slides, assignments, quizzes, transcripts in Canvas/LMS.
  - Needs:
    - “What was I thinking about last time I studied this?”
    - “Show me the best 3 things to review before this quiz.”
  - Uses the same core with extra ingestion (screenshots + OCR, LMS APIs).
Design Principles
- Local-first by default
  - Single-machine deployment, local DB + storage, offline by default.
  - Cloud sync is opt-in and end-to-end encrypted.
- Knowledge Graph as a first-class primitive
  - Retrieval combines:
    - Keyword,
    - Embeddings,
    - Knowledge graph paths (GraphRAG).
  - Graph scope is per user + per project, not a global monolith.
- Unified knowledge store for hybrid retrieval
  - Long-term: a single store (e.g. SurrealDB) expressing:
    - Relational filters,
    - Graph traversals,
    - Vector similarity,
    - Document-style flexibility.
  - Avoid the dependency hell of a disjoint Postgres + separate vector DB + separate graph DB stack.
- User-as-signal / RL-ready
  - Treat user behavior as structured data:
    - Highlights, dwell times, suggestion feedback, mission outcomes.
  - Make logs queryable and usable for future RL and evaluation.
- Reusable core, UI-agnostic
  - Same core can drive:
    - Desktop reference manager,
    - Research assistant web UI,
    - Canvas plugin or student tutor,
    - CLI / API tools.
  - No polluting the core with UI-specific assumptions.
- Tool-driven, multi-agent ready
  - Expose retrieval and graph operations as tool-like functions: `search_segments`, `graph_neighbors`, `topic_expansion`, `mission_step_search`, etc.
  - Agents (Goose / Pydantic AI / Ray) operate on these tools, not on raw DB internals. This also makes it possible, for example, to run an RL training loop that improves when models choose to activate these tools.
- Auxiliary-model-triggered enrichment at ingestion
  - Treat ingestion as the first opportunity to enrich artifacts and segments with structure:
    - Automated tagging (topics, entities, methods, datasets, domains),
    - Quality / type classification (method section, related work, figure caption, etc.),
    - Domain-specific labels (e.g., warranty vs non-warranty bulletin, course vs exam material).
  - Prefer lightweight, local models and/or database-side triggers (e.g., SurrealDB ML triggers) to:
    - Auto-tag artifacts and segments as they arrive,
    - Maintain consistent tagging across the corpus,
    - Reduce manual organizational overhead for the user.
⚙️ Core System Design
Storage & Data Model (High-Level)
| Layer | Technology (initial) | Purpose |
|---|---|---|
| Artifact Storage | Local filesystem | Raw PDFs, HTML snapshots, notes, media files |
| Primary DB | SQLite → Postgres (option) → SurrealDB | Artifacts, segments, metadata, topics, missions, logs |
| Vector Store | Qdrant / SQLite-vectors (initial) | Embeddings for segments, topics, annotations |
| Graph Layer | DB tables / edges → SurrealDB graph | Topic graph, citation graph, user-defined relationships |
| Search Index | SQLite FTS5 / DB FTS | Keyword search across segments, titles, annotations |
| Cache Layer | On-disk + in-memory | Recent embeddings, RAG results, mission and planning state |
Long-term: converge towards SurrealDB as unified knowledge store (relational + graph + vector).
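As a minimal sketch of how these layers could share one local SQLite file in early phases (all table and column names here are illustrative assumptions, not a final schema):

```python
import sqlite3

# Illustrative initial schema: artifacts, segments, a generic edge table that
# doubles as the early graph layer, and an FTS5 keyword index.
DDL = """
CREATE TABLE artifacts (
    artifact_id TEXT PRIMARY KEY,
    project_id  TEXT NOT NULL,
    kind        TEXT NOT NULL,              -- 'pdf' | 'web' | 'video' | 'note'
    title       TEXT,
    source_uri  TEXT
);
CREATE TABLE segments (
    segment_id  TEXT PRIMARY KEY,
    artifact_id TEXT NOT NULL REFERENCES artifacts(artifact_id),
    seg_type    TEXT NOT NULL,              -- 'paragraph' | 'figure' | 'transcript'
    location    TEXT,                       -- page/block or timestamp range
    content     TEXT
);
CREATE TABLE edges (
    src_id    TEXT NOT NULL,
    dst_id    TEXT NOT NULL,
    edge_type TEXT NOT NULL,                -- 'cites' | 'about_topic' | 'user_link'
    weight    REAL DEFAULT 1.0
);
CREATE VIRTUAL TABLE segments_fts USING fts5(segment_id UNINDEXED, content);
"""

def open_store(path=":memory:"):
    con = sqlite3.connect(path)
    con.executescript(DDL)
    return con

def keyword_search(con, query, limit=20):
    rows = con.execute(
        "SELECT segment_id FROM segments_fts WHERE segments_fts MATCH ? LIMIT ?",
        (query, limit),
    )
    return [r[0] for r in rows]
```

The single generic `edges` table is the point of the sketch: it lets the prototype grow a graph layer without a second database, and maps naturally onto SurrealDB record links later.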
Artifact Layer
Purpose: Represent all user knowledge sources in a unified, structured way.
Why it exists:
- Users bring in heterogeneous sources: PDFs, web pages, screenshots, lecture videos, handwritten notes.
- Without a unified artifact schema, downstream systems (retrieval, graph, topics) cannot reason about origin, identity, structure, or metadata.
- Makes the system domain-agnostic: academic research, law, medicine, engineering, note-taking, corporate knowledge, etc.
- Enables stable IDs → essential for citations, cross-referencing, and long-term user history.
Core value: The artifact system is the backbone for organizing the user's external world into a structured knowledge universe.
Segment Layer
Purpose: Provide a universal “atomic unit of meaning” across all artifact types.
Why it exists:
- LLMs, search, and retrieval need not operate on whole PDFs; they can also operate on paragraphs, text blocks, figures, or transcripts.
- Standardizes PDFs, web, audio, and notes into a single, queryable unit that everything else can consume.
- Enables fine-grained citations and traceability (“your answer came from this specific page block”).
- Unlocks suggestion engines (“read this paper / paragraph next”) and granular RAG.
Core value: Segments unify all modalities into one consistent, retrievable data model.
Topic & Knowledge Graph Layer
Purpose: Encode structure, relationships, and meaning that embeddings alone cannot capture.
Why it exists:
- Pure vector search cannot model conceptual hierarchies (“reinforcement learning” → “Q-learning” → “DQN variants”).
- Topic clusters + edges model conceptual and contextual relationships.
- Allows cross-artifact relationships (“this lecture timestamp explains the same idea as this paper section”).
- Captures user mental models through user-defined links and topics.
Core value: Adds structure, context, and intent to the raw text world — essential for meaningful navigation and understanding.
GraphRAG Layer
Purpose: Combine graph structure + embeddings for context selection beyond simple similarity.
Why it exists:
- Standard RAG retrieves “nearest text,” which often produces noisy, redundant, or shallow results.
- GraphRAG uses:
  - topic relations,
  - citation networks,
  - co-occurrence,
  - user-linked concepts,
  - “segments that helped in the past”.
- Allows “deep research queries,” e.g.:
  - “Explain how this method fits within the broader idea of representation learning.”
Core value: Retrieval grows from “nearest neighbors” → contextual reasoning over interconnected knowledge.
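A minimal sketch of the core GraphRAG move, expanding top vector hits one hop through graph edges so related-but-not-nearest segments can surface; the plain-dict stores and the `hop_bonus` weight are illustrative assumptions:

```python
# GraphRAG sketch: seed with nearest-neighbor segments, expand one hop along
# topic/citation/user-link edges, then rescore the expanded candidate set.

def graph_expanded_retrieve(query_vec, vectors, edges, top_k=3, hop_bonus=0.5):
    """vectors: {segment_id: embedding}; edges: {segment_id: [neighbor_ids]}."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    # 1) seed set: nearest segments by embedding similarity
    scores = {sid: cosine(query_vec, v) for sid, v in vectors.items()}
    seeds = sorted(scores, key=scores.get, reverse=True)[:top_k]

    # 2) expand one hop along graph edges
    candidates = dict.fromkeys(seeds)
    for sid in seeds:
        for nb in edges.get(sid, []):
            candidates.setdefault(nb)

    # 3) rescore: own similarity plus a bonus for being linked to a strong seed
    final = {}
    for sid in candidates:
        bonus = hop_bonus * max(
            (scores.get(s, 0.0) for s in seeds if sid in edges.get(s, [])),
            default=0.0,
        )
        final[sid] = scores.get(sid, 0.0) + bonus
    return sorted(final, key=final.get, reverse=True)
```

A segment with low direct similarity can still rank if it is cited by, or shares a topic with, a strong seed, which is exactly the behavior plain nearest-neighbor RAG cannot produce.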
Indexing & Retrieval Layer
Purpose: Provide multiple complementary retrieval channels optimized for different query types.
Why it exists:
- Not all queries are semantic; many are precise (“find that graph I saw on page 47”).
- Keywords sometimes outperform embeddings; embeddings outperform keywords in fuzzy queries.
- Graph relationships sometimes outweigh both (“find things related to Topic X through citations”).
- Combining the three produces robust retrieval across all research scenarios.
Core value: Retrieval must be multi-modal and composable to handle the unpredictable nature of human queries.
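One simple way to compose the three channels is reciprocal rank fusion (RRF); this sketch assumes each channel already returns a ranked list of segment ids, and `k=60` is the conventional RRF default rather than a tuned value:

```python
# Reciprocal rank fusion over the keyword, embedding, and graph channels:
# each channel votes 1/(k + rank), so a segment that appears high in several
# channels outranks one that is first in only one.

def rrf_fuse(ranked_lists, k=60):
    """ranked_lists: iterable of ranked segment-id lists, one per channel."""
    scores = {}
    for ranking in ranked_lists:
        for rank, seg_id in enumerate(ranking):
            scores[seg_id] = scores.get(seg_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["s7", "s2", "s9"]      # FTS channel
vector_hits  = ["s2", "s7", "s4"]      # embedding channel
graph_hits   = ["s2", "s4", "s1"]      # graph-path channel
fused = rrf_fuse([keyword_hits, vector_hits, graph_hits])
```

RRF only needs ranks, not comparable scores, which makes it a good first composition rule when the channels (FTS score, cosine similarity, graph distance) live on incompatible scales.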
Knowledge Store Strategy (SQLite/Postgres → SurrealDB)
Purpose: Provide a data backend that can evolve from simple local deployment to unified graph+vector+document storage.
Why it exists:
- Early versions should be fast to build: SQLite or Postgres is simple and robust.
- Long-term, a fragmented stack becomes difficult:
  - 3–5 services,
  - multi-step queries,
  - complex migrations,
  - more failure points.
- SurrealDB provides:
  - relational, graph, and document models,
  - embeddings,
  - a single query surface,
  - a single deployment model.
Core value: Smooth migration from prototype → scalable “everything-in-one” local-first knowledge store.
External Corpora Integration (Semantic Scholar, APIs)
Purpose: Enrich the user’s graph and library with global scientific structure.
Why it exists:
- A user’s library is only as strong as its connectivity.
- Citation networks, related works, and topic metadata dramatically enhance search and topic modeling.
- Long-term, missions and suggestions rely on understanding:
  - what’s influential,
  - what’s foundational,
  - what’s related.
- Use APIs to supplement existing data and to drive suggestions for new data sources as well.
Core value: Import structure — not just text — to build a richer knowledge graph.
AI / RAG / GraphRAG Orchestration Layer
Purpose: Turn raw retrieval into well-scoped, well-cited, high-quality LLM answers.
Why it exists:
- Retrieval must be deterministic, inspectable, and reproducible — not a black-box prompt.
- LLM responses need:
  - the right context,
  - deduped segments,
  - citations,
  - metadata,
  - mission constraints.
- Guarantees that the user sees where answers came from, providing transparency and better end results / usability.
Core value: Provides a disciplined, stateful reasoning loop over user knowledge, not just a chat interface.
Topic Modeling & Suggestion Engine
Purpose: Help the user decide what to read next or what to study based on their goals.
Why it exists:
- Reading lists and recommendations must adapt to:
  - user engagement,
  - interest clusters,
  - project themes,
  - repeated queries.
- Identifies knowledge gaps and suggests optimal entry points.
Core value: Moves the platform from passive storage → active research assistant.
State-of-Mind
Purpose: Maintain continuity across sessions and unify research across local/browser/desktop contexts.
Why it exists:
- Research is nonlinear: you jump between papers, pages, tabs, tools.
- A “state-of-mind” view lets users resume:
  - last mission,
  - last artifact,
  - active topics,
  - last questions.
- Porting browser activity (via OCR snapshots) into the same graph creates a holistic personal knowledge base.
Core value: Bridges short-term working memory with long-term structured knowledge.
🧠 User-Facing Workflows
Signature Flows
| # | Flow Name | User Action / Input | System Behavior | Output / UI |
|---|---|---|---|---|
| 1 | “Where did I see this?” recall | Asks a fuzzy question or phrase | Uses semantic + graph-aware retrieval; weights segments by engagement (highlights, rereads). | Ranked segments with artifact title, page/timestamp, user annotations. |
| 2 | “Show me 5 sections I haven’t read yet” | Selects project and topic | Scores unseen segments linked to the topic; balances recency, diversity, importance. | Prioritized reading queue of segments for that topic. |
| 3 | “Jump to the right moment in a lecture” | Searches for a concept | Retrieves transcript segments + timestamps; generates short summaries for each. | List of snippets with “Open at 12:34–17:20”–style actions into the video/audio. |
| 4 | Topic overview & evolution | Opens project topic view | Shows topics, sizes (#segments), highlight density, and interaction over time. | Topic dashboard; highlights neglected topics and suggests balancing readings. |
| 5 | Deep research mission | Creates mission (question/theme + optional seed topics/artifacts) | Proposes local reading steps, external expansions (e.g., Semantic Scholar), and sub-questions; logs progress. | Mission dashboard tracking steps completed, segments consulted, external sources, open sub-questions. |
| 6 | Library chat | Chats scoped to project and/or topics | Runs GraphRAG retrieval; assembles context; generates grounded answers with citations and follow-up suggestions. | Chat responses with citations, “open segment” actions, and optional mission suggestions. |
| 7 | (Future) Canvas / LMS study assistant | Uses LMS-integrated view; asks e.g., “What to review before quiz?” | Ingests Canvas exports, screenshots + DeepSeek OCR, LMS assignments/quizzes into same graph; runs topic-aware retrieval. | Study flows like “What should I review before this quiz?” and “Where did I see this concept?”, with links back to course materials. |
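Flow 1’s engagement weighting can be sketched as a rerank over base retrieval scores; the event types mirror the logging section, while the weights and the helper name are illustrative assumptions:

```python
# Boost a segment's retrieval score by logged engagement: each highlight adds
# w_highlight, each reread (opens beyond the first) adds w_reread.

def engagement_boost(base_scores, events, w_highlight=0.3, w_reread=0.1):
    """base_scores: {segment_id: similarity}; events: list of (type, segment_id)."""
    counts = {}
    for etype, seg_id in events:
        counts.setdefault(seg_id, {"highlight": 0, "open_segment": 0})
        if etype in counts[seg_id]:
            counts[seg_id][etype] += 1
    boosted = {}
    for seg_id, score in base_scores.items():
        c = counts.get(seg_id, {"highlight": 0, "open_segment": 0})
        rereads = max(c["open_segment"] - 1, 0)   # first open is not a reread
        boosted[seg_id] = score + w_highlight * c["highlight"] + w_reread * rereads
    return sorted(boosted, key=boosted.get, reverse=True)
```

This is the simplest form of “weights segments by engagement”: a segment the user highlighted and reread can outrank a slightly more similar segment they never touched.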
🔄 Data & Interaction Lifecycle
Ingestion
- Import
  - PDFs, URLs, videos, notes, or folders.
- Parse
  - PDFs → markdown/HTML with structural sections.
  - Web → DOM segmentation into headings/blocks.
  - Video/audio → ASR transcript + keyframe descriptions and subsequent segmentation.
  - (Future) Screenshots → OCR.
- Segment
  - Convert parsed content into `segments` with locations and types.
- Enrich & auto-tag (model / trigger based)
  - Run lightweight models (locally or via DB triggers) over artifacts and segments to:
    - Assign tags (topics, entities, domains, document type),
    - Classify sections (intro, methods, results, discussion; or “procedural”, “policy”, “troubleshooting”),
    - Detect key entities (algorithms, components, datasets, course IDs).
  - Persist results as:
    - Structured fields on `artifacts` / `segments`,
    - Graph edges linking segments to topics/entities,
    - Candidate topics to be reviewed/merged by the user.
- Embed
  - Compute embeddings for:
    - Segments,
    - Topics,
    - Annotations.
- Update graph & topics
  - Insert nodes and edges: artifacts, segments, topics, citations, etc.
  - Attach segments to topics (soft links), using both user tags and automated tags.
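The ingestion steps above can be sketched as one pipeline; every hook here (parser, segmenter, tagger, embedder, store) is an illustrative placeholder, not a real API:

```python
# Ingestion pipeline sketch: parse -> segment -> enrich/auto-tag -> embed ->
# update store and graph, with soft topic links per the steps above.

class InMemoryStore:
    def __init__(self):
        self.segments, self.edges = {}, []

    def add_segment(self, artifact_id, seg):
        self.segments[seg["id"]] = {**seg, "artifact_id": artifact_id}

    def link(self, src, dst, edge_type):
        self.edges.append((src, dst, edge_type))

def ingest(artifact, parse, segment, auto_tag, embed, store):
    parsed = parse(artifact)                    # parse to structured text
    segs = segment(parsed)                      # split into atomic segments
    for seg in segs:
        seg["tags"] = auto_tag(seg)             # lightweight model / DB trigger
        seg["embedding"] = embed(seg["text"])   # vector for semantic retrieval
        store.add_segment(artifact["id"], seg)  # update store
        for tag in seg["tags"]:
            store.link(seg["id"], tag, "about_topic")  # soft topic link
    return segs
```

Keeping each stage a swappable callable matches the pluggable-AI design principle: the same pipeline shape serves PDFs, web pages, and transcripts by changing only `parse` and `segment`.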
Interaction & Logging
User actions are logged as structured events, not just free-form logs:
- `events` table (conceptually):
  - `event_id`
  - `user_id`
  - `project_id`
  - `event_type` (`highlight`, `open_segment`, `suggestion_shown`, `suggestion_accepted`, `suggestion_ignored`, `mission_step_completed`, `query_asked`, `answer_viewed`, …)
  - `payload` (JSON with `segment_id`, `topic_id`, `mission_id`, etc.)
  - `timestamp`
Every interaction—in particular highlights, reading sessions, suggestions ± feedback, mission steps, and queries—becomes RL-ready signal data.
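A minimal sketch of the conceptual `events` table and a logging helper, assuming the initial SQLite backend; the column types are illustrative:

```python
import json
import sqlite3
import uuid

# Structured event log matching the conceptual schema above: typed events
# with a JSON payload, rather than free-form log lines.
EVENTS_DDL = """
CREATE TABLE IF NOT EXISTS events (
    event_id   TEXT PRIMARY KEY,
    user_id    TEXT NOT NULL,
    project_id TEXT NOT NULL,
    event_type TEXT NOT NULL,   -- highlight, open_segment, suggestion_shown, ...
    payload    TEXT NOT NULL,   -- JSON: segment_id, topic_id, mission_id, ...
    timestamp  TEXT DEFAULT (datetime('now'))
);
"""

def log_event(con, user_id, project_id, event_type, **payload):
    con.execute(
        "INSERT INTO events (event_id, user_id, project_id, event_type, payload) "
        "VALUES (?, ?, ?, ?, ?)",
        (str(uuid.uuid4()), user_id, project_id, event_type, json.dumps(payload)),
    )
```

Because `event_type` is an enumerable string and `payload` is structured JSON, the same table serves immediate ranking heuristics and, later, reward-model training without a schema change.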
Suggestions & Missions
- Suggestions:
  - When shown, log `suggestion_shown`.
  - When clicked, log `suggestion_accepted`.
  - If ignored/hidden, log `suggestion_ignored`.
- Missions:
  - Maintain mission state:
    - Steps list (planned vs completed),
    - Segments consulted,
    - External sources fetched.
  - Each mission step completion logs `mission_step_completed` with `success`/`failure` and notes.
Learning Loop & RL-Readiness
Captured signals:
- Which segments were helpful (accepted suggestions, mission success),
- Which suggestions were ignored,
- Which topics users revisit often,
- Where users dwell or leave quickly,
- Patterns of user behavior,
- Usefulness of various sources to help with user queries.
These feed:
- Immediate heuristics (ranking tweaks),
- Future:
  - Reward modeling,
  - RL fine-tuning for:
    - Suggestion policies,
    - Topic expansion,
    - Mission planning and search strategies.
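A sketch of how the captured signals could collapse into a scalar reward per suggestion, the kind of label reward modeling would train on; the event weights here are illustrative assumptions, not tuned values:

```python
# Map logged event types to reward contributions and sum them per suggestion.
REWARDS = {
    "suggestion_accepted": 1.0,
    "suggestion_ignored": -0.2,
    "mission_step_completed": 0.5,
    "highlight": 0.3,   # user highlighted the suggested segment
}

def suggestion_reward(events, suggestion_id):
    """events: list of (event_type, payload) tuples from the event log."""
    total = 0.0
    for etype, payload in events:
        if payload.get("suggestion_id") == suggestion_id and etype in REWARDS:
            total += REWARDS[etype]
    return total
```

Even this trivial aggregation is enough to rank suggestion policies against each other before any RL machinery exists, which is the point of logging events in structured form from day one.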
🔒 Non-Functional Requirements
| Category | Requirement |
|---|---|
| Privacy | Entirely local by default; no external calls unless explicitly configured by the user. |
| Performance | Search & simple RAG queries < 1s on a modern laptop for ~10k artifacts / 100k+ segments. |
| Scalability | Schema & APIs compatible with migration to Postgres/SurrealDB and external object storage as needed. |
| Reliability | No data loss on normal crashes; journaling transactions and backup/export tools. |
| Extensibility | Pluggable LLMs, vector backends, and external APIs via stable interfaces (OpenAI-compatible client). |
| Transparency | Every AI answer & suggestion has inspectable provenance (segments, graph paths). |
| Portability | Runs as a Dockerized app and/or desktop app with self-contained DB + storage volume. |
| Security | Local ACLs and optional encryption at rest; cloud sync is end-to-end encrypted when enabled. |
🧩 Component Architecture
Multi-Agent & Agentic Retrieval Readiness
Tool-Oriented Internal API
Expose core operations as tools (functions/endpoints) with clear contracts:
- `search_segments(query, project_id, topics=None, limit=20)`
- `graph_neighbors(node_id, depth=1, edge_types=None)`
- `topic_expansion(topic_id, k=10)`
- `mission_step_search(mission_id, query)`
- `suggest_next_segments(project_id, topic_ids=None, limit=5)`
- `get_state_of_mind(project_id)` → minimal state snapshot for the “state-of-mind bar”.
Tools:
- Use typed models internally (Pydantic),
- Are stateless from the perspective of the caller,
- Emit structured logs of usage for observability.
Multi-Agent Scenarios
We anticipate future setups where:
- Different agents operate over the same knowledge store:
  - “Research Planner”, “Reader Summarizer”, “Citation Graph Builder”, “Tutor”, etc.
- Some are read-only, others can write:
  - Annotations,
  - Missions,
  - Topic links.
Requirements:
- Tools are idempotent where sensible,
- Clear, documented side-effects,
- Access control around who can modify what.
Integration Targets
- Goose / Pydantic AI
  - Map tools directly to Pydantic-typed functions.
  - Use them in planning loops and tool-calling phases.
- Ray / multi-agent frameworks
  - Wrap tools in Ray actors for:
    - Parallel retrieval,
    - Batch mission evaluation,
    - Large-scale RL experiments (later).
🚀 Roadmap (High-Level)
This is product-level, not implementation detail. MVP will be specified separately.
| Phase | Deliverable | Focus |
|---|---|---|
| Phase 0 | GraphRAG prototype | Small corpus, basic hybrid graph+vector retrieval |
| Phase 1 | Local smart reference manager + doc RAG | Ingest PDFs/web, highlights, single-doc chat |
| Phase 2 | Cross-document semantic & graph memory | Segment-level search, library-wide chat, project graphs |
| Phase 3 | Topic graph + suggestion engine | Topic modeling, next-section/timestamp suggestions |
| Phase 4 | Deep research missions (individual) | Missions, Semantic Scholar, project-local citation graph |
| Phase 5 | Longitudinal analytics & state-of-mind views | Topic evolution, state/action based event timelines |
| Phase 6 | SurrealDB-based unified knowledge store | Consolidate graph + vectors + metadata + events |
| Phase 7 | Team workflows & LMS integrations (optional) | Shared projects, Canvas/LMS plugins, team analytics |
📎 SurrealDB or PostgreSQL?
SurrealDB & Hybrid Graph Rationale
- Need hybrid syntactic + semantic knowledge graph:
- Graph structure (citations, topics, user behavior),
- Text semantics (embeddings),
- Temporal patterns (sessions).
- Traditional stack requires orchestrating Postgres + vector store + graph DB.
- SurrealDB promises:
- Single query surface for graph + vector fields,
- Local-first deployment,
- Easier tool design for agents.
- SurrealDB’s ability to host models or model-triggered procedures close to the data makes it a natural fit for:
- Automated tagging during ingestion (e.g., triggers that call local models to classify new artifacts/segments),
- Maintaining consistent tagging and topic assignment as the corpus grows,
- Attaching model outputs directly as graph nodes/edges without extra orchestration services.
Net effect: SurrealDB is a strong candidate for the final unified store as an alternative to SQLite/Postgres + vector store, especially for a design that leans heavily on automated, model-driven tagging and enrichment at ingestion time.
Video / Timestamp & Notes
Interview: Building The Database That Can Do It All | Tobie Morgan Hitchcock, CEO of SurrealDB
AI Summary:
SurrealDB offers a compelling approach to database management that could significantly benefit your advanced AI/ML systems by simplifying your data infrastructure and streamlining development. Here's a summary of its key advantages:
-
Reimagining Databases for Simplification and Consolidation: SurrealDB is a multi-model database designed to simplify data storage and querying (2:09). Unlike traditional approaches that involve managing multiple specialized databases (e.g., time-series, document, graph, relational) (0:15, 4:01), SurrealDB consolidates these different data types into a single, coherent platform (0:22, 2:34, 16:42). This consolidation significantly reduces costs, development time, and the complexity of managing your application's backend (2:38, 3:30, 34:30).
-
Unified Data Models for Enhanced Flexibility: SurrealDB inherently supports document, graph, and time-series data (14:42, 15:01). This means you can store traditional document data, augment it with graph relationships (modeling data the way humans think, in terms of types and relationships) (16:08), and handle streams of events with timestamps (15:01). Its unique record ID system allows for efficient querying across these models, enabling powerful real-time analytics that combine transactional and analytical queries (18:23, 20:59, 24:18).
-
Revolutionary AI/ML Integration: Models Live Inside the Database: A standout feature for AI/ML development is SurrealDB's ability to bring machine learning models directly inside the database (0:56, 30:37). This innovative approach allows you to run custom or off-the-shelf models right alongside your data (30:45). This eliminates the need to push data out to external clusters (e.g., Kubernetes), wait for synchronous events, and deal with the significant administrative overhead of microservices, containers, and complex pipelines (31:00, 31:14). This concept of "bringing the models to the data rather than the data to the models" (32:13) greatly simplifies AI/ML workflows, letting developers focus on the model's functionality rather than the underlying infrastructure (32:35).
-
Scalability and Performance Designed for Modern Needs: SurrealDB separates its storage layer from its compute layer (17:46, 20:51), allowing for independent scaling of both based on your application's demands (17:57, 28:08, 36:09). It can run as an embedded database, a single node, or a distributed cluster, providing flexibility across different environments (17:50, 27:59). This architecture supports powerful real-time analytics (18:33), which is crucial for applications that require immediate insights from dynamic data.
-
Addressing Complexity for Future-Proof Development: The speaker highlights that the proliferation of specialized databases and microservice-based architectures has led to massive complexity (31:39, 5:00). SurrealDB's rebundling approach (20:13) aims to reduce this complexity, making it easier and quicker to build and manage applications (29:32, 34:30). For advanced AI/ML systems, where data interaction is paramount, this simplification of the underlying data infrastructure is a significant long-term advantage.
By consolidating data models, enabling in-database AI/ML model execution, and offering flexible scalability, SurrealDB provides a powerful platform for building sophisticated AI/ML applications with reduced complexity and improved efficiency.