PRD - Deep Evidence Agent
Problem Statement
Current Situation
AI agents offer unprecedented assistance to engineering organizations by continually monitoring and ingesting an organization's internal data repositories. In their most elaborate form, graph-based retrieval-augmented generation (GraphRAG) systems can build internal Knowledge Bases (KBs) and allow foundational Large Language Models (fLLMs) to generate information conditioned on these KBs. However, existing tools fall short in enabling deep, evidence-grounded generative AI tasks that ensure traceability, reliability, and truthfulness. We describe below a few scenarios where current AI agents fail to deliver on their promises.
Countless hours are spent in large engineering organizations manually tracing design decisions back to original requirements documents. Take for example how a tool like IBM's Rational DOORS is used in safety-critical engineering domains to manage requirements and offer traceability to design and test artifacts.

The decades-old tool is by many accounts a glorified combination of a relational database, MS Word, and Excel. Current AI assistants can more or less replicate DOORS and, with custom plugins, even offer a seamless migration path off it, but only a handful of companies have the resources to invest in such a migration and in re-educating their engineering base. The conservative nature of engineers also stands in the way.
Today's Engineering Lifecycle Management (ELM) space is thirsty for an infusion of AI capabilities and a complete rethinking of engineering processes that are full of manual, tedious, and error-prone work: capturing functional and non-functional requirements; tracing design decisions, test results, and acceptance criteria; and constantly performing change-management tasks such as updating documentation in response to the hundreds of factors that can introduce variations. Such error-prone engineering processes increase both product risk and, over time, technical debt, and can lead to very expensive redesign or patching much later.
Many counter that AI agents based on pretrained LLMs such as ChatGPT, Claude, and others can in fact be very helpful in engineering tasks. However, such models are not designed for deep, evidence-grounded reasoning, and the very nature of pretraining at scale, on trillions of internet tokens, runs in the exact opposite direction from engineering AI agents, whose utility rests on specificity, accuracy, and traceability to evidence. Deep Evidence Agents (DEAs) need to be designed from the ground up to support engineering processes that are evidence-grounded, traceable, and auditable. These AI agents don't try to be pleasant, conversational assistants but rather focused, task-oriented researchers that help engineers do their work better, faster, and with higher confidence. They also know when to say "I don't know".
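The contract described above — every output grounded in evidence, with an explicit refusal when evidence is missing — can be sketched minimally. The `Claim` type and `answer_or_abstain` helper below are illustrative assumptions for this PRD, not a committed API:

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    """A single factual statement with pointers to its supporting artifacts."""
    text: str
    evidence_ids: list[str] = field(default_factory=list)

def answer_or_abstain(claims: list[Claim]) -> str:
    """Emit only claims backed by evidence; abstain explicitly when none are."""
    supported = [c for c in claims if c.evidence_ids]
    if not supported:
        return "I don't know: no supporting evidence found."
    # Each surviving claim carries its evidence IDs inline, DOORS-style.
    return "; ".join(f"{c.text} [{', '.join(c.evidence_ids)}]" for c in supported)
```

In this sketch, abstention is a first-class output rather than a fallback, which is the behavioral difference between a DEA and a conversational assistant.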
The pain points faced by engineering organizations today when trying to use AI agents for deep engineering tasks are summarized below:
| Pain Point | Description | Impact |
|---|---|---|
| Information overload | Engineers make design decisions with incomplete information or assumptions. | High |
| Manual evidence handling | Copy-paste of excerpts, tables, and figures into word documents or slides | High |
| Lack of traceability | Claims not reliably linked to primary sources | High |
| Fragmented tooling | Search, note-taking, and reporting tools are disconnected | Medium |
| Limited reuse | Prior lessons learned are hard to discover and adapt for new questions or assumptions | Medium |
Goals
| ID | Goal | Description | Priority |
|---|---|---|---|
| G-1 | End-to-end engineering workflows | Engineering task → plan → search → evidence → synthesis → report | P0 |
| G-2 | Provenance-aware engineering outputs | Every factual claim, design justification, or test conclusion traceable to one or more primary artifacts (requirements, design models, code, test reports, standards) | P0 |
| G-3 | Multi-agent reasoning for engineering tasks | Planner, researcher, critic, and summarizer agents with clear coordination over engineering artifacts | P1 |
| G-4 | Human-in-the-loop engineering review | Interactive refinement of plans, evidence sets, and drafts by engineers, architects, and QA | P0 |
| G-5 | Safe and compliant model usage | Guardrails, policies, and logging aligned with engineering standards (where applicable) | P0 |
| G-6 | Integration with engineering data lakes | Ability to use internal engineering corpora (requirements tools, PLM, code, test systems) under access control policies | P1 |
User Personas
Systems Engineer (Primary)
| Field | Value |
|---|---|
| Name | Systems Engineer |
| Role | Defines and refines system architecture, interfaces, and component behavior |
| Domain | Telecom, aerospace, automotive, medical devices, finance, other safety-/mission-critical domains |
| Goals | Make design decisions consistent with requirements and standards; understand the impact of changes quickly |
| Frustrations | Manual traceability, ambiguous requirements, disconnected tools, and incomplete justification on design decisions |
| AI Familiarity | Medium to high |
| Success Metrics | Reduced effort to define and trace from requirements to implementation; fewer design defects linked to misunderstood or incorrect assumptions |
Verification & Validation Engineer
| Field | Value |
|---|---|
| Name | Verification & Validation Engineer |
| Role | Plans, designs, and executes tests to ensure requirements are met |
| Domain | Same as systems engineer |
| Goals | Confirm coverage against requirements and safety goals; understand the rationale for test selection and prioritization |
| Frustrations | Difficulty mapping tests to requirements and acceptance criteria; manual evidence compilation for audits |
| AI Familiarity | Medium |
| Success Metrics | Faster creation and maintenance of traceability matrices, smoother audits and reviews |
Requirements Engineer
| Field | Value |
|---|---|
| Name | Requirements Engineer |
| Role | Defines, maintains, and curates requirements and product lifecycle data |
| Domain | Any complex engineered system |
| Goals | Maintain consistent, non-contradictory requirements; manage change requests; ensure end-to-end traceability |
| Frustrations | High manual effort, inconsistent naming and IDs, poor impact analysis tooling |
| AI Familiarity | Medium |
| Success Metrics | Reduced cycle time for change requests; improved completeness of traceability |
Product Manager / Engineering Manager
| Field | Value |
|---|---|
| Name | Product Manager / Engineering Manager |
| Role | Aligns engineering deliverables with business goals and roadmaps |
| Domain | Cross-cutting, across multiple product lines |
| Goals | Make decisions with clear, evidence-based tradeoffs; communicate the impact of changes |
| Frustrations | Fragmented information across teams and tools, lack of clear decision rationale |
| AI Familiarity | Medium |
| Success Metrics | Faster and more informed decision cycles, reduced surprises late in the product lifecycle |
Use Cases
Use Case Summary (Engineering-Focused)
| ID | Name | Description | Primary Persona | Priority |
|---|---|---|---|---|
| UC-1 | Requirements Traceability & Impact Analysis | Build and maintain bidirectional traceability between requirements, design, code, and tests; assess impact of changes | Requirements / PLM Engineer, Systems Engineer | P0 |
| UC-2 | Design Decision Justification | Generate evidence-backed design decision records linking to requirements, alternatives, and standards | Systems / Design Engineer | P0 |
| UC-3 | Standards & Compliance Dossier Assembly | Assemble compliance evidence packages from distributed artifacts | Verification & Validation Engineer, Compliance | P1 |
| UC-4 | Incident / Defect Root-Cause Analysis | Investigate defects or incidents, linking them back to contributing requirements, design decisions, and tests | Systems Engineer, V&V Engineer | P1 |
| UC-5 | Engineering Knowledge Capture & Reuse | Capture lessons learned, patterns, and prior analyses for reuse across projects | Engineering Manager | P1 |
Detailed Use Case – UC-1: Requirements Traceability & Impact Analysis
Goal
Maintain accurate, bidirectional traceability between requirements, design artifacts, source code, and tests, and perform impact analysis when requirements or design elements change.
Preconditions
- User is authenticated and has access to the relevant project and repositories.
- Requirements, design artifacts, code, and test results are accessible via configured connectors (e.g., DOORS/ReqIF, Jira, Git, test management tools).
- Requirements have stable identifiers, at least at the project level.
Main Flow
- User selects a scope (e.g., project, subsystem, feature) and initiates a traceability or impact analysis request.
- Planner agent interprets the request (e.g., “show missing traces,” “analyze impact of requirement X changes”).
- Researcher agents fetch relevant artifacts (requirements, design docs, models, code, tests, defects) from configured tools.
- Evidence is extracted and aligned into candidate trace links (e.g., requirement → design section → code module → test case).
- Critic agent evaluates link quality and identifies gaps, inconsistencies, and suspicious links.
- Synthesizer agent generates:
  - A traceability matrix or graph.
  - A narrative summary of impact (e.g., “Requirement REQ-123 affects modules A/B/C and tests T-10/T-11”).
- User reviews, confirms, or edits links and impact assessment.
- System persists the curated trace links and optionally pushes updates back to lifecycle tools (where integration allows).
- User exports the traceability view or impact report for design reviews or change control boards.
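The main flow above — fetch artifacts, propose candidate trace links, then hand them to the critic for scoring — can be sketched as a toy pipeline. The `Artifact`/`TraceLink` types and the ID-mention linking heuristic are illustrative assumptions, not the product's actual link-discovery method:

```python
from dataclasses import dataclass

@dataclass
class Artifact:
    """A primary engineering artifact (requirement, design section, code module, test case)."""
    id: str
    kind: str  # e.g. "requirement", "design", "code", "test"
    text: str

@dataclass
class TraceLink:
    """A candidate link between two artifacts; the critic agent can later adjust confidence."""
    source_id: str
    target_id: str
    confidence: float = 0.5

def propose_links(artifacts: list[Artifact]) -> list[TraceLink]:
    """Naive researcher step: link artifact A to B whenever B's text mentions A's ID."""
    links = []
    for a in artifacts:
        for b in artifacts:
            if a.id != b.id and a.id in b.text:
                links.append(TraceLink(source_id=a.id, target_id=b.id))
    return links

artifacts = [
    Artifact("REQ-123", "requirement", "The system shall log all tool calls."),
    Artifact("DES-7", "design", "Logging subsystem; satisfies REQ-123."),
    Artifact("T-10", "test", "Verify audit log contents per REQ-123."),
]
links = propose_links(artifacts)
```

A real implementation would replace the string-containment heuristic with hybrid retrieval and model-based alignment, but the output shape — candidate links awaiting critic review and human confirmation — is the same.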
Alternatives / Edge Cases
- Artifacts are missing or inconsistent: system flags incomplete traces and suggests remediation actions.
- Access is restricted for some repositories: system clearly indicates missing scopes and partial confidence.
- Requirements are ambiguous or duplicate: system highlights potential duplicates/conflicts for human resolution.
Success Criteria
- Trace coverage and correctness, as judged by subject-matter experts, meets agreed thresholds.
- Time to perform impact analysis for a change request is significantly reduced versus baseline.
- Audit or review findings report fewer traceability-related defects.
Functional Requirements
| ID | Name | Description | Related Use Cases | Priority | Acceptance Criteria |
|---|---|---|---|---|---|
| FR-1 | Engineering Question Decomposition | Decompose engineering questions (e.g., impact, trace gaps, design trade-offs) into sub-questions and tasks | UC-1, UC-2, UC-4 | P0 | For representative engineering questions, plans are judged reasonable and complete by senior engineers |
| FR-2 | Engineering Artifact Retrieval | Perform hybrid retrieval (keyword + semantic + structured) over requirements, design docs, code, test results, standards | UC-1, UC-2, UC-3, UC-4 | P0 | Top-k artifacts include a majority of expert-identified relevant items in evaluation tasks |
| FR-3 | Engineering Evidence Extraction | Extract relevant fragments (requirements clauses, design sections, code snippets, test logs) as evidence units | UC-1, UC-2, UC-3, UC-4 | P0 | Extracted evidence aligns with relevant sections of source artifacts in sampled cases |
| FR-4 | Provenance and Traceability Tracking | Maintain explicit links between requirements, design decisions, code modules, tests, and defects | UC-1, UC-2, UC-3, UC-4, UC-5 | P0 | Generated trace graphs and matrices have no “orphan” claims; all links resolve to primary artifacts |
| FR-5 | Multi-Agent Engineering Coordination | Coordinate planner, researcher, critic, and synthesizer agents to operate on engineering artifacts | All | P1 | Logs show coherent, explainable task allocation and agent outputs per session |
| FR-6 | Human-in-the-Loop Engineering Editing | Support engineer-driven refinement of plans, evidence sets, trace links, and reports | All | P0 | Users can accept/reject links and edits; system preserves revision history and allows re-run |
| FR-7 | Export and Tool Integration | Export outputs (trace matrices, impact reports, compliance dossiers) and integrate with lifecycle tools | UC-1, UC-2, UC-3, UC-5 | P1 | Users can export artifacts in agreed formats (CSV, PDF, ReqIF, JSON) and optionally sync with selected tools |
| FR-8 | Session and Project Management | Persist and resume engineering sessions and projects with full context | All | P0 | Users can resume sessions with preserved trace graphs, evidence, and notes across devices |
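FR-2 calls for hybrid retrieval that blends keyword and semantic signals. The sketch below uses term overlap as a stand-in for BM25 and bag-of-words cosine as a stand-in for embedding similarity; the `alpha` blend weight and function names are assumptions for illustration only:

```python
import math
from collections import Counter

def keyword_score(query: str, doc: str) -> float:
    """Fraction of query terms present in the document (a stand-in for BM25)."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def semantic_score(query: str, doc: str) -> float:
    """Bag-of-words cosine similarity (a stand-in for embedding similarity)."""
    a = Counter(query.lower().split())
    b = Counter(doc.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_rank(query: str, docs: dict[str, str], alpha: float = 0.5) -> list[tuple[str, float]]:
    """Blend the two channels; alpha weights the keyword side."""
    scored = {
        doc_id: alpha * keyword_score(query, text) + (1 - alpha) * semantic_score(query, text)
        for doc_id, text in docs.items()
    }
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
```

In production the two channels would come from a search index and a vector database (see Data, Constraints, and Dependencies), but the acceptance criterion — expert-relevant artifacts surfacing in the top-k — is evaluated the same way.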
Non-Functional Requirements (NFR)
Quality Attributes
| ID | Category | Requirement | Metric / Target |
|---|---|---|---|
| NFR-1 | Accuracy | Minimize unsupported or incorrect claims in engineering contexts | ≤ 2% of sampled claims lack valid supporting engineering evidence (requirements, design, tests, standards) |
| NFR-2 | Latency | Maintain interactive experience for light queries | p95 latency < 1 second for simple refinement operations |
| NFR-3 | Throughput | Support concurrent engineering sessions | ≥ 1000 active sessions in production environment |
| NFR-4 | Availability | Ensure high uptime for engineering-critical teams | ≥ 99.5% monthly availability |
| NFR-5 | Cost Efficiency | Keep average compute cost per engineering session bounded | Average GPU cost per engineering session below defined threshold |
| NFR-6 | Auditability | Provide full audit trail for engineering sessions | 100% of sessions have logs with tool calls, agents involved, and outputs |
| NFR-7 | Interpretability | Expose rationale for key actions and source choices | Engineers can inspect per-step reasoning summaries or logs and reconstruct key traces |
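NFR-6 requires every session to leave a complete audit trail of tool calls, agents, and outputs. One minimal shape for such a trail is an append-only JSON-lines log, one record per tool invocation; the field names and `log_tool_call` helper here are assumptions, not the platform's actual schema:

```python
import io
import json
from datetime import datetime, timezone

def log_tool_call(stream, session_id: str, agent: str, tool: str, output_ref: str) -> None:
    """Append one audit record per tool invocation as a JSON line."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "session": session_id,
        "agent": agent,
        "tool": tool,
        "output_ref": output_ref,  # pointer to the stored output artifact
    }
    stream.write(json.dumps(record) + "\n")

# In-memory stand-in for the real log sink.
buf = io.StringIO()
log_tool_call(buf, "sess-42", "researcher", "req_repo.search", "evidence/ev-001")
log_tool_call(buf, "sess-42", "critic", "trace.validate", "report/links-check")
```

Because each line is self-describing JSON, the 100%-coverage target in NFR-6 reduces to checking that every tool invocation in a session emitted exactly one record.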
UX / Usability
| Requirement | Description |
|---|---|
| Clear engineering citation surfacing | Engineers can easily view and navigate to underlying artifacts (requirements, design docs, code, tests, standards) for each claim or link |
| Session continuity | Engineers can resume sessions from any device with preserved progress, including open analysis threads and partial trace graphs |
| Progressive disclosure | UI shows high-level summaries (e.g., impact overviews) first, with detailed trace graphs and evidence on demand |
| Safe defaults | Warnings and disclaimers accompany high-risk assumptions or model inferences; UI encourages double-checking for safety-critical decisions |
Data, Constraints, and Dependencies
Data Sources
| Source | Type | Access Method | License / Terms | Notes |
|---|---|---|---|---|
| Requirements Repository (e.g., DOORS/ReqIF, Jama, Polarion) | Internal requirements | API / export connectors | Organization internal data policy | Includes functional, non-functional, safety, and regulatory requirements |
| Design Repository (e.g., Confluence, SharePoint, Model Repos) | Internal design docs/models | API / file connectors | Internal policies | Architecture docs, interface specs, models (SysML/UML, Simulink, etc.) |
| Code Repositories (e.g., Git) | Source code | Git APIs / SSH / mirrors | Internal policies, OSS licenses | Implementation-level evidence and design patterns |
| Test Management & CI Logs | Test cases, results, logs | APIs / file ingestion | Internal policies | Evidence for verification and validation |
| Standards & Regulations (e.g., PDFs from standards bodies) | External standards | Secure document store / curated library | Vendor or standards body licenses | Normative references for compliance |
Technical Constraints
| ID | Constraint | Description | Impact |
|---|---|---|---|
| C-1 | Compute budget | Fixed GPU/CPU budget per month | Limits model sizes, context windows, and concurrency targets |
| C-2 | Data residency | Engineering data must remain in specific regions | Influences deployment topology and cloud/on-prem splits |
| C-3 | Approved model list | Only vetted LLM families may be used for regulated domains | Restricts experimentation with new models in production |
| C-4 | Network egress limitations | Limits on external requests from production clusters | Encourages local indexing and caching; external web access may be disabled per tenant |
External Dependencies
| Dependency | Type | Owner / Provider | Risk if Unavailable |
|---|---|---|---|
| Lifecycle Tool APIs (e.g., DOORS, Jira, Jama, ALM) | External/internal | Tool owners / IT | High – degraded traceability and integration |
| Identity Provider | Internal | IT / Security | High – engineers cannot authenticate or access project data |
| Vector Database | Internal | Data Platform | High – retrieval becomes limited or slow |
| Logging / Metrics | Internal | SRE / Platform | Medium – reduced observability and compliance auditability |
Competitive Landscape
| Competitor / Tool | Segment | Key Capabilities | Weaknesses / Gaps | Our Differentiators |
|---|---|---|---|---|
| Traditional ELM + RM tools (e.g., DOORS, Jama, Polarion) | Requirements & lifecycle management | Robust requirements storage, baselines, trace fields | Heavy manual maintenance, weak AI/automation, limited cross-repo intelligence | Deep Evidence Agents add automated evidence gathering, impact analysis, and reasoning on top of existing ELM stacks |
| Generic LLM Chat / Copilot tools | Generic coding and Q&A assistants | Conversational answers, code completions | Weak provenance, no explicit engineering traceability | Engineering-specific workflows, strict provenance and trace graphs |
| Enterprise search & discovery tools | Enterprise-wide search and basic analytics | Indexing and search across document repositories | Limited multi-step reasoning, no engineering semantics | Engineering domain models, multi-agent reasoning, and standards-aware evidence handling |
| Synera and similar AI-in-engineering platforms | Generative design & automation | Workflow automation, simulation and optimization in CAD/CAE | Less focused on textual requirements, traceability, and evidence-grounded reasoning | Focus on deep evidence and requirements/traceability workflows, complementing simulation and design automation |
Assumptions
| ID | Assumption | Rationale | Risk if False |
|---|---|---|---|
| A-1 | Access to core internal engineering artifacts | Required for meaningful traceability and impact analysis | Platform may be perceived as “toy” or superficial |
| A-2 | Availability of GPU resources | Needed for LLM inference at target latency | System may need to fall back to degraded modes or smaller models |
| A-3 | Users possess baseline engineering literacy | Needed to interpret trace graphs, evidence, and limitations | Risk of misinterpreting agent outputs or over-trusting suggestions |
| A-4 | Internal data owners approve corpus ingestion | Required for use of requirements, design docs, code, and tests | Reduced value from missing key repositories; partial traceability |
Metrics and KPIs
Product KPIs
| KPI | Definition | Target | Measurement Method |
|---|---|---|---|
| Time-to-Impact-Analysis | Time from change request to acceptable impact report | ≥ 50% reduction vs baseline tools | Time-tracking and telemetry |
| Traceability Completeness | Fraction of requirements with links to design/code/tests | ≥ 95% for in-scope projects | Automated trace graph evaluation |
| Citation / Evidence Completeness | Fraction of engineering claims with at least one evidence link | ≥ 98% | Automated evaluation of reports and trace graphs |
| User Satisfaction (Engineers) | Average post-session rating (1–5) | ≥ 4.5 | In-product feedback prompts |
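The Traceability Completeness KPI above is mechanically checkable from the trace graph. A minimal sketch of the computation, assuming links are stored as `(source, target)` pairs with requirements as sources:

```python
def traceability_completeness(requirements: set[str], links: list[tuple[str, str]]) -> float:
    """Fraction of requirements that are the source of at least one trace link."""
    traced = {src for src, _ in links}
    return len(requirements & traced) / len(requirements) if requirements else 0.0

reqs = {"REQ-1", "REQ-2", "REQ-3", "REQ-4"}
links = [("REQ-1", "DES-1"), ("REQ-2", "T-9"), ("REQ-3", "CODE-a")]
coverage = traceability_completeness(reqs, links)  # 3 of 4 requirements traced
```

Here REQ-4 has no outgoing link, so coverage is 0.75 — below the ≥ 95% target, and exactly the kind of gap the automated trace graph evaluation would flag.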
Business KPIs
| KPI | Definition | Target | Measurement Method |
|---|---|---|---|
| Engineering Team Adoption | Number of engineering teams actively using tool | 5+ pilot teams in first year, 10+ in second year | Account and usage data |
| Reduction in Audit Findings | Percentage reduction in traceability-related findings from audits | ≥ 30% reduction vs prior cycle | Audit reports and CAPA tracking |
| Retention | Users returning within 30 days | ≥ 60% for target cohorts | Cohort retention analysis |
Security, Privacy, Compliance, and Governance
Security Requirements
| ID | Requirement | Description | Priority |
|---|---|---|---|
| SEC-1 | Authentication & Authorization | All access uses organization SSO and role-based access controls aligned with engineering project structures | P0 |
| SEC-2 | Encryption | Engineering data in transit and at rest encrypted per organizational and regulatory standards | P0 |
| SEC-3 | Tenant / Project Isolation | Sessions and data isolated by tenant, program, and project | P0 |
| SEC-4 | Secure tool invocation | Tools (retrieval, SCM, lifecycle systems) follow allowlists and policies; no arbitrary code execution on production systems | P0 |
Privacy Requirements
| ID | Requirement | Description | Applicable Regulations |
|---|---|---|---|
| PRIV-1 | PII Handling | Clearly define and limit how PII appears in engineering corpora and logs | GDPR, CCPA, internal |
| PRIV-2 | Data Minimization | Collect and store only necessary user/session and artifact metadata | GDPR principles |
| PRIV-3 | Retention & Deletion | Implement retention windows and deletion workflows, including project-level data removal where required | Org data policy |
Compliance and Policy Alignment
| Policy / Regulation | Impact on Product | Required Controls |
|---|---|---|
| Internal AI Use Policy | Defines permitted AI use cases and limitations for engineering tools | Guardrails, logging, and periodic compliance review |
| Data Usage Policy | Governs use of internal engineering and external data | Source whitelists, access control, audit logs |
| Domain-specific rules | E.g., ISO 26262, DO-178C, IEC 61508, FDA submissions | Domain-specific disclaimers, process hooks, and explicit non-automation of regulated approval steps |
Risks and Mitigations
| ID | Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|---|
| R-1 | Hallucinations / unsupported claims in engineering outputs | Medium | High | Strict evidence enforcement, critic agents, domain-specific evaluation datasets, and continuous evaluation |
| R-2 | External or lifecycle tool API outages / rate limits | Medium | Medium | Caching, local indexing, robust connectors, clear degraded-mode behavior |
| R-3 | Legal / licensing issues with standards and proprietary docs | Low | High | Legal review of data sources, explicit license tracking, curated ingestion |
| R-4 | User over-reliance on AI outputs for safety-critical decisions | Medium | High | Warnings, training, enforced human review steps, and domain-specific disclaimers |
| R-5 | Data leakage or privacy violations | Low | High | Strong access controls, encryption, redaction, and regular security audits |
| R-6 | Model drift / quality degradation | Medium | Medium | Scheduled evaluation on engineering-specific benchmarks, model versioning, and retraining policies |
Release Plan and Roadmap
Phased Delivery
| Phase | Timeframe | Scope | Key Deliverables | Exit Criteria |
|---|---|---|---|---|
| Phase 1 – MVP | Q1–Q2 | Single-agent retrieval + summarization; basic engineering provenance | Prototype UI, connectors to at least one requirements repo and code repo, initial evaluation suite | 1–2 pilot engineering teams, acceptable quality and usability feedback |
| Phase 2 – Multi-Agent Engineering | Q3 | Planner, researcher, critic agents; improved traceability and impact analysis | Multi-agent orchestrator, initial trace graph model, engineering-focused evaluation | Accuracy, coverage, and usability targets met in traceability and impact analysis tasks |
| Phase 3 – Enterprise Rollout | Q4 | Scaling, governance, domain models, monitoring and alerting | Production deployment, SLO dashboards, access controls, support for multiple projects and tenants | SLA met, 5+ engineering teams fully onboarded with positive ROI metrics |
Key Dependencies
| Item | Description | Owner / Team | Needed By |
|---|---|---|---|
| Vector search & Graph store | Indexing and graph-based retrieval for engineering artifacts | Data Platform | Phase 2 |
| Model serving platform | Hosting and scaling of LLMs | ML Platform | Phase 1 |
| Lifecycle connectors | Connectors to requirements, design, code, and test tools | Platform / Integration Team | Phase 1–2 |
| Observability stack | Logging, metrics, tracing | SRE / Platform | Phase 1 |
Open Questions
| ID | Question | Owner | Target Resolution Date |
|---|---|---|---|
| OQ-1 | Which primary LLM family to standardize on for engineering tasks? | ML Platform Lead | TBD |
| OQ-2 | Which long-term storage format and technology for provenance / trace graphs? | Data Architect | TBD |
| OQ-3 | Which domains and standards (e.g., ISO 26262 vs DO-178C) get domain-specialized models and evaluation first? | Product Lead | TBD |
Traceability to Architecture (ADD)
This section links key PRD items to architecture components and views described in the Architecture Description Document.
| PRD Item | Description | ADD Section / Component | Notes |
|---|---|---|---|
| FR-1, FR-5 | Engineering question decomposition and multi-agent coordination | Planning & Coordination Module, Orchestrator | Multi-agent planning flows |
| FR-2, FR-3 | Engineering artifact retrieval and evidence extraction | Retrieval & Exploration Module, Ingestion Layer | Hybrid search, parsers, connectors |
| FR-4 | Provenance and traceability tracking | Evidence Management & Provenance Model | Graph or document store |
| NFR-1, NFR-6 | Accuracy, auditability | Evaluation & Monitoring | Quality metrics, logs, and dashboards |
| G-5, SEC-* | Safety and governance | Safety & Governance Layer | Guardrails, policies, and policy engine |
Appendix
Glossary
| Term | Definition |
|---|---|
| Agent | An autonomous process that uses models and tools to perform engineering tasks |
| Provenance | The trace from an output claim or decision back to the supporting sources and evidence (requirements, design, code, tests, standards) |
| RAG | Retrieval-Augmented Generation: combining retrieval with generative models |
| Corpus | A collection of engineering documents and artifacts used for retrieval and analysis |
| Session | A logically scoped, persistent interaction between a user and the platform |
| Traceability | Ability to follow the life of a requirement forwards and backwards through design, implementation, verification, and operation |
Reference Links
- Architecture Description Document: AI Deep Research / Deep Evidence Agent – Architecture Description v1
- UX and UI prototypes (e.g., Figma link)
- Data inventories and catalog entries for engineering repositories
- Evaluation frameworks and benchmark descriptions for engineering tasks
- Internal AI policy and data usage policy documents