PRD - Deep Evidence Agent
Problem Statement
Current Situation
AI agents offer unprecedented assistance to engineering organizations by continually monitoring and ingesting an organization's internal data repositories. In their most elaborate form, graph-based retrieval-augmented generation (GraphRAG) systems can build internal Knowledge Bases (KBs) and allow foundational Large Language Models (fLLMs) to generate information conditioned on these KBs. However, existing tools fall short in enabling deep, evidence-grounded generative AI tasks that ensure traceability, reliability, and truthfulness. We describe below a few scenarios where current AI agents fail to deliver on their promises.
Countless hours are spent in large engineering organizations manually tracing design decisions back to original requirements documents. Take for example how a tool like IBM's Rational DOORS is used in safety-critical engineering domains to manage requirements and offer traceability to design and test artifacts.

The decades-old tool is by many accounts a glorified combination of a relational database, MS Word, and Excel. Current AI assistants can more or less replicate DOORS and, with custom plugins, even offer a seamless migration path off it, but only a handful of companies have the resources to invest in such a migration and in re-educating their engineering base. The conservative nature of engineers also stands in the way.
Today's Engineering Lifecycle Management (ELM) space is thirsty for an infusion of AI capabilities and a complete rethinking of engineering processes that are full of manual, tedious, and error-prone work: capturing functional and non-functional requirements; tracing design decisions, test results, and acceptance criteria; and constantly performing change-management tasks such as updating documentation in response to the hundreds of factors that can introduce variations. Such error-prone engineering processes increase both product risk and, over time, technical debt, and can lead to very expensive redesign or patching much later.
Many counter that AI agents based on pretrained LLMs such as ChatGPT, Claude, and others can in fact be very helpful in engineering tasks. However, such models are not designed for deep, evidence-grounded reasoning, and the very nature of pretraining at scale, on trillions of internet tokens, runs in the exact opposite direction from engineering AI agents, whose utility rests on specificity, accuracy, and traceability to evidence. Deep Evidence Agents (DEAs) need to be designed from the ground up to support engineering processes that are evidence-grounded, traceable, and auditable. These AI agents don't try to be pleasant, conversational assistants but rather focused, task-oriented researchers that help engineers do their work better, faster, and with higher confidence. They also know when to say "I don't know".
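The contract described above — every output grounded in evidence, with an explicit refusal when evidence is missing — can be sketched minimally. The `Claim` type and `answer_or_abstain` helper below are illustrative assumptions for this PRD, not a committed API:

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    """A single factual statement with pointers to its supporting artifacts."""
    text: str
    evidence_ids: list[str] = field(default_factory=list)

def answer_or_abstain(claims: list[Claim]) -> str:
    """Emit only claims backed by evidence; abstain explicitly when none are."""
    supported = [c for c in claims if c.evidence_ids]
    if not supported:
        return "I don't know: no supporting evidence found."
    # Each surviving claim carries its evidence IDs inline, DOORS-style.
    return "; ".join(f"{c.text} [{', '.join(c.evidence_ids)}]" for c in supported)
```

In this sketch, abstention is a first-class output rather than a fallback, which is the behavioral difference between a DEA and a conversational assistant.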
The pain points faced by engineering organizations today when trying to use AI agents for deep engineering tasks are summarized below:
| Pain Point | Description | Impact |
|---|---|---|
| Information overload | Engineers make design decisions with incomplete information or assumptions. | High |
| Manual evidence handling | Copy-paste of excerpts, tables, and figures into word documents or slides | High |
| Lack of traceability | Claims not reliably linked to primary sources | High |
| Fragmented tooling | Search, note-taking, and reporting tools are disconnected | Medium |
| Limited reuse | Prior lessons learned are hard to discover and adapt for new questions or assumptions | Medium |
Goals
| ID | Goal | Description | Priority |
|---|---|---|---|
| G-1 | End-to-end engineering workflows | Engineering task → plan → search → evidence → synthesis → report | P0 |
| G-2 | Provenance-aware engineering outputs | Every factual claim, design justification, or test conclusion traceable to one or more primary artifacts (requirements, design models, code, test reports, standards) | P0 |
| G-3 | Multi-agent reasoning for engineering tasks | Planner, researcher, critic, and summarizer agents with clear coordination over engineering artifacts | P1 |
| G-4 | Human-in-the-loop engineering review | Interactive refinement of plans, evidence sets, and drafts by engineers, architects, and QA | P0 |
| G-5 | Safe and compliant model usage | Guardrails, policies, and logging aligned with engineering standards (where applicable) | P0 |
| G-6 | Integration with engineering data lakes | Ability to use internal engineering corpora (requirements tools, PLM, code, test systems) under access control policies | P1 |
User Personas
Systems Engineer (Primary)
| Field | Value |
|---|---|
| Name | Systems Engineer |
| Role | Defines and refines system architecture, interfaces, and component behavior |
| Domain | Telecom, aerospace, automotive, medical devices, finance, other safety-/mission-critical domains |
| Goals | Make design decisions consistent with requirements and standards; understand the impact of changes quickly |
| Frustrations | Manual traceability, ambiguous requirements, disconnected tools, and incomplete justification on design decisions |
| AI Familiarity | Medium to high |
| Success Metrics | Reduced effort to define and trace from requirements to implementation; fewer design defects linked to misunderstood or incorrect assumptions |
Verification & Validation Engineer
| Field | Value |
|---|---|
| Name | Verification & Validation Engineer |
| Role | Plans, designs, and executes tests to ensure requirements are met |
| Domain | Same as systems engineer |
| Goals | Confirm coverage against requirements and safety goals; understand the rationale for test selection and prioritization |
| Frustrations | Difficulty mapping tests to requirements and acceptance criteria; manual evidence compilation for audits |
| AI Familiarity | Medium |
| Success Metrics | Faster creation and maintenance of traceability matrices, smoother audits and reviews |
Requirements Engineer
| Field | Value |
|---|---|
| Name | Requirements Engineer |
| Role | Defines, maintains, and curates requirements and product lifecycle data |
| Domain | Any complex engineered system |
| Goals | Maintain consistent, non-contradictory requirements; manage change requests; ensure end-to-end traceability |
| Frustrations | High manual effort, inconsistent naming and IDs, poor impact analysis tooling |
| AI Familiarity | Medium |
| Success Metrics | Reduced cycle time for change requests; improved completeness of traceability |
Product Manager / Engineering Manager
| Field | Value |
|---|---|
| Name | Product Manager / Engineering Manager |
| Role | Aligns engineering deliverables with business goals and roadmaps |
| Domain | Cross-cutting, across multiple product lines |
| Goals | Make decisions with clear, evidence-based tradeoffs; communicate the impact of changes |
| Frustrations | Fragmented information across teams and tools, lack of clear decision rationale |
| AI Familiarity | Medium |
| Success Metrics | Faster and more informed decision cycles, reduced surprises late in the product lifecycle |
Use Cases
Use Case Summary (Engineering-Focused)
| ID | Name | Description | Primary Persona | Priority |
|---|---|---|---|---|
| UC-1 | Requirements Traceability & Impact Analysis | Build and maintain bidirectional traceability between requirements, design, code, and tests; assess impact of changes | Requirements / PLM Engineer, Systems Engineer | P0 |
| UC-2 | Design Decision Justification | Generate evidence-backed design decision records linking to requirements, alternatives, and standards | Systems / Design Engineer | P0 |
| UC-3 | Standards & Compliance Dossier Assembly | Assemble compliance evidence packages from distributed artifacts | Verification & Validation Engineer, Compliance | P1 |
| UC-4 | Incident / Defect Root-Cause Analysis | Investigate defects or incidents, linking them back to contributing requirements, design decisions, and tests | Systems Engineer, V&V Engineer | P1 |
| UC-5 | Engineering Knowledge Capture & Reuse | Capture lessons learned, patterns, and prior analyses for reuse across projects | Engineering Manager | P1 |
Detailed Use Case – UC-1: Requirements Traceability & Impact Analysis
Goal
Maintain accurate, bidirectional traceability between requirements, design artifacts, source code, and tests, and perform impact analysis when requirements or design elements change.
Preconditions
- User is authenticated and has access to the relevant project and repositories.
- Requirements, design artifacts, code, and test results are accessible via configured connectors (e.g., DOORS/ReqIF, Jira, Git, test management tools).
- Requirements have stable identifiers, at least at the project level.
Main Flow
- User selects a scope (e.g., project, subsystem, feature) and initiates a traceability or impact analysis request.
- Planner agent interprets the request (e.g., “show missing traces,” “analyze impact of requirement X changes”).
- Researcher agents fetch relevant artifacts (requirements, design docs, models, code, tests, defects) from configured tools.
- Evidence is extracted and aligned into candidate trace links (e.g., requirement → design section → code module → test case).
- Critic agent evaluates link quality and identifies gaps, inconsistencies, and suspicious links.
- Synthesizer agent generates:
  - A traceability matrix or graph.
  - A narrative summary of impact (e.g., “Requirement REQ-123 affects modules A/B/C and tests T-10/T-11”).
- User reviews, confirms, or edits links and impact assessment.
- System persists the curated trace links and optionally pushes updates back to lifecycle tools (where integration allows).
- User exports the traceability view or impact report for design reviews or change control boards.
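The main flow above — fetch artifacts, propose candidate trace links, then hand them to the critic for scoring — can be sketched as a toy pipeline. The `Artifact`/`TraceLink` types and the ID-mention linking heuristic are illustrative assumptions, not the product's actual link-discovery method:

```python
from dataclasses import dataclass

@dataclass
class Artifact:
    """A primary engineering artifact (requirement, design section, code module, test case)."""
    id: str
    kind: str  # e.g. "requirement", "design", "code", "test"
    text: str

@dataclass
class TraceLink:
    """A candidate link between two artifacts; the critic agent can later adjust confidence."""
    source_id: str
    target_id: str
    confidence: float = 0.5

def propose_links(artifacts: list[Artifact]) -> list[TraceLink]:
    """Naive researcher step: link artifact A to B whenever B's text mentions A's ID."""
    links = []
    for a in artifacts:
        for b in artifacts:
            if a.id != b.id and a.id in b.text:
                links.append(TraceLink(source_id=a.id, target_id=b.id))
    return links

artifacts = [
    Artifact("REQ-123", "requirement", "The system shall log all tool calls."),
    Artifact("DES-7", "design", "Logging subsystem; satisfies REQ-123."),
    Artifact("T-10", "test", "Verify audit log contents per REQ-123."),
]
links = propose_links(artifacts)
```

A real implementation would replace the string-containment heuristic with hybrid retrieval and model-based alignment, but the output shape — candidate links awaiting critic review and human confirmation — is the same.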
Alternatives / Edge Cases
- Artifacts are missing or inconsistent: system flags incomplete traces and suggests remediation actions.
- Access is restricted for some repositories: system clearly indicates missing scopes and partial confidence.
- Requirements are ambiguous or duplicate: system highlights potential duplicates/conflicts for human resolution.
Success Criteria
- Trace coverage and correctness, as judged by subject-matter experts, meets agreed thresholds.
- Time to perform impact analysis for a change request is significantly reduced versus baseline.
- Audit or review findings report fewer traceability-related defects.
Functional Requirements
| ID | Name | Description | Related Use Cases | Priority | Acceptance Criteria |
|---|---|---|---|---|---|
| FR-1 | Engineering Question Decomposition | Decompose engineering questions (e.g., impact, trace gaps, design trade-offs) into sub-questions and tasks | UC-1, UC-2, UC-4 | P0 | For representative engineering questions, plans are judged reasonable and complete by senior engineers |
| FR-2 | Engineering Artifact Retrieval | Perform hybrid retrieval (keyword + semantic + structured) over requirements, design docs, code, test results, standards | UC-1, UC-2, UC-3, UC-4 | P0 | Top-k artifacts include a majority of expert-identified relevant items in evaluation tasks |
| FR-3 | Engineering Evidence Extraction | Extract relevant fragments (requirements clauses, design sections, code snippets, test logs) as evidence units | UC-1, UC-2, UC-3, UC-4 | P0 | Extracted evidence aligns with relevant sections of source artifacts in sampled cases |
| FR-4 | Provenance and Traceability Tracking | Maintain explicit links between requirements, design decisions, code modules, tests, and defects | UC-1, UC-2, UC-3, UC-4, UC-5 | P0 | Generated trace graphs and matrices have no “orphan” claims; all links resolve to primary artifacts |
| FR-5 | Multi-Agent Engineering Coordination | Coordinate planner, researcher, critic, and synthesizer agents to operate on engineering artifacts | All | P1 | Logs show coherent, explainable task allocation and agent outputs per session |
| FR-6 | Human-in-the-Loop Engineering Editing | Support engineer-driven refinement of plans, evidence sets, trace links, and reports | All | P0 | Users can accept/reject links and edits; system preserves revision history and allows re-run |
| FR-7 | Export and Tool Integration | Export outputs (trace matrices, impact reports, compliance dossiers) and integrate with lifecycle tools | UC-1, UC-2, UC-3, UC-5 | P1 | Users can export artifacts in agreed formats (CSV, PDF, ReqIF, JSON) and optionally sync with selected tools |
| FR-8 | Session and Project Management | Persist and resume engineering sessions and projects with full context | All | P0 | Users can resume sessions with preserved trace graphs, evidence, and notes across devices |
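FR-2 calls for hybrid retrieval that blends keyword and semantic signals. The sketch below uses term overlap as a stand-in for BM25 and bag-of-words cosine as a stand-in for embedding similarity; the `alpha` blend weight and function names are assumptions for illustration only:

```python
import math
from collections import Counter

def keyword_score(query: str, doc: str) -> float:
    """Fraction of query terms present in the document (a stand-in for BM25)."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def semantic_score(query: str, doc: str) -> float:
    """Bag-of-words cosine similarity (a stand-in for embedding similarity)."""
    a = Counter(query.lower().split())
    b = Counter(doc.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_rank(query: str, docs: dict[str, str], alpha: float = 0.5) -> list[tuple[str, float]]:
    """Blend the two channels; alpha weights the keyword side."""
    scored = {
        doc_id: alpha * keyword_score(query, text) + (1 - alpha) * semantic_score(query, text)
        for doc_id, text in docs.items()
    }
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
```

In production the two channels would come from a search index and a vector database (see Data, Constraints, and Dependencies), but the acceptance criterion — expert-relevant artifacts surfacing in the top-k — is evaluated the same way.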
Non-Functional Requirements (NFR)
Quality Attributes
| ID | Category | Requirement | Metric / Target |
|---|---|---|---|
| NFR-1 | Accuracy | Minimize unsupported or incorrect claims in engineering contexts | ≤ 2% of sampled claims lack valid supporting engineering evidence (requirements, design, tests, standards) |
| NFR-2 | Latency | Maintain interactive experience for light queries | p95 latency < 1 second for simple refinement operations |
| NFR-3 | Throughput | Support concurrent engineering sessions | ≥ 1000 active sessions in production environment |
| NFR-4 | Availability | Ensure high uptime for engineering-critical teams | ≥ 99.5% monthly availability |
| NFR-5 | Cost Efficiency | Keep average compute cost per engineering session bounded | Average GPU cost per engineering session below defined threshold |
| NFR-6 | Auditability | Provide full audit trail for engineering sessions | 100% of sessions have logs with tool calls, agents involved, and outputs |
| NFR-7 | Interpretability | Expose rationale for key actions and source choices | Engineers can inspect per-step reasoning summaries or logs and reconstruct key traces |
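NFR-6 requires every session to leave a complete audit trail of tool calls, agents, and outputs. One minimal shape for such a trail is an append-only JSON-lines log, one record per tool invocation; the field names and `log_tool_call` helper here are assumptions, not the platform's actual schema:

```python
import io
import json
from datetime import datetime, timezone

def log_tool_call(stream, session_id: str, agent: str, tool: str, output_ref: str) -> None:
    """Append one audit record per tool invocation as a JSON line."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "session": session_id,
        "agent": agent,
        "tool": tool,
        "output_ref": output_ref,  # pointer to the stored output artifact
    }
    stream.write(json.dumps(record) + "\n")

# In-memory stand-in for the real log sink.
buf = io.StringIO()
log_tool_call(buf, "sess-42", "researcher", "req_repo.search", "evidence/ev-001")
log_tool_call(buf, "sess-42", "critic", "trace.validate", "report/links-check")
```

Because each line is self-describing JSON, the 100%-coverage target in NFR-6 reduces to checking that every tool invocation in a session emitted exactly one record.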
UX / Usability
| Requirement | Description |
|---|---|
| Clear engineering citation surfacing | Engineers can easily view and navigate to underlying artifacts (requirements, design docs, code, tests, standards) for each claim or link |
| Session continuity | Engineers can resume sessions from any device with preserved progress, including open analysis threads and partial trace graphs |
| Progressive disclosure | UI shows high-level summaries (e.g., impact overviews) first, with detailed trace graphs and evidence on demand |
| Safe defaults | Warnings and disclaimers accompany high-risk assumptions or model inferences; UI encourages double-checking for safety-critical decisions |
Data, Constraints, and Dependencies
Data Sources
| Source | Type | Access Method | License / Terms | Notes |
|---|---|---|---|---|
| Requirements Repository (e.g., DOORS/ReqIF, Jama, Polarion) | Internal requirements | API / export connectors | Organization internal data policy | Includes functional, non-functional, safety, and regulatory requirements |
| Design Repository (e.g., Confluence, SharePoint, Model Repos) | Internal design docs/models | API / file connectors | Internal policies | Architecture docs, interface specs, models (SysML/UML, Simulink, etc.) |
| Code Repositories (e.g., Git) | Source code | Git APIs / SSH / mirrors | Internal policies, OSS licenses | Implementation-level evidence and design patterns |
| Test Management & CI Logs | Test cases, results, logs | APIs / file ingestion | Internal policies | Evidence for verification and validation |
| Standards & Regulations (e.g., PDFs from standards bodies) | External standards | Secure document store / curated library | Vendor or standards body licenses | Normative references for compliance |
Technical Constraints
| ID | Constraint | Description | Impact |
|---|---|---|---|
| C-1 | Compute budget | Fixed GPU/CPU budget per month | Limits model sizes, context windows, and concurrency targets |
| C-2 | Data residency | Engineering data must remain in specific regions | Influences deployment topology and cloud/on-prem splits |
| C-3 | Approved model list | Only vetted LLM families may be used for regulated domains | Restricts experimentation with new models in production |
| C-4 | Network egress limitations | Limits on external requests from production clusters | Encourages local indexing and caching; external web access may be disabled per tenant |
External Dependencies
| Dependency | Type | Owner / Provider | Risk if Unavailable |
|---|---|---|---|
| Lifecycle Tool APIs (e.g., DOORS, Jira, Jama, ALM) | External/internal | Tool owners / IT | High – degraded traceability and integration |
| Identity Provider | Internal | IT / Security | High – engineers cannot authenticate or access project data |
| Vector Database | Internal | Data Platform | High – retrieval becomes limited or slow |
| Logging / Metrics | Internal | SRE / Platform | Medium – reduced observability and compliance auditability |
Competitive Landscape
| Competitor / Tool | Segment | Key Capabilities | Weaknesses / Gaps | Our Differentiators |
|---|---|---|---|---|
| Traditional ELM + RM tools (e.g., DOORS, Jama, Polarion) | Requirements & lifecycle management | Robust requirements storage, baselines, trace fields | Heavy manual maintenance, weak AI/automation, limited cross-repo intelligence | Deep Evidence Agents add automated evidence gathering, impact analysis, and reasoning on top of existing ELM stacks |
| Generic LLM Chat / Copilot tools | Generic coding and Q&A assistants | Conversational answers, code completions | Weak provenance, no explicit engineering traceability | Engineering-specific workflows, strict provenance and trace graphs |
| Enterprise search & discovery tools | Enterprise-wide search and basic analytics | Indexing and search across document repositories | Limited multi-step reasoning, no engineering semantics | Engineering domain models, multi-agent reasoning, and standards-aware evidence handling |
| Synera and similar AI-in-engineering platforms | Generative design & automation | Workflow automation, simulation and optimization in CAD/CAE | Less focused on textual requirements, traceability, and evidence-grounded reasoning | Focus on deep evidence and requirements/traceability workflows, complementing simulation and design automation |
Assumptions
| ID | Assumption | Rationale | Risk if False |
|---|---|---|---|
| A-1 | Access to core internal engineering artifacts | Required for meaningful traceability and impact analysis | Platform may be perceived as “toy” or superficial |
| A-2 | Availability of GPU resources | Needed for LLM inference at target latency | System may need to fall back to degraded modes or smaller models |
| A-3 | Users possess baseline engineering literacy | Needed to interpret trace graphs, evidence, and limitations | Risk of misinterpreting agent outputs or over-trusting suggestions |
| A-4 | Internal data owners approve corpus ingestion | Required for use of requirements, design docs, code, and tests | Reduced value from missing key repositories; partial traceability |
Metrics and KPIs
Product KPIs
| KPI | Definition | Target | Measurement Method |
|---|---|---|---|
| Time-to-Impact-Analysis | Time from change request to acceptable impact report | ≥ 50% reduction vs baseline tools | Time-tracking and telemetry |
| Traceability Completeness | Fraction of requirements with links to design/code/tests | ≥ 95% for in-scope projects | Automated trace graph evaluation |
| Citation / Evidence Completeness | Fraction of engineering claims with at least one evidence link | ≥ 98% | Automated evaluation of reports and trace graphs |
| User Satisfaction (Engineers) | Average post-session rating (1–5) | ≥ 4.5 | In-product feedback prompts |
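The Traceability Completeness KPI above is mechanically checkable from the trace graph. A minimal sketch of the computation, assuming links are stored as `(source, target)` pairs with requirements as sources:

```python
def traceability_completeness(requirements: set[str], links: list[tuple[str, str]]) -> float:
    """Fraction of requirements that are the source of at least one trace link."""
    traced = {src for src, _ in links}
    return len(requirements & traced) / len(requirements) if requirements else 0.0

reqs = {"REQ-1", "REQ-2", "REQ-3", "REQ-4"}
links = [("REQ-1", "DES-1"), ("REQ-2", "T-9"), ("REQ-3", "CODE-a")]
coverage = traceability_completeness(reqs, links)  # 3 of 4 requirements traced
```

Here REQ-4 has no outgoing link, so coverage is 0.75 — below the ≥ 95% target, and exactly the kind of gap the automated trace graph evaluation would flag.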
Business KPIs
| KPI | Definition | Target | Measurement Method |
|---|---|---|---|
| Engineering Team Adoption | Number of engineering teams actively using tool | 5+ pilot teams in first year, 10+ in second year | Account and usage data |
| Reduction in Audit Findings | Percentage reduction in traceability-related findings from audits | ≥ 30% reduction vs prior cycle | Audit reports and CAPA tracking |
| Retention | Users returning within 30 days | ≥ 60% for target cohorts | Cohort retention analysis |
Security, Privacy, Compliance, and Governance
Security Requirements
| ID | Requirement | Description | Priority |
|---|---|---|---|
| SEC-1 | Authentication & Authorization | All access uses organization SSO and role-based access controls aligned with engineering project structures | P0 |
| SEC-2 | Encryption | Engineering data in transit and at rest encrypted per organizational and regulatory standards | P0 |
| SEC-3 | Tenant / Project Isolation | Sessions and data isolated by tenant, program, and project | P0 |
| SEC-4 | Secure tool invocation | Tools (retrieval, SCM, lifecycle systems) follow allowlists and policies; no arbitrary code execution on production systems | P0 |
Privacy Requirements
| ID | Requirement | Description | Applicable Regulations |
|---|---|---|---|
| PRIV-1 | PII Handling | Clearly define and limit how PII appears in engineering corpora and logs | GDPR, CCPA, internal |
| PRIV-2 | Data Minimization | Collect and store only necessary user/session and artifact metadata | GDPR principles |
| PRIV-3 | Retention & Deletion | Implement retention windows and deletion workflows, including project-level data removal where required | Org data policy |
Compliance and Policy Alignment
| Policy / Regulation | Impact on Product | Required Controls |
|---|---|---|
| Internal AI Use Policy | Defines permitted AI use cases and limitations for engineering tools | Guardrails, logging, and periodic compliance review |
| Data Usage Policy | Governs use of internal engineering and external data | Source whitelists, access control, audit logs |
| Domain-specific rules | E.g., ISO 26262, DO-178C, IEC 61508, FDA submissions | Domain-specific disclaimers, process hooks, and explicit non-automation of regulated approval steps |
Risks and Mitigations
| ID | Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|---|
| R-1 | Hallucinations / unsupported claims in engineering outputs | Medium | High | Strict evidence enforcement, critic agents, domain-specific evaluation datasets, and continuous evaluation |
| R-2 | External or lifecycle tool API outages / rate limits | Medium | Medium | Caching, local indexing, robust connectors, clear degraded-mode behavior |
| R-3 | Legal / licensing issues with standards and proprietary docs | Low | High | Legal review of data sources, explicit license tracking, curated ingestion |
| R-4 | User over-reliance on AI outputs for safety-critical decisions | Medium | High | Warnings, training, enforced human review steps, and domain-specific disclaimers |
| R-5 | Data leakage or privacy violations | Low | High | Strong access controls, encryption, redaction, and regular security audits |
| R-6 | Model drift / quality degradation | Medium | Medium | Scheduled evaluation on engineering-specific benchmarks, model versioning, and retraining policies |
Release Plan and Roadmap
Phased Delivery
| Phase | Timeframe | Scope | Key Deliverables | Exit Criteria |
|---|---|---|---|---|
| Phase 1 – MVP | Q1–Q2 | Single-agent retrieval + summarization; basic engineering provenance | Prototype UI, connectors to at least one requirements repo and code repo, initial evaluation suite | 1–2 pilot engineering teams, acceptable quality and usability feedback |
| Phase 2 – Multi-Agent Engineering | Q3 | Planner, researcher, critic agents; improved traceability and impact analysis | Multi-agent orchestrator, initial trace graph model, engineering-focused evaluation | Accuracy, coverage, and usability targets met in traceability and impact analysis tasks |
| Phase 3 – Enterprise Rollout | Q4 | Scaling, governance, domain models, monitoring and alerting | Production deployment, SLO dashboards, access controls, support for multiple projects and tenants | SLA met, 5+ engineering teams fully onboarded with positive ROI metrics |
Key Dependencies
| Item | Description | Owner / Team | Needed By |
|---|---|---|---|
| Vector search & Graph store | Indexing and graph-based retrieval for engineering artifacts | Data Platform | Phase 2 |
| Model serving platform | Hosting and scaling of LLMs | ML Platform | Phase 1 |
| Lifecycle connectors | Connectors to requirements, design, code, and test tools | Platform / Integration Team | Phase 1–2 |
| Observability stack | Logging, metrics, tracing | SRE / Platform | Phase 1 |
Open Questions
| ID | Question | Owner | Target Resolution Date |
|---|---|---|---|
| OQ-1 | Which primary LLM family to standardize on for engineering tasks? | ML Platform Lead | TBD |
| OQ-2 | Which long-term storage format and technology for provenance / trace graphs? | Data Architect | TBD |
| OQ-3 | Which domains and standards (e.g., ISO 26262 vs DO-178C) get domain-specialized models and evaluation first? | Product Lead | TBD |
Traceability to Architecture (ADD)
This section links key PRD items to architecture components and views described in the Architecture Description Document.
| PRD Item | Description | ADD Section / Component | Notes |
|---|---|---|---|
| FR-1, FR-5 | Engineering question decomposition and multi-agent coordination | Planning & Coordination Module, Orchestrator | Multi-agent planning flows |
| FR-2, FR-3 | Engineering artifact retrieval and evidence extraction | Retrieval & Exploration Module, Ingestion Layer | Hybrid search, parsers, connectors |
| FR-4 | Provenance and traceability tracking | Evidence Management & Provenance Model | Graph or document store |
| NFR-1, NFR-6 | Accuracy, auditability | Evaluation & Monitoring | Quality metrics, logs, and dashboards |
| G-5, SEC-* | Safety and governance | Safety & Governance Layer | Guardrails, policies, and policy engine |
Appendix
Glossary
| Term | Definition |
|---|---|
| Agent | An autonomous process that uses models and tools to perform engineering tasks |
| Provenance | The trace from an output claim or decision back to the supporting sources and evidence (requirements, design, code, tests, standards) |
| RAG | Retrieval-Augmented Generation: combining retrieval with generative models |
| Corpus | A collection of engineering documents and artifacts used for retrieval and analysis |
| Session | A logically scoped, persistent interaction between a user and the platform |
| Traceability | Ability to follow the life of a requirement forwards and backwards through design, implementation, verification, and operation |
Reference Links
- Architecture Description Document: AI Deep Research / Deep Evidence Agent – Architecture Description v1
- UX and UI prototypes (e.g., Figma link)
- Data inventories and catalog entries for engineering repositories
- Evaluation frameworks and benchmark descriptions for engineering tasks
- Internal AI policy and data usage policy documents