PRD - Deep Evidence Agent

Problem Statement

Current Situation

AI agents offer unprecedented assistance to engineering organizations by continually monitoring and ingesting data repositories internal to the organization. In its most elaborate form, graph-based retrieval-augmented generation (GraphRAG) can build internal Knowledge Bases (KBs) and allow foundational Large Language Models (fLLMs) to generate information conditioned on those KBs. However, existing tools fall short of enabling deep, evidence-grounded generative AI tasks that ensure traceability, reliability, and truthfulness. Below are a few scenarios where current AI agents fail to deliver on their promises.

Countless hours are spent in large engineering organizations manually tracing design decisions back to original requirements documents. Take, for example, how a tool like IBM's Rational DOORS is used in safety-critical engineering domains to manage requirements and provide traceability to design and test artifacts.

The decades-old tool is, by many accounts, a glorified combination of a relational database, MS Word, and Excel. Current AI assistants can more or less replicate DOORS and, with custom plugins, even offer a seamless migration path off it, but only a handful of companies have the resources to invest in such a migration and in re-educating their engineering base. The conservative nature of engineers also stands in the way.

Today's Engineering Lifecycle Management (ELM) space is thirsty for an infusion of AI capabilities and a complete rethinking of engineering processes that are full of manual, tedious, and error-prone work: capturing functional and non-functional requirements; tracing design decisions, test results, and acceptance criteria; and constantly performing change-management tasks such as updating documentation in response to hundreds of factors that can introduce variations. Such error-prone engineering processes increase both product risk and, over time, technical debt, and can lead to very expensive redesign or patching much later.

Many counter that AI agents based on pretrained LLMs such as ChatGPT, Claude, and others can in fact be very helpful in engineering tasks. However, such models are not designed for deep evidence-grounded reasoning, and the very nature of pretraining at scale, on trillions of internet tokens, runs in the exact opposite direction from engineering AI agents, whose utility rests on specificity, accuracy, and traceability to evidence. Deep Evidence Agents (DEAs) need to be designed from the ground up to support engineering processes that are evidence-grounded, traceable, and auditable. These AI agents don't try to be pleasant, conversational assistants but rather focused, task-oriented researchers that help engineers do their work better, faster, and with higher confidence. They also know when to say "I don't know".

The table below summarizes the pain points faced by engineering organizations today when trying to use AI agents for deep engineering tasks:

Pain Point | Description | Impact
Information overload | Engineers make design decisions with incomplete information or assumptions | High
Manual evidence handling | Copy-paste of excerpts, tables, and figures into Word documents or slides | High
Lack of traceability | Claims not reliably linked to primary sources | High
Fragmented tooling | Search, note-taking, and reporting tools are disconnected | Medium
Limited reuse | Prior lessons learned are hard to discover and adapt for new questions or assumptions | Medium

Goals

ID | Goal | Description | Priority
G-1 | End-to-end engineering workflows | Engineering task → plan → search → evidence → synthesis → report | P0
G-2 | Provenance-aware engineering outputs | Every factual claim, design justification, or test conclusion traceable to one or more primary artifacts (requirements, design models, code, test reports, standards) | P0
G-3 | Multi-agent reasoning for engineering tasks | Planner, researcher, critic, and summarizer agents with clear coordination over engineering artifacts | P1
G-4 | Human-in-the-loop engineering review | Interactive refinement of plans, evidence sets, and drafts by engineers, architects, and QA | P0
G-5 | Safe and compliant model usage | Guardrails, policies, and logging aligned with engineering standards (where applicable) | P0
G-6 | Integration with engineering data lakes | Ability to use internal engineering corpora (requirements tools, PLM, code, test systems) under access-control policies | P1

User Personas

Systems Engineer (Primary)

Field | Value
Name | Systems Engineer
Role | Defines and refines system architecture, interfaces, and component behavior
Domain | Telecom, aerospace, automotive, medical devices, finance, and other safety-/mission-critical domains
Goals | Make design decisions consistent with requirements and standards; understand the impact of changes quickly
Frustrations | Manual traceability, ambiguous requirements, disconnected tools, and incomplete justification of design decisions
AI Familiarity | Medium to high
Success Metrics | Reduced effort to define and trace from requirements to implementation; fewer design defects linked to misunderstood or incorrect assumptions

Verification & Validation Engineer

Field | Value
Name | Verification & Validation Engineer
Role | Plans, designs, and executes tests to ensure requirements are met
Domain | Same as Systems Engineer
Goals | Confirm coverage against requirements and safety goals; understand the rationale for test selection and prioritization
Frustrations | Difficulty mapping tests to requirements and acceptance criteria; manual evidence compilation for audits
AI Familiarity | Medium
Success Metrics | Faster creation and maintenance of traceability matrices; smoother audits and reviews

Requirements Engineer

Field | Value
Name | Requirements Engineer
Role | Defines, maintains, and curates requirements and product lifecycle data
Domain | Any complex engineered system
Goals | Maintain consistent, non-contradictory requirements; manage change requests; ensure end-to-end traceability
Frustrations | High manual effort, inconsistent naming and IDs, poor impact-analysis tooling
AI Familiarity | Medium
Success Metrics | Reduced cycle time for change requests; improved completeness of traceability

Product Manager / Engineering Manager

Field | Value
Name | Product Manager / Engineering Manager
Role | Aligns engineering deliverables with business goals and roadmaps
Domain | Cross-cutting, across multiple product lines
Goals | Make decisions with clear, evidence-based tradeoffs; communicate the impact of changes
Frustrations | Fragmented information across teams and tools; lack of clear decision rationale
AI Familiarity | Medium
Success Metrics | Faster and more informed decision cycles; reduced surprises late in the product lifecycle

Use Cases

Use Case Summary (Engineering-Focused)

ID | Name | Description | Primary Persona | Priority
UC-1 | Requirements Traceability & Impact Analysis | Build and maintain bidirectional traceability between requirements, design, code, and tests; assess impact of changes | Requirements / PLM Engineer, Systems Engineer | P0
UC-2 | Design Decision Justification | Generate evidence-backed design decision records linking to requirements, alternatives, and standards | Systems / Design Engineer | P0
UC-3 | Standards & Compliance Dossier Assembly | Assemble compliance evidence packages from distributed artifacts | Verification & Validation Engineer, Compliance | P1
UC-4 | Incident / Defect Root-Cause Analysis | Investigate defects or incidents, linking them back to contributing requirements, design decisions, and tests | Systems Engineer, V&V Engineer | P1
UC-5 | Engineering Knowledge Capture & Reuse | Capture lessons learned, patterns, and prior analyses for reuse across projects | Engineering Manager | P1

Detailed Use Case – UC-1: Requirements Traceability & Impact Analysis

Goal

Maintain accurate, bidirectional traceability between requirements, design artifacts, source code, and tests, and perform impact analysis when requirements or design elements change.

Preconditions

  • User is authenticated and has access to the relevant project and repositories.
  • Requirements, design artifacts, code, and test results are accessible via configured connectors (e.g., DOORS/ReqIF, Jira, Git, test management tools).
  • Requirements have stable identifiers, at least at the project level.
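
The connector precondition can be expressed as a small configuration check. The configuration shape below is purely illustrative (keys, connector names, and endpoints are assumptions, not a shipped schema); the point is that each source system is declared once and validated before any analysis starts.

```python
# Illustrative connector configuration; keys, names, and endpoints are
# assumptions for this sketch, not a committed schema.
CONNECTORS = {
    "requirements": {
        "type": "reqif",  # e.g., a DOORS/ReqIF export connector
        "endpoint": "https://doors.example.internal/api",
        "project": "SUBSYSTEM-A",
    },
    "code": {
        "type": "git",
        "endpoint": "ssh://git.example.internal/subsystem-a.git",
    },
    "tests": {
        "type": "test-management",
        "endpoint": "https://tests.example.internal/api",
    },
}

def validate(config: dict) -> list[str]:
    """Check the precondition that every connector declares a type and endpoint."""
    errors = []
    for name, connector in config.items():
        for key in ("type", "endpoint"):
            if key not in connector:
                errors.append(f"{name}: missing {key}")
    return errors
```

A session would only proceed when `validate(CONNECTORS)` returns an empty list; otherwise the missing scopes are surfaced to the user, matching the partial-access edge case described below.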

Main Flow

  1. User selects a scope (e.g., project, subsystem, feature) and initiates a traceability or impact analysis request.
  2. Planner agent interprets the request (e.g., “show missing traces,” “analyze impact of requirement X changes”).
  3. Researcher agents fetch relevant artifacts (requirements, design docs, models, code, tests, defects) from configured tools.
  4. Evidence is extracted and aligned into candidate trace links (e.g., requirement → design section → code module → test case).
  5. Critic agent evaluates link quality and identifies gaps, inconsistencies, and suspicious links.
  6. Synthesizer agent generates:
    • A traceability matrix or graph.
    • A narrative summary of impact (e.g., “Requirement REQ-123 affects modules A/B/C and tests T-10/T-11”).
  7. User reviews, confirms, or edits links and impact assessment.
  8. System persists the curated trace links and optionally pushes updates back to lifecycle tools (where integration allows).
  9. User exports the traceability view or impact report for design reviews or change control boards.
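
The main flow above can be sketched as a simple coordination loop. All names here (`TraceLink`, `plan`, `research`, `critique`, `synthesize`) are illustrative assumptions, not the product's actual API; a real researcher agent would query the configured connectors rather than return fixed candidates.

```python
from dataclasses import dataclass

@dataclass
class TraceLink:
    """Candidate trace link, e.g. requirement -> code module or test case."""
    source: str
    target: str
    confidence: float

def plan(request: str) -> list[str]:
    # Step 2: planner agent decomposes the request into sub-tasks.
    return [f"fetch artifacts for: {request}", f"align evidence for: {request}"]

def research(tasks: list[str]) -> list[TraceLink]:
    # Steps 3-4: researcher agents fetch artifacts and propose candidate links.
    # Hard-coded here; a real implementation would call the connectors.
    return [
        TraceLink("REQ-123", "module_A", 0.92),
        TraceLink("REQ-123", "test_T10", 0.88),
        TraceLink("REQ-123", "module_Z", 0.41),  # weak, likely spurious
    ]

def critique(links: list[TraceLink], threshold: float = 0.6) -> list[TraceLink]:
    # Step 5: critic agent filters low-confidence links and flags gaps.
    return [link for link in links if link.confidence >= threshold]

def synthesize(links: list[TraceLink]) -> str:
    # Step 6: synthesizer agent produces a narrative impact summary.
    affected = ", ".join(link.target for link in links)
    return f"Requirement REQ-123 affects: {affected}"

def impact_analysis(request: str) -> tuple[list[TraceLink], str]:
    tasks = plan(request)             # step 2
    candidates = research(tasks)      # steps 3-4
    vetted = critique(candidates)     # step 5
    return vetted, synthesize(vetted) # step 6

links, summary = impact_analysis("analyze impact of REQ-123 changes")
```

The vetted links and summary would then go to the engineer for confirmation (step 7) before being persisted or exported (steps 8–9).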

Alternatives / Edge Cases

  • Artifacts are missing or inconsistent: system flags incomplete traces and suggests remediation actions.
  • Access is restricted for some repositories: system clearly indicates missing scopes and partial confidence.
  • Requirements are ambiguous or duplicate: system highlights potential duplicates/conflicts for human resolution.

Success Criteria

  • Trace coverage and correctness, as judged by subject-matter experts, meets agreed thresholds.
  • Time to perform impact analysis for a change request is significantly reduced versus baseline.
  • Audit or review findings report fewer traceability-related defects.

Functional Requirements

ID | Name | Description | Related Use Cases | Priority | Acceptance Criteria
FR-1 | Engineering Question Decomposition | Decompose engineering questions (e.g., impact, trace gaps, design trade-offs) into sub-questions and tasks | UC-1, UC-2, UC-4 | P0 | For representative engineering questions, plans are judged reasonable and complete by senior engineers
FR-2 | Engineering Artifact Retrieval | Perform hybrid retrieval (keyword + semantic + structured) over requirements, design docs, code, test results, standards | UC-1, UC-2, UC-3, UC-4 | P0 | Top-k artifacts include a majority of expert-identified relevant items in evaluation tasks
FR-3 | Engineering Evidence Extraction | Extract relevant fragments (requirements clauses, design sections, code snippets, test logs) as evidence units | UC-1, UC-2, UC-3, UC-4 | P0 | Extracted evidence aligns with relevant sections of source artifacts in sampled cases
FR-4 | Provenance and Traceability Tracking | Maintain explicit links between requirements, design decisions, code modules, tests, and defects | UC-1, UC-2, UC-3, UC-4, UC-5 | P0 | Generated trace graphs and matrices have no "orphan" claims; all links resolve to primary artifacts
FR-5 | Multi-Agent Engineering Coordination | Coordinate planner, researcher, critic, and synthesizer agents to operate on engineering artifacts | All | P1 | Logs show coherent, explainable task allocation and agent outputs per session
FR-6 | Human-in-the-Loop Engineering Editing | Support engineer-driven refinement of plans, evidence sets, trace links, and reports | All | P0 | Users can accept/reject links and edits; system preserves revision history and allows re-run
FR-7 | Export and Tool Integration | Export outputs (trace matrices, impact reports, compliance dossiers) and integrate with lifecycle tools | UC-1, UC-2, UC-3, UC-5 | P1 | Users can export artifacts in agreed formats (CSV, PDF, ReqIF, JSON) and optionally sync with selected tools
FR-8 | Session and Project Management | Persist and resume engineering sessions and projects with full context | All | P0 | Users can resume sessions with preserved trace graphs, evidence, and notes across devices
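
FR-3 and FR-4 together imply a minimal data model: evidence units extracted from primary artifacts, and claims that must resolve to them. The sketch below uses assumed names (`EvidenceUnit`, `Claim` are illustrative, not a committed schema) and includes the "no orphan claims" check from FR-4's acceptance criteria.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EvidenceUnit:
    """A fragment extracted from a primary artifact (FR-3)."""
    artifact_id: str  # e.g. "REQ-123" or "design.md#interfaces"
    excerpt: str

@dataclass
class Claim:
    """A generated statement that must resolve to primary evidence (FR-4)."""
    text: str
    evidence: list[EvidenceUnit]

def orphan_claims(claims: list[Claim]) -> list[Claim]:
    """Return claims with no supporting evidence; FR-4 requires this to be empty."""
    return [claim for claim in claims if not claim.evidence]

ev = EvidenceUnit("REQ-123", "The system shall log all access events.")
claims = [
    Claim("Module A implements access logging.", [ev]),
    Claim("Module B is unaffected.", []),  # orphan: fails the FR-4 check
]
```

A critic agent (FR-5) could run `orphan_claims` before synthesis and either route unsupported claims back for more research or drop them from the report.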

Non-Functional Requirements (NFR)

Quality Attributes

ID | Category | Requirement | Metric / Target
NFR-1 | Accuracy | Minimize unsupported or incorrect claims in engineering contexts | ≤ 2% of sampled claims lack valid supporting engineering evidence (requirements, design, tests, standards)
NFR-2 | Latency | Maintain interactive experience for light queries | p95 latency < 1 second for simple refinement operations
NFR-3 | Throughput | Support concurrent engineering sessions | ≥ 1000 active sessions in production environment
NFR-4 | Availability | Ensure high uptime for engineering-critical teams | ≥ 99.5% monthly availability
NFR-5 | Cost Efficiency | Keep average compute cost per engineering session bounded | Average GPU cost per engineering session below defined threshold
NFR-6 | Auditability | Provide full audit trail for engineering sessions | 100% of sessions have logs with tool calls, agents involved, and outputs
NFR-7 | Interpretability | Expose rationale for key actions and source choices | Engineers can inspect per-step reasoning summaries or logs and reconstruct key traces

UX / Usability

Requirement | Description
Clear engineering citation surfacing | Engineers can easily view and navigate to underlying artifacts (requirements, design docs, code, tests, standards) for each claim or link
Session continuity | Engineers can resume sessions from any device with preserved progress, including open analysis threads and partial trace graphs
Progressive disclosure | UI shows high-level summaries (e.g., impact overviews) first, with detailed trace graphs and evidence on demand
Safe defaults | Warnings and disclaimers accompany high-risk assumptions or model inferences; UI encourages double-checking for safety-critical decisions

Data, Constraints, and Dependencies

Data Sources

Source | Type | Access Method | License / Terms | Notes
Requirements Repository (e.g., DOORS/ReqIF, Jama, Polarion) | Internal requirements | API / export connectors | Organization internal data policy | Includes functional, non-functional, safety, and regulatory requirements
Design Repository (e.g., Confluence, SharePoint, model repos) | Internal design docs/models | API / file connectors | Internal policies | Architecture docs, interface specs, models (SysML/UML, Simulink, etc.)
Code Repositories (e.g., Git) | Source code | Git APIs / SSH / mirrors | Internal policies, OSS licenses | Implementation-level evidence and design patterns
Test Management & CI Logs | Test cases, results, logs | APIs / file ingestion | Internal policies | Evidence for verification and validation
Standards & Regulations (e.g., PDFs from standards bodies) | External standards | Secure document store / curated library | Vendor or standards body licenses | Normative references for compliance

Technical Constraints

ID | Constraint | Description | Impact
C-1 | Compute budget | Fixed GPU/CPU budget per month | Limits model sizes, context windows, and concurrency targets
C-2 | Data residency | Engineering data must remain in specific regions | Influences deployment topology and cloud/on-prem splits
C-3 | Approved model list | Only vetted LLM families may be used for regulated domains | Restricts experimentation with new models in production
C-4 | Network egress limitations | Limits on external requests from production clusters | Encourages local indexing and caching; external web access may be disabled per tenant

External Dependencies

Dependency | Type | Owner / Provider | Risk if Unavailable
Lifecycle Tool APIs (e.g., DOORS, Jira, Jama, ALM) | External/internal | Tool owners / IT | High – degraded traceability and integration
Identity Provider | Internal | IT / Security | High – engineers cannot authenticate or access project data
Vector Database | Internal | Data Platform | High – retrieval becomes limited or slow
Logging / Metrics | Internal | SRE / Platform | Medium – reduced observability and compliance auditability

Competitive Landscape

Competitor / Tool | Segment | Key Capabilities | Weaknesses / Gaps | Our Differentiators
Traditional ELM + RM tools (e.g., DOORS, Jama, Polarion) | Requirements & lifecycle management | Robust requirements storage, baselines, trace fields | Heavy manual maintenance, weak AI/automation, limited cross-repo intelligence | Deep Evidence Agents add automated evidence gathering, impact analysis, and reasoning on top of existing ELM stacks
Generic LLM chat / Copilot tools | Generic coding and Q&A assistants | Conversational answers, code completions | Weak provenance, no explicit engineering traceability | Engineering-specific workflows, strict provenance and trace graphs
Enterprise search & discovery tools | Enterprise-wide search and basic analytics | Indexing and search across document repositories | Limited multi-step reasoning, no engineering semantics | Engineering domain models, multi-agent reasoning, and standards-aware evidence handling
Synera and similar AI-in-engineering platforms | Generative design & automation | Workflow automation, simulation and optimization in CAD/CAE | Less focused on textual requirements, traceability, and evidence-grounded reasoning | Focus on deep evidence and requirements/traceability workflows, complementing simulation and design automation

Assumptions

ID | Assumption | Rationale | Risk if False
A-1 | Access to core internal engineering artifacts | Required for meaningful traceability and impact analysis | Platform may be perceived as "toy" or superficial
A-2 | Availability of GPU resources | Needed for LLM inference at target latency | System may need to fall back to degraded modes or smaller models
A-3 | Users possess baseline engineering literacy | Needed to interpret trace graphs, evidence, and limitations | Risk of misinterpreting agent outputs or over-trusting suggestions
A-4 | Internal data owners approve corpus ingestion | Required for use of requirements, design docs, code, and tests | Reduced value from missing key repositories; partial traceability

Metrics and KPIs

Product KPIs

KPI | Definition | Target | Measurement Method
Time-to-Impact-Analysis | Time from change request to acceptable impact report | ≥ 50% reduction vs baseline tools | Time-tracking and telemetry
Traceability Completeness | Fraction of requirements with links to design/code/tests | ≥ 95% for in-scope projects | Automated trace graph evaluation
Citation / Evidence Completeness | Fraction of engineering claims with at least one evidence link | ≥ 98% | Automated evaluation of reports and trace graphs
User Satisfaction (Engineers) | Average post-session rating (1–5) | ≥ 4.5 | In-product feedback prompts
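
The Traceability Completeness KPI can be computed directly from a trace graph. This sketch assumes a simple adjacency-map representation (requirement ID → set of linked design/code/test artifacts), which is an illustration rather than a committed storage format.

```python
def traceability_completeness(requirements: set[str],
                              links: dict[str, set[str]]) -> float:
    """Fraction of requirements with at least one link to design/code/tests."""
    if not requirements:
        return 1.0  # vacuously complete
    linked = sum(1 for req in requirements if links.get(req))
    return linked / len(requirements)

reqs = {"REQ-1", "REQ-2", "REQ-3", "REQ-4"}
trace_links = {
    "REQ-1": {"module_A", "test_T1"},
    "REQ-2": {"module_B"},
    "REQ-3": {"test_T2"},
    # REQ-4 has no links yet
}
score = traceability_completeness(reqs, trace_links)  # 0.75, below the 95% target
```

The Citation / Evidence Completeness KPI would be computed the same way, substituting generated claims for requirements and evidence links for trace links.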

Business KPIs

KPI | Definition | Target | Measurement Method
Engineering Team Adoption | Number of engineering teams actively using the tool | 5+ pilot teams in first year, 10+ in second year | Account and usage data
Reduction in Audit Findings | Percentage reduction in traceability-related findings from audits | ≥ 30% reduction vs prior cycle | Audit reports and CAPA tracking
Retention | Users returning within 30 days | ≥ 60% for target cohorts | Cohort retention analysis

Security, Privacy, Compliance, and Governance

Security Requirements

ID | Requirement | Description | Priority
SEC-1 | Authentication & Authorization | All access uses organization SSO and role-based access controls aligned with engineering project structures | P0
SEC-2 | Encryption | Engineering data in transit and at rest encrypted per organizational and regulatory standards | P0
SEC-3 | Tenant / Project Isolation | Sessions and data isolated by tenant, program, and project | P0
SEC-4 | Secure Tool Invocation | Tools (retrieval, SCM, lifecycle systems) follow allowlists and policies; no arbitrary code execution on production systems | P0

Privacy Requirements

ID | Requirement | Description | Applicable Regulations
PRIV-1 | PII Handling | Clearly define and limit how PII appears in engineering corpora and logs | GDPR, CCPA, internal
PRIV-2 | Data Minimization | Collect and store only necessary user/session and artifact metadata | GDPR principles
PRIV-3 | Retention & Deletion | Implement retention windows and deletion workflows, including project-level data removal where required | Org data policy

Compliance and Policy Alignment

Policy / Regulation | Impact on Product | Required Controls
Internal AI Use Policy | Defines permitted AI use cases and limitations for engineering tools | Guardrails, logging, and periodic compliance review
Data Usage Policy | Governs use of internal engineering and external data | Source whitelists, access control, audit logs
Domain-specific rules | E.g., ISO 26262, DO-178C, IEC 61508, FDA submissions | Domain-specific disclaimers, process hooks, and explicit non-automation of regulated approval steps

Risks and Mitigations

ID | Risk | Likelihood | Impact | Mitigation
R-1 | Hallucinations / unsupported claims in engineering outputs | Medium | High | Strict evidence enforcement, critic agents, domain-specific evaluation datasets, and continuous evaluation
R-2 | External or lifecycle tool API outages / rate limits | Medium | Medium | Caching, local indexing, robust connectors, clear degraded-mode behavior
R-3 | Legal / licensing issues with standards and proprietary docs | Low | High | Legal review of data sources, explicit license tracking, curated ingestion
R-4 | User over-reliance on AI outputs for safety-critical decisions | Medium | High | Warnings, training, enforced human review steps, and domain-specific disclaimers
R-5 | Data leakage or privacy violations | Low | High | Strong access controls, encryption, redaction, and regular security audits
R-6 | Model drift / quality degradation | Medium | Medium | Scheduled evaluation on engineering-specific benchmarks, model versioning, and retraining policies

Release Plan and Roadmap

Phased Delivery

Phase | Timeframe | Scope | Key Deliverables | Exit Criteria
Phase 1 – MVP | Q1–Q2 | Single-agent retrieval + summarization; basic engineering provenance | Prototype UI, connectors to at least one requirements repo and code repo, initial evaluation suite | 1–2 pilot engineering teams, acceptable quality and usability feedback
Phase 2 – Multi-Agent Engineering | Q3 | Planner, researcher, critic agents; improved traceability and impact analysis | Multi-agent orchestrator, initial trace graph model, engineering-focused evaluation | Accuracy, coverage, and usability targets met in traceability and impact analysis tasks
Phase 3 – Enterprise Rollout | Q4 | Scaling, governance, domain models, monitoring and alerting | Production deployment, SLO dashboards, access controls, support for multiple projects and tenants | SLA met, 5+ engineering teams fully onboarded with positive ROI metrics

Dependencies on Architecture / Platform

Item | Description | Owner / Team | Needed By
Vector search & graph store | Indexing and graph-based retrieval for engineering artifacts | Data Platform | Phase 2
Model serving platform | Hosting and scaling of LLMs | ML Platform | Phase 1
Lifecycle connectors | Connectors to requirements, design, code, and test tools | Platform / Integration Team | Phase 1–2
Observability stack | Logging, metrics, tracing | SRE / Platform | Phase 1

Open Questions

ID | Question | Owner | Target Resolution Date
OQ-1 | Which primary LLM family to standardize on for engineering tasks? | ML Platform Lead | TBD
OQ-2 | Which long-term storage format and technology for provenance / trace graphs? | Data Architect | TBD
OQ-3 | Which domains and standards (e.g., ISO 26262 vs DO-178C) get domain-specialized models and evaluation first? | Product Lead | TBD

Traceability to Architecture (ADD)

This section links key PRD items to architecture components and views described in the Architecture Description Document.

PRD Item | Description | ADD Section / Component | Notes
FR-1, FR-5 | Engineering question decomposition and multi-agent coordination | Planning & Coordination Module, Orchestrator | Multi-agent planning flows
FR-2, FR-3 | Engineering artifact retrieval and evidence extraction | Retrieval & Exploration Module, Ingestion Layer | Hybrid search, parsers, connectors
FR-4 | Provenance and traceability tracking | Evidence Management & Provenance Model | Graph or document store
NFR-1, NFR-6 | Accuracy, auditability | Evaluation & Monitoring | Quality metrics, logs, and dashboards
G-5, SEC-* | Safety and governance | Safety & Governance Layer | Guardrails, policies, and policy engine

Appendix

Glossary

Term | Definition
Agent | An autonomous process that uses models and tools to perform engineering tasks
Provenance | The trace from an output claim or decision back to the supporting sources and evidence (requirements, design, code, tests, standards)
RAG | Retrieval-Augmented Generation: combining retrieval with generative models
Corpus | A collection of engineering documents and artifacts used for retrieval and analysis
Session | A logically scoped, persistent interaction between a user and the platform
Traceability | Ability to follow the life of a requirement forwards and backwards through design, implementation, verification, and operation

Related Documents

  • Architecture Description Document: AI Deep Research / Deep Evidence Agent – Architecture Description v1
  • UX and UI prototypes (e.g., Figma link)
  • Data inventories and catalog entries for engineering repositories
  • Evaluation frameworks and benchmark descriptions for engineering tasks
  • Internal AI policy and data usage policy documents