due-diligence
investor-tips
ai-startup
Modern Due Diligence for AI-Native Startups

Adhrita Nowrin
Dec 10, 2025
Modern investors need to diligence AI-native startups differently. Models scale in unpredictable ways, costs fluctuate with provider APIs, and accuracy can drift overnight.
This framework outlines how to test model risk, data rights, infra cost sensitivity, evaluation reproducibility, and per-request unit economics before issuing a term sheet.
Key facts:
Cost to serve must be rebuilt from invoices and model parameters to verify positive contribution per request.
Accuracy claims require a seeded, reproducible evaluation harness with a stable gold set and clear acceptance thresholds.
Data rights and provenance need documented licences or user consent, plus opt-out and PII policies.
Provider concentration should be limited or paired with a credible portability plan that maintains quality and margin.
Safety and governance include abuse handling, audit trails, and incident response.
Definitions and formulas:
Cost to serve per request: (input tokens × input rate) + (output tokens × output rate) + per-request moderation/safety filtering and storage.
Contribution per request: price per request − cost to serve per request (see the worked sketch after this list).
Eval harness: Fixed prompts, seeded randomness, gold labels and scoring metrics to measure quality over time.
Provider concentration: Share of cost or traffic tied to a single model vendor.
SLO: Service level objective, typically p95 latency and success rate.
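The two formulas above can be rebuilt in a few lines of arithmetic. The sketch below uses Python with purely illustrative token counts, provider rates and overheads; none of these figures come from a real deal.

```python
# Minimal contribution-per-request calculation. All numbers below are
# illustrative assumptions, not real provider rates or deal figures.

input_tokens = 1_200              # average prompt tokens per request (assumed)
output_tokens = 400               # average completion tokens per request (assumed)
input_rate = 3.00 / 1_000_000     # $ per input token (assumed)
output_rate = 15.00 / 1_000_000   # $ per output token (assumed)
moderation_cost = 0.0004          # safety/moderation filtering per request (assumed)
storage_cost = 0.0001             # logging and storage per request (assumed)

cost_to_serve = (
    input_tokens * input_rate
    + output_tokens * output_rate
    + moderation_cost
    + storage_cost
)

price_per_request = 0.02          # blended price per request (assumed)
contribution = price_per_request - cost_to_serve

print(f"cost to serve: ${cost_to_serve:.4f}")
print(f"contribution:  ${contribution:.4f} ({contribution / price_per_request:.0%} margin)")
```

In this illustrative case the request clears roughly a 50% contribution margin; in diligence the same arithmetic should be driven by the company's actual invoices and traffic mix.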
Diligence pillars:
AI products can scale quickly, but margins and quality often collapse under real load. This checklist helps you validate durability before you issue a term sheet.
Model risk
Identify the foundation model, fine-tunes, and prompts that materially affect output.
Test portability: can they switch providers without breaking quality or margin? (A side-by-side check is sketched after this list.)
Review the roadmap for open-weight or on-prem options in case pricing shifts.
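A concrete way to pressure-test the portability claim is to run the same gold set against each candidate provider and compare quality and per-request cost side by side. The sketch below assumes hypothetical run_model and score callables standing in for whatever client code and metric the startup actually uses.

```python
# Hypothetical portability check: identical gold set, multiple providers.
# run_model(provider, prompt) and score(output, label) are assumed callables,
# not real SDK functions.

def portability_report(gold_set, providers, run_model, score, cost_per_request):
    """Compare average quality and cost per request for each provider."""
    report = {}
    for provider in providers:
        scores = [
            score(run_model(provider, example["prompt"]), example["label"])
            for example in gold_set
        ]
        report[provider] = {
            "avg_quality": sum(scores) / len(scores),
            "cost_per_request": cost_per_request[provider],
        }
    return report
```

A credible portability plan shows quality and cost within an acceptable band on at least one alternative provider, not just a stated intention to switch.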
Data rights and provenance
Confirm licences, user consent and opt-outs for training and fine-tuning data.
Audit PII handling, retention windows and data residency policies.
Request a data lineage sample linking inputs to outputs for governance maturity.
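The lineage sample does not need to be elaborate; a record that ties each training example to its source, licence basis, PII status and retention date is enough to demonstrate governance maturity. The field names below are illustrative, not a standard schema.

```python
from dataclasses import dataclass

@dataclass
class LineageRecord:
    """Illustrative data-lineage entry; field names are assumptions, not a standard."""
    example_id: str         # stable ID of the training or fine-tuning example
    source: str             # where the raw data came from (dataset, URL, customer)
    licence: str            # licence or consent basis covering this example
    consent_verified: bool  # user consent confirmed for training use
    contains_pii: bool      # flagged by the PII scan
    retention_until: str    # ISO date after which the record must be deleted

sample = LineageRecord(
    example_id="ex-0001",
    source="customer-upload:acme",
    licence="customer DPA with training opt-in",
    consent_verified=True,
    contains_pii=False,
    retention_until="2026-12-31",
)
```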
Infra cost sensitivity
Rebuild per-request cost from invoices, model parameters and moderation filters.
Stress test under expected load and latency targets.
Map cost step-ups at token or throughput thresholds that could erode margin.
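Step-ups are easiest to see when the provider's rate schedule is written down as tiers and total cost is recomputed at several volumes. The thresholds and rates below are assumptions for the sketch; the real schedule should come from the provider contract.

```python
# Hypothetical tiered rate schedule: unit cost steps up once monthly volume
# crosses a threshold (e.g. a committed-use discount is exhausted and traffic
# spills onto on-demand pricing). Thresholds and rates are assumptions.

TIERS = [
    (50_000_000, 3.00),     # first 50M tokens/month at $3.00 per 1M
    (250_000_000, 4.50),    # next band up to 250M at $4.50 per 1M
    (float("inf"), 6.00),   # overflow beyond 250M at $6.00 per 1M
]

def monthly_token_cost(tokens: int) -> float:
    """Price each band of tokens at its own rate and sum the bands."""
    cost, remaining, floor = 0.0, tokens, 0
    for ceiling, rate_per_million in TIERS:
        band = min(remaining, ceiling - floor)
        cost += band / 1_000_000 * rate_per_million
        remaining -= band
        floor = ceiling
        if remaining <= 0:
            break
    return cost

# Recompute blended cost per 1M tokens at a few volumes to spot the step-up.
for volume in (40_000_000, 100_000_000, 400_000_000):
    print(volume, round(monthly_token_cost(volume) / volume * 1_000_000, 2), "$/1M tokens")
```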
Accuracy and evaluation
Require a seeded, reproducible evaluation harness with a fixed gold set (a minimal harness is sketched after this list).
Quantify lift from fine-tunes or RAG vs. base models.
Track drift across prompt and version changes.
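A minimal version of such a harness is sketched below, assuming a hypothetical model_fn callable that wraps the startup's model and exact-match scoring standing in for whatever metric they actually report.

```python
import hashlib
import json
import random

def run_eval(model_fn, gold_set, seed=42, threshold=0.85):
    """Score a fixed gold set with seeded randomness so reruns are comparable.

    model_fn(prompt) -> str is a hypothetical wrapper around the startup's model;
    each gold example is {"prompt": ..., "label": ...}. Exact match stands in for
    the harness's real scoring metric.
    """
    random.seed(seed)  # pin any sampling the harness itself performs
    hits = [
        model_fn(example["prompt"]).strip() == example["label"].strip()
        for example in gold_set
    ]
    accuracy = sum(hits) / len(hits)
    # Fingerprint the gold set so a rerun provably used the same data.
    gold_hash = hashlib.sha256(
        json.dumps(gold_set, sort_keys=True).encode()
    ).hexdigest()[:12]
    return {"accuracy": accuracy, "gold_set_hash": gold_hash, "passed": accuracy >= threshold}
```

Version the gold set and the harness together; a changed gold_set_hash between runs means the accuracy numbers are no longer comparable.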
Unit economics and pricing
Compute contribution per request and per customer at current mix.
Test price corridors with ±15% shifts and confirm margin stability (see the sensitivity sketch after this list).
Confirm that contribution margins hold as usage scales, without cutting back on safety or moderation systems.
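As a minimal sketch of the corridor test, the contribution margin can be recomputed at the low, mid and high ends of a ±15% price band; the price and cost figures reuse the illustrative numbers from the cost-to-serve example above.

```python
# Illustrative ±15% price-corridor check. A negative margin at the low end
# means discounting alone can push the product underwater.

def corridor_margins(price: float, cost_to_serve: float, shift: float = 0.15) -> dict:
    """Contribution margin at the low, mid and high ends of the price corridor."""
    margins = {}
    for label, factor in (("low", 1 - shift), ("mid", 1.0), ("high", 1 + shift)):
        shifted_price = price * factor
        margins[label] = (shifted_price - cost_to_serve) / shifted_price
    return margins

print(corridor_margins(price=0.02, cost_to_serve=0.0101))
# roughly 41% / 50% / 56% margin in this illustrative case
```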
Safety and governance
Review abuse detection, escalation and incident response playbooks.
Ensure audit trails exist for data access, prompt changes and model versions (a minimal logging sketch follows this list).
Validate compliance needs for the sector, for example SOC 2 or GDPR.
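For the audit-trail point, even a simple append-only log of who touched data, prompts or model versions goes a long way. The sketch below is one illustrative shape for such an entry, not a compliance standard.

```python
import datetime
import hashlib
import json

def append_audit_event(log_path: str, actor: str, action: str, detail: dict) -> None:
    """Append one audit event as a JSON line; the fields are illustrative.

    Typical actions worth logging: "data_access", "prompt_change",
    "model_version_promote", "incident_response".
    """
    event = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "actor": actor,
        "action": action,
        "detail": detail,
        # Hashing the payload makes later tampering easier to detect.
        "payload_sha256": hashlib.sha256(
            json.dumps(detail, sort_keys=True).encode()
        ).hexdigest(),
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(event) + "\n")
```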
Red flags that warrant a pause
No data rights on training sources or PII in training without consent.
Accuracy claims without a reproducible evaluation harness.
Margins turning negative at scale.
Heavy dependence on one provider with no portability plan.
FAQs
1) How do I verify accuracy claims quickly?
Use their evaluation harness with fixed prompts, seeded randomness and a gold set. Reproduce results independently.
2) How do I confirm cost to serve is real?
Rebuild it from provider invoices, model parameters, and moderation/storage costs, then compare to pricing per request.
3) What indicates defensibility beyond a popular base model?
Proprietary data rights, closed-loop feedback that improves quality, measurable lift from fine-tunes and proven portability across providers.
4) What should be in the first term sheet conditions for AI deals?
Data rights verification, version-controlled evaluation harness, and cost-to-serve reporting by segment.