The Platform that Changed Pharma

AI Drug Discovery in America · 2012 – 2026

The United States invented AI-first drug discovery. It deployed more capital, more talent, and more compute than any other country. It still has no FDA-approved AI-designed drug, and it is losing the productivity race to China. This is the audit.

01 · Executive Summary

The country that invented the field is no longer winning it.

Fourteen years after Atomwise shipped the first convolutional-neural-net virtual screener, the United States remains the undisputed capital and talent hub of AI drug discovery. Yet the sector's headline productivity records are being set 12,000 kilometres away.

The American Lead

The first dedicated AI drug-discovery company (Atomwise, 2012) was American. So were Recursion (2013), Insilico's original Baltimore incorporation (2014), and Relay (2016). AlphaFold 2 came out of DeepMind but its open-source release in July 2021 and 43,000+ citations rewired every US pharma computational chemistry team. By 2024 US-based AI-native biotechs had raised roughly $27–30 billion in venture and public equity, with another $50 billion+ in pharma-deal biobucks layered on top.

Two of the three 2024 Nobel laureates in Chemistry are American. Most of the sector's open-source models (RFdiffusion, ESM, Chroma) trace to US labs. NVIDIA's BioNeMo is an American product. The talent stack is not the problem.

The Productivity Problem

No US-origin AI-designed drug has received FDA approval as of Q2 2026. A decade into the wave, the sector has produced ~50 clinical-stage assets and zero NDAs. The best-funded US platform company, Recursion, has roughly 5 wholly-originated clinical assets despite raising north of $2 billion. Its phase-2 readout on REC-994 missed its primary endpoint in Q3 2024.

By comparison, a single global AI platform (Insilico Medicine) has filed 13 INDs, nominated 30+ PCCs, runs 9 clinical programmes (6 Phase 1, 3 Phase 2), and has logged 0 clinical failures on roughly one-third the capital. It generated $85.8M in 2024 revenue and $56.2M in H1 2025. Industry PCC-per-year records are being set in Suzhou, not Salt Lake City.

Cumulative VC + public equity, 2015–2025: $27–30B
Committed pharma biobucks: $80B+
AI-derived molecules in clinic globally: ~75
FDA approvals of AI-origin NME: 0
AlphaFold 2 citations (Nov 2025): ~43k
2024 Nobel laureates in chemistry: 3
Median AI biotech IPO drawdown from peak: –80%
US AI drug-discovery companies: 200+

The thesis of this report. Generating a hit is easy. Generating a quality Development Candidate package — the GLP tox, CMC, ADME/PK, formulation, stability, and safety pharmacology bundle required for an IND — is hard, slow, and expensive. Most US AI biotechs optimised for the part of the funnel AI accelerates (hit generation, SAR, selectivity) while underinvesting in the part AI barely touches (wet-lab integration, CRO orchestration, process chemistry). The companies that treated AI as software-to-sell rather than a pipeline-to-build have produced few drugs. The ones that integrated AI into an end-to-end wet-lab operation have produced many. That second pattern, so far, is almost entirely a Chinese phenomenon.

02 · The Leaderboard

Thirty-two companies. Three tiers. Ranked by output, not hype.

The table lists public tickers, founding years, HQs, cumulative funding, disclosed revenue, clinical-stage asset counts, pipeline metrics (DC / IND / Phase 1-2-3), novelty scores, and landmark deals. The Type column distinguishes platform-only from integrated pipeline plays, the central fault line of the US sector. The Status column encodes current trajectory.

Ranked #1 by pipeline output per dollar: Insilico Medicine. 30+ PCCs, 13 INDs, 6 Phase 1, 3 Phase 2, 0 failures on ~$700M raised – 4.3 PCCs per $100M deployed, the industry record.

# | Company (platform) | Status | Type | Founded | HQ | Funding (USD) | FY24 Rev | DCs | INDs | Ph1 | Ph2 | Ph3 | Novelty | Market Cap | Landmark Deal
1 | Insilico Medicine (Pharma.AI end-to-end) | 🟢 Active & Growing | Integrated | 2014 | Global (NYC · Boston · Abu Dhabi · Shanghai · Suzhou · HK · Montreal · Taipei) | ~$700M + HK$2.277B IPO (~$292M) | $85.8M | 30+ | 13 | 6 | 3 | 0 | ★★★★★ | ~$5B (HK$36B) | Lilly $2.75B · Sanofi $1.2B · Servier $888M · Menarini $550M+
2 | Recursion (Phenomics + Centaur, post-merger) | 🔴 Declining | Integrated | 2013 | Salt Lake City, UT | ~$2.2B+ (post-merger) | $58.8M | ~8 | ~10 | ~6 | ~4 | 0 | ★★★ | ~$2B | Roche $150M upfront / up to $12B (2021); Exscientia $688M acq. (2024)
3 | Relay Therapeutics (Motion-based / Dynamo) | 🟢 Active & Growing | Integrated | 2016 | Cambridge, MA | ~$980M | ~$10M | ~5 | 4 | 2 | 1 | 0 | ★★★★ | ~$500M | Genentech SHP2 $75M upfront / $695M (2020)
4 | Schrödinger (FEP+ / physics×ML) | 🟡 Active, Stable | Hybrid | 1990 | New York, NY | ~$380M | $204M | ~3 | 3 | 3 | 0 | 0 | ★★★★ | ~$1.75B | Novartis up to $2.3B, $150M upfront (2023)
5 | Tempus AI (Clinical genomics + AI) | 🟢 Active & Growing | Genomics | 2015 | Chicago, IL | ~$1.7B (pre-IPO $1.3B) | $693M | 0 | 0 | 0 | 0 | 0 | ★★★ | ~$9B | AstraZeneca $200M multimodal (2024)
6 | AbCellera (Microfluidics + ML) | 🟡 Active, Stable | Biologics | 2012 | Vancouver, BC | ~$700M | $27M | 2 | 1 | 1 | 0 | 0 | ★★★ | ~$850M | Lilly bamlanivimab (>$800M royalties)
7 | Absci, ABSI (Zero-shot antibody design) | 🟡 Active, Stable | Biologics | 2011 | Vancouver, WA | ~$600M | $1.9M | ~1 | 1 | 1 | 0 | 0 | ★★★ | ~$450M | AstraZeneca $247M biobucks (2023)
8 | BenevolentAI (Knowledge graph) | 🔴 Declining | Integrated | 2013 | London, UK | ~$292M | ~$8M | ~2 | 2 | 1 | 1 | 0 | ★★★ | ~$150M | AstraZeneca CKD/IPF up to $800M (2019)
- | Exscientia, acq. 2024 (Centaur / Manifold → RXRX $688M) | ⚫ Acquired → Recursion | Acquired | 2012 | Oxford, UK | ~$860M | ~$15M (partial) | 3 | ~5 | 3 | 0 | 0 | ★★★ | $688M exit | Sanofi $100M upfront / $5.2B (2022)
9 | Xaira Therapeutics (RFdiffusion + ESM) | 🟢 Active & Growing | Integrated | 2023 | SF Bay Area | $1B (launch) | n/a | 0 | 0 | 0 | 0 | 0 | ★★★★ | ~$2–3B (private) | Self-funded; no pharma deal disclosed
10 | Isomorphic Labs (AlphaFold 3) | 🟢 Active & Growing | Hybrid | 2021 | London (Alphabet) | ~$1B (Alphabet + $600M Thrive) | n/a | 0 | 0 | 0 | 0 | 0 | ★★★★ | Private | Lilly $1.7B + Novartis $1.2B (Jan 2024)
- | Generate Biomedicines (Chroma generative → Novartis) | ⚫ Acquired → Novartis ~$1B | Biologics | 2018 | Somerville, MA | ~$670M | n/a | 2 | 2 | 2 | 0 | 0 | ★★★ | ~$1B+ (Novartis acq) | Amgen 5 targets / $1.9B (2022)
15 | Insitro (ML + iPSC functional genomics) | 🟡 Active, Stable | Hybrid | 2018 | South SF, CA | ~$743M | n/a | 0 | 0 | 0 | 0 | 0 | ★★★ | ~$2.5B (2021 peak) | BMS ALS $50M / $2B (2020)
11 | Nimbus (Virtual pharma + Schrödinger) | 🟡 Active, Stable | Integrated | 2009 | Boston, MA | ~$710M | n/a | ~4 | 4 | 2 | 1 | 1 (via Takeda) | ★★★★ | $6.1B TYK2 exit | Takeda TYK2 $4B upfront / $6.1B (2022)
16 | Valo Health (Opal platform) | 🔴 Declining | Hybrid | 2019 | Boston, MA | ~$750M | n/a | ~3 | 3 | 1 | 2 | 0 | ★★ | ~$2.8B (2021 peak) | Novo Nordisk $60M / $4.6B cardiometabolic (2024)
14 | Genesis Therapeutics (GEMS / spatiotemporal GNN) | 🟡 Active, Stable | Integrated | 2019 | Burlingame, CA | ~$280M | n/a | ~1 | 0 | 0 | 0 | 0 | ★★★ | Private | Eli Lilly $670M/program (2024)
- | Atomwise (AtomNet CNN → Sanofi) | ⚫ Acquired → Sanofi | Platform | 2012 | San Francisco, CA | ~$174M | n/a | 0 | 0 | 0 | 0 | 0 | ★★ | ~$100M+ exit | Sanofi 5 targets / $1B+ (2022); acq. 2024
20 | Verge Genomics (CONVERGE human-tissue ML) | 🔴 Declining | Integrated | 2015 | South SF, CA | ~$150M | n/a | ~1 | 1 | 1 | 0 | 0 | ★★★ | Private | Eli Lilly $25M / $694M 4 targets (2021)
13 | Iambic (NeuralPLexer / OrbNet) | 🟡 Active, Stable | Integrated | 2020 | San Diego, CA | ~$220M | n/a | 1 | 1 | 1 (IAM1363) | 0 | 0 | ★★★ | Private | NVIDIA-backed; Lilly indirect (2024)
22 | tNova microwell chemistry | 🔵 Pre-Revenue/Early | Platform | 2018 | Monrovia, CA | ~$120M | n/a | 0 | 0 | 0 | 0 | 0 | ★★ | Private | BMS platform deal (2024)
28 | DEL + ML | 🔵 Pre-Revenue/Early | Platform | 2020 | Cambridge, MA | ~$46M | n/a | 0 | 0 | 0 | 0 | 0 | ★★ | Private | Undisclosed pharma collabs
24 | Allen Institute spinout | 🔵 Pre-Revenue/Early | Integrated | 2021 | Seattle, WA | ~$115M | n/a | 0 | 0 | 0 | 0 | 0 | ★★ | Private |
25 | Milliner closed-loop Ab | 🟡 Active, Stable | Biologics | 2019 | San Mateo, CA | ~$95M | n/a | 0 | 0 | 0 | 0 | 0 | ★★ | Private | Amgen antibody deal (2023)
27 | AI + cryo-EM | 🔵 Pre-Revenue/Early | Biologics | 2020 | Burnaby, BC | ~$60M | n/a | 0 | 0 | 0 | 0 | 0 | ★★ | Private | Undisclosed
18 | Cell-state AI (Flagship) | 🟡 Active, Stable | Hybrid | 2017 | Cambridge, MA | ~$230M | n/a | 0 | 0 | 0 | 0 | 0 | ★★★ | Private | Novo Nordisk (undisclosed)
19 | RNA therapeutics AI | 🔴 Declining | Hybrid | 2015 | Toronto, ON | ~$230M | n/a | 0 | 0 | 0 | 0 | 0 | ★★ | Private |
29 | EVA closed-loop Ab | 🟡 Active, Stable | Biologics | 2012 | London, UK | ~$46M | n/a | 0 | 0 | 0 | 0 | 0 | ★★ | Private |
30 | Protein LLMs / OpenCRISPR | 🔵 Pre-Revenue/Early | Platform | 2022 | Berkeley, CA | ~$44M | n/a | 0 | 0 | 0 | 0 | 0 | ★★★ | Private |
26 | Protein design SaaS | 🟡 Active, Stable | Platform | 2021 | Amsterdam / ZRH | ~$97M | n/a | 0 | 0 | 0 | 0 | 0 | ★★ | Private | Johnson & Johnson, Novo (SaaS)
31 | NVIDIA-backed protein design | 🔵 Pre-Revenue/Early | Platform | 2019 | Chicago, IL | ~$80M | n/a | 0 | 0 | 0 | 0 | 0 | ★★ | Private |
- | Morphic (Schrödinger-founded → LLY $3.2B) | ⚫ Acquired → Lilly $3.2B | Acquired | 2015 | Waltham, MA | ~$400M | n/a | 2 | 2 | 1 | 1 | 0 | ★★★ | $3.2B exit | Eli Lilly cash acquisition (Aug 2024)
17 | Evotec (PanHunter / PanOmics) | 🟡 Active, Stable | Platform | 1993 | Hamburg, DE | | €770M | 0 | 0 | 0 | 0 | 0 | ★★ | ~€1.6B | BMS TPD $200M upfront (2022)

Figures are best-available estimates as of May 2026 based on SEC filings, investor-relations pages, and aggregator databases (Deep Pharma Intelligence, BiopharmaTrend, Crunchbase). Clinical-stage counts include Phase 1 and later; wholly-owned assets only except where noted. Market caps are point-in-time and volatile. Activity status was assessed by an ensemble of LLMs.

03 · Performance Scorecard

The metrics that matter — and how US platforms score.

A decade of AI drug discovery has generated an enormous amount of marketing language about productivity. The numbers below are the ones that actually matter for investors, regulators, and patients: speed to PCC, cost per IND, phase-transition rates, DC package completion, and pipeline output per dollar deployed.

Traditional target → approval: 12–15 yr. DiMasi/Tufts CSDD 2016; $2.6B capitalised cost per NME.
Big-pharma R&D IRR (Deloitte 2022): 1.2%. Worst in 13 years. Eroom's Law still compounding.
AI target → PCC (Insilico benchmark): 12–18 mo. Vs 3–5 years traditional. Best public benchmark in the sector.
Promised AI cost per NME: <$500M. Vs $2.6B traditional. No AI drug has reached approval yet to test the claim.

Time to Preclinical Candidate, by company

Traditional pharma: 48–60 months
Recursion: 24–30 months
Exscientia (pre-acq.): ~24 months
Relay Therapeutics: 24–28 months
Iambic (IAM1363): 24 months
Insilico Medicine: 12–18 months

Company-disclosed figures (2020–2024). Insilico's benchmark (INS018_055, ISM3412) is the publicly verified industry record.

Cost per IND, AI vs traditional

Traditional NME: $80–100M
US AI biotech (avg): $15–25M
China-integrated AI: $3–5M

External R&D spend from target nomination to IND-enabling package. Traditional figure from Paul et al. Nat Rev Drug Discov 2010 inflated to 2024 dollars. China figure from Insilico HKEX prospectus (~$2.6M external spend per PCC; $3–5M to full IND).

Phase transition success rates

P1→P2 traditional: 52%
P1→P2 AI-derived (n=24): 80–90%
P2→P3 traditional: 29%
P2→P3 AI-derived: ~40%
P3→approval traditional: 58%
P3→approval AI-derived: n/a (0 AI NMEs approved)

BIO Industry Analysis (2011–2020) for traditional; Jayatunga et al., Drug Discovery Today 2024, for AI. Caveat: AI sample size small (n=24) and biased toward well-validated targets. The Phase 1 uplift is real (better potency/ADME design). Phase 2 tests biology, not chemistry, so AI's lift narrows. Phase 3 remains untested.
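
The quoted transition rates compound multiplicatively, so it helps to see what they imply for a programme entering Phase 1. The sketch below multiplies the figures above, assuming each transition is independent (a simplification) and using hypothetical names such as `cumulative_success`. It illustrates the arithmetic only and does not reproduce either cited analysis.

```python
from math import prod

# Phase-transition rates quoted above, as fractions.
TRADITIONAL = [0.52, 0.29, 0.58]   # P1->P2, P2->P3, P3->approval
AI_LOW = [0.80, 0.40]              # P1->P2 (low end), P2->P3; Phase 3 untested
AI_HIGH = [0.90, 0.40]             # P1->P2 (high end), P2->P3


def cumulative_success(rates):
    """Chance of clearing every transition in `rates`, assuming independence."""
    return prod(rates)


# Probability that a Phase 1 entrant reaches Phase 3.
trad_to_p3 = cumulative_success(TRADITIONAL[:2])
ai_to_p3_low = cumulative_success(AI_LOW)
ai_to_p3_high = cumulative_success(AI_HIGH)

# Only the traditional path can be carried through to approval.
trad_to_approval = cumulative_success(TRADITIONAL)

print(f"Traditional, Phase 1 to Phase 3: {trad_to_p3:.1%}")
print(f"AI-derived, Phase 1 to Phase 3: {ai_to_p3_low:.1%} to {ai_to_p3_high:.1%}")
print(f"Traditional, Phase 1 to approval: {trad_to_approval:.1%}")
```

On these figures, roughly 15% of traditional Phase 1 entrants reach Phase 3 against 32–36% of AI-derived ones, and about 9% of traditional entrants reach approval. The AI-derived path cannot be extended further because no Phase 3 data exist.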

Pipeline productivity: PCCs nominated per $100M deployed

Insilico Medicine: ~4.3 PCCs / $100M
Exscientia (pre-acq.): ~1.1 PCCs / $100M
Relay Therapeutics: ~0.7 PCCs / $100M
Recursion: ~0.6 PCCs / $100M
BenevolentAI: ~0.3 PCCs / $100M

Cumulative PCCs disclosed divided by cumulative capital raised (VC + IPO + follow-on, through 2024). Insilico leads on roughly every productivity axis — the direct result of operating an integrated wet-lab stack inside China's CRO infrastructure.
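
The productivity metric is a simple ratio, and a short sketch makes the definition unambiguous. The code below computes it from the Insilico inputs quoted in this report (30+ PCCs on ~$700M raised) and then ranks the ratios as stated in the chart; the function and variable names are illustrative, and the reciprocal "capital per PCC" view is an added convenience rather than a figure from the report.

```python
def pccs_per_100m(pccs_disclosed: float, capital_raised_musd: float) -> float:
    """PCCs nominated per $100M of cumulative capital raised (capital in $M)."""
    if capital_raised_musd <= 0:
        raise ValueError("capital raised must be positive")
    return pccs_disclosed / capital_raised_musd * 100.0


# Insilico inputs quoted in this report: 30+ PCCs on ~$700M raised.
insilico_ratio = pccs_per_100m(30, 700)

# Ratios as stated in the chart above, used here only for ranking.
STATED_RATIOS = {
    "Insilico Medicine": 4.3,
    "Exscientia (pre-acq.)": 1.1,
    "Relay Therapeutics": 0.7,
    "Recursion": 0.6,
    "BenevolentAI": 0.3,
}

ranking = sorted(STATED_RATIOS, key=STATED_RATIOS.get, reverse=True)

# Capital consumed per PCC is the reciprocal view of the same metric.
musd_per_pcc = {name: 100.0 / ratio for name, ratio in STATED_RATIOS.items()}

print(f"Insilico: {insilico_ratio:.1f} PCCs per $100M")
for name in ranking:
    print(f"{name}: ~${musd_per_pcc[name]:.0f}M raised per PCC")
```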

Head-to-head: the four comparable platforms

Metric | Insilico (Global) | Recursion (US) | Relay (US) | Exscientia (UK)
HQ | Global (8 offices) | Salt Lake City | Cambridge, MA | Oxford (acq.)
Status | 🟢 Active & Growing | 🔴 Declining | 🟢 Active & Growing | ⚫ Acquired by RXRX
Founded | 2014 | 2013 | 2016 | 2012
Capital raised | ~$700M + HK$2.277B IPO (~$292M) | ~$1.5B+ | ~$980M | ~$860M
Market cap (2026) | ~$5B (HK$36B) | ~$2B | ~$500M | $688M exit
FY24 revenue | $85.8M | $58.8M | ~$10M | ~$15M (partial)
H1 2025 revenue | $56.2M | n/d | n/d | n/a
Employees | ~350 | ~1,200 (post-merger) | ~350 | ~400 (pre-acq.)
0-to-DC programs completed | 30+ | ~8 | ~5 | ~3
PCCs disclosed | 30+ | ~8 | ~5 | ~3
Pipeline programs | 40+ | ~20 | ~8 | ~10
Clinical-stage assets | 9 (6 Ph1 / 3 Ph2) | 10 (post-Exscientia) | 4 | 3
IND filings cumulative | 13 | ~10 | 4 | ~5
Clinical failures | 0 | 1 (REC-994 missed primary) | 1 (EXS21546 disc.) | 2 (DSP-1181, EXS21546)
Target → PCC | 12–18 mo | 24–30 mo | 24–28 mo | ~24 mo
External $ / PCC | ~$2.6M | ~$15–30M est. | ~$20–35M est. | ~$15–25M est.
Cost per IND | ~$3–5M (China) | ~$15–25M (US) | ~$20M (US) | ~$15M (UK)
Primary CRO geography | China + HK + APAC + US | US + limited APAC | US + EU | UK + US
Wet-lab owned | Life Star 1 robotic lab (Suzhou) | Imaging only (BioHive-2) | Modest | Modest (Oxford)
Novelty score | ★★★★★ | ★★★ | ★★★★ | ★★★
First AI drug in humans | ISM001-055 Feb 2022 | REC-994 (2021) | RLY-1971 (2019) | DSP-1181 Jan 2020 (disc.)

The productivity delta is not about talent or technology. Insilico's Pharma.AI stack (PandaOmics + Chemistry42 + inClinico) is comparable in capability to Recursion OS, Relay's Dynamo, and Exscientia's Centaur. The difference is that Insilico pairs its AI with a Chinese CRO+CDMO stack running 6–7 day weeks at one-fifth US loaded cost and a wholly-owned robotic wet-lab (Life Star 1, Suzhou). The US platforms that never vertically integrated paid for that choice in PCCs-per-dollar and INDs-per-year.

Insilico’s China Strategy – Competing Where Efficiency Wins

Insilico deliberately expanded into China to compete with efficient local companies. By establishing Life Star 1 (a robotic wet-lab in Suzhou), leveraging China’s CDE regulatory fast-track paths, and accessing provincial biotech cluster incentives (Zhangjiang, Suzhou BioBAY), Insilico achieved a cost structure of ~$3–5M per IND vs $15–25M in the US. This is not cost-cutting; it is strategic positioning at the intersection of AI capability and operational efficiency.

The 0-to-DC Data Flywheel

The most important data in drug discovery is generated between target identification (0) and development candidate (DC). This is where the real science happens: potency optimization, selectivity profiling, ADME/PK, in vivo efficacy, safety pharmacology. With 30+ completed 0-to-DC trajectories, Insilico has built the largest proprietary dataset of full drug discovery campaigns in the AI industry. Each completed program trains the next generation of models. This compounding data advantage — the 0-to-DC flywheel — makes each subsequent program faster and cheaper. No other AI drug company has this volume of end-to-end experimental data.

Democratizing AI Drug Discovery – Training the Next Generation

In a strategic shift, Insilico has begun providing training and benchmarking services to foundation model companies and LLM vendors — including Liquid Networks and others — to help them build drug-discovery-specific capabilities. The model: Insilico provides reinforcement learning signals, curated molecular datasets, and real-world experimental validation. Foundation model companies provide compute and architectural innovation. Insilico then tests the resulting models experimentally in its wet labs, closing the loop between in silico prediction and in vitro/in vivo reality. This positions Insilico as both a drug company and the industry’s benchmarking standard — the proving ground where AI models graduate from molecular generation to actual drug candidates.

04 · The Development Candidate Problem

AI wins the hit. The DC package wins the IND.

A Development Candidate is not a molecule. It is a dossier of roughly 25–40 preclinical studies that together justify the first human dose. The sequence below is what the FDA expects in an IND under 21 CFR 312.23. Each step is wet-lab work. AI accelerates step 1 dramatically. The rest of the pipeline has barely changed since 1995.

01 Potency & Selectivity: 3–4 mo, $0.3–0.8M
02 ADME / PK in vitro: 3–4 mo, $0.5–1.2M
03 In vivo PK (2 species): 4–6 mo, $0.8–2M
04 Efficacy models: 4–8 mo, $1–3M
05 Safety pharmacology: 3–5 mo, $0.8–2M
06 GLP tox (28-day, 2 sp.): 6–9 mo, $3–10M
07 CMC & GMP supply: 9–15 mo, $5–20M
08 Genotoxicity + IND write: 3–5 mo, $0.5–1.5M
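
Summing the eight steps above gives a feel for how much of the package runs in calendar time. The sketch below adds the quoted ranges under two bounding assumptions that are not the report's own: fully serial execution (an upper bound on duration) and fully parallel execution (a lower bound set by the longest single step). Names such as `DC_STEPS` are illustrative.

```python
# (step, min months, max months, min cost $M, max cost $M), as listed above.
DC_STEPS = [
    ("Potency & Selectivity", 3, 4, 0.3, 0.8),
    ("ADME / PK in vitro", 3, 4, 0.5, 1.2),
    ("In vivo PK (2 species)", 4, 6, 0.8, 2.0),
    ("Efficacy models", 4, 8, 1.0, 3.0),
    ("Safety pharmacology", 3, 5, 0.8, 2.0),
    ("GLP tox (28-day, 2 sp.)", 6, 9, 3.0, 10.0),
    ("CMC & GMP supply", 9, 15, 5.0, 20.0),
    ("Genotoxicity + IND write", 3, 5, 0.5, 1.5),
]

# Upper bound on duration: every step waits for the previous one.
serial_months = (
    sum(step[1] for step in DC_STEPS),
    sum(step[2] for step in DC_STEPS),
)

# Lower bound on duration: everything overlaps, so the longest step dominates.
longest_step = max(DC_STEPS, key=lambda step: step[2])

# Cost adds up regardless of how the steps are scheduled.
total_cost_musd = (
    round(sum(step[3] for step in DC_STEPS), 1),
    round(sum(step[4] for step in DC_STEPS), 1),
)

print(f"Serial duration: {serial_months[0]} to {serial_months[1]} months")
print(f"Longest step: {longest_step[0]} ({longest_step[1]} to {longest_step[2]} months)")
print(f"Summed cost: ${total_cost_musd[0]}M to ${total_cost_musd[1]}M")
```

Run serially, the steps would take 35–56 months; the longest single step (CMC and GMP supply) takes 9–15 months. Any real programme timeline between those bounds therefore reflects how much of the work is overlapped. The summed step costs come to $11.9–40.5M.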

Why the DC package is the real bottleneck

A typical US integrated AI biotech takes 18–30 months and $20–50 million to go from preclinical candidate to IND submission. The bulk of that time is GLP toxicology (which runs in calendar time regardless of compute) and CMC scale-up (kilogram API synthesis, stability studies, formulation, release testing).

AI compresses target identification and hit-to-lead by 60–80%. It compresses GLP tox by roughly 0%. The wet-lab bottleneck is therefore a larger fraction of total timeline for AI companies than for traditional pharma — a counter-intuitive result that explains why US AI biotechs report similar target-to-IND times (24–36 months) to the better traditional pharma programmes despite much faster hit generation.

Why Chinese CROs shift the equation

Loaded med-chem FTE cost in China is $80–120k/year vs $250–400k in the US. GLP toxicology at NMPA-accredited CROs runs $1–3M vs $5–10M at Charles River or Covance. CMC turn-around on kilogram API is weeks in Suzhou, months in New Jersey. Chinese CROs run 6–7 day weeks and rotating 24-hour shifts on CMC programmes.

The compound effect: target→IND in 24 months for ~$5M in China vs 36 months for $20M+ in the US. For a platform running 30 programmes, that is the difference between 10 INDs a year and 3. It is also the difference between Insilico's output and Recursion's.
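
One way to read the "10 INDs a year versus 3" comparison is as throughput capped by whichever constraint binds first, calendar time or budget. The sketch below makes that explicit. The $50M annual IND budget is a hypothetical input chosen purely for illustration and does not come from the report; the programme count, timelines, and per-IND costs are the figures quoted above.

```python
def inds_per_year(programmes, months_to_ind, cost_per_ind_musd, annual_budget_musd):
    """Steady-state INDs per year, limited by calendar time or by budget."""
    calendar_limit = programmes / (months_to_ind / 12.0)
    budget_limit = annual_budget_musd / cost_per_ind_musd
    return min(calendar_limit, budget_limit)


ANNUAL_BUDGET_MUSD = 50.0  # hypothetical; not a figure from the report

# 30 programmes; 24 months and ~$5M per IND in China, 36 months and $20M in the US.
china = inds_per_year(30, 24, 5.0, ANNUAL_BUDGET_MUSD)
us = inds_per_year(30, 36, 20.0, ANNUAL_BUDGET_MUSD)

print(f"China-integrated: {china:.1f} INDs per year")
print(f"US: {us:.1f} INDs per year")
```

With that assumed budget, both cases are budget-limited: 10 INDs a year in China and 2.5 in the US. With an unlimited budget the calendar limits alone would give 15 versus 10, so under this reading most of the gap comes from cost per IND rather than from speed.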

The 21 CFR 312.23 IND checklist

Required modules

  • Form FDA 1571 — Cover sheet
  • Table of Contents
  • Introductory Statement & General Investigational Plan — 1-year forward view
  • Investigator's Brochure — integrated preclinical + clinical summary
  • Clinical Protocol — FIH Phase 1 design, stopping rules, dose-esc
  • CMC — identity, potency, purity, stability, manufacturing, release specs
  • Pharmacology & Toxicology — PK/ADME ≥2 species, GLP tox 14d + 28d rodent + non-rodent, safety pharm, genotoxicity battery
  • Previous Human Experience — if any

Typical wall-clock post-PCC

GLP tox studies: 12–18 months, $3–10M
CMC / GMP supply: 12–24 months, $5–20M
Full IND package: 18–30 months, $20–50M
Pre-IND FDA meeting: +3 months (optional, common)
IND submission: +30 day FDA review clock
First-in-human: typically 24–36 months post-PCC in US; 18–24 in China

Figures consolidated from Paul et al. Nat Rev Drug Discov 2010 inflated to 2024 dollars, DiMasi 2016, and disclosed programme budgets from Recursion, Relay, Schrödinger, Insilico.

The strategic implication for US AI biotechs. Pure AI speed is not a moat if you run the wet-lab portion of your programme through the same US CRO network the incumbents use. The bottleneck rewards vertical integration (Recursion's BioHive-2, Insilico's Life Star), geographic arbitrage (Insilico HK/Suzhou), or exit to big pharma before the DC bill lands (Nimbus/TYK2 is the canonical example). Platform-only business models — Atomwise, Schrödinger, Isomorphic — effectively subcontract the expensive phase to partners while booking smaller upfronts and longer-dated biobucks.

05 · Historical Timeline

Fourteen years, four waves, one thesis under pressure.

The US AI drug-discovery story has four distinct eras: the CNN pioneers (2012–2016), the generative + phenomics wave (2016–2020), the AlphaFold / IPO frenzy (2020–2022), and the LLM era with its consolidation correction (2022–2026). The timeline below maps every landmark event by dollar, deal, or data point.

2012
Atomwise founded (first mover)
Abraham Heifets and Izhar Wallach launch Atomwise in San Francisco. AtomNet — the first CNN-based structure-based virtual screening system — becomes one of the earliest dedicated AI drug discovery platforms. Y Combinator W15. Eventually raises ~$174M across rounds, rebrands as Numerion Labs in 2026.
2012
Exscientia founded (UK, US-active)
Andrew Hopkins spins Exscientia out of the University of Dundee, pairing AI-driven design with DMTA cycles. Will become the first AI biotech to put a designed molecule into human trials (DSP-1181, Jan 2020, partnered with Sumitomo).
2013
Recursion founded
Chris Gibson, Blake Borgeson, and Dean Li found Recursion in Salt Lake City around a phenomics model: brute-force cellular imaging across millions of conditions, ML-derived morphological embeddings. Spun out of the University of Utah. Over the next decade raises ~$1.5B, builds BioHive-1 and BioHive-2 GPU clusters, and becomes the US sector's largest platform by capital and pipeline breadth.
2013
BenevolentAI founded (UK, US deals)
Ken Mulvany launches Stratified Medical (rebranded BenevolentAI 2014) with a text-mining + knowledge graph thesis. Will reach a peak private valuation of ~$2B before SPAC-listing in 2022 and collapsing >95%.
2014
Insilico Medicine founded (end-to-end)
Alex Zhavoronkov founds Insilico Medicine in Baltimore, pivoting a longevity-research thesis into a generative AI pharmacology company. Seminal 2016–2018 GAN/RL molecular-generation work is published with Alán Aspuru-Guzik at Harvard. By 2019 Insilico establishes a Hong Kong R&D hub and starts the China CRO integration that will later define its productivity edge.
2015
AtomNet preprint published
Atomwise posts "AtomNet: A Deep CNN for Bioactivity Prediction in Structure-based Drug Discovery" to arXiv (1510.02855). One of the first applications of deep learning to docking. Cited thousands of times; foundational to the CNN era.
2016
Relay Therapeutics founded
Third Rock spins Relay out of D.E. Shaw Research, licensing the Anton supercomputer heritage for "motion-based drug design". Raises ~$520M pre-IPO from Third Rock, SoftBank, GV, D.E. Shaw. Lead programme RLY-4008 (FGFR2, cholangiocarcinoma) will receive Breakthrough Therapy designation in 2023.
2017
BenevolentAI $115M round
BenevolentAI reaches a private valuation near $2B on a $115M raise, cementing early-era hype around text-mining/knowledge-graph approaches. Four years later it will have laid off half its staff.
2018
AlphaFold 1 wins CASP13 (inflection)
DeepMind's first competition entry tops overall CASP13 rankings in December. Nature paper in 2020. The signal: deep learning can do structural biology. Every pharma computational chemistry team rewrites its roadmap.
2018
Insitro founded
Daphne Koller (Coursera, ex-Calico) launches Insitro with an ML-plus-functional-genomics thesis and a $100M Series A from ARCH and a16z. Will later raise a total of ~$743M at a peak $2.5B valuation without producing an IND.
2018
Generate Biomedicines founded
Flagship Pioneering spins out Generate with a generative-protein thesis. Will later adopt a "Chroma" model analogue of David Baker's RFdiffusion, publish in Nature (2023), and ink a $1.9B Amgen deal in 2022.
2019
BenevolentAI / AstraZeneca $800M+ deal
AstraZeneca signs BenevolentAI for chronic kidney disease and heart failure targets. Structured as up to $800M+ in biobucks. Five years later, the collaboration is widely cited as the canonical example of knowledge-graph repurposing over-promising and under-delivering.
2020
AlphaFold 2 solves protein folding (seismic)
At CASP14 (Nov 2020), AlphaFold 2 posts a median GDT-TS of ~92, with experimental-quality accuracy on roughly two-thirds of targets. Nature paper and open-source code release July 2021; database expanded to 200M structures via EMBL-EBI (July 2022). ~43,000 citations by Nov 2025. Hassabis, Jumper, Baker win the 2024 Nobel Prize in Chemistry. Single most cited deep-learning paper of the decade.
2020
BenevolentAI / baricitinib COVID story
Feb 2020: BenevolentAI's knowledge graph suggests baricitinib (Olumiant, JAK1/2) for COVID-19. Lancet publication (Richardson et al.) leads to NIH ACTT-2, FDA EUA Nov 2020, full approval May 2022. The first "AI-suggested" COVID therapeutic — though strictly repurposing, not de novo design.
2020
IPO wave begins
Schrödinger (Feb 2020, $202M, SDGR). Relay Therapeutics (Jul 2020, $400M, RLAY). AbCellera (Dec 2020, $556M, ABCL). COVID-era liquidity meets AlphaFold narrative. Every AI biotech on deck reprices upward.
2021
Peak hype: $5.2B AI drug VC; four more IPOs
Third-party AI drug-discovery investment peaks at $5.2B (BCG). Recursion IPO April ($436M, RXRX, $2.9B val). Absci IPO July ($211M). Exscientia IPO October ($350M). Atomwise Series C ($123M). Insitro Series C ($400M). Exscientia's DSP-1181 becomes the first AI-designed molecule in a human clinical trial (Jan 2020, terminated 2021). Isomorphic Labs founded Nov 2021.
2022
Nimbus / Takeda TYK2 $6.1B record exit
Feb 2023 close (announced Dec 2022): Takeda acquires Nimbus Lakshmi (TYK2 allosteric inhibitor NDI-034858, now TAK-279) for $4B cash plus $2B in milestones. $6.1B total. Largest AI-adjacent deal ever, built on Schrödinger's physics-based platform inside Nimbus's virtual-pharma model. Sanofi-Insilico (Nov 2022) $21.5M upfront / $1.2B biobucks across 6 targets. Broader biotech sector collapses; XBI down ~50% peak-to-trough.
2023
Trough of disillusionment
AI drug VC falls to ~$2.2B. Exscientia's EXS21546 (A2A/A2B immuno-oncology) discontinued. BMS-Exscientia collaboration partially cancelled. BenevolentAI lays off 180 (~50%). Recursion REC-994 Phase 2 shows modest effects. Recursion acquires Cyclica and Valence Labs (May 2023, ~$47M + ~$40M all-stock) as bolt-ons. NVIDIA invests $50M in Recursion, launches BioNeMo.
2024
Consolidation year; biggest biobucks in sector history
Isomorphic Labs announces same-day $82.5M combined upfronts from Lilly ($45M) and Novartis ($37.5M) for up to $3B biobucks combined (Jan). Xaira Therapeutics launches with $1B seed (ARCH + Foresite, April), the largest biotech launch round in history. Tempus AI IPO (Jun, $410M, $6.1B val). Morphic Therapeutic acquired by Lilly ($3.2B cash, Aug). Recursion acquires Exscientia ($688M all-stock, announced Aug, closed Nov). Novartis reportedly acquires Generate Biomedicines (~$1B, late 2024). Insilico-Lilly deal announced ($115M upfront, $2.75B biobucks). Sanofi reportedly acquires Atomwise (~$100M+, 2024).
2024–25
AlphaFold 3 published; first FDA AI-drug guidance
AlphaFold 3 (Nature, May 2024): 50%+ improvement on protein-ligand-nucleic-acid complex prediction over prior methods. Hassabis and Jumper share half of the 2024 Nobel Prize in Chemistry; Baker receives the other half for computational protein design. FDA releases draft guidance "Considerations for the Use of AI to Support Regulatory Decision-Making for Drug and Biological Products" (Jan 2025), defining the Context-of-Use credibility framework.
2025
Insilico HKEX IPO; Recursion–Roche extension
Insilico Medicine lists on HKEX in late 2025 (ticker 3696, ~$292M raised). Becomes the first AI drug-discovery company to IPO in Hong Kong. Recursion extends Roche/Genentech partnership with a fresh $150M upfront tranche. FDA formalises the AI Council in CDER/CBER.
2026
R&D productivity records in China; BIOSECURE takes effect
Insilico-Lilly $2.75B biobucks deal formally announced March 2026 ($115M upfront). Insilico–Servier $888M (Jan 2026). PCC production records set by AI platforms operating in China. BIOSECURE Act implementation begins, restricting federal contract work with Chinese CROs (WuXi AppTec, BGI). US and China AI drug-discovery sectors formally bifurcate on regulatory terms.
06 · Deal Tracker

$80B+ in biobucks committed. One exit above $5B.

Every landmark AI drug-discovery deal involving a US sponsor, sorted by total potential value. "Biobucks" = upfront + research milestones + development milestones + sales milestones + royalties. Actual cash paid is usually 5–15% of headline value over the life of the contract.
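
The gap between headline biobucks and cash can be made concrete with the 5–15% rule of thumb stated above. The following sketch applies it to the sector-wide $80B+ figure and also computes the upfront share of a few deals quoted in this report. Function names are illustrative, and the 5–15% band is the report's heuristic rather than a contractual term of any deal.

```python
def expected_cash_range(headline_musd, low=0.05, high=0.15):
    """Cash likely paid over a contract's life, per the 5-15% heuristic."""
    return headline_musd * low, headline_musd * high


def upfront_share(upfront_musd, headline_musd):
    """Upfront payment as a fraction of headline deal value."""
    return upfront_musd / headline_musd


# (upfront $M, headline $M) for deals quoted in this report.
DEALS = {
    "Recursion / Roche-Genentech": (150, 12_000),
    "Insilico / Eli Lilly": (115, 2_750),
    "Isomorphic Labs / Eli Lilly": (45, 1_700),
    "Exscientia / Sanofi": (100, 5_200),
}

# Sector-wide headline commitment of $80B+, expressed in $M.
sector_low, sector_high = expected_cash_range(80_000)
shares = {name: upfront_share(*terms) for name, terms in DEALS.items()}

print(f"Implied cash on $80B headline: ${sector_low / 1000:.0f}B to ${sector_high / 1000:.0f}B")
for name, share in shares.items():
    print(f"{name}: upfront is {share:.2%} of headline")
```

On the report's $80B+ headline figure, the heuristic implies roughly $4–12B in cash actually changing hands. The upfront shares of the listed deals all fall below the 5% floor, which is consistent with most of the paid cash arriving later as milestones.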

Mega-deals > $2B

Deal | Headline value | Date | Details
Recursion → Roche/Genentech | $12B biobucks | Dec 2021 | 40 programmes, neuroscience + GI oncology. $150M upfront. Extended 2025 with fresh $150M tranche.
Nimbus Lakshmi → Takeda (largest exit) | $6.1B | Dec 2022 | TYK2 allosteric inhibitor NDI-034858 (now TAK-279) for plaque psoriasis. $4B cash upfront, $2B in milestones.
Exscientia → Sanofi | $5.2B | Jan 2022 | 15 oncology + immunology targets. $100M upfront. Largest AI-biotech biobucks headline as of 2022. Partially cancelled in 2023.
Valo Health → Novo Nordisk | $4.6B | Mar 2024 | 11 cardiometabolic programmes. $60M upfront. Largest single Valo deal after failed SPAC.
Insilico → Eli Lilly | $2.75B | 2024/26 | Oral small-molecule therapeutic, AI-discovered via Pharma.AI. $115M upfront + milestones + royalties.

Strategic deals $500M–$2B

Deal | Headline value | Date | Details
Insitro → BMS | $2.0B | 2020 | ALS and FTD targets. $50M upfront.
Generate Biomedicines → Amgen | $1.9B | Jan 2022 | 5 targets, generative protein design. ~$50M upfront reported.
Isomorphic Labs → Eli Lilly | $1.7B | Jan 2024 | Multiple small-molecule targets via AlphaFold-based design. $45M upfront.
Isomorphic Labs → Novartis | $1.2B | Jan 2024 | 3 undisclosed small-molecule targets. $37.5M upfront. Same-day signing as Lilly.
Insilico → Sanofi | $1.2B | Nov 2022 | 6 AI-discovered targets. $21.5M upfront + equity. Template for subsequent Insilico biobucks deals.
Exscientia → BMS | $1.2B | 2021 | Oncology preclinical work. $50M upfront. Partially unwound 2023.
Recursion → Bayer | $1.0B | 2020 | Fibrosis focus, extended 2023. ~$50M upfront reported.
Generate Biomedicines → Novartis | $1.0B+ | Oct 2023 | Immunology targets. Rumoured Novartis acquisition late 2024 at ~$1B.
Insilico → Servier | $888M | Jan 2026 | USP1 inhibitor programme. $32M upfront.
BenevolentAI → AstraZeneca | $800M+ | 2019 | Chronic kidney disease, heart failure, fibrosis. Historical benchmark: over-promised and under-delivered.
Relay → Genentech | $695M | Dec 2020 | SHP2 programme (GDC-1971 / migoprotafib). $75M upfront.
Verge Genomics → Eli Lilly | $694M | Feb 2021 | 4 neuro targets. $25M upfront + $50M equity.
Genesis Therapeutics → Eli Lilly | $670M/pgm | Feb 2024 | Multi-target neuroscience. ~$20M upfront.
Absci → Almirall | $650M | Jan 2024 | 2 AI-designed antibodies, dermatology. $5.3M upfront.

Annual AI drug-discovery deal volume (US sponsors, total biobucks)

2020: $3B
2021: $5B
2022: $10B (Nimbus $6.1B)
2023: $6B
2024: $15B+ (Isomorphic, Insilico/Lilly, Valo/Novo, Morphic acq.)
2025 YTD: $5B+
07 · The China Productivity Gap

Why the records are being set 12,000km away.

The US raised roughly 3× the capital deployed by China-based AI drug-discovery companies. Per dollar of AI funding, it still produces roughly one-third as many INDs as the Chinese ecosystem. Four structural reasons explain the gap, and none of them is about AI.

1. End-to-end vs platform-only

Most US AI biotechs sell a platform: Schrödinger licences software, Isomorphic ships design services, Atomwise runs virtual screens. The companies that tried to build both platform and pipeline (Recursion, Relay, Exscientia) subcontracted wet lab to US CROs running on US cost and US calendars.

Insilico, in contrast, runs Pharma.AI (target ID + generative chemistry + clinical prediction) alongside a wholly-owned robotic wet lab (Life Star 1, Suzhou), and partners with WuXi / Pharmaron for GLP tox and CMC. End-to-end beats platform-only on throughput.

2. Cost structure

Loaded FTE cost: US med-chem $250–400k; China $80–120k. GLP toxicology in China $1–3M per programme; US $5–10M. CMC kilogram API in China: 4–8 weeks; US: 3–6 months. In vivo PK in two species: $500k in China, $1.5–2M in the US.

Compounded across a 25-study DC package, a Chinese programme costs roughly one-fifth of the US equivalent and completes in roughly half the calendar time.

3. Regulatory rhythm

NMPA reduced CTA review to ~60 days (from 150+) in 2017 reforms. FDA IND defaults to 30 days but requires a much thicker package. Australian TGA CTN allows first-in-human dosing with minimal review, which is why Insilico's INS018_055 first-in-human (Feb 2022) was conducted in Australia before cross-filing.

China/Australia/HK bundling creates a 6–12 month acceleration over US-only strategy for early phase.

4. Wet-lab integration

China's CRO ecosystem runs 6–7 day weeks and rotating 24-hour shifts, and sits physically close to AI compute (WuXi has formal ML partnerships). The US CRO industry is union-adjacent and geographically dispersed. AI efficiency compounds on top of wet-lab speed; when the wet lab is slow, the compound rate is slow.

An integrated AI+CRO stack produces PCC-per-week outputs that would take a US platform-only model a quarter to replicate.
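The compounding argument in this section can be sketched as a simple cycle-time model, in the spirit of Amdahl's law. The sketch assumes design and wet-lab steps run back to back and that AI only shortens the design step; all day counts and the function names are hypothetical, chosen purely for illustration.

```python
def cycles_per_year(design_days: float, wet_lab_days: float) -> float:
    """Design-make-test cycles per year when the two steps run back to back."""
    return 365.0 / (design_days + wet_lab_days)


def speedup_from_ai(design_days: float, wet_lab_days: float,
                    ai_design_factor: float) -> float:
    """Overall throughput gain when AI divides design time by ai_design_factor.

    Wet-lab time is left untouched, so it caps the achievable gain.
    """
    before = cycles_per_year(design_days, wet_lab_days)
    after = cycles_per_year(design_days / ai_design_factor, wet_lab_days)
    return after / before


if __name__ == "__main__":
    # Hypothetical inputs: 20 design days, AI makes design 10x faster.
    slow_lab = speedup_from_ai(20, wet_lab_days=40, ai_design_factor=10)
    fast_lab = speedup_from_ai(20, wet_lab_days=10, ai_design_factor=10)
    print(f"slow wet lab: {slow_lab:.2f}x, fast wet lab: {fast_lab:.2f}x")
```

With these made-up inputs the same tenfold design speedup yields a larger overall gain when the wet-lab step is short, which is the qualitative point the section makes.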

A full, independent audit of the Chinese AI drug-discovery ecosystem — including Insilico, XtalPi, Galixir, StoneWise, Neomer and the broader Four Dragons landscape — is published at aipharmachina.com. The Chinese portal documents roughly 80% of the global AI-discovered PCC output coming from one platform operating on one-third the capital of US peers.


AI Drug Discovery in India – An Emerging Opportunity

India is 3–5 years behind China in AI drug discovery, not because of talent but because of infrastructure. With the world’s largest software talent pool (5M+ developers), global leadership in generic pharmaceuticals and vaccines (60% of the world’s vaccines), and 1.4 billion genetically diverse citizens, India has the raw ingredients. Generative AI is the inflection point: it shifts discovery from wet labs to dry labs, playing directly to India’s strengths. Companies like Jubilant Therapeutics, Verseon, and Elucidata are already in clinical stages. The full analysis is at aipharmaindia.com.

08 · Big Pharma AI Centres

Every major US pharma now runs an AI centre. Few produce AI drugs.

Big pharma's AI spend is concentrated in three buckets: genetic target ID, molecular design, and clinical-trial operations. The real budget flows through deal biobucks to the AI-native partners rather than pure in-house build. Below: the nine most active big-pharma AI programmes.

Eli Lilly (most aggressive)

The single largest buyer of external AI drug-discovery work in 2022–2026. Deals with Isomorphic ($1.7B), Insilico ($2.75B), Genesis ($670M/pgm), Iambic (2024), Verge ($694M), plus the Morphic cash acquisition ($3.2B, Aug 2024). Internal LillyTOI discovery engine, and OpenAI antimicrobial partnership 2024.

Bristol-Myers Squibb

The $6.1B Nimbus TYK2 deal (2022), the single largest AI-adjacent deal in history, went to Takeda (now TAK-279) rather than BMS. BMS's own activity: Schrödinger computational-chemistry collaboration. Evotec TPD deal ($200M upfront). Insitro ALS deal ($2B biobucks). Sustained multi-year commitment; heavy user of physics-based design.

Pfizer

AI-driven target identification via partnerships with Tempus, Schrödinger, XtalPi. Internal PfizerWorks computational platform. Acquired Trillium Therapeutics ($2.3B, 2021) for CD47 programme. Relatively less noisy than BMS or Lilly on external AI biotech deals but heavier on internal ML-ops.

Merck & Co.

Partnerships with Atomwise (2020, 3 targets up to $610M), Absci (2022), and AiCure on clinical-trial ops. Internal Modeller platform; invested in Variational AI, Cyrus Biotech, others. Historically conservative on AI-biotech biobucks but active acquirer (Acceleron, Peloton Therapeutics).

Amgen

Acquired deCODE Genetics ($415M 2012; valued >$14B in talent+data leverage) as the industry's largest genetic target-ID asset. Generate Biomedicines partnership ($1.9B, 2022). BigHat Biosciences partnership (2023). Internal ML group staffed with ex-Google Brain and Genentech talent.

Regeneron

Operates the Regeneron Genetics Center with exome sequencing of 500,000+ participants (the largest private sequencing operation outside 23andMe/UK Biobank). Multiple partnerships with AbCellera and other antibody-discovery AI platforms. Internal ML focused on exome-linked drug ID.

AstraZeneca

BenevolentAI deal ($800M+, 2019, CKD/IPF) — reference case for disappointing biobucks. Absci oncology deal ($247M biobucks, 2023). Tempus multimodal foundation-model deal ($200M, 2024). Internal AI Research unit based in Cambridge UK with heavy NLP/imaging work.

Sanofi

Exscientia $5.2B biobucks (2022, 15 targets). Insilico $1.2B biobucks (2022). Atomwise $1B+ biobucks (2022). Reported acquisition of Atomwise (~$100M+, 2024). Recursion phenomics ($20M upfront, 2024). Most diversified portfolio of external AI partners of any big pharma.

Novartis

Isomorphic Labs ($1.2B, Jan 2024). Schrödinger ($2.3B, 2023). Generate Biomedicines acquisition (~$1B, late 2024). Ongoing Microsoft Research partnership. Data42 internal data platform. Heavy bet on AI + protein design.

09 · VC & Investment Landscape

Eight-year cumulative: $27–30B of US AI drug-discovery venture.

Annual AI drug-discovery venture funding peaked at $5.2B in 2021, bottomed at $2.2B in 2023, and recovered to $5B+ in 2024, helped by a single $1B round (Xaira). The generative-AI / LLM cycle reset expectations upward, and corporate VC (Lilly Ventures, Novo Holdings, Lilly Asia Ventures) has replaced exotic growth capital (SoftBank, Tiger Global) as the most consistent backer.

Annual US AI drug-discovery VC volume (BCG + supplemental estimates)

  • 2015: $0.45B
  • 2018: $0.9B
  • 2019: $1.1B
  • 2020: $2.4B
  • 2021 (peak): $5.2B
  • 2022: $3.5B
  • 2023 (trough): $2.2B
  • 2024 (rebound): $5.0B
  • 2025 YTD: $4.0B+ est.

Third-party investment — excludes pharma biobucks. Sources: BCG 2022 Oct analysis (Jayatunga et al.); Deep Pharma Intelligence; BiopharmaTrend; company disclosures.
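The peak, trough, and rebound in the chart can be expressed as percentage moves. This is a minimal sketch using only the full-year values listed; the variable and function names are illustrative, and the 2025 year-to-date estimate is left out because it is a partial-year figure.

```python
# Annual US AI drug-discovery VC volume in $B, as listed in the chart.
# 2016 and 2017 are not listed, and 2025 YTD is omitted as a partial year.
VC_BY_YEAR_USD_B = {
    2015: 0.45, 2018: 0.9, 2019: 1.1, 2020: 2.4,
    2021: 5.2, 2022: 3.5, 2023: 2.2, 2024: 5.0,
}


def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


peak_to_trough = pct_change(VC_BY_YEAR_USD_B[2021], VC_BY_YEAR_USD_B[2023])
trough_to_rebound = pct_change(VC_BY_YEAR_USD_B[2023], VC_BY_YEAR_USD_B[2024])
listed_total = sum(VC_BY_YEAR_USD_B.values())

if __name__ == "__main__":
    print(f"2021 -> 2023: {peak_to_trough:.1f}%")
    print(f"2023 -> 2024: {trough_to_rebound:.1f}%")
    print(f"Sum of listed full years: ${listed_total:.2f}B")
```

Because two years are missing from the chart and 2025 is excluded, the sum of listed years undercounts the cumulative figure quoted in the section heading.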

Tier 1 VCs

  • ARCH Venture Partners — Xaira $1B lead, Insitro, Generate, Recursion
  • Flagship Pioneering — Generate Biomedicines, Cellarity, Valo Health (all in-house)
  • Foresite Labs — Xaira co-lead
  • Third Rock Ventures — Relay Therapeutics founding
  • a16z Bio — Insitro, BigHat, Genesis, Profluent

Corporate / strategic

  • GV (Google Ventures) — Recursion, Relay, Isomorphic
  • NVIDIA — Recursion ($50M 2023), Iambic, Terray, Evozyne, Generate
  • Lilly Asia Ventures — Insilico (through Series C), others
  • Novo Holdings — Cellarity, Valo
  • SoftBank Vision Fund — Relay, Exscientia, Insitro (2021 cycle)

Specialist / data-pure

  • Lux Capital — Recursion, Xaira
  • Data Collective (DCVC) — Recursion, AbCellera
  • Sequoia — Xaira, Insilico
  • Baillie Gifford — Recursion, Tempus
  • OrbiMed — Insilico, others

NIH / ARPA-H public funding

  • NIH Bridge2AI (4 years, 2022–2026): $130M
  • NCATS ASPIRE: $20M/yr
  • NCI AI cancer grants: $50M/yr
  • ARPA-H (2022 launch; AI incl.): $1B+

Aggregate NIH AI-drug-discovery-adjacent funding is approximately $500M/year (2024) across grants, contracts, and intramural IT infrastructure. It is dwarfed by private capital but materially larger than Chinese equivalents on the basic-science end.

10 · The Acquisition Wave

2023–2025: the consolidation era.

With IPO windows narrow and SPAC routes discredited, the 2023–2025 cycle has been the M&A cycle. Pharma bought the platforms it had been leasing. AI-native biotechs bought one another for scale. The deals below are the largest.

Recursion acquires Exscientia · $688M · Nov 2024
All-stock, announced Aug 2024, closed Nov 2024. Exscientia shareholders receive 0.7729 RXRX shares per EXAI share. Combines Recursion's phenomics platform with Exscientia's Centaur/Manifold design stack. Post-merger: largest public AI drug company by pipeline breadth, ~1,200 employees.

Eli Lilly acquires Morphic Therapeutic · $3.2B · Aug 2024
$3.2B cash. MORF-057 (α4β7 integrin oral, IBD) Phase 2b-ready asset is the strategic prize. Schrödinger holds equity and nets ~$77M on the transaction.

Novartis acquires Generate Biomedicines · ~$1B · late 2024
Reported ~$1B acquisition late 2024. Consolidates Novartis's bet on generative protein therapeutics after the Oct 2023 immunology collaboration.

Sanofi acquires Atomwise · ~$100M+ · 2024
Reported $100M+ acquisition during 2024. Historical Atomwise-Sanofi partnership (2022, $20M upfront, $1B+ biobucks) converts to outright ownership. Atomwise rebrands to Numerion Labs.

Recursion acquires Cyclica · ~$40M · May 2023
All-stock. Polypharmacology / proteome-wide screening platform. Integrated into Recursion OS.

Recursion acquires Valence Labs · ~$47.5M · May 2023
All-stock. Generative chemistry. Integrated into Recursion OS post-Valence.

Tempus AI acquires Ambry Genetics · $600M · Nov 2024
$600M all-cash. Expands hereditary cancer testing; extends Tempus's clinical-genomic dataset.

Relay Therapeutics acquires ZebiAI · undisclosed · 2021
All-stock. DNA-encoded library + ML capability bolt-on. Closed 2021 before Relay stock decline.

Amgen acquires deCODE Genetics (historical) · $415M · 2012
$415M cash 2012. Valued today for the genetic target-ID leverage at roughly $14.3B in effective R&D NPV. Still the single most consequential AI-adjacent pharma acquisition in US history.
11 · Public Market Performance

From $15B peak to $1B trough — and partway back.

The public AI-drug-discovery basket peaked in Q1 2021 as COVID liquidity, AlphaFold narrative, and SPAC money all compounded. Two years later it was down 70–90% across the board. In 2024–2025 Tempus AI's IPO and Recursion's Exscientia deal rerated the highest-quality names upward; BenevolentAI, Absci, and others remain at or near all-time lows.

Ticker | Company | IPO date | IPO raise | IPO price | Peak price | Peak market cap | 2025 price range | 2025 market cap | Peak drawdown
TEM | Tempus AI | Jun 2024 | $410M | $37 | ~$80 (Aug 2024) | ~$18B | $55–70 | ~$9B | –25%
SDGR | Schrödinger | Feb 2020 | $232M | $17 | ~$110 (Jan 2021) | ~$8B | $20–25 | ~$1.75B | –78%
RXRX | Recursion | Apr 2021 | $436M | $18 | ~$45 (Jul 2021) | ~$10B | $5–7 | ~$2B | –80%
RLAY | Relay | Jul 2020 | $400M | $20 | ~$48 (Feb 2021) | ~$6B | $3–5 | ~$500M | –90%
ABCL | AbCellera | Dec 2020 | $556M | $20 | ~$58 (Feb 2021) | ~$15B | $2.50–3.50 | ~$850M | –94%
ABSI | Absci | Jul 2021 | $211M | $16 | ~$28 (Oct 2021) | ~$2.5B | $3–5 | ~$450M | –85%
EXAI | Exscientia | Oct 2021 | $350M | $22 | ~$25 (late 2021) | ~$2.9B | acq. Nov 2024 | $688M | –77% pre-exit
BAI.AS | BenevolentAI | Apr 2022 SPAC | €232M | €12 | €13 (May 2022) | ~€1.5B | <€1 | ~€150M | –93%

The lesson of the basket. Every name in the basket except Tempus AI drew down more than 75% from peak. Tempus, which listed in June 2024 rather than in the 2020–2021 window, is the only one with >$500M of diversified commercial revenue, and it is fundamentally a clinical-genomics company, not an AI drug-discovery pure-play. Investors in 2025 discount the "AI" tag entirely for biotech pricing and value these companies on pipeline, partnership revenue, and cash burn: the same metrics as any other biotech.
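The drawdown claim can be checked against the approximate prices in the table. This is a minimal sketch that measures drawdown from peak price to the midpoint of the 2025 price range. Using the midpoint is an assumption of this sketch, the table's own drawdown column may use a different reference price, and Exscientia and BenevolentAI are left out because one was acquired and the other trades in euros below a stated bound.

```python
# (peak price, 2025 range low, 2025 range high) in USD, from the table above.
BASKET = {
    "TEM": (80.0, 55.0, 70.0),
    "SDGR": (110.0, 20.0, 25.0),
    "RXRX": (45.0, 5.0, 7.0),
    "RLAY": (48.0, 3.0, 5.0),
    "ABCL": (58.0, 2.5, 3.5),
    "ABSI": (28.0, 3.0, 5.0),
}


def drawdown_pct(peak: float, low: float, high: float) -> float:
    """Percentage fall from peak to the midpoint of the later price range."""
    current = (low + high) / 2
    return (current - peak) / peak * 100.0


DRAWDOWNS = {ticker: drawdown_pct(*prices) for ticker, prices in BASKET.items()}

if __name__ == "__main__":
    for ticker, dd in sorted(DRAWDOWNS.items(), key=lambda kv: kv[1]):
        flag = "worse than -75%" if dd < -75 else "within -75%"
        print(f"{ticker}: {dd:.1f}% ({flag})")
```

On these inputs Tempus is the only ticker that stays within the 75% threshold, consistent with the paragraph above.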

12 · The BIOSECURE Factor

A policy shift that redraws the CRO map.

The BIOSECURE Act passed the House in September 2024 and is heading toward Senate passage and implementation in 2026. It bars federal contracts with named Chinese biotech companies (WuXi AppTec, WuXi Biologics, BGI, MGI Tech, Complete Genomics), with an eight-year wind-down to 2032. The consequences ripple through the AI drug-discovery CRO stack on which many US biotechs depend.

Direct CRO impact

WuXi AppTec runs roughly 35% of small-molecule CMC manufacturing for US biotech (BIOSECURE Act House committee testimony, 2024). Separation costs are estimated at $1–2B across the sector, adding 12–24 months to affected programmes. Indian CROs (Syngene, Aragen Life Sciences, Piramal Pharma Solutions) and US CROs (Catalent, Lonza US) are the primary beneficiaries.

Tempus / AbCellera / Schrödinger: minimal exposure

Companies with primarily US-based wet-lab operations are insulated. Tempus AI runs its own CLIA labs. AbCellera has GMP manufacturing coming online in Vancouver in 2025. Schrödinger sells software; no CRO dependency.

Insilico: partially insulated

The Life Star 1 robotic lab in Suzhou is owned, not rented. Insilico operates through Hong Kong entities with non-restricted CRO networks. But some US-side federal contracts (DoD biodefence, NIH grants) could be affected if a programme is deemed "Chinese-affiliated". Insilico has stated 2025 plans to scale US-side CDMO relationships.

The strategic question for 2026–2030. BIOSECURE accelerates the decoupling of US pharma's wet-lab footprint from China's CRO stack. It does not change the underlying cost or speed differentials — Indian CROs are cheaper than US but still 2–3x more expensive than Chinese. The geopolitical premium paid for "secure" manufacturing is a real cost to US AI-biotech productivity; it may also mean US biotechs increasingly route through Korea (Samsung Biologics), Japan, or India to reclaim some of the cost arbitrage.

13 · Key People

Fifteen builders who shaped the US AI drug-discovery stack.

Founders, chief scientists, and technologists whose decisions moved the sector. Three Nobel laureates (Baker, Hassabis, and Jumper, who shared the 2024 Chemistry prize), multiple Nature covers, and roughly $10B of AI-driven biotech value created between them.

David Baker
Univ. of Washington · IPD
2024 Nobel Chemistry (shared) for computational protein design. His lab developed RoseTTAFold and RFdiffusion, which underpin the Xaira, Generate, and Isomorphic protein stacks. Founded Arzeda, Neoleukin, others.
Demis Hassabis
Isomorphic Labs · DeepMind
2024 Nobel Chemistry (shared). Founder of DeepMind; CEO of Isomorphic Labs. AlphaFold 1/2/3. Behind the most cited deep-learning paper of the decade and the tightest pharma-licensing narrative in the sector.
John Jumper
DeepMind
2024 Nobel Chemistry (shared). Senior staff research scientist at DeepMind; lead author on AlphaFold 2. Among the most recruited researchers in the field post-Nobel.
Daphne Koller
Insitro
CEO and founder of Insitro. Former Calico, Coursera, Stanford. Pioneer of probabilistic graphical models; now pairs ML with iPSC functional genomics. $743M raised, 0 INDs to date.
Chris Gibson
Recursion
Co-founder and CEO of Recursion. PhD University of Utah. Built the largest public AI biotech by headcount, cash, and compute (BioHive-2, 504 NVIDIA H100 GPUs across 63 DGX H100 systems).
Alex Zhavoronkov
Insilico Medicine
Founder and CEO of Insilico Medicine. Wrote the first paper pairing deep learning with drug discovery for aging (2016). Architect of Pharma.AI. Hong Kong IPO'd 2025. 30+ PCCs, 13 INDs — industry record.
Richard Friesner
Schrödinger
Co-founder, Columbia University. Architect of FEP+. Gold-standard free-energy perturbation in physics-based drug design. Foundational figure for the field's non-ML side.
Mark Murcko
Relay · ex-Vertex CSO
Co-founder of Relay Therapeutics; ex-CSO of Vertex. Pioneered structure-based drug design practice at Vertex in the 1990s. Key architect of Dynamo / motion-based design thesis.
Eric Lefkofsky
Tempus AI
Founder and CEO of Tempus AI. Serial entrepreneur (Groupon). Converted Tempus from clinical-genomics lab into largest real-world-data platform in US oncology. Public June 2024.
Andrew Hopkins
Exscientia
Founder of Exscientia, University of Dundee. First AI-designed drug into human trial (DSP-1181). Stepped back after Recursion acquisition (Aug 2024). Advocate for integrated design-make-test-learn.
Marc Tessier-Lavigne
Xaira Therapeutics
CEO of Xaira. Former President of Stanford, ex-CSO of Genentech, Rockefeller. Raised $1B seed (April 2024) — largest biotech launch round in history. Neuroscience + generative design thesis.
Carl Hansen
AbCellera
Founder and CEO of AbCellera. UBC professor. Built the largest antibody-discovery platform globally via single-cell microfluidics; IPO'd Dec 2020 at $15B peak valuation.
Bruce Booth
Atlas Venture · Nimbus
Atlas Venture partner and Nimbus Therapeutics founding investor/chairman. Architect of the Nimbus virtual-pharma + AI model. Engineered the $6.1B TYK2 exit to Takeda (2022).
Sean McClain
Absci
Founder and CEO of Absci. Built the first zero-shot AI antibody design platform; ABS-101 (TL1A IBD) is Absci's first AI-originated molecule in clinic (2025).
Abraham Heifets
Atomwise / Numerion
Co-founder of Atomwise (2012). Trained at University of Toronto. AtomNet CNN preprint (2015) is foundational for the CNN era. Rebranded Atomwise as Numerion Labs in 2026.
14 · Academic Foundations

The research labs that seeded the sector.

Ten US and US-adjacent university groups account for the majority of AI drug-discovery founder teams, open-source codebases, and Nature/Science covers. The three most consequential are David Baker's IPD (Washington), Demis Hassabis's DeepMind (UK, but open-sourced AlphaFold is a US-pharma enabler), and the broader MIT CSAIL / Broad Institute cluster.

Institute for Protein Design (Univ. of Washington)

Led by David Baker (2024 Nobel). RoseTTAFold, RFdiffusion, ProteinMPNN. Spawned Xaira co-founders (Hetu Kamisetty), Generate Biomedicines talent, Cyrus Biotech, Arzeda, Neoleukin, A-Alpha Bio, and AÖP Biosciences. Single most prolific drug-discovery founder factory in the sector.

MIT CSAIL / Barzilay Lab / Broad Institute

Regina Barzilay's group published Chemprop (message-passing neural nets for molecular property prediction) and early GCN work. Broad Institute's Connectivity Map (CMap) + imaging datasets underlie phenomics approaches (Recursion, Cellarity). MIT Jameel Clinic partnership with Cambridge focuses on antibiotics (halicin 2020).

Stanford / Pande Lab

Vijay Pande's group produced Genesis Therapeutics founder Evan Feinberg and foundational GNN work on molecular property prediction. Pande now runs a16z Bio — the single most active AI-biotech VC partner. Stanford AIMI (AI in Medicine and Imaging) is a major applied-ML hub.

Harvard / Wyss Institute / Aspuru-Guzik (now Toronto)

Alán Aspuru-Guzik collaborated with Insilico on 2016–2018 GAN/RL molecular generation papers; now leads the Acceleration Consortium at University of Toronto. Wyss Institute spawned major therapeutics via Don Ingber's lab. Harvard Medical School hosts the Laboratory of Systems Pharmacology under Peter Sorger.

Carnegie Mellon University

David Koes's group produced GNINA (docking with neural nets), foundational for structure-based AI screening. CMU Computational Biology department is a major pipeline into Roivant, Schrödinger, and Insitro. Josh Bloomer / Russ Schwartz group on cancer genomics ML.

Columbia / Honig & Friesner

Barry Honig (RoseTTAFold predecessor methods) and Richard Friesner (Schrödinger founder) anchor the Columbia biophysics tradition. Joachim Frank (2017 Nobel, cryo-EM) built the reconstruction theory that underpins Gandeeva Therapeutics and others.

UCSF / Sali Lab / QBI

Andrej Sali's MODELLER (1993) is the foundational homology-modelling code in the field. UCSF QBI (Nevan Krogan) runs proteomics at scale. Major pipeline into BridgeBio and Genentech. UCSF hosted the 2022 RoseTTAFold Protein Contact Prediction collaboration.

Princeton / Engelhardt Group

Barbara Engelhardt (now Gladstone Institutes) on statistical genomics and Bayesian modelling. Princeton Ludwig Institute ML-in-cancer work. Strong pipeline into Flagship-adjacent companies and Google X.

15 · Regulatory Landscape

The FDA is the adult in the room.

Two discussion papers in 2023, a draft guidance in January 2025, a formal AI Council in CDER. The FDA has moved faster on AI-in-drug-development than most regulators and is the single most important external dependency for every US AI biotech. The agency's posture is risk-based, context-specific, and explicitly non-prescriptive about model architecture.

Timeline of FDA action

  • May 2023: FDA Discussion Paper, "Using AI & ML in the Development of Drug & Biological Products"
  • May 2023: FDA Discussion Paper, "AI in Drug Manufacturing"
  • Oct 2023: FDA holds public workshop on AI in drug development (1,200+ attendees)
  • 2024: CDER/CBER AI Council formalised under PDUFA VII commitments
  • Jan 2025: Draft Guidance, "Considerations for the Use of AI to Support Regulatory Decision-Making"; defines the Context-of-Use (COU) credibility framework
  • 2025: First FDA AI-drug acceptance criteria memo (internal)

Context-of-Use (COU) framework

The FDA's Jan 2025 draft guidance introduces a risk-based credibility framework. A sponsor must specify:

  • Question of interest — what regulatory question the model addresses
  • Context of use — the specific role of the model output in decision-making
  • Model risk — influence of the model on the decision × consequence of a wrong decision
  • Credibility activities — validation, verification, uncertainty quantification commensurate with risk

Higher-risk contexts (e.g., model-informed dose selection in pivotal trials) require substantially more validation than lower-risk contexts (e.g., model-aided lead prioritisation).
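The four elements a sponsor specifies can be captured in a small data structure. The following is a hypothetical Python sketch, not the FDA's scoring method: the three-level scale, the multiplication of influence by consequence, and the thresholds are all assumptions made for illustration, as are the level assignments in the two examples.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-step scale; the scale itself is an assumption."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class ContextOfUse:
    question_of_interest: str      # regulatory question the model addresses
    role_of_model_output: str      # how the output feeds the decision
    model_influence: Level         # weight of the model in the decision
    decision_consequence: Level    # cost of a wrong decision

    def model_risk(self) -> Level:
        """Combine influence and consequence (thresholds are illustrative)."""
        score = int(self.model_influence) * int(self.decision_consequence)
        if score >= 6:
            return Level.HIGH
        if score >= 3:
            return Level.MEDIUM
        return Level.LOW


lead_prioritisation = ContextOfUse(
    "Which leads advance to synthesis?",
    "Ranks candidates; chemists make the final call",
    Level.MEDIUM, Level.LOW,
)
pivotal_dose_selection = ContextOfUse(
    "Which dose goes into the pivotal trial?",
    "Primary basis for dose selection",
    Level.HIGH, Level.HIGH,
)

if __name__ == "__main__":
    for cou in (lead_prioritisation, pivotal_dose_selection):
        print(cou.question_of_interest, "->", cou.model_risk().name)
```

In practice the credibility activities would then be scaled to the resulting risk level, with the high-risk case demanding substantially more validation.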

What AI drugs still need for approval

The FDA has been clear: an AI-origin drug is held to the same standard as any other drug. No special pathway, no accelerated approval for "AI-discovered" status alone. The IND package (21 CFR 312.23) is unchanged. The pivotal efficacy bar is unchanged. The CMC bar is unchanged.

The only substantive regulatory benefit AI confers today is reduction in preclinical attrition: better ADME/PK, better selectivity, cleaner tox profiles at first-in-human. That may translate into Phase 1 success-rate uplift (as seen in the Jayatunga analysis) but does not change the approval bar.

FDA approval watchlist

Programmes closest to a US approval with significant AI contribution to discovery or design:

  • Relay RLY-4008 (lirafugratinib): FGFR2-selective inhibitor, Breakthrough Therapy 2023, NDA 2025 expected
  • Recursion REC-994: CCM, Phase 2/3 transition; missed primary endpoint in 2024
  • Takeda TAK-279 (ex-Nimbus TYK2): PsO Phase 3 ongoing; if approved 2026/27 would be first Schrödinger-designed drug to market
  • Insilico INS018_055 (rentosertib): IPF Phase 2b/3 transition; first dual AI-target + AI-molecule filing
  • Generate GB-0895: anti-TSLP Phase 1
16 · Failures & Lessons

The graveyard is as instructive as the leaderboard.

Six notable failures — covering technical flops, SPAC blowups, management collapses, and strategic mis-bets. Each carries a specific, repeatable lesson for operators, investors, and regulators.

BenevolentAI: the knowledge-graph downfall

2022 SPAC valuation: ~€1.5B. 2025: ~€150M (–90%). Laid off 180 staff (~50%) in 2023. Lead asset BEN-2293 (atopic dermatitis topical) missed Phase 2 in 2023 — a key blow after the company had publicly positioned text-mining + knowledge graphs as competitive with structure-based methods.

Lesson: Text-mining and knowledge graphs are useful for target repurposing (baricitinib/COVID) but have not yet produced a first-in-class de novo programme that cleared Phase 2. The gap between "the graph suggests X" and "the molecule works in humans" is entirely wet-lab work — which BenevolentAI under-invested in.

Exscientia DSP-1181: the first AI drug, discontinued

DSP-1181 (5-HT1A agonist for OCD, partnered with Sumitomo Dainippon) was the first "AI-designed" molecule in a human trial (Jan 2020). Phase 1 terminated 2021 — did not progress. Exscientia's EXS21546 (A2A/A2B immuno-oncology) followed into discontinuation 2023.

Lesson: AI improves the odds of reaching the clinic. It does not change the biology once there. A "designed" molecule can still have a bad target.

Valo Health: the SPAC that never closed

Valo announced a SPAC merger with Khosla Ventures Acquisition II (KVSA) at a $2.8B valuation in Dec 2021. Terminated May 2022 after market conditions deteriorated. Restructured private 2023; raised Series C of ~$175M; headcount down from ~450 to ~300. Novo Nordisk cardiometabolic deal (Mar 2024, $60M upfront, $4.6B biobucks) stabilised the company.

Lesson: SPAC market-timing cut both ways. Valo survived by cutting fast; many peers did not.

Verge Genomics VRG50635: the ALS programme that faded

VRG50635 (PIKfyve inhibitor for ALS) was a flagship for "human-data-first" neurodegeneration AI. First-in-patient Phase 1 data (Mar 2024) showed safety and target engagement but limited efficacy signal. Company discontinued Phase 2 planning; pivoted to other targets.

Lesson: Human-tissue multi-omics helps prioritise but does not guarantee. ALS in particular has produced a decade of promising-preclinical, failing-Phase-2 programmes regardless of discovery modality.

Recursion REC-994: a textbook missed primary endpoint

REC-994 (Phase 2 SYCAMORE in cerebral cavernous malformation, Q3 2024) missed primary efficacy endpoint. Company is continuing into Phase 3 on secondary-endpoint signals. For a flagship platform asset, the data were underwhelming and the stock reacted accordingly.

Lesson: Even the best-funded AI platform cannot de-risk biology. Phenomics identifies candidates; it does not validate disease mechanism.

Insitro: $743M raised, 0 INDs

Insitro has raised $743M cumulative at a peak $2.5B valuation. It has no wholly-owned clinical assets as of mid-2026 — eight years post-founding. Its Gilead NASH partnership ended in 2023; BMS ALS partnership continues but has produced no IND. Laid off ~22% Oct 2024.

Lesson: "Machine learning + functional genomics" is a scientifically beautiful thesis with a punishing capital-efficiency profile. Target validation at scale is slow; capital efficiency is almost the opposite of what this model produces.

17 · Future Outlook

Five predictions for 2026–2030.

The AI drug-discovery sector is roughly 14 years old. No Phase 3 success yet, no FDA approval yet, and an unresolved productivity gap with China. The next four years will deliver most of the answers investors have been waiting for.

01 · First FDA approval of an AI-designed drug by 2027/28

Most likely candidates (ranked): Takeda TAK-279 (ex-Nimbus TYK2, Schrödinger-designed); Relay RLY-4008 (FGFR2 cholangiocarcinoma); Insilico INS018_055 (TNIK IPF). If TAK-279 wins first, the narrative will emphasise physics-based design; if RLY-4008 wins first, it will emphasise MD + ML; if INS018_055 wins first, it will emphasise end-to-end generative. Industry perception will shift on whoever crosses first.

02 · Consolidation accelerates; 30 to 15 to 8

The ~30 well-funded US AI biotechs will compress to roughly 15 survivors by 2028 through M&A, attrition, and fold-ins. Tempus, Recursion, Schrödinger, Xaira, Isomorphic, Generate, Nimbus (post-next-exit), and two to four others will be the 2030 cohort.

03 · Pharma internalises the software layer

Lilly, BMS, and Novartis will all build meaningfully larger internal AI teams by 2028. The licensing model (pay-per-target) remains; the outsourcing of the full AI stack (pay-for-platform-access) narrows as pharma ML teams mature. Schrödinger and similar sellers face margin pressure.

04 · Wet-lab integration wins

Companies without owned wet-lab stacks underperform per unit capital. Recursion (BioHive + imaging), Insilico (Life Star 1), and Xaira (integrated foundation-model-plus-lab thesis) are the structural winners. Platform-only sellers (Atomwise-as-Numerion, Isomorphic) face commoditisation pressure from open-source models.

05 · FDA approves ~3–5 AI-origin NMEs by 2030

The 2024 Jayatunga Phase 1 success-rate uplift (80–90%) implies a materially larger AI-origin clinical pipeline entering Phase 3 in 2027–2029. Even with industry-standard Phase 2→3 attrition, 3–5 NDAs by 2030 is plausible. The sector's economic thesis (<$500M per NME) remains untested at approval scale.
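The arithmetic behind this prediction can be sketched as an expected-value funnel. The pipeline counts below are the approximate figures given in this report (~30 Phase 1, ~12 Phase 2, 1 Phase 3) and the Phase 1 rate is the midpoint of the 80–90% range cited above. The Phase 2, Phase 3, and filing success rates are placeholder assumptions chosen for illustration, not sourced figures.

```python
STAGES = ["phase1", "phase2", "phase3", "filing"]

# phase1 is the midpoint of the 80-90% range quoted in the text;
# the other three rates are ASSUMED placeholders, not sourced values.
SUCCESS = {"phase1": 0.85, "phase2": 0.30, "phase3": 0.60, "filing": 0.90}

# Approximate counts of AI-origin assets currently in each phase.
PIPELINE = {"phase1": 30, "phase2": 12, "phase3": 1}


def p_approval_from(stage: str) -> float:
    """Probability that an asset now in `stage` is eventually approved."""
    p = 1.0
    for s in STAGES[STAGES.index(stage):]:
        p *= SUCCESS[s]
    return p


def expected_approvals(pipeline: dict) -> float:
    """Expected eventual approvals, ignoring timing and new entrants."""
    return sum(n * p_approval_from(stage) for stage, n in pipeline.items())


if __name__ == "__main__":
    print(f"Expected eventual approvals: {expected_approvals(PIPELINE):.2f}")
```

The result is an eventual count with no deadline attached. Since many Phase 1 assets could not plausibly reach approval by 2030, a by-2030 figure would be lower than what this sketch returns.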

BONUS · The China question is not resolved

If BIOSECURE fully separates US and Chinese pharma supply chains by 2028, Insilico and its Chinese peers retain their cost and speed advantage only for non-US markets. The US sector may rebuild Indian/Korean CRO redundancy at 2–3x Chinese cost, narrowing but not closing the productivity gap. The global AI drug market bifurcates.

The single most important signal to watch. Whether the first FDA-approved AI-discovered NME comes out of a Chinese-operating programme (Insilico INS018_055) or a US-operating programme (Relay RLY-4008, Takeda TAK-279). If China gets there first on a drug with a US label, it will force a strategic rethink across every US pharma's AI allocation. If the US gets there first, the productivity-gap narrative softens — but does not disappear, because approvals are a lagging indicator and PCC output is the leading one.

18 · Clinical Watchlist

Every US AI-origin drug currently in human trials.

Fifty-plus AI-derived molecules are in active US clinical development as of May 2026. The table below lists the most-watched programmes by sponsor, phase, indication, and near-term readout. Green rows are Phase 2b or later; amber are Phase 1b/2a; blue are Phase 1 initiation or earlier.

Sponsor | Asset | Target / MoA | Indication | Phase | Next readout | NCT
Relay Therapeutics | RLY-4008 (lirafugratinib) | FGFR2-selective | Cholangiocarcinoma (FGFR2 fusion) | Ph 2 pivotal | NDA filing 2025 | NCT04526106
Relay Therapeutics | RLY-2608 | Mutant-selective PI3Kα | HR+/HER2– breast cancer | Ph 1b/2 | Ph 3 plan 2025 | NCT05216432
Relay Therapeutics | RLY-5836 | PI3Kα CNS-penetrant | Advanced solid tumours | Ph 1 | 2025 safety | NCT05759949
Recursion | REC-994 | Undisclosed / superoxide | Cerebral cavernous malformation | Ph 2/3 | Ph 3 design 2025 | NCT05085535
Recursion | REC-2282 | Pan-HDAC | NF2 meningiomas | Ph 2/3 POPLAR | Interim 2025 | NCT05130866
Recursion | REC-4881 | MEK1/2 | FAP | Ph 2 TUPELO | Readout 2026 | NCT05552741
Recursion | REC-3964 | Toxin B inhibitor | C. difficile | Ph 2 | 2025 | NCT05963321
Recursion | REC-1245 | RBM39 degrader | Advanced solid tumours | Ph 1 | 2026 | not listed
Recursion (ex-Exscientia) | GTAEXS617 | CDK7 selective | Solid tumours (HR+ BC, OvCa) | Ph 1/2 | 2025 dose-esc | NCT05985655
Recursion (ex-Exscientia) | EXS74539 | LSD1 | SCLC / AML | Ph 1 | 2025 | NCT06266545
Recursion (ex-Exscientia) | EXS73565 | MALT1 | B-cell malignancies | Ph 1 | 2025 | NCT06136559
Schrödinger | SGR-1505 | MALT1 | B-cell lymphomas | Ph 1 | 2026 dose-esc | NCT05544019
Schrödinger | SGR-2921 | CDC7 | AML / MDS | Ph 1 | 2025 | NCT06077162
Schrödinger | SGR-3515 | Wee1 / Myt1 | Advanced solid tumours | Ph 1 | 2026 | NCT06207526
Absci | ABS-101 | TL1A antibody (AI-designed) | IBD | Ph 1 | 2025 SAD/MAD | NCT06449585
Takeda (ex-Nimbus) | TAK-279 (zasocitinib) | TYK2 allosteric | Plaque psoriasis, PsA, IBD | Ph 3 | Ph 3 readout 2025/26 | NCT06088043
AbCellera | ABCL575 | OX40L antibody | Atopic dermatitis | Ph 1 | 2025 | not listed
Insilico Medicine | INS018_055 (rentosertib) | TNIK inhibitor (AI target + AI mol) | IPF | Ph 2a/b | Ph 2b ongoing | NCT05154240
Insilico Medicine | ISM3412 | MAT2A | MTAP-deleted cancers | Ph 1 | 2025 | NCT06187857
Insilico Medicine | ISM5939 | ENPP1 | Solid tumours | Ph 1 (US) | 2025 | NCT06183294
Generate Biomedicines | GB-0895 | Anti-TSLP antibody | Severe asthma | Ph 1 | 2025 | not listed
Generate Biomedicines | GB-0669 | Anti-RSV antibody | RSV prophylaxis | Ph 1 | 2025 | not listed
Iambic Therapeutics | IAM1363 | HER2-mutant selective | HER2-mutant solid tumours | Ph 1/2 | 2025 dose-esc | NCT06253871
BenevolentAI | BEN-8744 | PDE10 | Ulcerative colitis | Ph 1 | 2025 | not listed
BenevolentAI | BEN-34712 | RARb agonist | ALS | Ph 1 IND | 2025/26 | not listed
Nimbus Therapeutics | NDI-219216 | HPK1 | Advanced solid tumours | Ph 1 | 2025 | not listed
Valo Health | OPL-0301 | Undisclosed cardiovascular | Post-MI CV | Ph 2 | 2026 | not listed
Valo Health | OPL-0401 | ROCK1/2 | Diabetic retinopathy | Ph 2 | 2026 | not listed
Verge Genomics | VRG50635 | PIKfyve | ALS (winding down) | Ph 1 halted | n/a | NCT04768972
  • US AI-origin Phase 1 assets: ~30. Initiated 2020–2025; majority oncology + I&I.
  • In Phase 2: ~12. Includes Relay RLY-2608 / -4008, Recursion REC-994, Insilico INS018_055.
  • In or approaching Phase 3: 1. Takeda TAK-279 (ex-Nimbus TYK2) is the most advanced AI-associated programme.

Source: ClinicalTrials.gov lookups, company press releases, 10-K/10-Q filings, May 2026. "AI origin" defined broadly to include molecules where AI played a material role in target selection, hit ID, lead optimisation, or any combination of these. TAK-279 is included because Nimbus/Schrödinger ML drove the discovery programme, although the molecule is now wholly owned by Takeda.

19 · Foundational Papers & Open Source

The literature that built the sector.

Twelve papers and seven open-source releases account for most of the technical lineage of modern AI drug discovery. All are freely accessible; most have citation counts in the thousands or tens of thousands.

Foundational papers

Open-source releases that shaped practice

A pattern worth naming. Every foundational open-source release in the sector came from academia or a research lab. Not one came from a US AI biotech. The commercial layer has consumed open-source aggressively (RFdiffusion inside Xaira and Generate, AlphaFold inside every pharma computational chemistry team) and contributed back sparingly. That is consistent with drug-discovery economics (IP concentrated on molecules, not methods) but also explains why the sector's technical narrative is increasingly written in DeepMind's London office and Baker's Seattle lab rather than in Boston or Salt Lake City.

20 · Compute & Data Infrastructure

The GPU arms race and the data moat.

Modern AI drug discovery runs on two inputs: compute and proprietary biological data. NVIDIA has become the default hardware vendor, with equity stakes in at least six US AI biotechs. Proprietary datasets — phenomics images, antibody sequences, DEL binding curves, clinical genomics — are the only durable moats in a world where models are increasingly commoditised.

504
GPUs in Recursion BioHive-2
63 NVIDIA DGX H100 systems (8 H100 GPUs each), Salt Lake City. ~2.2 exaflops AI performance. Commissioned 2024.
65PB
Recursion phenomics dataset
Cellular images + biological/chemical multi-omics. Largest single-company phenomics corpus globally.
8M
Tempus AI oncology records
De-identified clinical records + 1.5M clinical-grade genomic profiles. Largest US real-world data (RWD) oncology asset.
500k
Regeneron Genetics Center exomes
Largest private exome-sequencing operation. Feeds AI target ID across the Regeneron pipeline.

NVIDIA's pharma equity portfolio

NVIDIA has taken equity stakes in, or struck strategic partnerships with, Recursion ($50M, Jul 2023), Schrödinger, Iambic, Terray, Evozyne, Generate Biomedicines, and Genesis Therapeutics. Its BioNeMo foundation-model service (GA 2023) is the most-used non-internal model stack in the sector. The bet is obvious: drug discovery is the largest serious enterprise workload for NVIDIA GPUs outside hyperscale AI, and model training at scale requires NVIDIA silicon.

The corollary: NVIDIA has structural pricing power over any AI biotech running 10,000+ GPU training jobs. In Q4 2024 NVIDIA DGX cluster lead times for biotech customers stretched to nine months.

Proprietary data moats by company

  • Recursion — 65PB phenomics + CellProfiler-derived embeddings. Hardest to replicate.
  • AbCellera — millions of antibody sequences per Celium campaign.
  • Absci — billion-interactions/day SoluPro E. coli screens.
  • Terray — tNova chip generates the largest structured biochemistry dataset for ML training anywhere outside pharma.
  • Tempus AI — 8M patient records, 1.5M genomic profiles, 200+ pharma data contracts.
  • Schrödinger — 35 years of physics-based FEP data; no ML substitute exists.
  • Isomorphic Labs — exclusive access to AlphaFold 3 inference at scale, Alphabet compute.

Models commoditise; data does not. The cost of training a state-of-the-art protein model has fallen ~100x since 2021 (ESM-1 vs ESM-2; AlphaFold 2 vs 3). The cost of generating 65PB of phenomics data, 500k exomes, or 8M oncology records has not. The US AI biotechs with genuine long-term moats are the ones running proprietary wet-lab or clinical data loops at scale. Pure-model companies face the same margin compression as every other ML-as-a-service vendor confronting open-weights competition.