The challenge
Clinical registry abstraction is one of the hidden bottlenecks in multi-site research. The Stroke Thrombectomy and Aneurysm Registry (STAR) — founded at MUSC, now spanning 85+ sites with over 15,000 enrolled patients — requires structured abstraction across 341 fields per case, drawn from the H&P, procedure note, and discharge summary.
Doing that by hand takes hours per patient. Sending the source notes to a cloud LLM is fast, but the notes are PHI, the institutions are HIPAA-covered, and routing them through cloud inference is a path IRBs are reluctant to approve.
The setup
CuratAI’s ingest stage runs an open-weights 7B-parameter language model locally on a 12 GB consumer-class GPU, the same kind of card you can buy at Micro Center.
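The post does not say what numeric precision the model runs at. A back-of-the-envelope check shows why that detail matters: a dense 7B model stored at 16 bits needs about 13 GiB for its weights alone, more than a 12 GB card holds. That suggests (our assumption, not something the source states) that the deployment uses 8-bit or lower quantization. The sketch below does only the weight arithmetic; the precision list and the treatment of "12 GB" as 12 GiB are illustrative choices.

```python
# Back-of-the-envelope weight memory for a dense 7B-parameter model.
# Counts weights only; the KV cache, activations, and runtime overhead come on top.

PARAMS = 7e9       # nominal "7B" parameter count
VRAM_GIB = 12.0    # a 12 GB card, treated here (generously) as 12 GiB

# Bytes per parameter at common storage precisions (illustrative, not CuratAI's config).
BYTES_PER_PARAM = {
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weight_gib(params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the model weights, in GiB."""
    return params * bytes_per_param / 2**30


for precision, bpp in BYTES_PER_PARAM.items():
    gib = weight_gib(PARAMS, bpp)
    print(f"{precision:>9}: {gib:4.1f} GiB of weights; under {VRAM_GIB:.0f} GiB: {gib < VRAM_GIB}")
```

At 16 bits the weights come to about 13.0 GiB and do not fit; at 8 bits they need about 6.5 GiB, and at 4 bits about 3.3 GiB, leaving headroom for the KV cache on a 12 GB card.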
We validated against 29 patient charts from STAR. Across the 341 STAR fields, 170 were evaluable against the site’s REDCap ground truth. The remaining fields were either not applicable to the case or had no extractable answer in the source documents.
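Scoring only evaluable fields can be sketched as a filter followed by a match count. This is a minimal illustration, not the study's scoring code: the `FieldResult` record, the `"N/A"` marker for not-applicable fields, the field names, and the case-insensitive exact-match rule are all hypothetical. The source does not describe how extracted values were compared with the REDCap export.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical marker for "not applicable to this case"; real REDCap exports may differ.
NOT_APPLICABLE = "N/A"


@dataclass
class FieldResult:
    field: str
    extracted: Optional[str]     # model output (None if nothing was extracted)
    ground_truth: Optional[str]  # REDCap value (None if the chart has no answer)


def is_evaluable(r: FieldResult) -> bool:
    """A field counts toward accuracy only if the ground truth holds a real answer."""
    return r.ground_truth is not None and r.ground_truth != NOT_APPLICABLE


def accuracy(results: list[FieldResult]) -> tuple[int, float]:
    """Return (number of evaluable fields, fraction whose extraction matches ground truth)."""
    evaluable = [r for r in results if is_evaluable(r)]
    if not evaluable:
        return 0, 0.0
    # Simplified matching: case-insensitive exact string comparison.
    correct = sum(
        r.extracted is not None
        and r.extracted.strip().lower() == r.ground_truth.strip().lower()
        for r in evaluable
    )
    return len(evaluable), correct / len(evaluable)


# Usage with made-up fields: the last one is excluded as not applicable.
results = [
    FieldResult("tici_score", "2b", "2b"),
    FieldResult("ich_on_ct", "no", "No"),
    FieldResult("aneurysm_size_mm", "7", "8"),
    FieldResult("prior_coiling", None, NOT_APPLICABLE),
]
n, acc = accuracy(results)
print(n, round(acc, 3))  # 3 0.667
```

Free-text fields would need a looser comparison than exact matching, which may be one reason they score lowest in the table below.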
The result
| Field type | Evaluable fields (n) | CuratAI 7B (local) accuracy | Cloud baseline accuracy |
|---|---|---|---|
| Yes / No | 49 | 91.6% | 92.3% |
| Multiple-choice | 23 | 75.6% | 79.6% |
| Cascaded | 94 | 90.1% | 93.2% |
| Free text | 4 | 53.4% | 57.1% |
| Overall | 170 | 87.7% | 90.9% |
A local 7B model reaches 87.7% overall accuracy — within 3.2 percentage points of a cloud-hosted model an order of magnitude larger.
The workflow that makes it work
Raw accuracy isn’t the whole story. CuratAI attaches a confidence score to every extracted field, on a three-level scale where 1 is the highest:
- 86% of fields land at confidence 1, with ~89% accuracy, and are auto-accepted.
- 14% land at confidence 2 or 3 and are surfaced for human review.
- End-to-end: 26 minutes per patient across all 341 STAR fields.
The result is a workflow that’s both fast enough to be useful and conservative enough to be trustworthy — without sending a single line of clinical text outside the institution.
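The routing rule in the list above amounts to a simple partition on the confidence score. This is a minimal sketch, not CuratAI's code: the `Extraction` record and the field names are hypothetical, and the rule that only level 1 is auto-accepted is taken from the description above.

```python
from dataclasses import dataclass

AUTO_ACCEPT_LEVEL = 1  # level 1 is the highest of three confidence levels


@dataclass
class Extraction:
    field: str
    value: str
    confidence: int  # 1, 2, or 3


def triage(extractions: list[Extraction]) -> tuple[list[Extraction], list[Extraction]]:
    """Split extractions into auto-accepted fields and fields queued for human review."""
    accepted: list[Extraction] = []
    review: list[Extraction] = []
    for e in extractions:
        if e.confidence not in (1, 2, 3):
            raise ValueError(f"unexpected confidence {e.confidence} for {e.field}")
        (accepted if e.confidence == AUTO_ACCEPT_LEVEL else review).append(e)
    return accepted, review


# Usage with made-up fields.
batch = [
    Extraction("occlusion_site", "M1", 1),
    Extraction("nihss_admission", "14", 1),
    Extraction("discharge_mrs", "3", 2),
    Extraction("complication_notes", "groin hematoma", 3),
]
accepted, review = triage(batch)
print([e.field for e in accepted])  # ['occlusion_site', 'nihss_admission']
print([e.field for e in review])    # ['discharge_mrs', 'complication_notes']
```

Rejecting out-of-range scores, rather than silently routing them to review, makes a malformed model response visible instead of hiding it in the review queue.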
Why this generalizes
STAR is a hard registry to abstract well: 341 fields, cascaded dependencies, free-text observations. The same engine ports to GWTG-Stroke, oncology registries, and custom institutional registries with the same architecture, because the prompts are registry-agnostic.
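Cascaded dependencies mean a child field only applies when its parent has a particular answer. One way to handle them, sketched here under the assumption of REDCap-style branching logic, is to resolve fields in dependency order and skip the model call for any branch that was not taken. The `FieldSpec` structure, the `"N/A"` marker, and the field names are hypothetical; the source says STAR has cascaded dependencies but not how CuratAI resolves them. Because the field list is plain data, switching registries in this sketch means swapping the list, not the code.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FieldSpec:
    name: str
    parent: Optional[str] = None   # field this one depends on, if any
    trigger: Optional[str] = None  # parent value that activates this field


def resolve(specs: list[FieldSpec],
            extract: Callable[[str], Optional[str]]) -> dict[str, Optional[str]]:
    """Extract fields in order, marking children of untriggered parents as N/A.

    `specs` must list parents before their children. `extract` stands in for
    the model call: it takes a field name and returns the extracted value.
    """
    answers: dict[str, Optional[str]] = {}
    for spec in specs:
        if spec.parent is not None and answers.get(spec.parent) != spec.trigger:
            answers[spec.name] = "N/A"  # branch not taken: no model call
            continue
        answers[spec.name] = extract(spec.name)
    return answers


# Usage with a fake "model" that just looks values up in a dict.
specs = [
    FieldSpec("aneurysm_present"),
    FieldSpec("aneurysm_ruptured", parent="aneurysm_present", trigger="yes"),
    FieldSpec("hunt_hess_grade", parent="aneurysm_ruptured", trigger="yes"),
]
fake_model = {"aneurysm_present": "yes", "aneurysm_ruptured": "no", "hunt_hess_grade": "3"}
print(resolve(specs, fake_model.get))
# {'aneurysm_present': 'yes', 'aneurysm_ruptured': 'no', 'hunt_hess_grade': 'N/A'}
```

Since an `"N/A"` parent never equals a trigger value, a skipped branch propagates down the whole cascade without extra checks.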
Reference
Submitted to Neurosurgery, 2026. Pre-print and detailed methodology available on request.