CuratAI · AI-Native Multimodal Data Curation Platform
Retrieval, ingest (unstructured → structured), de-identification across all five PHI vectors, AI-assisted annotation, a custom AI plugin slot, and cross-institution sharing, all running inside the institutions that already do the work.
Stage 1 · Retrieval
Pull from PACS, EHR, and pathology systems. Batch via REDCap, Excel, or CSV exports — whatever shape your research data is already in. One pipeline for every modality you work with.
PACS, EHR, pathology. Batch pulls driven by REDCap, Excel, or CSV.
DICOM, NIfTI, WSI, PDF, clinical notes. One pipeline.
Configure once. Studies arrive automatically as they're scanned or charted.
Stage 2 · Ingest
Pulls structured fields out of free-text clinical notes, reports, images, and PDFs. Adapts to each target registry's structure. A local 7B model on a 12 GB consumer GPU is enough; accuracy depends on the registry.
The Stroke Thrombectomy and Aneurysm Registry (STAR; founded at MUSC, 85+ sites, 15,000+ patients) requires structured abstraction of 341 fields per case from the H&P, procedure note, and discharge summary. We validated on 29 patients and 170 evaluable fields against the site's REDCap ground truth.
| Field type | Fields (n) | CuratAI 7B accuracy | Cloud baseline accuracy |
|---|---|---|---|
| Yes / No | 49 | 91.6% | 92.3% |
| Multiple-choice | 23 | 75.6% | 79.6% |
| Cascaded | 94 | 90.1% | 93.2% |
| Free text | 4 | 53.4% | 57.1% |
| Overall | 170 | 87.7% | 90.9% |
Confidence-triaged workflow
The local 7B model reaches 87.7% overall on a complex neurovascular registry, within 3.2 percentage points of the cloud baseline. The prompts are registry-agnostic, so the same engine ports to GWTG-Stroke, oncology, and custom institutional registries.
Automated Population of the STAR Neurovascular Registry Using a Local Language Model. Submitted to Neurosurgery, 2026.
Stage 3 · De-identify
The five PHI vectors are headers, dates, burned-in pixel text, facial features, and clinical reports. Each gets a dedicated mechanism, because naive blanket blackouts destroy research utility.
Facial features: Mask-based defacing removes facial geometry while leaving brain, skull base, and surrounding anatomy untouched. Penn benchmarked four defacing algorithms; CuratAI's delivered the best privacy–utility trade-off.
Burned-in pixel text: OCR plus local-LLM reasoning classifies each detected text region. PHI is replaced; measurements, scale bars, side markers, and modality annotations are preserved, so the image stays readable for research.
Dates: A patient-specific random offset removes the absolute date but exactly preserves the intervals: baseline to follow-up, procedure to discharge, treatment to event.
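The date-shifting idea can be illustrated with a minimal sketch. This is not CuratAI's implementation: the `DateShifter` class, the `max_shift_days` parameter, and the in-memory offset table (standing in for the linking file) are all assumptions made for illustration. It shows why one fixed offset per patient hides absolute dates while leaving every interval unchanged.

```python
import secrets
from datetime import date, timedelta


class DateShifter:
    """Hypothetical per-patient date shifter (illustrative only)."""

    def __init__(self, max_shift_days: int = 365) -> None:
        self.max_shift_days = max_shift_days
        # Stand-in for the linking file: patient ID -> offset in days.
        self._offsets = {}

    def offset_for(self, patient_id: str) -> int:
        """Return the patient's offset, drawing it once on first use."""
        if patient_id not in self._offsets:
            # Random shift of 1..max_shift_days days into the past (never zero).
            days = 1 + secrets.randbelow(self.max_shift_days)
            self._offsets[patient_id] = -days
        return self._offsets[patient_id]

    def shift(self, patient_id: str, d: date) -> date:
        """Shift a date by the patient's fixed offset."""
        return d + timedelta(days=self.offset_for(patient_id))


shifter = DateShifter()
baseline = date(2024, 3, 1)
follow_up = date(2024, 6, 1)

shifted_baseline = shifter.shift("patient-001", baseline)
shifted_follow_up = shifter.shift("patient-001", follow_up)

# The absolute dates change, but the interval between them does not.
assert shifted_baseline != baseline
assert shifted_follow_up - shifted_baseline == follow_up - baseline
```

Because the same offset is added to every date for a given patient, it cancels out of any subtraction between two of that patient's dates. Keeping the offset table inside the institution is what would make re-identification possible internally only.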
Stage 4 · Annotate
The hidden bottleneck in every research pipeline is the graduate student clicking through 12,000 scans. Prediction and interpolation turn a 6-month clicking project into a 3-week supervision project.
Runs in the browser. No per-workstation install, no IT ticket to onboard a new annotator.
Annotator draws on representative slices; the model predicts and fills in the rest of the 3-D segmentation. Reviewer corrects what's needed.
PI, annotator, and reviewer roles with shared queues and live cursors. Reviewer approval baked in.
Every edit logged with user, timestamp, and reversible diff. Inter-observer variability is visible in the viewer.
Linked axial, sagittal, coronal, and MIP views, plus 3-D rendering. DICOM-RT, NIfTI, and DICOM-SEG export.
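One way to picture an edit log with user, timestamp, and reversible diff is the sketch below. It is an assumption-laden illustration, not the product's data model: `AuditedMask`, `Edit`, and the sparse voxel dictionary are hypothetical names. Each edit records the old and new label of every voxel it changed, which is enough to undo it later.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]  # (slice, row, column)


@dataclass(frozen=True)
class Edit:
    user: str
    timestamp: datetime
    changes: Dict[Voxel, Tuple[int, int]]  # voxel -> (old_label, new_label)


class AuditedMask:
    """Hypothetical sparse segmentation mask with an append-only edit log."""

    def __init__(self) -> None:
        self.labels: Dict[Voxel, int] = {}  # missing voxel means label 0
        self.log: List[Edit] = []

    def apply(self, user: str, new_labels: Dict[Voxel, int]) -> Edit:
        """Apply labels and log only the voxels that actually changed."""
        changes = {}
        for voxel, new in new_labels.items():
            old = self.labels.get(voxel, 0)
            if old != new:
                changes[voxel] = (old, new)
                self.labels[voxel] = new
        edit = Edit(user, datetime.now(timezone.utc), changes)
        self.log.append(edit)
        return edit

    def revert(self, user: str, edit: Edit) -> Edit:
        """Undo an edit by restoring its old labels; the undo is logged too."""
        old_labels = {v: old for v, (old, _new) in edit.changes.items()}
        return self.apply(user, old_labels)


mask = AuditedMask()
first = mask.apply("annotator_a", {(0, 10, 10): 1, (0, 10, 11): 1})
mask.revert("reviewer_b", first)
```

In this sketch a revert is itself a logged edit, so the history is never rewritten. It restores the old labels directly, which is only a clean undo when no later edit has touched the same voxels.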
Stage 5 · AI Plugins
CuratAI's plugin slot is what makes it a platform: your research question, our stack, your model — running on your hardware, inside your firewall, on data that stays inside the institution.
Any language: Python, C++, R, MATLAB, Julia. CuratAI handles the data plumbing, the cohort selection, and the I/O contracts.
GPU on the same workstation, on a local Linux server, or on your institutional compute. No cloud, no telemetry, no PHI off-site.
Your model trains and infers on data that's already passed pixel-level PHI removal, audit logging, and IRB-scoped project access.
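A language-agnostic I/O contract could, for example, be file-based: the host writes a manifest of the selected cohort, launches the plugin as a separate process, and reads back a results file. The sketch below assumes exactly that. The `run_plugin` function, the `manifest.json` and `results.json` file names, and the JSON layout are hypothetical and do not describe CuratAI's actual plugin API.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List


def run_plugin(command: List[str], cohort: List[str], workdir: str) -> dict:
    """Run an external plugin under a hypothetical file-based contract.

    The plugin receives two arguments: the manifest path to read and the
    results path to write. It can be written in any language.
    """
    manifest = Path(workdir) / "manifest.json"
    results = Path(workdir) / "results.json"
    manifest.write_text(json.dumps({"studies": cohort}))
    # check=True raises CalledProcessError if the plugin exits non-zero.
    subprocess.run([*command, str(manifest), str(results)], check=True)
    return json.loads(results.read_text())


# A toy plugin, inlined as a Python one-off so the example is self-contained.
toy_plugin = (
    "import json, sys\n"
    "with open(sys.argv[1]) as f:\n"
    "    studies = json.load(f)['studies']\n"
    "with open(sys.argv[2], 'w') as f:\n"
    "    json.dump({'n_studies': len(studies)}, f)\n"
)

with tempfile.TemporaryDirectory() as tmp:
    output = run_plugin(
        [sys.executable, "-c", toy_plugin],
        ["study-a.nii.gz", "study-b.nii.gz"],
        tmp,
    )
```

The point of a contract like this is that the plugin only needs to read and write files, so the same host code could launch an R script, a MATLAB batch job, or a compiled C++ binary by changing `command`.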
Stage 6 · Collaborate
A dataset becomes research-grade when it's shareable across collaborators without ever being exported as raw files.
Share is blocked unless every artifact has cleared the de-identification pipeline; nothing leaves the institution while it still contains PHI.
Scoped, time-bound access to a specific de-identified project — no raw-file handoffs.
Same project visible to collaborators at other institutions, IRB-scoped.
Re-identification is possible internally with the linking file; never on the receiving side.
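The sharing rules above amount to two checks: a gate that refuses to share while any artifact is uncleared, and a grant that expires. The following is a minimal sketch under stated assumptions; `Artifact`, `AccessGrant`, `share_project`, and the `deidentified` flag are hypothetical names, not the platform's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Artifact:
    name: str
    deidentified: bool  # True once the artifact has cleared de-identification


@dataclass
class AccessGrant:
    project: str
    collaborator: str
    expires_at: datetime

    def is_active(self, now: Optional[datetime] = None) -> bool:
        """A grant is valid strictly before its expiry time."""
        now = now or datetime.now(timezone.utc)
        return now < self.expires_at


def share_project(
    project: str,
    artifacts: List[Artifact],
    collaborator: str,
    days_valid: int,
    now: Optional[datetime] = None,
) -> AccessGrant:
    """Issue a time-bound grant, but only if every artifact is cleared."""
    pending = [a.name for a in artifacts if not a.deidentified]
    if pending:
        raise PermissionError(f"share blocked, not yet de-identified: {pending}")
    now = now or datetime.now(timezone.utc)
    return AccessGrant(project, collaborator, now + timedelta(days=days_valid))


grant = share_project(
    "stroke-registry",
    [Artifact("ct_head.dcm", True), Artifact("discharge_note.pdf", True)],
    "collaborator@partner.example",
    days_valid=30,
)
```

Calling `share_project` with even one artifact whose `deidentified` flag is `False` raises `PermissionError` instead of returning a grant, which mirrors the all-or-nothing gate described above.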
Deployment
CuratAI runs entirely inside your institution's network. The full stack — application, database, on-premise large language model, custom-model runner — installs on a Windows workstation or Linux server you control. No cloud component, no telemetry, no outbound connection that carries patient data. The linking file required for re-identification stays inside your firewall.
And then — the same stack
A research group can de-identify on Monday, annotate by Friday, train over the following weeks, publish within months, and ship an FDA-cleared product, all on the same stack. Same company, same deployment footprint, same team.
See the clinical AI products built on CuratAI's stack: INTContour (FDA-cleared auto-contouring), OncoAI Suite, QuantBrain, and INTDose.