Extract: Parse documents into structured data

COMPLIANCE: HIPAA + BAA
TRAINING: Never
RETENTION: Dropped on response

Complete response shape. Source bboxes, OCR confidence, and PPTX/DOCX input in one call.

capability	Extract	aws textract	llamaparse	reducto
text extraction	✓	✓	✓	✓
text accuracy	81.9%	60.7%	69.1%	70.1%
per-span bbox	✓	✓	—	✓
per-span confidence	✓	—	—	✓
OCR	✓	✓	!*	✓
pptx / docx input	✓	—	—	—
markdown output	—	—	✓	✓

Production hosted APIs compared on the response shape from a single call; text accuracy is from the 130-page human-labeled gold benchmark.

custom benchmark

Send us your docs. We'll show you how it performs on yours, not ours.

Book a benchmark call

confidence review · synthetic-claim.pdf · OCR spans

synthetic claim form

Intake summary

patient: A. Rivera
member id: 8K2G-19Q
dob: 03 / 14 / 1978
prior auth: YES
diagnosis: I50.9
provider: Beacon Health

flagged spans · OCR confidence

member_id · 42% · 8K2G-19Q · p.12 · bbox [142, 214, 223, 229]
prior_auth · 68% · YES · p.12 · bbox [401, 318, 429, 333]
diagnosis · 83% · I50.9 · p.12 · bbox [146, 382, 181, 397]
patient · 98% · A. Rivera · p.12 · bbox [142, 176, 207, 191]

Review the uncertain. Skip the rest.

Per-span confidence puts uncertain OCR text first, so even on a 200k-page run reviewers see the spans that need attention instead of rereading every page.
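A minimal sketch of that triage step, using the four spans from the demo panel above. The `Span` class, the `field` label, the 0-to-1 confidence scale, and the 0.90 threshold are all assumptions made for illustration; the API's actual field names are not shown on this page.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    field: str                                # hypothetical label, as in the demo panel
    text: str
    confidence: float                         # assumed to be a fraction in [0, 1]
    page_no: int
    bbox: tuple[float, float, float, float]   # PDF points


def review_queue(spans: list[Span], threshold: float = 0.90) -> list[Span]:
    """Return spans below `threshold`, least confident first."""
    flagged = [s for s in spans if s.confidence < threshold]
    return sorted(flagged, key=lambda s: s.confidence)


# Values copied from the demo panel on this page.
spans = [
    Span("patient", "A. Rivera", 0.98, 12, (142, 176, 207, 191)),
    Span("member_id", "8K2G-19Q", 0.42, 12, (142, 214, 223, 229)),
    Span("diagnosis", "I50.9", 0.83, 12, (146, 382, 181, 397)),
    Span("prior_auth", "YES", 0.68, 12, (401, 318, 429, 333)),
]

queue = review_queue(spans)
for s in queue:
    print(f"{s.confidence:.0%}  {s.field:<10}  {s.text!r}  p.{s.page_no}  bbox {list(s.bbox)}")
```

With this threshold the 98% span is skipped and the other three are listed from least to most confident. The right cutoff depends on your documents and how costly a missed error is.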

Built for healthcare extraction. EOBs, prior auths, clinical notes, intake forms, claims.

documents we handle

EOBs
Prior auths
Clinical notes
Discharge summaries
Intake forms
Lab reports
Imaging studies
Claims

how we handle them

BAA on request
PHI-safe mode
In-memory processing
Per-span bbox citations

See healthcare benchmark

Start for free.
Usage-based from there.

free

$0

1,000 pages · no card · lifetime

Full API access, no rate gates
Self-serve dashboard with usage
Email support
$3 / 1,000 pages after free credit

Get Started

custom

Enterprise

for teams with higher workloads · volume discounts · SLAs

Dedicated region + private networking
HIPAA + BAA available
Slack channel with engineering
Production SLAs and priority queues

Book a demo

Questions
before you ship.

If something isn't here, email hello@extract.page.

Yes. HIPAA + BAA is available on request. We have signed BAAs with healthcare customers in production. Talk to us about your compliance requirements.
We don't train on your data. Source documents are processed in memory and dropped as soon as the response returns. Extracted images are uploaded to our object store so you can fetch them via image_url. The custom tier supports customer-managed encryption, configurable retention, and dedicated regions.
PDF, PPTX, and DOCX. Scanned PDFs are handled automatically. OCR runs inline with no separate surcharge.
A list of chunks. Each chunk carries page_content, page_no, a bbox in PDF points, and a per-span confidence score. Image chunks include an image_url to the rendered region.
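As a rough sketch of working with that shape, the snippet below parses a chunk list and scales a bbox from PDF points to pixels for drawing on a rendered page. `page_content`, `page_no`, and `image_url` come from the description above; the `bbox` and `confidence` key names, the `[x0, y0, x1, y1]` ordering, and the sample payload are assumptions. The only fixed fact used is that a PDF point is 1/72 inch.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Optional

POINTS_PER_INCH = 72  # PDF unit: 1 pt = 1/72 inch


@dataclass(frozen=True)
class Chunk:
    page_content: str
    page_no: int
    bbox: tuple[float, ...]            # assumed [x0, y0, x1, y1] in PDF points
    confidence: float                  # key name is an assumption
    image_url: Optional[str] = None    # present only on image chunks


def parse_chunks(payload: str) -> list[Chunk]:
    """Parse a JSON array of chunk objects (hypothetical wire format)."""
    return [
        Chunk(
            page_content=c["page_content"],
            page_no=c["page_no"],
            bbox=tuple(c["bbox"]),
            confidence=c["confidence"],
            image_url=c.get("image_url"),
        )
        for c in json.loads(payload)
    ]


def bbox_to_pixels(bbox: tuple[float, ...], dpi: int = 144) -> tuple[int, ...]:
    """Scale a bbox from PDF points to pixels on a page rendered at `dpi`."""
    scale = dpi / POINTS_PER_INCH
    return tuple(round(v * scale) for v in bbox)


# Hypothetical payload built from the demo values on this page.
sample = (
    '[{"page_content": "8K2G-19Q", "page_no": 12, '
    '"bbox": [142, 214, 223, 229], "confidence": 0.42}]'
)
chunks = parse_chunks(sample)
pixel_box = bbox_to_pixels(chunks[0].bbox, dpi=144)
```

The sketch only scales coordinates. Whether the origin is top-left or bottom-left is not stated here, so check that before overlaying boxes on an image.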
Yes. Send us 20-50 representative documents and we'll run the same eval on your corpus. Healthcare and other regulated docs are handled under BAA. Book a benchmark call.
Our pricing is $3 per 1,000 pages, all-in. No credits, no seat fees, no monthly minimums. Compared to providers that price by credit (Reducto, LlamaParse) or by operation type (Textract, Azure DI), most teams find we're cheaper in total spend once you account for tables, forms, and OCR. We're happy to scope your monthly volume on a benchmark call.
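For budgeting, the arithmetic can be sketched as below. It assumes the 1,000 free pages are a one-time lifetime credit and that usage beyond it is billed pro rata per page at $3 per 1,000; the function name is hypothetical, and negotiated custom-tier rates are not modeled.

```python
from decimal import Decimal

FREE_PAGES = 1_000                   # one-time free credit (assumed lifetime, per the free tier)
USD_PER_1000_PAGES = Decimal("3")    # published usage rate


def estimate_cost_usd(lifetime_pages: int) -> Decimal:
    """Estimate total spend for `lifetime_pages`, assuming pro-rata per-page billing."""
    if lifetime_pages < 0:
        raise ValueError("lifetime_pages must be non-negative")
    billable = max(0, lifetime_pages - FREE_PAGES)
    return Decimal(billable) * USD_PER_1000_PAGES / Decimal(1000)


print(estimate_cost_usd(200_000))
```

Under these assumptions a 200,000-page run bills 199,000 pages. `Decimal` is used so that currency values do not pick up binary floating-point error.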
500 pages and 150 MB per synchronous request. For larger jobs, we have an async endpoint that handles batch jobs up to 1M pages with webhook delivery on completion. Email hello@extract.page or book a call for access.
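A client can check those limits before uploading. This is a minimal sketch: the function name is hypothetical, the limits are treated as inclusive, and "MB" is read as decimal megabytes (the more conservative interpretation), none of which is confirmed here.

```python
MAX_SYNC_PAGES = 500
MAX_SYNC_BYTES = 150 * 1000 * 1000  # assumes decimal MB; the exact unit is not stated


def choose_endpoint(page_count: int, size_bytes: int) -> str:
    """Return "sync" if the document fits the synchronous limits, else "async"."""
    if page_count <= MAX_SYNC_PAGES and size_bytes <= MAX_SYNC_BYTES:
        return "sync"
    return "async"


print(choose_endpoint(page_count=120, size_bytes=8_000_000))
print(choose_endpoint(page_count=2_400, size_bytes=90_000_000))
```

The caller supplies the page count, for example from whatever PDF library you already use, so the check costs nothing and avoids a rejected upload.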
Available on the custom tier. Dedicated regions, private networking, and on-prem options are available for teams with strict security or data residency requirements.
Yes. Available on the custom tier: negotiated rate per page, dedicated regions, private networking, production SLAs, and a Slack channel with the engineering team.
The API returns 402 when your balance is exhausted. Top up from the dashboard in preset amounts of $10, $30, $100, or $500, or enter any amount. Keys start working again the moment a top-up lands.
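One way to surface the 402 case in client code is a dedicated exception, sketched below with the Python standard library only. The endpoint URL, the bearer-token header, and the raw-bytes request body are assumptions for illustration; only the 402 status comes from the answer above.

```python
import urllib.error
import urllib.request


class BalanceExhausted(Exception):
    """Raised when the API answers 402 (balance exhausted)."""


def check_response(status: int, body: bytes) -> bytes:
    """Return the body on success; raise a descriptive error otherwise."""
    if status == 402:
        raise BalanceExhausted("balance exhausted; top up from the dashboard, then retry")
    if status >= 400:
        raise RuntimeError(f"extract request failed with HTTP {status}")
    return body


def post_document(url: str, api_key: str, data: bytes) -> bytes:
    """POST a document to `url` (hypothetical endpoint and auth scheme)."""
    request = urllib.request.Request(
        url,
        data=data,
        headers={"Authorization": f"Bearer {api_key}"},  # assumed header format
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return check_response(response.status, response.read())
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; route it through the same check.
        return check_response(err.code, err.read())
```

Catching `BalanceExhausted` separately lets a batch job pause and alert instead of retrying blindly. Because keys work again as soon as a top-up lands, the same request can simply be resent afterwards.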


Ship extraction in an afternoon.
