Backed by Y Combinator

Parse documents into structured data

Text, tables, and figures in one call. Accurate parsing, at least 2x faster than other parsers, at $3 per 1,000 pages.

POST /v1/extract · status: extracting · 0 ms elapsed · 0/169 chunks · 3 pages
COMPLIANCE: HIPAA + BAA
TRAINING: Never
RETENTION: Dropped on response
proven at scale: 70,000,000+ pages processed

Built from YouLearn's production document pipeline before becoming an API.

Complete response shape. Source bboxes, OCR confidence, and PPTX/DOCX input in one call.

capability · Extract · AWS Textract · LlamaParse · Reducto
text extraction
text accuracy: Extract 81.9% · AWS Textract 60.7% · LlamaParse 69.1% · Reducto 70.1%
per-span bbox
per-span confidence
OCR!*
pptx / docx input
markdown output

Production hosted APIs compared on the response shape from a single call; text accuracy is from the 130-page human-labeled gold benchmark.

custom benchmark

Send us your docs. We'll show you how it performs on yours, not ours.

Book a benchmark call
confidence review · synthetic-claim.pdf · OCR spans
synthetic claim form
Intake summary
patient: A. Rivera
member id: 8K2G-19Q
dob: 03 / 14 / 1978
prior auth: YES
diagnosis: I50.9
provider: Beacon Health
flagged spans · OCR confidence
member_id · 42% · 8K2G-19Q · p.12 · bbox [142, 214, 223, 229]
prior_auth · 68% · YES · p.12 · bbox [401, 318, 429, 333]
diagnosis · 83% · I50.9 · p.12 · bbox [146, 382, 181, 397]
patient · 98% · A. Rivera · p.12 · bbox [142, 176, 207, 191]

Review the uncertain. Skip the rest.

Per-span confidence puts the least certain OCR text first, so on a 200k-page run reviewers see the spans that need attention before anything else.
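A minimal sketch of that review step on the client side, using the four spans from the synthetic claim form above. The `Span` class, the `review_queue` helper, the 0.0 to 1.0 confidence scale, and the 0.90 threshold are all assumptions made for illustration, not part of the documented API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Span:
    field: str
    text: str
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)
    page: int
    bbox: Tuple[float, float, float, float]  # four numbers, as in the sample


# The four spans shown in the synthetic claim form.
SPANS = [
    Span("patient", "A. Rivera", 0.98, 12, (142, 176, 207, 191)),
    Span("member_id", "8K2G-19Q", 0.42, 12, (142, 214, 223, 229)),
    Span("diagnosis", "I50.9", 0.83, 12, (146, 382, 181, 397)),
    Span("prior_auth", "YES", 0.68, 12, (401, 318, 429, 333)),
]


def review_queue(spans: List[Span], threshold: float = 0.90) -> List[Span]:
    """Return spans below the threshold, least confident first."""
    flagged = [s for s in spans if s.confidence < threshold]
    return sorted(flagged, key=lambda s: s.confidence)


for span in review_queue(SPANS):
    print(f"{span.field}: {span.text!r} at {span.confidence:.0%} "
          f"(p.{span.page}, bbox {list(span.bbox)})")
```

With this threshold the patient name is skipped and the other three spans are queued for review. The right threshold depends on how costly a missed error is for a given field, so it is worth tuning per document type.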

Built for healthcare extraction. EOBs, prior auths, clinical notes, intake forms, claims.

documents we handle
  • EOBs
  • Prior auths
  • Clinical notes
  • Discharge summaries
  • Intake forms
  • Lab reports
  • Imaging studies
  • Claims
how we handle them
  • BAA on request
  • PHI-safe mode
  • In-memory processing
  • Per-span bbox citations

Start for free.
Usage-based from there.

free
$0
1,000 pages · no card · lifetime
  • Full API access, no rate gates
  • Self-serve dashboard with usage
  • Email support
  • $3 / 1,000 pages after free credit
custom
Enterprise
for teams with higher workloads · volume discounts · SLAs
  • Dedicated region + private networking
  • HIPAA + BAA available
  • Slack channel with engineering
  • Production SLAs and priority queues
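
The self-serve pricing above can be sketched as simple arithmetic. This estimate assumes the 1,000 free pages are deducted first and that usage beyond them is billed linearly per page rather than in whole blocks of 1,000; both are assumptions, and `estimated_cost_usd` is a hypothetical helper, not part of the API.

```python
FREE_PAGES = 1_000          # lifetime free pages on the free tier
USD_PER_1000_PAGES = 3.00   # usage-based rate after the free pages


def estimated_cost_usd(pages_processed: int) -> float:
    """Estimate total spend for a lifetime page count.

    Assumes linear per-page billing once the free pages are used up.
    """
    billable = max(0, pages_processed - FREE_PAGES)
    return round(billable * USD_PER_1000_PAGES / 1000, 2)


for pages in (500, 1_000, 51_000, 200_000):
    print(f"{pages:>7} pages: ${estimated_cost_usd(pages):,.2f}")
```

Custom-tier rates are negotiated, so this sketch only applies to self-serve usage.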

Questions
before you ship.

If something isn't here, email hello@extract.page.

  • Yes. HIPAA + BAA is available on request. We have signed BAAs with healthcare customers in production. Talk to us about your compliance requirements.

  • We don't train on your data. Source documents are processed in memory and dropped as soon as the response returns. Extracted images are uploaded to our object store so you can fetch them via image_url. The custom tier supports customer-managed encryption, configurable retention, and dedicated regions.

  • PDF, PPTX, and DOCX. Scanned PDFs are handled automatically. OCR runs inline with no separate surcharge.

  • A list of chunks. Each chunk carries page_content, page_no, a bbox in PDF points, and a per-span confidence score. Image chunks include an image_url to the rendered region.

  • Yes. Send us 20-50 representative documents and we'll run the same eval on your corpus. Healthcare and other regulated docs are handled under BAA. Book a benchmark call.

  • Our pricing is $3 per 1,000 pages, all-in. No credits, no seat fees, no monthly minimums. Compared to providers that price by credit (Reducto, LlamaParse) or by operation type (Textract, Azure DI), most teams find we're cheaper in total spend once you account for tables, forms, and OCR. We're happy to scope your monthly volume on a benchmark call.

  • 500 pages and 150 MB per synchronous request. For larger jobs, we have an async endpoint that handles batch jobs up to 1M pages with webhook delivery on completion. Email hello@extract.page or book a call for access.

  • Available on the custom tier. Dedicated regions, private networking, and on-prem options are available for teams with strict security or data residency requirements.

  • Yes. Available on the custom tier: negotiated rate per page, dedicated regions, private networking, production SLAs, and a Slack channel with the engineering team.

  • The API returns 402 when your balance is exhausted. Top up from the dashboard in preset amounts of $10, $30, $100, or $500, or enter any amount. The same keys work again the moment a top-up lands.
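
Several of the answers above describe behavior a client has to handle: the chunk fields in the response, the synchronous request limits, and the 402 status. The sketch below shows one way to model them. It assumes the decoded response is a top-level JSON list, that the confidence score lives under a key named `confidence`, and that "150 MB" means decimal megabytes. The `Chunk` class, the helper functions, and the sample payload are hypothetical and are not real API output.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

# Limits quoted in the answers above. Decimal megabytes are an assumption.
SYNC_MAX_PAGES = 500
SYNC_MAX_BYTES = 150 * 1000 * 1000


@dataclass
class Chunk:
    page_content: str
    page_no: int
    bbox: Tuple[float, float, float, float]  # PDF points
    confidence: Optional[float]  # key name "confidence" is assumed
    image_url: Optional[str]     # expected on image chunks only


def parse_chunks(payload: List[Dict[str, Any]]) -> List[Chunk]:
    """Convert decoded JSON (assumed to be a top-level list) into Chunk objects."""
    chunks = []
    for item in payload:
        a, b, c, d = item["bbox"]  # raises ValueError unless exactly four numbers
        chunks.append(
            Chunk(
                page_content=item["page_content"],
                page_no=int(item["page_no"]),
                bbox=(float(a), float(b), float(c), float(d)),
                confidence=item.get("confidence"),
                image_url=item.get("image_url"),
            )
        )
    return chunks


def fits_sync_request(page_count: int, size_bytes: int) -> bool:
    """True when a document is within the synchronous request limits."""
    return page_count <= SYNC_MAX_PAGES and size_bytes <= SYNC_MAX_BYTES


def balance_exhausted(status_code: int) -> bool:
    """True for HTTP 402, the status returned once the balance runs out."""
    return status_code == 402


# Hypothetical payload for illustration; values are not real API output.
SAMPLE = [
    {"page_content": "A. Rivera", "page_no": 12,
     "bbox": [142, 176, 207, 191], "confidence": 0.98},
    {"page_content": "", "page_no": 3, "bbox": [72, 90, 540, 400],
     "confidence": None, "image_url": "https://example.com/region.png"},
]

chunks = parse_chunks(SAMPLE)
print(chunks[0])
print(fits_sync_request(page_count=620, size_bytes=40_000_000))
```

A document that fails `fits_sync_request` is a candidate for the async endpoint, and a response where `balance_exhausted` is true calls for a top-up rather than a retry.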

Ship extraction in an afternoon.

1,000 free pages. No card. Pay $3 per 1,000 after.