Text, tables, and figures in one call. Accurate parsing, at least 2x faster than other parsers, at $3 per 1,000 pages.
Originally built as YouLearn's production document pipeline, now offered as an API.
Complete response shape. Source bboxes, OCR confidence, and PPTX/DOCX input in one call.
| capability | Extract | AWS Textract | LlamaParse | Reducto |
|---|---|---|---|---|
| text extraction | ✓ | ✓ | ✓ | ✓ |
| text accuracy | 81.9% | 60.7% | 69.1% | 70.1% |
| per-span bbox | ✓ | ✓ | — | ✓ |
| per-span confidence | ✓ | — | — | ✓ |
| OCR | ✓ | ✓ | premium only* | ✓ |
| PPTX / DOCX input | ✓ | — | — | — |
| markdown output | — | — | ✓ | ✓ |
Production hosted APIs compared on the response shape from a single call; text accuracy is from the 130-page human-labeled gold benchmark.
\* LlamaParse runs OCR only in premium mode, which has higher latency and credit cost than fast mode.
Send us your docs. We'll show you how it performs on yours, not ours.
Per-span confidence lets you sort uncertain OCR text to the top, so a 200k-page run surfaces exactly the spans that need human review.
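A minimal sketch of that review queue in Python. It assumes each chunk arrives as a dict with a numeric per-span score under a key named `confidence`; the key name and the 0.9 cutoff are assumptions for illustration, not documented API, and the sample values are made up.

```python
from typing import Any


def review_queue(chunks: list[dict[str, Any]], threshold: float = 0.9) -> list[dict[str, Any]]:
    """Return chunks scored below `threshold`, least confident first.

    Assumes a numeric "confidence" key (hypothetical field name).
    Chunks without a score (e.g. image chunks) are skipped.
    """
    scored = [c for c in chunks if c.get("confidence") is not None]
    flagged = [c for c in scored if c["confidence"] < threshold]
    # sorted() is stable, so ties keep their original document order.
    return sorted(flagged, key=lambda c: c["confidence"])


if __name__ == "__main__":
    sample = [  # made-up values for illustration
        {"page_no": 1, "page_content": "Patient: Jane Doe", "confidence": 0.99},
        {"page_no": 1, "page_content": "DOB: 03/1?/1984", "confidence": 0.41},
        {"page_no": 2, "page_content": "Claim #A7731", "confidence": 0.87},
    ]
    for c in review_queue(sample):
        print(c["page_no"], c["confidence"], c["page_content"])
```

Filtering before sorting keeps the queue short on large runs; raise or lower the threshold to trade review effort against missed errors.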
Built for healthcare extraction. EOBs, prior auths, clinical notes, intake forms, claims.
If something isn't here, email hello@extract.page.
Yes. HIPAA compliance and a BAA are available on request. We have signed BAAs with healthcare customers in production. Talk to us about your compliance requirements.
We don't train on your data. Source documents are processed in memory and dropped as soon as the response returns. Extracted images are uploaded to our object store so you can fetch them via image_url. The custom tier supports customer-managed encryption, configurable retention, and dedicated regions.
PDF, PPTX, and DOCX. Scanned PDFs are handled automatically. OCR runs inline with no separate surcharge.
A list of chunks. Each chunk carries page_content, page_no, a bbox in PDF points, and a per-span confidence score. Image chunks include an image_url to the rendered region.
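One way to model a chunk on the client side, sketched in Python. The field names `page_content`, `page_no`, `bbox`, and `image_url` come from the description above; the `confidence` key name and the `(x0, y0, x1, y1)` bbox order are assumptions. Because PDF points are 1/72 inch, scaling a bbox to a page image rendered at a given DPI is `pixels = points * dpi / 72`.

```python
from dataclasses import dataclass
from typing import Any, Optional

POINTS_PER_INCH = 72  # PDF user-space unit: 1 point = 1/72 inch


@dataclass
class Chunk:
    """Client-side view of one chunk; bbox order and the confidence key are assumed."""
    page_content: str
    page_no: int
    bbox: tuple[float, float, float, float]  # assumed (x0, y0, x1, y1), in PDF points
    confidence: Optional[float] = None       # per-span score (key name assumed)
    image_url: Optional[str] = None          # set on image chunks

    @classmethod
    def from_dict(cls, d: dict[str, Any]) -> "Chunk":
        x0, y0, x1, y1 = (float(v) for v in d["bbox"])
        return cls(
            page_content=d.get("page_content", ""),
            page_no=int(d["page_no"]),
            bbox=(x0, y0, x1, y1),
            confidence=d.get("confidence"),
            image_url=d.get("image_url"),
        )


def bbox_to_pixels(bbox: tuple[float, float, float, float], dpi: int = 150) -> tuple[int, ...]:
    """Scale a bbox from PDF points to pixels on a page rendered at `dpi`.

    This only scales. If the coordinates use a bottom-left origin (native
    PDF space), flip y against the page height before drawing overlays.
    """
    # Multiply before dividing so whole-point inputs stay exact.
    return tuple(round(v * dpi / POINTS_PER_INCH) for v in bbox)


if __name__ == "__main__":
    c = Chunk.from_dict({"page_content": "Total due", "page_no": 2, "bbox": [72, 144, 216, 180]})
    print(c.page_no, bbox_to_pixels(c.bbox, dpi=150))
```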
Yes. Send us 20-50 representative documents and we'll run the same eval on your corpus. Healthcare and other regulated docs are handled under BAA. Book a benchmark call.
Our pricing is $3 per 1,000 pages, all-in. No credits, no seat fees, no monthly minimums. Compared to providers that price by credit (Reducto, LlamaParse) or by operation type (Textract, Azure DI), most teams find their total spend is lower with us once tables, forms, and OCR are accounted for. We're happy to scope your monthly volume on a benchmark call.
500 pages and 150 MB per synchronous request. For larger jobs, we have an async endpoint that handles batch jobs up to 1M pages with webhook delivery on completion. Email hello@extract.page or book a call for access.
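A small planning helper, sketched in Python, for deciding whether a document fits a synchronous request and, if you prefer to stay synchronous, how to split it into page ranges of at most 500 pages. It assumes you split the file locally before upload, and that "150 MB" means decimal megabytes (the smaller, more conservative reading); neither detail is documented here.

```python
MAX_SYNC_PAGES = 500
MAX_SYNC_BYTES = 150 * 1_000_000  # assumes decimal MB; 150 MiB would be slightly larger


def needs_async(page_count: int, size_bytes: int) -> bool:
    """True if a single synchronous request would exceed either limit."""
    return page_count > MAX_SYNC_PAGES or size_bytes > MAX_SYNC_BYTES


def page_batches(page_count: int, max_pages: int = MAX_SYNC_PAGES) -> list[tuple[int, int]]:
    """1-indexed, inclusive (first, last) page ranges of at most `max_pages` pages."""
    if page_count < 0 or max_pages < 1:
        raise ValueError("page_count must be >= 0 and max_pages >= 1")
    return [
        (start, min(start + max_pages - 1, page_count))
        for start in range(1, page_count + 1, max_pages)
    ]


if __name__ == "__main__":
    print(needs_async(1200, 40_000_000))
    print(page_batches(1200))
```

Splitting by page count does not guarantee each piece stays under the size limit; check the byte size of each split file too, or use the async endpoint for large jobs.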
Available on the custom tier, with dedicated regions, private networking, and on-prem options for teams with strict security or data residency requirements.
Yes. Available on the custom tier: negotiated rate per page, dedicated regions, private networking, production SLAs, and a Slack channel with the engineering team.
The API returns HTTP 402 when your balance is exhausted. Top up from the dashboard in increments of $10, $30, $100, or $500, or enter any amount. Your existing keys work again the moment the top-up lands.
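A client-side sketch in Python for treating 402 differently from other failures. The `send` callable is a hypothetical stand-in for whatever HTTP client you use; it is assumed to return `(status_code, body)`. Only the meaning of 402 comes from the text above.

```python
from typing import Any, Callable


class InsufficientBalance(Exception):
    """Raised when the API answers 402 Payment Required (balance exhausted)."""


def call_with_balance_check(send: Callable[[], tuple[int, Any]]) -> Any:
    """Run one request via the hypothetical `send` callable and classify the result."""
    status, body = send()
    if status == 402:
        # Not transient: retrying will keep failing until the balance is topped up.
        raise InsufficientBalance("Balance exhausted; top up from the dashboard, then retry.")
    if status >= 400:
        raise RuntimeError(f"Request failed with HTTP {status}")
    return body


if __name__ == "__main__":
    print(call_with_balance_check(lambda: (200, {"chunks": []})))
```

Raising a dedicated exception lets a batch job pause and alert someone instead of burning retries with backoff, since a 402 only clears once a top-up lands.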
1,000 free pages. No card. Pay $3 per 1,000 after.
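A quick cost estimate in Python at $3 per 1,000 pages ($0.003 per page). It assumes the 1,000 free pages are a one-time allowance deducted first, and that totals round to the cent; actual billing granularity may differ. `Decimal` avoids floating-point drift on fractional cents.

```python
from decimal import Decimal

PRICE_PER_1000_PAGES = Decimal("3.00")
FREE_PAGES = 1_000  # assumed to be a one-time allowance


def estimate_cost(pages: int, free_pages_remaining: int = FREE_PAGES) -> Decimal:
    """Estimated USD cost for `pages` pages after any remaining free pages."""
    if pages < 0 or free_pages_remaining < 0:
        raise ValueError("page counts must be non-negative")
    billable = max(0, pages - free_pages_remaining)
    # Rounding to the cent is an assumption about how totals are reported.
    return (Decimal(billable) * PRICE_PER_1000_PAGES / 1000).quantize(Decimal("0.01"))


if __name__ == "__main__":
    print(estimate_cost(200_000))
    print(estimate_cost(200_000, free_pages_remaining=0))
```

For example, 200,000 pages with the free allowance still unused come to 199,000 billable pages, or $597.00.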