The same document intelligence that powers The Drive AI, available as an API. Upload a PDF, spreadsheet, image, or URL — get back typed JSON matching your schema.
This is not a full API for The Drive AI product. These are standalone document intelligence endpoints — the same ones we use internally — exposed for developers and AI agents that need to process documents programmatically.

Extract

Pull structured fields from a document. Fast, schema-enforced, type-safe.

Analyze

Compute, reason, and derive answers that aren’t explicitly written in the document.

Cross-Analyze

Validate and cross-reference across 2-5 documents simultaneously.

How it works

  1. Define a schema — describe the fields you want, with types
  2. Send a document — upload a file or pass a URL
  3. Get structured data — typed JSON, confidence scores, source citations
from thedriveai import TheDriveAI

client = TheDriveAI(api_key="tda_live_...")

result = client.extract(
    file="invoice.pdf",
    schema={
        "vendor": {"type": "string", "description": "Company name"},
        "total": {"type": "number", "description": "Total amount due"},
        "line_items": {
            "type": "array",
            "description": "Each line item",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"}
                }
            }
        }
    },
)

print(result.data["vendor"])      # "Acme Corp"
print(result.data["total"])       # 1234.56
print(result.confidence["total"]) # 0.95
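
Because each field comes back with a confidence score, one common pattern is to accept high-confidence values automatically and route the rest to a human. The sketch below works on plain dictionaries shaped like `result.data` and `result.confidence` in the example above. The `split_by_confidence` helper, the 0.9 threshold, and the vendor score are assumptions for illustration, not part of the SDK.

```python
def split_by_confidence(data, confidence, threshold=0.9):
    """Return (accepted, needs_review) dicts keyed by field name."""
    accepted, needs_review = {}, {}
    for name, value in data.items():
        # Treat a missing score as 0.0 so unscored fields are reviewed.
        score = confidence.get(name, 0.0)
        target = accepted if score >= threshold else needs_review
        target[name] = value
    return accepted, needs_review


# Stand-ins for result.data and result.confidence.
data = {"vendor": "Acme Corp", "total": 1234.56}
confidence = {"vendor": 0.72, "total": 0.95}  # vendor score is made up

accepted, needs_review = split_by_confidence(data, confidence)
print(accepted)      # {'total': 1234.56}
print(needs_review)  # {'vendor': 'Acme Corp'}
```

The right threshold depends on how costly a wrong value is for your workflow, so treat 0.9 as a starting point only.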

Define schemas your way

Write raw JSON, or use Pydantic (Python) and Zod (TypeScript) to define schemas with the tools you already use.
from pydantic import BaseModel, Field
from thedriveai import TheDriveAI

class Invoice(BaseModel):
    vendor: str = Field(description="Company name")
    total: float = Field(description="Total amount due")
    is_paid: bool = Field(description="Whether the invoice is paid")

client = TheDriveAI(api_key="tda_live_...")
result = client.extract(file="invoice.pdf", schema=Invoice)
The SDK converts your Pydantic models and Zod schemas to the typed format automatically — including nested objects, arrays, enums, and optional fields.
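
To give a feel for what that conversion involves, here is a rough sketch that turns type-annotated classes into the schema shape shown earlier. It is not the SDK's implementation: it uses standard-library dataclasses as a stand-in for Pydantic so it runs without dependencies, the `model_to_schema` and `type_to_schema` helpers are hypothetical, and the `"boolean"` type name and the mapping of `int` to `"number"` are assumptions.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import get_args, get_origin, get_type_hints

# Assumed mapping from Python types to schema type names.
PRIMITIVES = {str: "string", float: "number", int: "number", bool: "boolean"}


def type_to_schema(tp):
    """Convert one Python type annotation to a schema fragment."""
    if tp in PRIMITIVES:
        return {"type": PRIMITIVES[tp]}
    if get_origin(tp) is list:
        (item_type,) = get_args(tp)
        return {"type": "array", "items": type_to_schema(item_type)}
    if is_dataclass(tp):
        return {"type": "object", "properties": model_to_schema(tp)}
    raise TypeError(f"unsupported type: {tp!r}")


def model_to_schema(model):
    """Convert a dataclass to a {field_name: schema_fragment} dict."""
    hints = get_type_hints(model)
    schema = {}
    for f in fields(model):
        entry = type_to_schema(hints[f.name])
        description = f.metadata.get("description")
        if description:
            entry["description"] = description
        schema[f.name] = entry
    return schema


@dataclass
class LineItem:
    description: str
    amount: float


@dataclass
class Invoice:
    vendor: str = field(metadata={"description": "Company name"})
    total: float = field(metadata={"description": "Total amount due"})
    line_items: list[LineItem] = field(
        default_factory=list, metadata={"description": "Each line item"}
    )


print(model_to_schema(Invoice)["vendor"])
# {'type': 'string', 'description': 'Company name'}
```

Enums and optional fields, which the SDK is said to handle, are left out of this sketch to keep it short.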

Supported formats

| Category | Formats |
| --- | --- |
| Documents | PDF, DOCX, DOC, ODT, RTF |
| Spreadsheets | XLSX, CSV, TSV |
| Presentations | PPTX |
| Data | JSON, XML |
| Images | JPG, PNG, TIFF, WEBP |
| Web | Any public URL (rendered with a headless browser) |
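
A client-side check against this list can catch unsupported files before spending a request. The sketch below is a local convenience only: the `is_supported` helper is hypothetical, it matches on file extension rather than file contents, and it includes only the extensions listed above (so variants such as `.jpeg` are not assumed to work).

```python
from pathlib import Path

# Extensions taken from the supported formats list, lowercased.
SUPPORTED_EXTENSIONS = {
    "pdf", "docx", "doc", "odt", "rtf",   # documents
    "xlsx", "csv", "tsv",                 # spreadsheets
    "pptx",                               # presentations
    "json", "xml",                        # data
    "jpg", "png", "tiff", "webp",         # images
}


def is_supported(source: str) -> bool:
    """Return True for URLs and for paths with a listed extension."""
    if source.startswith(("http://", "https://")):
        return True
    extension = Path(source).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS


print(is_supported("invoice.PDF"))                  # True
print(is_supported("notes.txt"))                    # False
print(is_supported("https://example.com/pricing"))  # True
```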

Authentication

All requests require an X-API-Key header. Get your key at dev.thedrive.ai.
curl -X POST https://dev.thedrive.ai/api/v1/extract \
  -H "X-API-Key: tda_live_..." \
  -F file=@invoice.pdf \
  -F 'schema={"vendor": {"type": "string", "description": "Company name"}}'
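
In application code it is usually safer to read the key from the environment than to hard-code it. A minimal sketch, assuming the key is stored in an environment variable named `THEDRIVEAI_API_KEY` (a hypothetical name chosen for this example, not one the SDK is known to read):

```python
import os

API_KEY_ENV_VAR = "THEDRIVEAI_API_KEY"  # hypothetical variable name


def auth_headers(env=None):
    """Build the X-API-Key header from an environment mapping."""
    env = os.environ if env is None else env
    key = env.get(API_KEY_ENV_VAR)
    if not key:
        raise RuntimeError(f"set {API_KEY_ENV_VAR} before calling the API")
    return {"X-API-Key": key}
```

The returned dictionary can be passed as request headers to whichever HTTP client you use.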

Install

pip install thedriveai

Pricing

| Endpoint | Cost |
| --- | --- |
| Extract | 1 credit/page (min 3), 5 credits for websites |
| Analyze | 2 credits/page (min 5), 10 credits for websites |
| Cross-Analyze | 5 credits/doc + 3 credits/page (min 10) |
| Markdown | 1 credit |
| Thumbnail | 1 credit |
The free tier includes 100 credits/month. Additional credits are available for purchase.
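
The pricing rules can be turned into a quick estimate before sending a batch. This sketch assumes that "min" means a minimum charge per request and that Cross-Analyze pages are counted across all documents in the request. Both are readings of the pricing list rather than confirmed billing behavior, and the function names are hypothetical.

```python
FREE_TIER_CREDITS = 100  # credits per month on the free tier


def extract_cost(pages=0, website=False):
    """1 credit/page with a 3-credit minimum, or 5 credits for a website."""
    return 5 if website else max(1 * pages, 3)


def analyze_cost(pages=0, website=False):
    """2 credits/page with a 5-credit minimum, or 10 credits for a website."""
    return 10 if website else max(2 * pages, 5)


def cross_analyze_cost(docs, total_pages):
    """5 credits/doc plus 3 credits/page, with a 10-credit minimum."""
    if not 2 <= docs <= 5:
        raise ValueError("Cross-Analyze takes 2-5 documents")
    return max(5 * docs + 3 * total_pages, 10)


print(extract_cost(pages=2))                        # 3 (minimum applies)
print(extract_cost(pages=12))                       # 12
print(analyze_cost(pages=12))                       # 24
print(cross_analyze_cost(docs=3, total_pages=20))   # 75
print(FREE_TIER_CREDITS // extract_cost(pages=12))  # 8 extracts per month
```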