1. Get an API key

Sign up at dev.thedrive.ai and create an API key from the dashboard. The free tier includes 100 credits per month.
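A common pattern, sketched here, is to keep the key out of source code and read it from an environment variable. The variable name `THEDRIVEAI_API_KEY` and the helper `load_api_key` are assumptions chosen for illustration, not names defined by the SDK.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None,
                 var_name: str = "THEDRIVEAI_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The variable name is a hypothetical convention, not an SDK requirement.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} to your API key before running.")
    return key
```

The returned string can then be passed as `api_key` when the client is created in step 3.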

2. Install the SDK

pip install thedriveai

3. Extract data

from thedriveai import TheDriveAI

client = TheDriveAI(api_key="tda_live_...")

result = client.extract(
    file="invoice.pdf",
    schema={
        "invoice_number": {"type": "string", "description": "The invoice number"},
        "vendor": {"type": "string", "description": "Company name"},
        "total": {"type": "number", "description": "Total amount due"},
        "is_paid": {"type": "boolean", "description": "Whether the invoice is marked as paid"},
    },
)

print(result.data)
# {'invoice_number': 'INV-2024-0042', 'vendor': 'Acme Corp', 'total': 1234.56, 'is_paid': False}

print(result.confidence)
# {"invoice_number": 0.96, "vendor": 0.94, "total": 0.91, "is_paid": 0.85}
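Per-field confidence scores make it possible to route uncertain values to manual review. The sketch below assumes `result.confidence` behaves like the dictionary printed above; the helper name `fields_needing_review` and the 0.9 default threshold are illustrative choices, not part of the SDK.

```python
from typing import Dict, List


def fields_needing_review(confidence: Dict[str, float],
                          threshold: float = 0.9) -> List[str]:
    """Return field names whose confidence is below the threshold, lowest first."""
    low = [(score, name) for name, score in confidence.items() if score < threshold]
    return [name for score, name in sorted(low)]


# Scores copied from the example output above.
confidence = {"invoice_number": 0.96, "vendor": 0.94, "total": 0.91, "is_paid": 0.85}
print(fields_needing_review(confidence))        # ['is_paid']
print(fields_needing_review(confidence, 0.95))  # ['is_paid', 'total', 'vendor']
```

The right threshold depends on how costly a wrong value is for your use case.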

4. Or use Pydantic / Zod

Instead of writing raw JSON schemas by hand, you can define the schema with a tool you already know. The example below uses a Pydantic model.

from pydantic import BaseModel, Field
from thedriveai import TheDriveAI

class Invoice(BaseModel):
    invoice_number: str = Field(description="The invoice number")
    vendor: str = Field(description="Company name")
    total: float = Field(description="Total amount due")
    is_paid: bool = Field(description="Whether the invoice is marked as paid")

client = TheDriveAI(api_key="tda_live_...")
result = client.extract(file="invoice.pdf", schema=Invoice)

print(result.data["vendor"])  # "Acme Corp"
print(result.data["total"])   # 1234.56

Both approaches support nested objects, arrays, enums, and optional fields. See Schemas for the full reference.
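As a sketch of what a richer schema could look like, the model below combines a list of nested objects, an enum-like field, and an optional field using standard Pydantic typing. The field names (`line_items`, `currency`, `po_number`) and the sample values are made up for illustration, and how the API treats each construct is covered in Schemas. Because `result.data` is indexed like a dictionary in the example above, the same model can plausibly be used to validate the returned values as well.

```python
from typing import List, Literal, Optional

from pydantic import BaseModel, Field


class LineItem(BaseModel):
    description: str = Field(description="What was billed")
    quantity: int = Field(description="Number of units")
    unit_price: float = Field(description="Price per unit")


class DetailedInvoice(BaseModel):
    invoice_number: str = Field(description="The invoice number")
    currency: Literal["USD", "EUR", "GBP"] = Field(description="Billing currency")
    po_number: Optional[str] = Field(default=None, description="Purchase order number, if present")
    line_items: List[LineItem] = Field(description="Billed line items")


# Hypothetical values standing in for result.data from
# client.extract(file="invoice.pdf", schema=DetailedInvoice).
sample = {
    "invoice_number": "INV-2024-0042",
    "currency": "USD",
    "line_items": [{"description": "Widget", "quantity": 3, "unit_price": 10.0}],
}

invoice = DetailedInvoice(**sample)  # raises a validation error on bad types
subtotal = sum(item.quantity * item.unit_price for item in invoice.line_items)
print(invoice.po_number, subtotal)  # None 30.0
```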

5. Extract from a URL

You can pass a URL instead of uploading a file. Any publicly accessible URL works; the API renders JavaScript-heavy pages with a headless browser before extracting.

result = client.extract(
    url="https://example.com/pricing",
    schema={
        "plans": {
            "type": "array",
            "description": "Available pricing plans",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string", "description": "Plan name"},
                    "price": {"type": "number", "description": "Monthly price"},
                }
            }
        }
    },
)
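Given the schema above, `result.data["plans"]` would presumably be a list of objects with `name` and `price` keys. The following sketch picks the cheapest plan from such a list; the helper name `cheapest_plan` and the plan values are placeholders, not real extraction output.

```python
from typing import Dict, List, Optional


def cheapest_plan(plans: List[Dict]) -> Optional[Dict]:
    """Return the plan with the lowest price, ignoring plans with no price."""
    priced = [p for p in plans if isinstance(p.get("price"), (int, float))]
    return min(priced, key=lambda p: p["price"]) if priced else None


# Placeholder data shaped like the "plans" schema above.
plans = [
    {"name": "Starter", "price": 19.0},
    {"name": "Team", "price": 49.0},
    {"name": "Enterprise", "price": None},
]
print(cheapest_plan(plans))  # {'name': 'Starter', 'price': 19.0}
```

Skipping entries without a numeric price guards against fields the page does not state, such as "contact us" tiers.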

What’s in the response

Every extraction returns:
| Field | Description |
| --- | --- |
| data | Extracted values, type-enforced to match your schema |
| confidence | Per-field confidence scores (0.0 to 1.0) |
| citations | Per-field source text snippets from the document |
| field_status | Per-field found/not_found status with type info |
| credits_used | Credits consumed for this request |
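To show how these fields might be used together, the sketch below builds a per-field review list from plain dictionaries. The exact shapes are assumptions: `citations` is taken to map each field name to a text snippet, and `field_status` to map each field name to a dictionary with a `status` key of `found` or `not_found`. The helper name `review_rows` is hypothetical, so check the API reference for the actual structure.

```python
from typing import Any, Dict, List


def review_rows(data: Dict[str, Any],
                citations: Dict[str, str],
                field_status: Dict[str, Dict[str, str]]) -> List[str]:
    """Build one human-readable line per field for manual review."""
    rows = []
    for name, value in data.items():
        status = field_status.get(name, {}).get("status", "unknown")
        if status == "found":
            rows.append(f"{name} = {value!r} (source: {citations.get(name, 'n/a')!r})")
        else:
            rows.append(f"{name}: {status}")
    return rows


# Hypothetical values; the shapes of citations and field_status are assumed.
rows = review_rows(
    data={"vendor": "Acme Corp", "total": None},
    citations={"vendor": "Bill from: Acme Corp"},
    field_status={"vendor": {"status": "found"}, "total": {"status": "not_found"}},
)
print("\n".join(rows))
# vendor = 'Acme Corp' (source: 'Bill from: Acme Corp')
# total: not_found
```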

Error handling

from thedriveai import TheDriveAI, TheDriveAIError

client = TheDriveAI(api_key="tda_live_...")

try:
    result = client.extract(file="doc.pdf", schema={...})
except TheDriveAIError as e:
    print(e.status_code)  # 400, 413, 503, etc.
    print(e.detail)       # Error detail from the API

| Status | Meaning |
| --- | --- |
| 400 | Invalid schema, missing file/URL, or bad request |
| 401 | Missing or invalid API key |
| 413 | File too large (max 50 MB) |
| 503 | AI service temporarily unavailable |
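Since a 503 signals a temporary outage, one reasonable response is to retry with exponential backoff while letting every other status surface immediately. The sketch below is generic: `call_with_retry` and `FakeAPIError` are hypothetical names, and `FakeAPIError` only stands in for `TheDriveAIError` so the example runs without network access. It relies on the `status_code` attribute shown above.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(fn: Callable[[], T],
                    error_type: Type[Exception],
                    retry_statuses: Tuple[int, ...] = (503,),
                    max_attempts: int = 3,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying with exponential backoff on retryable status codes."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except error_type as exc:
            retryable = getattr(exc, "status_code", None) in retry_statuses
            if not retryable or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # waits 1s, then 2s, ...
    raise RuntimeError("max_attempts must be at least 1")


class FakeAPIError(Exception):
    """Stand-in for TheDriveAIError, used only to demonstrate the helper."""

    def __init__(self, status_code: int):
        super().__init__(f"status {status_code}")
        self.status_code = status_code


attempts = []


def flaky() -> str:
    """Fail twice with a 503, then succeed."""
    attempts.append(1)
    if len(attempts) < 3:
        raise FakeAPIError(503)
    return "ok"


delays = []  # record requested delays instead of actually sleeping
print(call_with_retry(flaky, FakeAPIError, sleep=delays.append), delays)  # ok [1.0, 2.0]
```

With the real SDK, the call would look something like `call_with_retry(lambda: client.extract(file="doc.pdf", schema=schema), TheDriveAIError)`. Errors such as 400, 401, and 413 are not retried because repeating the same request would not change the outcome.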

What’s next

Analyze

Compute and reason over documents — sums, comparisons, risk assessments.

Cross-Analyze

Validate an invoice against a contract, reconcile spreadsheets, compare agreements.

Schemas

Full type reference — arrays, nested objects, enums, required fields, Pydantic, Zod.

Async & Webhooks

Process documents asynchronously with webhook notifications.