Unlike /extract, which pulls explicitly stated values, /analyze uses a multi-step reasoning agent that can compute, infer, and derive answers that are not written in the document.

When to use Analyze vs Extract

| Use case | Endpoint |
| --- | --- |
| “What is the invoice number?” | /extract |
| “What is the sum of all line items?” | /analyze |
| “Does this contract auto-renew?” | /analyze |
| “Rate the legal risk: low/medium/high” | /analyze |
| “Pull the vendor name and address” | /extract |
| “Are any line items duplicated?” | /analyze |

Rule of thumb: if the answer is directly stated in the document, use /extract. If it requires computation, comparison, or judgment, use /analyze.

Example

from thedriveai import TheDriveAI

client = TheDriveAI(api_key="tda_live_...")

result = client.analyze(
    file="contract.pdf",
    schema={
        "total_value": {
            "type": "number",
            "description": "Total contract value (sum all payment amounts)"
        },
        "auto_renews": {
            "type": "boolean",
            "description": "Does this contract auto-renew?"
        },
        "termination_notice_days": {
            "type": "integer",
            "description": "How many days notice required to terminate?"
        },
        "risk_level": {
            "type": "string",
            "enum": ["low", "medium", "high"],
            "description": "Overall legal risk level"
        },
    },
)

print(result.data)
# {"total_value": 250000, "auto_renews": true, "termination_notice_days": 90, "risk_level": "medium"}

print(result.reasoning["auto_renews"])
# "Section 8.2 states 'This agreement shall automatically renew for successive one-year terms
#  unless either party provides written notice of non-renewal at least 90 days prior...'"

print(result.sources["total_value"])
# ["Year 1: $100,000 (Section 4.1)", "Year 2: $150,000 (Section 4.2)"]
Works with Pydantic and Zod too — see Schemas.

Response

| Field | Description |
| --- | --- |
| data | Computed answers, type-enforced to match your schema |
| confidence | Per-field confidence scores (0.0 - 1.0) |
| reasoning | Per-field step-by-step explanation of how the answer was derived |
| sources | Per-field text snippets from the document supporting the answer |
| steps | Agent tool call trace (only when include_steps=true) |
| total_pages | Number of pages in the document |
| credits_used | Credits consumed |
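
Since data is described as type-enforced, a client-side sanity check can be useful in tests. The sketch below is a minimal illustration, not part of the SDK: check_types and PY_TYPES are hypothetical names, and it assumes result.data behaves like a plain dict keyed by schema field.

```python
# Hypothetical helper, not part of the thedriveai SDK.
# Maps the schema "type" strings used on this page to Python types.
PY_TYPES = {
    "number": (int, float),
    "integer": (int,),
    "boolean": (bool,),
    "string": (str,),
}


def check_types(data, schema):
    """Return the field names whose value does not match the schema."""
    problems = []
    for field, spec in schema.items():
        value = data.get(field)
        expected = PY_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        if isinstance(value, bool) and spec["type"] != "boolean":
            problems.append(field)
        elif not isinstance(value, expected):
            problems.append(field)
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(field)
    return problems


schema = {
    "total_value": {"type": "number"},
    "auto_renews": {"type": "boolean"},
    "termination_notice_days": {"type": "integer"},
    "risk_level": {"type": "string", "enum": ["low", "medium", "high"]},
}
# Values copied from the example output above, written as a Python dict.
data = {
    "total_value": 250000,
    "auto_renews": True,
    "termination_notice_days": 90,
    "risk_level": "medium",
}
print(check_types(data, schema))  # []
```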

Confidence scores

| Range | Meaning |
| --- | --- |
| 0.95 - 1.0 | Directly stated or computed from unambiguous data |
| 0.80 - 0.94 | Clearly derivable, minor interpretation required |
| 0.60 - 0.79 | Reasonably inferred, some assumptions made |
| 0.30 - 0.59 | Ambiguous or indirect evidence |
| 0.00 - 0.29 | Speculative or not found |
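
One way to act on these ranges is to map each score to its band and flag fields that fall below a chosen threshold. The sketch below is illustrative only: band and needs_review are hypothetical helpers, the scores are made-up sample values, and it assumes the confidence field behaves like a dict of field name to float. Each band is matched by its lower bound, so a score such as 0.945 lands in the 0.80 band.

```python
# Lower bound of each band, highest first, with the meaning from the table.
BANDS = [
    (0.95, "directly stated or computed from unambiguous data"),
    (0.80, "clearly derivable, minor interpretation required"),
    (0.60, "reasonably inferred, some assumptions made"),
    (0.30, "ambiguous or indirect evidence"),
    (0.00, "speculative or not found"),
]


def band(score):
    """Return the meaning of the band that contains the score."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence out of range: {score}")
    for lower, meaning in BANDS:
        if score >= lower:
            return meaning


def needs_review(confidence, threshold=0.80):
    """Return the sorted field names whose score is below the threshold."""
    return sorted(f for f, s in confidence.items() if s < threshold)


# Made-up sample scores, for illustration only.
confidence = {
    "total_value": 0.97,
    "auto_renews": 0.91,
    "termination_notice_days": 0.96,
    "risk_level": 0.65,
}
print(needs_review(confidence))  # ['risk_level']
print(band(confidence["risk_level"]))  # reasonably inferred, some assumptions made
```

The 0.80 threshold is an arbitrary choice for the sketch; a stricter pipeline might send anything below 0.95 to a human reviewer.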

How the agent works

The analyze endpoint runs a multi-step reasoning agent that has access to:

| Tool | Purpose |
| --- | --- |
| search | Semantic + keyword search. Understands synonyms and handles typos. |
| read_pages | Read specific pages by number |
| regex_search | Find patterns: dates, amounts, percentages, IDs |
| compute | Execute Python code for calculations. All math goes through code, never mental math. |
| extract_table | Get structured table data from PDFs, CSVs, or spreadsheets |
| view_page_image | Visual analysis of scanned pages, charts, or diagrams (PDFs only) |

The agent plans its approach, navigates to relevant sections, gathers evidence, cross-references, and verifies its answer before responding. Set include_steps=true to see the full reasoning trace.
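
To make the regex_search and compute steps concrete, here is a rough local sketch of the same pattern: find dollar amounts with a regular expression, then add them up in code. It does not show the agent's actual internals. The AMOUNT pattern and find_amounts helper are assumptions for illustration, and the snippets are the sources values from the example above.

```python
import re
from decimal import Decimal

# Snippets taken from the sources["total_value"] example on this page.
sources = ["Year 1: $100,000 (Section 4.1)", "Year 2: $150,000 (Section 4.2)"]

# Dollar amounts with optional thousands separators and decimals.
AMOUNT = re.compile(r"\$(\d+(?:,\d{3})*(?:\.\d+)?)")


def find_amounts(snippets):
    """Return every dollar amount found in the snippets as a Decimal."""
    amounts = []
    for text in snippets:
        for match in AMOUNT.findall(text):
            amounts.append(Decimal(match.replace(",", "")))
    return amounts


amounts = find_amounts(sources)
# Decimal avoids the rounding surprises of binary floats on currency.
total = sum(amounts, Decimal(0))
print(total)  # 250000
```

The result matches the total_value of 250000 in the example response.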

Pricing

| Input | Cost |
| --- | --- |
| Documents | 2 credits/page (minimum 5) |
| Websites | 10 credits flat |
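
A quick way to estimate cost before sending a request is sketched below. It reads "minimum 5" as a floor of 5 credits per document, which is an assumption (the table does not say whether the minimum is in credits or pages). document_credits and WEBSITE_CREDITS are hypothetical names, not SDK functions.

```python
# Flat cost for a website input, from the pricing table.
WEBSITE_CREDITS = 10


def document_credits(pages):
    """Estimate credits for a document: 2 per page, assumed floor of 5 credits."""
    if pages < 1:
        raise ValueError("a document must have at least one page")
    return max(2 * pages, 5)


print(document_credits(1))   # 5
print(document_credits(3))   # 6
print(document_credits(10))  # 20
```

Compare the estimate with credits_used in the response to confirm how the minimum is actually applied.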