Analyze Document

curl --request POST \
  --url https://api.example.com/api/v1/analyze \
  --header 'Content-Type: multipart/form-data' \
  --form 'schema=<string>' \
  --form 'file=<string>' \
  --form 'url=<string>' \
  --form 'url_headers=<string>' \
  --form include_steps=false \
  --form sync=true \
  --form 'webhook_url=<string>'

{
  "confidence": {
    "is_overdue": 0.72,
    "total_cost": 0.95
  },
  "content_type": "application/pdf",
  "credits_used": 10,
  "data": {
    "is_overdue": true,
    "total_cost": 1500
  },
  "filename": "invoice.pdf",
  "model_used": "gpt-5.1",
  "reasoning": {
    "is_overdue": "Due date 2025-01-15 has passed.",
    "total_cost": "Summed 3 line items: $500 + $600 + $400"
  },
  "sources": {
    "is_overdue": [
      "Due Date: January 15, 2025"
    ],
    "total_cost": [
      "Item A: $500",
      "Item B: $600",
      "Item C: $400"
    ]
  },
  "success": true,
  "text_length": 3200,
  "total_pages": 5
}

Analyze a document by asking questions. Unlike /extract, this endpoint can compute, reason, and derive values that aren’t explicitly written in the document.

Upload a file (or provide a URL) along with a typed schema. Each field must have a type and description.

Example schema:

{
  "total_cost": {"type": "number", "description": "Sum all line item amounts"},
  "is_auto_renew": {"type": "boolean", "description": "Does this contract auto-renew?"},
  "risk_level": {"type": "string", "enum": ["low", "medium", "high"], "description": "Legal risk level"}
}

Supported types: string, number, integer, boolean, array, object
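
The schema rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `validate_schema` is a hypothetical helper, and it assumes that "type" must be one of the supported types listed above and that "description" must be a non-empty string.

```python
import json

# Types listed under "Supported types" in this reference.
SUPPORTED_TYPES = {"string", "number", "integer", "boolean", "array", "object"}


def validate_schema(schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the schema looks well-formed."""
    problems = []
    for name, spec in schema.items():
        if not isinstance(spec, dict):
            problems.append(f"{name}: field spec must be an object")
            continue
        field_type = spec.get("type")
        if not isinstance(field_type, str) or field_type not in SUPPORTED_TYPES:
            problems.append(f"{name}: missing or unsupported type {field_type!r}")
        description = spec.get("description")
        if not isinstance(description, str) or not description.strip():
            problems.append(f"{name}: missing description")
    return problems


# The example schema from this page.
schema = {
    "total_cost": {"type": "number", "description": "Sum all line item amounts"},
    "is_auto_renew": {"type": "boolean", "description": "Does this contract auto-renew?"},
    "risk_level": {"type": "string", "enum": ["low", "medium", "high"], "description": "Legal risk level"},
}

problems = validate_schema(schema)

# The `schema` form field is a string, so the dict is JSON-encoded before sending.
schema_form_value = json.dumps(schema)
```

For the example schema, `problems` is an empty list. The server may enforce rules beyond these, so a clean local check does not guarantee that the request is accepted.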

Every answer includes:

The computed answer (type-enforced)
Reasoning explaining how it was derived
Source text snippets from the document
Confidence level

Pricing: 2 credits/page (min 5) for documents, 10 credits for websites.
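
As a rough illustration of the pricing line, the sketch below estimates the cost of one call. `estimate_credits` is a hypothetical helper, and it assumes "min 5" means a floor of 5 credits per document. The authoritative figure is the credits_used value in the response.

```python
def estimate_credits(total_pages: int = 0, is_website: bool = False) -> int:
    """Estimate credits for one analyze call from the published pricing."""
    if is_website:
        return 10  # flat rate for websites
    # 2 credits per page, with an assumed floor of 5 credits per document.
    return max(5, 2 * total_pages)


five_page_invoice = estimate_credits(total_pages=5)  # 2 * 5 = 10
one_page_letter = estimate_credits(total_pages=1)    # 2 * 1 = 2, raised to the floor of 5
website = estimate_credits(is_website=True)          # flat 10
```

The 5-page figure matches the example response on this page, which reports total_pages 5 and credits_used 10.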

POST /api/v1/analyze

Headers

X-API-Key · string

Body

multipart/form-data

schema · string · required
JSON schema: keys are field names, values describe what to compute or answer

file · string | null
File to analyze

url · string | null
URL to fetch file from

url_headers · string | null
JSON-encoded headers for URL auth

include_steps · boolean · default: false
Include the agent's reasoning trace (tool calls) in each answer

sync · boolean · default: true
Set to false to process asynchronously. Returns a task_id to poll.

webhook_url · string | null
URL to POST the result to when async processing completes.
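
The body parameters above can be assembled in Python with the standard library alone. This is a sketch under stated assumptions rather than an official client: the helper names are hypothetical, booleans are sent as the strings "true" and "false" (as in the curl example), the file part is sent as a binary upload, and the API key goes in the X-API-Key header.

```python
import json
import urllib.request
import uuid

ANALYZE_URL = "https://api.example.com/api/v1/analyze"  # URL from the curl example


def build_form_fields(schema, url=None, url_headers=None,
                      include_steps=False, sync=True, webhook_url=None):
    """Build the text form fields; optional fields are omitted when unset."""
    fields = {
        "schema": json.dumps(schema),  # the schema travels as a JSON string
        "include_steps": "true" if include_steps else "false",
        "sync": "true" if sync else "false",
    }
    if url is not None:
        fields["url"] = url
    if url_headers is not None:
        fields["url_headers"] = json.dumps(url_headers)  # JSON-encoded, per the docs
    if webhook_url is not None:
        fields["webhook_url"] = webhook_url
    return fields


def encode_multipart(fields, file=None):
    """Encode fields, plus an optional (filename, content_bytes, mime_type) file."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    if file is not None:
        filename, content, mime_type = file
        header = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
            f'filename="{filename}"\r\nContent-Type: {mime_type}\r\n\r\n'
        )
        parts.append(header.encode() + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def analyze(api_key, fields, file=None):
    """POST the request and return the decoded JSON body (performs a network call)."""
    body, content_type = encode_multipart(fields, file)
    request = urllib.request.Request(
        ANALYZE_URL,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A synchronous call on a local file might look like `analyze(key, build_form_fields(schema), file=("invoice.pdf", pdf_bytes, "application/pdf"))`. With `sync=False` the response is described as carrying a task_id; the polling endpoint is not covered in this section, so it is not sketched here.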

Response

Successful Response

Response for document analysis: flat parallel dictionaries, each keyed by field name, in the same shape as /extract.

data · object · required
Computed/derived values keyed by field name

filename · string · required
Original filename or URL

model_used · string · required
Model used for analysis

text_length · integer · required
Length of extracted text

success · boolean · default: true

confidence · object
Per-field confidence scores (0.0–1.0)

sources · object
Per-field source text snippets from the document

reasoning · object
Per-field step-by-step explanation of how the answer was derived

steps · object
Per-field agent reasoning trace (tool calls). Empty unless include_steps=true.

content_type · string | null
Detected MIME type

total_pages · integer · default: 0
Total pages in the document (0 for non-paginated files)

credits_used · integer · default: 0
Credits consumed

schema_format · enum<string> | null
Schema format: always 'typed'. Available options: typed
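
Because data, confidence, reasoning, and sources are parallel objects keyed by field name, a client will often want one record per field. The sketch below shows one way to pivot them. Both helpers are hypothetical, and the 0.8 review threshold is an arbitrary example value, not an API recommendation.

```python
def per_field_results(response: dict) -> dict:
    """Pivot the parallel per-field dicts into one record per field."""
    confidence = response.get("confidence") or {}
    reasoning = response.get("reasoning") or {}
    sources = response.get("sources") or {}
    return {
        name: {
            "value": value,
            "confidence": confidence.get(name),
            "reasoning": reasoning.get(name),
            "sources": sources.get(name, []),
        }
        for name, value in response.get("data", {}).items()
    }


def low_confidence_fields(response: dict, threshold: float = 0.8) -> list[str]:
    """Names of fields whose confidence is missing or below the threshold."""
    flagged = []
    for name, record in per_field_results(response).items():
        score = record["confidence"]
        if score is None or score < threshold:
            flagged.append(name)
    return flagged


# Trimmed copy of the example response on this page.
sample = {
    "confidence": {"is_overdue": 0.72, "total_cost": 0.95},
    "data": {"is_overdue": True, "total_cost": 1500},
    "reasoning": {
        "is_overdue": "Due date 2025-01-15 has passed.",
        "total_cost": "Summed 3 line items: $500 + $600 + $400",
    },
    "sources": {
        "is_overdue": ["Due Date: January 15, 2025"],
        "total_cost": ["Item A: $500", "Item B: $600", "Item C: $400"],
    },
    "success": True,
}

records = per_field_results(sample)
needs_review = low_confidence_fields(sample)
```

For the sample, `needs_review` contains only "is_overdue", whose confidence of 0.72 falls below the example threshold.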
