API Reference
Base URL: https://api.factivelabs.com
All endpoints require Bearer token authentication unless noted. Responses are JSON.
POST /api/v1/verify
Submit content for fact-checking. Extracts individual claims and verifies each one against web sources. Supports three response modes: synchronous, streaming (SSE), and asynchronous (job polling).
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| content | string | Conditional | Raw text to verify. Required when content_type is text, html, or any of the social/AI-paste types (twitter, reddit, instagram, chatgpt, perplexity, gemini, claude, gist). Max 250K characters. |
| url | string | Conditional | URL to fetch and verify. Required when content_type is url, youtube, or tiktok. |
| file | string | Conditional | Base64-encoded file content. Required when content_type is pdf, docx, doc, rtf, or image. |
| content_type | string | Optional | Input format. Default: text. Full set: text, html, url, youtube, tiktok, pdf, docx, doc, rtf, image, twitter, reddit, instagram, chatgpt, perplexity, gemini, claude, gist. See the Input Formats guide for which field (content/url/file) each type expects. |
| mode | string | Optional | "claim" (single claim) or "document" (extract all claims). Auto-detected by length if omitted. |
| stream | boolean | Optional | Return SSE event stream instead of JSON. Default: false. |
| async | boolean | Optional | Return job ID immediately and poll for results. Default: false. |
| max_claims | integer | Optional | Maximum claims to extract and verify (1–200). Default: 50. |
| context | string | Optional | Additional context to improve verification accuracy (max 5,000 chars). |
| skip_table_claims | boolean | Optional | Auto-skip granular table/statistical claims that can't be web-verified. Saves cost. Default: true. |
| second_pass | boolean | Optional | Enable Sonnet second-pass verification for disputed claims in time-sensitive categories. Default: false. |
| include_sections | array[int] | Optional | Section indices to fact-check (from /api/v1/analyze-structure). Mutually exclusive with exclude_sections. |
| exclude_sections | array[int] | Optional | Section indices to skip. Mutually exclusive with include_sections. |
| webhook_url | string | Optional | If async mode, POST results to this URL when complete. |
| callback_metadata | object | Optional | Opaque JSON echoed back unchanged in the response. |
| profile | string | Optional | Pipeline profile override. Most callers should leave this unset and let the API auto-select. Available profiles change over time; ask your account contact before pinning one. |
| extract_only | boolean | Optional | If true, run extraction only and skip verification. Returns claims with verdict: null. Cheaper than a full /verify run, and faster than calling /api/v1/extract separately when you already have a /verify client. Default: false. |
| claims | array[object] | Optional | Pre-extracted claims to verify directly — skips the extraction stage. Each item must have text (the claim) and may optionally include sentence, start_offset, end_offset. Useful when you already have claims from a prior /verify call with extract_only: true. |
| use_private_corpus | boolean | Optional | Route claims through the customer's private corpus instead of the public web. Requires documents uploaded via POST /api/v1/private-corpus/upload. Default: false. |
| corpus_mode | string | Optional | How private-corpus results combine with public web sources when use_private_corpus: true. Values: "corpus_only" (default — corpus only), "corpus_and_web" (corpus first, fall back to web on inconclusive). |
| corpus_scope | string | Optional | One-paragraph description of what the corpus contains (max 2,000 chars). Improves corpus-routing accuracy when use_private_corpus: true. If omitted, the API uses the auto-generated scope from GET /api/v1/private-corpus/scope. |
| exclude_domains | array[string] | Optional | Domains to exclude from web-search retrieval (e.g. ["reuters.com", "infowars.com"]). Bare hostnames only — subdomains are excluded automatically. Cap: 100 entries per request. Note: Gemini-grounded fallback paths can’t honor this filter; claims that fall through to fallback may still cite blacklisted sources. See Source Blacklist. |
| end_user_id | string | Optional | Sub-tenant identifier (max 128 chars). Use to isolate your end-users’ data under your single API key. See Sub-tenanting. |
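The conditional rules in the table (which of content, url, or file goes with each content_type) can be checked client-side before a request is sent. The following is a minimal sketch, not an official client: the helper name build_verify_payload and its keyword arguments are hypothetical, and only the field names and limits are taken from the table above.

```python
import base64
from typing import Any, Dict, Optional

# Which request field each content_type expects, per the parameter table.
TEXT_TYPES = {"text", "html", "twitter", "reddit", "instagram",
              "chatgpt", "perplexity", "gemini", "claude", "gist"}
URL_TYPES = {"url", "youtube", "tiktok"}
FILE_TYPES = {"pdf", "docx", "doc", "rtf", "image"}


def build_verify_payload(content_type: str = "text", *,
                         content: Optional[str] = None,
                         url: Optional[str] = None,
                         file_bytes: Optional[bytes] = None,
                         **options: Any) -> Dict[str, Any]:
    """Hypothetical helper: assemble a POST /api/v1/verify body."""
    payload: Dict[str, Any] = {"content_type": content_type}
    if content_type in TEXT_TYPES:
        if not content:
            raise ValueError("content is required for this content_type")
        if len(content) > 250_000:
            raise ValueError("content exceeds 250K characters")
        payload["content"] = content
    elif content_type in URL_TYPES:
        if not url:
            raise ValueError("url is required for this content_type")
        payload["url"] = url
    elif content_type in FILE_TYPES:
        if file_bytes is None:
            raise ValueError("file is required for this content_type")
        # The file field carries base64-encoded file content.
        payload["file"] = base64.b64encode(file_bytes).decode("ascii")
    else:
        raise ValueError(f"unknown content_type: {content_type!r}")

    if "include_sections" in options and "exclude_sections" in options:
        raise ValueError("include_sections and exclude_sections are mutually exclusive")
    max_claims = options.get("max_claims")
    if max_claims is not None and not 1 <= max_claims <= 200:
        raise ValueError("max_claims must be between 1 and 200")
    payload.update(options)  # remaining optional parameters pass through as-is
    return payload
```

For example, build_verify_payload("url", url="https://example.com/article", stream=True) yields a body that can be serialized with json.dumps and sent with any HTTP client.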
Response (Synchronous)
{
"id": "fc_abc123def456",
"status": "complete",
"input_type": "text",
"title": "",
"extracted_text": "The Earth is approximately 4.5 billion years old.",
"claims": [
{
"text": "The Earth is approximately 4.5 billion years old",
"sentence": "The Earth is approximately 4.5 billion years old.",
"sentence_text": "The Earth is approximately 4.5 billion years old.",
"start_offset": 0,
"end_offset": 49,
"span_start": 0,
"span_end": 49,
"verdict": "confirmed",
"summary": "Confirmed by radiometric dating evidence.",
"explanation": "Scientific evidence from radiometric dating consistently supports...",
"categories": ["science"],
"verified_by": "Sonar",
"skipped": false,
"context_flags": {},
"unclear_reason": null,
"sources": [
{
"title": "Age of Earth - USGS",
"url": "https://www.usgs.gov/...",
"domain": "usgs.gov",
"publisher": "USGS",
"snippet": "The age of the Earth is estimated at 4.54 billion years...",
"date": ""
}
]
}
],
"counts": {
"total": 1,
"confirmed": 1,
"disputed": 0,
"inconclusive": 0,
"skipped": 0
},
"highlights": [...],
"processing_time_ms": 3200,
"usage": {
"claim_count": 1,
"skipped_claims": 0,
"cost_usd": 0.01
}
}
Response (Async)
HTTP 202 Accepted
{
"id": "fc_abc123def456",
"status": "queued",
"created_at": "2026-05-14T10:30:00Z",
"message": "Job queued. Poll GET /api/v1/jobs/{id} for results."
}
SSE Events (Streaming)
event: claim_verified
data: {"claim": {"text": "...", "verdict": "confirmed", "summary": "...", "explanation": "...", "skipped": false, "context_flags": {}, "unclear_reason": null, "sources": [...]}, "skipped": false}
event: highlights
data: {"highlights": [...], "source_length": 1234, "source_hash": "a1b2c3d4e5f6"}
event: complete
data: {"id": "fc_abc123", "status": "complete", "title": "", "claims": [...], "counts": {"total": 3, "confirmed": 3, "disputed": 0, "inconclusive": 0, "skipped": 0}, "usage": {"claim_count": 3, "skipped_claims": 0, "cost_usd": 0.03}}
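A streaming response can be consumed by grouping event: and data: lines into (event, payload) pairs. The sketch below assumes standard SSE framing, where a blank line terminates each event (the flattened example above does not show those blank lines), and that every data payload is a single JSON document. The function name iter_sse_events is hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (event_name, parsed_json) pairs from an iterable of SSE lines."""
    event = "message"  # SSE default when no event: field is sent
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data_lines:
                yield event, json.loads("\n".join(data_lines))
            event, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips one leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
    # Lenient choice: flush a trailing event even without a final blank line.
    if data_lines:
        yield event, json.loads("\n".join(data_lines))
```

A caller would typically render each claim_verified payload as it arrives and treat the complete event as the authoritative final result.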
POST /api/v1/verify/paragraph
Fact-check one paragraph of text in a single stateless call. Each request is fully self-contained, with no session and no shared state. This is the canonical endpoint for streaming-style integrations: call it once per \n\n-separated paragraph in your LLM's output.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Required | One complete paragraph of text. Max 20,000 characters. Typically the output between two \n\n breaks of an LLM stream. |
| context | string | Optional | Surrounding context to disambiguate references (e.g. the user's question or the previous paragraph). Max 5,000 characters. |
| profile | string | Optional | Pipeline profile override. Leave unset for auto-selection. |
| use_private_corpus | boolean | Optional | Route claims through your private corpus instead of the public web. Default: false. |
| corpus_scope | string | Optional | One-paragraph description of what the corpus contains (max 2,000 chars). Improves corpus-routing accuracy. |
| second_pass | boolean | Optional | Enable Sonnet second-pass verification for disputed claims. Default: false. |
| max_claims | integer | Optional | Max claims to extract from this paragraph (1–200). Default: 50. |
| exclude_domains | array[string] | Optional | Domains to exclude from web-search retrieval. Same semantics as on POST /api/v1/verify. Cap: 100 entries per request. See Source Blacklist. |
| end_user_id | string | Optional | Sub-tenant identifier (max 128 chars). See Sub-tenanting. |
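Because exclude_domains expects bare hostnames with a cap of 100 entries, it can help to normalize user-supplied values before sending them. This is a rough sketch under the assumption that entries may arrive as full URLs or with stray whitespace; normalize_exclude_domains is a hypothetical helper, not part of the API.

```python
from typing import Iterable, List
from urllib.parse import urlparse

MAX_EXCLUDED_DOMAINS = 100  # per-request cap stated in the parameter table


def normalize_exclude_domains(entries: Iterable[str]) -> List[str]:
    """Reduce URLs or hostnames to unique, lowercase bare hostnames."""
    hosts: List[str] = []
    for entry in entries:
        candidate = entry.strip().lower()
        if "://" not in candidate:
            # urlparse only recognizes a host when a network location is marked.
            candidate = "//" + candidate
        host = urlparse(candidate).hostname or ""
        if host and host not in hosts:
            hosts.append(host)
    if len(hosts) > MAX_EXCLUDED_DOMAINS:
        raise ValueError("exclude_domains accepts at most 100 entries")
    return hosts
```

The helper keeps first-seen order, so the resulting list stays stable across requests.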
Response (JSON)
{
"paragraph_id": "pg_abc123def456",
"chars_received": 412,
"claims": [ /* array of Claim objects, each fully verified */ ],
"counts": {
"total": 3,
"confirmed": 2,
"disputed": 0,
"inconclusive": 1,
"skipped": 0
},
"processing_time_ms": 9420
}
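The per-paragraph calling pattern described for this endpoint can be sketched as a function that splits LLM output on blank lines and produces one request body per paragraph. The function name paragraph_requests is hypothetical; the text and context fields and their length limits come from the request table.

```python
import re
from typing import Dict, List


def paragraph_requests(llm_output: str, context: str = "") -> List[Dict[str, str]]:
    """Build one /api/v1/verify/paragraph body per blank-line-separated paragraph."""
    if len(context) > 5_000:
        raise ValueError("context exceeds 5,000 characters")
    bodies: List[Dict[str, str]] = []
    for paragraph in re.split(r"\n\s*\n", llm_output):
        paragraph = paragraph.strip()
        if not paragraph:
            continue  # skip empty chunks produced by trailing blank lines
        if len(paragraph) > 20_000:
            raise ValueError("paragraph exceeds 20,000 characters")
        body = {"text": paragraph}
        if context:
            body["context"] = context
        bodies.append(body)
    return bodies
```

Since each request is stateless, the resulting bodies can be sent sequentially or concurrently.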
POST /api/v1/verify/paragraph/stream
Streaming variant of /verify/paragraph. Same input, but the response is a server-sent-events (SSE) stream so claim cards can render before verdicts arrive.
Request Body
Same as /verify/paragraph.
SSE Events
Events arrive in this order:
| Event | Description |
|---|---|
| paragraph_claims | Fires once after extraction. Payload includes the extracted claims with their span/sentence offsets but no verdicts yet. Use to render placeholder cards immediately. |
| verify_result | Fires once per claim as each verification finishes (out of input order; the fastest claim arrives first). Payload is a fully-populated Claim object. |
| done | Fires once at the end with paragraph_id, chars_received, total_claims, and processing_time_ms. |
| error | Fired if verification fails for the whole paragraph (rare; per-claim failures arrive as verify_result with verdict: "inconclusive"). |
Example event payload
event: paragraph_claims
data: {"paragraph_id": "pg_abc123def456", "claims": [{"text": "...", "sentence": "...", "start_offset": 0, "end_offset": 49}]}
event: verify_result
data: {"text": "...", "verdict": "confirmed", "summary": "...", "explanation": "...", "sources": [...], "verified_by": "Sonar"}
event: done
data: {"paragraph_id": "pg_abc123def456", "chars_received": 412, "total_claims": 3, "processing_time_ms": 9420}
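The placeholder-then-fill pattern described above can be modeled as a small reducer over the event stream. This sketch assumes that a verify_result can be matched to its placeholder by the claim text, since the example payloads show no explicit index; that matching rule and the name apply_stream_event are assumptions.

```python
from typing import Any, Dict


def apply_stream_event(cards: Dict[str, Dict[str, Any]],
                       event: str, payload: Dict[str, Any]) -> None:
    """Update claim cards in place from one paragraph-stream event."""
    if event == "paragraph_claims":
        # Render placeholders immediately: offsets are known, verdicts are not.
        for claim in payload["claims"]:
            cards[claim["text"]] = {**claim, "verdict": None}
    elif event == "verify_result":
        # Results arrive out of input order, so merge by key (assumed: text).
        cards.setdefault(payload["text"], {}).update(payload)
    elif event == "error":
        raise RuntimeError(f"paragraph verification failed: {payload}")
    # "done" carries only summary fields, so there is nothing to merge.
```

Keeping the cards in an insertion-ordered dict preserves the original claim order even though verdicts fill in out of order.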
POST /api/v1/extract
Extract individual claims from content using ProRata's extraction engine. No verification is performed — returns claims only. Significantly faster and cheaper than the verify endpoint.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| content | string | Required | The text to extract claims from. Maximum length depends on plan. |
| content_type | string | Required | Input format: text, url, youtube, tiktok, pdf, docx, image. |
| max_claims | integer | Optional | Maximum claims to extract (1–2000). Default: 2000. |
| context | string | Optional | Additional context to improve extraction accuracy. |
| skip_table_claims | boolean | Optional | Filter out granular table/statistical claims. Default: true. |
Response
{
"id": "ex_abc123def456",
"status": "complete",
"claims": [
{
"text": "The Earth is approximately 4.5 billion years old",
"sentence": "The Earth is approximately 4.5 billion years old.",
"start_offset": 0,
"end_offset": 49,
"filtered": false
}
],
"claims_count": 1,
"processing_time_ms": 820,
"usage": {
"claims_extracted": 1,
"claims_filtered": 0,
"content_length": 49,
"cost_usd": 0.002
}
}
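Extracted claims can be carried into a later verification request through the claims parameter of /api/v1/verify. The sketch below assumes that /verify accepts claims produced by /api/v1/extract (the parameter table only mentions claims from a prior extract_only call) and drops claims marked as filtered; the helper name verify_body_from_extract is hypothetical.

```python
from typing import Any, Dict, List


def verify_body_from_extract(extract_response: Dict[str, Any],
                             **options: Any) -> Dict[str, Any]:
    """Turn an extract response into a /api/v1/verify body with pre-extracted claims."""
    claims: List[Dict[str, Any]] = []
    for claim in extract_response["claims"]:
        if claim.get("filtered"):
            continue  # design choice: skip granular table claims
        item: Dict[str, Any] = {"text": claim["text"]}  # text is the only required key
        for key in ("sentence", "start_offset", "end_offset"):
            if claim.get(key) is not None:
                item[key] = claim[key]
        claims.append(item)
    return {"claims": claims, **options}
```

Options such as second_pass=True pass through unchanged into the request body.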
POST /api/v1/extract-text
Extract plain text from a file or URL. No claim extraction, no verification, no billing. Returns the extracted text with a human-readable document title. Use this to preview content before running it through the verify endpoint.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| content_type | string | Required | Input format: text, url, youtube, tiktok, pdf, docx, image. |
| content | string | Conditional | Raw text input. Required when content_type is text. |
| url | string | Conditional | URL to fetch. Required when content_type is url, youtube, or tiktok. |
| file | string | Conditional | Base64-encoded file content. Required when content_type is pdf, docx, or image. |
Response
| Field | Type | Description |
|---|---|---|
| id | string | Unique job identifier (et_...). |
| status | string | Always "complete". |
| input_type | string | The content_type that was submitted. |
| text | string | Extracted plain text. DOCX preserves headings as markdown (#/##), lists, and tables. |
| title | string | Human-readable document title. Extracted from file metadata, first heading, or cleaned filename. For YouTube, returns the video title. For URLs, returns the page title. |
| char_count | integer | Length of extracted text in characters. |
| processing_time_ms | integer | Server-side processing time in milliseconds. |
Example
# Extract text from a DOCX file
curl -X POST https://api.factivelabs.com/api/v1/extract-text \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"content_type": "docx",
"file": "UEsDBBQAAAAIAB..."
}'
{
"id": "et_a1b2c3d4e5f6",
"status": "complete",
"input_type": "docx",
"text": "# Introduction\n\nThis thesis examines...",
"title": "Czech Independence and the Chicago School",
"char_count": 48210,
"processing_time_ms": 340
}
Title Extraction
The title field uses a waterfall strategy per content type:
| Content Type | Title Source (in priority order) |
|---|---|
| docx | Document metadata (dc:title) → first Heading 1 → cleaned filename |
| pdf | PDF metadata title → cleaned filename |
| url | Page og:title → <title> tag → URL path |
| youtube | Video title from YouTube |
| tiktok | "TikTok video" |
| image | Cleaned filename |
| text | First non-empty line (truncated to 100 chars) |
POST /api/v1/analyze-structure
Analyze a document's section structure and get smart skip recommendations. Zero LLM calls — pure text parsing, returns in milliseconds. Use this to build a section picker UI before running the expensive verify endpoint.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| content | string | Conditional | Document text with markdown headings. Required when content_type is text. Max 500K chars. |
| url | string | Conditional | URL to fetch. Required when content_type is url. |
| file | string | Conditional | Base64-encoded file. Required when content_type is pdf or docx. |
| content_type | string | Optional | Input format: text (default), url, pdf, docx. |
Response
{
"sections": [
{
"index": 0,
"name": "Frontmatter",
"level": 0,
"start_char": 0,
"end_char": 412,
"word_count": 65,
"recommended": false,
"skip_reason": "Frontmatter (title, author, abstract)",
"child_indices": [0]
},
{
"index": 1,
"name": "Introduction",
"level": 1,
"start_char": 412,
"end_char": 3842,
"word_count": 580,
"recommended": true,
"skip_reason": null,
"child_indices": [1, 2, 3]
}
],
"total_sections": 9,
"total_words": 13420,
"recommended_sections": 6,
"recommended_words": 10800,
"source_hash": "a1b2c3d4e5f6"
}
Pass the section indices to /api/v1/verify via include_sections or exclude_sections to fact-check only the sections you want. The source_hash lets you verify the text hasn't changed between analyze and verify calls.
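As an illustration of the analyze-then-verify flow, the recommended sections of an analyze-structure response can be turned into an include_sections list. The helper names and the min_words threshold are hypothetical; the field names come from the response example above.

```python
from typing import Any, Dict, List


def sections_to_verify(structure: Dict[str, Any], min_words: int = 0) -> List[int]:
    """Indices of recommended sections, optionally dropping very short ones."""
    return [
        section["index"]
        for section in structure["sections"]
        if section["recommended"] and section["word_count"] >= min_words
    ]


def verify_body_for_sections(content: str, structure: Dict[str, Any]) -> Dict[str, Any]:
    """Build a /api/v1/verify body restricted to the recommended sections."""
    return {
        "content": content,
        "content_type": "text",
        "include_sections": sections_to_verify(structure),
    }
```

A section picker UI could start from this default selection and let the user toggle individual indices before the verify call.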
GET /api/v1/jobs/{job_id}
Check the status of an async verification job. Poll this endpoint after submitting a request with "async": true.
Path Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| job_id | string | Required | The job ID returned from the verify endpoint when using async mode. |
Response (Processing)
{
"id": "fc_abc123def456",
"status": "processing",
"progress": {
"claims_extracted": 8,
"claims_verified": 3,
"claims_total": 12
}
}
Response (Completed)
{
"id": "fc_abc123def456",
"status": "complete",
"progress": {"claims_extracted": 12, "claims_verified": 12, "claims_total": 12},
"result": {
"id": "fc_abc123",
"status": "complete",
"title": "Czech Independence and the Chicago School",
"claims": [...],
"counts": {"total": 12, "confirmed": 8, "disputed": 2, "inconclusive": 2, "skipped": 0},
"usage": {"claim_count": 12, "skipped_claims": 0, "cost_usd": 0.12}
},
"created_at": "2026-04-11T10:30:00Z",
"completed_at": "2026-04-11T10:30:45Z"
}
Response (Failed)
{
"id": "fc_abc123def456",
"status": "failed",
"progress": {"claims_extracted": 0, "claims_verified": 0, "claims_total": 0},
"result": null,
"created_at": "2026-05-14T10:30:00Z",
"completed_at": "2026-05-14T10:30:01Z"
}
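The three response shapes above suggest a simple polling loop: keep fetching until status is complete or failed. The sketch below uses only the Python standard library; the polling interval, the timeout, and the function names are assumptions rather than documented recommendations.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

BASE_URL = "https://api.factivelabs.com"


def fetch_job(job_id: str, api_key: str) -> Dict[str, Any]:
    """GET /api/v1/jobs/{job_id} with Bearer authentication."""
    request = urllib.request.Request(
        f"{BASE_URL}/api/v1/jobs/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_job(job_id: str,
                 fetch: Callable[[str], Dict[str, Any]],
                 interval_s: float = 2.0,
                 timeout_s: float = 300.0,
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll until the job completes; return its result or raise on failure."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch(job_id)
        if job["status"] == "complete":
            return job["result"]
        if job["status"] == "failed":
            raise RuntimeError(f"job {job_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout_s}s")
        sleep(interval_s)
```

A typical call would be wait_for_job("fc_abc123def456", lambda job_id: fetch_job(job_id, "YOUR_API_KEY")). Injecting fetch and sleep keeps the loop testable without network access.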
POST /api/v1/verify/batch
Submit multiple documents for verification in a single request. Each item is processed independently.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| items | array | Required | Array of verification requests. Each item uses the same schema as the verify endpoint request body (for example, content or url together with content_type). Max items: Free 5, Pro 50, Enterprise 100. |
Request Example
{
"items": [
{"content": "The speed of light is 300,000 km/s", "content_type": "text"},
{"url": "https://example.com/article", "content_type": "url"},
{"content": "Water boils at 100 degrees Celsius at sea level", "content_type": "text"}
]
}
Response
{
"results": [
{"id": "ver_001", "status": "completed", "claims": [...], "usage": {...}},
{"id": "ver_002", "status": "completed", "claims": [...], "usage": {...}},
{"id": "ver_003", "status": "completed", "claims": [...], "usage": {...}}
],
"total_usage": {
"claims_extracted": 5,
"claims_verified": 5,
"items_processed": 3
}
}
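When there are more documents than the plan's per-request item limit, they can be split into several batch bodies. This is a small sketch; the lowercase plan keys and the function name batch_bodies are hypothetical, while the limits are the ones listed in the request table.

```python
from typing import Any, Dict, List

# Max items per batch request by plan, as listed in the request table.
BATCH_LIMITS = {"free": 5, "pro": 50, "enterprise": 100}


def batch_bodies(items: List[Dict[str, Any]], plan: str) -> List[Dict[str, Any]]:
    """Split verification items into /api/v1/verify/batch bodies within the plan limit."""
    limit = BATCH_LIMITS[plan]
    return [{"items": items[i:i + limit]} for i in range(0, len(items), limit)]
```

Each returned body can be posted independently, and the per-item results concatenated in order.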
GET /health
Health check endpoint. No authentication required.
Response
{"status": "ok", "version": "1.0.0"}
Private Corpus
A customer-private document corpus lets you verify against your own documents instead of (or in addition to) the public web. All endpoints require Bearer token authentication. Documents are isolated by API key prefix, so one customer cannot read another customer's corpus. If you're sub-tenanting, isolation is further scoped to (api_key, end_user_id) so each of your end-users gets their own corpus under your account.
To use the corpus with verification, set "use_private_corpus": true on your POST /api/v1/verify or POST /api/v1/verify/paragraph request. Combine with "corpus_mode" to control the fallback behavior.
GET /api/v1/private-corpus
Returns the customer's documents (newest first), excluding deleted ones. Each item includes the document ID, filename, content type, byte size, character count, chunk count, ingestion status, and timestamps.
Response
{
"docs": [
{
"doc_id": "doc_abc123",
"filename": "policies.pdf",
"content_type": "pdf",
"size_bytes": 184221,
"char_count": 41092,
"chunk_count": 88,
"status": "ready",
"uploaded_at": "2026-05-08T14:22:11Z"
}
],
"count": 1
}
POST /api/v1/private-corpus/upload
Accepts multipart/form-data. Send 1–10 files in the files field. Validation happens synchronously; ingestion (text extraction, chunking, embedding) runs as a background task. Returns 202 Accepted with a list of accepted documents in queued state. Poll GET /api/v1/private-corpus to watch each document move queued → processing → embedding → ready.
Supported file types
pdf, docx, doc, rtf, txt, md, html.
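While polling the document listing after an upload, a client mostly needs to know which documents have not reached ready yet. The sketch below works on the listing response shape shown earlier; the function name is hypothetical, and it treats every status other than ready as still pending, which is an assumption because any failure statuses are not listed here.

```python
from typing import Any, Dict


def pending_documents(listing: Dict[str, Any]) -> Dict[str, str]:
    """Map doc_id to status for every document that is not yet ready."""
    return {
        doc["doc_id"]: doc["status"]
        for doc in listing["docs"]
        if doc["status"] != "ready"
    }
```

An empty result means every listed document has finished ingestion.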
Hard-delete a single document. Returns 404 if the document does not exist or belongs to a different customer; both cases are reported identically so the response does not leak whether a document exists.
Returns all chunks of a document, ordered by chunk_index. Used by the dashboard "View" modal so customers can inspect what the fact-checker actually indexed (helps spot OCR mistakes, missing sections, weird chunk boundaries).
Wipe every document in the customer's corpus. Requires confirm=true in the form body to prevent accidental wipes. Returns the count of documents removed.
GET /api/v1/private-corpus/scope returns the current scope description (a one-paragraph summary of what the corpus contains, used to improve corpus-routing accuracy when use_private_corpus: true) along with the metadata flags edited_by_user, regenerating, and last_regenerated_at. It returns an empty/idle row for customers who have not yet uploaded any documents.
PUT saves a customer-edited description and sets edited_by_user=true. The next successful upload or delete overwrites this edit, because the scope is auto-regenerated whenever documents change (design choice of 2026-05-03).
Regeneration is scheduled as a background task and the call returns immediately. The dashboard polls GET /api/v1/private-corpus/scope to watch the status flip from regenerating back to idle.
Return the most recent audit log rows for this customer, newest first. Default limit 100, max 1,000. Rows include the action, document ID, chunk count for verify events, source IP, and success/failure.
Claim Object
Represents a single extracted and verified claim.
| Field | Type | Description |
|---|---|---|
| text | string | The extracted claim text as a self-contained, verifiable statement. Note: ProRata rewrites claims with coreference resolution (e.g. "it" becomes "The Great Wall"), so this may differ from the original text. |
| sentence | string | Original sentence from which the claim was extracted. |
| sentence_text | string | Full source sentence text (may differ slightly from sentence in edge cases). |
| start_offset | integer|null | Character start position of the claim region in the source text. |
| end_offset | integer|null | Character end position of the claim region in the source text. |
| sentence_start | integer|null | Character start of the full source sentence. |
| sentence_end | integer|null | Character end of the full source sentence. |
| span_start | integer|null | Sub-span start within the sentence (for multi-claim sentences). |
| span_end | integer|null | Sub-span end within the sentence. |
| claim_type | string | "factual" or "subjective". |
| verdict | string|null | One of: confirmed, disputed, inconclusive, skipped (auto-skipped claims only; see skipped), or null if not yet verified. |
| summary | string | One-sentence verdict summary. |
| explanation | string | Detailed explanation of why the claim received this verdict. |
| corrected_text | string|null | Suggested correction for disputed claims (null if confirmed or no correction available). |
| research_query | string | The verification query sent to the research model. |
| categories | array[string] | Claim topic categories (e.g. "science", "politics", "historical"). |
| difficulty | integer | How hard to fact-check (1=trivial, 3=needs sources, 5=highly ambiguous). 0=not classified. |
| verified_by | string | Identifier for the verification path that produced the verdict. Common values include "exa+haiku" (default web verifier), "corpus+haiku" (private corpus verifier), "gemini" (fallback verifier), "Sonnet" (second-pass on disputed claims), and "Sonar" / "sonar_bo3" (legacy paths). Useful for debugging or for the Gemini-fallback caveat noted in Source Blacklist. |
| skipped | boolean | true if the claim was auto-skipped (non-verifiable framing such as idioms, meta-framing, or unintelligible text). Skipped claims have verdict: "skipped" and are not billed. |
| context_flags | object | Framing flags detected by the context analyzer. Common keys: category (string — e.g. "science", "politics"), needs_context (boolean — true if the claim depends on disambiguation), skip_flag (string — "none", "discourse_claim", "idiom", "unintelligible"). Additional internal flags may also appear depending on the pipeline path. |
| unclear_reason | string|null | For inconclusive verdicts, indicates why: "NO_SOURCES", "INSUFFICIENT", "CONFLICTING", or "SUBJECTIVE". Null for non-inconclusive verdicts. |
| sources | array[Source] | Source citations supporting the verdict. |
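To show how the skipped, verdict, and unclear_reason fields interact, the sketch below groups Claim objects into buckets for display or review. The bucket names and the function triage_claims are hypothetical; the field semantics follow the table above.

```python
from typing import Any, Dict, List


def triage_claims(claims: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Group claims by outcome: skipped first, then unverified, then by verdict."""
    buckets: Dict[str, List[Dict[str, Any]]] = {
        "confirmed": [], "disputed": [], "inconclusive": [],
        "skipped": [], "unverified": [],
    }
    for claim in claims:
        if claim.get("skipped"):
            key = "skipped"       # auto-skipped claims are not billed
        elif claim.get("verdict") is None:
            key = "unverified"    # e.g. extract_only results
        else:
            key = claim["verdict"]
        buckets.setdefault(key, []).append(claim)
    return buckets
```

A review queue might then surface the disputed bucket together with each claim's corrected_text, and the inconclusive bucket together with its unclear_reason.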
ExtractedClaim Object
Represents a single extracted claim (no verification data).
| Field | Type | Description |
|---|---|---|
| text | string | The extracted claim text, self-contained and verifiable. |
| sentence | string | Original sentence from which the claim was extracted. |
| start_offset | integer|null | Character start position in the source text. |
| end_offset | integer|null | Character end position in the source text. |
| filtered | boolean | True if the claim was flagged as granular table data by the filter. |
Highlight Object
Represents a pre-resolved region in the source text where a verified claim was found. Returned in the highlights SSE event after all claims are verified. Overlaps are already resolved (verdict priority: disputed > inconclusive > confirmed). See Claim Positioning and the Integration Guide in the documentation.
| Field | Type | Description |
|---|---|---|
| start | integer | Character start offset in the source text. |
| end | integer | Character end offset in the source text. |
| verdict | string | One of: confirmed, disputed, inconclusive. |
| text | string | Recommended primary locator. The exact words to highlight. Find this in your rendered content with indexOf() — works regardless of rendering differences. See the Integration Guide. |
| answer_span | string | The full original sentence containing the claim. Fallback search key when text can't be found in the local copy. |
| claim_index | integer | Index of the primary claim this region corresponds to. |
| claim_indices | array[integer] | All claim indices covered by this region (when multiple claims share the same span). |
| tooltip | string | Short display text suitable for hover tooltips. |
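The lookup order suggested by the table (search for text first, then fall back to answer_span) can be sketched as follows. Python's str.find plays the role of indexOf() here; the function name locate_highlight is hypothetical, and taking the first occurrence is a simplification that may pick the wrong spot when the same words repeat.

```python
from typing import Any, Dict, Optional, Tuple


def locate_highlight(rendered: str, highlight: Dict[str, Any]) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the highlight in the rendered text, or None."""
    # Primary locator: the exact words to highlight.
    text = highlight.get("text", "")
    if text:
        position = rendered.find(text)
        if position != -1:
            return position, position + len(text)
    # Fallback: the full original sentence containing the claim.
    sentence = highlight.get("answer_span", "")
    if sentence:
        position = rendered.find(sentence)
        if position != -1:
            return position, position + len(sentence)
    return None
```

Searching by text avoids relying on start and end, which only line up when the local copy matches the source text character for character.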
Highlights Envelope
The highlights SSE event wraps the array of Highlight objects with two fields for divergence detection:
| Field | Type | Description |
|---|---|---|
| highlights | array[Highlight] | Sorted by start position. Non-overlapping. |
| source_length | integer | Character length of the source text at computation time. Compare against your local copy to detect truncation. |
| source_hash | string | MD5 fingerprint (first 12 hex chars) of the source text. If your local hash differs, fall back to text or answer_span for positioning. |
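The divergence check described in this table can be sketched with hashlib. The sketch assumes the fingerprint is computed over the UTF-8 encoding of the source text, which is not stated here, so a mismatch may also mean the encoding assumption is wrong; the function names are hypothetical.

```python
import hashlib
from typing import Any, Dict


def source_fingerprint(text: str) -> str:
    """First 12 hex characters of the MD5 digest (UTF-8 encoding assumed)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()[:12]


def offsets_trustworthy(local_text: str, envelope: Dict[str, Any]) -> bool:
    """True when the local copy matches the text the highlights were computed on."""
    return (
        len(local_text) == envelope["source_length"]
        and source_fingerprint(local_text) == envelope["source_hash"]
    )
```

When offsets_trustworthy returns False, positioning should fall back to the text or answer_span fields instead of start and end.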
Source Object
Source metadata for a citation. All fields default to empty string (never null).
| Field | Type | Description |
|---|---|---|
| url | string | Source URL. |
| title | string | Page title. |
| domain | string | Domain name (e.g. "reuters.com"). May be empty on responses from the default exa+haiku verifier path — parse the hostname from url as a fallback. |
| publisher | string | Publisher name derived from domain (e.g. "Reuters"). May be empty on the exa+haiku path — same fallback as domain. |
| snippet | string | Relevant excerpt from the source. |
| date | string | Publication date (if available from search results). |
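The hostname fallback mentioned for the domain field can be sketched with urllib.parse. Stripping a leading "www." is a presentation choice made here, not something the API specifies, and the function name source_domain is hypothetical.

```python
from typing import Any, Dict
from urllib.parse import urlparse


def source_domain(source: Dict[str, Any]) -> str:
    """Return the source's domain, parsing it from url when the field is empty."""
    if source.get("domain"):
        return source["domain"]
    host = urlparse(source.get("url", "")).hostname or ""
    # Presentation choice: drop a leading "www." so output resembles "reuters.com".
    return host[4:] if host.startswith("www.") else host
```

Because all Source fields default to an empty string rather than null, the helper returns an empty string when neither field is usable.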
Usage Object
Returned by the /api/v1/verify endpoint.
| Field | Type | Description |
|---|---|---|
| claim_count | integer | Total claims extracted and verified. |
| skipped_claims | integer | Claims auto-skipped by the table filter (not billed). |
| cost_usd | float | Cost for this request in USD. |
Extract Usage Object
Returned by the /api/v1/extract endpoint.
| Field | Type | Description |
|---|---|---|
| claims_extracted | integer | Total claims extracted. |
| claims_filtered | integer | Claims filtered out by the table filter. |
| content_length | integer | Character length of the input content. |
| cost_usd | float | Cost for this request in USD. |
Error Object
| Field | Type | Description |
|---|---|---|
| error.code | string | Machine-readable error code (e.g., rate_limit_exceeded, invalid_api_key). |
| error.message | string | Human-readable error description. |
| error.retry_after | integer | Seconds to wait before retrying (for rate limit errors). |
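A client can use these fields to decide how long to wait before retrying. The sketch below assumes the fields are nested under a top-level "error" key (as the dotted names suggest) and applies exponential backoff to other errors; the backoff policy, the choice not to retry invalid_api_key, and the function name retry_delay are assumptions.

```python
from typing import Any, Dict, Optional


def retry_delay(error_body: Dict[str, Any], attempt: int,
                base_s: float = 1.0, cap_s: float = 60.0) -> Optional[float]:
    """Seconds to wait before retrying, or None when a retry cannot help."""
    error = error_body.get("error", {})
    code = error.get("code")
    if code == "invalid_api_key":
        return None  # retrying with the same key will fail again
    if code == "rate_limit_exceeded" and error.get("retry_after") is not None:
        return float(error["retry_after"])  # server-provided wait time
    # Assumed fallback: exponential backoff, capped.
    return min(cap_s, base_s * (2 ** attempt))
```

Here attempt counts from 0, so the fallback waits 1, 2, 4, 8 seconds and so on, up to the cap.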