# Core Concepts
Key ideas that apply across the SDK, CLI, MCP, and REST API. Tool pages link here for shared explanations.
## Execution Modes
ClicheFactory supports two execution modes that determine where processing runs:
| | Local (BYOK) | Service |
|---|---|---|
| Where it runs | Your machine | ClicheFactory cloud |
| Auth | Your LLM API key (Gemini, OpenAI, etc.) | ClicheFactory API key (cliche-...) |
| Supports | extract, to_markdown | All operations including trained, robust, and training (BYOK) |
| Dependencies | clichefactory[local] + system deps | Just clichefactory (or curl) |
In the SDK: `factory(mode="local", ...)` or `factory(api_key="cliche-...")`. In the CLI: `clichefactory configure --local`. In MCP: set `CLICHEFACTORY_MODE` in env vars.
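The selection rule can be pictured as follows. This is an illustrative sketch, not the SDK's actual internals; the function name and error message are invented for the example:

```python
def resolve_mode(mode=None, api_key=None):
    """Hypothetical sketch of execution-mode resolution: an explicit
    mode="local" wins; otherwise a ClicheFactory key (cliche-... prefix)
    selects service mode."""
    if mode == "local":
        return "local"
    if api_key and api_key.startswith("cliche-"):
        return "service"
    raise ValueError("set mode='local' or provide a ClicheFactory API key")

print(resolve_mode(mode="local"))             # local
print(resolve_mode(api_key="cliche-abc123"))  # service
```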
## Extraction Modes
Each extraction call can specify a mode that controls the accuracy/cost tradeoff:
| Mode | Pipeline | Availability | Best For |
|---|---|---|---|
| Default (balanced) | Smart document routing + structured extraction | Local + Service | General use |
| Fast | One-shot extraction (fastest) | Local + Service | Speed, simple docs |
| Trained | Custom-trained pipeline | Service only — BYOK | Domain-specific accuracy |
| Robust | Extract + verification pass | Service only | High-stakes documents |
When using a trained artifact with robust mode, both the trained pipeline and verification pass are applied automatically.
See the SDK, CLI, or REST API pages for how to set the mode in each tool.
## Processing & OCR
### Service Mode
ClicheFactory's cloud service handles document processing automatically — no configuration needed. For PDFs, the server classifies each document and routes it through the optimal pipeline:
| PDF Type | How it's processed |
|---|---|
| Structured (native text) | Text extracted directly — no OCR needed |
| Scanned / image | OCR applied to recover text from images |
Non-PDF files (images, DOCX, etc.) are routed through the appropriate handler automatically. For `to_markdown()`, you can set `conversion_mode` to `"fast"` to skip OCR and send the file directly to a VLM.
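The routing described above can be summarized as a small decision function. This is a hypothetical sketch of the behavior, not ClicheFactory's actual code; the pipeline names are invented for illustration:

```python
def pick_pipeline(file_type: str, conversion_mode: str = "default") -> str:
    """Sketch of service-mode routing: "fast" skips OCR and sends the
    file straight to a VLM; otherwise PDFs are classified as structured
    vs. scanned, and non-PDFs go to their type-specific handler."""
    if conversion_mode == "fast":
        return "vlm-direct"
    if file_type == "pdf-structured":
        return "native-text"       # text extracted directly, no OCR
    if file_type == "pdf-scanned":
        return "ocr"               # OCR recovers text from page images
    return "type-specific-handler"

print(pick_pipeline("pdf-scanned"))                     # ocr
print(pick_pipeline("docx", conversion_mode="fast"))    # vlm-direct
```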
### Local Mode (BYOK)
In local mode, you can configure the processing strategy via `ParsingOptions`:
| Strategy | Description |
|---|---|
| Default | OCR + table structure recognition |
| Enhanced | OCR + per-page visual refinement for complex layouts |
### OCR Engines (Local Mode)
| Engine | System Dependency | Default |
|---|---|---|
| rapidocr | None (pure Python) | Yes |
| tesseract | tesseract binary + traineddata files | — |
| easyocr | None (PyTorch; downloads models on first use) | — |
OCR engines support multiple languages. See your OCR engine's documentation for available language packs.
Configure via the SDK (`ParsingOptions`) or CLI (`--ocr-engine`). MCP and OpenClaw use the defaults from the config file.
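Engine selection with a sensible fallback might look like the following. This is a hypothetical sketch, not ClicheFactory's actual logic; the function name is invented:

```python
import shutil
from typing import Optional

def choose_ocr_engine(preferred: Optional[str] = None) -> str:
    """Sketch of engine selection: honor an explicit choice, but fail
    early if tesseract is requested and its binary is not on PATH;
    default to rapidocr, which needs no system dependency."""
    if preferred == "tesseract" and shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    return preferred or "rapidocr"

print(choose_ocr_engine())  # rapidocr
```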
## Supported File Types
| Extension | Description | Notes |
|---|---|---|
| .pdf | Auto-classifies structured vs. scanned pages | Native text extraction for structured PDFs; layout-aware OCR for scanned content. |
| .png, .jpg, .jpeg, .webp, .gif, .bmp | Image OCR | Routes to the configured OCR engine. |
| .docx | Native Word parsing | Preserves structure and tables. |
| .doc, .odt | Legacy document conversion | Converts to PDF first. Requires pandoc or soffice. |
| .xlsx | Spreadsheet extraction | With sheet selection. |
| .csv | CSV parsing | Auto-detects delimiter and header row. |
| .eml | Email parsing | RFC 2822 with recursive attachment parsing. |
| .txt, .md | Text passthrough | With encoding detection. |
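The support matrix above amounts to an extension-to-handler lookup. A hypothetical sketch (the handler names are invented for illustration, not ClicheFactory's internal names):

```python
from pathlib import Path

# Invented handler labels mirroring the table above.
HANDLERS = {
    ".pdf": "pdf",
    ".png": "image_ocr", ".jpg": "image_ocr", ".jpeg": "image_ocr",
    ".webp": "image_ocr", ".gif": "image_ocr", ".bmp": "image_ocr",
    ".docx": "docx",
    ".doc": "legacy_convert", ".odt": "legacy_convert",
    ".xlsx": "spreadsheet",
    ".csv": "csv",
    ".eml": "email",
    ".txt": "text", ".md": "text",
}

def route(path: str) -> str:
    """Map a file path to its handler; unknown extensions are rejected."""
    suffix = Path(path).suffix.lower()
    if suffix not in HANDLERS:
        raise ValueError(f"unsupported file type: {suffix}")
    return HANDLERS[suffix]

print(route("invoice.PDF"))  # pdf
```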
## Schemas
Schemas tell the extractor what fields you want. You can use Pydantic models (SDK) or raw JSON Schema dicts (all tools).
### Pydantic Model
```python
from typing import List, Optional

from pydantic import BaseModel


class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float
    amount: float


class Invoice(BaseModel):
    invoice_number: str
    date: str
    vendor: str
    total: float
    tax: Optional[float] = None
    line_items: List[LineItem]
```
### JSON Schema Dict
"type": "object",
"properties": {
"invoice_number": { "type": "string" },
"date": { "type": "string", "format": "date" },
"vendor": { "type": "string" },
"total": { "type": "number" },
"tax": { "type": "number" },
"line_items": {
"type": "array",
"items": {
"type": "object",
"properties": {
"description": { "type": "string" },
"quantity": { "type": "integer" },
"unit_price": { "type": "number" },
"amount": { "type": "number" }
}
}
}
}
}
Supported types: string, number, integer, boolean, array, object. You can nest objects and arrays arbitrarily deep. Use Optional (Pydantic) or omit from required (JSON Schema) for nullable fields.
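The required/nullable distinction in a raw JSON Schema dict comes down to whether a field name appears in the `required` array. A minimal sketch using a plain presence check (not a full JSON Schema validator):

```python
# Fields listed in "required" must be present; "tax" is omitted from
# "required", so a document without it still passes the check.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "tax": {"type": "number"},
    },
    "required": ["invoice_number", "total"],
}

def missing_required(schema: dict, doc: dict) -> list:
    """Return the required fields absent from doc (presence only)."""
    return [f for f in schema.get("required", []) if f not in doc]

print(missing_required(invoice_schema, {"invoice_number": "INV-1", "total": 99.0}))  # []
print(missing_required(invoice_schema, {"total": 99.0}))  # ['invoice_number']
```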
## BYOK (Bring Your Own Key)
BYOK lets you use your own LLM API key for the inference portion of the pipeline. ClicheFactory handles document parsing, OCR, schema enforcement, and orchestration — you provide the model.
How it works: Configure your LLM key in the SDK, CLI, or MCP config. The parsing pipeline runs as usual, but the LLM call is routed through your key. In service mode with BYOK, a reduced platform fee applies for infrastructure (parsing, orchestration, schema enforcement).
### Supported Providers
| Provider | Model Format |
|---|---|
| Google Gemini | gemini/gemini-3-flash-preview |
| OpenAI | openai/gpt-4o |
| Anthropic | anthropic/claude-sonnet-4-20250514 |
| Ollama | ollama/llama3.2 |
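Model strings in the table follow a `provider/model` convention. A hypothetical helper showing how such a string splits (the function name is invented; the SDK may parse this differently):

```python
def split_provider_model(provider_model: str):
    """Split 'provider/model' on the first slash; the model segment
    may itself contain slashes or dots."""
    provider, _, model = provider_model.partition("/")
    if not model:
        raise ValueError(f"expected 'provider/model', got {provider_model!r}")
    return provider, model

print(split_provider_model("openai/gpt-4o"))  # ('openai', 'gpt-4o')
```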
Example: local mode with your own OpenAI key.

```python
from clichefactory import factory, Endpoint  # import path assumed

# Parsing runs locally; inference is billed to your OpenAI account.
client = factory(
    mode="local",
    model=Endpoint(
        provider_model="openai/gpt-4o",
        api_key="sk-your-openai-key",
    ),
)
```