Core Concepts

Key ideas that apply across the SDK, CLI, MCP, and REST API. Tool pages link here for shared explanations.

Execution Modes

ClicheFactory supports two execution modes that determine where processing runs:

|              | Local (BYOK)                              | Service                                                      |
|--------------|-------------------------------------------|--------------------------------------------------------------|
| Where it runs | Your machine                             | ClicheFactory cloud                                          |
| Auth         | Your LLM API key (Gemini, OpenAI, etc.)   | ClicheFactory API key (cliche-...)                           |
| Supports     | extract, to_markdown                      | All operations, including trained, robust, and training (BYOK) |
| Dependencies | clichefactory[local] + system deps        | Just clichefactory (or curl)                                 |

In the SDK: factory(mode="local", ...) or factory(api_key="cliche-..."). In the CLI: clichefactory configure --local. In MCP: set CLICHEFACTORY_MODE in env vars.
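As a sketch, the two modes might look like this in the Python SDK. The factory() entry point and the mode/api_key arguments come from this page; the key values are placeholders:

```python
from clichefactory import factory

# Local (BYOK): processing runs on your machine; you supply your own
# LLM key separately (see the BYOK section below).
local_client = factory(mode="local")

# Service: processing runs in ClicheFactory's cloud, authenticated with
# a ClicheFactory API key (placeholder value shown).
service_client = factory(api_key="cliche-...")
```

See the SDK page for the full set of constructor arguments.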

Extraction Modes

Each extraction call can specify a mode that controls the accuracy/cost tradeoff:

| Mode               | Pipeline                                       | Availability        | Best For                 |
|--------------------|------------------------------------------------|---------------------|--------------------------|
| Default (balanced) | Smart document routing + structured extraction | Local + Service     | General use              |
| Fast               | One-shot extraction (fastest)                  | Local + Service     | Speed, simple docs       |
| Trained            | Custom-trained pipeline                        | Service only (BYOK) | Domain-specific accuracy |
| Robust             | Extract + verification pass                    | Service only        | High-stakes documents    |

When using a trained artifact with robust mode, both the trained pipeline and verification pass are applied automatically.

See the SDK, CLI, or REST API pages for how to set the mode in each tool.
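For illustration, assuming the SDK exposes the mode as a `mode` keyword on the extraction call (a hypothetical call shape; `Invoice` and `Contract` stand in for your own schemas, and the exact parameter names live on the SDK page):

```python
# Default (balanced): smart routing + structured extraction.
result = client.extract("invoice.pdf", schema=Invoice)

# Fast: one-shot extraction, trading accuracy for speed on simple docs.
fast_result = client.extract("invoice.pdf", schema=Invoice, mode="fast")

# Robust (service only): adds a verification pass for high-stakes documents.
robust_result = client.extract("contract.pdf", schema=Contract, mode="robust")
```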

Processing & OCR

Service Mode

ClicheFactory's cloud service handles document processing automatically — no configuration needed. For PDFs, the server classifies each document and routes it through the optimal pipeline:

| PDF Type                 | How it's processed                       |
|--------------------------|------------------------------------------|
| Structured (native text) | Text extracted directly; no OCR needed   |
| Scanned / image          | OCR applied to recover text from images  |

Non-PDF files (images, DOCX, etc.) are routed through the appropriate handler automatically. For to_markdown(), you can set conversion_mode to "fast" to skip OCR and send the file directly to a VLM.
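A minimal sketch of the fast conversion path, assuming to_markdown() takes the file path and a conversion_mode keyword (both names come from this page; the client setup is elided):

```python
# Skip OCR entirely and send the file straight to a VLM (service mode).
markdown = client.to_markdown("scan.pdf", conversion_mode="fast")
```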

Local Mode (BYOK)

In local mode, you can configure the processing strategy via ParsingOptions:

| Strategy | Description                                              |
|----------|----------------------------------------------------------|
| Default  | OCR + table structure recognition                        |
| Enhanced | OCR + per-page visual refinement for complex layouts     |

OCR Engines (Local Mode)

| Engine    | System Dependency                             | Default |
|-----------|-----------------------------------------------|---------|
| rapidocr  | None (pure Python)                            | Yes     |
| tesseract | tesseract binary + traineddata files          | No      |
| easyocr   | None (PyTorch; downloads models on first use) | No      |

OCR engines support multiple languages. See your OCR engine's documentation for available language packs.

Configure via the SDK (ParsingOptions) or CLI (--ocr-engine). MCP and OpenClaw use the defaults from the config file.
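As a sketch of the SDK side, assuming ParsingOptions accepts the strategy and OCR engine as keyword arguments (the class name comes from this page; the field names are illustrative, so check the SDK reference for the real ones):

```python
from clichefactory import factory, ParsingOptions

# Local-mode client with the enhanced strategy and tesseract OCR.
client = factory(
    mode="local",
    parsing=ParsingOptions(
        strategy="enhanced",     # "default" or "enhanced"
        ocr_engine="tesseract",  # rapidocr (default), tesseract, or easyocr
    ),
)
```

The equivalent CLI switch is --ocr-engine, per the paragraph above.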

Supported File Types

| Extension                          | Description                               | Notes                                                                       |
|------------------------------------|-------------------------------------------|-----------------------------------------------------------------------------|
| .pdf                               | Auto-classifies structured vs. scanned pages | Native text extraction for structured PDFs; layout-aware OCR for scanned content. |
| .png, .jpg, .jpeg, .webp, .gif, .bmp | Image OCR                               | Routes to the configured OCR engine.                                        |
| .docx                              | Native Word parsing                       | Preserves structure and tables.                                             |
| .doc, .odt                         | Legacy document conversion                | Converts to PDF first; requires pandoc or soffice.                          |
| .xlsx                              | Spreadsheet extraction                    | With sheet selection.                                                       |
| .csv                               | CSV parsing                               | Auto-detects delimiter and header row.                                      |
| .eml                               | Email parsing                             | RFC 2822 with recursive attachment parsing.                                 |
| .txt, .md                          | Text passthrough                          | With encoding detection.                                                    |

Schemas

Schemas tell the extractor what fields you want. You can use Pydantic models (SDK) or raw JSON Schema dicts (all tools).

Pydantic Model
from pydantic import BaseModel
from typing import List, Optional

class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float
    amount: float

class Invoice(BaseModel):
    invoice_number: str
    date: str
    vendor: str
    total: float
    tax: Optional[float] = None
    line_items: List[LineItem]
JSON Schema Dict
{
  "type": "object",
  "properties": {
    "invoice_number": { "type": "string" },
    "date": { "type": "string", "format": "date" },
    "vendor": { "type": "string" },
    "total": { "type": "number" },
    "tax": { "type": "number" },
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": { "type": "string" },
          "quantity": { "type": "integer" },
          "unit_price": { "type": "number" },
          "amount": { "type": "number" }
        }
      }
    }
  }
}

Supported types: string, number, integer, boolean, array, object. You can nest objects and arrays arbitrarily deep. Use Optional (Pydantic) or omit from required (JSON Schema) for nullable fields.
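To see the nullable-field behavior concretely, here is how the Invoice model above behaves when tax is omitted. This snippet is plain Pydantic and runnable on its own; nothing in it is ClicheFactory-specific:

```python
from typing import List, Optional
from pydantic import BaseModel

class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float
    amount: float

class Invoice(BaseModel):
    invoice_number: str
    date: str
    vendor: str
    total: float
    tax: Optional[float] = None  # nullable: may be absent in the source document
    line_items: List[LineItem]

# tax can be left out because it is Optional with a default of None.
invoice = Invoice(
    invoice_number="INV-001",
    date="2024-01-15",
    vendor="Acme Corp",
    total=105.0,
    line_items=[
        LineItem(description="Widget", quantity=2, unit_price=50.0, amount=100.0)
    ],
)
print(invoice.tax)  # None
```

In JSON Schema terms, the same effect comes from leaving "tax" out of the schema's required list.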

BYOK (Bring Your Own Key)

BYOK lets you use your own LLM API key for the inference portion of the pipeline. ClicheFactory handles document parsing, OCR, schema enforcement, and orchestration — you provide the model.

How it works: Configure your LLM key in the SDK, CLI, or MCP config. The parsing pipeline runs as usual, but the LLM call is routed through your key. In service mode with BYOK, a reduced platform fee applies for infrastructure (parsing, orchestration, schema enforcement).

Supported Providers
| Provider      | Model Format                          |
|---------------|---------------------------------------|
| Google Gemini | gemini/gemini-3-flash-preview         |
| OpenAI        | openai/gpt-4o                         |
| Anthropic     | anthropic/claude-sonnet-4-20250514    |
| Ollama        | ollama/llama3.2                       |
SDK Example (Local BYOK)
from clichefactory import factory, Endpoint

client = factory(
    mode="local",
    model=Endpoint(
        provider_model="openai/gpt-4o",
        api_key="sk-your-openai-key"
    )
)