# Core Concepts
Key ideas that apply across the SDK, CLI, MCP, and REST API. Tool pages link here for shared explanations.
## Execution Modes
ClicheFactory supports two execution modes that determine where processing runs:
| | Local (BYOK) | Service |
|---|---|---|
| Where it runs | Your machine | ClicheFactory cloud |
| Auth | Your LLM API key (Gemini, OpenAI, etc.) | ClicheFactory API key (cliche-...) |
| Supports | extract, to_markdown | All operations including trained, robust, and training (BYOK) |
| Dependencies | clichefactory[local] + system deps | Just clichefactory (or curl) |
In the SDK: `factory(mode="local", ...)` or `factory(api_key="cliche-...")`. In the CLI: `clichefactory configure --local`. In MCP: set `CLICHEFACTORY_MODE` in env vars.
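The selection rule can be pictured as follows. This is an illustrative sketch, not the SDK's actual internals; the function name and error message are invented for the example:

```python
def resolve_mode(mode=None, api_key=None):
    """Hypothetical sketch of execution-mode resolution: an explicit
    mode="local" wins; otherwise a ClicheFactory key (cliche-... prefix)
    selects service mode."""
    if mode == "local":
        return "local"
    if api_key and api_key.startswith("cliche-"):
        return "service"
    raise ValueError("set mode='local' or provide a ClicheFactory API key")

print(resolve_mode(mode="local"))             # local
print(resolve_mode(api_key="cliche-abc123"))  # service
```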
## Extraction Modes
Each extraction call can specify a mode that controls the accuracy/cost tradeoff:
| Mode | Pipeline | Availability | Best For |
|---|---|---|---|
| Default (balanced) | Smart document routing + structured extraction | Local + Service | General use |
| Fast | One-shot extraction (fastest) | Local + Service | Speed, simple docs |
| Trained | Custom-trained pipeline | Service only — BYOK | Domain-specific accuracy |
| Robust | Extract + verification pass | Service only | High-stakes documents |
When using a trained artifact with robust mode, both the trained pipeline and verification pass are applied automatically.
See the SDK, CLI, or REST API pages for how to set the mode in each tool.
## Processing & OCR
### Service Mode
ClicheFactory's cloud service handles document processing automatically — no configuration needed. For PDFs, the server classifies each document and routes it through the optimal pipeline:
| PDF Type | How it's processed |
|---|---|
| Structured (native text) | Text extracted directly — no OCR needed |
| Scanned / image | OCR applied to recover text from images |
Non-PDF files (images, DOCX, etc.) are routed through the appropriate handler automatically. For `to_markdown()`, you can set `conversion_mode` to `"fast"` to skip OCR and send the file directly to a VLM.
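The routing described above can be summarized as a small decision function. This is a hypothetical sketch of the behavior, not ClicheFactory's actual code; the pipeline names are invented for illustration:

```python
def pick_pipeline(file_type: str, conversion_mode: str = "default") -> str:
    """Sketch of service-mode routing: "fast" skips OCR and sends the
    file straight to a VLM; otherwise PDFs are classified as structured
    vs. scanned, and non-PDFs go to their type-specific handler."""
    if conversion_mode == "fast":
        return "vlm-direct"
    if file_type == "pdf-structured":
        return "native-text"       # text extracted directly, no OCR
    if file_type == "pdf-scanned":
        return "ocr"               # OCR recovers text from page images
    return "type-specific-handler"

print(pick_pipeline("pdf-scanned"))                     # ocr
print(pick_pipeline("docx", conversion_mode="fast"))    # vlm-direct
```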
### Local Mode (BYOK)
In local mode, you can configure the processing strategy via `ParsingOptions`:
| Strategy | Description |
|---|---|
| Default | OCR + table structure recognition |
| Enhanced | OCR + per-page visual refinement for complex layouts |
### OCR Engines (Local Mode)
| Engine | System Dependency | Default |
|---|---|---|
| rapidocr | None (pure Python) | Yes |
| tesseract | tesseract binary + traineddata files | — |
| easyocr | None (PyTorch; downloads models on first use) | — |
OCR engines support multiple languages. See your OCR engine's documentation for available language packs.
Configure via the SDK (`ParsingOptions`) or CLI (`--ocr-engine`). MCP and OpenClaw use the defaults from the config file.
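Engine selection with a sensible fallback might look like the following. This is a hypothetical sketch, not ClicheFactory's actual logic; the function name is invented:

```python
import shutil
from typing import Optional

def choose_ocr_engine(preferred: Optional[str] = None) -> str:
    """Sketch of engine selection: honor an explicit choice, but fail
    early if tesseract is requested and its binary is not on PATH;
    default to rapidocr, which needs no system dependency."""
    if preferred == "tesseract" and shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    return preferred or "rapidocr"

print(choose_ocr_engine())  # rapidocr
```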
## Supported File Types
| Extension | Description | Notes |
|---|---|---|
| .pdf | Auto-classifies structured vs. scanned pages | Native text extraction for structured PDFs; layout-aware OCR for scanned content. |
| .png, .jpg, .jpeg, .webp, .gif, .bmp | Image OCR | Routes to the configured OCR engine. |
| .docx | Native Word parsing | Preserves structure and tables. |
| .doc, .odt | Legacy document conversion | Converts to PDF first. Requires pandoc or soffice. |
| .xlsx | Spreadsheet extraction | With sheet selection. |
| .csv | CSV parsing | Auto-detects delimiter and header row. |
| .eml | Email parsing | RFC 2822 with recursive attachment parsing. |
| .txt, .md | Text passthrough | With encoding detection. |
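The support matrix above amounts to an extension-to-handler lookup. A hypothetical sketch (the handler names are invented for illustration, not ClicheFactory's internal names):

```python
from pathlib import Path

# Invented handler labels mirroring the table above.
HANDLERS = {
    ".pdf": "pdf",
    ".png": "image_ocr", ".jpg": "image_ocr", ".jpeg": "image_ocr",
    ".webp": "image_ocr", ".gif": "image_ocr", ".bmp": "image_ocr",
    ".docx": "docx",
    ".doc": "legacy_convert", ".odt": "legacy_convert",
    ".xlsx": "spreadsheet",
    ".csv": "csv",
    ".eml": "email",
    ".txt": "text", ".md": "text",
}

def route(path: str) -> str:
    """Map a file path to its handler; unknown extensions are rejected."""
    suffix = Path(path).suffix.lower()
    if suffix not in HANDLERS:
        raise ValueError(f"unsupported file type: {suffix}")
    return HANDLERS[suffix]

print(route("invoice.PDF"))  # pdf
```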
## Schemas
Schemas tell the extractor what fields you want. You can use Pydantic models (SDK) or raw JSON Schema dicts (all tools).
### Pydantic Model
```python
from typing import List, Optional

from pydantic import BaseModel


class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float
    amount: float


class Invoice(BaseModel):
    invoice_number: str
    date: str
    vendor: str
    total: float
    tax: Optional[float] = None
    line_items: List[LineItem]
```
### JSON Schema Dict
"type": "object",
"properties": {
"invoice_number": { "type": "string" },
"date": { "type": "string", "format": "date" },
"vendor": { "type": "string" },
"total": { "type": "number" },
"tax": { "type": "number" },
"line_items": {
"type": "array",
"items": {
"type": "object",
"properties": {
"description": { "type": "string" },
"quantity": { "type": "integer" },
"unit_price": { "type": "number" },
"amount": { "type": "number" }
}
}
}
}
}
Supported types: string, number, integer, boolean, array, object. You can nest objects and arrays arbitrarily deep. Use Optional (Pydantic) or omit from required (JSON Schema) for nullable fields.
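The required/nullable distinction in a raw JSON Schema dict comes down to whether a field name appears in the `required` array. A minimal sketch using a plain presence check (not a full JSON Schema validator):

```python
# Fields listed in "required" must be present; "tax" is omitted from
# "required", so a document without it still passes the check.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "tax": {"type": "number"},
    },
    "required": ["invoice_number", "total"],
}

def missing_required(schema: dict, doc: dict) -> list:
    """Return the required fields absent from doc (presence only)."""
    return [f for f in schema.get("required", []) if f not in doc]

print(missing_required(invoice_schema, {"invoice_number": "INV-1", "total": 99.0}))  # []
print(missing_required(invoice_schema, {"total": 99.0}))  # ['invoice_number']
```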
## BYOK (Bring Your Own Key)
BYOK lets you use your own LLM API key for the inference portion of the pipeline. ClicheFactory handles document parsing, OCR, schema enforcement, and orchestration — you provide the model.
How it works: Configure your LLM key in the SDK, CLI, or MCP config. The parsing pipeline runs as usual, but the LLM call is routed through your key. In service mode with BYOK, a reduced platform fee applies for infrastructure (parsing, orchestration, schema enforcement).
### Supported Providers
| Provider | Model Format |
|---|---|
| Google Gemini | gemini/gemini-3-flash-preview |
| OpenAI | openai/gpt-4o |
| Anthropic | anthropic/claude-sonnet-4-20250514 |
| Ollama | ollama/llama3.2 |
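Model strings in the table follow a `provider/model` convention. A hypothetical helper showing how such a string splits (the function name is invented; the SDK may parse this differently):

```python
def split_provider_model(provider_model: str):
    """Split 'provider/model' on the first slash; the model segment
    may itself contain slashes or dots."""
    provider, _, model = provider_model.partition("/")
    if not model:
        raise ValueError(f"expected 'provider/model', got {provider_model!r}")
    return provider, model

print(split_provider_model("openai/gpt-4o"))  # ('openai', 'gpt-4o')
```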
Example: local mode with your own OpenAI key.

```python
from clichefactory import factory, Endpoint  # import path assumed

# Parsing runs locally; inference is billed to your OpenAI account.
client = factory(
    mode="local",
    model=Endpoint(
        provider_model="openai/gpt-4o",
        api_key="sk-your-openai-key",
    ),
)
```