CLI

Command-line interface for extraction, document conversion, and diagnostics.

Installation

# pip
pip install clichefactory

# uv
uv add clichefactory

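This installs the clichefactory command globally. For local mode (BYOK), install the local extra instead: pip install "clichefactory[local]" or uv add "clichefactory[local]". The quotes keep shells such as zsh from treating the square brackets as a glob pattern.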

Configuration

Run the interactive setup wizard to save your credentials:

# Service mode (ClicheFactory API key)
clichefactory configure

# Local mode (BYOK — bring your own LLM key)
clichefactory configure --local

Config is saved to ~/.clichefactory/config.toml. You can also pass credentials per-command via flags or environment variables. See Execution Modes for local vs. service.

Config File Format

# ~/.clichefactory/config.toml
default_mode = "service"  # or "local"

[service]
api_key = "cf_..."
base_url = ""  # optional custom endpoint

[local]
model = "openai/gpt-4o"  # provider/model format
api_key = "sk-..."
ocr_model = ""  # optional separate OCR model
ocr_api_key = ""  # optional, falls back to local.api_key

The [local] fields act as a BYOK model override in service mode and as the primary model settings in local mode.

Precedence: CLI flags > environment variables > config file > defaults.
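The precedence rule can be read as a first-match lookup across four sources. The sketch below is only an illustration of that ordering: resolve_setting and the dictionary keys are hypothetical, the real environment variable names are not listed here, and treating an empty string as "unset" is an assumption.

```python
def resolve_setting(name, cli_flags, env_vars, config_file, defaults):
    """Return the first non-empty value, checking sources from highest to lowest priority."""
    for source in (cli_flags, env_vars, config_file, defaults):
        value = source.get(name)
        # Assumption: None and "" both mean "not set in this source".
        if value not in (None, ""):
            return value
    return None


defaults = {"mode": "service"}
config_file = {"mode": "service", "api_key": "cf_..."}
env_vars = {"mode": "local"}   # hypothetical: already mapped from an environment variable
cli_flags = {}                 # nothing passed on the command line

print(resolve_setting("mode", cli_flags, env_vars, config_file, defaults))     # local
print(resolve_setting("api_key", cli_flags, env_vars, config_file, defaults))  # cf_...
```

Passing a value in cli_flags for the same key would win over all three other sources.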

Commands

clichefactory extract

Extract structured data from a document using a JSON schema.

# Basic extraction
clichefactory extract invoice.pdf --schema schema.json

# Fast mode
clichefactory extract invoice.pdf --schema schema.json --extraction-mode fast

# Trained pipeline
clichefactory extract invoice.pdf --schema schema.json --artifact-id art_abc123

# Save output to file
clichefactory extract invoice.pdf --schema schema.json -o result.json

Flag                Description
--schema            Path to a JSON Schema file describing the fields to extract. Required unless --artifact-id is provided.
--extraction-mode   Extraction mode: fast, robust. Default: balanced. See Extraction Modes.
--artifact-id       Trained pipeline artifact ID. Auto-resolves to trained mode.
-o / --output       Write result JSON to a file instead of stdout.
--mode              Execution mode: service or local. Overrides config file.
--api-key           ClicheFactory API key. Overrides config file.
--base-url          API base URL. For custom deployments.
--model             LLM model override (e.g., gemini/gemini-3-flash-preview).
--model-api-key     API key for the model override.
--ocr-model         Separate model for OCR/VLM tasks.
--ocr-api-key       API key for the OCR model.
--ocr-engine        OCR engine: rapidocr, tesseract, easyocr. See Processing & OCR.
--lang              OCR language codes (e.g., slv+eng). For Tesseract-based OCR.
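For readers scripting extraction from Python, the following sketch writes a small JSON Schema file and assembles the matching extract command. The invoice field names are hypothetical, the helper functions are illustrative only, and run_extract assumes that stdout contains nothing but the result JSON when -o is not given.

```python
import json
import subprocess
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical invoice fields; replace them with the fields your documents contain.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
        "issue_date": {"type": "string"},
    },
    "required": ["invoice_number"],
}


def write_schema(path: Path) -> Path:
    """Serialize the schema to disk so it can be passed via --schema."""
    path.write_text(json.dumps(INVOICE_SCHEMA, indent=2), encoding="utf-8")
    return path


def build_extract_command(document: str, schema: Path,
                          extraction_mode: Optional[str] = None) -> list:
    """Assemble the argument list using only flags from the table above."""
    cmd = ["clichefactory", "extract", document, "--schema", str(schema)]
    if extraction_mode:
        cmd += ["--extraction-mode", extraction_mode]
    return cmd


def run_extract(cmd: list):
    """Run the command and parse stdout as JSON (assumes stdout is pure JSON)."""
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


schema_path = write_schema(Path(tempfile.mkdtemp()) / "schema.json")
command = build_extract_command("invoice.pdf", schema_path, extraction_mode="fast")
print(" ".join(command))
```

Calling run_extract(command) requires clichefactory to be installed and configured, so the sketch only prints the command it would run.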

clichefactory extract-batch

Extract from multiple files concurrently.

clichefactory extract-batch *.pdf --schema schema.json --max-concurrency 10 -o results.json

Accepts the same flags as extract plus --max-concurrency (default 5).
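In the shell example, *.pdf is expanded by the shell before clichefactory sees it. When the command is launched from Python without a shell, the pattern has to be expanded explicitly. The sketch below shows one way to do that; build_batch_command is a hypothetical helper and the demo files are empty placeholders.

```python
import glob
import tempfile
from pathlib import Path


def build_batch_command(pattern, schema, max_concurrency=5, output=None):
    """Expand a glob pattern and build an extract-batch argument list."""
    files = sorted(glob.glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files match {pattern!r}")
    cmd = ["clichefactory", "extract-batch", *files,
           "--schema", schema, "--max-concurrency", str(max_concurrency)]
    if output:
        cmd += ["-o", output]
    return cmd


# Demo with two empty placeholder files in a temporary directory.
demo_dir = Path(tempfile.mkdtemp())
for name in ("a.pdf", "b.pdf"):
    (demo_dir / name).touch()

command = build_batch_command(str(demo_dir / "*.pdf"), "schema.json",
                              max_concurrency=10, output="results.json")
print(command)
```

The resulting list can be passed to subprocess.run directly, which avoids quoting problems with file names that contain spaces.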

clichefactory to-markdown

Convert a document to markdown text.

# Default mode (full OCR pipeline)
clichefactory to-markdown document.pdf

# Fast mode (VLM-only, service mode)
clichefactory to-markdown scan.pdf --conversion-mode fast --mode service

# Save output
clichefactory to-markdown document.pdf -o output.md

Flag                Description
--conversion-mode   default (full OCR pipeline) or fast (VLM-only, no OCR). Service mode only. See Processing & OCR.
-o / --output       Write markdown to a file instead of stdout.
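A wrapper script can catch an unsupported flag combination before launching the CLI. The sketch below reads "Service mode only" as applying to the fast conversion mode, which is an interpretation of the table rather than confirmed behavior; build_to_markdown_command is a hypothetical helper.

```python
def build_to_markdown_command(document, conversion_mode="default", mode=None, output=None):
    """Build a to-markdown argument list, rejecting combinations the table rules out."""
    if conversion_mode not in ("default", "fast"):
        raise ValueError(f"unknown conversion mode: {conversion_mode!r}")
    # Assumption: fast (VLM-only) conversion is unavailable in local mode.
    if conversion_mode == "fast" and mode == "local":
        raise ValueError("fast conversion is documented as service mode only")

    cmd = ["clichefactory", "to-markdown", document]
    if conversion_mode != "default":
        cmd += ["--conversion-mode", conversion_mode]
    if mode:
        cmd += ["--mode", mode]
    if output:
        cmd += ["-o", output]
    return cmd


print(build_to_markdown_command("scan.pdf", conversion_mode="fast", mode="service"))
print(build_to_markdown_command("document.pdf", output="output.md"))
```

Failing early in the wrapper gives a clearer error than waiting for the CLI to reject the request.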

clichefactory to-markdown-batch

Convert multiple documents concurrently.

clichefactory to-markdown-batch docs/*.pdf -o output_dir/ --max-concurrency 5

Accepts the same flags as to-markdown plus --max-concurrency (default 5).

clichefactory doctor

Check your configuration, installed dependencies, and system binaries.

clichefactory doctor
# Example output
Config: ~/.clichefactory/config.toml ✓
  mode: service
  api_key: cliche-...redacted
Dependencies:
  clichefactory: 0.4.2 ✓
  pydantic: 2.9.2 ✓
System binaries:
  tesseract: not found (optional)
  pandoc: 3.1.12 ✓
  soffice: not found (optional)

Run this first when something isn't working. See Troubleshooting for common fixes.
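If clichefactory itself fails to start, the system binaries from the example output can be checked independently. The sketch below uses shutil.which to look for them on PATH; it is a rough stand-in for one part of the report, not the implementation of doctor, and the binary list is taken from the example output above.

```python
import shutil

# Binary names taken from the example doctor output.
SYSTEM_BINARIES = ("tesseract", "pandoc", "soffice")


def find_binaries(names=SYSTEM_BINARIES):
    """Map each binary name to its resolved path, or None when it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_binaries().items():
    print(f"{name}: {path or 'not found'}")
```

Unlike doctor, this check reports only whether each binary is present, not its version.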