Troubleshooting

`clichefactory doctor`

The doctor command checks your configuration, dependencies, and system binaries. Run it first when something isn't working.

CLI:

    clichefactory doctor

Python:

    from clichefactory import factory

    client = factory(api_key="your-key")
    client.doctor()

MCP:

    # In Cursor or Claude Desktop, call the doctor tool.
    # The MCP server exposes it as a tool with no parameters.
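When reporting an issue, it can be convenient to capture the doctor output from a script. The following is a minimal sketch, assuming the `clichefactory` CLI is on `PATH`. The helper name `capture_doctor_output` is hypothetical and not part of the SDK; only the `clichefactory doctor` command comes from this page.

```python
import subprocess


def capture_doctor_output(command=("clichefactory", "doctor")):
    """Run the doctor command and return (exit_code, combined_output).

    Hypothetical helper for illustration; exit_code is None when the
    command cannot be found at all.
    """
    try:
        result = subprocess.run(
            list(command),
            capture_output=True,
            text=True,
            check=False,  # a failing check should still produce a report
        )
    except FileNotFoundError:
        # The CLI is not installed or not on PATH.
        return None, f"command not found: {command[0]}"
    return result.returncode, result.stdout + result.stderr
```

Calling `capture_doctor_output()` and pasting the returned text into a report keeps stdout and stderr together, which is usually what a maintainer wants to see.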

Example Output

    Config: ~/.clichefactory/config.toml ✓
      mode: service
      api_key: cliche-...redacted
    Dependencies:
      clichefactory: 0.4.2 ✓
      pydantic: 2.9.2 ✓
      docling: 2.31.0 ✓
    System binaries:
      tesseract: not found (optional)
      pandoc: 3.1.12 ✓
      soffice: not found (optional)
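If you want to spot missing items in a long report, the output can be scanned line by line. This is a rough sketch that assumes the output keeps the `name: status` layout shown in the example above; the real format may differ between versions, and `find_missing` is a hypothetical helper.

```python
def find_missing(doctor_output: str) -> list[str]:
    """Return names of entries whose status contains 'not found'."""
    missing = []
    for line in doctor_output.splitlines():
        # Split "name: status" at the first colon only.
        name, sep, status = line.strip().partition(":")
        if sep and "not found" in status:
            missing.append(name.strip())
    return missing


sample = """\
System binaries:
  tesseract: not found (optional)
  pandoc: 3.1.12 ✓
  soffice: not found (optional)
"""
print(find_missing(sample))  # ['tesseract', 'soffice']
```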

Common Errors

`401 Unauthorized` — Invalid API key

Cause: The API key is missing, incorrect, or has been revoked.

Fix: Check that your key starts with `cliche-` and matches the one in your account settings. Ensure it's set in the config file that `clichefactory doctor` reports (for example `~/.clichefactory/config.toml`) or in the environment variable you use.
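A quick local format check can rule out the most common copy-paste mistakes before you contact support. This is a minimal sketch: the `cliche-` prefix comes from the text above, while the environment variable name `CLICHEFACTORY_API_KEY` is an assumption used only for illustration. A format check cannot tell whether a key has been revoked.

```python
import os


def check_api_key(key):
    """Return a list of problems with the key; an empty list means it looks plausible."""
    problems = []
    if not key:
        problems.append("key is missing or empty")
        return problems
    if key != key.strip():
        problems.append("key has leading or trailing whitespace")
    if not key.strip().startswith("cliche-"):
        problems.append("key does not start with 'cliche-'")
    return problems


# CLICHEFACTORY_API_KEY is an assumed variable name, not confirmed by the docs.
for problem in check_api_key(os.environ.get("CLICHEFACTORY_API_KEY")):
    print(problem)
```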

`403 Forbidden` — Insufficient credits

Cause: Your credit balance is too low for the requested operation.

Fix: Check your balance and top up credits. Each extraction mode has a different per-page cost.

`400 Bad Request` — Invalid schema

Cause: The JSON schema is malformed or contains unsupported types.

Fix: Validate your schema at jsonschema.dev or check the Schemas reference. Ensure the JSON string is properly escaped if passing via CLI or curl.
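Many `400` responses of this kind come from JSON that was damaged by shell quoting. The sketch below checks only that a schema string parses as a JSON object and shows one way to quote it for a POSIX shell; it does not check which schema types the service supports. `check_schema_string` is a hypothetical helper.

```python
import json
import shlex


def check_schema_string(raw: str):
    """Parse a schema string; return (schema, None) or (None, error message)."""
    try:
        schema = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
    if not isinstance(schema, dict):
        # This sketch expects the usual object form of a schema.
        return None, "expected a JSON object at the top level"
    return schema, None


raw = '{"type": "object", "properties": {"total": {"type": "number"}}}'
schema, error = check_schema_string(raw)
print(error or "schema parses as JSON")

# shlex.quote turns the compact JSON into a single shell-safe argument.
print(shlex.quote(json.dumps(schema, separators=(",", ":"))))
```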

`File not found`

Cause: The file path passed to the SDK, CLI, or MCP tool doesn't exist.

Fix: Use an absolute path. For MCP, the AI assistant must pass the full file path — relative paths may resolve against the wrong working directory.
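Resolving the path yourself before handing it to the SDK, CLI, or MCP tool makes the failure visible early. A minimal sketch using only the standard library; `to_absolute` is a hypothetical helper name.

```python
from pathlib import Path


def to_absolute(path_str: str) -> Path:
    """Expand '~', resolve against the current directory, and fail early if missing."""
    path = Path(path_str).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"no file at {path}")
    return path
```

The error message contains the fully resolved path, which shows at a glance when a relative path was resolved against an unexpected working directory.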

Empty or garbled extraction results

Cause: Poor OCR quality from low-resolution scans, complex table layouts, or handwriting.

Fix: Try the enhanced parser for complex documents (local mode). Use `robust` extraction mode for a verification pass. For scanned documents, ensure the original was scanned at 200 DPI or higher.
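If a scan carries no reliable DPI metadata, the effective resolution can be estimated from the pixel dimensions and the physical page size (pixels divided by inches). A small worked sketch; the page size and pixel counts are made-up example values.

```python
def effective_dpi(width_px: int, height_px: int, width_in: float, height_in: float) -> float:
    """Return the lower of the horizontal and vertical pixels-per-inch values."""
    return min(width_px / width_in, height_px / height_in)


# A US Letter page (8.5 x 11 in) scanned to 1275 x 1650 pixels is 150 DPI.
dpi = effective_dpi(1275, 1650, 8.5, 11.0)
print(dpi, "OK" if dpi >= 200 else "below the 200 DPI guideline")
```

By the same arithmetic, a US Letter page needs at least 1700 x 2200 pixels to reach 200 DPI.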

BYOK: model API errors

Cause: Your LLM API key is invalid, rate-limited, or the model is unavailable.

Fix: Verify your model API key and check the provider's status page. Run `clichefactory doctor` to validate the configuration. See BYOK for supported providers.
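For the rate-limited case, retrying with exponential backoff is a common remedy. The sketch below is generic and not tied to any provider or to the clichefactory SDK: `call` stands for whatever function triggers the model request, and `is_retryable` is a placeholder you would adapt to the exception types your provider's client actually raises.

```python
import random
import time


def call_with_backoff(call, is_retryable, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `call()`; on retryable errors wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_retryable(exc):
                raise  # give up: out of attempts, or not a transient error
            delay = base_delay * (2 ** attempt)
            # Jitter spreads out retries from concurrent workers.
            sleep(delay + random.uniform(0, delay / 2))
```

Errors caused by an invalid key should not be retried; only transient conditions such as rate limiting or temporary unavailability are worth the wait.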

MCP: tools not appearing

Cause: The MCP server config is incorrect, or `clichefactory-mcp` isn't installed.

Fix: Verify the JSON config is valid and the command/args are correct. Restart the IDE after changes. See MCP Setup.
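A config file can be sanity-checked before restarting the IDE. This sketch assumes the `mcpServers` layout that Claude Desktop and Cursor use at the time of writing (a `command` string plus an optional `args` list per server); see MCP Setup for the actual entry. `check_mcp_config` is a hypothetical helper.

```python
import json
import shutil


def check_mcp_config(text: str) -> list[str]:
    """Return problems found in an MCP client config given as JSON text."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]
    servers = config.get("mcpServers") if isinstance(config, dict) else None
    if not isinstance(servers, dict) or not servers:
        return ["no 'mcpServers' object found"]
    problems = []
    for name, entry in servers.items():
        command = entry.get("command") if isinstance(entry, dict) else None
        if not command:
            problems.append(f"{name}: missing 'command'")
        elif shutil.which(command) is None:
            # The IDE launches the server itself, so the command must be resolvable.
            problems.append(f"{name}: '{command}' not found on PATH")
        if isinstance(entry, dict) and not isinstance(entry.get("args", []), list):
            problems.append(f"{name}: 'args' should be a list")
    return problems
```

An empty list means the file is well-formed and every server command resolves. IDEs may launch servers with a different `PATH` than your shell, so an absolute command path is the safer choice.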

System Dependencies

Some features require system binaries. All of them are optional: the default OCR engine (rapidocr) and the default parser work without any system dependencies.

Tesseract (optional — for `tesseract` OCR engine)

    # macOS
    brew install tesseract

    # Ubuntu / Debian
    sudo apt install tesseract-ocr

    # Windows (via scoop)
    scoop install tesseract

OCR Language Packs (optional — for Tesseract non-English OCR)

Tesseract needs traineddata files for each language. English is included by default.

    # macOS: install all languages
    brew install tesseract-lang

    # Ubuntu: install a specific language pack
    sudo apt install tesseract-ocr-<lang>

OCR engines support multiple languages. See your OCR engine's documentation for available language packs.
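To see which language packs Tesseract can currently find, you can call its `--list-langs` option. A minimal sketch, assuming `tesseract` is on `PATH`; the helper names are hypothetical, and the header line format varies between Tesseract versions, so it is skipped rather than parsed.

```python
import subprocess


def parse_langs(list_langs_output: str) -> list[str]:
    """Extract language codes from `tesseract --list-langs` output, skipping the header."""
    lines = [line.strip() for line in list_langs_output.splitlines() if line.strip()]
    return [line for line in lines if not line.lower().startswith("list of")]


def installed_langs() -> list[str]:
    """Return installed Tesseract languages, or [] if tesseract is not on PATH."""
    try:
        result = subprocess.run(
            ["tesseract", "--list-langs"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return []
    # Some releases write the list to stderr instead of stdout, so read both.
    # Any warning lines on stderr would be returned as well.
    return parse_langs(result.stdout + result.stderr)
```

If the language you need (for example `deu`) is missing from the result, install the matching pack with the commands above.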

Pandoc (optional — for `.doc` and `.odt` files)

    # macOS
    brew install pandoc

    # Ubuntu / Debian
    sudo apt install pandoc

LibreOffice (optional — fallback for `.doc` and `.odt`)

    # macOS
    brew install --cask libreoffice

    # Ubuntu / Debian
    sudo apt install libreoffice

Used as a fallback if Pandoc is not available. Provides the `soffice` binary for document conversion.
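After installing, you can confirm that the binaries are visible to Python with `shutil.which`. A small sketch; the binary names are the ones listed in the example doctor output, and `find_binaries` is a hypothetical helper rather than part of the SDK.

```python
import shutil

# Binary names taken from the "System binaries" section of the doctor output.
OPTIONAL_BINARIES = ("tesseract", "pandoc", "soffice")


def find_binaries(names=OPTIONAL_BINARIES) -> dict:
    """Map each binary name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_binaries().items():
    print(f"{name}: {path or 'not found (optional)'}")
```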

Performance Tips

Goal	Recommendation
Fastest extraction	Use `fast` mode for one-shot extraction, skipping the full parsing step.
Best accuracy	Use `robust` mode for a verification pass on high-stakes documents.
Complex scans	Use the enhanced parser (local mode) for documents with complex tables, mixed layouts, or handwriting.
High volume	Use `extract_batch` / `extract-batch` with `max_concurrency` tuned to your rate limits.
Non-English docs	Use Tesseract with the appropriate language pack. See your OCR engine's docs for available packs.
Repeated formats	Train a model (BYOK) — trained pipelines are faster and more accurate than generic extraction on recurring document types.
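For the high-volume case, a starting value for `max_concurrency` can be estimated from your rate limit and the typical time per document (Little's law: items in flight = arrival rate x time in system). This is a back-of-the-envelope sketch with made-up numbers; it assumes one request per document, which this page does not confirm, and `suggest_max_concurrency` is a hypothetical helper.

```python
import math


def suggest_max_concurrency(requests_per_minute: float, avg_seconds_per_request: float) -> int:
    """Estimate how many requests can be in flight without exceeding a rate limit."""
    in_flight = (requests_per_minute / 60.0) * avg_seconds_per_request
    return max(1, math.floor(in_flight))


# Example with made-up numbers: 120 requests/minute, 4 s per document.
print(suggest_max_concurrency(120, 4.0))  # 8
```

Treat the result as an upper bound to tune downward if you start seeing rate-limit errors.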

Getting Help

If you're stuck:

Run the doctor command and include its output when reporting issues.
Check the Core Concepts page to verify your configuration.
Contact us at info@clichefactory.com.