LLM-gated rules¶
Who this page is for: someone deciding whether to enable the three optional rules that use an LLM judge.
What you will learn¶
- Which three rules need an LLM.
- How to enable them.
- How the cache keeps cost down.
- The privacy implications of sending descriptions to a provider.
Background¶
mcpolish has 20 rules that are pure static analysis: deterministic, offline, sub-second. Three more rules need an LLM to judge whether a description is ambiguous, whether a parameter description matches its name and type, and whether a tool with a mutating verb is hiding side effects.
These three rules are opt-in. They never run unless you pass --llm.
The three rules¶
| ID | Name | What it asks the LLM |
|---|---|---|
| MP026 | ambiguous-description | "Can a competent agent tell what this tool does from the description alone?" |
| MP031 | param-meaning-mismatch | "Do the parameter name, type, and description agree on what the agent should pass?" |
| MP032 | undocumented-side-effect | "The name suggests a mutation. Does the description acknowledge it?" |
The full prompts live in src/mcpolish/rules/. Read them before sending anything sensitive.
Step by step¶
1. Install the LLM extras¶
The LLM extras add the openai and anthropic client packages to the install.
2. Set your API key¶
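For OpenAI, export OPENAI_API_KEY in the environment of the process that runs mcpolish.
For Anthropic, export your Anthropic API key in the same way.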
For Ollama, no key is needed; run an Ollama daemon locally.
3. Run with --llm¶
mcpolish lint . --llm openai:gpt-4o
mcpolish lint . --llm anthropic:claude-opus-4-7
mcpolish lint . --llm ollama:llama3.1
The format is provider:model. The model must be one your provider exposes.
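As a rough preflight check, the provider:model format and the key requirement from step 2 can be sketched as below. This is not mcpolish's own code: `parse_llm_spec`, `missing_key`, and the `REQUIRED_KEY` table are hypothetical names, and `ANTHROPIC_API_KEY` is assumed because it is the variable the anthropic SDK reads by default.

```python
import os
from typing import Optional, Tuple

# Assumed provider -> key-variable mapping. OPENAI_API_KEY is named on this
# page; ANTHROPIC_API_KEY is an assumption based on the anthropic SDK default.
REQUIRED_KEY = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "ollama": None,  # local daemon, no key needed
}


def parse_llm_spec(spec: str) -> Tuple[str, str]:
    """Split 'provider:model' on the first colon only."""
    provider, sep, model = spec.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected provider:model, got {spec!r}")
    return provider, model


def missing_key(spec: str, env=os.environ) -> Optional[str]:
    """Return the name of the unset key variable, or None if nothing is missing."""
    provider, _model = parse_llm_spec(spec)
    if provider not in REQUIRED_KEY:
        raise ValueError(f"unknown provider {provider!r}")
    name = REQUIRED_KEY[provider]
    if name is not None and not env.get(name):
        return name
    return None
```

Splitting on the first colon only keeps model names that themselves contain a colon (for example an Ollama tag such as llama3.1:8b) intact.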
How the cache works¶
LLM calls are slow and metered. mcpolish caches every verdict in ~/.cache/mcpolish/llm.db, a small SQLite database.
The cache key is sha256(rule_id, model_id, prompt). The same description with the same prompt and the same model is a cache hit the second time you run mcpolish on that tool. Cache entries expire after 30 days.
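The keying and expiry described above could look roughly like this. It is a minimal sketch, not the actual mcpolish schema: the NUL-byte separator, the `verdicts` table, and all function names are assumptions made for illustration.

```python
import hashlib
import sqlite3
import time

TTL_SECONDS = 30 * 24 * 60 * 60  # 30-day expiry, as stated above


def cache_key(rule_id: str, model_id: str, prompt: str) -> str:
    # Assumed encoding: join the fields with a NUL byte so that different
    # field boundaries can never produce the same input to the hash.
    raw = "\x00".join((rule_id, model_id, prompt)).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    # Hypothetical single-table layout: key, verdict, creation timestamp.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS verdicts ("
        "key TEXT PRIMARY KEY, verdict TEXT NOT NULL, created REAL NOT NULL)"
    )
    return conn


def get_verdict(conn, key, now=None):
    """Return the cached verdict, or None on a miss or an expired entry."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT verdict, created FROM verdicts WHERE key = ?", (key,)
    ).fetchone()
    if row is None or now - row[1] > TTL_SECONDS:
        return None
    return row[0]


def put_verdict(conn, key, verdict, now=None):
    """Store or overwrite the verdict for this key."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO verdicts (key, verdict, created) VALUES (?, ?, ?)",
        (key, verdict, now),
    )
    conn.commit()
```

Because the prompt text is part of the hash input, any edit to a description, however small, produces a different key and therefore a fresh LLM call.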
On a typical 8-tool server:
- First run with --llm: about 10 seconds (eight tools, three rules each, with retries).
- Second run with the same code: under a second (every call hits the cache).
You can save the cache file as a CI artifact and restore it on later runs. Any run that picks up the cache skips the calls it already contains.
What gets sent to the provider¶
For each LLM-gated rule, mcpolish sends:
- The rule prompt (a few hundred tokens).
- The tool name.
- The full description.
- The parameter name, type, and description (only for MP031).
What does not get sent:
- Your function body.
- Other tools' descriptions.
- Your file path.
- Your namespace or server name (unless it appears in the description text).
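To make the two lists concrete, here is a sketch of a payload builder that includes only the listed fields. The function name and dictionary keys are illustrative assumptions; the real request shape is defined by the prompts in src/mcpolish/rules/.

```python
from typing import Optional


def build_payload(
    rule_id: str,
    rule_prompt: str,
    tool_name: str,
    description: str,
    param: Optional[dict] = None,
) -> dict:
    """Collect only the fields listed above; key names are illustrative."""
    payload = {
        "prompt": rule_prompt,
        "tool_name": tool_name,
        "description": description,
    }
    # Parameter details are included for MP031 only.
    if rule_id == "MP031":
        if param is None:
            raise ValueError("MP031 needs a parameter to judge")
        payload["param"] = {
            "name": param["name"],
            "type": param["type"],
            "description": param["description"],
        }
    return payload
```

Nothing about the function body, the file path, or other tools is passed in, so none of it can end up in the request.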
If your descriptions contain secrets or proprietary information, do not enable --llm unless you trust the provider.
Provider behaviour on failure¶
If the LLM call fails (network, rate limit, bad key), the client logs a warning and returns "OK" for that rule. The lint continues. The result is that an LLM-gated rule may silently pass when the network is flaky.
If you need stricter behaviour, watch the log output for these warnings.
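One way to turn that into a hard failure is to scan the captured log yourself. The sketch below assumes you have already captured the lint output as text; the marker string "LLM call failed" is a placeholder, not the actual warning text, so check what your mcpolish version logs and pass that instead.

```python
def llm_warning_lines(log_text: str, marker: str = "LLM call failed"):
    """Return the log lines that contain the marker (case-insensitive)."""
    needle = marker.lower()
    return [line for line in log_text.splitlines() if needle in line.lower()]


def strict_exit_code(lint_exit_code: int, log_text: str,
                     marker: str = "LLM call failed") -> int:
    """Keep a failing lint code; otherwise fail if any LLM warning was logged."""
    if lint_exit_code != 0:
        return lint_exit_code
    return 1 if llm_warning_lines(log_text, marker) else 0
```

A wrapper script can pass the result to `sys.exit` so that a flaky network fails the build instead of silently passing the three rules.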
Cost guidance¶
Single rule, single tool, gpt-4o, English description of about 100 tokens: under one cent. Mistakes are cheap.
A full repo scan with --llm:
- 50 tools, 3 LLM rules, cold cache: about 150 LLM calls.
- Each call: about 200 prompt tokens, 50 response tokens.
- Cost on gpt-4o: roughly 25 cents per full scan, then near zero on cache hits.
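The arithmetic behind that estimate can be reproduced as follows. The per-token prices are placeholder assumptions (5 USD per million input tokens, 15 USD per million output tokens), chosen because they land near the figure above; check your provider's current price list before relying on them.

```python
def scan_cost_usd(tools, rules, prompt_tokens, response_tokens,
                  usd_per_million_input, usd_per_million_output):
    """Return (number of calls, total cost in USD) for a cold-cache scan."""
    calls = tools * rules
    input_cost = calls * prompt_tokens * usd_per_million_input / 1_000_000
    output_cost = calls * response_tokens * usd_per_million_output / 1_000_000
    return calls, input_cost + output_cost


# 50 tools, 3 rules, 200 prompt tokens and 50 response tokens per call.
calls, cost = scan_cost_usd(50, 3, 200, 50, 5.00, 15.00)
print(calls, round(cost, 4))  # 150 0.2625
```

Swapping in another model's prices shows quickly whether a cheaper model changes the total in any way that matters at this scale.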
Adjust your provider and model to your cost target. Ollama is free if you can spare the local GPU time.
Common variations¶
Per-project default LLM¶
Set it in pyproject.toml:
A bare --llm flag (no value) reads this. An explicit --llm provider:model overrides it.
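The precedence can be summarised in a few lines. This is only a sketch of the rule as described here: `resolve_llm` is a hypothetical helper, and raising an error when a bare flag finds no project default is an assumption about behaviour this page does not specify.

```python
from typing import Optional


def resolve_llm(flag_present: bool,
                flag_value: Optional[str],
                project_default: Optional[str]) -> Optional[str]:
    """Pick the provider:model to use, or None when LLM rules stay off."""
    if not flag_present:
        return None  # no --llm at all: the three rules never run
    if flag_value:
        return flag_value  # explicit --llm provider:model wins
    if project_default:
        return project_default  # bare --llm falls back to pyproject.toml
    # Assumed behaviour: a bare flag with nothing configured is an error.
    raise ValueError("bare --llm given but no project default is configured")
```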
Local-only judging with Ollama¶
You need the Ollama daemon listening at http://localhost:11434 (or override with OLLAMA_HOST).
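Before a long lint run it can help to confirm the daemon is reachable. The sketch below resolves the address the same way this page describes (default, or OLLAMA_HOST if set) and then requests the daemon's root URL, which is assumed to answer with HTTP 200 when Ollama is running. Both function names are hypothetical, and the handling of a host value without a scheme is an assumption.

```python
import os
import urllib.error
import urllib.request

DEFAULT_OLLAMA = "http://localhost:11434"


def ollama_base_url(env=os.environ) -> str:
    """Use OLLAMA_HOST when set, otherwise the default local address."""
    host = env.get("OLLAMA_HOST", "").strip()
    if not host:
        return DEFAULT_OLLAMA
    if "://" not in host:
        host = "http://" + host  # assumed: bare host:port means plain HTTP
    return host.rstrip("/")


def ollama_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the daemon answers at its root URL."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

Calling `ollama_is_up(ollama_base_url())` before `mcpolish lint . --llm ollama:llama3.1` gives a clear failure up front instead of a run in which every LLM-gated rule passes by default.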
Disabling LLM rules for one run¶
Omit --llm. The three rules do not run and produce no findings.
Troubleshooting¶
OPENAI_API_KEY not set. The key has to be in the environment of the process running mcpolish. export OPENAI_API_KEY=... then re-run.
Cache miss when I expected a hit. The cache key covers the rule ID, the model ID, and the prompt text. If your description changed even by a single space, or you switched models, that is a new cache key. Entries older than 30 days have also expired. To force a cache rebuild, delete the file: rm ~/.cache/mcpolish/llm.db.
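LLM rule fires inconsistently. LLM judges are stochastic. Temperature is a request setting rather than a property of the model, and mcpolish already sets it to 0, but providers may not honour it perfectly. If verdicts still flip between runs, try a different model such as gpt-4o-mini. Once a verdict is cached it stays fixed until the description, prompt, or model changes or the entry expires.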