Methodology¶

Who this page is for: a skeptic who wants to know where each rule comes from.

What you will learn¶

The papers and operational sources behind every rule.
The known limits of what mcpolish can prove.
Why the rule IDs are stable forever.

Background¶

Every mcpolish rule is grounded in either a published paper, a documented attack, or a clear operational pattern. This page lists each source so you can read the original work and judge for yourself whether the rule should apply to your project.

Primary sources¶

Source	Citation
Wang et al., "From Docs to Descriptions: Smell-Aware Evaluation of MCP Server Descriptions"	arXiv:2602.18914, February 2026. Analysed 10,831 public MCP servers. Catalogued 18 description smells across 4 dimensions. Ran a controlled mutation experiment showing each smell causes wrong-tool selection regressions, p<0.001.
Li et al., "Don't believe everything you read: Understanding and Measuring MCP Behavior under Misleading Tool Descriptions"	arXiv:2602.03580, February 2026. Static analysis of 10,240 MCP servers. Quantifies parameter mismatches and undocumented side effects.
Anthropic tool-use guidance	Internal evaluation. Cited in Wang 2026 section 6 discussion as the source for the "1 to 3 short paragraphs" description length recommendation.
Invariant Labs tool-poisoning advisories	May 2025. Documented zero-width character injection and operator-style instructions in tool descriptions in the wild. Snyk acquired Invariant Labs in June 2025.
MCP-Scan (Invariant Labs / Snyk)	github.com/invariantlabs-ai/mcp-scan. Attack catalogue for malicious tool descriptions.

Rule provenance table¶

Rule	Source	Key finding
MP001	Wang section 4.2	An empty description is 52 percentage points worse in head-to-head selection.
MP002	Wang section 4.4	1,285 of 10,831 servers had at least one undocumented parameter.
MP003	Wang Table 4	3,093 of 10,831 servers shipped no return-value documentation.
MP004	MCP spec	JSON Schema 2020-12 requires explicit `required` when any property is mandatory.
MP005	MCP spec	The root of `inputSchema` must be a JSON Schema 2020-12 object.
MP010	Wang section 4.3	"Low-information name" smell increases wrong-tool selection by 8.8 percentage points.
MP011	Operational	MCP hosts namespace tools by server name. Repeating the namespace inside the tool name wastes context tokens.
MP012	Wang section 4.3	Inconsistent verbs in the same server degrade in-server discrimination.
MP013	Wang section 3.1	73 percent of analysed servers shared at least one tool name with another server.
MP014	Operational	Mixing snake_case and camelCase within one server breaks predictability for the agent.
MP020	Wang section 4.2	Descriptions under 50 characters are 3.4 times more likely to cause wrong-tool selection.
MP021	Anthropic guidance	Beyond about 1500 characters, descriptions burn context tokens without measurable accuracy gain.
MP022	Wang section 4.4	Missing example is a top-3 description smell by effect size.
MP023	Wang section 4.5	"Passive" descriptions without trigger language cause 6.8 percentage point regressions.
MP024	Operational	Acronym-dense descriptions fail outside the team that wrote them.
MP025	Wang section 4.5	Marketing qualifiers ("simply", "powerful") add no signal. Cited as a description smell.
MP026	Wang aggregate	An LLM judge variant of the above; catches cases the static heuristics miss.
MP030	Li section 3	3,449 of 10,240 servers had a parameter type that disagreed with its description.
MP031	Li section 3	LLM-judged variant of MP030. Detects mismatches the keyword heuristic misses.
MP032	Li section 4	1,326 servers performed undocumented mutating operations.
MP033	Wang section 4.3	Duplicate descriptions force the agent to disambiguate on names alone.
MP040	Invariant Labs, May 2025	Tool-poisoning via zero-width and bidi control characters in descriptions.
MP041	MCP-Scan attack catalogue	Operator-style instructions ("ignore previous", chat-template tokens) embedded in tool descriptions.

Why this matters¶

Wang et al. ran a head-to-head selection experiment: agents picked between a clean tool and a smelly tool with the same name. Clean descriptions won 72 percent of the time. Smelly descriptions won 20 percent. The 52-percentage-point gap is the headline number that motivates the whole project.

Each rule in mcpolish targets at least one of the smells the paper isolated, or one of the attack patterns Invariant Labs catalogued.

Limits¶

What mcpolish does not check¶

It does not run your function body. The behaviour of the code is invisible.
It does not follow imports across packages. Tools registered by a third party are not visible.
It does not validate semantic correctness of the description against your actual implementation. An MP032-style LLM judge approximates this for one specific case (read-vs-write mismatch).

Heuristic vs. judge¶

Several rules are heuristic (regex or token-based). They have false positives. The three LLM-gated rules cost real money but catch cases the heuristics miss. mcpolish is honest about which is which on each rule's detail page.

Cross-server snapshot freshness¶

The bundled snapshot is refreshed quarterly in OSS, more frequently in SaaS. A tool name that has just become a collision today will not be flagged until the next refresh. This is a known limit.

Stable IDs¶

Once an MPxxx rule ID ships, it never changes. You can put ignore = ["MP025"] in your config and trust it to keep working. The default severity may change between versions, but the meaning of the ID does not.