Skillcheck — does your agent skill actually help?

How it works

Three steps to a verdict

Sign in

Continue with Google or GitHub. We create your account and issue a Skillcheck API key tied to it.

Get 10 free runs

Your key includes 10 free runs of the skillcheck check command. The model provider key stays on our server and never reaches your terminal.

Run skillcheck

Paste two commands, point the CLI at your key, and measure any SKILL.md, AGENTS.md, CLAUDE.md, or .cursorrules file.

What you get

Evidence, not anecdotes

Skillcheck produces numbers you can cite. Every run is reproducible, every score has a confidence interval, and every skill gets a verdict.

Forced-injection A/B

We inject your skill as a system prompt and run the same tasks twice: once with the skill and once without. The difference between the two scores is the skill's effect, in percentage points.
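Once both arms are graded, the effect is simple arithmetic. The sketch below assumes each task gets a score between 0 and 1 in each arm and that both lists use the same task order. `effect_pp` is a hypothetical helper for illustration and is not part of the Skillcheck CLI:

```python
from statistics import mean


def effect_pp(with_skill: list[float], without_skill: list[float]) -> float:
    """Effect of a skill in percentage points (hypothetical helper).

    Assumes both lists hold per-task scores in [0, 1] for the same tasks,
    in the same order: one arm with the skill injected, one without.
    """
    if len(with_skill) != len(without_skill) or not with_skill:
        raise ValueError("need one score per task in each arm")
    # Difference of mean scores, scaled from a 0-1 fraction to percentage points.
    return 100 * (mean(with_skill) - mean(without_skill))


# Example: 4 tasks graded pass/fail.
print(effect_pp([1, 1, 0, 1], [1, 0, 0, 1]))  # 25.0
```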

Blind grading

A separate grader model scores outputs without knowing which arm produced them. No self-evaluation bias, no cherry-picking.

Bootstrap CI

We resample the results 1,000 times to build a 95% confidence interval around the effect. If the whole interval sits above zero, the skill helps.
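A common way to build this kind of interval is the percentile bootstrap over per-task differences. The page does not say which bootstrap variant Skillcheck uses, so read this as an illustrative sketch. `bootstrap_ci` and its parameters are hypothetical:

```python
import random
from statistics import mean


def bootstrap_ci(diffs, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean per-task difference (hypothetical helper).

    diffs: per-task (with_skill - without_skill) scores in percentage points.
    A fixed seed makes the interval reproducible for the same inputs.
    """
    rng = random.Random(seed)
    n = len(diffs)
    # Resample tasks with replacement and keep the mean of each resample.
    means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    alpha = (1 - level) / 2
    # For 1,000 resamples at 95%, this picks indices 25 and 974, which leaves
    # 25 resampled means below the interval and 25 above it.
    lo = means[round(alpha * n_resamples)]
    hi = means[round((1 - alpha) * n_resamples) - 1]
    return lo, hi
```

Resampling whole tasks keeps each task's with/without pair together. Fixing the seed means the same inputs always produce the same interval.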

Rot detection

Re-run the same skill when new models are released. If the verdict flips from "helps" to "placebo", the skill has rotted.
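The verdict can be read directly from the confidence interval, and rot is a verdict change between model releases. This sketch uses hypothetical `verdict` and `has_rotted` helpers. The "hurts" label for an interval entirely below zero is an assumption, because the page only names "helps" and "placebo":

```python
def verdict(ci_low: float, ci_high: float) -> str:
    """Map a CI on the effect (percentage points) to a verdict (hypothetical rule).

    'helps' if the whole interval is above zero, 'hurts' (assumed label) if it is
    entirely below zero, otherwise 'placebo'.
    """
    if ci_low > 0:
        return "helps"
    if ci_high < 0:
        return "hurts"
    return "placebo"


def has_rotted(old_ci: tuple[float, float], new_ci: tuple[float, float]) -> bool:
    """True if a skill that helped on the old model no longer helps on the new one."""
    return verdict(*old_ci) == "helps" and verdict(*new_ci) != "helps"


# The interval now includes zero on the new model, so the skill has rotted.
print(has_rotted((2.1, 9.8), (-1.5, 4.0)))  # True
```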

Reproducible

Every result records the skill hash, task suite, model version, and config. Anyone can re-run and verify the number.
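The sketch below shows what such a record might contain. It assumes SHA-256 for the skill hash and a second hash over the canonical JSON of the whole record. The field names and `run_record` are hypothetical and do not reflect Skillcheck's actual schema:

```python
import hashlib
import json


def skill_hash(skill_text: str) -> str:
    """SHA-256 of the skill file contents (hypothetical hashing scheme)."""
    return hashlib.sha256(skill_text.encode("utf-8")).hexdigest()


def run_record(skill_text: str, task_suite: str, model: str, config: dict) -> dict:
    """Everything needed to re-run a check, plus a fingerprint over all of it.

    Hypothetical schema: field names are illustrative only.
    """
    record = {
        "skill_hash": skill_hash(skill_text),
        "task_suite": task_suite,
        "model": model,
        "config": config,
    }
    # Canonical JSON (sorted keys at every level, no extra whitespace) so that
    # identical inputs always produce the same fingerprint.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    record["fingerprint"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return record
```

Two runs with the same skill text, suite, model, and config get the same fingerprint. This holds even when the config keys appear in a different order, so a mismatch points to a real difference in inputs.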

Token-aware

We count the extra tokens the skill costs and compute value-per-1k-tokens. A big effect that burns context is not always a win.
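The page does not give the exact formula. This sketch assumes that value per 1k tokens equals the effect in percentage points divided by the extra tokens, measured in thousands. `value_per_1k_tokens` is a hypothetical helper:

```python
def value_per_1k_tokens(effect_pp: float, extra_tokens: float) -> float:
    """Effect in percentage points per 1,000 extra tokens (hypothetical formula).

    extra_tokens: average additional tokens per task when the skill is injected,
    for example the skill's own prompt tokens. Assumed positive here.
    """
    if extra_tokens <= 0:
        raise ValueError("extra_tokens must be positive")
    return effect_pp / (extra_tokens / 1000)


print(value_per_1k_tokens(12, 4000))  # 3.0  (big effect, heavy context cost)
print(value_per_1k_tokens(6, 500))    # 12.0 (smaller effect, much cheaper)
```

Under this assumption, the +6 point skill delivers four times the value per token of the +12 point skill.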

Supported formats

One command for every skill file

Drop any supported skill file and Skillcheck normalises it into a common shape for evaluation.

SKILL.md

The Anthropic skill-creator format. Extracts instructions, domain, and bundled assets into a normalised skill object.

AGENTS.md

The multi-agent format with role definitions and routing rules. Each role is evaluated independently.

CLAUDE.md

The Claude-specific project context format. Injected as system context during the runner phase.

.cursorrules

Cursor IDE rules files. Parsed as instruction sets and evaluated for code-generation and reasoning tasks.

URL

Remote URL

Point the CLI at a raw GitHub URL or gist. Skillcheck fetches, normalises, and evaluates without a local clone.

FOLDER

Directory scan

Pass a folder path and Skillcheck discovers all supported skill files inside, evaluating each one.
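Here is a minimal sketch of the discovery step. It assumes a recursive walk that matches only the exact file names of the four formats listed above. `discover_skill_files` is a hypothetical helper, and the real CLI may match files differently:

```python
import os
from pathlib import Path

# File names the page lists as supported skill formats.
SKILL_FILENAMES = {"SKILL.md", "AGENTS.md", "CLAUDE.md", ".cursorrules"}


def discover_skill_files(folder: str) -> list[Path]:
    """Recursively find supported skill files under `folder` (hypothetical helper)."""
    found = []
    # os.walk lists every file, including dotfiles such as .cursorrules.
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            if name in SKILL_FILENAMES:
                found.append(Path(dirpath) / name)
    return sorted(found)
```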

Pricing

Start free. Upgrade when it pays off.

Free

10 Skillcheck runs included
Full CLI: check, eval, verify
Blind grading + bootstrap CI
Community support

Pro

$19 / one-time

Everything in Free
Unlimited Skillcheck runs
Corpus & rot reporting
Priority model capacity

Upgrade from your dashboard after you sign in.

Does your skill actually help?