Skillcheck runs a forced-injection A/B test: your runner model solves generated tasks with and without the skill, grades blind, and returns an effect in percentage points with a confidence interval. Sign in, get an API key, and check any skill from the CLI.
Continue with Google or GitHub. We create your account and issue a Skillcheck API key tied to it.
Your key includes 10 free runs of skillcheck check. The model provider key stays on our server and never reaches your terminal.
Paste two commands, point the CLI at your key, and measure any SKILL.md, AGENTS.md, CLAUDE.md, or .cursorrules file.
Skillcheck produces numbers you can cite. Every run is reproducible, every score has a confidence interval, and every skill gets a verdict.
We inject your skill as a system prompt and run the same tasks twice, once with the skill and once without. The difference between the two arms' scores is your effect, in percentage points.
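To make the arithmetic concrete, here is a minimal Python sketch. It assumes each task is graded simply pass or fail; the function names and sample grades are hypothetical and do not reflect Skillcheck's actual output format. The effect is the with-skill pass rate minus the baseline pass rate, scaled to percentage points.

```python
from typing import Sequence


def pass_rate(grades: Sequence[bool]) -> float:
    """Fraction of tasks the grader marked as passing, between 0 and 1."""
    if not grades:
        raise ValueError("need at least one graded task")
    return sum(grades) / len(grades)


def effect_pp(with_skill: Sequence[bool], without_skill: Sequence[bool]) -> float:
    """Effect in percentage points: skill-arm pass rate minus baseline pass rate."""
    return 100.0 * (pass_rate(with_skill) - pass_rate(without_skill))


# Hypothetical pass/fail grades for the same 10 tasks in each arm.
with_skill = [True, True, True, False, True, True, True, False, True, True]       # 8 of 10 pass
without_skill = [True, False, True, False, True, False, True, False, True, True]  # 6 of 10 pass
print(f"effect: {effect_pp(with_skill, without_skill):+.1f} pp")  # effect: +20.0 pp
```

An 80% pass rate against a 60% baseline is a +20 percentage point effect, not a 33% relative improvement; percentage points keep skills with different baselines comparable on one scale.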
A separate grader model scores outputs without knowing which arm produced them. No self-evaluation bias, no cherry-picking.
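One way to keep the grader blind is sketched below, under stated assumptions. Outputs from both arms are shuffled together, their arm labels are replaced with opaque IDs, and the ID-to-arm key stays with the harness until grading is finished. The Output class and the blind and unblind helpers are hypothetical names for illustration, not part of the Skillcheck CLI.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Output:
    task_id: str
    arm: str  # "with_skill" or "without_skill"; never shown to the grader
    text: str


def blind(outputs, seed=0):
    """Shuffle outputs and replace arm labels with opaque IDs.

    Returns (grader_view, key): the grader only ever sees grader_view;
    key maps each opaque ID back to its arm and stays with the harness.
    """
    shuffled = list(outputs)
    random.Random(seed).shuffle(shuffled)  # position no longer hints at the arm
    grader_view, key = [], {}
    for i, out in enumerate(shuffled):
        blind_id = f"item-{i:04d}"
        grader_view.append({"id": blind_id, "task_id": out.task_id, "text": out.text})
        key[blind_id] = out.arm
    return grader_view, key


def unblind(grades, key):
    """After grading, regroup pass/fail grades (blind ID -> bool) by arm."""
    by_arm = {}
    for blind_id, passed in grades.items():
        by_arm.setdefault(key[blind_id], []).append(passed)
    return by_arm
```

Because the arm is only re-attached in unblind, the grader cannot favour one arm even unintentionally.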
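We bootstrap-resample the results 1,000 times to build a 95% confidence interval for the effect. If the whole interval lies above zero, the skill helps; if it crosses zero, the measured effect is indistinguishable from none.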
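The page does not say exactly how the resampling is done. A common choice for paired A/B results is a percentile bootstrap over tasks, and this sketch assumes that. Each resample draws n tasks with replacement and recomputes the effect; the 2.5th and 97.5th percentiles of 1,000 such effects bound the 95% interval. The "hurts" label for an interval entirely below zero is an assumption of this sketch, as are all function names.

```python
import random
from typing import Sequence


def bootstrap_ci(with_skill: Sequence[bool], without_skill: Sequence[bool],
                 n_resamples: int = 1000, alpha: float = 0.05, seed: int = 0):
    """Paired percentile bootstrap CI for the effect, in percentage points."""
    if len(with_skill) != len(without_skill) or not with_skill:
        raise ValueError("both arms must grade the same non-empty task list")
    n = len(with_skill)
    # Per-task difference: +1 only the skill arm passed, -1 only the baseline passed, 0 tie.
    diffs = [int(a) - int(b) for a, b in zip(with_skill, without_skill)]
    rng = random.Random(seed)
    effects = sorted(
        100.0 * sum(diffs[rng.randrange(n)] for _ in range(n)) / n  # one resample of n tasks
        for _ in range(n_resamples)
    )
    lo_idx = int(n_resamples * alpha / 2)  # 25 for 1,000 resamples at 95%
    hi_idx = n_resamples - 1 - lo_idx      # 974
    return effects[lo_idx], effects[hi_idx]


def verdict(lo: float, hi: float) -> str:
    """Read a verdict off the interval; "hurts" is an assumed label for this sketch."""
    if lo > 0:
        return "helps"
    if hi < 0:
        return "hurts"
    return "placebo"


# Hypothetical grades for 20 tasks: the skill arm passes 15, the baseline 11.
lo, hi = bootstrap_ci([True] * 15 + [False] * 5, [True] * 11 + [False] * 9)
print(f"95% CI: [{lo:+.1f}, {hi:+.1f}] pp -> {verdict(lo, hi)}")
```

Resampling whole tasks, rather than each arm on its own, keeps the pairing: both arms' results for a given task always travel together, which usually gives a tighter interval than treating the arms as independent samples.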
Re-run the same skill on new model releases. If the verdict flips from helps to placebo, you know the skill rotted.
Every result records the skill hash, task suite, model version, and config. Anyone can re-run the check and verify the number.
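A minimal sketch of what such a record could look like, assuming SHA-256 for hashing and canonical JSON for the bundle; the field names, suite name, and model version string are hypothetical. Hashing the canonical bundle means anyone holding the same skill file and settings can recompute the record hash and compare it with the published one.

```python
import hashlib
import json


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def run_record(skill_text: str, task_suite: str, model_version: str, config: dict) -> dict:
    """Bundle everything needed to re-run a check, plus a hash of the bundle itself."""
    record = {
        "skill_sha256": sha256_hex(skill_text),  # changes if a single byte of the skill changes
        "task_suite": task_suite,
        "model_version": model_version,
        "config": config,
    }
    # Canonical JSON: keys sorted at every level and fixed separators, so equal inputs give equal bytes.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return {**record, "record_sha256": sha256_hex(canonical)}


def verify(record: dict, skill_text: str) -> bool:
    """Check a published record against a local copy of the skill."""
    expected = run_record(skill_text, record["task_suite"], record["model_version"], record["config"])
    return expected["record_sha256"] == record["record_sha256"]
```

Sorting keys matters: two configs with the same settings written in a different order still produce the same record hash.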
We count the extra tokens the skill costs and compute value-per-1k-tokens. A big effect that burns context is not always a win.
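The exact formula behind value per 1k tokens is not given here. This sketch assumes it is the effect in percentage points divided by the skill's extra tokens in thousands; the two skills and their numbers are made up to illustrate the trade-off.

```python
def value_per_1k_tokens(effect_pp: float, extra_tokens: float) -> float:
    """Percentage points of effect per 1,000 extra tokens (an assumed definition)."""
    if extra_tokens <= 0:
        raise ValueError("extra_tokens must be positive")
    return effect_pp * 1000.0 / extra_tokens


# Two hypothetical skills: a short one with a modest effect, a long one with a bigger effect.
compact = value_per_1k_tokens(effect_pp=6.0, extra_tokens=800)   # 7.5 pp per 1k tokens
bulky = value_per_1k_tokens(effect_pp=12.0, extra_tokens=9000)   # about 1.33 pp per 1k tokens
print(f"compact: {compact:.2f}, bulky: {bulky:.2f}")  # compact: 7.50, bulky: 1.33
```

Under this definition the bulkier skill has twice the raw effect but less than a fifth of the value per token.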
Drop any supported skill file and Skillcheck normalises it into a common shape for evaluation.
SKILL.md: the Anthropic skill-creator format. Skillcheck extracts the instructions, domain, and bundled assets into a normalised skill object.
AGENTS.md: the multi-agent format with role definitions and routing rules. Each role is evaluated independently.
CLAUDE.md: the Claude-specific project context format. It is injected as system context during the runner phase.
.cursorrules: Cursor IDE rules files. They are parsed as instruction sets and evaluated on code-generation and reasoning tasks.
Point the CLI at a raw GitHub URL or gist. Skillcheck fetches, normalises, and evaluates the file without a local clone.
Pass a folder path and Skillcheck discovers all supported skill files inside, evaluating each one.
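Directory discovery can be approximated with a recursive walk that matches file names. This Python sketch assumes exact, case-sensitive matches on SKILL.md, AGENTS.md, CLAUDE.md, and .cursorrules, and tags each hit with a hypothetical format label; Skillcheck's own discovery rules may differ.

```python
import os
from pathlib import Path

# Assumed mapping from file name to a hypothetical format tag.
SUPPORTED = {
    "SKILL.md": "skill-creator",
    "AGENTS.md": "agents",
    "CLAUDE.md": "claude-project",
    ".cursorrules": "cursor-rules",
}


def discover_skill_files(root) -> list:
    """Walk root recursively and return (path, format) for every supported file, sorted by path."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name in SUPPORTED:
                found.append((Path(dirpath) / name, SUPPORTED[name]))
    return sorted(found)
```

os.walk lists hidden files such as .cursorrules alongside ordinary ones, so no special glob pattern is needed for dotfiles.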
Upgrades are available from your dashboard after you sign in.
Sign in with Google or GitHub and get your API key in seconds.