2026-05-19 · Tutorial

Driving a pentest from Claude Code in 5 minutes

The whole setup

Three commands. In order:

pip install ptai
claude mcp add pentest-ai ptai mcp
claude

The first installs the ptai Python package. The second registers ptai's MCP server with Claude Code under the name pentest-ai. The third opens Claude Code.

That's it. From here, the work happens in Claude Code's conversation, not the shell. You won't be typing ptai commands directly. The MCP server exposes a set of tools, Claude Code calls them on your behalf, and you read the results.

If Claude Code is not already on your laptop, install it from claude.ai/download first. You'll also need an Anthropic API key, either exported as ANTHROPIC_API_KEY or set up through claude login.

One thing worth knowing up front: ptai's MCP server hosts no LLM of its own. Claude Code is the brain. ptai is the toolbox. The model picks tools, fills arguments, reads results, decides what to do next. If you've used Claude Code for anything else, the loop is identical.
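Here is what "Claude Code calls them on your behalf" looks like on the wire. MCP is JSON-RPC 2.0, and a tool invocation is a tools/call request that carries the tool name and an arguments object. The sketch below builds one such message. It assumes the default stdio transport, which uses newline-delimited JSON. The argument names are borrowed from the transcript later in this post, and ptai's actual input schema may differ.

```python
import json

def tools_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build an MCP tools/call request as a single line of JSON-RPC 2.0.

    Over the stdio transport each message is one line of JSON, so the
    serialized request must not contain embedded newlines.
    """
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps without indent= emits everything on one line
    return json.dumps(message)

# Argument names mirror the transcript; ptai's real schema is an assumption here.
line = tools_call(1, "start_engagement", {
    "target": "https://juice-shop.herokuapp.com",
    "scope": "web",
    "intensity": "normal",
})
print(line)
```

The server replies with a JSON-RPC response carrying the same id, and that result is what the model reads before picking its next tool.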

Your first engagement

In Claude Code, type something like:

Start a pentest engagement against https://juice-shop.herokuapp.com, scope is web, intensity normal, safety mode on.

ptai's start_engagement tool fires. It creates a local SQLite engagement row under ~/.ptai/engagements/ and returns an engagement ID like eng_a7f3b1c2. Claude Code now has an engagement context. Every subsequent tool call attaches to it, so you don't have to repeat the target or scope.
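The following sketch illustrates the "engagement context" idea. It is not ptai's implementation. It uses a hypothetical engagements table, an ID with the same eng_ plus 8 hex digits shape, and a lookup that later tool calls could use so the target and scope never need repeating. The table name, columns, and function names are assumptions. It uses an in-memory database so it runs anywhere.

```python
import secrets
import sqlite3

# Hypothetical schema; ptai's real table layout is not documented here.
SCHEMA = """CREATE TABLE IF NOT EXISTS engagements (
    id TEXT PRIMARY KEY, target TEXT, scope TEXT, intensity TEXT)"""

def create_engagement_row(conn: sqlite3.Connection, target: str,
                          scope: str, intensity: str) -> str:
    # "eng_" + 4 random bytes as hex gives IDs shaped like eng_a7f3b1c2
    eng_id = "eng_" + secrets.token_hex(4)
    conn.execute("INSERT INTO engagements VALUES (?, ?, ?, ?)",
                 (eng_id, target, scope, intensity))
    conn.commit()
    return eng_id

def engagement_context(conn: sqlite3.Connection, eng_id: str) -> dict:
    # Later tool calls pass only the ID and recover everything else from the row
    row = conn.execute(
        "SELECT target, scope, intensity FROM engagements WHERE id = ?",
        (eng_id,),
    ).fetchone()
    if row is None:
        raise KeyError(eng_id)
    return {"target": row[0], "scope": row[1], "intensity": row[2]}

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
eng_id = create_engagement_row(conn, "https://juice-shop.herokuapp.com", "web", "normal")
print(eng_id, engagement_context(conn, eng_id))
```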

Next prompt:

Run the standard probe set and report what you find.

The model picks which probes to run and in what order. It can read each result and decide whether to dig deeper. If a SQL injection probe returns a positive at /api/Feedbacks, the model might immediately fire a follow-up time-based probe to confirm it before saving the finding. You see all of this happen in the transcript.
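The confirm-before-saving step deserves a closer look, because network jitter can easily fool a time-based check. One common approach works in three steps:

1. Time several baseline requests.
2. Time several requests whose payload asks the database to stall.
3. Call the finding confirmed only if most of the requested delay shows up on top of the baseline.

This sketch illustrates the technique and is not ptai's actual logic. The measure callable is a hypothetical hook. Building the actual payload depends on the database behind the endpoint, so the sketch leaves it out.

```python
import statistics
from typing import Callable

def confirm_time_based(measure: Callable[[float], float], delay: float = 5.0,
                       trials: int = 3, tolerance: float = 0.8) -> bool:
    """Decide whether an injected delay reliably shows up in response time.

    measure(d) is a hypothetical hook: it sends one request whose payload asks
    the database to stall for d seconds (d == 0 means a benign baseline) and
    returns the observed latency in seconds.
    """
    baseline = statistics.median([measure(0.0) for _ in range(trials)])
    delayed = statistics.median([measure(delay) for _ in range(trials)])
    # Medians resist a single slow response; the tolerance requires most of
    # the requested delay on top of the baseline before calling it confirmed
    return delayed - baseline >= tolerance * delay

# Fake endpoints for illustration: one honors the delay, one ignores it
print(confirm_time_based(lambda d: 0.2 + d))  # True
print(confirm_time_based(lambda d: 0.2))      # False
```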

Most engagements against a known-vulnerable target finish in two or three minutes of model time. Bigger scopes take longer because there's more surface to walk.

You can also drive the engagement step by step instead of letting the model loose. Tell it to run only XSS probes, or to focus on a single endpoint, or to skip the recon phase because you already know the app. The tool surface is granular enough that you stay in control if you want to.

What the conversation looks like

Abbreviated transcript from a real run against the Juice Shop demo:

You: Start a pentest engagement against juice-shop, scope=web, intensity=normal.

Claude: Calling start_engagement(target='juice-shop', scope='web', intensity='normal')
        Engagement: eng_a7f3b1c2

You: Probe for common vulns, then summarise.

Claude: Calling run_probe(class='xss', target='juice-shop')
        -> 3 reflections at /search, /track-order, /profile
        Calling run_probe(class='sqli', target='juice-shop')
        -> 1 stored injection at /api/Feedbacks
        Calling run_probe(class='jwt', target='juice-shop')
        -> alg=none accepted on protected routes
        ... [12 more tool calls]

        Found 17 issues. Top three by severity:
        1. Stored XSS in profile, exploitable to admin via session theft
        2. SQLi in feedback endpoint, time-based
        3. JWT alg=none accepted on protected routes

        Saving findings to engagement eng_a7f3b1c2.

Real conversations are messier than this. The model second-guesses itself, runs a probe, gets an ambiguous response, backs up, tries a different angle. Sometimes it spends a few turns chasing a false positive before deciding the signal isn't there. That's fine. It's what you'd do at the keyboard.

The thing that took us the longest to get used to was watching the model think. With a CLI you fire a command and wait. Here you can read the model's reasoning as it picks the next tool, see the response come back, watch it revise. The first few engagements feel slow because you're reading every step. After that you skim.
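The third finding in that transcript is worth unpacking. A JWT whose header says alg=none carries no signature at all, so a server that accepts one will trust whatever claims a client writes. The sketch below shows how such an unsigned token is assembled: an unpadded base64url header and payload, followed by an empty signature segment. The claims are hypothetical placeholders, and ptai's jwt probe may build its tokens differently.

```python
import base64
import json

def b64url(data: bytes) -> str:
    # JWT segments use base64url with the "=" padding stripped
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def unsigned_jwt(claims: dict) -> str:
    header = {"alg": "none", "typ": "JWT"}
    segments = [
        b64url(json.dumps(header, separators=(",", ":")).encode()),
        b64url(json.dumps(claims, separators=(",", ":")).encode()),
        "",  # empty signature, so the token ends with a trailing dot
    ]
    return ".".join(segments)

# Hypothetical claims; a vulnerable server would accept these unverified
token = unsigned_jwt({"sub": "test-user", "role": "admin"})
print(token)
```

A correctly configured verifier rejects this token outright, because it pins the expected algorithm instead of reading it from the token header.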

What gets generated

Three artifacts land in ~/.ptai/engagements/<id>/ by the time the run ends.

findings.sarif is a SARIF 2.1.0 file. Upload it to GitHub code scanning and the findings show up in the Security tab with line-level annotations where applicable.
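You may want a quick look at findings.sarif before uploading it, and the relevant structure is small. A SARIF 2.1.0 log has a runs array. Each run has a results array whose entries carry ruleId, level, and message.text. The sketch below tallies results by level. The sample log, including the tool name and rule IDs, is made up for illustration, and ptai's real rule IDs will differ.

```python
import json
from collections import Counter

def count_by_level(log: dict) -> Counter:
    """Tally results per SARIF level across every run in the log."""
    counts = Counter()
    for run in log.get("runs", []):
        for result in run.get("results", []):
            # With no level and no rule-level default, SARIF treats a result as "warning"
            counts[result.get("level", "warning")] += 1
    return counts

# Made-up sample; for a real run, load the file instead, e.g.
#   with open(os.path.expanduser("~/.ptai/engagements/<id>/findings.sarif")) as f:
#       log = json.load(f)
log = json.loads("""{
  "version": "2.1.0",
  "runs": [{
    "tool": {"driver": {"name": "example-scanner"}},
    "results": [
      {"ruleId": "xss-stored", "level": "error", "message": {"text": "Stored XSS in profile"}},
      {"ruleId": "sqli-time", "level": "error", "message": {"text": "Time-based SQLi"}},
      {"ruleId": "info-header", "message": {"text": "Missing security header"}}
    ]
  }]
}""")
print(count_by_level(log))
```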

report.md is a human-readable Markdown writeup with one section per finding, each covering severity, evidence, and reproduction steps. Drop it into a ticket or paste it into a PR.

evidence/ is a directory of raw request/response pairs, one file per finding. So if someone asks "are you sure about that SQLi?", you have the actual HTTP traffic to show them.

The Claude Code conversation transcript itself is the audit trail. Every tool call, every model decision, every probe response is logged. You can replay it, share it, or paste it into a postmortem.

Common gotchas

Scope rules. If your target redirects to a different host (say, juice-shop.herokuapp.com bounces to an internal CDN), set scope.allowed_hosts explicitly when you start the engagement. The default scope is only the host you started with, so requests redirected anywhere else get dropped at the probe layer.
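The following is a minimal sketch of that host check. It assumes scope is a plain set of hostnames, and the real scope.allowed_hosts semantics may differ, for example in wildcard support. The default scope is built from the starting URL alone, which is why a hop to a CDN host fails the check until you add that host. The name cdn.example.net is a hypothetical stand-in for whatever host your target actually redirects to.

```python
from urllib.parse import urlsplit

def default_scope(start_url: str) -> set:
    # Only the host the engagement started with
    return {urlsplit(start_url).hostname}

def in_scope(url: str, allowed_hosts: set) -> bool:
    # .hostname is lowercased and excludes the port and any userinfo
    host = urlsplit(url).hostname
    return host is not None and host in allowed_hosts

scope = default_scope("https://juice-shop.herokuapp.com")
print(in_scope("https://juice-shop.herokuapp.com/api/Feedbacks", scope))  # True
print(in_scope("https://cdn.example.net/main.js", scope))  # False: dropped

scope |= {"cdn.example.net"}  # roughly what widening allowed_hosts amounts to
print(in_scope("https://cdn.example.net/main.js", scope))  # True
```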

Bearer auth. If you need to test auth-protected endpoints, set up the auth profile before the engagement starts. Tell Claude Code: set_auth_profile type=bearer, token=<jwt>, scheme=Bearer. The token is cached for the engagement's lifetime and attached to every tool call automatically.
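At the HTTP level, "attached to every tool call" comes down to one header, Authorization: <scheme> <token>, added to each outgoing request. The sketch below shows this with the standard library. AuthProfile is a hypothetical stand-in for ptai's auth profile, the token is a placeholder, and no request is actually sent.

```python
import urllib.request
from dataclasses import dataclass

@dataclass
class AuthProfile:
    """Hypothetical engagement-scoped bearer profile (not ptai's class)."""
    token: str
    scheme: str = "Bearer"

    def apply(self, req: urllib.request.Request) -> urllib.request.Request:
        # Same header on every request for as long as the profile lives
        req.add_header("Authorization", f"{self.scheme} {self.token}")
        return req

profile = AuthProfile(token="example-token")  # placeholder, not a real JWT
req = profile.apply(urllib.request.Request("https://juice-shop.herokuapp.com/api/Feedbacks"))
print(req.get_header("Authorization"))  # Bearer example-token
```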

Safety mode. intensity='safe' plus respect_rate_limits=true keeps ptai from hammering production targets. The default intensity is normal, which is fine for staging or for environments you own. Use aggressive only against your own honeypots.
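The client-side throttling that respect_rate_limits implies can be simple. Space requests a fixed interval apart, and push the next slot back when the server answers 429 with a Retry-After. The sketch below shows that general technique and is not ptai's limiter. The rate value and the on_429 hook are assumptions. The clock and sleep functions are injectable, so you can test the timing without actually waiting.

```python
import time
from typing import Callable

class MinIntervalLimiter:
    """Space requests at least 1/rate_per_sec seconds apart."""

    def __init__(self, rate_per_sec: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.interval = 1.0 / rate_per_sec
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = clock()

    def wait(self) -> None:
        # Block until the next slot, then reserve the one after it
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval

    def on_429(self, retry_after: float) -> None:
        # Honor a server's Retry-After (in seconds) by pushing the next slot out
        self.next_allowed = max(self.next_allowed, self.clock() + retry_after)

limiter = MinIntervalLimiter(rate_per_sec=2.0)  # hypothetical rate
for _ in range(3):
    limiter.wait()  # each probe request would go out here
```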

What to read next

Read Getting started for the longer setup doc, which covers authentication profiles and scope rules in detail. Issues, feature requests, and the source are at github.com/0xSteph/pentest-ai.