What we are building
A command line tool that takes a GitHub PR number, fetches the diff, splits it by file, and dispatches one Claude sub-agent per file. Each sub-agent gets the team style guide preloaded as its system prompt and a single instruction: review this file's changes, return structured findings. The orchestrator collects the per-file reviews and prints a single consolidated comment ready to paste into the PR.
The interesting choice here is fan-out. The naive approach is one big prompt with the whole diff and the whole style guide stuffed in. That works for tiny PRs and fails on anything real. The Agent SDK gives you AgentDefinition and the built-in Agent tool, which lets the orchestrator spawn isolated sub-agents that each get their own context window. A 30-file PR becomes 30 small reviews running in parallel, each one focused, none of them stepping on each other's context.
At the end of this tutorial you will have a pr-reviewer CLI: run npx pr-reviewer 1234 and it prints a Markdown review you can paste into the PR.
Prerequisites
You will need: Node 20 or higher, an Anthropic API key (export ANTHROPIC_API_KEY=...), the GitHub CLI (gh) authenticated against your repos, and a GitHub repo where you can read a PR. You should be comfortable reading TypeScript and have used npm before. You do not need prior Agent SDK experience, but you should know what a system prompt is and roughly how Claude tool use works.
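The tutorial uses the model alias sonnet. Each per-file sub-agent costs roughly 2k-8k input tokens and 1k output tokens. Assuming Sonnet list pricing of $3 per million input tokens and $15 per million output tokens, that is about $0.02-0.04 per file, so a 30-file PR review runs around $0.60-1.20 before orchestrator overhead; check the current pricing page before budgeting. If you want to dry run cheaply, swap the model field in the sub-agent definition (Step 2) to haiku. Cost drops severalfold, with a quality hit you can live with for first passes.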
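Per-token prices change, so it helps to be able to recompute the estimate from the token counts. The following is a minimal sketch of that arithmetic. The helper name estimateCostUsd is hypothetical, the prices are assumed values you should replace with current ones, and the figure ignores the orchestrator's own tokens.

```typescript
type Price = { inputPerMTok: number; outputPerMTok: number };

// ASSUMPTION: USD per million tokens. Replace with current list prices.
const ASSUMED_SONNET_PRICE: Price = { inputPerMTok: 3, outputPerMTok: 15 };

/** Estimate the sub-agent cost of reviewing `files` files at the given per-file token counts. */
function estimateCostUsd(
  files: number,
  inputTokensPerFile: number,
  outputTokensPerFile: number,
  price: Price,
): number {
  const inputCost = (files * inputTokensPerFile * price.inputPerMTok) / 1_000_000;
  const outputCost = (files * outputTokensPerFile * price.outputPerMTok) / 1_000_000;
  return inputCost + outputCost;
}

// Low and high ends of the 2k-8k input range for a 30-file PR.
const low = estimateCostUsd(30, 2_000, 1_000, ASSUMED_SONNET_PRICE);
const high = estimateCostUsd(30, 8_000, 1_000, ASSUMED_SONNET_PRICE);
console.log(`30-file PR: $${low.toFixed(2)} to $${high.toFixed(2)}`);
```

Output tokens dominate at the low end of the range, so tightening the findings format tends to save more than trimming the diff.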
Setup
Create a fresh project and pin the SDK version so this tutorial keeps working in three months when the SDK has moved on.
mkdir pr-reviewer && cd pr-reviewer
npm init -y
npm pkg set type=module
npm install --save-exact @anthropic-ai/claude-agent-sdk typescript @types/node tsx
npx tsc --init --target es2022 --module nodenext --moduleResolution nodenext --strict
mkdir src agents
Add a smoke test before writing any real code. This confirms the SDK is wired up and your API key is valid.
cat > src/smoke.ts <<'EOF'
import { query } from "@anthropic-ai/claude-agent-sdk";

// query() returns an async iterable of messages; iterate it directly.
const result = query({
  prompt: "Reply with exactly the word: ok",
  options: { model: "sonnet", maxTurns: 1 },
});

for await (const msg of result) {
  if (msg.type === "assistant") console.log(msg.message.content);
}
EOF
npx tsx src/smoke.ts
You should see [{ type: 'text', text: 'ok' }] (or similar) in the output. If you do not, fix that before moving on. A broken SDK setup hidden behind a four-step build is a debugging nightmare.
Step 1: Write the style guide the sub-agent will enforce
The sub-agent is only as useful as the style guide you give it. Write the real one your team actually wants enforced. Vague guides ("write clean code") produce vague reviews. Specific guides ("no any types in new TypeScript code, no console.log in committed code, every public function needs a JSDoc comment") produce reviews you can act on.
cat > style-guide.md <<'EOF'
# Team Style Guide
## TypeScript
- No `any` types in new code. Use `unknown` plus a type guard, or define the actual type.
- No `console.log` in committed code. Use the project logger (`import { log } from "./logger"`).
- Every exported function needs a JSDoc comment with at least one example.
## Errors
- Throw subclasses of `AppError`, never bare `Error`.
- Wrap async boundaries in try/catch that logs context (request ID, user ID if available).
## Tests
- Every new exported function needs at least one unit test in the matching `*.test.ts` file.
- Tests must use `describe`/`it`, not `test()`.
## What is NOT a violation
- Style nits that the formatter handles (Prettier runs in CI).
- TODO comments referencing a ticket.
- Code copied unchanged from a third-party library inside `vendor/`.
EOF
The "what is NOT a violation" section matters a lot. Without it, the sub-agent flags Prettier-handled whitespace and you train your team to ignore the bot.
Step 2: Define the file-reviewer sub-agent
Create agents/file-reviewer.ts. This is the AgentDefinition that gets passed into the orchestrator. The Agent SDK reference lists every field; the ones that matter for us are description, prompt, tools, maxTurns, and model.
import { readFileSync } from "node:fs";
import type { AgentDefinition } from "@anthropic-ai/claude-agent-sdk";

// Read once at startup. The path is relative to the directory you run the CLI from.
const STYLE_GUIDE = readFileSync("./style-guide.md", "utf8");

export const fileReviewer: AgentDefinition = {
  description:
    "Review a single file's diff against the team style guide. Returns structured findings.",
  prompt: `You are a senior engineer reviewing one file in a pull request against the team style guide below. You will receive the file path and the unified diff for that file as your only input.
Style guide:
${STYLE_GUIDE}
Output rules:
- Return ONLY a JSON array of findings. No prose, no markdown fences, no preamble.
- Each finding is { "line": number, "severity": "blocker"|"warning"|"nit", "rule": "<short rule name from the guide>", "message": "<one sentence, actionable>" }.
- If the diff has no findings, return [].
- Do NOT flag anything the "What is NOT a violation" section excludes.
- Line numbers refer to the new file (lines starting with + in the diff), not the diff itself.`,
  tools: [],
  maxTurns: 1,
  model: "sonnet",
};
Three things to notice. First, tools: [] is empty on purpose. This sub-agent only reasons over text; giving it Bash or Read would let it wander off into the codebase, which we do not want. Second, maxTurns: 1 means one round trip, with no agentic loop. The sub-agent reads the prompt, returns findings, done. Third, the prompt forces JSON output. The orchestrator parses this and discards anything malformed.
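A bare JSON.parse with a type cast accepts any valid JSON, including an array whose items are missing fields. If you want the shape checked at runtime, a small type guard is enough. This is a minimal sketch under the finding format given in the prompt above; isFinding and parseFindings are hypothetical helper names, not part of the SDK.

```typescript
type Severity = "blocker" | "warning" | "nit";
type Finding = { line: number; severity: Severity; rule: string; message: string };

const SEVERITIES: readonly string[] = ["blocker", "warning", "nit"];

/** Type guard: true only if `value` carries the four fields the prompt asks for. */
function isFinding(value: unknown): value is Finding {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const line = v["line"];
  const severity = v["severity"];
  return (
    typeof line === "number" &&
    Number.isInteger(line) &&
    typeof severity === "string" &&
    SEVERITIES.includes(severity) &&
    typeof v["rule"] === "string" &&
    typeof v["message"] === "string"
  );
}

/** Parse raw model output; throw if it is not a JSON array of findings. */
function parseFindings(raw: string): Finding[] {
  const parsed: unknown = JSON.parse(raw.trim());
  if (!Array.isArray(parsed) || !parsed.every(isFinding)) {
    throw new Error("sub-agent output is not a JSON array of findings");
  }
  return parsed;
}
```

Throwing here, instead of returning an empty list, lets the caller decide whether a malformed reply should count as "no findings" or as a failed review.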
Step 3: Write the orchestrator
The orchestrator does three things: fetch the PR diff, split it by file, spawn one sub-agent per file via query(). Create src/orchestrator.ts.
import { execSync } from "node:child_process";
import { query } from "@anthropic-ai/claude-agent-sdk";
import { fileReviewer } from "../agents/file-reviewer.js";

// Exported so format.ts can import them.
export type Finding = { line: number; severity: "blocker" | "warning" | "nit"; rule: string; message: string };
export type FileReview = { path: string; findings: Finding[] };

// Fetch the unified diff via the GitHub CLI. maxBuffer is raised because
// large PRs can exceed execSync's default output limit.
function fetchDiff(prNumber: number): string {
  return execSync(`gh pr diff ${prNumber}`, { encoding: "utf8", maxBuffer: 64 * 1024 * 1024 });
}

// Split a multi-file diff into one chunk per "diff --git" header.
function splitDiffByFile(diff: string): Array<{ path: string; hunk: string }> {
  const files: Array<{ path: string; hunk: string }> = [];
  const chunks = diff.split(/^diff --git /m).slice(1);
  for (const chunk of chunks) {
    // Header looks like "a/path b/path"; paths containing spaces are skipped.
    const path = chunk.match(/^a\/(\S+) b\/\S+/)?.[1];
    if (!path) continue;
    files.push({ path, hunk: "diff --git " + chunk });
  }
  return files;
}

// Run one orchestrated review for a single file and parse the findings.
async function reviewOneFile(path: string, hunk: string): Promise<FileReview> {
  const result = query({
    prompt: `File: ${path}\n\nDiff:\n${hunk}`,
    options: {
      agents: { "file-reviewer": fileReviewer },
      systemPrompt: "Invoke the file-reviewer agent with the diff. Return its JSON output verbatim.",
      maxTurns: 3,
    },
  });
  // Keep only the final result text. Concatenating every assistant turn
  // would mix the orchestrator's own commentary into the JSON.
  let raw = "";
  for await (const msg of result) {
    if (msg.type === "result" && msg.subtype === "success") raw = msg.result;
  }
  try {
    const parsed: unknown = JSON.parse(raw.trim());
    const findings = Array.isArray(parsed) ? (parsed as Finding[]) : [];
    return { path, findings };
  } catch {
    // Malformed output is treated as "no findings" for this file.
    return { path, findings: [] };
  }
}

// Fan out: start every per-file review before awaiting any of them.
export async function reviewPr(prNumber: number): Promise<FileReview[]> {
  const diff = fetchDiff(prNumber);
  const files = splitDiffByFile(diff);
  return Promise.all(files.map(f => reviewOneFile(f.path, f.hunk)));
}
The Promise.all is what makes this fast. Each call to reviewOneFile starts its own query() with its own context window and its own API requests, so the reviews run concurrently and a 30-file PR takes roughly as long as its slowest file, rate limits permitting. The parallelism comes from starting all the promises before awaiting any of them; an awaited loop would run the reviews one at a time.
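One caveat: Promise.all rejects as soon as any single promise rejects, so one failed review (a network error, say) would discard every other result. A minimal sketch of an alternative built on the standard Promise.allSettled follows. The helper name settleAll and its fallback callback are hypothetical, introduced here only for illustration.

```typescript
/**
 * Run every task in parallel and keep the successful results.
 * A rejected task is replaced by `fallback(input, reason)` instead of
 * rejecting the whole batch, which is what Promise.all would do.
 */
async function settleAll<In, Out>(
  inputs: In[],
  task: (input: In) => Promise<Out>,
  fallback: (input: In, reason: unknown) => Out,
): Promise<Out[]> {
  const settled = await Promise.allSettled(inputs.map((input) => task(input)));
  return settled.map((outcome, i) =>
    outcome.status === "fulfilled" ? outcome.value : fallback(inputs[i] as In, outcome.reason),
  );
}
```

In reviewPr this could be called with the file list, a task that calls reviewOneFile, and a fallback that returns an empty findings list for the failed path, ideally after logging the reason.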
Step 4: Format the consolidated review
A wall of JSON is not a PR comment. Render it as Markdown that a human can read at a glance, grouped by severity. Create src/format.ts.
import type { FileReview } from "./orchestrator.js";

type Severity = FileReview["findings"][number]["severity"];

// Text labels for each severity, plus the display order (most severe first).
const LABELS: Record<Severity, string> = { blocker: "[BLOCKER]", warning: "[WARN]", nit: "[NIT]" };
const ORDER: Severity[] = ["blocker", "warning", "nit"];

export function formatReview(reviews: FileReview[]): string {
  // Flatten to one list, tagging each finding with its file path.
  const all = reviews.flatMap(r => r.findings.map(f => ({ ...f, path: r.path })));
  if (all.length === 0) return "Reviewer agent: no findings against the style guide. Looks good.";

  const byLevel: Record<Severity, typeof all> = { blocker: [], warning: [], nit: [] };
  for (const f of all) byLevel[f.severity].push(f);

  const sections: string[] = ["## Style guide review\n"];
  for (const level of ORDER) {
    const group = byLevel[level];
    if (group.length === 0) continue;
    sections.push(`### ${LABELS[level]} ${group.length} ${level}${group.length === 1 ? "" : "s"}\n`);
    for (const f of group) {
      sections.push(`- \`${f.path}:${f.line}\` (${f.rule}) - ${f.message}`);
    }
    sections.push("");
  }
  return sections.join("\n");
}
Severity ordering matters. Engineers scan top to bottom; blockers must be first. Putting nits at the top trains the reader to skim past everything.
Step 5: Wire up the CLI
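Create src/cli.ts as the entry point and add a bin field to package.json. The bin path shown assumes compiled output at dist/cli.js, so set outDir to dist in tsconfig.json. Because agents/ sits beside src/, tsc then emits the entry point at dist/src/cli.js unless you restructure, so point bin at whichever path your build actually produces.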
#!/usr/bin/env node
import { reviewPr } from "./orchestrator.js";
import { formatReview } from "./format.js";

// The PR number is the first CLI argument; reject anything that is not a positive integer.
const prNumber = Number(process.argv[2]);
if (!Number.isInteger(prNumber) || prNumber <= 0) {
  console.error("usage: pr-reviewer <pr-number>");
  process.exit(1);
}

const reviews = await reviewPr(prNumber);
console.log(formatReview(reviews));
In package.json:
{
  "type": "module",
  "bin": { "pr-reviewer": "./dist/cli.js" },
  "scripts": {
    "build": "tsc",
    "review": "tsx src/cli.ts"
  }
}
Run it locally with npm run review -- 1234 against any PR in a repo you have gh access to.
Verify it works
Pick a real PR in one of your repos. Ideally one with 3-5 changed files so you see fan-out in action.
export ANTHROPIC_API_KEY=sk-ant-...
gh auth status # confirm gh is authenticated
npm run review -- 1234
Expected output shape:
## Style guide review
### [BLOCKER] 1 blocker
- `src/api/handler.ts:42` (no-any) - The new `payload: any` parameter should be typed. Define a `Payload` interface or use `unknown` with a runtime guard.
### [WARN] 2 warnings
- `src/api/handler.ts:67` (no-console-log) - Replace `console.log("payload received", payload)` with `log.info(...)` from the project logger.
- `src/api/handler.test.ts:12` (describe-it) - Test uses `test(...)`. Convert to `describe`/`it`.
### [NIT] 1 nit
- `src/api/handler.ts:8` (jsdoc-required) - Exported `handle()` is missing a JSDoc comment with an example.
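If you see findings that match the actual diff content, the build worked. If you see Reviewer agent: no findings on a PR that obviously violates the style guide, either the sub-agent's output failed to parse and was silently dropped (see the next section) or your style guide is too vague; for the latter, go re-read Step 1.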
When it breaks
The five failure modes you will hit:
Error: ANTHROPIC_API_KEY is not set - Export the env var in the same shell where you run the CLI. If you put it in ~/.zshrc, open a new terminal. The Agent SDK reads it via process.env, so it must be set when the Node process starts.
The sub-agent returns prose instead of JSON. The try/catch in reviewOneFile swallows this and returns no findings. Add console.error(raw) inside the catch to see what came back. The fix is almost always to add stricter "JSON only, no markdown, no preamble" wording to the sub-agent prompt. Models drift toward chatty over time; the prompt has to push back.
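Prompt wording is the primary fix for chatty output, but the parser can also tolerate the most common deviation: a markdown fence or a sentence wrapped around the array. The following is a minimal sketch. extractJsonArray is a hypothetical helper name, and the heuristic (take everything from the first "[" to the last "]") is an assumption that can misfire when the surrounding prose itself contains brackets.

```typescript
/**
 * Pull the outermost JSON array out of a model reply that may include
 * markdown fences or surrounding prose. Returns null if nothing parses.
 */
function extractJsonArray(raw: string): unknown[] | null {
  const start = raw.indexOf("[");
  const end = raw.lastIndexOf("]");
  if (start === -1 || end <= start) return null;
  try {
    const parsed: unknown = JSON.parse(raw.slice(start, end + 1));
    return Array.isArray(parsed) ? parsed : null;
  } catch {
    return null;
  }
}
```

A null return is a useful signal in its own right: log the raw reply at that point so a prompt regression does not show up as a clean review.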
The orchestrator hangs on huge PRs. Promise.all on 200 sub-agents will hit your account's rate limit and stall. Replace Promise.all with a small concurrency-limited helper (3-5 in flight at once) for any PR over 20 files. The p-limit package on npm is the obvious one.
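If you would rather not add a dependency, the limiter is small enough to write by hand. This is a minimal sketch of a concurrency-limited map, assuming results should come back in input order as they do with Promise.all. The name mapWithLimit is hypothetical, and the sketch stands in for p-limit without reproducing its API.

```typescript
/**
 * Map `items` through `worker` with at most `limit` calls in flight.
 * Results come back in input order, like Promise.all.
 */
async function mapWithLimit<In, Out>(
  items: In[],
  limit: number,
  worker: (item: In) => Promise<Out>,
): Promise<Out[]> {
  const results: Out[] = new Array<Out>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner claims items one at a time until none are left.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i] as In);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

In reviewPr, the Promise.all line would become a call to mapWithLimit with the file list, a limit of 3 to 5, and a worker that calls reviewOneFile. Like Promise.all, this sketch rejects on the first worker error.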
Findings reference line numbers that do not exist. The sub-agent is reading the diff and trying to map back to source lines. If your diff has lots of context lines, the model sometimes counts them as added lines. The fix is to preprocess the diff and strip context lines before sending it, keeping only + lines with their real line numbers prepended. A small helper called from splitDiffByFile solves it.
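The preprocessing step can be sketched as follows, assuming standard unified-diff hunk headers of the form @@ -a,b +c,d @@. The function name annotateAddedLines and the "N: text" output format are hypothetical choices; if you change the format, mention it in the sub-agent prompt.

```typescript
/**
 * Reduce a unified diff for one file to its added lines, each prefixed
 * with its line number in the new file (e.g. "42: const x = 1;").
 */
function annotateAddedLines(fileDiff: string): string {
  const out: string[] = [];
  let newLine = 0; // current line number in the new file
  let inHunk = false; // false until the first @@ header is seen

  for (const line of fileDiff.split("\n")) {
    // Hunk header: @@ -oldStart,oldCount +newStart,newCount @@
    const header = line.match(/^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/);
    if (header) {
      newLine = Number(header[1]);
      inHunk = true;
      continue;
    }
    if (!inHunk) continue; // skip "diff --git", "---", "+++" and similar headers
    if (line.startsWith("+")) {
      out.push(`${newLine}: ${line.slice(1)}`);
      newLine++;
    } else if (line.startsWith(" ")) {
      newLine++; // context line: present in the new file, but not emitted
    }
    // Removed ("-") lines and "\ No newline" markers do not advance newLine.
  }
  return out.join("\n");
}
```

Dropping context lines trades accuracy of line numbers against the model's view of surrounding code, so rules that depend on context may produce fewer findings.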
Reviews are too noisy. The sub-agent prompt is the lever. If you get nits flagged as warnings, sharpen the severity definitions in the prompt: tell it explicitly what counts as a blocker ("breaks existing tests, introduces security risk, violates a no-* rule"), what counts as a warning ("makes future maintenance harder"), what counts as a nit ("cosmetic"). Vague severity wording produces inconsistent reviews.
Where to take it next
Three obvious extensions, easiest to hardest.
Post the review as a PR comment automatically. Replace the console.log in cli.ts with gh pr comment ${prNumber} --body-file -. Pipe the formatted Markdown into the GitHub CLI and the bot speaks directly in the PR.
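From Node, piping the Markdown into the GitHub CLI could look like the sketch below. It uses execFileSync with the input option so the review text never passes through a shell. The helper names commentArgs and postReview are hypothetical, and the sketch assumes gh is on the PATH and already authenticated.

```typescript
import { execFileSync } from "node:child_process";

/** Build the gh argument list for commenting on a PR, with the body read from stdin. */
function commentArgs(prNumber: number): string[] {
  return ["pr", "comment", String(prNumber), "--body-file", "-"];
}

/** Post `markdown` as a comment on the PR by piping it to the GitHub CLI. */
function postReview(prNumber: number, markdown: string): void {
  execFileSync("gh", commentArgs(prNumber), {
    input: markdown, // becomes gh's stdin
    stdio: ["pipe", "inherit", "inherit"], // show gh's own output and errors
  });
}
```

Keeping console.log behind a flag is worth considering, so you can still preview a review before the bot posts it.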
Run it on every PR via GitHub Actions. Wrap the CLI in a workflow that triggers on pull_request events, runs the reviewer, and posts the comment. Commit the style guide to the repo so the actions/checkout step makes it available, and you have an always-on bot for the cost of one Sonnet sub-agent call per changed file.
Give the sub-agent access to the repo via the Read tool. Currently each sub-agent only sees the diff. Setting tools: ["Read"] and raising maxTurns to 5 lets it look at surrounding code (where is this function called from, what does this imported helper actually do) before flagging. Costs roughly 3x more in tokens and produces noticeably smarter reviews. Worth it for high-stakes repos, overkill for most.

