If you've been running Claude Code on anything larger than a side project, you've already felt the token drain. The reviews are good. The suggestions are useful. But the bill doesn't quite match the value — because Claude Code isn't just reading the files that matter. It's reading everything it can find.
In March 2026, a developer named Tirth Kanani published a tool that fixes this at the architectural level. It's called code-review-graph, it hit GitHub Trending within days, and its benchmark headline — 49x fewer tokens on a Next.js monorepo — drew scepticism and curiosity in equal measure, enough of both to warrant a thorough look.
This is that look. We'll cover the core problem, how the tool works technically, what the benchmarks actually say (including where they break down), a full installation walkthrough, an honest critique, the alternative tools in this space, and a verdict on who should actually use it.
At an India-based digital agency, we run Claude Code daily across multi-service codebases — Shopify themes, NestJS APIs, Flutter apps, n8n workflows. Token efficiency isn't academic for us. This review is informed by that operational context.
The Problem: Why Claude Code Burns Tokens You Didn't Ask For
To understand why code-review-graph exists, you need to understand a fundamental constraint of how AI coding assistants work today.
Claude Code has no persistent memory of your codebase between sessions. Every time you start a task — whether it's a review, a feature implementation, or a bug fix — Claude starts from scratch. It has no idea which files are related to your change, which functions call the code you just modified, or which tests cover the area you're working in.
So it does the only thing it can: it reads. Broadly. Generously. Sometimes excessively.
The developer who built code-review-graph described hitting this problem on a FastAPI project with nearly 3,000 files. He'd modified a single API endpoint. Claude started reading the middleware, the database models, the authentication utilities, the configuration files — files that had nothing to do with the change. By the time it finished, a review that should have cost around 800 tokens had consumed 5,500.
That 6.9x overspend isn't a Claude problem per se. It's a context problem. Without a structural map of the codebase, Claude can't distinguish between a file that's directly in the blast radius of your change and a file that happens to live in the same directory.
The naive solution — just tell Claude which files to read — doesn't scale. On a FastAPI project, you might know. On a Next.js monorepo with 27,000 files and cross-package dependencies, you don't. And even if you did, manually specifying file context on every task defeats the purpose of an AI assistant.
Code-review-graph solves this by giving Claude a map before it starts reading. A structural map, built from your codebase's AST, stored locally, and queried through MCP whenever Claude needs to understand what a change actually touches.
What code-review-graph Actually Is (Technical Architecture)
The tool has four conceptual layers: parse, store, trace, and serve. Understanding each one clarifies both its power and its limitations.
Layer 1: Parse — Building the AST with Tree-sitter
Tree-sitter is a parser generator tool that builds concrete syntax trees from source code. Unlike regular expression-based parsing (brittle, error-prone) or language server protocol parsing (heavyweight, requires language-specific servers), Tree-sitter operates on grammar files for each language and produces fast, reliable ASTs even for partially-valid code.
When you run code-review-graph build, the tool walks your entire codebase and runs every source file through the appropriate Tree-sitter grammar. From each file's AST, it extracts five types of structural nodes:
- Files — source code files as top-level containers
- Functions / Methods — all callable units extracted by name and signature
- Classes / Structs — object definitions and their inheritance relationships
- Imports / Exports — dependency declarations that connect files to each other
- Tests — test functions and their association to the code they cover
The current version supports 19 languages: Python, TypeScript, JavaScript, Go, Rust, Java, C#, Ruby, Kotlin, Swift, PHP, C/C++, Vue SFC, Solidity, Dart, R, Perl, Lua, and Jupyter/Databricks notebooks. Each language has its own node type mappings — class_definition for Python, class_declaration for Java, struct_item for Rust.
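The extraction step can be illustrated with Python's standard-library ast module in place of Tree-sitter (an illustrative sketch, not the tool's actual code; Tree-sitter performs the same walk-and-collect over its own grammars, for all 19 languages):

```python
import ast

def extract_nodes(source: str, path: str) -> list[dict]:
    """Walk one file's AST and collect the node types the graph stores.

    Illustrative sketch using Python's stdlib ast module; the real tool
    uses Tree-sitter, which works the same way conceptually: walk the
    tree, collect named structural nodes.
    """
    nodes = [{"type": "file", "name": path}]
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Treat test_* functions as test nodes, everything else as functions
            kind = "test" if node.name.startswith("test_") else "function"
            nodes.append({"type": kind, "name": node.name, "line": node.lineno})
        elif isinstance(node, ast.ClassDef):
            nodes.append({"type": "class", "name": node.name, "line": node.lineno})
        elif isinstance(node, ast.Import):
            for alias in node.names:
                nodes.append({"type": "import", "name": alias.name, "line": node.lineno})
        elif isinstance(node, ast.ImportFrom):
            nodes.append({"type": "import", "name": node.module or ".", "line": node.lineno})
    return nodes
```

Run over a file containing an import, a class with a method, and a test function, this yields one node of each of the five types.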
Layer 2: Store — SQLite Graph Database
Extracted nodes and their relationships are stored as a graph in a local SQLite database. This is a deliberate architectural choice: SQLite requires no external dependencies, runs entirely on your machine, starts instantly, and the data never leaves your environment.
The graph has two entity types:
- Nodes: each function, class, file, import, and test is a node with metadata (name, type, file path, line range)
- Edges: relationships between nodes — "A calls B", "X imports Y", "TestZ covers FunctionW", "ClassA extends ClassB"
The database also supports full-text search via SQLite's FTS5 extension, and optional vector embeddings for semantic similarity queries (useful for finding conceptually related code even when there's no direct import relationship).
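To make the storage layer concrete, here is a minimal sketch of what a nodes/edges schema with FTS5 search could look like. The table and column names are assumptions for illustration, not the schema code-review-graph actually ships:

```python
import sqlite3

# A minimal nodes/edges layout in the spirit of the tool's storage layer.
# Table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,   -- e.g. 'verify_token'
    kind       TEXT NOT NULL,   -- file | function | class | import | test
    path       TEXT NOT NULL,   -- file the node lives in
    start_line INTEGER,
    end_line   INTEGER
);
CREATE TABLE edges (
    src  INTEGER NOT NULL REFERENCES nodes(id),
    dst  INTEGER NOT NULL REFERENCES nodes(id),
    kind TEXT NOT NULL           -- calls | imports | covers | extends
);
-- Full-text search over node names via SQLite's FTS5 extension
CREATE VIRTUAL TABLE node_fts USING fts5(name);
""")
conn.execute("INSERT INTO nodes VALUES (1, 'verify_token', 'function', 'auth/middleware.py', 10, 42)")
conn.execute("INSERT INTO node_fts(name) VALUES ('verify_token')")
hits = conn.execute("SELECT name FROM node_fts WHERE node_fts MATCH 'verify'").fetchall()
```

Note that the FTS5 query matches on 'verify' even though the stored name is 'verify_token': the default tokenizer splits identifiers on the underscore, which is what makes partial-name search over code useful.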
Crucially, the graph updates incrementally. After the initial build, running code-review-graph update re-parses only the files that have changed since the last build. On a large codebase, incremental updates complete in under two seconds.
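The incremental update boils down to asking git which files changed and re-parsing only those. A sketch of that filtering step, with a hypothetical helper name and an assumed suffix list (the real tool also has to handle deletions and renames):

```python
SUPPORTED_SUFFIXES = (".py", ".ts", ".tsx", ".js", ".go", ".rs")  # assumed subset

def files_to_reparse(git_diff_output: str) -> list[str]:
    """Given the output of `git diff --name-only HEAD`, return the
    source files whose graph nodes need rebuilding. Illustrative
    sketch only."""
    return [
        line.strip()
        for line in git_diff_output.splitlines()
        if line.strip().endswith(SUPPORTED_SUFFIXES)
    ]

# In practice the diff output would come from something like
# subprocess.run(["git", "diff", "--name-only", "HEAD"], ...)
changed = files_to_reparse("auth/middleware.py\nREADME.md\napi/routes.ts\n")
# changed == ["auth/middleware.py", "api/routes.ts"]
```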
Layer 3: Trace — Blast Radius Analysis
This is the core algorithm that makes the token savings possible.
When you ask Claude Code to review a change, code-review-graph intercepts the context-gathering phase and runs a breadth-first search (BFS) through the graph starting from the changed files. It traces every edge outward: if you changed auth/middleware.py, it finds everything that imports auth/middleware.py, everything that calls functions defined there, every test that covers those functions, and every class that inherits from classes in that file.
This "blast radius" analysis produces a precise set of files that are actually relevant to the change, rather than a broad sweep of everything in the vicinity.
The BFS has configurable depth limits. At depth 1, you get direct callers and importers. At depth 2, you get their callers. Deeper traversal catches more indirect dependencies but produces a larger context set; the tool's defaults are tuned for the point of diminishing returns.
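The traversal just described can be sketched as a depth-limited BFS over a reverse-dependency map. The data shapes below are assumptions for illustration, not the tool's internal representation:

```python
from collections import deque

def blast_radius(reverse_deps: dict[str, set[str]],
                 changed: set[str], max_depth: int = 2) -> set[str]:
    """Depth-limited BFS from the changed files outward.

    reverse_deps maps a file to the files that import or call into it.
    Returns every file reachable within max_depth hops: the set the
    model actually needs to read.
    """
    seen = set(changed)
    frontier = deque((f, 0) for f in changed)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # don't expand beyond the depth limit
        for dependent in reverse_deps.get(node, ()):
            if dependent not in seen:
                seen.add(dependent)
                frontier.append((dependent, depth + 1))
    return seen

deps = {
    "auth/middleware.py": {"api/routes.py", "tests/test_auth.py"},
    "api/routes.py": {"main.py"},
}
# Depth 1 picks up direct callers and importers only; main.py (a
# second-hop dependent) appears only at depth 2.
print(sorted(blast_radius(deps, {"auth/middleware.py"}, max_depth=1)))
```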
The blast radius analysis has one important property: it has perfect recall. In benchmark testing across 6 real open-source repositories, it never missed an actually impacted file. It sometimes over-predicts (flagging files that weren't actually affected), but that's a conservative trade-off. Better to give Claude slightly too much relevant context than to miss a broken dependency entirely.
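"Perfect recall" here carries its standard information-retrieval meaning, which a few lines make precise:

```python
def recall(predicted: set[str], actual: set[str]) -> float:
    """Fraction of truly impacted files the blast radius caught.
    1.0 is the perfect recall the benchmarks report."""
    return len(predicted & actual) / len(actual) if actual else 1.0

def precision(predicted: set[str], actual: set[str]) -> float:
    """Fraction of predicted files that were truly impacted.
    Over-prediction shows up as precision below 1.0."""
    return len(predicted & actual) / len(predicted) if predicted else 1.0

pred = {"a.py", "b.py", "c.py"}   # graph flagged three files
true = {"a.py", "b.py"}           # only two were really affected
assert recall(pred, true) == 1.0      # nothing missed
assert precision(pred, true) < 1.0    # but it over-predicted
```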
Layer 4: Serve — MCP Integration
The graph is exposed to Claude Code (and other supported tools) via the Model Context Protocol (MCP). MCP is Anthropic's open standard for connecting AI models to external tools and data sources — it's essentially a structured API that Claude Code understands natively.
Once installed, code-review-graph runs as a local MCP server. When Claude Code needs to gather context for a task, instead of reading files directly, it first queries the graph: "what files are relevant to this change?" The graph responds with the precise set of files in the blast radius, along with structural metadata (dependency chains, test coverage gaps, call relationships). Claude then reads only those files.
The tool exposes several MCP tools to Claude: blast radius analysis, impact queries, architecture overview, dead code detection, refactoring preview, and test coverage gaps. Claude can invoke these automatically as part of its context-gathering, or you can trigger specific queries manually.
The Benchmark Numbers: What's Real and What's Cherry-Picked
The headline numbers are impressive. Let's examine them honestly.
Code Review Benchmarks
The evaluation was run against 6 real open-source repositories, testing 13 commits total. For each commit, the tool measured token consumption using the naive approach (Claude reads all related files) versus the graph-assisted approach (Claude queries the graph first, reads only the blast radius).
Across the benchmark set, the average token reduction was 8.2x (naive vs. graph). That's a meaningful number — it means the average review cost dropped from, say, 8,200 tokens to 1,000 tokens.
But the range matters as much as the average. Some repositories saw dramatic improvements:
- FastAPI: 3.7x reduction (138,585 → 37,217 tokens)
- httpx: 4.6x reduction (64,666 → 14,090 tokens, 58 files skipped)
- Next.js monorepo: 49.1x reduction (739,352 → 15,049 tokens, ~16,000 files excluded)
And one repository did not benefit:
- Express.js: less than 1x (the graph context exceeded the raw file size for single-file changes in this small package)
The Express.js result is the honest edge case. For small packages where a single-file change in a simple codebase is the norm, the graph overhead — the metadata, the edge data, the review guidance — can actually be more tokens than just reading the file directly. The tool's author documents this openly, which is a good sign.
The 49x Number in Context
The Next.js result — 49x reduction — is real, but it represents a near-ideal scenario: a 27,732-file monorepo with complex cross-package dependencies where a change in one package touches relatively few files. In that scenario, the graph can exclude the overwhelming majority of those files (roughly 16,000 in the benchmark run) from the context entirely, and the savings are extraordinary.
Most codebases aren't Next.js. Most changes touch more files than a single commit in a large monorepo. The 8.2x average is a more representative benchmark for typical use.
But 8.2x is still significant. If you're spending $50/month on Claude Code token usage, an 8x reduction saves $43.75/month — $525/year. On larger team usage, those numbers compound fast.
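The arithmetic generalises to any spend and reduction factor (the $50 figure is just this article's example):

```python
def monthly_savings(spend: float, reduction_factor: float) -> float:
    """Dollars saved per month if token spend drops by the given factor."""
    return spend - spend / reduction_factor

assert monthly_savings(50.0, 8.0) == 43.75            # the example above
assert monthly_savings(50.0, 8.0) * 12 == 525.0       # annualised
```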
Review Quality: The Surprising Finding
Token savings would be a pyrrhic victory if review quality degraded. The benchmarks measured this too.
Graph-assisted reviews scored 8.8 out of 10 on a structured evaluation rubric, versus 7.2 out of 10 for naive reviews. The quality improvement was attributed to the signal-to-noise ratio: when Claude reads 20,000 tokens of irrelevant code, the actual change gets buried. When it reads 2,000 tokens of precisely relevant code, it can focus.
This finding aligns with general research on LLM performance with long contexts: models tend to anchor on information at the beginning and end of a context window, with accuracy degrading for content in the middle. A shorter, more relevant context window isn't just cheaper — it's often more accurate.
Installation and Setup: Step-by-Step Walkthrough
Here's the complete installation process. The tool supports Claude Code, Cursor, Windsurf, Zed, Continue, OpenCode, and Antigravity — the install command auto-detects which tools you have.
Prerequisites
- Python 3.9+ (check with python3 --version)
- pip or pipx installed
- Claude Code or another supported MCP client
- A git-tracked codebase (the tool uses git to detect changes for incremental updates)
Step 1: Install the Package
You have three installation options:
# Option A: pip (installs globally)
pip install code-review-graph
# Option B: pipx (isolated environment, recommended)
pipx install code-review-graph
# Option C: uvx (fastest, no permanent install)
uvx code-review-graph install
The pipx approach is recommended if you're installing multiple Python CLI tools, as it avoids dependency conflicts. uvx (from the uv package manager) is the fastest option if you already use uv.
Step 2: Configure Your MCP Client
# Auto-detect all supported tools and configure them
code-review-graph install
# Or configure only Claude Code specifically
code-review-graph install --platform claude-code
This command writes the MCP configuration to the appropriate location for your tool (e.g., ~/.claude/mcp_settings.json for Claude Code) and injects graph-aware instructions into your platform's rules file (e.g., CLAUDE.md). The auto-detection handles whether you installed via uvx or pip/pipx and generates the correct config format for each.
Important: Restart your editor or Claude Code after this step. The MCP server won't be active until you restart.
Step 3: Build the Initial Graph
# Navigate to your project root first
cd /your/project
# Build the graph (parses all files)
code-review-graph build
Initial build time depends on codebase size. For a 500-file TypeScript project, expect 10–30 seconds. For a 5,000-file monorepo, expect 2–5 minutes. The graph is stored in a .code-review-graph/ directory at your project root.
You can check what was indexed:
code-review-graph status
This shows graph statistics: total nodes, edges, languages detected, last build time, and file count.
Step 4: Enable Watch Mode (Optional but Recommended)
# Keep the graph up to date as you work
code-review-graph watch
Watch mode monitors your filesystem for changes and runs incremental updates automatically. This ensures the graph stays current without manual intervention. On most systems, incremental updates complete in under two seconds, so the overhead is negligible.
If you prefer not to run watch mode, you can update manually after significant changes:
code-review-graph update
Step 5: Verify the Integration
Open Claude Code and run /mcp to check that the code-review-graph server appears in your connected MCP servers list. If it does, the integration is active.
To test it, make a small change to a file in your project and ask Claude Code to review the change. In the tool call trace, you should see graph queries being made before Claude reads any files directly.
Excluding Files and Directories
Create a .code-review-graphignore file in your project root to exclude paths from indexing:
generated/**
*.generated.ts
vendor/**
node_modules/**
dist/**
.next/**
This follows the same syntax as .gitignore. Excluding generated files and build artifacts is important — they inflate the graph with nodes that have no meaningful relationships to your source code.
Multi-Repository Support
For setups where you work across multiple repositories, the tool supports a registry:
# Register additional repos
code-review-graph register /path/to/other/repo
# List all registered repos
code-review-graph repos
The MCP server can then serve context across all registered repositories, which is useful for microservice architectures where a change in one service has dependencies in another.
Advanced Features Worth Knowing
Risk-Scored Change Analysis
code-review-graph detect-changes
This command analyses your uncommitted changes and scores each changed file by risk level — factoring in the number of dependents, test coverage gaps, and whether the changed functions are called from critical paths. High-risk changes get flagged before you even ask Claude for a review.
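The tool doesn't document its exact formula, but a risk heuristic of this shape is easy to picture. The weights and inputs below are pure assumptions for illustration, not the tool's actual scoring logic:

```python
def risk_score(num_dependents: int, covered: bool, on_critical_path: bool) -> float:
    """Hypothetical risk heuristic: more dependents, missing tests, and
    critical-path involvement all push the score up. Weights are
    illustrative only."""
    score = min(num_dependents / 10.0, 1.0)  # saturates at 10 dependents
    if not covered:
        score += 0.5                         # no test coverage: riskier
    if on_critical_path:
        score += 0.5                         # critical path: riskier
    return min(score, 1.0) * 10              # report on a 0-10 scale

# A widely-used, untested function on a critical path maxes out:
assert risk_score(12, covered=False, on_critical_path=True) == 10.0
# An isolated, well-tested helper scores low:
assert risk_score(1, covered=True, on_critical_path=False) == 1.0
```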
Dead Code Detection
The graph's relationship mapping makes dead code detection straightforward: any node with no incoming edges (no callers, no importers, no test coverage) is a candidate for removal. Run this periodically on mature codebases to surface functions and classes that have drifted out of use.
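Given the nodes-and-edges model described earlier, the query reduces to "nodes that never appear as an edge destination". A minimal sketch, with the usual caveat that entry points legitimately have zero incoming edges:

```python
def dead_candidates(nodes: set[str], edges: list[tuple[str, str]]) -> set[str]:
    """Nodes with no incoming edges: never called, imported, or covered.
    Entry points (main, CLI handlers, route registrations) are
    legitimate zero-in-degree nodes, so real tooling needs an
    allowlist on top of this."""
    referenced = {dst for _src, dst in edges}
    return nodes - referenced

nodes = {"main", "parse_args", "old_helper"}
edges = [("main", "parse_args")]   # main calls parse_args
assert dead_candidates(nodes, edges) == {"main", "old_helper"}
```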
Refactoring Preview
code-review-graph rename preview --from OldClassName --to NewClassName
Before running a rename refactor, this shows you every file that will be affected and flags any edge cases (like dynamic string references that won't be caught by static analysis). Useful before large-scale renames in codebases where IDE refactoring tools have blind spots.
Architecture Overview
code-review-graph visualize
Generates an interactive visualisation of your codebase's module structure using community detection (the Leiden algorithm) to identify natural clusters. Useful for onboarding new contributors or identifying architectural drift where modules have become more tightly coupled than they should be.
Wiki Generation
code-review-graph wiki
Generates a markdown wiki of your codebase structure — every module, its public API, its dependencies, and its test coverage. This is useful for documentation-light codebases where you need a quick orientation guide.
Honest Limitations and Known Issues
No tool is universally good. Here's where code-review-graph falls short.
Single-File / Small Codebase Problem
As the Express.js benchmark showed, for small packages and single-file changes, the graph overhead can exceed the cost of just reading the file directly. If your typical workflow involves small, isolated files with few dependencies, the tool may not save tokens. It may cost more.
The rule of thumb: if your codebase is under ~200 files and changes are typically isolated to single files, benchmark before committing. The tool pays off on multi-file changes and larger codebases.
Static Analysis Blind Spots
Tree-sitter is a static parser. It sees what's in your source files at parse time. It cannot see:
- Dynamic imports: require(someVariable) or import(buildPath(module)) — the dependency isn't visible at parse time
- Reflection-based calls: Django's signal framework, Python's getattr() dispatch, Java reflection
- Cross-language boundaries: a Python service calling a TypeScript API (the relationship exists at runtime, not at the AST level)
For codebases that rely heavily on these patterns — certain Django projects, heavily metaprogrammed Ruby, dynamic JavaScript modules — the blast radius analysis may under-predict impact and miss relevant files.
Installation Edge Cases
Shortly after the tool's public release, users reported setup issues using the Claude Plugin Marketplace method. This is expected for a tool at v1.x — the plugin marketplace integration has more moving parts than the direct pip install approach. If you encounter issues, the manual install (pip + manual MCP config) is more reliable than the marketplace flow at time of writing.
Stale Graph Risk
If you're not running watch mode, the graph can drift out of sync with your codebase. Claude will be querying stale relationship data. In the best case, it reads a few extra files. In the worst case, it misses a recently-added dependency. Watch mode or a pre-task code-review-graph update mitigates this.
Monorepo with Cross-Package TypeScript
TypeScript path aliases (@/components/..., ~/lib/...) require correct tsconfig resolution to be mapped to actual file paths. The tool ships with a tsconfig_resolver.py that handles common patterns, but complex monorepo setups with multiple tsconfig files and non-standard alias patterns may require manual configuration.
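What alias resolution involves can be sketched for the simple single-asterisk case. This is an illustration of the general idea, not the tsconfig_resolver.py the tool ships:

```python
import os
from typing import Optional

def resolve_alias(import_path: str, base_url: str,
                  paths: dict[str, list[str]]) -> Optional[str]:
    """Map a TypeScript path alias to a concrete path under baseUrl.

    Handles only simple single-'*' patterns. Real resolution also has
    to try each candidate target in order and check for the
    .ts/.tsx/index.ts file variants on disk.
    """
    for pattern, targets in paths.items():
        prefix = pattern.split("*")[0]        # '@/' from '@/*'
        if import_path.startswith(prefix):
            rest = import_path[len(prefix):]  # 'components/Button'
            return os.path.normpath(os.path.join(base_url, targets[0].replace("*", rest)))
    return None

# Mappings as they'd appear in tsconfig's compilerOptions.paths:
paths = {"@/*": ["./src/*"], "~/lib/*": ["./packages/lib/src/*"]}
resolved = resolve_alias("@/components/Button", ".", paths)
```

Non-aliased imports (bare package names like "react") fall through to None and are resolved by the normal node_modules lookup instead.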
Alternatives: The Broader Ecosystem
Code-review-graph isn't the only tool solving this problem. Here's how the landscape looks as of April 2026.
Claudette
A Go rewrite of code-review-graph built by Nicolas Martignole. Key differences: single binary (no Python dependency), faster startup, simpler deployment. Trades some of the original's flexibility for a leaner profile. Best suited for medium-sized Go, TypeScript, Python, or JavaScript projects. If you're allergic to Python dependencies in your development environment, Claudette is worth evaluating.
Setup is straightforward: go install github.com/nicmarti/claudette@latest and then claudette install.
better-code-review-graph
A fork of the original that fixes several known bugs: the multi-word search was broken in the original (using literal substring matching instead of AND-logic word splitting), and caller/callee resolution returned empty results for bare function names without qualified prefixes. It also adds paginated output (the original could produce unbounded 500K+ character responses for large codebases) and dual-mode embeddings (ONNX local or cloud). If you've encountered issues with the original, this fork addresses the most common pain points.
Serena
Serena takes a different architectural approach: instead of Tree-sitter, it uses the Language Server Protocol (LSP) — the same protocol your IDE uses for "go to definition" and "find references". This gives deeper semantic precision: type resolution, polymorphism awareness, cross-module inference. The trade-off is heavier setup (requires a running language server per language) and slower initial indexing.
When to prefer Serena: large multi-language projects where semantic precision matters — complex refactoring across inheritance hierarchies, type-dependent impact analysis, projects where dynamic dispatch is common. When to prefer code-review-graph: speed, simplicity, and when structural graph analysis is sufficient.
code-graph-rag
Adds a retrieval-augmented generation layer on top of the structural graph — vector search over code semantics, enabling natural language queries like "find functions similar to this one" or "what handles authentication in this codebase". More powerful for exploration and discovery use cases. More complex to set up and operate. If your use case goes beyond impact analysis into active codebase exploration, this is worth evaluating.
Native IDE Context Features
Cursor, Windsurf, and VS Code with Continue all have their own context-gathering logic that partially addresses this problem. They use a combination of open files, recent files, LSP references, and sometimes embeddings-based retrieval. They don't give you the same explicit blast-radius analysis that code-review-graph provides, but they're zero-setup. For many use cases, the native context features are sufficient and you don't need a separate tool.
The Token Cost Problem in Broader Context
Code-review-graph is one solution to what is fundamentally a context efficiency problem. It's worth understanding the full landscape of approaches before committing to any single tool.
Prompt-Level Optimisation
A tool called claude-token-efficient (a CLAUDE.md drop-in) demonstrated that controlling Claude Code's verbosity can reduce output tokens by 30–63% on output-heavy workflows. The trade-off: the CLAUDE.md file itself adds input tokens on every message, so it only pays off when your output volume is high enough to offset the recurring cost.
Another approach — dubbed "caveman mode" — prompts Claude to respond in compressed, grammar-stripped language. Real token measurements show 22–87% savings across prompts. A March 2026 paper found that brevity constraints on large models actually improved accuracy by 26 percentage points on certain benchmarks, reversing the common assumption that more verbose responses are more accurate.
These prompt-level approaches are complementary to code-review-graph, not competing. You can run both: the graph reduces the input context, and output constraints reduce the response length.
Anthropic's Official Code Review Feature
Claude Code has a built-in Code Review feature for GitHub PRs that costs $15–25 per review on average, billed separately from your plan's included usage. This is a hosted, managed solution — no setup, but no control over cost optimisation either. For teams with high PR volume and tight budgets, community tools like code-review-graph give you the efficiency lever that the official feature doesn't.
Session Architecture
How you structure Claude Code sessions matters as much as any tool you install. Long sessions accumulate context that includes earlier turns of conversation — context that's often irrelevant to your current task. Starting fresh sessions for distinct tasks, and using compact summaries instead of full conversation history for handoffs, can reduce token consumption substantially without any external tooling.
When Should You Actually Install This?
Here's an honest decision framework.
Install it if:
- Your codebase is 500+ files
- You frequently make multi-file changes with cross-module dependencies
- You're spending meaningfully on Claude Code tokens (more than ~$20/month)
- You work with large monorepos, microservices, or cross-package TypeScript
- You want better review quality in addition to token savings (the 8.8 vs 7.2 quality benchmark matters here)
- You're comfortable with a Python dependency and a local MCP server running in the background
Skip it (for now) if:
- Your codebase is under ~200 files and changes are typically isolated to single files
- You use heavily dynamic patterns (reflection, runtime code generation, dynamic imports) that static analysis can't see
- You're on a team that hasn't standardised on Claude Code yet — adding setup complexity before the core workflow is established adds friction without commensurate value
- You want a zero-maintenance solution — the graph needs to be kept in sync, either via watch mode or manual updates
Evaluate it if:
- You're in the 200–500 file range — benchmark your specific codebase before committing
- You use a mix of static and dynamic patterns — test on a representative sample of commits
- You're considering it for a client project — the setup overhead may not justify the savings on shorter engagements
Putting It Into Practice: Our Recommended Workflow
Based on the benchmarks and the tool's architecture, here's a practical workflow for teams adopting code-review-graph.
Phase 1: Baseline (Week 1)
Before installing the tool, track your Claude Code token consumption for one week. Note the codebases you're working on, the typical nature of changes (single file vs. multi-file), and your average cost per session. This gives you a baseline to measure against.
Phase 2: Installation and Initial Build (Day 1)
Install via pipx. Run code-review-graph install --platform claude-code. Build the graph on your primary codebase. Configure your .code-review-graphignore to exclude build artifacts and generated files. Enable watch mode. Restart Claude Code and verify the MCP connection.
Phase 3: Controlled Testing (Week 2)
Run your normal workflow for a week with the tool active. Don't change your task patterns — use Claude Code the same way you normally would. At the end of the week, compare token consumption against your Week 1 baseline.
Also compare review quality: were the graph-assisted reviews more focused? Did Claude miss anything important, or was it more precisely on-target?
Phase 4: Optimisation (Week 3+)
If the token savings are positive, tune the configuration: adjust BFS depth, refine your ignore file, evaluate whether watch mode is necessary or if manual updates are sufficient. If you're working across multiple repositories, register them all and evaluate the cross-repo impact analysis.
The Bigger Picture: Why This Tool Matters
Code-review-graph represents a category of tooling that will become increasingly important as AI coding assistants mature: context infrastructure.
Right now, the dominant mental model for AI coding tools is "give the AI as much context as possible and let it figure out what's relevant." That model works when codebases are small and context windows are cheap. It breaks down — in cost, in quality, in latency — as codebases grow and as teams integrate AI deeper into their development workflows.
The alternative model — give the AI precisely the context it needs, structured in a way that amplifies rather than dilutes signal — is harder to build but better in every measurable dimension. The 8.8 vs 7.2 quality benchmark isn't a side effect of token savings. It's the mechanism: less noise, more signal, better output.
Tree-sitter-based structural graphs are one implementation of this model. LSP-based approaches (Serena) are another. RAG-based retrieval is a third. The common thread is that each of these approaches replaces "read everything" with "read what matters."
As model context windows continue to expand (Claude's is already 200K tokens), you might assume that context efficiency becomes less important. The opposite is likely true: larger context windows enable more complex tasks, which involve more files, which means the noise problem scales with the capability. The infrastructure to manage context precisely will matter more, not less, as AI coding tools become more powerful.
Written by Rishabh Sethia, Founder & CEO
Rishabh Sethia is the founder and CEO of Innovatrix Infotech, a Kolkata-based digital engineering agency. He leads a team that delivers web development, mobile apps, Shopify stores, and AI automation for startups and SMBs across India and beyond.