Building Your Own Agentic Coding Tool
I decided to build my own Cursor. How hard could it be? I thought. It's just an LLM with some tools. Three weeks later I had written 4,000 lines of harness code, debugged 47 infinite loops, and spent $340 on API calls testing edge cases. The actual AI integration? 50 lines. The system prompt? 200 lines. Everything else? Infrastructure to stop the AI from deleting my .git folder when I asked it to 'clean up the project.' I mass-applied for jobs instead.
Table of Contents
- What You’re Actually Building
- The Architecture Nobody Tells You About
- Code Indexing (Or: How It Knows Your Codebase)
- The System Prompt That Makes It Work
- The Harness (The Part That Actually Matters)
- The Two-Agent Pattern
- How Multi-Context-Window Work Actually Happens
- Building It: The Pseudo-Code Version
- The Problems You’ll Hit
- What Makes This Hard
- What You’ve Learned
What You’re Actually Building
Let’s be clear about what an “agentic coding tool” like Claude Code actually is: it’s an agent that can read your codebase, understand it, make changes, test those changes, and continue working across multiple sessions without losing context.
Strip away the marketing and you’re building:
- A code indexer (so the AI knows what code exists)
- A context manager (so the AI knows what’s relevant)
- A tool harness (so the AI can read/write files, run commands, etc.)
- A multi-session system (so work can span hours or days)
- A testing framework (so the AI knows if it broke something)
The hard part isn’t the AI. The AI (Claude, GPT-4, whatever) already knows how to code. The hard part is:
- Making it understand YOUR codebase specifically
- Making it work incrementally instead of trying to rewrite everything
- Making it continue working across multiple sessions without forgetting context
- Making it test its own work instead of confidently shipping bugs
This is infrastructure work disguised as AI work.
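To make that concrete, here is a minimal sketch of those five components as TypeScript interfaces. All of these names are illustrative, not taken from any real tool:

```typescript
// Hypothetical component boundaries; real tools slice this differently.
type Message = { role: "user" | "assistant"; content: unknown };

interface CodeIndex {                      // 1. the code indexer
  search(query: string): { path: string; line?: number }[];
}

interface ContextManager {                 // 2. the context manager
  estimateTokens(history: Message[]): number;
  compact(history: Message[]): Message[];
}

interface ToolHarness {                    // 3. the tool harness
  execute(name: string, args: object): Promise<string>;
}

interface SessionStore {                   // 4. multi-session memory
  readProgress(): string;
  writeProgress(notes: string): void;
}

interface TestRunner {                     // 5. the testing framework
  runAll(): Promise<{ passed: boolean; output: string }>;
}
```

Most of this chapter is about implementing those five boxes.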
The Architecture Nobody Tells You About
Here’s what a tool like Claude Code actually looks like under the hood:
┌─────────────────────────────────────────────┐
│                 User Input                  │
│          "Add user authentication"          │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│                Agent Harness                │
│ ┌───────────────────────────────────────┐   │
│ │ Initializer Agent (first run only)    │   │
│ │ - Set up environment                  │   │
│ │ - Create feature list                 │   │
│ │ - Initialize git repo                 │   │
│ │ - Write progress files                │   │
│ └───────────────────────────────────────┘   │
│                                             │
│ ┌───────────────────────────────────────┐   │
│ │ Coding Agent (every session)          │   │
│ │ 1. Read context (git logs, progress)  │   │
│ │ 2. Choose one task from feature list  │   │
│ │ 3. Implement it                       │   │
│ │ 4. Test it                            │   │
│ │ 5. Commit + document                  │   │
│ └───────────────────────────────────────┘   │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│                    Tools                    │
│ - read_file(path)                           │
│ - write_file(path, content)                 │
│ - list_files(directory)                     │
│ - run_command(cmd)                          │
│ - git operations                            │
│ - search_code(query) [uses index]           │
│ - run_tests()                               │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│                 Code Index                  │
│ - AST (Abstract Syntax Tree) of all files   │
│ - Symbol table (functions, classes, imports)│
│ - Dependency graph                          │
│ - Embeddings for semantic search            │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│            Your Actual Codebase             │
│         /src, /tests, /config, etc.         │
└─────────────────────────────────────────────┘
The flow:
- User asks for something
- Harness decides if this is first run (initializer) or continuation (coding agent)
- Agent reads context from files (git logs, progress notes, feature list)
- Agent uses tools to explore code, make changes, test
- Agent writes results to files for next session
- Repeat
The key insight: The agent isn’t just making changes. It’s maintaining its own memory through files. Git commits are memory. Progress notes are memory. The feature list is memory.
Code Indexing (Or: How It Knows Your Codebase)
The AI doesn’t read every file in your project every time. That would blow the context window instantly. Instead, you build an index.
What Gets Indexed
1. File Structure
fileStructure = {
"/src": {
"/components": ["Button.jsx", "Input.jsx", "Modal.jsx"],
"/services": ["api.js", "auth.js"],
"/utils": ["validation.js", "formatting.js"]
},
"/tests": {
"/unit": ["Button.test.js", "api.test.js"]
}
}
2. Symbol Table (Functions, Classes, Exports)
symbolTable = {
"src/services/auth.js": {
exports: ["login", "logout", "verifyToken"],
functions: [
{
name: "login",
params: ["email", "password"],
line: 15,
description: "Authenticates user and returns JWT token"
}
]
}
}
3. Dependencies
dependencies = {
"src/components/LoginForm.jsx": {
imports: ["src/services/auth.js", "src/components/Input.jsx"],
exports: ["LoginForm"]
}
}
4. Semantic Embeddings
For each file/function, generate an embedding (vector representation). This lets you do semantic search:
// User asks: "Where is the password hashing logic?"
// Search embeddings, find: src/services/auth.js::hashPassword
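Here is a sketch of what that lookup involves, where `embed` stands in for whichever embedding API you use (it is not a real import):

```typescript
// `embed` is a placeholder for any embedding API that maps text to a vector.
declare function embed(text: string): Promise<number[]>;

// Cosine similarity: how "close" two vectors point.
function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Index time: embed each function's name + signature + comment once.
// Query time: embed the question and rank everything by similarity.
async function semanticSearch(
  query: string,
  entries: { id: string; vector: number[] }[],
  topK = 5
): Promise<{ id: string; score: number }[]> {
  const queryVector = await embed(query);
  return entries
    .map((e) => ({ id: e.id, score: cosine(queryVector, e.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```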
How to Build the Index (Pseudo-code)
FUNCTION indexCodebase(rootDir):
index = {
files: {},
symbols: {},
dependencies: {}
}
// Walk all files
files = getAllFiles(rootDir, extensions=['.js', '.jsx', '.ts', '.tsx'])
FOR EACH filePath IN files:
code = readFile(filePath)
// Parse to AST (Abstract Syntax Tree)
TRY:
ast = parseCode(code, language=detectLanguage(filePath))
fileInfo = {
path: filePath,
functions: [],
classes: [],
imports: [],
exports: []
}
// Extract symbols using AST traversal
traverseAST(ast):
ON FunctionDeclaration(node):
fileInfo.functions.append({
name: node.name,
params: node.parameters,
line: node.lineNumber
})
ON ClassDeclaration(node):
fileInfo.classes.append({
name: node.name,
line: node.lineNumber
})
ON ImportDeclaration(node):
fileInfo.imports.append(node.source)
ON ExportDeclaration(node):
fileInfo.exports.append(node.name)
index.files[filePath] = fileInfo
CATCH ParseError:
log("Failed to parse", filePath)
CONTINUE
RETURN index
FUNCTION getAllFiles(dir, extensions):
files = []
items = listDirectory(dir)
FOR EACH item IN items:
fullPath = join(dir, item)
IF isDirectory(fullPath):
// Skip ignored directories
IF item NOT IN ['node_modules', '.git', 'dist', 'build']:
files.extend(getAllFiles(fullPath, extensions))
ELSE IF fileExtension(item) IN extensions:
files.append(fullPath)
RETURN files
What this gives you:
- List of all files
- Functions and classes in each file
- Import relationships
- Where things are defined
When the AI asks “where is the user authentication logic?” you can search the index instead of reading every file.
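For JavaScript/TypeScript codebases, a real parser does the heavy lifting. A minimal per-file indexer might look like this sketch, assuming `@babel/parser` and `@babel/traverse` are installed:

```typescript
import { readFileSync } from "fs";
import { parse } from "@babel/parser";
import traverse from "@babel/traverse";

interface FileInfo {
  functions: { name: string; params: number; line: number }[];
  classes: { name: string; line: number }[];
  imports: string[];
}

function indexFile(filePath: string): FileInfo {
  const code = readFileSync(filePath, "utf8");
  const ast = parse(code, { sourceType: "module", plugins: ["jsx"] });
  const info: FileInfo = { functions: [], classes: [], imports: [] };

  traverse(ast, {
    FunctionDeclaration(path) {
      const { id, params, loc } = path.node;
      if (id && loc) {
        info.functions.push({
          name: id.name,
          params: params.length,
          line: loc.start.line,
        });
      }
    },
    ClassDeclaration(path) {
      const { id, loc } = path.node;
      if (id && loc) info.classes.push({ name: id.name, line: loc.start.line });
    },
    ImportDeclaration(path) {
      info.imports.push(path.node.source.value);
    },
  });

  return info;
}
```

Other languages have their own parsers; tree-sitter is a common way to cover many of them with one API.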
The Search Tool (Pseudo-code)
FUNCTION searchCode(query, index):
results = []
keywords = query.toLowerCase().split(' ')
// Simple keyword search (production would use embeddings)
FOR EACH (filePath, fileInfo) IN index.files:
// Search in file path
IF any(keyword IN filePath.toLowerCase() FOR keyword IN keywords):
results.append({
type: 'file',
path: filePath,
score: 1.0
})
// Search in function names
FOR EACH func IN fileInfo.functions:
IF any(keyword IN func.name.toLowerCase() FOR keyword IN keywords):
results.append({
type: 'function',
path: filePath,
name: func.name,
line: func.line,
score: 0.9
})
// Search in class names
FOR EACH cls IN fileInfo.classes:
IF any(keyword IN cls.name.toLowerCase() FOR keyword IN keywords):
results.append({
type: 'class',
path: filePath,
name: cls.name,
line: cls.line,
score: 0.9
})
// Sort by score, return top 10
RETURN results.sortBy(score, descending=True).take(10)
How the AI uses this:
AI: "I need to understand how authentication works"
System: CALL searchCode("authentication", index)
System: Returns: [
"src/services/auth.js",
"src/middleware/authMiddleware.js"
]
AI: CALL read_file("src/services/auth.js")
AI: CALL read_file("src/middleware/authMiddleware.js")
// AI now understands auth system
The benefit: The AI only reads relevant files, not the entire codebase.
The System Prompt That Makes It Work
The system prompt is where the magic happens. This is what turns a general-purpose LLM into a coding agent.
The structure (pseudo-code):
SYSTEM_PROMPT = """
# Your Role
You are an expert software engineer working on a codebase.
Your job: make incremental progress on features while maintaining
code quality and leaving clear documentation for the next session.
# Available Tools
You have access to:
- read_file(path): Read contents of a file
- write_file(path, content): Write or overwrite a file
- list_files(directory): List files in a directory
- run_command(cmd): Execute shell commands
- search_code(query): Search the codebase index
- git_log(): View recent commits
- git_commit(message): Commit current changes
- run_tests(): Execute the test suite
# Working Style
## Start of Every Session
1. Run `pwd` to confirm working directory
2. Read `claude-progress.txt` to understand recent work
3. Read `feature_list.json` to see what needs to be done
4. Run `git log --oneline -10` to see recent commits
5. If `init.sh` exists, run it to start development server
6. Run basic smoke tests to ensure app is working
## During Work
- Work on ONE feature at a time (from feature_list.json)
- Make small, incremental changes
- Test thoroughly after each change
- If something breaks, use git to revert and try different approach
- Document your reasoning in comments
## End of Session
- Commit changes with descriptive message
- Update `claude-progress.txt` with:
- What you worked on
- What you completed
- What's working
- Any blockers or issues
- Next steps
- Update `feature_list.json`: mark completed features as "passes": true
- Leave codebase in clean, working state
# Critical Rules
## DO:
- Read git logs and progress files at start of session
- Work incrementally (one feature at a time)
- Test everything thoroughly
- Commit frequently with clear messages
- Document your progress
- Fix bugs immediately when found
## DO NOT:
- Try to implement everything at once
- Mark features complete without testing
- Leave codebase in broken state
- Remove or edit feature entries in feature_list.json (only mark them as passing)
- Make changes without understanding existing code
- Commit broken code
# Testing Requirements
For web applications:
- Use browser automation to test as real user would
- Verify features work end-to-end, not just unit tests
- Take screenshots of important steps
- If feature doesn't work, fix it before marking complete
# Error Handling
If you encounter errors:
1. Read error message carefully
2. Check git logs to see if recent changes caused it
3. Use git to revert if needed
4. Try different approach
5. Document failed attempt in progress notes
"""
What this does:
- Defines agent’s role and responsibilities
- Lists available tools (functions it can call)
- Establishes workflow (start, during, end of session)
- Sets rules (do this, don’t do that)
- Defines quality standards
The key: This prompt makes the agent behave like a disciplined engineer, not a cowboy who rewrites everything.
The Harness (The Part That Actually Matters)
The harness is the system that runs the agent. It’s not the AI itself - it’s the infrastructure around it.
What a harness does:
1. Session Management
IF isFirstRun():
useInitializerAgent()
ELSE:
useCodingAgent()
2. Context Management
contextSize = estimateTokens(conversationHistory)
IF contextSize > CONTEXT_LIMIT * 0.75:
conversationHistory = compactContext(conversationHistory)
3. Tool Execution
FOR EACH toolCall IN agentResponse.toolCalls:
result = executeTool(toolCall.name, toolCall.arguments)
sendResultBackToAgent(result)
4. Safety Limits
IF iterations > MAX_ITERATIONS:
RAISE Error("Max iterations reached")
IF toolCallCount > MAX_TOOL_CALLS:
RAISE Error("Too many tool calls - possible infinite loop")
IF elapsed > TIMEOUT:
RAISE Error("Session timeout")
IF cost > BUDGET_LIMIT:
RAISE Error("Budget exceeded")
5. State Persistence
SAVE conversationHistory TO disk
SAVE toolCallHistory TO disk
SAVE progressNotes TO disk
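State persistence can be as plain as a JSON file in the project directory. A sketch (the file name is illustrative):

```typescript
import { readFileSync, writeFileSync, existsSync } from "fs";
import { join } from "path";

// Illustrative name; any per-project location works.
const STATE_FILE = ".agent-state.json";

interface SessionState {
  conversationHistory: unknown[];
  toolCallCount: number;
  totalCostUsd: number;
}

function saveState(projectDir: string, state: SessionState): void {
  writeFileSync(join(projectDir, STATE_FILE), JSON.stringify(state, null, 2));
}

function loadState(projectDir: string): SessionState | null {
  const path = join(projectDir, STATE_FILE);
  if (!existsSync(path)) return null;
  return JSON.parse(readFileSync(path, "utf8")) as SessionState;
}
```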
The basic structure (pseudo-code):
CLASS CodingAgentHarness:
CONSTRUCTOR(projectDir, options):
this.projectDir = projectDir
this.maxIterations = options.maxIterations OR 50
this.maxToolCalls = options.maxToolCalls OR 200
this.timeout = options.timeout OR 1800000 // 30 minutes
this.codeIndex = NULL
this.conversationHistory = []
this.sessionStartTime = NULL
METHOD initialize():
// Index the codebase
PRINT "Indexing codebase..."
this.codeIndex = indexCodebase(this.projectDir)
// Check if first run
progressFile = join(this.projectDir, 'claude-progress.txt')
isFirstRun = NOT fileExists(progressFile)
IF isFirstRun:
PRINT "First run - using initializer agent"
RETURN this.runInitializerAgent()
ELSE:
PRINT "Continuing work - using coding agent"
RETURN this.runCodingAgent()
METHOD runInitializerAgent():
initPrompt = """
This is a new project. Set up environment for future sessions:
1. Create claude-progress.txt (empty, for tracking progress)
2. Create feature_list.json based on user requirements
3. Create init.sh script to start development server
4. Initialize git repository and make first commit
5. Document project structure and tech stack
Requirements: {userRequirements}
"""
RETURN this.runAgentLoop(initPrompt, INITIALIZER_SYSTEM_PROMPT)
METHOD runCodingAgent():
codingPrompt = """
Continue working on the project. Follow the workflow:
1. Read progress files and git logs
2. Choose one feature from feature_list.json
3. Implement and test it
4. Commit and document
"""
RETURN this.runAgentLoop(codingPrompt, CODING_AGENT_SYSTEM_PROMPT)
METHOD runAgentLoop(initialPrompt, systemPrompt):
this.sessionStartTime = getCurrentTime()
this.conversationHistory = [
{ role: "user", content: initialPrompt }
]
iterations = 0
totalToolCalls = 0
WHILE iterations < this.maxIterations:
// Check timeout
IF getCurrentTime() - this.sessionStartTime > this.timeout:
RAISE Error("Session timeout")
iterations++
PRINT "=== Iteration", iterations, "==="
// Check context window size
contextSize = estimateTokenCount(this.conversationHistory)
IF contextSize > 150000:
PRINT "Context getting large, applying compaction..."
this.conversationHistory = this.compactContext()
// Call LLM with tools
response = this.callLLM(systemPrompt, this.conversationHistory)
// Check if done
IF response.stopReason == "end_turn":
PRINT "Agent finished session"
RETURN this.extractFinalResponse(response)
// Execute tool calls
IF response.stopReason == "tool_use":
this.conversationHistory.append({
role: "assistant",
content: response.content
})
toolResults = this.executeTools(response.content)
totalToolCalls += length(toolResults)
IF totalToolCalls > this.maxToolCalls:
RAISE Error("Too many tool calls - possible runaway agent")
this.conversationHistory.append({
role: "user",
content: toolResults
})
RAISE Error("Max iterations reached")
METHOD executeTools(contentBlocks):
results = []
FOR EACH block IN contentBlocks:
IF block.type == "tool_use":
PRINT "Executing:", block.name, block.input
TRY:
result = this.executeTool(block.name, block.input)
results.append({
type: "tool_result",
tool_use_id: block.id,
content: stringify(result)
})
CATCH error:
PRINT "Error:", error.message
results.append({
type: "tool_result",
tool_use_id: block.id,
content: stringify({ error: error.message }),
is_error: TRUE
})
RETURN results
METHOD executeTool(name, input):
// Map tool names to implementations
tools = {
'read_file': (args) => this.readFile(args.path),
'write_file': (args) => this.writeFile(args.path, args.content),
'list_files': (args) => this.listFiles(args.directory),
'run_command': (args) => this.runCommand(args.cmd),
'search_code': (args) => searchCode(args.query, this.codeIndex),
'git_log': () => this.runCommand('git log --oneline -10'),
'git_commit': (args) => this.runCommand('git add . && git commit -m "' + args.message + '"'), // NOTE: shell-escape the message in a real implementation
'run_tests': () => this.runCommand('npm test')
}
IF name NOT IN tools:
RAISE Error("Unknown tool: " + name)
RETURN tools[name](input)
METHOD compactContext():
// Simplest strategy: drop old messages, keep only the 20 most recent.
// A better version summarizes the dropped span first (see Problem 4).
RETURN this.conversationHistory.slice(-20)
What this harness does:
- Manages agent lifecycle
- Executes tools safely
- Tracks iterations and costs
- Handles context window limits
- Persists state
The key insight from Anthropic’s research: The harness is more important than the AI model. A good harness with Claude Sonnet will outperform a bad harness with Claude Opus.
The Two-Agent Pattern
This is the breakthrough from Anthropic’s research: use different prompts for the first session vs. subsequent sessions.
Initializer Agent (First Session Only)
Goal: Set up environment so future sessions can work effectively.
Pseudo-code:
INITIALIZER_AGENT_PROMPT = """
Analyze user requirements and set up project:
1. Parse requirements and create comprehensive feature list
2. Each feature should be:
- Small enough to complete in one session
- Testable end-to-end
- Has clear acceptance criteria
3. Create feature_list.json with format:
{
"branchName": "feature/name",
"userStories": [
{
"id": "feature-001",
"title": "Clear description",
"description": "Detailed description",
"acceptanceCriteria": ["criterion 1", "criterion 2"],
"passes": false,
"priority": 1
}
]
}
4. Create claude-progress.txt (empty initially)
5. Create init.sh script to start dev server
6. Initialize git repository
7. Make first commit: "Initial project setup"
Create a COMPREHENSIVE feature list. For a complex app, this might be 200+ features.
Each feature is one small task.
"""
Example feature list it creates:
{
"branchName": "feature/user-auth",
"userStories": [
{
"id": "auth-001",
"title": "Add user registration endpoint",
"description": "POST /api/register accepts email/password, creates user",
"acceptanceCriteria": [
"Endpoint validates email format",
"Password is hashed with bcrypt",
"Returns JWT token on success",
"Returns 400 on validation failure"
],
"passes": false,
"priority": 1
},
{
"id": "auth-002",
"title": "Add login endpoint",
"acceptanceCriteria": [...],
"passes": false,
"priority": 2
}
// ... 200+ more features
]
}
Coding Agent (Every Subsequent Session)
Goal: Make incremental progress on one feature at a time.
Pseudo-code:
CODING_AGENT_WORKFLOW = """
EACH SESSION:
1. ORIENTATION PHASE:
- Read claude-progress.txt
- Run git log --oneline -10
- Read feature_list.json
- Identify current state
2. SELECTION PHASE:
- Find first story where passes = false
- This is your task for this session
- DO NOT work on multiple stories
3. IMPLEMENTATION PHASE:
- Read relevant code files
- Implement the feature
- Write tests
- Run tests locally
4. VERIFICATION PHASE:
- Run all tests
- IF tests fail:
Revert changes
Try different approach
Repeat
- IF tests pass:
Continue to documentation phase
5. DOCUMENTATION PHASE:
- git commit -m "feat: [story title]"
- Update feature_list.json:
Set story.passes = true
- Update claude-progress.txt:
Document what was done
What works
What was learned
Next steps
6. SESSION END:
- Ensure codebase is in working state
- All tests pass
- Changes committed
- Documentation updated
"""
Why this works:
Problem before: Agent tries to build entire app in one go. Runs out of context mid-implementation. Next session finds half-finished code.
Solution: Agent only implements one feature per iteration. Always leaves code in working state. Next session reads what was done and continues.
How Multi-Context-Window Work Actually Happens
Let’s walk through a real example:
Session 1 (Initializer) - Pseudo-code
USER: "Build a task management app with auth and teams"
INITIALIZER_AGENT:
CREATE file claude-progress.txt (empty)
CREATE file feature_list.json:
200 features including:
- auth-001: User registration
- auth-002: User login
- auth-003: Password reset
- tasks-001: Create task
- tasks-002: Update task
- ... 195 more
CREATE file init.sh:
#!/bin/bash
npm install
npm run dev
RUN git init
RUN git add .
RUN git commit -m "Initial project setup"
WRITE TO claude-progress.txt:
"Project initialized. Feature list created with 200 features.
Ready to start implementation."
EXIT (context window destroyed)
Session 2 (Coding Agent, New Context) - Pseudo-code
NEW_AGENT starts:
STEP 1 - ORIENTATION:
READ claude-progress.txt:
"Project initialized. Feature list created. Ready to start."
RUN git log --oneline -5:
"abc123 Initial project setup"
READ feature_list.json:
200 stories total
0 complete
First incomplete: auth-001 (user registration)
STEP 2 - SELECTION:
selectedStory = "auth-001: User registration"
PRINT "Working on:", selectedStory.title
STEP 3 - IMPLEMENTATION:
CREATE src/api/auth.js:
export async function register(email, password) {
// validate email
// hash password
// create user in database
// return JWT token
}
CREATE tests/auth.test.js:
test('POST /register creates user', async () => {
// test implementation
})
RUN npm test:
✓ All tests pass
STEP 4 - VERIFICATION:
tests_passed = true
IF tests_passed:
PROCEED to documentation
STEP 5 - DOCUMENTATION:
RUN git add .
RUN git commit -m "feat: add user registration endpoint"
UPDATE feature_list.json:
auth-001.passes = true
UPDATE claude-progress.txt:
"=== Session 2 ===
Completed: auth-001 (user registration)
Files: src/api/auth.js, tests/auth.test.js
Tests: All passing
Next: auth-002 (login endpoint)"
EXIT (context window destroyed)
Session 3 (Coding Agent, New Context Again) - Pseudo-code
ANOTHER_NEW_AGENT starts:
STEP 1 - ORIENTATION:
READ claude-progress.txt:
"Last session: completed auth-001 (registration)
Tests passing. Next: auth-002 (login)"
RUN git log --oneline -5:
"def456 feat: add user registration endpoint
abc123 Initial project setup"
READ feature_list.json:
auth-001: passes = true ✓
auth-002: passes = false (WORK ON THIS)
STEP 2 - SELECTION:
selectedStory = "auth-002: User login"
STEP 3 - IMPLEMENTATION:
RUN cat init.sh: // See how to start server
RUN npm run dev: // Start server to test
TEST registration endpoint (verify still works):
curl POST /api/register
✓ Still works
IMPLEMENT login endpoint:
Add loginHandler to src/api/auth.js
Add tests to tests/auth.test.js
RUN tests:
✓ All pass (including previous tests)
... continues working ...
The pattern: Each session:
- Reads “memory files” (progress, git, features)
- Understands current state
- Does ONE task
- Documents results
- Exits
No context is lost because it’s all in files.
Building It: The Pseudo-Code Version
Here’s a complete minimal implementation in pseudo-code:
// Main entry point
FUNCTION main():
projectDir = getArgument(1) OR './my-project'
userMessage = getArgument(2) OR "Continue working on next feature"
harness = NEW CodingAgentHarness(projectDir, userMessage)
result = harness.initialize()
PRINT "Session complete"
PRINT result
// The harness
CLASS CodingAgentHarness:
CONSTRUCTOR(projectDir, userMessage):
this.projectDir = projectDir
this.userMessage = userMessage
this.maxIterations = 20
METHOD initialize():
// Check if first run
IF NOT fileExists(join(this.projectDir, 'claude-progress.txt')):
RETURN this.runInitializer()
ELSE:
RETURN this.runCoder()
METHOD runCoder():
messages = [
{
role: "user",
content: "Read claude-progress.txt, git log, and feature_list.json. Work on next incomplete feature."
}
]
iterations = 0
WHILE iterations < this.maxIterations:
iterations++
PRINT "Iteration", iterations
// Call AI with tools
response = callAI({
model: "claude-sonnet-4",
maxTokens: 4096,
system: CODING_SYSTEM_PROMPT,
tools: this.getTools(),
messages: messages
})
// Check if done
IF response.stopReason == "end_turn":
RETURN extractText(response)
// Handle tool calls
IF response.stopReason == "tool_use":
messages.append({
role: "assistant",
content: response.content
})
toolResults = []
FOR EACH block IN response.content:
IF block.type == "tool_use":
result = this.executeTool(block.name, block.input)
toolResults.append({
type: "tool_result",
tool_use_id: block.id,
content: stringify(result)
})
messages.append({
role: "user",
content: toolResults
})
RAISE Error("Max iterations")
METHOD executeTool(name, args):
SWITCH name:
CASE "read_file":
RETURN readFile(join(this.projectDir, args.path))
CASE "write_file":
writeFile(join(this.projectDir, args.path), args.content)
RETURN { success: true }
CASE "run_command":
output = executeCommand(args.cmd, cwd=this.projectDir)
RETURN output
CASE "list_files":
files = listDirectory(join(this.projectDir, args.dir))
RETURN files
DEFAULT:
RAISE Error("Unknown tool: " + name)
METHOD getTools():
RETURN [
{
name: "read_file",
description: "Read contents of a file",
input_schema: {
type: "object",
properties: {
path: { type: "string" }
},
required: ["path"]
}
},
{
name: "write_file",
description: "Write content to a file",
input_schema: {
type: "object",
properties: {
path: { type: "string" },
content: { type: "string" }
},
required: ["path", "content"]
}
},
{
name: "run_command",
description: "Execute a shell command",
input_schema: {
type: "object",
properties: {
cmd: { type: "string" }
},
required: ["cmd"]
}
},
{
name: "list_files",
description: "List files in directory",
input_schema: {
type: "object",
properties: {
dir: { type: "string" }
},
required: ["dir"]
}
}
]
This is the core. Everything else is safety and polish.
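If you want the same loop against a real API, here is a minimal sketch using the Anthropic TypeScript SDK (`npm install @anthropic-ai/sdk`). The model name is a placeholder for whatever current model you target, and `executeTool` is the dispatch function you write yourself:

```typescript
import Anthropic from "@anthropic-ai/sdk";

// Your tool dispatcher (read_file, write_file, run_command, ...).
declare function executeTool(name: string, input: unknown): Promise<string>;

async function runSession(
  system: string,
  tools: Anthropic.Messages.Tool[],
  firstMessage: string
) {
  const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
  const messages: Anthropic.Messages.MessageParam[] = [
    { role: "user", content: firstMessage },
  ];

  for (let iteration = 0; iteration < 20; iteration++) {
    const response = await client.messages.create({
      model: "claude-sonnet-4-20250514", // placeholder: pick your model
      max_tokens: 4096,
      system,
      tools,
      messages,
    });

    // "end_turn" (or anything that isn't a tool request) ends the session.
    if (response.stop_reason !== "tool_use") return response.content;

    // Echo the assistant turn back, then answer each tool call.
    messages.push({ role: "assistant", content: response.content });

    const results: Anthropic.Messages.ToolResultBlockParam[] = [];
    for (const block of response.content) {
      if (block.type === "tool_use") {
        results.push({
          type: "tool_result",
          tool_use_id: block.id,
          content: await executeTool(block.name, block.input),
        });
      }
    }
    messages.push({ role: "user", content: results });
  }
  throw new Error("Max iterations reached");
}
```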
The Problems You’ll Hit
Problem 1: The Agent Tries to Do Everything
Detection (pseudo-code):
FUNCTION checkProgressFile(projectDir):
progress = readFile(join(projectDir, 'claude-progress.txt'))
// Check if agent is working on multiple features
IF countOccurrences(progress, "Working on:") > 1:
RAISE Error("Agent working on multiple features - restart with stricter prompt")
Solution: Strengthen system prompt. Add explicit checks in harness.
Problem 2: The Agent Marks Features Complete Without Testing
Detection:
AFTER agent session:
IF featureMarkedComplete AND NOT testsPassed:
PRINT "WARNING: Feature marked complete but tests didn't pass"
REVERT feature_list.json changes
REQUIRE test evidence in progress notes
Solution: Require test output in progress notes. Don’t trust agent’s word.
Problem 3: The Agent Gets Stuck in Loops
Detection:
FUNCTION detectLoop(toolCallHistory):
recent = toolCallHistory.slice(-10)
signatures = []
FOR EACH call IN recent:
sig = stringify({ name: call.name, args: call.args })
signatures.append(sig)
uniqueCount = countUnique(signatures)
IF uniqueCount < length(signatures) / 2:
// More than 50% duplicates
RAISE Error("Loop detected - agent repeating itself")
Solution: Loop detection + forced reset.
Problem 4: Context Window Fills Mid-Task
Solution (pseudo-code):
FUNCTION compactContext(messages):
// Keep first 5 messages (system + initial context)
// Keep last 10 messages (recent work)
// Summarize middle messages
toKeepStart = messages.slice(0, 5)
toSummarize = messages.slice(5, -10)
toKeepEnd = messages.slice(-10)
// Ask AI to summarize middle section
summary = callAI({
messages: [{
role: "user",
content: "Summarize this conversation: what was decided, what was implemented, what worked, what didn't.
" + stringify(toSummarize)
}]
})
summaryMessage = {
role: "assistant",
content: "[Summary of previous work: " + summary + "]"
}
RETURN toKeepStart + [summaryMessage] + toKeepEnd
Problem 5: Costs Spiral Out of Control
Monitoring (pseudo-code):
CLASS CostTracker:
CONSTRUCTOR(maxCostPerSession):
this.maxCost = maxCostPerSession
this.currentCost = 0
METHOD recordAPICall(inputTokens, outputTokens):
// Example pricing (adjust for actual model)
inputCost = (inputTokens / 1000) * 0.003
outputCost = (outputTokens / 1000) * 0.015
this.currentCost += inputCost + outputCost
IF this.currentCost > this.maxCost:
RAISE Error("Budget exceeded: $" + this.currentCost + " > $" + this.maxCost)
RETURN this.currentCost
// Usage
costTracker = NEW CostTracker(maxCostPerSession=5.0)
AFTER EACH API call:
costTracker.recordAPICall(response.inputTokens, response.outputTokens)
Problem 6: The Agent Hallucinates Files
Validation (pseudo-code):
BEFORE starting agent:
// Build context of what actually exists
projectContext = {
files: listAllFiles(projectDir),
structure: getDirectoryTree(projectDir),
exports: getAvailableFunctions(projectDir)
}
contextMessage = """
Current project state:
Files: """ + join(projectContext.files, ', ') + """
Available functions: """ + join(projectContext.exports, ', ') + """
Work with what exists. Don't reference files that don't exist.
"""
// Include this in initial message to agent
What Makes This Hard
Hard Part 1: Context Management
You’re constantly fighting the context window:
CONTEXT_BUDGET = 200000 tokens
Required context:
- System prompt: 2000 tokens
- Project context: 5000 tokens
- Progress notes: 3000 tokens
- Feature list: 2000 tokens
- Recent git logs: 1000 tokens
- Conversation history: ??? tokens (grows each iteration)
Remaining for: code files, tool results, AI responses
You need smart decisions about what to include.
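You also need a way to measure usage. A crude estimator is enough to decide when to compact; the 4-characters-per-token ratio below is a rough heuristic for English text and code, not an exact count:

```typescript
// Rough estimate: ~4 characters per token. Real implementations use the
// model's tokenizer; this is good enough for "should I compact now?"
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function shouldCompact(
  history: { content: string }[],
  budget = 200_000
): boolean {
  const used = history.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  return used > budget * 0.75; // compact well before hitting the wall
}
```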
Hard Part 2: Preventing Disasters
SAFETY_CHECKS = [
"Don't delete important files",
"Don't break working code",
"Don't commit broken code",
"Don't leak secrets",
"Don't run expensive operations",
"Don't get stuck in loops"
]
FOR EACH check IN SAFETY_CHECKS:
IMPLEMENT safeguard
IMPLEMENT detection
IMPLEMENT recovery
This is a whole system.
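One cheap layer, for example, is a denylist in front of `run_command`. A sketch, with illustrative patterns; a denylist alone is not a complete defense, and real protection comes from sandboxing (containers, restricted users):

```typescript
// Illustrative check run before executing any shell command.
const DANGEROUS_PATTERNS: RegExp[] = [
  /rm\s+-rf?\s+[\/~.]/,       // recursive deletes near root, home, or cwd
  /\.git(\s|\/|$)/,           // anything that touches .git directly
  /git\s+push\s+.*--force/,   // force pushes
  /curl\s+.*\|\s*(ba)?sh/,    // pipe-to-shell installs
];

function checkCommand(cmd: string): void {
  for (const pattern of DANGEROUS_PATTERNS) {
    if (pattern.test(cmd)) {
      throw new Error(`Blocked potentially dangerous command: ${cmd}`);
    }
  }
}
```

This is how you stop the agent from deleting your `.git` folder when asked to "clean up the project."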
Hard Part 3: Testing Integration
TESTING_SYSTEM:
- Give agent browser automation tools
- Show agent test results (with screenshots)
- Verify agent actually tested
- Handle flaky tests
- Distinguish real failures from environment issues
Complex and brittle.
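To give a flavor of what browser automation means here, a sketch using Playwright (`npm install playwright`), checking the registration flow from the earlier walkthrough; the URL and selectors are assumptions about the example app:

```typescript
import { chromium } from "playwright";

// End-to-end smoke test: can a user actually register?
async function smokeTestRegistration(): Promise<boolean> {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  try {
    await page.goto("http://localhost:3000/register");
    await page.fill("#email", "test@example.com");
    await page.fill("#password", "correct-horse-battery");
    await page.click("button[type=submit]");
    // Screenshot = evidence for the progress notes.
    await page.screenshot({ path: "register-result.png" });
    await page.waitForURL("**/dashboard", { timeout: 5000 });
    return true;
  } catch {
    return false;
  } finally {
    await browser.close();
  }
}
```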
Hard Part 4: Incremental Progress
// Easy to say:
"Work on one feature at a time"
// Hard to enforce:
EVERY ITERATION:
- Detect if agent working on multiple features
- Detect if agent one-shotting everything
- Detect if agent skipping steps
- Correct behavior when violations detected
Requires constant vigilance.
Hard Part 5: Multi-Session Continuity
CONTINUITY_REQUIREMENTS:
- Perfect documentation (agent must write clearly)
- Clear state management (files must be accurate)
- Git discipline (commits must be atomic)
- Progress file accuracy (no lies or omissions)
IF ANY of these break:
Next session is confused
Work quality degrades
Agent makes wrong assumptions
One broken link breaks the chain.
Hard Part 6: Cost and Latency
PER SESSION:
- API calls: 50-100
- Cost: $1-5
- Time: 5-30 minutes
PER FEATURE (multiple sessions):
- API calls: 200-500
- Cost: $10-25
- Time: 1-2 hours
This must be:
- Acceptable to users
- Sustainable for you
- Fast enough to be useful
The reality: Building a good agentic coding tool is 10% AI, 90% infrastructure.
The AI already knows how to code. The hard part is the harness that makes it work safely, incrementally, and persistently.
What You’ve Learned
You now understand how tools like Claude Code actually work:
Core Components:
- Code Indexing - AST parsing + symbol tables + semantic search
- System Prompts - Detailed instructions that shape agent behavior
- The Harness - Infrastructure that runs the agent safely with limits
- Two-Agent Pattern - Initializer (first run) + Coding agent (subsequent runs)
- Multi-Session Work - Using files (progress, features, git) as persistent memory
- Tool Execution - Safe execution of file operations, commands, tests
- Failure Modes - What goes wrong and how to prevent it
Key Insights:
- The breakthrough isn’t better AI models, it’s better infrastructure
- The harness matters more than the model
- Memory through files works better than neural network memory
- Incremental progress requires strict discipline
- Safety requires multiple layers of checks
The Pattern (pseudo-code):
WHILE work_not_complete:
START fresh_agent_instance
agent.read_state_from_files()
agent.choose_one_task()
agent.implement_task()
agent.test_task()
IF tests_pass:
agent.commit_changes()
agent.update_state_files()
ELSE:
agent.revert_changes()
agent.try_different_approach()
agent.exit() // Context destroyed
Claude Code works because Anthropic built excellent infrastructure around Claude, not because Claude is magical.
You can build this yourself. It’s work, but it’s not magic. It’s software engineering.
The pseudo-code in this chapter shows you the structure. The actual implementation is filling in the details with your language of choice.