Building Your Own Agentic Coding Tool
I decided to build my own Cursor. How hard could it be? I thought. It's just an LLM with some tools. Three weeks later I had written 4,000 lines of harness code, debugged 47 infinite loops, and spent $340 on API calls testing edge cases. The actual AI integration? 50 lines. The system prompt? 200 lines. Everything else? Infrastructure to stop the AI from deleting my .git folder when I asked it to 'clean up the project.' I mass-applied for jobs instead.
Table of Contents
- What You’re Actually Building
- The Architecture Nobody Tells You About
- Code Indexing (Or: How It Knows Your Codebase)
- The System Prompt That Makes It Work
- The Harness (The Part That Actually Matters)
- The Two-Agent Pattern
- How Multi-Context-Window Work Actually Happens
- Building It: The Pseudo-Code Version
- The Problems You’ll Hit
- What Makes This Hard
- What You’ve Learned
What You’re Actually Building
Let’s be clear about what an “agentic coding tool” like Claude Code actually is: it’s an agent that can read your codebase, understand it, make changes, test those changes, and continue working across multiple sessions without losing context.
Strip away the marketing and you’re building:
- A code indexer (so the AI knows what code exists)
- A context manager (so the AI knows what’s relevant)
- A tool harness (so the AI can read/write files, run commands, etc.)
- A multi-session system (so work can span hours or days)
- A testing framework (so the AI knows if it broke something)
The hard part isn’t the AI. The AI (Claude, GPT-4, whatever) already knows how to code. The hard part is:
- Making it understand YOUR codebase specifically
- Making it work incrementally instead of trying to rewrite everything
- Making it continue working across multiple sessions without forgetting context
- Making it test its own work instead of confidently shipping bugs
This is infrastructure work disguised as AI work.
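To make that concrete, here is a minimal sketch of those five components as TypeScript interfaces. All of these names are illustrative, not taken from any real tool:

```typescript
// Hypothetical component boundaries; real tools slice this differently.
type Message = { role: "user" | "assistant"; content: unknown };

interface CodeIndex {                      // 1. the code indexer
  search(query: string): { path: string; line?: number }[];
}

interface ContextManager {                 // 2. the context manager
  estimateTokens(history: Message[]): number;
  compact(history: Message[]): Message[];
}

interface ToolHarness {                    // 3. the tool harness
  execute(name: string, args: object): Promise<string>;
}

interface SessionStore {                   // 4. multi-session memory
  readProgress(): string;
  writeProgress(notes: string): void;
}

interface TestRunner {                     // 5. the testing framework
  runAll(): Promise<{ passed: boolean; output: string }>;
}
```

Most of this chapter is about implementing those five boxes.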
The Architecture Nobody Tells You About
Here’s what a tool like Claude Code actually looks like under the hood:
┌─────────────────────────────────────────────┐
│                 User Input                  │
│          "Add user authentication"          │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│                Agent Harness                │
│ ┌───────────────────────────────────────┐   │
│ │ Initializer Agent (first run only)    │   │
│ │ - Set up environment                  │   │
│ │ - Create feature list                 │   │
│ │ - Initialize git repo                 │   │
│ │ - Write progress files                │   │
│ └───────────────────────────────────────┘   │
│                                             │
│ ┌───────────────────────────────────────┐   │
│ │ Coding Agent (every session)          │   │
│ │ 1. Read context (git logs, progress)  │   │
│ │ 2. Choose one task from feature list  │   │
│ │ 3. Implement it                       │   │
│ │ 4. Test it                            │   │
│ │ 5. Commit + document                  │   │
│ └───────────────────────────────────────┘   │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│                    Tools                    │
│ - read_file(path)                           │
│ - write_file(path, content)                 │
│ - list_files(directory)                     │
│ - run_command(cmd)                          │
│ - git operations                            │
│ - search_code(query) [uses index]           │
│ - run_tests()                               │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│                 Code Index                  │
│ - AST (Abstract Syntax Tree) of all files   │
│ - Symbol table (functions, classes, imports)│
│ - Dependency graph                          │
│ - Embeddings for semantic search            │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│            Your Actual Codebase             │
│         /src, /tests, /config, etc.         │
└─────────────────────────────────────────────┘
The flow:
- User asks for something
- Harness decides if this is first run (initializer) or continuation (coding agent)
- Agent reads context from files (git logs, progress notes, feature list)
- Agent uses tools to explore code, make changes, test
- Agent writes results to files for next session
- Repeat
The key insight: The agent isn’t just making changes. It’s maintaining its own memory through files. Git commits are memory. Progress notes are memory. The feature list is memory.
Code Indexing (Or: How It Knows Your Codebase)
The AI doesn’t read every file in your project every time. That would blow the context window instantly. Instead, you build an index.
What Gets Indexed
1. File Structure
fileStructure = {
"/src": {
"/components": ["Button.jsx", "Input.jsx", "Modal.jsx"],
"/services": ["api.js", "auth.js"],
"/utils": ["validation.js", "formatting.js"]
},
"/tests": {
"/unit": ["Button.test.js", "api.test.js"]
}
}
2. Symbol Table (Functions, Classes, Exports)
symbolTable = {
"src/services/auth.js": {
exports: ["login", "logout", "verifyToken"],
functions: [
{
name: "login",
params: ["email", "password"],
line: 15,
description: "Authenticates user and returns JWT token"
}
]
}
}
3. Dependencies
dependencies = {
"src/components/LoginForm.jsx": {
imports: ["src/services/auth.js", "src/components/Input.jsx"],
exports: ["LoginForm"]
}
}
4. Semantic Embeddings
For each file/function, generate an embedding (vector representation). This lets you do semantic search:
// User asks: "Where is the password hashing logic?"
// Search embeddings, find: src/services/auth.js::hashPassword
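Here is a sketch of what that lookup involves, where `embed` stands in for whichever embedding API you use (it is not a real import):

```typescript
// `embed` is a placeholder for any embedding API that maps text to a vector.
declare function embed(text: string): Promise<number[]>;

// Cosine similarity: how "close" two vectors point.
function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Index time: embed each function's name + signature + comment once.
// Query time: embed the question and rank everything by similarity.
async function semanticSearch(
  query: string,
  entries: { id: string; vector: number[] }[],
  topK = 5
): Promise<{ id: string; score: number }[]> {
  const queryVector = await embed(query);
  return entries
    .map((e) => ({ id: e.id, score: cosine(queryVector, e.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```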
How to Build the Index (Pseudo-code)
FUNCTION indexCodebase(rootDir):
index = {
files: {},
symbols: {},
dependencies: {}
}
// Walk all files
files = getAllFiles(rootDir, extensions=['.js', '.jsx', '.ts', '.tsx'])
FOR EACH filePath IN files:
code = readFile(filePath)
// Parse to AST (Abstract Syntax Tree)
TRY:
ast = parseCode(code, language=detectLanguage(filePath))
fileInfo = {
path: filePath,
functions: [],
classes: [],
imports: [],
exports: []
}
// Extract symbols using AST traversal
traverseAST(ast):
ON FunctionDeclaration(node):
fileInfo.functions.append({
name: node.name,
params: node.parameters,
line: node.lineNumber
})
ON ClassDeclaration(node):
fileInfo.classes.append({
name: node.name,
line: node.lineNumber
})
ON ImportDeclaration(node):
fileInfo.imports.append(node.source)
ON ExportDeclaration(node):
fileInfo.exports.append(node.name)
index.files[filePath] = fileInfo
CATCH ParseError:
log("Failed to parse", filePath)
CONTINUE
RETURN index
FUNCTION getAllFiles(dir, extensions):
files = []
items = listDirectory(dir)
FOR EACH item IN items:
fullPath = join(dir, item)
IF isDirectory(fullPath):
// Skip ignored directories
IF item NOT IN ['node_modules', '.git', 'dist', 'build']:
files.extend(getAllFiles(fullPath, extensions))
ELSE IF fileExtension(item) IN extensions:
files.append(fullPath)
RETURN files
What this gives you:
- List of all files
- Functions and classes in each file
- Import relationships
- Where things are defined
When the AI asks “where is the user authentication logic?” you can search the index instead of reading every file.
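For JavaScript/TypeScript codebases, a real parser does the heavy lifting. A minimal per-file indexer might look like this sketch, assuming `@babel/parser` and `@babel/traverse` are installed:

```typescript
import { readFileSync } from "fs";
import { parse } from "@babel/parser";
import traverse from "@babel/traverse";

interface FileInfo {
  functions: { name: string; params: number; line: number }[];
  classes: { name: string; line: number }[];
  imports: string[];
}

function indexFile(filePath: string): FileInfo {
  const code = readFileSync(filePath, "utf8");
  const ast = parse(code, { sourceType: "module", plugins: ["jsx"] });
  const info: FileInfo = { functions: [], classes: [], imports: [] };

  traverse(ast, {
    FunctionDeclaration(path) {
      const { id, params, loc } = path.node;
      if (id && loc) {
        info.functions.push({
          name: id.name,
          params: params.length,
          line: loc.start.line,
        });
      }
    },
    ClassDeclaration(path) {
      const { id, loc } = path.node;
      if (id && loc) info.classes.push({ name: id.name, line: loc.start.line });
    },
    ImportDeclaration(path) {
      info.imports.push(path.node.source.value);
    },
  });

  return info;
}
```

Other languages have their own parsers; tree-sitter is a common way to cover many of them with one API.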
The Search Tool (Pseudo-code)
FUNCTION searchCode(query, index):
results = []
keywords = query.toLowerCase().split(' ')
// Simple keyword search (production would use embeddings)
FOR EACH (filePath, fileInfo) IN index.files:
// Search in file path
IF any(keyword IN filePath.toLowerCase() FOR keyword IN keywords):
results.append({
type: 'file',
path: filePath,
score: 1.0
})
// Search in function names
FOR EACH func IN fileInfo.functions:
IF any(keyword IN func.name.toLowerCase() FOR keyword IN keywords):
results.append({
type: 'function',
path: filePath,
name: func.name,
line: func.line,
score: 0.9
})
// Search in class names
FOR EACH cls IN fileInfo.classes:
IF any(keyword IN cls.name.toLowerCase() FOR keyword IN keywords):
results.append({
type: 'class',
path: filePath,
name: cls.name,
line: cls.line,
score: 0.9
})
// Sort by score, return top 10
RETURN results.sortBy(score, descending=True).take(10)
How the AI uses this:
AI: "I need to understand how authentication works"
System: CALL searchCode("authentication", index)
System: Returns: [
"src/services/auth.js",
"src/middleware/authMiddleware.js"
]
AI: CALL read_file("src/services/auth.js")
AI: CALL read_file("src/middleware/authMiddleware.js")
// AI now understands auth system
The benefit: The AI only reads relevant files, not the entire codebase.
The System Prompt That Makes It Work
The system prompt is where the magic happens. This is what turns a general-purpose LLM into a coding agent.
The structure (pseudo-code):
SYSTEM_PROMPT = """
# Your Role
You are an expert software engineer working on a codebase.
Your job: make incremental progress on features while maintaining
code quality and leaving clear documentation for the next session.
# Available Tools
You have access to:
- read_file(path): Read contents of a file
- write_file(path, content): Write or overwrite a file
- list_files(directory): List files in a directory
- run_command(cmd): Execute shell commands
- search_code(query): Search the codebase index
- git_log(): View recent commits
- git_commit(message): Commit current changes
- run_tests(): Execute the test suite
# Working Style
## Start of Every Session
1. Run `pwd` to confirm working directory
2. Read `claude-progress.txt` to understand recent work
3. Read `feature_list.json` to see what needs to be done
4. Run `git log --oneline -10` to see recent commits
5. If `init.sh` exists, run it to start development server
6. Run basic smoke tests to ensure app is working
## During Work
- Work on ONE feature at a time (from feature_list.json)
- Make small, incremental changes
- Test thoroughly after each change
- If something breaks, use git to revert and try different approach
- Document your reasoning in comments
## End of Session
- Commit changes with descriptive message
- Update `claude-progress.txt` with:
- What you worked on
- What you completed
- What's working
- Any blockers or issues
- Next steps
- Update `feature_list.json`: mark completed features as "passes": true
- Leave codebase in clean, working state
# Critical Rules
## DO:
- Read git logs and progress files at start of session
- Work incrementally (one feature at a time)
- Test everything thoroughly
- Commit frequently with clear messages
- Document your progress
- Fix bugs immediately when found
## DO NOT:
- Try to implement everything at once
- Mark features complete without testing
- Leave codebase in broken state
- Remove or edit feature entries in feature_list.json (only mark them as passing)
- Make changes without understanding existing code
- Commit broken code
# Testing Requirements
For web applications:
- Use browser automation to test as real user would
- Verify features work end-to-end, not just unit tests
- Take screenshots of important steps
- If feature doesn't work, fix it before marking complete
# Error Handling
If you encounter errors:
1. Read error message carefully
2. Check git logs to see if recent changes caused it
3. Use git to revert if needed
4. Try different approach
5. Document failed attempt in progress notes
"""
What this does:
- Defines agent’s role and responsibilities
- Lists available tools (functions it can call)
- Establishes workflow (start, during, end of session)
- Sets rules (do this, don’t do that)
- Defines quality standards
The key: This prompt makes the agent behave like a disciplined engineer, not a cowboy who rewrites everything.
The Harness (The Part That Actually Matters)
The harness is the system that runs the agent. It’s not the AI itself - it’s the infrastructure around it.
What a harness does:
1. Session Management
IF isFirstRun():
useInitializerAgent()
ELSE:
useCodingAgent()
2. Context Management
contextSize = estimateTokens(conversationHistory)
IF contextSize > CONTEXT_LIMIT * 0.75:
conversationHistory = compactContext(conversationHistory)
3. Tool Execution
FOR EACH toolCall IN agentResponse.toolCalls:
result = executeTool(toolCall.name, toolCall.arguments)
sendResultBackToAgent(result)
4. Safety Limits
IF iterations > MAX_ITERATIONS:
RAISE Error("Max iterations reached")
IF toolCallCount > MAX_TOOL_CALLS:
RAISE Error("Too many tool calls - possible infinite loop")
IF elapsed > TIMEOUT:
RAISE Error("Session timeout")
IF cost > BUDGET_LIMIT:
RAISE Error("Budget exceeded")
5. State Persistence
SAVE conversationHistory TO disk
SAVE toolCallHistory TO disk
SAVE progressNotes TO disk
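State persistence can be as plain as a JSON file in the project directory. A sketch (the file name is illustrative):

```typescript
import { readFileSync, writeFileSync, existsSync } from "fs";
import { join } from "path";

// Illustrative name; any per-project location works.
const STATE_FILE = ".agent-state.json";

interface SessionState {
  conversationHistory: unknown[];
  toolCallCount: number;
  totalCostUsd: number;
}

function saveState(projectDir: string, state: SessionState): void {
  writeFileSync(join(projectDir, STATE_FILE), JSON.stringify(state, null, 2));
}

function loadState(projectDir: string): SessionState | null {
  const path = join(projectDir, STATE_FILE);
  if (!existsSync(path)) return null;
  return JSON.parse(readFileSync(path, "utf8")) as SessionState;
}
```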
The basic structure (pseudo-code):
CLASS CodingAgentHarness:
CONSTRUCTOR(projectDir, options):
this.projectDir = projectDir
this.maxIterations = options.maxIterations OR 50
this.maxToolCalls = options.maxToolCalls OR 200
this.timeout = options.timeout OR 1800000 // 30 minutes
this.codeIndex = NULL
this.conversationHistory = []
this.sessionStartTime = NULL
METHOD initialize():
// Index the codebase
PRINT "Indexing codebase..."
this.codeIndex = indexCodebase(this.projectDir)
// Check if first run
progressFile = join(this.projectDir, 'claude-progress.txt')
isFirstRun = NOT fileExists(progressFile)
IF isFirstRun:
PRINT "First run - using initializer agent"
RETURN this.runInitializerAgent()
ELSE:
PRINT "Continuing work - using coding agent"
RETURN this.runCodingAgent()
METHOD runInitializerAgent():
initPrompt = """
This is a new project. Set up environment for future sessions:
1. Create claude-progress.txt (empty, for tracking progress)
2. Create feature_list.json based on user requirements
3. Create init.sh script to start development server
4. Initialize git repository and make first commit
5. Document project structure and tech stack
Requirements: {userRequirements}
"""
RETURN this.runAgentLoop(initPrompt, INITIALIZER_SYSTEM_PROMPT)
METHOD runCodingAgent():
codingPrompt = """
Continue working on the project. Follow the workflow:
1. Read progress files and git logs
2. Choose one feature from feature_list.json
3. Implement and test it
4. Commit and document
"""
RETURN this.runAgentLoop(codingPrompt, CODING_AGENT_SYSTEM_PROMPT)
METHOD runAgentLoop(initialPrompt, systemPrompt):
this.sessionStartTime = getCurrentTime()
this.conversationHistory = [
{ role: "user", content: initialPrompt }
]
iterations = 0
totalToolCalls = 0
WHILE iterations < this.maxIterations:
// Check timeout
IF getCurrentTime() - this.sessionStartTime > this.timeout:
RAISE Error("Session timeout")
iterations++
PRINT "=== Iteration", iterations, "==="
// Check context window size
contextSize = estimateTokenCount(this.conversationHistory)
IF contextSize > 150000:
PRINT "Context getting large, applying compaction..."
this.conversationHistory = this.compactContext()
// Call LLM with tools
response = this.callLLM(systemPrompt, this.conversationHistory)
// Check if done
IF response.stopReason == "end_turn":
PRINT "Agent finished session"
RETURN this.extractFinalResponse(response)
// Execute tool calls
IF response.stopReason == "tool_use":
this.conversationHistory.append({
role: "assistant",
content: response.content
})
toolResults = this.executeTools(response.content)
totalToolCalls += length(toolResults)
IF totalToolCalls > this.maxToolCalls:
RAISE Error("Too many tool calls - possible runaway agent")
this.conversationHistory.append({
role: "user",
content: toolResults
})
RAISE Error("Max iterations reached")
METHOD executeTools(contentBlocks):
results = []
FOR EACH block IN contentBlocks:
IF block.type == "tool_use":
PRINT "Executing:", block.name, block.input
TRY:
result = this.executeTool(block.name, block.input)
results.append({
type: "tool_result",
tool_use_id: block.id,
content: stringify(result)
})
CATCH error:
PRINT "Error:", error.message
results.append({
type: "tool_result",
tool_use_id: block.id,
content: stringify({ error: error.message }),
is_error: TRUE
})
RETURN results
METHOD executeTool(name, input):
// Map tool names to implementations
tools = {
'read_file': (args) => this.readFile(args.path),
'write_file': (args) => this.writeFile(args.path, args.content),
'list_files': (args) => this.listFiles(args.directory),
'run_command': (args) => this.runCommand(args.cmd),
'search_code': (args) => searchCode(args.query, this.codeIndex),
'git_log': () => this.runCommand('git log --oneline -10'),
'git_commit': (args) => this.runCommand('git add . && git commit -m "' + args.message + '"'), // NOTE: shell-escape the message in a real implementation
'run_tests': () => this.runCommand('npm test')
}
IF name NOT IN tools:
RAISE Error("Unknown tool: " + name)
RETURN tools[name](input)
METHOD compactContext():
// Simplest strategy: drop old messages, keep only the 20 most recent.
// A better version summarizes the dropped span first (see Problem 4).
RETURN this.conversationHistory.slice(-20)
What this harness does:
- Manages agent lifecycle
- Executes tools safely
- Tracks iterations and costs
- Handles context window limits
- Persists state
The key insight from Anthropic’s research: The harness is more important than the AI model. A good harness with Claude Sonnet will outperform a bad harness with Claude Opus.
The Two-Agent Pattern
This is the breakthrough from Anthropic’s research: use different prompts for the first session vs. subsequent sessions.
Initializer Agent (First Session Only)
Goal: Set up environment so future sessions can work effectively.
Pseudo-code:
INITIALIZER_AGENT_PROMPT = """
Analyze user requirements and set up project:
1. Parse requirements and create comprehensive feature list
2. Each feature should be:
- Small enough to complete in one session
- Testable end-to-end
- Has clear acceptance criteria
3. Create feature_list.json with format:
{
"branchName": "feature/name",
"userStories": [
{
"id": "feature-001",
"title": "Clear description",
"description": "Detailed description",
"acceptanceCriteria": ["criterion 1", "criterion 2"],
"passes": false,
"priority": 1
}
]
}
4. Create claude-progress.txt (empty initially)
5. Create init.sh script to start dev server
6. Initialize git repository
7. Make first commit: "Initial project setup"
Create a COMPREHENSIVE feature list. For a complex app, this might be 200+ features.
Each feature is one small task.
"""
Example feature list it creates:
{
"branchName": "feature/user-auth",
"userStories": [
{
"id": "auth-001",
"title": "Add user registration endpoint",
"description": "POST /api/register accepts email/password, creates user",
"acceptanceCriteria": [
"Endpoint validates email format",
"Password is hashed with bcrypt",
"Returns JWT token on success",
"Returns 400 on validation failure"
],
"passes": false,
"priority": 1
},
{
"id": "auth-002",
"title": "Add login endpoint",
"acceptanceCriteria": [...],
"passes": false,
"priority": 2
}
// ... 200+ more features
]
}
Coding Agent (Every Subsequent Session)
Goal: Make incremental progress on one feature at a time.
Pseudo-code:
CODING_AGENT_WORKFLOW = """
EACH SESSION:
1. ORIENTATION PHASE:
- Read claude-progress.txt
- Run git log --oneline -10
- Read feature_list.json
- Identify current state
2. SELECTION PHASE:
- Find first story where passes = false
- This is your task for this session
- DO NOT work on multiple stories
3. IMPLEMENTATION PHASE:
- Read relevant code files
- Implement the feature
- Write tests
- Run tests locally
4. VERIFICATION PHASE:
- Run all tests
- IF tests fail:
Revert changes
Try different approach
Repeat
- IF tests pass:
Continue to documentation phase
5. DOCUMENTATION PHASE:
- git commit -m "feat: [story title]"
- Update feature_list.json:
Set story.passes = true
- Update claude-progress.txt:
Document what was done
What works
What was learned
Next steps
6. SESSION END:
- Ensure codebase is in working state
- All tests pass
- Changes committed
- Documentation updated
"""
Why this works:
Problem before: Agent tries to build entire app in one go. Runs out of context mid-implementation. Next session finds half-finished code.
Solution: Agent only implements one feature per iteration. Always leaves code in working state. Next session reads what was done and continues.
How Multi-Context-Window Work Actually Happens
Let’s walk through a real example:
Session 1 (Initializer) - Pseudo-code
USER: "Build a task management app with auth and teams"
INITIALIZER_AGENT:
CREATE file claude-progress.txt (empty)
CREATE file feature_list.json:
200 features including:
- auth-001: User registration
- auth-002: User login
- auth-003: Password reset
- tasks-001: Create task
- tasks-002: Update task
- ... 195 more
CREATE file init.sh:
#!/bin/bash
npm install
npm run dev
RUN git init
RUN git add .
RUN git commit -m "Initial project setup"
WRITE TO claude-progress.txt:
"Project initialized. Feature list created with 200 features.
Ready to start implementation."
EXIT (context window destroyed)
Session 2 (Coding Agent, New Context) - Pseudo-code
NEW_AGENT starts:
STEP 1 - ORIENTATION:
READ claude-progress.txt:
"Project initialized. Feature list created. Ready to start."
RUN git log --oneline -5:
"abc123 Initial project setup"
READ feature_list.json:
200 stories total
0 complete
First incomplete: auth-001 (user registration)
STEP 2 - SELECTION:
selectedStory = "auth-001: User registration"
PRINT "Working on:", selectedStory.title
STEP 3 - IMPLEMENTATION:
CREATE src/api/auth.js:
export async function register(email, password) {
// validate email
// hash password
// create user in database
// return JWT token
}
CREATE tests/auth.test.js:
test('POST /register creates user', async () => {
// test implementation
})
RUN npm test:
✓ All tests pass
STEP 4 - VERIFICATION:
tests_passed = true
IF tests_passed:
PROCEED to documentation
STEP 5 - DOCUMENTATION:
RUN git add .
RUN git commit -m "feat: add user registration endpoint"
UPDATE feature_list.json:
auth-001.passes = true
UPDATE claude-progress.txt:
"=== Session 2 ===
Completed: auth-001 (user registration)
Files: src/api/auth.js, tests/auth.test.js
Tests: All passing
Next: auth-002 (login endpoint)"
EXIT (context window destroyed)
Session 3 (Coding Agent, New Context Again) - Pseudo-code
ANOTHER_NEW_AGENT starts:
STEP 1 - ORIENTATION:
READ claude-progress.txt:
"Last session: completed auth-001 (registration)
Tests passing. Next: auth-002 (login)"
RUN git log --oneline -5:
"def456 feat: add user registration endpoint
abc123 Initial project setup"
READ feature_list.json:
auth-001: passes = true ✓
auth-002: passes = false (WORK ON THIS)
STEP 2 - SELECTION:
selectedStory = "auth-002: User login"
STEP 3 - IMPLEMENTATION:
RUN cat init.sh: // See how to start server
RUN npm run dev: // Start server to test
TEST registration endpoint (verify still works):
curl POST /api/register
✓ Still works
IMPLEMENT login endpoint:
Add loginHandler to src/api/auth.js
Add tests to tests/auth.test.js
RUN tests:
✓ All pass (including previous tests)
... continues working ...
The pattern: Each session:
- Reads “memory files” (progress, git, features)
- Understands current state
- Does ONE task
- Documents results
- Exits
No context is lost because it’s all in files.
Building It: The Pseudo-Code Version
Here’s a complete minimal implementation in pseudo-code:
// Main entry point
FUNCTION main():
projectDir = getArgument(1) OR './my-project'
userMessage = getArgument(2) OR "Continue working on next feature"
harness = NEW CodingAgentHarness(projectDir, userMessage)
result = harness.initialize()
PRINT "Session complete"
PRINT result
// The harness
CLASS CodingAgentHarness:
CONSTRUCTOR(projectDir, userMessage):
this.projectDir = projectDir
this.userMessage = userMessage
this.maxIterations = 20
METHOD initialize():
// Check if first run
IF NOT fileExists(join(this.projectDir, 'claude-progress.txt')):
RETURN this.runInitializer()
ELSE:
RETURN this.runCoder()
METHOD runCoder():
messages = [
{
role: "user",
content: "Read claude-progress.txt, git log, and feature_list.json. Work on next incomplete feature."
}
]
iterations = 0
WHILE iterations < this.maxIterations:
iterations++
PRINT "Iteration", iterations
// Call AI with tools
response = callAI({
model: "claude-sonnet-4",
maxTokens: 4096,
system: CODING_SYSTEM_PROMPT,
tools: this.getTools(),
messages: messages
})
// Check if done
IF response.stopReason == "end_turn":
RETURN extractText(response)
// Handle tool calls
IF response.stopReason == "tool_use":
messages.append({
role: "assistant",
content: response.content
})
toolResults = []
FOR EACH block IN response.content:
IF block.type == "tool_use":
result = this.executeTool(block.name, block.input)
toolResults.append({
type: "tool_result",
tool_use_id: block.id,
content: stringify(result)
})
messages.append({
role: "user",
content: toolResults
})
RAISE Error("Max iterations")
METHOD executeTool(name, args):
SWITCH name:
CASE "read_file":
RETURN readFile(join(this.projectDir, args.path))
CASE "write_file":
writeFile(join(this.projectDir, args.path), args.content)
RETURN { success: true }
CASE "run_command":
output = executeCommand(args.cmd, cwd=this.projectDir)
RETURN output
CASE "list_files":
files = listDirectory(join(this.projectDir, args.dir))
RETURN files
DEFAULT:
RAISE Error("Unknown tool: " + name)
METHOD getTools():
RETURN [
{
name: "read_file",
description: "Read contents of a file",
input_schema: {
type: "object",
properties: {
path: { type: "string" }
},
required: ["path"]
}
},
{
name: "write_file",
description: "Write content to a file",
input_schema: {
type: "object",
properties: {
path: { type: "string" },
content: { type: "string" }
},
required: ["path", "content"]
}
},
{
name: "run_command",
description: "Execute a shell command",
input_schema: {
type: "object",
properties: {
cmd: { type: "string" }
},
required: ["cmd"]
}
},
{
name: "list_files",
description: "List files in directory",
input_schema: {
type: "object",
properties: {
dir: { type: "string" }
},
required: ["dir"]
}
}
]
This is the core. Everything else is safety and polish.
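If you want the same loop against a real API, here is a minimal sketch using the Anthropic TypeScript SDK (`npm install @anthropic-ai/sdk`). The model name is a placeholder for whatever current model you target, and `executeTool` is the dispatch function you write yourself:

```typescript
import Anthropic from "@anthropic-ai/sdk";

// Your tool dispatcher (read_file, write_file, run_command, ...).
declare function executeTool(name: string, input: unknown): Promise<string>;

async function runSession(
  system: string,
  tools: Anthropic.Messages.Tool[],
  firstMessage: string
) {
  const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
  const messages: Anthropic.Messages.MessageParam[] = [
    { role: "user", content: firstMessage },
  ];

  for (let iteration = 0; iteration < 20; iteration++) {
    const response = await client.messages.create({
      model: "claude-sonnet-4-20250514", // placeholder: pick your model
      max_tokens: 4096,
      system,
      tools,
      messages,
    });

    // "end_turn" (or anything that isn't a tool request) ends the session.
    if (response.stop_reason !== "tool_use") return response.content;

    // Echo the assistant turn back, then answer each tool call.
    messages.push({ role: "assistant", content: response.content });

    const results: Anthropic.Messages.ToolResultBlockParam[] = [];
    for (const block of response.content) {
      if (block.type === "tool_use") {
        results.push({
          type: "tool_result",
          tool_use_id: block.id,
          content: await executeTool(block.name, block.input),
        });
      }
    }
    messages.push({ role: "user", content: results });
  }
  throw new Error("Max iterations reached");
}
```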
The Problems You’ll Hit
Problem 1: The Agent Tries to Do Everything
Detection (pseudo-code):
FUNCTION checkProgressFile(projectDir):
progress = readFile(join(projectDir, 'claude-progress.txt'))
// Check if agent is working on multiple features
IF countOccurrences(progress, "Working on:") > 1:
RAISE Error("Agent working on multiple features - restart with stricter prompt")
Solution: Strengthen system prompt. Add explicit checks in harness.
Problem 2: The Agent Marks Features Complete Without Testing
Detection:
AFTER agent session:
IF featureMarkedComplete AND NOT testsPassed:
PRINT "WARNING: Feature marked complete but tests didn't pass"
REVERT feature_list.json changes
REQUIRE test evidence in progress notes
Solution: Require test output in progress notes. Don’t trust agent’s word.
Problem 3: The Agent Gets Stuck in Loops
Detection:
FUNCTION detectLoop(toolCallHistory):
recent = toolCallHistory.slice(-10)
signatures = []
FOR EACH call IN recent:
sig = stringify({ name: call.name, args: call.args })
signatures.append(sig)
uniqueCount = countUnique(signatures)
IF uniqueCount < length(signatures) / 2:
// More than 50% duplicates
RAISE Error("Loop detected - agent repeating itself")
Solution: Loop detection + forced reset.
Problem 4: Context Window Fills Mid-Task
Solution (pseudo-code):
FUNCTION compactContext(messages):
// Keep first 5 messages (system + initial context)
// Keep last 10 messages (recent work)
// Summarize middle messages
toKeepStart = messages.slice(0, 5)
toSummarize = messages.slice(5, -10)
toKeepEnd = messages.slice(-10)
// Ask AI to summarize middle section
summary = callAI({
messages: [{
role: "user",
content: "Summarize this conversation: what was decided, what was implemented, what worked, what didn't.
" + stringify(toSummarize)
}]
})
summaryMessage = {
role: "assistant",
content: "[Summary of previous work: " + summary + "]"
}
RETURN toKeepStart + [summaryMessage] + toKeepEnd
Problem 5: Costs Spiral Out of Control
Monitoring (pseudo-code):
CLASS CostTracker:
CONSTRUCTOR(maxCostPerSession):
this.maxCost = maxCostPerSession
this.currentCost = 0
METHOD recordAPICall(inputTokens, outputTokens):
// Example pricing (adjust for actual model)
inputCost = (inputTokens / 1000) * 0.003
outputCost = (outputTokens / 1000) * 0.015
this.currentCost += inputCost + outputCost
IF this.currentCost > this.maxCost:
RAISE Error("Budget exceeded: $" + this.currentCost + " > $" + this.maxCost)
RETURN this.currentCost
// Usage
costTracker = NEW CostTracker(maxCostPerSession=5.0)
AFTER EACH API call:
costTracker.recordAPICall(response.inputTokens, response.outputTokens)
Problem 6: The Agent Hallucinates Files
Validation (pseudo-code):
BEFORE starting agent:
// Build context of what actually exists
projectContext = {
files: listAllFiles(projectDir),
structure: getDirectoryTree(projectDir),
exports: getAvailableFunctions(projectDir)
}
contextMessage = """
Current project state:
Files: """ + join(projectContext.files, ', ') + """
Available functions: """ + join(projectContext.exports, ', ') + """
Work with what exists. Don't reference files that don't exist.
"""
// Include this in initial message to agent
What Makes This Hard
Hard Part 1: Context Management
You’re constantly fighting the context window:
CONTEXT_BUDGET = 200000 tokens
Required context:
- System prompt: 2000 tokens
- Project context: 5000 tokens
- Progress notes: 3000 tokens
- Feature list: 2000 tokens
- Recent git logs: 1000 tokens
- Conversation history: ??? tokens (grows each iteration)
Remaining for: code files, tool results, AI responses
You need smart decisions about what to include.
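You also need a way to measure usage. A crude estimator is enough to decide when to compact; the 4-characters-per-token ratio below is a rough heuristic for English text and code, not an exact count:

```typescript
// Rough estimate: ~4 characters per token. Real implementations use the
// model's tokenizer; this is good enough for "should I compact now?"
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function shouldCompact(
  history: { content: string }[],
  budget = 200_000
): boolean {
  const used = history.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  return used > budget * 0.75; // compact well before hitting the wall
}
```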
Hard Part 2: Preventing Disasters
SAFETY_CHECKS = [
"Don't delete important files",
"Don't break working code",
"Don't commit broken code",
"Don't leak secrets",
"Don't run expensive operations",
"Don't get stuck in loops"
]
FOR EACH check IN SAFETY_CHECKS:
IMPLEMENT safeguard
IMPLEMENT detection
IMPLEMENT recovery
This is a whole system.
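One cheap layer, for example, is a denylist in front of `run_command`. A sketch, with illustrative patterns; a denylist alone is not a complete defense, and real protection comes from sandboxing (containers, restricted users):

```typescript
// Illustrative check run before executing any shell command.
const DANGEROUS_PATTERNS: RegExp[] = [
  /rm\s+-rf?\s+[\/~.]/,       // recursive deletes near root, home, or cwd
  /\.git(\s|\/|$)/,           // anything that touches .git directly
  /git\s+push\s+.*--force/,   // force pushes
  /curl\s+.*\|\s*(ba)?sh/,    // pipe-to-shell installs
];

function checkCommand(cmd: string): void {
  for (const pattern of DANGEROUS_PATTERNS) {
    if (pattern.test(cmd)) {
      throw new Error(`Blocked potentially dangerous command: ${cmd}`);
    }
  }
}
```

This is how you stop the agent from deleting your `.git` folder when asked to "clean up the project."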
Hard Part 3: Testing Integration
TESTING_SYSTEM:
- Give agent browser automation tools
- Show agent test results (with screenshots)
- Verify agent actually tested
- Handle flaky tests
- Distinguish real failures from environment issues
Complex and brittle.
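To give a flavor of what browser automation means here, a sketch using Playwright (`npm install playwright`), checking the registration flow from the earlier walkthrough; the URL and selectors are assumptions about the example app:

```typescript
import { chromium } from "playwright";

// End-to-end smoke test: can a user actually register?
async function smokeTestRegistration(): Promise<boolean> {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  try {
    await page.goto("http://localhost:3000/register");
    await page.fill("#email", "test@example.com");
    await page.fill("#password", "correct-horse-battery");
    await page.click("button[type=submit]");
    // Screenshot = evidence for the progress notes.
    await page.screenshot({ path: "register-result.png" });
    await page.waitForURL("**/dashboard", { timeout: 5000 });
    return true;
  } catch {
    return false;
  } finally {
    await browser.close();
  }
}
```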
Hard Part 4: Incremental Progress
// Easy to say:
"Work on one feature at a time"
// Hard to enforce:
EVERY ITERATION:
- Detect if agent working on multiple features
- Detect if agent one-shotting everything
- Detect if agent skipping steps
- Correct behavior when violations detected
Requires constant vigilance.
Hard Part 5: Multi-Session Continuity
CONTINUITY_REQUIREMENTS:
- Perfect documentation (agent must write clearly)
- Clear state management (files must be accurate)
- Git discipline (commits must be atomic)
- Progress file accuracy (no lies or omissions)
IF ANY of these break:
Next session is confused
Work quality degrades
Agent makes wrong assumptions
One broken link breaks the chain.
Hard Part 6: Cost and Latency
PER SESSION:
- API calls: 50-100
- Cost: $1-5
- Time: 5-30 minutes
PER FEATURE (multiple sessions):
- API calls: 200-500
- Cost: $10-25
- Time: 1-2 hours
This must be:
- Acceptable to users
- Sustainable for you
- Fast enough to be useful
The reality: Building a good agentic coding tool is 10% AI, 90% infrastructure.
The AI already knows how to code. The hard part is the harness that makes it work safely, incrementally, and persistently.
What You’ve Learned
You now understand how tools like Claude Code actually work:
Core Components:
- Code Indexing - AST parsing + symbol tables + semantic search
- System Prompts - Detailed instructions that shape agent behavior
- The Harness - Infrastructure that runs the agent safely with limits
- Two-Agent Pattern - Initializer (first run) + Coding agent (subsequent runs)
- Multi-Session Work - Using files (progress, features, git) as persistent memory
- Tool Execution - Safe execution of file operations, commands, tests
- Failure Modes - What goes wrong and how to prevent it
Key Insights:
- The breakthrough isn’t better AI models, it’s better infrastructure
- The harness matters more than the model
- Memory through files works better than neural network memory
- Incremental progress requires strict discipline
- Safety requires multiple layers of checks
The Pattern (pseudo-code):
WHILE work_not_complete:
START fresh_agent_instance
agent.read_state_from_files()
agent.choose_one_task()
agent.implement_task()
agent.test_task()
IF tests_pass:
agent.commit_changes()
agent.update_state_files()
ELSE:
agent.revert_changes()
agent.try_different_approach()
agent.exit() // Context destroyed
Claude Code works because Anthropic built excellent infrastructure around Claude, not because Claude is magical.
You can build this yourself. It’s work, but it’s not magic. It’s software engineering.
The pseudo-code in this chapter shows you the structure. The actual implementation is filling in the details with your language of choice.