Troubleshooting Common Problems
"It was working yesterday."
Table of Contents
- Output Quality Issues
- Technical Errors
- Performance Problems
- Cost Issues
- Integration Problems
- Quick Fixes Cheat Sheet
- When to Give Up
AI tools break in predictable ways. Here’s how to fix the most common issues.
Output Quality Issues
Problem: AI gives generic, unhelpful responses
Symptoms:
- Responses are vague
- Answers could apply to any project
- Missing specific details
Likely causes:
- Prompt is too vague
- Not enough context provided
- Wrong model for the task
Fixes:
❌ Bad: "Help me with this code"
✅ Better: "Review this Python function for security issues.
It handles user authentication in a Flask app.
Check for: SQL injection, timing attacks, password handling."
Checklist:
- Is the prompt specific about what you want?
- Did you include relevant context?
- Did you specify the format you want?
- Did you include constraints (what NOT to do)?
Problem: AI keeps hallucinating facts/APIs/functions
Symptoms:
- References non-existent libraries
- Suggests deprecated methods
- Makes up API endpoints
Likely causes:
- Training data cutoff (doesn’t know recent changes)
- Model confidently guessing
- Ambiguous library names
Fixes:
- Specify versions explicitly:
Use React 18 with the new hooks API.
Use Python 3.11+ syntax.
Use Node.js v20 LTS APIs.
- Provide documentation:
Here's the actual API documentation:
{paste relevant docs}
Based on this documentation, write...
- Ask for verification:
Before providing code, verify that all imports and
function calls exist. If you're unsure about an API,
say so rather than guessing.
- Use RAG: Include actual documentation in context.
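The RAG fix above can be sketched as a simple prompt builder; `groundedPrompt` and its parameters are illustrative names, and the actual retrieval of `docs` is not shown:

```typescript
// Sketch: ground the model in pasted documentation so it answers
// from the docs instead of guessing. Retrieval of `docs` happens
// elsewhere; names here are illustrative.
function groundedPrompt(docs: string, task: string): string {
  return [
    "Here is the actual API documentation:",
    "",
    docs,
    "",
    `Based only on this documentation, ${task}`,
  ].join("\n");
}
```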
Problem: AI ignores my instructions
Symptoms:
- Asked for X, got Y
- Specifically said “don’t do Z”, it did Z
- Format instructions ignored
Likely causes:
- Instructions buried in long prompt
- Conflicting instructions
- Model “helpfully” overriding your choices
Fixes:
- Put critical instructions at the start AND end:
IMPORTANT: Return only JSON, no explanation.
{rest of prompt}
Remember: JSON only, no other text.
- Use explicit formatting:
Format your response EXACTLY like this:
[ANALYSIS]
{your analysis here}
[/ANALYSIS]
[CODE]
{your code here}
[/CODE]
- Remove ambiguity:
❌ "Keep it short" (what's short?)
✅ "Maximum 3 sentences"
✅ "Under 100 words"
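The instruction "sandwich" above is easy to automate. This helper is a sketch (`sandwichPrompt` is an assumed name, not a library function):

```typescript
// Repeat critical instructions before and after the prompt body,
// since instructions buried in the middle are the most often ignored.
function sandwichPrompt(critical: string, body: string): string {
  return `IMPORTANT: ${critical}\n\n${body}\n\nRemember: ${critical}`;
}
```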
Problem: Code has bugs/doesn’t compile
Symptoms:
- Syntax errors
- Type mismatches
- Missing imports
Likely causes:
- Incomplete context
- Model mixing language versions
- Framework conventions mismatch
Fixes:
- Specify exact environment:
TypeScript 5.3 with strict mode
React 18 with functional components only
Node.js ESM (import/export, not require)
- Include your existing types/interfaces:
Use these existing types:
{paste your interfaces}
- Ask for complete code:
Provide complete, runnable code including all imports.
Do not use placeholder comments like "// rest of code here".
- Request self-review:
After writing the code, check for:
- Missing imports
- Type errors
- Syntax errors
Fix any issues before responding.
Problem: Responses are too long/short
Symptoms:
- Asked for summary, got essay
- Asked for detailed explanation, got one sentence
Fixes:
For too long:
Respond in 3 sentences maximum.
Be concise. No preamble or summary.
Just the code, no explanation needed.
For too short:
Provide a detailed explanation with examples.
Include: {list specific things to include}
Aim for approximately 500 words.
Technical Errors
Problem: Rate limit errors (429)
Symptoms:
Error: 429 Too Many Requests
Rate limit exceeded
Fixes:
- Implement exponential backoff:
```typescript
async function withRetry<T>(fn: () => Promise<T>, maxRetries = 3): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (e: any) {
      lastError = e;
      if (e.status === 429 && i < maxRetries - 1) {
        // Back off 1s, 2s, 4s... before retrying
        await new Promise((resolve) => setTimeout(resolve, 2 ** i * 1000));
        continue;
      }
      throw e;
    }
  }
  throw lastError;
}
```
- Check your tier limits: You might need to upgrade.
- Add request queuing: Don’t fire parallel requests.
- Cache responses: Don’t ask the same question twice.
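The caching fix is often the cheapest win. A minimal in-memory version might look like this (a sketch: a production cache would add a TTL and size bound, and `callModel` stands in for your real API call):

```typescript
// Cache responses keyed by exact prompt text, so identical prompts
// never hit the API twice. `callModel` is a placeholder for your
// real API call.
const cache = new Map<string, string>();

async function cachedCall(
  prompt: string,
  callModel: (p: string) => Promise<string>
): Promise<string> {
  const hit = cache.get(prompt);
  if (hit !== undefined) return hit; // identical prompt: no API call
  const result = await callModel(prompt);
  cache.set(prompt, result);
  return result;
}
```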
Problem: Context length exceeded
Symptoms:
Error: Maximum context length exceeded
Error: Input too long
Fixes:
- Truncate input:
```typescript
function truncateToTokenLimit(text: string, maxTokens: number): string {
  // Rough estimate: 1 token ≈ 4 characters
  const maxChars = maxTokens * 4;
  if (text.length > maxChars) {
    return text.slice(0, maxChars) + "\n[truncated]";
  }
  return text;
}
```
- Summarize context: Ask the AI to summarize the previous conversation.
- Use a sliding window: Keep only recent messages.
- Chunk large documents: Process them in pieces.
- Use a model with a larger context window: Claude supports 200K tokens; many models offer less.
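A sliding window can be as simple as walking backwards through the history until a character budget (a rough proxy for tokens) is spent; the `Msg` shape here is illustrative:

```typescript
interface Msg { role: "user" | "assistant"; content: string; }

// Keep the most recent messages that fit in the budget; always keep
// at least the latest message so the request is never empty.
function slidingWindow(messages: Msg[], maxChars: number): Msg[] {
  const kept: Msg[] = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    used += messages[i].content.length;
    if (used > maxChars && kept.length > 0) break;
    kept.unshift(messages[i]);
  }
  return kept;
}
```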
Problem: Timeout errors
Symptoms:
Error: Request timeout
Error: Connection timed out
Likely causes:
- Complex prompt taking too long
- Network issues
- Service overloaded
Fixes:
- Increase timeout:
```typescript
const response = await client.messages.create({
  // ... options
}, {
  timeout: 120000 // 2 minutes
});
```
- Use streaming for long responses:
```typescript
const stream = await client.messages.create({
  // ... options
  stream: true
});
```
- Break into smaller requests: Don’t ask for everything at once.
Problem: API key errors
Symptoms:
Error: Invalid API key
Error: Authentication failed
Error: 401 Unauthorized
Checklist:
- Key is actually set in environment
- No trailing whitespace in key
- Key hasn’t been rotated/revoked
- Using correct key for correct provider
- Not hitting free tier limits
Debug:
```shell
# Check if the env var is set (prints first 10 chars only)
echo $ANTHROPIC_API_KEY | head -c 10

# Verify it's the right format:
# Anthropic keys start with "sk-ant-"
# OpenAI keys start with "sk-"
```
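The same sanity check can run in code before the first API call; `looksLikeKey` is an illustrative helper, not part of any SDK:

```typescript
// Catch the two most common key mistakes: stray whitespace from a
// copy/paste, and a key from the wrong provider.
function looksLikeKey(
  key: string | undefined,
  provider: "anthropic" | "openai"
): boolean {
  if (!key) return false;
  if (key !== key.trim()) return false; // leading/trailing whitespace
  return provider === "anthropic"
    ? key.startsWith("sk-ant-")
    : key.startsWith("sk-");
}
```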
Performance Problems
Problem: Responses are too slow
Symptoms:
- API calls taking 10+ seconds
- Users complaining about latency
Fixes:
- Use streaming:
```typescript
// User sees content appearing progressively
const stream = await client.messages.create({
  stream: true,
  // ...
});
for await (const chunk of stream) {
  process.stdout.write(chunk.delta?.text || '');
}
```
- Use faster models:
- Claude: Haiku < Sonnet < Opus
- OpenAI: GPT-3.5 < GPT-4
- Reduce input size: Less context = faster processing.
- Request shorter outputs: e.g. "Respond in under 100 words."
- Use caching: Same question = cached answer.
Problem: High memory usage (local models)
Symptoms:
- System slowing down
- Out of memory errors
- Swap thrashing
Fixes:
- Use a smaller model: 7B instead of 70B.
- Use a quantized model: Q4_K_M is a good quality/size balance, e.g. `ollama run llama3.1:8b-instruct-q4_K_M`.
- Reduce the context length: `ollama run` has no context-length flag; set `num_ctx` instead, with `/set parameter num_ctx 4096` inside the interactive session or `PARAMETER num_ctx 4096` in a Modelfile.
- Check actual requirements: published memory figures are often optimistic.
Cost Issues
Problem: API costs unexpectedly high
Symptoms:
- Surprise bill
- Usage much higher than expected
Debug:
- Check what’s actually being sent:
```typescript
console.log('Input tokens:', countTokens(prompt));
console.log('Prompt preview:', prompt.slice(0, 500));
```
- Audit your calls:
```typescript
let totalTokens = 0;

async function trackedCall(prompt: string) {
  const response = await client.messages.create({...});
  totalTokens += response.usage.input_tokens;
  totalTokens += response.usage.output_tokens;
  console.log(`Total tokens so far: ${totalTokens}`);
  return response;
}
```
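`countTokens` above is a stand-in; if you don't have the provider's tokenizer handy, the same ~4-characters-per-token rule of thumb used for truncation gives a usable estimate:

```typescript
// Rough estimate: 1 token ≈ 4 characters of English text.
// Use your provider's tokenizer when you need exact counts.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```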
Common causes:
- Conversation history growing unbounded
- System prompt included in every call
- Retry loops without limits
- Logging prompts that include full context
Fixes:
- Set up billing alerts
- Implement per-request budgets
- Truncate conversation history
- Cache identical requests
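A per-request budget can be enforced before the call by pricing the worst case (full input plus the maximum output you allow). This is a sketch; the per-1K-token rates are placeholders, so check your provider's current pricing:

```typescript
// Reject a request whose worst-case cost would exceed the budget.
// Prices are per 1K tokens and are placeholder values.
function withinBudget(
  inputTokens: number,
  maxOutputTokens: number,
  pricePer1kIn: number,
  pricePer1kOut: number,
  budgetUsd: number
): boolean {
  const worstCase =
    (inputTokens / 1000) * pricePer1kIn +
    (maxOutputTokens / 1000) * pricePer1kOut;
  return worstCase <= budgetUsd;
}
```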
Problem: Using expensive model for simple tasks
Diagnosis: Are you using Opus/GPT-4 for everything?
Fix: Route to appropriate models:
```typescript
function selectModel(task: string): string {
  const simpleTasks = ['classification', 'extraction', 'yes/no'];
  if (simpleTasks.some(t => task.includes(t))) {
    return 'claude-3-5-haiku-20241022';
  }
  // Everything else (reasoning, analysis, creative work) gets
  // the stronger model.
  return 'claude-sonnet-4-20250514';
}
```
Integration Problems
Problem: Inconsistent response format
Symptoms:
- Sometimes JSON, sometimes text
- Structure varies between calls
- Parsing fails randomly
Fixes:
- Use structured output features:
```typescript
// Anthropic
const response = await client.messages.create({
  // ...
  tools: [{
    name: "format_response",
    description: "Format the response",
    input_schema: {
      type: "object",
      properties: {
        answer: { type: "string" },
        confidence: { type: "number" }
      },
      required: ["answer", "confidence"]
    }
  }],
  tool_choice: { type: "tool", name: "format_response" }
});
```
- Validate and retry:
```typescript
async function getStructuredResponse(prompt: string, schema: any) {
  for (let attempt = 0; attempt < 3; attempt++) {
    const response = await client.messages.create({...});
    try {
      const parsed = JSON.parse(response.content);
      validateSchema(parsed, schema);
      return parsed;
    } catch {
      // Add a reminder to the prompt and retry
      prompt += "\n\nIMPORTANT: Respond ONLY with valid JSON.";
    }
  }
  throw new Error("Failed to get structured response");
}
```
- Be extremely explicit:
Respond with ONLY a JSON object, no other text.
Do not include markdown code blocks.
Do not include explanation.
Just the JSON.
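Even with those instructions, models occasionally wrap JSON in markdown fences anyway. A tolerant parser strips them before parsing (a sketch; `parseModelJson` is an assumed name):

```typescript
// Strip optional markdown code fences before parsing, since models
// sometimes add them despite instructions not to.
function parseModelJson(raw: string): unknown {
  const stripped = raw
    .trim()
    .replace(/^```(?:json)?\s*/i, "")
    .replace(/\s*```$/, "");
  return JSON.parse(stripped);
}
```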
Problem: Streaming not working
Symptoms:
- No chunks received
- All content at once
- Connection hanging
Checklist:
- Using the `stream: true` option
- Using async iteration properly
- Not buffering the entire response
- Handling stream errors
Example fix:
```typescript
try {
  const stream = await client.messages.create({
    stream: true,
    // ...
  });
  for await (const event of stream) {
    if (event.type === 'content_block_delta') {
      process.stdout.write(event.delta.text);
    }
  }
} catch (e) {
  console.error('Stream error:', e);
}
```
Problem: Tool/function calls failing
Symptoms:
- Model doesn’t call tools
- Wrong tool called
- Invalid arguments
Fixes:
- Better tool descriptions:
```typescript
{
  name: "search_docs",
  description: `Search internal documentation.
Use this when the user asks about: company policies,
technical specs, or internal processes.
Do NOT use for general knowledge questions.`,
  input_schema: {
    // Be explicit about expected values
  }
}
```
- Validate tool calls:
```typescript
function validateToolCall(call: ToolCall): boolean {
  const tool = tools.find(t => t.name === call.name);
  if (!tool) return false;
  // Validate arguments against the schema
  return validateSchema(call.arguments, tool.input_schema);
}
```
- Handle gracefully:
```typescript
if (!validateToolCall(call)) {
  // Ask for clarification instead of crashing
  return "I wasn't able to understand that request. Could you rephrase?";
}
```
Quick Fixes Cheat Sheet
| Problem | First thing to try |
|---|---|
| Generic responses | Add more context to prompt |
| Hallucinations | Include actual docs in prompt |
| Ignores instructions | Move instructions to start AND end |
| Code bugs | Specify exact versions |
| Rate limits | Add exponential backoff |
| Context exceeded | Truncate or summarize input |
| Slow responses | Use streaming |
| High costs | Route to cheaper models |
| Inconsistent format | Use structured output tools |
| Tool calls failing | Improve tool descriptions |
When to Give Up
Sometimes the AI just can’t do what you want. Signs that it’s time to try a different approach:
- Same error after 5+ different prompts
- Error rate above 30% even with good prompts
- Task requires real-time information
- Task requires guaranteed correctness
- Cost per task exceeds value of task
Alternatives to consider:
- Traditional programming
- Different AI model
- Human in the loop
- Hybrid approach (AI assists, human decides)
Remember: If you’ve been debugging the same AI issue for an hour, take a break. Fresh eyes often spot what tired eyes miss.