Works Cited (Yes, We Did Our Homework)
We didn't read all these papers. We skimmed them like everyone else. But we skimmed them thoroughly.
A Note on Sources
This course draws from actual research, real documentation, and hard-won experience. We’re not just making things up (unlike some AI outputs we could mention).
Some of these sources are essential. Some are included because people will ask if you’ve read them. Some are genuinely excellent. We’ll tell you which is which.
Foundational Papers
The Ones You Should Actually Skim
Vaswani, A., et al. (2017). “Attention Is All You Need.” NeurIPS 2017 https://arxiv.org/abs/1706.03762
The paper that started this whole mess. Introduced the Transformer architecture. You don’t need to understand the math, but knowing this exists makes you sound informed. The title is also peak academic confidence.
Brown, T., et al. (2020). “Language Models are Few-Shot Learners.” NeurIPS 2020 https://arxiv.org/abs/2005.14165
The GPT-3 paper. 75 pages. Nobody has read all of it. The key insight: bigger models can learn from examples in the prompt. This is why few-shot prompting works.
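If you want to see that insight in code rather than in 75 pages, here's a minimal sketch of few-shot prompting using the OpenAI Python SDK. The model name and the toy sentiment labels are our placeholders, not anything from the paper; swap in whatever model you actually have access to.

```python
# Few-shot prompting: show the model a couple of worked examples in the
# prompt, then ask it to continue the pattern. No fine-tuning involved.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = """Classify the sentiment of each review as positive or negative.

Review: "The battery lasts all day." -> positive
Review: "It broke after a week." -> negative
Review: "Setup took five minutes and it just worked." ->"""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name; use what you have
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)  # expected: "positive"
```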
Wei, J., et al. (2022). “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.” NeurIPS 2022 https://arxiv.org/abs/2201.11903
Why “let’s think step by step” actually works. Short enough to actually read. Surprisingly accessible.
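For the impatient, the entire trick fits in two lines. This is a hedged illustration, not the paper's exact protocol; the arithmetic question is invented and any capable model will do.

```python
# Chain-of-thought prompting: ask the model to show its reasoning
# before giving the final answer, instead of answering directly.
question = "A cafe sells coffee for $3 and muffins for $2. I buy 2 coffees and 3 muffins. What do I pay?"

direct_prompt = question                               # often fine, sometimes wrong on multi-step math
cot_prompt = question + "\nLet's think step by step."  # nudges the model to work through 2*3 + 3*2 before answering
```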
The Ones People Name-Drop But Haven’t Read
Radford, A., et al. (2019). “Language Models are Unsupervised Multitask Learners.” OpenAI Blog https://openai.com/research/better-language-models
The GPT-2 paper. Historically important. You can skip it now.
Devlin, J., et al. (2019). “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” NAACL 2019 https://arxiv.org/abs/1810.04805
BERT was huge for embeddings. Still relevant for understanding why some models are better suited to search and retrieval than to generating text.
Raffel, C., et al. (2020). “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.” JMLR 2020 https://arxiv.org/abs/1910.10683
The T5 paper. Important for researchers. You can live without it.
Technical Documentation
Actually Useful
OpenAI API Documentation https://platform.openai.com/docs
The most complete API docs in the space. Good examples. Updated regularly. Start here when building anything.
Anthropic Claude Documentation https://docs.anthropic.com
Clean, well-organized. Their prompt engineering guide is genuinely good.
LangChain Documentation https://python.langchain.com/docs
Comprehensive but overwhelming. Use it as a reference, not a tutorial. (And maybe consider PocketFlow instead.)
PocketFlow Documentation https://github.com/The-Pocket/PocketFlow
~100 lines of framework. The documentation is the code. Refreshing.
Reference When Needed
Hugging Face Documentation https://huggingface.co/docs
Essential for working with open-source models. The model cards are genuinely helpful.
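If you've never touched an open-source model, this is roughly the on-ramp. A minimal sketch using the transformers pipeline API; it downloads whatever default model the library picks, which is fine for learning but not a specific recommendation.

```python
# Minimal Hugging Face usage: pull an open-source model from the Hub
# and run it locally through the high-level pipeline API.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # downloads a default model on first run
print(classifier("The documentation is genuinely helpful."))
# -> [{'label': 'POSITIVE', 'score': 0.99...}]
```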
Pinecone Documentation https://docs.pinecone.io
Best-documented vector database. Good for understanding RAG concepts even if you use a different provider.
Chroma Documentation https://docs.trychroma.com
Simpler than Pinecone. Good for local development and learning.
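To show why it's good for learning, here's a minimal local sketch with the chromadb Python client. The collection name and documents are made up; everything runs in-process, so you can see the retrieval half of RAG without standing up any infrastructure.

```python
# Minimal local vector search with Chroma: add a few documents,
# then query by similarity. No server, no API key.
import chromadb

client = chromadb.Client()  # in-memory client, good for experiments
collection = client.create_collection("course_notes")

collection.add(
    ids=["1", "2", "3"],
    documents=[
        "Few-shot prompting puts worked examples in the prompt.",
        "Chain-of-thought prompting asks the model to reason step by step.",
        "Vector databases retrieve documents by embedding similarity.",
    ],
)

results = collection.query(
    query_texts=["How do I get a model to explain its reasoning?"],
    n_results=1,
)
print(results["documents"])  # should surface the chain-of-thought note
```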
Blog Posts & Articles
Must-Reads
Karpathy, A. (2023). “State of GPT.” Microsoft Build 2023 Talk https://www.youtube.com/watch?v=bZQun8Y4L2A
Actually a video, but essential. Andrej Karpathy explains how LLMs work in plain English. Watch at 1.5x speed.
Wolfram, S. (2023). “What Is ChatGPT Doing … and Why Does It Work?” https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/
Long but accessible. Good for understanding the fundamentals without diving into papers.
Simon Willison’s Blog https://simonwillison.net
The best ongoing coverage of practical AI development. No hype, lots of code, healthy skepticism.
Lilian Weng’s Blog https://lilianweng.github.io
More technical than Simon’s, but excellent explanations of complex topics. Her post on prompt engineering is definitive.
Worth Your Time
Gwern’s Essays on AI https://gwern.net/
Deep, weird, thorough. Not for everyone, but genuinely insightful.
The Gradient https://thegradient.pub
Academic-adjacent but readable. Good for staying current without drowning in hype.
Chip Huyen’s Blog https://huyenchip.com/blog/
Practical ML engineering. Her posts on evaluation and production ML are excellent.
Books
Actually Good
Jurafsky, D. & Martin, J.H. “Speech and Language Processing” (3rd ed. draft) https://web.stanford.edu/~jurafsky/slp3/
Free online. The textbook for NLP. Dense but comprehensive. Use it as a reference.
Tunstall, L., et al. (2022). “Natural Language Processing with Transformers.” O’Reilly Media
Practical, code-heavy, focused on Hugging Face. Good for hands-on learning.
Ng, A. (2018). “Machine Learning Yearning.” https://www.deeplearning.ai/resources/
Free. Short. Focused on practical ML decision-making. Surprisingly useful.
If You Want to Go Deeper
Goodfellow, I., et al. (2016). “Deep Learning.” MIT Press https://www.deeplearningbook.org
The deep learning bible. Free online. You don’t need this for using AI, but it’s there if you want it.
Bishop, C. (2006). “Pattern Recognition and Machine Learning.” Springer
Classic ML textbook. Mathematically rigorous. Only if you’re going full researcher mode.
Tools & Frameworks Referenced
| Tool | URL | What It’s For |
|---|---|---|
| OpenAI API | platform.openai.com | GPT models, embeddings |
| Anthropic Claude | anthropic.com | Alternative to GPT, longer context |
| GitHub Copilot | github.com/features/copilot | Code completion |
| Cursor | cursor.sh | AI-native code editor |
| Ollama | ollama.ai | Run local models easily |
| LM Studio | lmstudio.ai | GUI for local models |
| PocketFlow | github.com/The-Pocket/PocketFlow | Minimal agent framework |
| LangChain | langchain.com | Comprehensive (complex) framework |
| Pinecone | pinecone.io | Managed vector database |
| Chroma | trychroma.com | Local vector database |
| Weights & Biases | wandb.ai | ML experiment tracking |
The “I Read a Tweet” Section
Things that influenced this course but aren’t formal citations:
- Countless Twitter/X threads from practitioners
- Hacker News discussions (the skeptical ones)
- Reddit r/LocalLLaMA for local model insights
- Discord servers where people share what actually works
- Conference talks at NeurIPS, ICML, and ACL
- Internal documentation from teams who’ve shipped AI features
- War stories from developers who learned the hard way
On Staying Current
AI moves fast. Some of these links will be outdated by the time you read this. Here’s how to stay informed without losing your mind:
- Simon Willison’s blog — Best signal-to-noise ratio
- Hacker News — Filter for the skeptical comments
- ArXiv Sanity (arxiv-sanity-lite.com) — Curated papers
- Your own experiments — Nothing beats hands-on experience
Don’t try to read everything. Read enough to stay competent, then go build things.
A Final Note
We cited real sources because this stuff matters. But here’s the uncomfortable truth: most of what you’ll learn about AI comes from using it, breaking it, and figuring out what works in your specific context.
Papers give you theory. Documentation gives you APIs. Experience gives you judgment.
You’ve got the theory and the APIs. Now go get the experience.
Last updated: January 2026
Some links may have changed. The fundamentals probably haven’t.