
Context Windows & Memory

The Attention Bottleneck

Difficulty: Intermediate
Duration: 12-15 min
Prerequisites: Transformer attention

What Is a Context Window?

The context window is the maximum number of tokens an LLM can process in a single forward pass. It includes everything: the system prompt, conversation history, user message, and the model's response.

Think of it as the model's "working memory." Anything inside the context window can be attended to; anything outside is invisible. The model has no persistent memory — each API call starts fresh with only the tokens you provide.

Why is it limited? The self-attention mechanism computes a score between every pair of tokens. With n tokens, that's n² scores per attention head, so doubling the context length roughly quadruples the memory and compute that standard attention needs for those scores.
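
To make the quadratic growth concrete, here is a minimal sketch that counts the pairwise scores for one attention head in one layer and estimates the memory needed to hold the full score matrix. The function names and the 2-byte (fp16) score size are assumptions chosen for illustration; real implementations vary, and some optimized kernels avoid storing the whole matrix at once.

```python
def attention_score_count(n_tokens: int) -> int:
    """Pairwise attention scores for one head in one layer: n * n."""
    return n_tokens * n_tokens


def attention_score_bytes(n_tokens: int, bytes_per_score: int = 2) -> int:
    """Memory to hold the full score matrix (assumes 2-byte fp16 scores)."""
    return attention_score_count(n_tokens) * bytes_per_score


if __name__ == "__main__":
    for n in (1_024, 2_048, 4_096, 8_192):
        mib = attention_score_bytes(n) / 2**20
        print(f"{n:>6} tokens -> {attention_score_count(n):>12,} scores, {mib:6.1f} MiB")
```

Each doubling of `n_tokens` multiplies the score count by four, which is the bottleneck described above.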

Context window sizes have grown rapidly:

| Model | Context Window |
|-------------------|----------------|
| GPT-2 (2019) | 1,024 tokens |
| GPT-3 (2020) | 2,048 tokens |
| GPT-4 (2023) | 8K / 32K |
| Claude 3 (2024) | 200K tokens |
| Gemini 1.5 (2024) | 1M tokens |

Practical impact: At roughly 0.75 English words per token, a 4K context window holds about 3,000 words, or about 6 pages. A 200K window holds ~150,000 words, an entire novel. Larger windows enable processing entire codebases, long documents, and extended conversations without losing context.
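
The word and page figures above follow from simple arithmetic. The sketch below assumes about 0.75 English words per token and about 500 words per page; both ratios are rules of thumb inferred from the numbers in this lesson, not properties of any particular tokenizer, and the helper names are hypothetical.

```python
WORDS_PER_TOKEN = 0.75  # assumption: rough average for English text
WORDS_PER_PAGE = 500    # assumption: matches "3,000 words, about 6 pages"


def approx_words(tokens: int) -> int:
    """Estimate how many English words fit in a given token count."""
    return round(tokens * WORDS_PER_TOKEN)


def approx_pages(tokens: int) -> float:
    """Estimate the page count for a given token count."""
    return approx_words(tokens) / WORDS_PER_PAGE


if __name__ == "__main__":
    for window in (4_000, 200_000):
        print(f"{window:,} tokens ~ {approx_words(window):,} words "
              f"~ {approx_pages(window):.0f} pages")
```

Actual ratios depend on the tokenizer and the language, so these are estimates for planning, not exact limits.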

Context Window Sizes Across Models

| Model | Context Length | Approx. Words | Equivalent Content |
|-------|----------------|---------------|--------------------|
| GPT-2 | 1,024 tokens | ~750 words | 1-2 pages of text |
| GPT-3 | 2,048 tokens | ~1,500 words | 3-4 pages of text |
| GPT-4 | 8,192 tokens | ~6,000 words | 12 pages, a long essay |
| GPT-4-32K | 32,768 tokens | ~24,000 words | 50 pages, a novella chapter |
| Claude 3 | 200,000 tokens | ~150,000 words | An entire novel |
| Gemini 1.5 | 1,000,000 tokens | ~750,000 words | Multiple textbooks |

Context Window Budget

| What Consumes Context | Typical Size | Notes |
|-----------------------|--------------|-------|
| System prompt | 100-2,000 tokens | Present in every request |
| Conversation history | 500-50,000 tokens | Grows with each turn |
| RAG context | 500-10,000 tokens | Retrieved documents |
| User message | 10-5,000 tokens | The current query |
| Model response | 50-4,000 tokens | Generated output |
| Total budget | Context window limit | Everything must fit |
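
As a rough illustration of the budget, the sketch below adds up the components and checks whether they fit in a given window. The `ContextBudget` class and the example token counts are hypothetical; in practice the counts would come from the tokenizer of the model being used.

```python
from dataclasses import dataclass


@dataclass
class ContextBudget:
    """Token counts for each part of a request (all values are illustrative)."""
    context_window: int
    system_prompt: int
    history: int
    rag_context: int
    user_message: int
    reserved_for_response: int

    def used(self) -> int:
        """Total tokens consumed, including space reserved for the reply."""
        return (self.system_prompt + self.history + self.rag_context
                + self.user_message + self.reserved_for_response)

    def remaining(self) -> int:
        """Tokens left over; negative means the request does not fit."""
        return self.context_window - self.used()

    def fits(self) -> bool:
        return self.remaining() >= 0


budget = ContextBudget(
    context_window=8_192,
    system_prompt=500,
    history=3_000,
    rag_context=2_000,
    user_message=200,
    reserved_for_response=1_000,
)
print(budget.used(), budget.remaining(), budget.fits())
```

Reserving space for the response up front matters because the generated output shares the same window as the input.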