The context window is the maximum number of tokens a language model can read and reason over in a single call. Everything in the call (system prompt, conversation history, retrieved context, and the response itself) must fit. If the total exceeds the window, the input is truncated or the call is rejected.
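The shared budget described above can be checked before a call is sent. The following is a minimal sketch, not a definitive implementation: `estimate_tokens` is a hypothetical stand-in that assumes roughly 4 characters per token, and `CallBudget` is an illustrative name. A real system would use the tokenizer of the target model instead.

```python
from dataclasses import dataclass


def estimate_tokens(text: str) -> int:
    """Rough stand-in for a real tokenizer (assumption: ~4 characters per token)."""
    return (len(text) + 3) // 4  # ceiling division, so short strings still cost a token


@dataclass
class CallBudget:
    context_window: int       # maximum tokens the model accepts per call
    max_response_tokens: int  # tokens reserved for the response itself

    def input_limit(self) -> int:
        # The response shares the window, so the input gets what is left.
        return self.context_window - self.max_response_tokens

    def fits(self, system_prompt: str, history: list[str], retrieved: list[str]) -> bool:
        # Sum every input component that competes for the same window.
        used = estimate_tokens(system_prompt)
        used += sum(estimate_tokens(turn) for turn in history)
        used += sum(estimate_tokens(chunk) for chunk in retrieved)
        return used <= self.input_limit()
```

Reserving the response tokens up front is the design choice worth noting: an input that fills the whole window leaves no room for the model to answer.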
State in 2026.
200k tokens is the practical floor for production-grade models (Claude, GPT-5). The frontier extends to ~2M (Gemini 1.5/2.0), but quality degrades past ~100k for most workloads regardless of the advertised limit. The honest planning number is 100k usable, not the headline maximum.
Lost in the middle.
Past ~50k tokens, models reliably under-attend to information placed in the middle of the context (the "lost in the middle" phenomenon, documented since 2023). For RAG specifically, this means dumping 100 retrieved chunks into the prompt performs worse than reranking to the top 10 and concatenating those. Context added past the relevance frontier has negative value.
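The rerank-then-concatenate step can be sketched as follows. This is an illustration under stated assumptions: `keyword_overlap` is a deliberately crude, hypothetical scorer standing in for a real reranker (for example a cross-encoder), and the function names are not taken from any library.

```python
from typing import Callable


def keyword_overlap(query: str, chunk: str) -> float:
    """Toy relevance score (assumption): fraction of query words present in the chunk."""
    query_words = set(query.lower().split())
    chunk_words = set(chunk.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & chunk_words) / len(query_words)


def rerank_top_k(
    query: str,
    chunks: list[str],
    score: Callable[[str, str], float],
    k: int = 10,
) -> list[str]:
    """Sort chunks by descending relevance and keep only the best k."""
    ranked = sorted(chunks, key=lambda chunk: score(query, chunk), reverse=True)
    return ranked[:k]


def build_context(query: str, chunks: list[str], k: int = 10) -> str:
    """Concatenate the top-k chunks into the context string sent to the model."""
    return "\n\n".join(rerank_top_k(query, chunks, keyword_overlap, k))
```

With 100 retrieved chunks and `k=10`, only the ten highest-scoring chunks reach the prompt, and the other ninety never compete for the model's attention.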
Related concepts.
Context-window planning is the practical face of the token budget. When a workflow grows past the budget, the answer is rarely "move to a larger window"; it is usually reranking, summarization, or splitting into multiple calls. See also RAG for the canonical pattern that keeps context windows small.
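Splitting into multiple calls can be as simple as packing items greedily into batches that each stay under the budget. The sketch below assumes the caller supplies a `count_tokens` function. Both that parameter and the name `split_into_batches` are hypothetical, and the greedy strategy is one reasonable choice rather than the only one.

```python
from typing import Callable


def split_into_batches(
    items: list[str],
    budget_tokens: int,
    count_tokens: Callable[[str], int],
) -> list[list[str]]:
    """Greedily pack items, in order, into batches that each fit the token budget."""
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for item in items:
        cost = count_tokens(item)
        if cost > budget_tokens:
            # A single oversized item cannot be packed; it needs summarizing or splitting first.
            raise ValueError("item exceeds the per-call budget")
        if current and used + cost > budget_tokens:
            # The current batch is full: close it and start a new one.
            batches.append(current)
            current, used = [], 0
        current.append(item)
        used += cost
    if current:
        batches.append(current)
    return batches
```

Each resulting batch becomes one call. Combining the per-batch outputs (for example by summarizing them in a final call) is left to the surrounding workflow.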
Frequently asked.
- What is a context window?
- The context window is the maximum number of tokens a language model can process in a single call. The system prompt, the conversation history, the retrieved context, and the response itself all share this budget. If the total exceeds it, the input is truncated or the call is rejected.
- How big are context windows in 2026?
- 200k tokens is the practical floor for production-grade models (Claude, GPT-5). Gemini 1.5/2.0 advertises up to ~2M, but quality degrades past ~100k for most workloads regardless of the advertised limit. The honest planning number is 100k usable, not the headline maximum.
- Should I always use the longest available context window?
- No. Longer context costs more per call and quality often degrades past ~50k due to lost-in-the-middle. For RAG specifically, reranking to the top 10 chunks beats dumping 100 chunks into a long-context call. Big windows are a tool, not a strategy.
- What's 'lost in the middle'?
- A well-documented phenomenon in which language models reliably under-attend to information placed in the middle of long contexts. The effect is most prominent past ~50k tokens. The practical implication: don't trust the model to find a needle in a haystack; give it the needle with minimal hay.