What Is a Context Window?
A context window (also called context length) is the maximum number of tokens that a large language model can process in a single interaction. It defines the total "working memory" of the model — encompassing the system prompt, conversation history, retrieved documents, and the generated response. Any information beyond the context window is invisible to the model.
Why Context Windows Matter
The context window directly impacts what an LLM can do:
- Small window (4K tokens) — Can handle short conversations and simple queries
- Medium window (32K–128K tokens) — Can process long documents, code files, or extended conversations
- Large window (200K–1M+ tokens) — Can analyze entire books, codebases, or massive document collections
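A quick way to see which tier a piece of text falls into is to estimate its token count. The heuristic below uses the common rule of thumb of roughly four characters per token for English text; it is a sketch, not a real tokenizer, and actual counts vary by model and content.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token for English text.
    Real tokenizers (BPE-based) will differ, especially for code
    and non-English text."""
    return max(1, len(text) // 4)

def fits_in_window(text: str, window_tokens: int = 4_000) -> bool:
    """Check whether a text plausibly fits a given context window."""
    return estimate_tokens(text) <= window_tokens
```

For anything near the limit, use the model provider's actual tokenizer rather than this heuristic.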
Context Window Sizes
| Model | Context Window | Approximate Pages |
|---|---|---|
| GPT-3.5 | 4K / 16K tokens | 3–12 pages |
| GPT-4 Turbo | 128K tokens | ~96 pages |
| Claude 3.5 Sonnet | 200K tokens | ~150 pages |
| Gemini 1.5 Pro | 1M+ tokens | ~750+ pages |
| Llama 3.1 | 128K tokens | ~96 pages |
How Context Windows Work
Input + Output = Total Context
The context window includes everything — your prompt, system instructions, conversation history, retrieved documents, AND the model's response. A 128K context window means the sum of all input and output tokens cannot exceed 128K.
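Because input and output share one budget, a long prompt directly shrinks the space left for the response. A minimal sketch of that accounting (the 128K window and 4,096-token output cap are illustrative defaults, not any specific model's limits):

```python
def remaining_output_budget(input_tokens: int,
                            context_window: int = 128_000,
                            max_output: int = 4_096) -> int:
    """Tokens left for the model's response after the prompt is
    counted against the shared context window, capped by the
    model's per-response output limit."""
    remaining = context_window - input_tokens
    return max(0, min(remaining, max_output))
```

For example, a 127,000-token prompt against a 128K window leaves only 1,000 tokens for the response, even if the model could otherwise generate more.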
Attention Mechanism
The transformer architecture processes all tokens in the context window through self-attention, where every token can attend to every other token. This is why longer context windows are computationally expensive — the cost scales quadratically with length.
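The quadratic cost is easy to see by counting attention score computations: every token attends to every other token, so an n-token context requires on the order of n² pairwise scores.

```python
def attention_pairs(n_tokens: int) -> int:
    """Self-attention computes one score per (query, key) pair,
    so the work grows with the square of the sequence length."""
    return n_tokens * n_tokens

# Doubling the context quadruples the attention work.
ratio = attention_pairs(8_192) / attention_pairs(4_096)  # 4.0
```

This is why going from a 4K to a 1M context is not a 250x increase in attention cost but, naively, a 62,500x one; long-context models rely on optimizations (and often approximations) to stay tractable.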
Context Window vs. Memory
LLMs have no true long-term memory — the context window is their entire "working memory." Once a conversation exceeds the context window, earlier messages are typically:
- Truncated (removed from the start)
- Summarized (compressed into shorter form)
- Lost entirely
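The truncation strategy can be sketched in a few lines: walk the conversation from newest to oldest, keep messages until the token budget runs out, and drop the rest. The per-message token counter here reuses the rough 4-characters-per-token heuristic and is an assumption, not a real tokenizer.

```python
def truncate_history(messages: list[str], budget_tokens: int) -> list[str]:
    """Keep the most recent messages whose combined (estimated)
    token count fits the budget; older messages fall off the front."""
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):          # newest first
        cost = max(1, len(msg) // 4)        # rough token estimate
        if used + cost > budget_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))             # restore chronological order
```

Summarization-based approaches replace the dropped prefix with a compressed summary instead of discarding it outright.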
Managing Context Effectively
- Retrieval-Augmented Generation (RAG) — Retrieve only relevant information instead of stuffing everything into context
- Conversation Summarization — Periodically summarize long conversations to save space
- Chunking — Break large documents into relevant sections and only include what's needed
- System Prompt Optimization — Keep system instructions concise
- Priority Ordering — Place the most important information where the model attends most strongly
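The chunking strategy above can be sketched as a simple splitter. Overlapping the chunks is a common trick so that a sentence falling on a chunk boundary still appears intact in at least one chunk; the character-based sizes here are illustrative (production systems usually chunk by tokens or by semantic units like paragraphs).

```python
def chunk_document(text: str, chunk_size: int = 500,
                   overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size character chunks so
    content near a boundary is never cut out of every chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

In a RAG pipeline, only the chunks most relevant to the query are then placed into the context window.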
The "Lost in the Middle" Problem
Research shows that LLMs pay more attention to information at the beginning and end of the context window, sometimes missing crucial details in the middle. This means placement of information within the context matters, not just whether it's included.
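One practical response to this finding is to reorder retrieved passages so the most relevant ones land at the edges of the context rather than the middle. The interleaving scheme below is one simple way to do that; it is a sketch of the idea, not a method prescribed by the research.

```python
def order_for_attention(chunks_by_relevance: list[str]) -> list[str]:
    """Given chunks sorted most-relevant-first, alternate them
    between the front and back of the context so the least
    relevant chunks end up in the weakly-attended middle."""
    front: list[str] = []
    back: list[str] = []
    for i, chunk in enumerate(chunks_by_relevance):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]
```

With five chunks ranked 1 (best) to 5, this yields the order 1, 3, 5, 4, 2: the two strongest chunks sit at the start and end, and the weakest sits in the middle.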