What Is Chunking?
Chunking is the process of breaking down large documents into smaller, manageable segments (chunks) that can be individually embedded and retrieved in AI systems. It's a critical step in Retrieval-Augmented Generation (RAG) pipelines — the quality of chunking directly impacts the relevance and accuracy of retrieved information.
Why Chunking Matters
Documents can be thousands of pages long, but embedding models and LLM context windows have limits. Chunking solves this by:
- Enabling embedding — Most embedding models have token limits (512-8192 tokens)
- Improving precision — Smaller chunks return more targeted, relevant results
- Managing context — Only relevant portions are sent to the LLM, saving tokens and cost
Chunking Strategies
Fixed-Size Chunking
Split documents into chunks of a predetermined size (e.g., 500 tokens).
- Pros: Simple, consistent chunk sizes
- Cons: May split sentences or ideas mid-thought
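A fixed-size splitter can be sketched in a few lines. This version counts words rather than tokens (a real pipeline would use a tokenizer), and includes overlap so adjacent chunks share some text:

```python
def fixed_size_chunks(text, chunk_size=500, overlap=50):
    """Split text into chunks of `chunk_size` words, with `overlap`
    words shared between adjacent chunks. Word count is a rough
    stand-in for token count."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last chunk reached; avoid a tiny trailing fragment
    return chunks
```

Because the window slides by `chunk_size - overlap`, a 1,000-word document with 500-word chunks and 50-word overlap yields three chunks, not two.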
Recursive Character/Text Splitting
Split by paragraphs first, then sentences, then words if chunks are still too large.
- Pros: Respects natural text boundaries
- Cons: Variable chunk sizes
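The idea can be expressed as a short recursive function. This sketch measures size in characters and tries separators from coarsest to finest; production splitters (e.g. LangChain's `RecursiveCharacterTextSplitter`) also merge small pieces back together up to the size limit, which is omitted here for brevity:

```python
def recursive_split(text, max_len=500, separators=("\n\n", ". ", " ")):
    """Split on the coarsest separator first; recurse with finer
    separators only on pieces that are still too long."""
    if len(text) <= max_len or not separators:
        return [text]
    sep, rest = separators[0], separators[1:]
    chunks = []
    for piece in text.split(sep):
        if len(piece) <= max_len:
            chunks.append(piece)
        else:
            chunks.extend(recursive_split(piece, max_len, rest))
    return chunks
```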
Semantic Chunking
Use an embedding model to detect topic boundaries and split at semantic shifts.
- Pros: Preserves topical coherence
- Cons: More computationally expensive
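A minimal sketch of the approach: embed consecutive sentences, and start a new chunk wherever similarity drops below a threshold. The `embed` function here is a toy bag-of-words stand-in so the example is self-contained; a real system would call an embedding model, and the threshold would need tuning:

```python
import math
from collections import Counter

def embed(sentence):
    # Toy bag-of-words "embedding"; replace with a real
    # embedding model in practice.
    return Counter(sentence.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunks(sentences, threshold=0.2):
    """Start a new chunk whenever similarity to the previous
    sentence drops below `threshold` (a likely topic shift)."""
    chunks, current = [], [sentences[0]]
    for prev, sent in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(sent)) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks
```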
Document-Structure-Based
Split based on document structure (headings, sections, pages).
- Pros: Preserves document organization
- Cons: Sections may vary greatly in size
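For Markdown documents, one simple variant splits at heading lines, so each chunk is a complete section. This sketch treats any line starting with `#` as a section boundary; other formats (HTML, PDF) would need their own structure detection:

```python
def split_by_headings(markdown_text):
    """Split a Markdown document into one chunk per heading-led
    section; text before the first heading becomes its own chunk."""
    chunks, current = [], []
    for line in markdown_text.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks
```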
Agentic/Contextual Chunking
Use an LLM to decide where to split and to prepend a short contextual summary to each chunk.
- Pros: Highest quality, preserves context
- Cons: Slow and expensive at scale
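The contextual half of this strategy can be sketched as follows. The `llm` parameter is a hypothetical callable (prompt in, completion out) standing in for a real API client; the prompt wording is illustrative, not from any particular library:

```python
def contextual_chunks(document, base_chunks, llm):
    """Prepend an LLM-written sentence to each chunk situating it
    within the full document. `llm` is any callable prompt -> text
    (hypothetical; swap in a real API client)."""
    out = []
    for chunk in base_chunks:
        prompt = (
            "Here is a document:\n" + document +
            "\n\nWrite one sentence situating this chunk within it:\n" +
            chunk
        )
        context = llm(prompt)
        out.append(context + "\n\n" + chunk)
    return out
```

The cost is visible in the structure: one LLM call per chunk, on top of re-sending the full document each time, which is why this approach is expensive at scale.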
Key Chunking Parameters
| Parameter | Description | Typical Range |
|---|---|---|
| Chunk Size | Number of tokens per chunk | 256-1024 tokens |
| Chunk Overlap | Shared tokens between adjacent chunks | 50-200 tokens |
| Separator | What to split on (paragraph, sentence, character) | Varies |
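Chunk size and overlap together determine how many chunks a document produces: each new chunk advances the window by `chunk_size - overlap` tokens. A small back-of-the-envelope helper makes the relationship concrete:

```python
import math

def num_chunks(doc_tokens, chunk_size, overlap):
    """Approximate chunk count for a sliding window that advances
    by (chunk_size - overlap) tokens per chunk."""
    step = chunk_size - overlap
    return max(1, math.ceil((doc_tokens - overlap) / step))
```

For example, a 1,000-token document with 500-token chunks and 50-token overlap produces three chunks; doubling the overlap to 200 would push that to three as well, but with far more duplicated (and re-embedded) text, which is the cost side of the overlap trade-off.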
Impact on Retrieval Quality
| Chunk Size | Precision | Recall | Best For |
|---|---|---|---|
| Small (128-256 tokens) | High | Lower | Specific factual queries |
| Medium (512-1024 tokens) | Balanced | Balanced | General Q&A |
| Large (1024-2048 tokens) | Lower | Higher | Complex, multi-fact queries |
Best Practices
- Add Overlap — Prevents losing context at chunk boundaries
- Preserve Metadata — Keep source document, page number, section title with each chunk
- Test and Iterate — Evaluate retrieval quality with different chunk sizes for your specific data
- Consider Multi-Strategy — Different document types may benefit from different chunking approaches
- Add Context — Prepend section titles or document summaries to each chunk
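The metadata and context practices above can be combined in one small wrapper. This sketch accepts any chunking function and attaches provenance to each chunk; the record fields (`source`, `section`, `chunk_index`) are illustrative names, not a standard schema:

```python
def chunk_with_metadata(doc, chunker, source, section=None):
    """Wrap each chunk with provenance so retrieval results can be
    cited and filtered. `chunker` is any text -> list[str] function."""
    records = []
    for i, text in enumerate(chunker(doc)):
        records.append({
            # Prepend the section title so the chunk carries context
            # even when retrieved in isolation.
            "text": (section + "\n" + text) if section else text,
            "source": source,
            "section": section,
            "chunk_index": i,
        })
    return records
```

Because the chunker is a parameter, the same wrapper supports the multi-strategy practice: pass a structure-based splitter for Markdown, a fixed-size splitter for plain text, and so on.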