What Is Chunking?
Chunking is the process of breaking down large documents into smaller, manageable segments (chunks) that can be individually embedded and retrieved in AI systems. It's a critical step in Retrieval-Augmented Generation (RAG) pipelines — the quality of chunking directly impacts the relevance and accuracy of retrieved information.
Why Chunking Matters
Documents can be thousands of pages long, but embedding models and LLM context windows have limits. Chunking solves this by:
- Enabling embedding — Most embedding models have token limits (512-8192 tokens)
- Improving precision — Smaller chunks return more targeted, relevant results
- Managing context — Only relevant portions are sent to the LLM, saving tokens and cost
Chunking Strategies
Fixed-Size Chunking
Split documents into chunks of a predetermined size (e.g., 500 tokens).
- Pros: Simple, consistent chunk sizes
- Cons: May split sentences or ideas mid-thought
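A fixed-size splitter can be sketched in a few lines. This version counts words rather than tokens (a real pipeline would use a tokenizer), and includes overlap so adjacent chunks share some text:

```python
def fixed_size_chunks(text, chunk_size=500, overlap=50):
    """Split text into chunks of `chunk_size` words, with `overlap`
    words shared between adjacent chunks. Word count is a rough
    stand-in for token count."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last chunk reached; avoid a tiny trailing fragment
    return chunks
```

Because the window slides by `chunk_size - overlap`, a 1,000-word document with 500-word chunks and 50-word overlap yields three chunks, not two.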
Recursive Character/Text Splitting
Split by paragraphs first, then sentences, then words if chunks are still too large.
- Pros: Respects natural text boundaries
- Cons: Variable chunk sizes
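The idea can be expressed as a short recursive function. This sketch measures size in characters and tries separators from coarsest to finest; production splitters (e.g. LangChain's `RecursiveCharacterTextSplitter`) also merge small pieces back together up to the size limit, which is omitted here for brevity:

```python
def recursive_split(text, max_len=500, separators=("\n\n", ". ", " ")):
    """Split on the coarsest separator first; recurse with finer
    separators only on pieces that are still too long."""
    if len(text) <= max_len or not separators:
        return [text]
    sep, rest = separators[0], separators[1:]
    chunks = []
    for piece in text.split(sep):
        if len(piece) <= max_len:
            chunks.append(piece)
        else:
            chunks.extend(recursive_split(piece, max_len, rest))
    return chunks
```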
Semantic Chunking
Use an embedding model to detect topic boundaries and split at semantic shifts.
- Pros: Preserves topical coherence
- Cons: More computationally expensive
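A minimal sketch of the approach: embed consecutive sentences, and start a new chunk wherever similarity drops below a threshold. The `embed` function here is a toy bag-of-words stand-in so the example is self-contained; a real system would call an embedding model, and the threshold would need tuning:

```python
import math
from collections import Counter

def embed(sentence):
    # Toy bag-of-words "embedding"; replace with a real
    # embedding model in practice.
    return Counter(sentence.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunks(sentences, threshold=0.2):
    """Start a new chunk whenever similarity to the previous
    sentence drops below `threshold` (a likely topic shift)."""
    chunks, current = [], [sentences[0]]
    for prev, sent in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(sent)) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks
```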
Document-Structure-Based
Split based on document structure (headings, sections, pages).
- Pros: Preserves document organization
- Cons: Sections may vary greatly in size
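For Markdown documents, one simple variant splits at heading lines, so each chunk is a complete section. This sketch treats any line starting with `#` as a section boundary; other formats (HTML, PDF) would need their own structure detection:

```python
def split_by_headings(markdown_text):
    """Split a Markdown document into one chunk per heading-led
    section; text before the first heading becomes its own chunk."""
    chunks, current = [], []
    for line in markdown_text.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks
```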
Agentic/Contextual Chunking
Use an LLM to decide where to split and to prepend a short contextual summary to each chunk.
- Pros: Highest quality, preserves context
- Cons: Slow and expensive at scale
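The contextual half of this strategy can be sketched as follows. The `llm` parameter is a hypothetical callable (prompt in, completion out) standing in for a real API client; the prompt wording is illustrative, not from any particular library:

```python
def contextual_chunks(document, base_chunks, llm):
    """Prepend an LLM-written sentence to each chunk situating it
    within the full document. `llm` is any callable prompt -> text
    (hypothetical; swap in a real API client)."""
    out = []
    for chunk in base_chunks:
        prompt = (
            "Here is a document:\n" + document +
            "\n\nWrite one sentence situating this chunk within it:\n" +
            chunk
        )
        context = llm(prompt)
        out.append(context + "\n\n" + chunk)
    return out
```

The cost is visible in the structure: one LLM call per chunk, on top of re-sending the full document each time, which is why this approach is expensive at scale.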
Key Chunking Parameters
| Parameter | Description | Typical Range |
|---|---|---|
| Chunk Size | Number of tokens per chunk | 256-1024 tokens |
| Chunk Overlap | Shared tokens between adjacent chunks | 50-200 tokens |
| Separator | What to split on (paragraph, sentence, character) | Varies |
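Chunk size and overlap together determine how many chunks a document produces: each new chunk advances the window by `chunk_size - overlap` tokens. A small back-of-the-envelope helper makes the relationship concrete:

```python
import math

def num_chunks(doc_tokens, chunk_size, overlap):
    """Approximate chunk count for a sliding window that advances
    by (chunk_size - overlap) tokens per chunk."""
    step = chunk_size - overlap
    return max(1, math.ceil((doc_tokens - overlap) / step))
```

For example, a 1,000-token document with 500-token chunks and 50-token overlap produces three chunks; doubling the overlap to 200 would push that to three as well, but with far more duplicated (and re-embedded) text, which is the cost side of the overlap trade-off.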
Impact on Retrieval Quality
| Chunk Size | Precision | Recall | Best For |
|---|---|---|---|
| Small (128-256 tokens) | High | Lower | Specific factual queries |
| Medium (512-1024 tokens) | Balanced | Balanced | General Q&A |
| Large (1024-2048 tokens) | Lower | Higher | Complex, multi-fact queries |
Best Practices
- Add Overlap — Prevents losing context at chunk boundaries
- Preserve Metadata — Keep source document, page number, section title with each chunk
- Test and Iterate — Evaluate retrieval quality with different chunk sizes for your specific data
- Consider Multi-Strategy — Different document types may benefit from different chunking approaches
- Add Context — Prepend section titles or document summaries to each chunk
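The metadata and context practices above can be combined in one small wrapper. This sketch accepts any chunking function and attaches provenance to each chunk; the record fields (`source`, `section`, `chunk_index`) are illustrative names, not a standard schema:

```python
def chunk_with_metadata(doc, chunker, source, section=None):
    """Wrap each chunk with provenance so retrieval results can be
    cited and filtered. `chunker` is any text -> list[str] function."""
    records = []
    for i, text in enumerate(chunker(doc)):
        records.append({
            # Prepend the section title so the chunk carries context
            # even when retrieved in isolation.
            "text": (section + "\n" + text) if section else text,
            "source": source,
            "section": section,
            "chunk_index": i,
        })
    return records
```

Because the chunker is a parameter, the same wrapper supports the multi-strategy practice: pass a structure-based splitter for Markdown, a fixed-size splitter for plain text, and so on.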