What Is a Token in AI?
A token is the basic unit of text that a language model reads, processes, and generates. Tokens can be whole words, parts of words (subwords), individual characters, or even punctuation marks. When you interact with an AI like GPT or Claude, your text is first broken into tokens before the model processes it.
How Tokenization Works
Why Not Just Use Words?
Using whole words as tokens creates problems:
- Huge vocabularies — Hundreds of thousands of unique words across languages
- Unknown words — Misspellings, technical terms, or new words can't be processed
- Inefficiency — Rare words consume the same space as common ones
Subword Tokenization
Modern LLMs use subword tokenization algorithms that split text into frequently occurring pieces:
- "unhappiness" → ["un", "happiness"] or ["un", "happ", "iness"]
- "ChatGPT" → ["Chat", "GPT"]
- "🚀" → [emoji token]
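Splits like these can be approximated with a greedy longest-match lookup against a fixed vocabulary. This is a simplified sketch, not how production tokenizers work (they learn merge rules from data rather than matching longest pieces), and the toy vocabulary below is hypothetical:

```python
# Toy subword vocabulary (hypothetical; real vocabularies are learned
# from large corpora and contain tens of thousands of pieces).
TOY_VOCAB = {"un", "happiness", "happ", "iness", "Chat", "GPT"}

def tokenize(word: str, vocab: set) -> list:
    """Split a word into the longest matching vocab pieces, left to right."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest possible piece first, falling back to single characters.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens

print(tokenize("unhappiness", TOY_VOCAB))  # ['un', 'happiness']
print(tokenize("ChatGPT", TOY_VOCAB))      # ['Chat', 'GPT']
```

Because every character is itself a fallback piece, nothing is ever "unknown" — the same property that lets real subword tokenizers handle misspellings and novel words.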
Common Tokenization Methods
| Method | Used By | Approach |
|---|---|---|
| Byte Pair Encoding (BPE) | GPT, LLaMA 3 | Iteratively merges the most frequent adjacent symbol pairs |
| WordPiece | BERT, DistilBERT | Like BPE, but picks the merge that most increases training-data likelihood |
| SentencePiece | T5, LLaMA 2 | Language-agnostic; works on raw text without pre-splitting on whitespace |
| Tiktoken | OpenAI models | Fast BPE implementation (a library rather than a distinct algorithm) |
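The BPE row above can be illustrated with a minimal training loop: count adjacent symbol pairs across a toy corpus and repeatedly merge the most frequent one. This is a sketch of the core idea only, not a production implementation:

```python
from collections import Counter

def most_frequent_pair(words: dict) -> tuple:
    """Count adjacent symbol pairs across the corpus, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words: dict, pair: tuple) -> dict:
    """Replace every occurrence of `pair` with one merged symbol."""
    a, b = pair
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == (a, b):
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: each word starts as a sequence of characters, with a frequency.
corpus = {tuple("low"): 5, tuple("lower"): 2, tuple("lowest"): 3}
for _ in range(3):  # three merge rounds
    pair = most_frequent_pair(corpus)
    corpus = merge_pair(corpus, pair)
    print("merged:", pair)
```

After a few rounds, frequent character runs like "low" become single symbols, which is exactly how common words end up as one token while rare words split into several.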
Token Counts in Practice
A rough rule of thumb for English text:
- 1 token ≈ 4 characters or ¾ of a word
- 100 tokens ≈ 75 words
- 1,000 tokens ≈ 750 words (about 1.5 pages)
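The rule of thumb above can be wrapped in a quick estimator. This is only an approximation based on the ~4 characters/token heuristic; exact counts require running the model's actual tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English prose (~4 characters per token).
    An approximation only; real counts vary by tokenizer and content."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("Tokens are the basic units of text."))
```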
Different languages tokenize differently:
- English is relatively efficient (~1.3 tokens per word)
- Chinese, Japanese, and Korean often require 1.5–2x more tokens than English for text of equivalent meaning
- Code typically uses more tokens than prose
Why Tokens Matter
Context Window
Every LLM has a maximum context window — the total number of tokens it can process at once. This includes both input and output tokens:
| Model | Context Window |
|---|---|
| GPT-4 Turbo | 128K tokens |
| Claude 3.5 | 200K tokens |
| Gemini 1.5 Pro | 1M+ tokens |
| LLaMA 3.1 | 128K tokens |
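Because input and output share one budget, a request must account for both before it is sent. A minimal check, assuming a 128K-token window:

```python
def fits_context(prompt_tokens: int, max_output_tokens: int,
                 context_window: int = 128_000) -> bool:
    """Input and output tokens share the same window: both must fit."""
    return prompt_tokens + max_output_tokens <= context_window

# A 100K-token prompt leaves at most 28K tokens for the response in a 128K window.
print(fits_context(100_000, 28_000))  # True
print(fits_context(100_000, 30_000))  # False
```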
Cost
API pricing for LLMs is typically per token:
- Input tokens (your prompt) are usually cheaper
- Output tokens (model's response) are usually more expensive
- Efficient prompting directly reduces costs
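Putting per-token pricing into code makes the input/output asymmetry concrete. The rates below are placeholders for illustration, not any provider's actual prices:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_1k: float = 0.003,
                  output_price_per_1k: float = 0.015) -> float:
    """Estimate API cost in dollars. Prices are hypothetical placeholders;
    check the provider's current pricing page for real rates."""
    return (input_tokens / 1000) * input_price_per_1k \
         + (output_tokens / 1000) * output_price_per_1k

# 2,000 input tokens and 500 output tokens at the placeholder rates:
print(f"${estimate_cost(2_000, 500):.4f}")
```

Note how the 500 output tokens cost more than the 2,000 input tokens here — trimming verbose responses often saves more than trimming prompts.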
Performance
- Fewer tokens = faster inference
- More context tokens = more relevant responses but slower processing
- Token efficiency affects both speed and cost at scale
Tokenization and AI Development
Understanding tokenization is essential for:
- Prompt engineering — Crafting efficient prompts within token limits
- RAG systems — Chunking documents into token-appropriate segments
- Cost optimization — Reducing unnecessary tokens in API calls
- Model evaluation — Comparing models with different tokenization schemes
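As one example, RAG pipelines commonly size chunks by an estimated token budget. A sketch using the ~1.3 tokens/word heuristic from earlier (both the budget and the heuristic are illustrative):

```python
def chunk_by_token_budget(words: list, max_tokens: int = 200,
                          tokens_per_word: float = 1.3) -> list:
    """Group words into chunks whose estimated token count stays within
    max_tokens, using a rough English tokens-per-word heuristic."""
    budget_words = int(max_tokens / tokens_per_word)
    return [words[i:i + budget_words]
            for i in range(0, len(words), budget_words)]

doc = ("Retrieval-augmented generation splits source documents into "
       "chunks small enough to embed and retrieve. " * 50).split()
chunks = chunk_by_token_budget(doc, max_tokens=200)
print(len(chunks), "chunks of up to", int(200 / 1.3), "words each")
```

Production systems typically count with the model's real tokenizer and add overlap between chunks, but the budget arithmetic is the same.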