What Is a Token in AI?
A token is the basic unit of text that a language model reads, processes, and generates. Tokens can be whole words, parts of words (subwords), individual characters, or even punctuation marks. When you interact with an AI like GPT or Claude, your text is first broken into tokens before the model processes it.
How Tokenization Works
Why Not Just Use Words?
Using whole words as tokens creates problems:
- Huge vocabularies — Hundreds of thousands of unique words across languages
- Unknown words — Misspellings, technical terms, or new words can't be processed
- Inefficiency — Rare words consume the same space as common ones
Subword Tokenization
Modern LLMs use subword tokenization algorithms that split text into frequently occurring pieces:
- "unhappiness" → ["un", "happiness"] or ["un", "happ", "iness"]
- "ChatGPT" → ["Chat", "GPT"]
- "🚀" → [emoji token]
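Splits like these can be approximated with a greedy longest-match lookup against a fixed vocabulary. This is a simplified sketch, not how production tokenizers work (they learn merge rules from data rather than matching longest pieces), and the toy vocabulary below is hypothetical:

```python
# Toy subword vocabulary (hypothetical; real vocabularies are learned
# from large corpora and contain tens of thousands of pieces).
TOY_VOCAB = {"un", "happiness", "happ", "iness", "Chat", "GPT"}

def tokenize(word: str, vocab: set) -> list:
    """Split a word into the longest matching vocab pieces, left to right."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest possible piece first, falling back to single characters.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens

print(tokenize("unhappiness", TOY_VOCAB))  # ['un', 'happiness']
print(tokenize("ChatGPT", TOY_VOCAB))      # ['Chat', 'GPT']
```

Because every character is itself a fallback piece, nothing is ever "unknown" — the same property that lets real subword tokenizers handle misspellings and novel words.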
Common Tokenization Methods
| Method | Used By | Approach |
|---|---|---|
| Byte Pair Encoding (BPE) | GPT, LLaMA 3 | Iteratively merges the most frequent adjacent symbol pairs |
| WordPiece | BERT, DistilBERT | Like BPE, but picks the merge that most increases training-data likelihood |
| SentencePiece | T5, LLaMA 2 | Language-agnostic; works on raw text without pre-splitting on whitespace |
| Tiktoken | OpenAI models | Fast BPE implementation (a library rather than a distinct algorithm) |
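The BPE row above can be illustrated with a minimal training loop: count adjacent symbol pairs across a toy corpus and repeatedly merge the most frequent one. This is a sketch of the core idea only, not a production implementation:

```python
from collections import Counter

def most_frequent_pair(words: dict) -> tuple:
    """Count adjacent symbol pairs across the corpus, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words: dict, pair: tuple) -> dict:
    """Replace every occurrence of `pair` with one merged symbol."""
    a, b = pair
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == (a, b):
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: each word starts as a sequence of characters, with a frequency.
corpus = {tuple("low"): 5, tuple("lower"): 2, tuple("lowest"): 3}
for _ in range(3):  # three merge rounds
    pair = most_frequent_pair(corpus)
    corpus = merge_pair(corpus, pair)
    print("merged:", pair)
```

After a few rounds, frequent character runs like "low" become single symbols, which is exactly how common words end up as one token while rare words split into several.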
Token Counts in Practice
A rough rule of thumb for English text:
- 1 token ≈ 4 characters or ¾ of a word
- 100 tokens ≈ 75 words
- 1,000 tokens ≈ 750 words (about 1.5 pages)
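The rule of thumb above can be wrapped in a quick estimator. This is only an approximation based on the ~4 characters/token heuristic; exact counts require running the model's actual tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English prose (~4 characters per token).
    An approximation only; real counts vary by tokenizer and content."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("Tokens are the basic units of text."))
```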
Different languages tokenize differently:
- English is relatively efficient (~1.3 tokens per word)
- Chinese, Japanese, and Korean often require 1.5–2x more tokens than English for text of equivalent meaning
- Code typically uses more tokens than prose
Why Tokens Matter
Context Window
Every LLM has a maximum context window — the total number of tokens it can process at once. This includes both input and output tokens:
| Model | Context Window |
|---|---|
| GPT-4 Turbo | 128K tokens |
| Claude 3.5 | 200K tokens |
| Gemini 1.5 Pro | 1M+ tokens |
| LLaMA 3.1 | 128K tokens |
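Because input and output share one budget, a request must account for both before it is sent. A minimal check, assuming a 128K-token window:

```python
def fits_context(prompt_tokens: int, max_output_tokens: int,
                 context_window: int = 128_000) -> bool:
    """Input and output tokens share the same window: both must fit."""
    return prompt_tokens + max_output_tokens <= context_window

# A 100K-token prompt leaves at most 28K tokens for the response in a 128K window.
print(fits_context(100_000, 28_000))  # True
print(fits_context(100_000, 30_000))  # False
```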
Cost
API pricing for LLMs is typically per token:
- Input tokens (your prompt) are usually cheaper
- Output tokens (model's response) are usually more expensive
- Efficient prompting directly reduces costs
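Putting per-token pricing into code makes the input/output asymmetry concrete. The rates below are placeholders for illustration, not any provider's actual prices:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_1k: float = 0.003,
                  output_price_per_1k: float = 0.015) -> float:
    """Estimate API cost in dollars. Prices are hypothetical placeholders;
    check the provider's current pricing page for real rates."""
    return (input_tokens / 1000) * input_price_per_1k \
         + (output_tokens / 1000) * output_price_per_1k

# 2,000 input tokens and 500 output tokens at the placeholder rates:
print(f"${estimate_cost(2_000, 500):.4f}")
```

Note how the 500 output tokens cost more than the 2,000 input tokens here — trimming verbose responses often saves more than trimming prompts.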
Performance
- Fewer tokens = faster inference
- More context tokens = more relevant responses but slower processing
- Token efficiency affects both speed and cost at scale
Tokenization and AI Development
Understanding tokenization is essential for:
- Prompt engineering — Crafting efficient prompts within token limits
- RAG systems — Chunking documents into token-appropriate segments
- Cost optimization — Reducing unnecessary tokens in API calls
- Model evaluation — Comparing models with different tokenization schemes
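As one example, RAG pipelines commonly size chunks by an estimated token budget. A sketch using the ~1.3 tokens/word heuristic from earlier (both the budget and the heuristic are illustrative):

```python
def chunk_by_token_budget(words: list, max_tokens: int = 200,
                          tokens_per_word: float = 1.3) -> list:
    """Group words into chunks whose estimated token count stays within
    max_tokens, using a rough English tokens-per-word heuristic."""
    budget_words = int(max_tokens / tokens_per_word)
    return [words[i:i + budget_words]
            for i in range(0, len(words), budget_words)]

doc = ("Retrieval-augmented generation splits source documents into "
       "chunks small enough to embed and retrieve. " * 50).split()
chunks = chunk_by_token_budget(doc, max_tokens=200)
print(len(chunks), "chunks of up to", int(200 / 1.3), "words each")
```

Production systems typically count with the model's real tokenizer and add overlap between chunks, but the budget arithmetic is the same.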