LLM Tokenization Advances: BPE, SentencePiece, and the Quest for Better Tokenizers
A technical deep dive into how modern LLM tokenizers work, the tradeoffs between BPE and SentencePiece, and emerging approaches that improve performance on multilingual text and code.