Large Language Models

Large Language Models & LLM Insights

Explore large language model architectures, fine-tuning strategies, prompt engineering, and how LLMs power modern AI applications.

9 of 92 articles

7 min read · 15 · Apr 24, 2026

Model Latency Profiles by Provider: TTFT, TPS, and p99 in 2026

Headline tokens-per-second numbers hide what matters. The 2026 latency profiles by provider — TTFT, TPS, and p99 — for production planning.

8 min read · 8 · Apr 24, 2026

Mixture of Depths lets models skip layers for easy tokens and spend compute on hard tokens. The 2026 implementations and what they save.

8 min read · 74 · Apr 24, 2026

By 2026, sub-10B models beat 2024-era GPT-4 on most benchmarks. The Phi-4, Gemma-3, and SmolLM-3 families compared head-to-head.

7 min read · 10 · Apr 24, 2026

The Transformer Math Behind Long-Context: Cost vs Capability

Why long context is expensive, where the cost shows up, and the 2026 tricks that let frontier models serve million-token windows.

8 min read · 6 · Apr 24, 2026

The evolution of attention from the original transformer to 2026's multi-query and grouped-query variants — what changed and why it matters.

8 min read · 11 · Apr 24, 2026

The four major LLM ecosystems in 2026 compared on production trade-offs — quality, cost, latency, ecosystem, governance.

7 min read · 8 · Apr 24, 2026

Sparse attention patterns are back in production for long-context inference. The 2026 implementations and where each pattern wins.

9 min read · 56 · Apr 24, 2026

Mamba-3 and the state-space-model family now power production deployments. Where they beat transformers, where they lose, and what's next.
