
The AI Explainer, Part 2: Inside the LLM
A large language model is not a brain, a search engine, or a database. It's a stack of mathematical layers shaped in three distinct phases to predict the next token well enough that the result looks like thought. Each phase leaves a fingerprint on what the model can and cannot do, and most of the public confusion about LLMs traces back to collapsing those phases into a single fuzzy idea called "AI."
Part 2 of the trilogy opens the box. Tokens, not words. Pretraining as fluency-without-knowledge. Why prompts work at all. The post-training pass that turns a base model into an assistant. Why these things hallucinate, forget, and sound confident anyway. And why retrieval and memory are software features wrapped around the model, not properties of it. Six articles on the interior.
Large language models never see letters or words. They see integers from a vocabulary built by a separate program with its own bugs.
4 min read
Pretraining does one thing trillions of times: predict the next token. The result is a fluent predictor with patchy facts, and the gap between fluency and truth is where hallucinations live.
5 min read
The most surprising thing about GPT-3 wasn't that it scored higher on benchmarks. It was that you could teach it a new task by typing three examples into the prompt, and the labels didn't even need to be right.
4 min read
Three stages of post-training turn a fluent text engine into a useful assistant. Behavior, not parameter count, made InstructGPT preferred over the 100x larger GPT-3.
5 min read
Hallucinations are not a bug. They are a structural consequence of training a model to predict plausible tokens.
5 min read
When ChatGPT or Claude appears to recall what you said three weeks ago, the language model itself is doing none of the remembering. Three different systems get confused for one.
6 min read