Random Labels Work Almost As Well
The most surprising thing about GPT-3 wasn't that it scored higher on benchmarks. It was that you could teach it a new task by typing three examples into the prompt, and the labels didn't even need to be right.
In 2020, OpenAI released the GPT-3 paper. The headline most people remember is "it got bigger and scored higher." That wasn't the strange part.
The strange part was buried in the middle. Brown and the rest of the team showed that a model trained only to predict the next token could do a new task at test time given nothing but three or four examples in the prompt. No fine-tuning. No weight updates. You typed examples. It did the task. They called it in-context learning.
In practice it looks like this. You type "English: cat. French: chat. English: dog. French: chien. English: house. French:" and the model writes "maison." Nobody trained the model on that specific task in that specific format. It still produces the right answer.
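To make the shape of that prompt concrete, here is a minimal sketch of how such a string might be assembled. The function name `build_few_shot_prompt` and its parameters are made up for illustration, and no model is called. The point is only that the "training data" is plain text sitting in the input.

```python
def build_few_shot_prompt(examples, query, src="English", tgt="French"):
    """Format (source, target) pairs plus one open query as a single prompt string.

    Hypothetical helper for illustration; the field names are assumptions.
    """
    parts = [f"{src}: {s}. {tgt}: {t}." for s, t in examples]
    # The final item leaves the target slot empty for the model to fill in.
    parts.append(f"{src}: {query}. {tgt}:")
    return " ".join(parts)


examples = [("cat", "chat"), ("dog", "chien")]
prompt = build_few_shot_prompt(examples, "house")
print(prompt)
# English: cat. French: chat. English: dog. French: chien. English: house. French:
```

Whatever the model does next, it does with that one string and its frozen weights.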
That's what most people are actually doing when they prompt. They write instructions. They paste a couple of examples. They expect the model to "learn." And it does, sort of. The question that took a while to get a careful answer is why.
The Min Result
In 2022, Sewon Min and a group at the University of Washington, Meta, and the Allen Institute ran the experiment that should have made more headlines than it did. They tested 12 language models, GPT-3 among them. The setup was the standard few-shot prompt. A set of labeled examples, then a new question.
Then they swapped the labels in the examples for random ones.
The model's accuracy barely moved.
Read that again. You can take a sentiment classification prompt, label half the positive examples "negative" and half the negative examples "positive," and the model gets close to the same score as if you had labeled them correctly. Whatever the examples are doing, they aren't teaching the model to map inputs to labels. The mapping is already there from pretraining.
What the demonstrations actually do, Min argues, is communicate three things. The space of possible labels. The kind of text it's looking at. The format the output should come in. They orient the model toward the task. They don't install the task.
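The random-label setup is easy to sketch. The code below is an illustration, not the authors' code: the `Review:` / `Sentiment:` template, the function names, and the example reviews are all assumptions. It draws each demonstration's label uniformly from the label set and leaves everything else alone, so the label space, the input text, and the format are identical between the two prompts.

```python
import random


def randomize_labels(demos, label_space, seed=0):
    """Replace each demo's label with one drawn uniformly from label_space."""
    rng = random.Random(seed)
    return [(text, rng.choice(label_space)) for text, _ in demos]


def format_prompt(demos, query):
    """Render demos and an open query in one fixed template (assumed format)."""
    blocks = [f"Review: {text}\nSentiment: {label}" for text, label in demos]
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)


LABELS = ["positive", "negative"]
gold_demos = [
    ("The food was delicious.", "positive"),
    ("Service was slow and rude.", "negative"),
    ("I would come back tomorrow.", "positive"),
    ("The soup arrived cold.", "negative"),
]
random_demos = randomize_labels(gold_demos, LABELS)

gold_prompt = format_prompt(gold_demos, "Great atmosphere.")
random_prompt = format_prompt(random_demos, "Great atmosphere.")
print(random_prompt)
```

Sending both prompts to the same model and comparing accuracy over a test set is the experiment in miniature. Only the input-to-label pairing differs between them.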
The Lampinen Counterweight
It's not that demonstrations do nothing. Andrew Lampinen and a group at DeepMind ran the next test. They added natural-language explanations to the in-context examples. Things like "this is positive because the speaker calls the food delicious." On harder reasoning tasks, the explanations helped. And the lift was larger on bigger models.
So there's a real signal in well-constructed prompts. It's just not the signal most people assume. The model isn't learning the rule from the examples. It's using the explanation as extra context for which prior to lean on. And the ability to use that extra context is itself something that emerges as the model gets larger.
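Adding an explanation to a demonstration is a small change to the prompt template. A minimal sketch follows, with the field names and the placement of the explanation after the answer both chosen for illustration rather than taken from the paper.

```python
def format_demo(text, label, explanation=None):
    """Render one demonstration, optionally followed by a natural-language explanation."""
    block = f"Review: {text}\nSentiment: {label}"
    if explanation:
        block += f"\nExplanation: {explanation}"
    return block


def format_prompt_with_explanations(demos, query):
    """demos is a list of (text, label, explanation) tuples; explanation may be None."""
    blocks = [format_demo(text, label, why) for text, label, why in demos]
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)


demos = [
    ("The food was delicious.", "positive",
     "This is positive because the speaker calls the food delicious."),
    ("The soup arrived cold.", "negative", None),
]
explained_prompt = format_prompt_with_explanations(demos, "Great atmosphere.")
print(explained_prompt)
```

Nothing about the model changes here either. The explanation is just more text in the context window.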
What The Model Is Doing Inside
This is where the honesty has to start. Nobody fully knows.
The leading hypothesis comes from Johannes von Oswald and a group at ETH Zurich and Google. Their paper argues that the attention layers, when you feed them in-context examples, end up running something that approximates gradient descent. The model is, in a sense, training itself on the prompt during a single forward pass. They showed this clearly for linear regression in small toy transformers. Whether it scales up to what GPT-4 is doing with a fifty-shot prompt is an open question.
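The toy version of that claim can be checked by hand. The sketch below is a simplified illustration, not the paper's code: it assumes scalar targets, weights that start at zero, a single gradient step, and attention with no softmax. Under those assumptions, one step of gradient descent on the in-context examples gives the same prediction as an attention-style weighted sum in which the query input is the query, the example inputs are the keys, and the example targets are the values.

```python
import math


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def one_gd_step_prediction(xs, ys, x_query, lr):
    """Start at w = 0, take one gradient step on squared error, then predict."""
    n, dim = len(xs), len(x_query)
    # Gradient of (1 / 2n) * sum_i (w . x_i - y_i)^2, evaluated at w = 0.
    grad = [-sum(y * x[d] for x, y in zip(xs, ys)) / n for d in range(dim)]
    w = [-lr * g for g in grad]
    return dot(w, x_query)


def linear_attention_prediction(xs, ys, x_query, lr):
    """Softmax-free attention: scores are x_i . x_query, values are y_i."""
    n = len(xs)
    return (lr / n) * sum(y * dot(x, x_query) for x, y in zip(xs, ys))


# In-context examples generated from a hidden rule y = 2*x0 - x1 (made-up data).
xs = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3]]
ys = [2 * x[0] - x[1] for x in xs]
x_query = [0.7, -0.2]

gd = one_gd_step_prediction(xs, ys, x_query, lr=0.1)
attn = linear_attention_prediction(xs, ys, x_query, lr=0.1)
print(gd, attn, math.isclose(gd, attn))
```

The two functions compute the same quantity by different routes, which is the core of the argument in the linear case. It says nothing on its own about large models with softmax attention and many layers.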
The other line of evidence comes from Anthropic's interpretability team. Catherine Olsson and her group studied a circuit called an induction head. It relies on two attention heads in different layers working together. The first records which token came before each position. The second uses that record to find an earlier occurrence of the current token and copy whatever followed it. When you train a transformer, induction heads appear at a specific point, and that's the same point where in-context learning jumps in capability. They're a real circuit. They explain part of it. They don't explain the whole thing.
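The behavior attributed to induction heads can be mimicked in a few lines. This is a toy imitation of the input-output pattern only, written as an ordinary lookup. It is not how attention computes it, and the function name is made up.

```python
def induction_predict(tokens):
    """Find the most recent earlier occurrence of the last token and return what followed it."""
    if not tokens:
        return None
    current = tokens[-1]
    # Scan backward over earlier positions, skipping the final token itself.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None  # no earlier occurrence, so nothing to copy


print(induction_predict(["the", "cat", "sat", "the"]))  # cat
```

Given "A B ... A", the rule answers "B". A few-shot prompt is full of exactly that kind of repeated structure.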
So the picture is a robust empirical phenomenon, a partial mechanistic story, and a lot of active research.
What This Means For Anyone Prompting
The practical takeaway is steadier than the theory.
Examples help, but the labels don't need to be perfect. The format matters more than the labels. Clear task descriptions help. Explanations help more on hard tasks and bigger models. None of it changes the model's weights. None of it adds knowledge the model didn't already see during pretraining.
If you find yourself writing a prompt to make the model "learn" something it doesn't know, you're using the wrong tool. In-context learning is selection, not education. You're pointing the model at a pattern it already has somewhere in its weights and asking it to project that pattern onto your input.
That's a useful thing. It's also a smaller thing than the word "learning" makes it sound. Once you see the shape of what's happening, you stop expecting the model to learn things from your prompt that it never saw during pretraining, and you start writing prompts that select rather than teach.
Part of the Inside the LLM series.
This article reflects the state of AI tooling as of May 2026. Specific framework names, APIs, and vendor positioning may have shifted.
Sources
- Brown, Tom B. et al. "Language Models are Few-Shot Learners" (2020, NeurIPS)
- Min, Sewon et al. "Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?" (2022, EMNLP)
- Lampinen, Andrew K. et al. "Can language models learn from explanations in context?" (2022, EMNLP Findings)
- von Oswald, Johannes et al. "Transformers learn in-context by gradient descent" (2023, ICML)
- Olsson, Catherine et al. "In-context Learning and Induction Heads" (2022, Transformer Circuits Thread)



