Machine Learning Inverts How Programs Get Written
Machine learning isn't the computer learning like a human. It's an old mathematical idea that finally got enough data and compute to work at scale.
In 1997, a Carnegie Mellon professor named Tom Mitchell published a textbook called "Machine Learning." On page 2 he wrote one sentence that has held up for almost thirty years. A program learns from experience E with respect to a task T and a performance measure P if its performance at T, as measured by P, improves with E.
That is still the definition used in introductory courses today. It is also the cleanest way to puncture the marketing.
There is nothing in it about understanding. Nothing about consciousness. Nothing about reasoning. The system improves at a task with experience, measured by a metric. That's all it is.
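To make the three letters concrete, here is a minimal Python sketch. The setup is hypothetical and not taken from Mitchell's book. The task T is labeling a number between 0 and 1 as above or below a made-up cutoff of 0.37. The performance measure P is accuracy on a fixed grid of test numbers. The experience E is how many labeled examples the learner has seen. The learner is a one-nearest-neighbor classifier, picked only because it fits in a few lines.

```python
# A toy instance of Mitchell's definition. Everything here is hypothetical:
#   T: label a number x in [0, 1) as 1 if x >= CUTOFF, else 0
#   P: accuracy on a fixed grid of test numbers
#   E: n labeled training examples
CUTOFF = 0.37


def true_label(x: float) -> int:
    """The rule the learner never sees directly, only through examples."""
    return 1 if x >= CUTOFF else 0


def experience(n: int) -> list[tuple[float, int]]:
    """E: n evenly spaced numbers paired with their correct labels."""
    xs = [(i + 0.5) / n for i in range(n)]
    return [(x, true_label(x)) for x in xs]


def predict(examples: list[tuple[float, int]], x: float) -> int:
    """One-nearest-neighbor: copy the label of the closest training example."""
    _, label = min(examples, key=lambda ex: abs(ex[0] - x))
    return label


def performance(examples: list[tuple[float, int]], test_xs: list[float]) -> float:
    """P: fraction of test numbers the learner labels correctly."""
    correct = sum(predict(examples, x) == true_label(x) for x in test_xs)
    return correct / len(test_xs)


test_xs = [i / 1000 for i in range(1000)]
curve = [(n, performance(experience(n), test_xs)) for n in (2, 4, 10, 100)]
for n, p in curve:
    print(f"E = {n:3d} examples -> P = {p:.3f}")
```

On this toy setup P rises from roughly 0.87 with two examples to above 0.99 with a hundred. Nothing in the loop understands anything. A number went up as E grew, which is all the definition asks for.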
The Flip
Compare it to ordinary software.
In ordinary software, you write rules and feed in data. The program produces answers.
In machine learning, you feed in data and the answers you want. The program produces the rules.
Pedro Domingos at the University of Washington calls this an inversion of the usual flow of programming. You are no longer writing the logic. You are curating the examples. The algorithm fits a function to those examples. That function is what people call a "model."
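The same flip fits in a few lines of Python. The spam feature (a count of links in an email), the labeled examples, and the "training" procedure are all hypothetical. Training here just tries each threshold that appears in the data and keeps the one that makes the fewest mistakes. Real systems fit far richer functions, but the shape is similar: examples and answers go in, a function comes out.

```python
# Hypothetical spam filter with a single feature: the number of links in an email.

# Ordinary software: a person writes the rule, the program applies it to data.
def handwritten_rule(num_links: int) -> bool:
    return num_links > 3  # cutoff picked by a human


# Machine learning: data plus desired answers go in, a rule comes out.
def train(examples: list[tuple[int, bool]]):
    """Return the threshold rule that misclassifies the fewest labeled examples."""

    def mistakes(threshold: int) -> int:
        return sum((links > threshold) != is_spam for links, is_spam in examples)

    # Try every link count seen in the data; min keeps the first best one.
    best = min(sorted({links for links, _ in examples}), key=mistakes)

    def model(num_links: int) -> bool:
        return num_links > best  # learned cutoff, never typed in by a person

    return model  # a function, produced by training


labeled_emails = [(0, False), (1, False), (2, False), (5, False),
                  (6, True), (8, True), (9, True), (12, True)]
model = train(labeled_emails)
print(handwritten_rule(4), model(4))  # True False: the two rules disagree
```

The learned cutoff comes out as 5 because that is where these labeled examples split cleanly. Change the examples and the rule changes with them, without anyone editing the code.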
The model is the output of the training process. Not the input.
This sounds like a small distinction. It is not. It is the entire reason machine learning systems behave differently than the software that came before them. They are fitted approximations with error bars. They perform statistically well on data that looks like the data they were trained on, and they break in interesting ways on data that doesn't.
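That last claim is easy to see in a toy. A minimal sketch under made-up assumptions: the "world" follows y = x², the model is a straight line fitted by ordinary least squares, and the training inputs only cover 0 to 1.

```python
def world(x: float) -> float:
    """Hypothetical ground truth the model only sees through samples."""
    return x * x


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y ≈ slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def mean_abs_error(slope: float, intercept: float, xs: list[float]) -> float:
    """Average gap between the fitted line and the true curve on the given inputs."""
    return sum(abs(slope * x + intercept - world(x)) for x in xs) / len(xs)


train_xs = [i / 20 for i in range(21)]          # 0.00, 0.05, ..., 1.00
slope, intercept = fit_line(train_xs, [world(x) for x in train_xs])

familiar = [0.05 + i / 10 for i in range(10)]   # inside the training range
unfamiliar = [9 + i / 10 for i in range(10)]    # far outside it
print(mean_abs_error(slope, intercept, familiar))
print(mean_abs_error(slope, intercept, unfamiliar))
```

On these numbers the average error is about 0.07 inside the training range and about 80 outside it. Nothing in the fitted line signals that 9.5 is unfamiliar territory. It returns a number either way.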
The Three Buckets
Machine learning comes in three classical flavors, distinguished by what the learner gets to see during training.
Supervised learning uses labeled examples. Each input comes paired with the correct output. The classic problems are image classification (this picture is a cat, this one is a dog) and spam detection (this email is spam, this one isn't). Most of the production ML systems in your life are supervised.
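Supervised learning in miniature. The two features (number of links, number of exclamation marks), the labeled emails, and the choice of a nearest-centroid classifier are illustrative assumptions; production spam filters use far more features and stronger models. Training averages the examples for each label. Prediction picks the label whose average is closest.

```python
from math import dist


def train_centroids(examples):
    """Average the feature vectors of each label: one 'centroid' per class."""
    totals: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for features, label in examples:
        total = totals.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [t / counts[label] for t in total] for label, total in totals.items()}


def classify(centroids, features):
    """Predict the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Each input is (links, exclamation marks), paired with the correct output.
labeled = [
    ((0, 0), "not spam"), ((1, 0), "not spam"), ((0, 1), "not spam"),
    ((7, 5), "spam"), ((9, 8), "spam"), ((8, 6), "spam"),
]
centroids = train_centroids(labeled)
print(classify(centroids, (6, 4)), classify(centroids, (1, 1)))
```

An unseen email with 6 links and 4 exclamation marks lands nearer the spam centroid. One with 1 and 1 lands nearer the legitimate one.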
Unsupervised learning gets no labels. It is handed a pile of data and told to find structure. Usually that means clusters or low-dimensional representations.
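One classic way to "find structure" is k-means clustering (Lloyd's algorithm), sketched below. The points carry no labels. The algorithm alternates two steps: assign each point to its nearest center, then move each center to the mean of its points. Two assumptions to flag: the number of clusters k is chosen in advance, and the first k points serve as starting centers, a simplification that practical implementations replace with smarter initialization.

```python
from math import dist


def k_means(points, k, max_rounds=100):
    """Lloyd's algorithm. Starts from the first k points (a simplification)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(max_rounds):
        # Assignment step: every point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster (keep it if empty).
        new_centers = [
            [sum(axis) / len(c) for axis in zip(*c)] if c else centers[j]
            for j, c in enumerate(clusters)
        ]
        if new_centers == centers:  # nothing moved, so the clustering is stable
            break
        centers = new_centers
    return centers, clusters


# Unlabeled 2-D points: nobody says which group each belongs to.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (0.5, 1.5), (8.0, 9.5), (9.5, 8.0)]
centers, clusters = k_means(points, k=2)
print(centers)
print(clusters)
```

The two groups it finds match what the eye sees. Nothing told the algorithm what the groups mean. Naming them is left to a person.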
Reinforcement learning uses a reward signal. The algorithm learns a policy that maximizes cumulative reward over time. DeepMind's AlphaGo and AlphaZero are built on reinforcement learning. AlphaGo first learned from human expert games and then improved through self-play; AlphaZero learned from self-play alone. The agent plays itself millions of times and figures out what works.
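For a feel of the reward-driven loop, here is tabular Q-learning, one textbook reinforcement learning algorithm, on a hypothetical five-cell corridor where only reaching the last cell pays a reward. This is not how AlphaGo or AlphaZero work internally (they pair deep neural networks with tree search). It only shows the shape of the idea: act, observe a reward, nudge value estimates, repeat. The learning rate, discount, exploration rate, and episode count are arbitrary choices.

```python
import random

N_CELLS = 5             # cells 0..4; cell 4 is the goal
MOVES = (-1, +1)        # action 0 steps left, action 1 steps right
rng = random.Random(0)  # fixed seed so runs are repeatable


def step(cell: int, action: int) -> tuple[int, float, bool]:
    """Environment: move one cell (clamped at the ends); reward 1.0 only at the goal."""
    nxt = min(max(cell + MOVES[action], 0), N_CELLS - 1)
    done = nxt == N_CELLS - 1
    return nxt, (1.0 if done else 0.0), done


def choose(q_row: list[float], epsilon: float) -> int:
    """Epsilon-greedy: sometimes explore at random, otherwise take the best-valued action."""
    if rng.random() < epsilon or q_row[0] == q_row[1]:
        return rng.randrange(2)  # explore, or break a tie at random
    return 0 if q_row[0] > q_row[1] else 1


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1):
    q = [[0.0, 0.0] for _ in range(N_CELLS)]  # q[cell][action]: estimated future reward
    for _ in range(episodes):
        cell, done = 0, False
        while not done:
            action = choose(q[cell], epsilon)
            nxt, reward, done = step(cell, action)
            # Q-learning update; a move into the goal has no future to bootstrap from.
            target = reward if done else reward + gamma * max(q[nxt])
            q[cell][action] += alpha * (target - q[cell][action])
            cell = nxt
    return q


q = q_learning()
policy = ["right" if row[1] > row[0] else "left" for row in q[:-1]]
print(policy)
```

After training, the greedy policy steps right from every cell. Nobody wrote that rule. It came out of the rewards.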
Andrew Ng's Machine Learning Specialization on Coursera, offered jointly by Stanford and DeepLearning.AI, organizes its curriculum around this three-way split. Over 4.8 million people have taken some version of it since the original launched in 2012.
What Machine Learning Is Not
The most expensive misconception is that machine learning means "the computer learning the way a human learns."
It does not.
It is the computer fitting a mathematical function to examples and then applying that function to new inputs. There is no internal model of the world. There is no understanding. There is no intent.
Mitchell's definition is operational on purpose. Performance improves with experience, measured by a metric. Nothing in there requires consciousness or intuition or anything a human would call reasoning.
Google's Machine Learning Crash Course, which has been used as internal training for thousands of Google engineers, opens with this same framing. ML systems learn a model from examples rather than from explicit rules. That is the whole pitch.
The reason this misconception is expensive is that it sets people up to be either disappointed or terrified by ML systems for the wrong reasons. Disappointed because the system doesn't actually "get it." Terrified because they assume it does.
The Math Is Old
The second misconception worth puncturing is that machine learning is recent.
It is not.
Linear regression, the workhorse of supervised learning, rests on the method of least squares, which Adrien-Marie Legendre published in 1805 and Carl Friedrich Gauss published in 1809. Bayes' rule, which underpins most of probabilistic ML, was published posthumously in 1763. The mathematical scaffolding of machine learning is over two hundred years old.
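Bayes' rule itself fits in one line of arithmetic. The sketch below applies it to a hypothetical spam question. The 20%, 40%, and 1% figures are invented for illustration.

```python
def posterior(prior: float, p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """Bayes' rule for a yes/no hypothesis H after observing evidence E:
    P(H | E) = P(E | H) * P(H) / P(E),
    with P(E) = P(E | H) * P(H) + P(E | not H) * (1 - P(H)).
    """
    p_evidence = p_evidence_if_true * prior + p_evidence_if_false * (1 - prior)
    return p_evidence_if_true * prior / p_evidence


# Hypothetical numbers: 20% of mail is spam; the word "winner" shows up in
# 40% of spam and 1% of legitimate mail.
p_spam = posterior(prior=0.2, p_evidence_if_true=0.4, p_evidence_if_false=0.01)
print(f"P(spam | 'winner') = {p_spam:.3f}")  # 0.909
```

Seeing one word moves the estimate from 20% to about 91%. Naive Bayes spam filters apply this same update across many words at once, under a simplifying independence assumption.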
What changed in the last 25 years was not the math. It was the data and the compute.
The internet generated training corpora nobody had in 1990. GPUs, originally built to render video games, turned out to be excellent at the kind of parallel linear algebra that ML needs. The same algorithms that had been theoretically known for decades suddenly worked at a useful scale.
This is one of the central lessons of the deep learning era, and it is going to come back in the next article. Neural networks are not new. They became practical only when the surrounding infrastructure caught up.
So when someone tells you a system is "powered by machine learning," what they mean is this. Somebody fed a lot of labeled examples to an algorithm with roots in 19th-century statistics. The algorithm fit a function to those examples. The resulting function makes decent guesses on new inputs.
That is not nothing. It is a large fraction of what currently gets called artificial intelligence.
But it is not magic. And it is definitely not learning the way you learn.
Part of the AI Foundations series. Previous: What AI Actually Is.
Sources
- Mitchell, Tom M. "Machine Learning" (McGraw-Hill, 1997)
- Ng, Andrew, et al. "Machine Learning Specialization" (Stanford and DeepLearning.AI, Coursera)
- Google. "Machine Learning Crash Course"
- Russell, Stuart, and Peter Norvig. "Artificial Intelligence: A Modern Approach" (4th edition, Pearson, 2020)
- Krizhevsky, Alex, Ilya Sutskever, and Geoffrey Hinton. "ImageNet Classification with Deep Convolutional Neural Networks" (NeurIPS, 2012)



