An Agent Is Not a Smarter Model

A chatbot is a model plus a thin prompt. A coding agent is a model plus a tool loop, a state machine, a sandbox, and a reviewer running in another model call. The model is the same. The system is not.

What people call "AI" in mid-2026 is mostly the system. The model is a frozen function. Tokens in, token probabilities out. Everything else, memory, retries, tool calls, approval gates, sandboxes, tracing, lives in code wrapped around that function. That wrapping is what this series is about.

The Anthropic Line

On 19 December 2024, two engineers at Anthropic, Erik Schluntz and Barry Zhang, published a post called "Building Effective Agents." They drew a line that's still the cleanest one I've seen. A workflow is a system where multiple LLM calls are orchestrated using predefined paths in code. An agent is a system where the LLM "dynamically directs its own processes and tool usage, maintaining control over how it accomplishes tasks." Same model. Different software around it.

ReAct, 2022

The pattern under all of this goes back further. On 6 October 2022, Shunyu Yao at Princeton and a team at Google Research posted ReAct: "Reason and Act." The idea was small and load-bearing. Instead of asking a model for a final answer, ask it to interleave thought traces with discrete actions. Each action returns an observation. The observation feeds the next prompt. Thought, action, observation, new thought. Loop.

Concretely: the model emits text that looks like "I should search for the capital of France." Code parses that, calls a search API, gets back "Paris." The string is appended to the prompt as an observation. The model is called again. Now it has the new context. It writes the next thought, or decides it's done.

On HotpotQA, a multi-hop question-answering benchmark, ReAct with a Wikipedia search tool cut hallucination on factual tasks. On ALFWorld and WebShop, two interactive decision benchmarks, it beat the imitation and reinforcement-learning baselines by 34 and 10 percentage points respectively. Every modern agent loop is a variation on this cycle. Most of the time the thought trace is hidden from the user and the actions live behind a tool-calling API. The bones are the same.

The Model Has No Memory

The model itself is stateless. Each call is a separate function invocation. Prompt in, sample out. Memory, history, any sense of "continuing a task," all of that lives in code that builds the next prompt. The model has no idea it's in a loop. The harness does.

This is why "agent" is best understood as a software-architecture term, not a model term. The same model weights can be wired into a one-shot completer, a multi-step ReAct loop, or a long-horizon planner that runs for hours. The model is the same in every case. The system is not.

The Word Has No Definition Yet

The terminology is unsettled. OpenAI ships the Agents SDK, with a v2 release in April 2025 that added built-in memory and sandbox-aware orchestration. Anthropic renamed its Claude Code SDK to the Claude Agent SDK in 2025. LangChain calls things "agents" when the LLM directs the loop, and "chains" or "graphs" when control flow is fixed in code. None of these vendor labels carry a precise technical definition that survives across product surfaces.

Aaron Levie at Box has compared the moment to the early days of "cloud" as a label. Technically loose, commercially load-bearing, eventually stabilized by what survives in production. We're in that phase. "Agent" is a marketing word pointing at a spectrum. If you want to know what you're actually buying, read the control-flow code.

This matters in practice. If you can trace the code path on a whiteboard, it's a workflow. If the LLM picks the next step at runtime and the path is different on every run, it's an agent. Both are useful. They have different failure modes. They cost different amounts. They need different evaluation. Calling them the same thing makes the engineering harder.

I build with these APIs. I'm not at the frontier of agent research. But I've written enough code against them to see what changes when you wrap a model in a loop. What surprised me wasn't the model. It was the loop. The retry logic. The tool definitions. The parsing. The error handling when a tool call comes back malformed. The decision about when to stop. Most of what feels "agentic" to a user is the code around the model call, not the call itself.

For the rest of this series, when I say "agent," I mean a system in which the LLM, not fixed code, decides at runtime which tool to call next. Everything else is a workflow. The model is the same in both. What you're building, and what's actually doing the work, is everything that wraps around it.

Part of The Harness Layer series.

This article reflects the state of AI tooling as of May 2026. Specific framework names, APIs, and vendor positioning may have shifted.

Sources

The Anthropic Line

ReAct, 2022

The Model Has No Memory

The Word Has No Definition Yet

Part of The Harness Layer series.

This article reflects the state of AI tooling as of May 2026. Specific framework names, APIs, and vendor positioning may have shifted.

An Agent Is Not a Smarter Model

The Anthropic Line

ReAct, 2022

The Model Has No Memory

The Word Has No Definition Yet

Sources

Keep reading

The Tool Loop and Who Drives It

Most 'AI' in Production Is a Workflow

Random Labels Work Almost As Well

An Agent Is Not a Smarter Model

The Anthropic Line

ReAct, 2022

The Model Has No Memory

The Word Has No Definition Yet

Sources

Keep reading

The Tool Loop and Who Drives It

Most 'AI' in Production Is a Workflow

Random Labels Work Almost As Well