Most 'AI' in Production Is a Workflow

In December 2024, two engineers at Anthropic, Erik Schluntz and Barry Zhang, published a piece called "Building Effective Agents." Most of it is an argument against building agents.

The core line: "Agentic systems often trade latency and cost for better task performance, and you should consider when this tradeoff makes sense." Translation: don't reach for the loop when a function call will do.

The post catalogs five workflow patterns. Prompt chaining runs LLM calls in a fixed sequence, each feeding the next, with optional gates in between. Routing classifies an input and dispatches it to a specialized prompt. Parallelization fans out subtasks and aggregates the results. Orchestrator-workers uses one LLM to break a problem down, hand the pieces to worker LLMs, and synthesize their results. Evaluator-optimizer loops a generator against a critic until the critic accepts.

Notice what these have in common. The control flow lives in code. The LLM is a callable function that returns text. The harness decides what to do with the text.
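Here is a minimal sketch of what "the control flow lives in code" can look like for prompt chaining with a gate. The `LLM` type, `fake_llm`, and the prompts are hypothetical stand-ins for illustration, not anything taken from the Anthropic post. Any real model client could be passed in the same way.

```python
from typing import Callable

# Hypothetical type: any function that maps a prompt string to a text reply.
LLM = Callable[[str], str]

def chain_with_gate(llm: LLM, topic: str, max_outline_lines: int = 5) -> str:
    """Prompt chaining: outline -> gate -> draft. The harness owns every branch."""
    outline = llm(f"Write a short outline about: {topic}")

    # Gate: a plain code check between steps. The model is not asked to decide.
    lines = [ln for ln in outline.splitlines() if ln.strip()]
    if not lines or len(lines) > max_outline_lines:
        raise ValueError("outline failed the gate; stopping the chain")

    # Second call in the fixed sequence, fed by the first call's output.
    return llm("Expand this outline into a paragraph:\n" + "\n".join(lines))

def fake_llm(prompt: str) -> str:
    """Stand-in model for illustration only; returns canned text."""
    if prompt.startswith("Write a short outline"):
        return "1. Problem\n2. Approach\n3. Result"
    return "DRAFT based on -> " + prompt.splitlines()[-1]
```

The model never chooses the next step. It returns text, and the `if` statement decides whether the chain continues.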

An agent, by Anthropic's definition, is different. Agents are "systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks." In practice that means tool calls in a loop the LLM itself controls. The loop runs until the model emits a stop signal, hits a step limit, or a human intervenes. Schluntz and Zhang reserve agents for "open-ended problems where it's difficult or impossible to predict the required number of steps, and where you can't hardcode a fixed path." Almost everything else is a workflow.
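For contrast, a rough sketch of the agent loop. The `Decision` tuple, `run_agent`, and `scripted_model` are hypothetical names invented for this example. Real tool-use APIs return richer structures, so treat this as the shape of the loop rather than any vendor's interface.

```python
from typing import Callable

# Hypothetical protocol: the model returns ("call", tool_name, argument)
# or ("stop", final_answer, ""). Real tool-use APIs differ in shape.
Decision = tuple[str, str, str]
Model = Callable[[list[str]], Decision]

def run_agent(model: Model, tools: dict[str, Callable[[str], str]],
              task: str, max_steps: int = 10) -> str:
    history = [f"task: {task}"]
    for _ in range(max_steps):
        kind, name, arg = model(history)   # the model picks the next move
        if kind == "stop":
            return name                    # the stop signal carries the answer
        result = tools[name](arg)          # run whichever tool the model chose
        history.append(f"{name}({arg}) -> {result}")
    # The harness only enforces the budget; it never chooses a step.
    raise RuntimeError("step limit reached without a stop signal")

def scripted_model(history: list[str]) -> Decision:
    """Stand-in for an LLM: makes one lookup, then stops."""
    if len(history) == 1:
        return ("call", "lookup", "order-17")
    return ("stop", "done after: " + history[-1], "")
```

The harness here cannot know in advance how many iterations will run. That unpredictability is where the latency and cost come from.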

The Klarna Story

The biggest "AI replaced humans" headline of 2024 was Klarna. The Klarna AI Assistant, built on OpenAI's API, went live globally in early 2024 and handled 2.3 million conversations in its first month. Klarna's own press release framed that as the equivalent of 700 full-time agents. Average resolution time dropped from 11 minutes to under 2 minutes. Repeat inquiries fell 25%.

That looks like an agent story. It's not.

The deployed system is a workflow. A router classifies the message. A small set of tool calls handles specific actions. The escalation triggers are coded. The script boundaries are coded. The model produces language. The harness produces decisions.
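To make that shape concrete, here is an illustrative sketch of a router with a tool registry and coded escalation triggers. This is not Klarna's code. The intent labels, trigger words, canned replies, and the `classify` callable are all hypothetical, chosen only to show where the decisions live.

```python
from typing import Callable

# Hypothetical tool registry: intent label -> plain function.
TOOLS: dict[str, Callable[[str], str]] = {
    "refund_status": lambda msg: "[refund status reply]",
    "payment_due": lambda msg: "[payment schedule reply]",
}

# Hypothetical coded escalation triggers.
ESCALATE_WORDS = ("fraud", "lawyer", "complaint")
HANDOFF = ("human", "Connecting you with an agent.")

def handle(message: str, classify: Callable[[str], str]) -> tuple[str, str]:
    """Return (route, reply). `classify` stands in for an LLM routing call."""
    text = message.lower()
    if any(word in text for word in ESCALATE_WORDS):
        return HANDOFF                     # coded trigger, checked before the model
    intent = classify(message)
    tool = TOOLS.get(intent)
    if tool is None:
        return HANDOFF                     # unknown label: do not improvise
    return (intent, tool(message))

def keyword_classify(message: str) -> str:
    """Stand-in classifier for illustration; a real router would call a model."""
    return "refund_status" if "refund" in message.lower() else "other"
```

Every branch that matters is an ordinary `if`. The model's only job in this sketch is to produce a label.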

By mid-2025, Klarna walked the all-AI strategy back and reintroduced human agents for higher-friction cases. The hybrid is still mostly a workflow. The win came from getting the routing and the tool registry right, not from letting the model run open-ended.

Copilot Is a Workflow Too

GitHub Copilot launched as a technical preview on 29 June 2021 and reached general availability in June 2022. For most of its history it was a single-shot prompt. The IDE captures the cursor context. The editor sends a structured request. The model returns completions. The editor inserts them.

No loop. No tool selection by the model. No goal-directed planning. The product won on context construction, not autonomy.
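A single-shot completion flow of this kind can be sketched in a few lines. The `CompletionRequest` fields, the `window` size, and the function names are assumptions made for illustration. They are not Copilot's actual request format.

```python
from dataclasses import dataclass

@dataclass
class CompletionRequest:
    # Hypothetical request shape, not Copilot's actual wire format.
    path: str
    prefix: str   # text before the cursor
    suffix: str   # text after the cursor

def build_request(path: str, buffer: str, cursor: int,
                  window: int = 2000) -> CompletionRequest:
    """Context construction: slice the buffer around the cursor."""
    cursor = max(0, min(cursor, len(buffer)))   # clamp to a valid offset
    return CompletionRequest(
        path=path,
        prefix=buffer[max(0, cursor - window):cursor],
        suffix=buffer[cursor:cursor + window],
    )

def insert_completion(buffer: str, cursor: int, completion: str) -> str:
    """The editor, not the model, decides where the text goes."""
    return buffer[:cursor] + completion + buffer[cursor:]
```

One request goes out, one completion comes back, and the editor splices it in. Nothing in this path asks the model what to do next.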

Copilot's later "edit" and "agent" modes added small loops. But the bulk of the product's success was built on a workflow so tight that most users never thought of it as one. The model was a frozen function. The IDE was the harness.

Time Horizons and the Tradeoff

The case for restraint gets sharper as task length grows. METR, a nonprofit that benchmarks frontier models, has been publishing a "time horizon" measurement: the length of a task, measured by how long it takes a skilled human, that a model can complete with 50% reliability. Their March 2025 paper showed that time horizons have grown roughly exponentially since 2020. The best 2025 models could handle 30 to 60 minute tasks at that 50% threshold.

By the January 2026 update, the horizon on some software tasks had pushed past several hours.

Two things are true at once. The trajectory is real. The variance is brutal. A workflow that runs in 5 seconds with 99% reliability beats an agent that runs in 5 minutes with 70% reliability for any high-volume use case. The agent earns its cost when the task is genuinely open-ended and a one-off failure is cheap. The workflow earns its cost everywhere else.
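To put rough numbers on that comparison, here is a small arithmetic sketch. The latency and reliability figures are the ones above. The batch of 10,000 requests is an assumption for illustration, as is the simplification that every failure is detected and needs follow-up.

```python
def batch_stats(n: int, seconds_per_run: float,
                reliability: float) -> tuple[int, float]:
    """Return (expected failures, total model-hours) for n independent runs."""
    failures = round(n * (1 - reliability))
    hours = n * seconds_per_run / 3600
    return failures, hours

N = 10_000  # assumed batch size, for illustration only

workflow_failures, workflow_hours = batch_stats(N, 5, 0.99)
agent_failures, agent_hours = batch_stats(N, 300, 0.70)

print(f"workflow: {workflow_failures} failures, {workflow_hours:.1f} hours")
print(f"agent:    {agent_failures} failures, {agent_hours:.1f} hours")
```

Under these assumptions the workflow leaves about 100 failures to clean up after roughly 14 hours of run time. The agent leaves about 3,000 after roughly 833 hours. At volume, the gap in failures matters more than the gap in latency.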

The workflow is not a downgrade. It's a different product. The harness around the model is doing real work. Classifying, routing, gating, retrying, escalating. That work has a name, and the name is engineering. The LLM is just one of the function calls.

The model is not the product. The workflow is the product. Most of the time, the loop is overkill.

Part of The Harness Layer series.

This article reflects the state of AI tooling as of May 2026. Specific framework names, APIs, and vendor positioning may have shifted.
