Agentic AI hype and the Dunning-Kruger effect - Version 1

A pattern emerges with every major technological advance, and AI is no exception.

You’ve probably seen both curves. The Gartner hype cycle tracks how a technology moves from excitement to disillusionment and eventually to something that actually works in practice. The Dunning-Kruger effect describes how people with limited knowledge tend to overestimate how much they understand. The two aren’t just similar in shape. They describe the same dynamic, one at the individual level and one at the market level. Confidence outpaces reality, then reality catches up.

According to Gartner’s 2026 Hype Cycle for Agentic AI, agentic AI sits squarely at the peak of inflated expectations. The trough of disillusionment comes next, and the slope of enlightenment is still some way off for most teams.

And, as you’d expect at this stage, instant experts are starting to proliferate.

The overnight expert problem

You’ve probably seen this in action. Someone runs a few prompts, ships a quick demo, and suddenly has a confident answer to problems they haven’t properly investigated. It has happened with every tech wave, and this one is no different. The giveaway is usually a sentence that starts with “we should just build an agent”.

Customer service problem? Agent.

Data processing bottleneck? Agent.

Workflow that took months to map out? Agent.

The catch is that this is not always wrong. Agents can be very useful in the right context. The problem is reaching for an “agent” as the default answer before you’ve properly understood the problem.

Before you even think about architecture, you need clear answers to a few basics: what’s the task, what level of accuracy do you need, what failure modes are acceptable, what latency can you tolerate, and what does it cost at scale? Most agentic proposals fall apart on one of these, usually cost or accuracy.
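
One way to make those questions hard to skip is to write them down as a checklist that flags anything still unanswered. The sketch below is purely illustrative: the class name `ProblemBrief`, its field names, and the example values are all assumptions made for this example, not part of any framework.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ProblemBrief:
    """Hypothetical pre-architecture checklist; field names are illustrative."""
    task: Optional[str] = None                    # what is the task?
    target_accuracy: Optional[str] = None         # what level of accuracy is needed?
    acceptable_failures: Optional[str] = None     # which failure modes are acceptable?
    max_latency_seconds: Optional[float] = None   # what latency can be tolerated?
    max_cost_per_request: Optional[float] = None  # what can it cost at scale?

    def unanswered(self) -> list[str]:
        """Return the names of the questions that still have no answer."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Example: a proposal that has only thought about the task and latency.
brief = ProblemBrief(task="Route inbound support emails", max_latency_seconds=5.0)
print(brief.unanswered())
```

If the list that comes back is not empty, the architecture discussion is probably premature.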

Agents vs Agentic Workflows: a distinction worth making

There’s also a definitional issue that’s not helping. A true agent has genuine autonomy: it plans steps towards a goal, decides which tools to use, and operates in environments where the inputs aren’t predictable. Building one isn’t hard. Getting it to work reliably in production is a different story, and genuine examples are still rare.

What most teams are actually building is something different: an agentic workflow. A defined sequence of steps, with an LLM handling one or more of them, orchestrated by deterministic logic. That’s probably 90% of what is referred to as an “agent” today.
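
To make the shape concrete, here is a minimal sketch of an agentic workflow, assuming a support-message routing task. Everything in it is hypothetical: the function names, the labels, and especially `classify_with_llm_stub`, which stands in for a real model call so the example runs without any external service.

```python
from typing import Callable


def classify_with_llm_stub(text: str) -> str:
    """Stand-in for the LLM step. A real system would call a model API here."""
    lowered = text.lower()
    if "refund" in lowered:
        return "refund"
    if "password" in lowered:
        return "account"
    return "other"


def open_refund_case(text: str) -> str:
    return "refund case opened"


def send_password_reset(text: str) -> str:
    return "password reset sent"


def send_to_human(text: str) -> str:
    return "sent to human queue"


# Deterministic routing table: the code, not the model, chooses the action.
ROUTES: dict[str, Callable[[str], str]] = {
    "refund": open_refund_case,
    "account": send_password_reset,
    "other": send_to_human,
}


def run_workflow(text: str,
                 classify: Callable[[str], str] = classify_with_llm_stub) -> str:
    # Step 1 (deterministic): validate the input.
    if not text.strip():
        return "rejected: empty message"
    # Step 2 (LLM): a single, well-scoped language task.
    label = classify(text)
    # Step 3 (deterministic): route on the label, with a safe fallback
    # for any label the routing table does not recognise.
    handler = ROUTES.get(label, send_to_human)
    return handler(text)


print(run_workflow("I would like a refund for my last order"))
```

There is no planning loop and no tool selection here. The model answers one narrow question and fixed logic does the rest, which is what keeps the flow easy to test and debug.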

And this is nothing new. Back in 2023, once LLMs were available via the major cloud service providers (CSPs), you could wire up a Logic App or an RPA flow, drop in an LLM for the reasoning step, and call downstream APIs based on the output. That’s an agentic workflow. The tech was already there. The label came later.

There’s nothing wrong with this pattern. In many cases it’s the right solution. The problem is calling it an agent when it isn’t one. Once you do that, teams start adding planning loops, tool selection, and multi-agent coordination, things that were never needed. You end up with higher latency, higher cost, harder debugging, and often worse accuracy. Every extra moving part is another place things can break.
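
The cost of extra moving parts can be put into rough numbers. The sketch below assumes that every step succeeds independently with the same probability, which is a simplification, and the 95% figure is an illustrative assumption rather than a measurement. Under those assumptions, the chance that a whole chain succeeds is the per-step success rate raised to the number of steps.

```python
def chain_success_rate(per_step_success: float, steps: int) -> float:
    """Probability that every step in a chain succeeds, assuming independence."""
    return per_step_success ** steps


# Illustrative only: each step is assumed to be right 95% of the time.
for steps in (1, 3, 5, 10):
    print(steps, round(chain_success_rate(0.95, steps), 3))
# 1 0.95
# 3 0.857
# 5 0.774
# 10 0.599
```

On these assumptions, ten steps that each look reliable on their own leave you with a system that succeeds only about 60% of the time.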

Where the trade-offs start to matter

In production, three trade-offs usually matter most: accuracy, latency, and cost.

First is accuracy, and this is where things tend to get more complex. “Does it work” isn’t a metric. You need to define what matters: precision, recall, F1, or something domain-specific. In document extraction, missing a field is often worse than picking up the wrong one. In search, recall often matters more than precision. If you get this wrong at design time, you’ve already failed, and it happens more often than people admit.

Second is latency. Real-time systems behave very differently from batch ones, and agents with reasoning loops are particularly sensitive here. A workflow that chains a few LLM calls (classification, reasoning, response generation) can easily hit 8 to 12 seconds end to end. That might be fine for overnight processing. It’s not fine for anything customer-facing.

Third is cost, and this is the one people tend to ignore early on. LLM calls cost money, and at scale that adds up quickly. Retries, failures, and reasoning loops make it worse. Agent-style architectures are especially exposed because every loop means more calls. It’s very easy to build something that looks fine in a demo and becomes expensive in production.
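
All three trade-offs can be estimated with a few lines of arithmetic before anything is built. The sketch below uses the standard definitions of precision, recall, and F1. The latency and cost helpers are deliberately crude: they assume sequential calls and a flat price per call. Every number in the example (the counts, the per-step latencies, the price, the retry rate, and the calls per request) is an illustrative assumption, not a benchmark.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def chain_latency_seconds(step_latencies: list[float]) -> float:
    """Sequential calls add up: end-to-end latency is the sum of the steps."""
    return sum(step_latencies)


def monthly_cost(requests: int, calls_per_request: float,
                 cost_per_call: float, retry_rate: float = 0.0) -> float:
    """Estimate monthly spend.

    retry_rate is the assumed fraction of extra calls caused by retries.
    """
    return requests * calls_per_request * (1 + retry_rate) * cost_per_call


# Accuracy: 80 correct extractions, 20 wrong ones, 10 missed fields.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")

# Latency: classification, reasoning, response generation, in seconds.
print(f"latency={chain_latency_seconds([2.5, 4.0, 3.0]):.1f}s")

# Cost: a 3-call workflow versus an 8-call agent loop at the same volume.
workflow = monthly_cost(1_000_000, calls_per_request=3, cost_per_call=0.002, retry_rate=0.1)
agent = monthly_cost(1_000_000, calls_per_request=8, cost_per_call=0.002, retry_rate=0.1)
print(f"workflow=${workflow:,.0f} agent=${agent:,.0f}")
```

With these made-up inputs, the same request volume costs between two and three times as much once each request takes eight calls instead of three, and that is before any extra latency is counted.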

What real expertise looks like

Real expertise here isn’t knowing how to build an agent. It’s knowing when not to.

It’s understanding the full toolbox: classical machine learning (ML) where it’s enough, deterministic rules where requirements are stable, structured pipelines where reliability matters, LLMs where language understanding actually adds value, and agents only when you genuinely need autonomy.

Most systems currently labelled as “AI agents” would be simpler, cheaper, and more reliable as structured workflows with a well-scoped LLM step and some deterministic logic. It’s less exciting. But it also tends to work.

Getting out of the trough

We’re still early in this cycle, and the experimentation is healthy. Hype is a normal part of how new technologies get explored and funded. But if every problem gets the same reflexive answer, we’re not advancing anything. We’re relabelling old patterns with new vocabulary and calling it innovation. Worse, we risk building systems that are slower, more expensive, and less accurate than simpler alternatives. Not because the technology failed, but because we never asked whether it was the right tool in the first place.

Getting out of the trough means understanding the problem properly, choosing the right approach, and being honest about the trade-offs. That’s what separates genuine understanding from chasing the latest shiny thing.