AI Agents and Consistency

Alex Blom

June 1, 2026 · 5 min read

Agents are yet again the topic of the month. I use agents pretty heavily, and you should feel comfortable betting that agents will keep improving. There is something wild about watching an LLM direct a fleet of LLMs to completion on a task. But the way an agent completes work comes with tradeoffs - and agents behave very differently to a more deterministic workflow.

I touched on the Agentic vs Deterministic distinction lightly in a recent blog post. In short: deterministic means the steps and control flow are defined in code - the model only does small, defined tasks within that flow, and the flow runs the same way every time. Agentic means we hand the model the goal and let it control the flow at runtime.
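The distinction can be sketched in a few lines of Python. This is a minimal illustration, not a real framework: `Model` is a hypothetical stand-in for an LLM call, and the prompt strings, the `"done"` signal and the `max_steps` budget are all assumptions made for the example.

```python
from typing import Callable

# Hypothetical stand-in for an LLM call: takes a prompt, returns text.
Model = Callable[[str], str]


def deterministic_flow(ticket: str, model: Model) -> dict:
    """Control flow lives in code; the model only fills in small, defined steps."""
    category = model(f"classify: {ticket}")
    summary = model(f"summarize: {ticket}")
    return {"category": category, "summary": summary}


def agentic_flow(
    goal: str,
    model: Model,
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> list[str]:
    """The model picks the next action at runtime; code only enforces a step budget."""
    history: list[str] = []
    for _ in range(max_steps):
        action = model(f"goal: {goal}\nhistory: {history}\nnext action?")
        if action == "done":  # assumed convention for "the goal is met"
            break
        if action in tools:
            result = tools[action](goal)
        else:
            result = f"unknown tool: {action}"
        history.append(f"{action} -> {result}")
    return history
```

In the first function the order of steps never changes, whatever the model returns. In the second, the sequence of actions depends entirely on what the model decides on each pass through the loop.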

So if we look a little bit closer at Agents - how do they actually work?

You give instructions and behaviour, not a binding spec. You hand an Agent the goal, some context and as much process as you like. It will generally follow those instructions - and with the right model selection and feedback loop it will take a task to completion. But the instructions are closer to strong suggestions than orders, and you cannot rely on every step being followed accurately.
Problem solving at run time. Generally speaking, Agents can deviate from their planned steps and sometimes even write new tooling to solve problems. Did an API call to our CRM fail? No problem, there is a cached version in a spreadsheet. This type of live decision making is what makes Agentic cases valuable in my mind - especially when it comes to edge cases we could not map ahead of time. Describing behaviour rather than fixed steps shines in these cases too.
Over Delivery. Following on - agents will often surprise us by over-delivering. Did we forget to ask for phone numbers on a list of hotels? The agent is probably going to add them anyway. This over-reach can be useful in the right cases.
Made for Orchestration. Agents really earn their place managing the bigger picture and directing moving pieces and subtasks, which lets them pursue bigger goals that would otherwise eat too much context.

And this all sounds really good on paper. But once you've been running agents in production for long enough, you bump into enough of the downsides to think about it some more:

Variable decision pathing. Even on a stable model an agent can take a different path each time. This is what powers our broader reach, but it also usually means a higher failure rate. A 3% failure on a process is not a process. My personal experience is that hallucinations are more likely in this world, and debugging the source can be a nightmare.
There are off days. When you dig into it, a model is not really a performance guarantee. The backend compute behind it often affects performance and end users have no control over that - so the model name is more of a guideline. We've all felt the "Claude is watered down" days. In agent land, where the LLM is making determinations and controlling flow, that lobotomy impacts our results.
New models are new personalities. Opus 3 to 4 to 4.7 aren't the same model so much as they are different workers with different personalities, skills and judgement. Sonnet went from doing exactly what you said to proactively handling edge cases. When I've passed decision-making to my agent, a major model change means a different "solve" on the problem and I need to constantly guard against it. Which leads into the next one.
Maintenance and durability can be an issue. If I relied on decision-making that has fundamentally changed, I'm rewriting flows more regularly. For basic process workflows this is the biggest reason I like to stay in deterministic land where I can - fewer redos.
Cost and speed. We all know this but it's worth stating: in most cases an Agent will burn more tokens and be slower to an end goal. Most of the time we don't care - until you accidentally spend half a billion on Claude credits.
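To put rough numbers on the failure-rate point above, here is a small back-of-the-envelope sketch. The independence assumption, the 1,000-run volume and the 10-step run length are illustrative assumptions, not measurements; only the 3% figure comes from the text.

```python
def expected_failures(runs: int, failure_rate: float) -> float:
    """Expected number of failed runs at a given per-run failure rate."""
    return runs * failure_rate


def end_to_end_success(step_success: float, steps: int) -> float:
    """Chance a multi-step run finishes cleanly, assuming each step
    succeeds independently with the same probability (a simplification)."""
    return step_success ** steps


# A 3% failure rate over an assumed 1,000 runs.
print(expected_failures(1_000, 0.03))

# If the 3% applied per step instead, a 10-step run would succeed this often.
print(round(end_to_end_success(0.97, 10), 3))
```

At that assumed volume a 3% failure rate means roughly 30 failed runs to find and debug. If the same rate applied to each step of a 10-step run, only about 74% of runs would finish cleanly.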

These all vector to the same point: in agent land the machinery changes underneath you. That impacts cost and performance without much you can do about it - but in exchange you can reach for bigger and less defined goals. I think there's also a blunt reality that most processes start as deterministic tasks. If you can whiteboard the process, it's probably deterministic first, and running it as an agent increases risk for no gain.

One thing we're doing more and more is using agents to orchestrate larger actions - with a supporting list of deterministic tasks they can perform to stay on the rails. The agent does what it's good at, holding the goal and deciding what to do next, and the deterministic tools do what they're good at, running the same way every time. The freedom stays up top where it's useful and the load-bearing work sits underneath full of constraints and guards.
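One way to picture that split, as a minimal sketch rather than a description of any particular stack: the tool name `lookup_customer`, the in-memory "CRM" logic and the `plan_next` callback (standing in for the agent's decision-making) are all hypothetical.

```python
from typing import Callable, Optional


def lookup_customer(customer_id: str) -> dict:
    """Deterministic tool: validates its input and always returns the same shape."""
    if not customer_id.isdigit():
        raise ValueError(f"customer_id must be numeric, got {customer_id!r}")
    # Hypothetical rule standing in for a real CRM lookup.
    tier = "gold" if int(customer_id) % 2 == 0 else "standard"
    return {"id": customer_id, "tier": tier}


# The only actions the agent is allowed to take.
TOOLS: dict[str, Callable[[str], dict]] = {"lookup_customer": lookup_customer}


def run_tool(name: str, arg: str) -> dict:
    """Guard layer: unregistered tools are refused, and validation errors
    come back as data instead of crashing the run."""
    if name not in TOOLS:
        return {"ok": False, "error": f"tool {name!r} is not registered"}
    try:
        return {"ok": True, "result": TOOLS[name](arg)}
    except ValueError as exc:
        return {"ok": False, "error": str(exc)}


def orchestrate(
    plan_next: Callable[[list], Optional[tuple[str, str]]],
    max_steps: int = 10,
) -> list:
    """plan_next plays the agent: given the log so far it returns
    (tool_name, arg), or None once it considers the goal met."""
    log: list = []
    for _ in range(max_steps):
        step = plan_next(log)
        if step is None:
            break
        name, arg = step
        log.append((name, arg, run_tool(name, arg)))
    return log
```

The agent keeps its freedom over which tool to call next and when to stop. Everything it can actually do passes through `run_tool`, so a bad or invented tool call turns into an error entry in the log rather than an unguarded action.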

That payoff works because a lot of our partner orgs have already invested in enough deterministic workflows that agents now have useful, trusted tools to reach for. An agent with no reliable tools underneath gets you the variance with none of the safety. It's the orgs who did the boring deterministic groundwork first that get to use agents well now and move one step closer to nirvana. Which is the slightly unsexy takeaway: the way you earn good agents is by building the boring un-agentic stuff first.
