The Factory Approach to AI Workflows

Alex Blom

It feels like we're at the edge of something. AI models are getting genuinely capable of handling complex interactions; in school terms, they've moved from primary school to high school. That's exciting. It's also tricky: exciting and reliable are not the same thing.

We started building reliable AI workflows (I hesitate to call them agents yet) in 2024. The biggest mistake I still see is people not understanding what it takes to make a solution reliable.

Mega Prompts == Failure

It's easy to see a one-shot prompt run in chat, or focus on a select set of best-case outcomes, and call it success. This often starts with a giant prompt that tries to do everything in a single model call. After much coaxing, we get there.

But issues creep in. It isn't fully reliable. We can't observe what happened inside the call when something goes wrong. And it becomes a real problem the moment we start to rely on the output.

Breaking the Problem Into Atomic Steps

Instead of one giant prompt, break the work down into steps and make each step an atomic unit. Picture a factory line: one machine fits the wheel, the next paints the panel, the next bolts on the door. No single machine reasons about the whole car. It receives a specific input, performs one operation, and passes the result on. That's how we build AI workflows.

Your prompts are the building blocks. Each one is a thinking unit: a small, self-contained piece of reasoning the AI does on your behalf. Write it once, and it slots into any workflow that needs that job done. The "extract pain points" unit works on call transcripts, support tickets, and sales emails. Update the prompt and every workflow using it updates with it.
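To make the idea concrete, here is a minimal sketch of thinking units chained into a line. Everything in it is an assumption for illustration: `ThinkingUnit`, `run_line`, the prompt wording, and the two workflow names are hypothetical, and `echo_model` is a stand-in for whatever real model call you use.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical type: any function that takes a prompt and returns text.
ModelFn = Callable[[str], str]


@dataclass(frozen=True)
class ThinkingUnit:
    """One station on the line: a named prompt that does a single job."""
    name: str
    prompt_template: str  # must contain the placeholder "{input}"

    def run(self, model: ModelFn, text: str) -> str:
        return model(self.prompt_template.format(input=text))


def run_line(
    units: List[ThinkingUnit], model: ModelFn, text: str
) -> Tuple[str, List[Tuple[str, str]]]:
    """Pass text through each unit in order, recording every intermediate output."""
    trace = []
    current = text
    for unit in units:
        current = unit.run(model, current)
        trace.append((unit.name, current))
    return current, trace


def echo_model(prompt: str) -> str:
    """Stand-in for a real LLM call: returns the prompt's last line, tagged."""
    return "done: " + prompt.splitlines()[-1]


# Illustrative units; the prompt wording is made up for this sketch.
EXTRACT_PAIN_POINTS = ThinkingUnit(
    "extract_pain_points", "List the customer pain points in this text.\n{input}"
)
SUMMARIZE = ThinkingUnit(
    "summarize", "Summarize these points in one sentence.\n{input}"
)

# The same unit slots into more than one workflow.
call_debrief = [EXTRACT_PAIN_POINTS, SUMMARIZE]
ticket_triage = [EXTRACT_PAIN_POINTS]

result, trace = run_line(call_debrief, echo_model, "The export is slow.")
```

Because both workflows reference the same `EXTRACT_PAIN_POINTS` object, editing its prompt changes every line that uses it. The `trace` list holds each station's output, which is the visibility a single mega prompt doesn't give you.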

Factory-style workflow: Fathom call debrief

The Factory Approach Lets Us Build Reliable Solutions

Once the work is broken down this way, a lot of good things fall out of the structure:

  • The thinking is constrained. Each step has a narrow job, so the model has less room to drift. LLM blips are smaller and easier to catch.
  • Upgrades are routine. Reasoning is locked into the structure of the workflow, not the model. When a new model lands, we swap it into the steps that benefit and leave the rest alone. No full retest for drift.
  • It runs the same way thousands of times. The line is the line. Same inputs, same steps, same output, every time.
  • Failure is measurable. Each step has a defined input and output, so we can test it on its own and know exactly which station produced the defect.
  • Parts are reusable. A thinking unit built for one workflow becomes inventory for the next. Every workflow you build makes the next one cheaper.
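
Two of these points, measurable failure and routine upgrades, can be sketched in a few lines. This is an illustration under assumptions, not our production code: `Station`, `StationError`, the validity check, and both stub models are hypothetical names, and the stubs stand in for real model calls.

```python
from dataclasses import dataclass, replace
from typing import Callable

# Hypothetical type: any function that takes a prompt and returns text.
ModelFn = Callable[[str], str]


class StationError(Exception):
    """Raised when a station's output fails its check; records which station."""

    def __init__(self, station: str, output: str):
        super().__init__(f"station {station!r} produced invalid output: {output!r}")
        self.station = station


@dataclass(frozen=True)
class Station:
    name: str
    prompt_template: str              # must contain "{input}"
    model: ModelFn                    # each station carries its own model
    is_valid: Callable[[str], bool]   # the station's output contract

    def run(self, text: str) -> str:
        output = self.model(self.prompt_template.format(input=text))
        if not self.is_valid(output):
            raise StationError(self.name, output)
        return output


def old_model(prompt: str) -> str:
    """Stub that simulates a blip by returning an empty answer."""
    return ""


def new_model(prompt: str) -> str:
    """Stub that returns a well-formed bullet."""
    return "- slow export"


extract = Station(
    name="extract_pain_points",
    prompt_template="List pain points as '- ' bullets.\n{input}",
    model=old_model,
    is_valid=lambda out: out.startswith("- "),
)

# Swap the model on this one station; the prompt and the check stay the same.
upgraded = replace(extract, model=new_model)
```

Calling `extract.run(...)` raises a `StationError` whose `station` attribute names the failing step, so the defect is traced to one station and not to the whole line. `upgraded` differs only in its model, and each station can be tested on its own this way.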

Newer models like Claude Opus 4 can handle more complex instructions in a single call than anything before them. The temptation is to push more into the prompt because the model can technically take it. We still pull in the opposite direction.

Not because the models aren't capable (that is getting scary now). It's because dependability comes from structure. A factory produces the same car every time because the line is the same every time. An AI workflow produces the same output every time because the steps are the same every time.
