Agent Architecture

A Practical Taxonomy for Agentic AI Systems


Most taxonomies for agentic AI sort solutions by department (customer service, IT, sales) or by hype-cycle stage. Useful for a budget conversation. Useless for a design doc. As a solutions architect, the question I actually need answered isn't "what does this do," it's "what decides what it does next," because that determines how you test it, how you govern it, and how much of it you can stop watching once it's in production.

Here's the taxonomy I landed on, and — more usefully — how it broke the first three times I tried to draw it.

Where I Started: Three Tiers

My first cut was simple, and it's still the right starting instinct:

Conversational — a human or another agent on one side, turn-based interaction.
Workflow / graph-based — semi-dynamic or dynamic node selection, but the structure is deterministic and governed.
Autonomous — give it a goal, it finds its own tools and agents, and comes back for approval only where enterprise policy requires it.

This spine is correct in spirit. It just doesn't survive contact with real systems, and the places it breaks are exactly where the interesting design decisions live.

Where It Breaks

"Conversational" describes an interface, not an architecture. Chat is a turn-taking pattern — it tells you nothing about what's running behind it. You can put a chat window on top of a fully governed graph: the user types a request, it triggers a deterministic multi-step workflow, and the chat is just the front door. You can also have a fully autonomous agent with no conversational surface at all — an SRE agent that wakes up on a paging alert, investigates on its own, and only ever posts a Slack message as a side effect, never as a back-and-forth. Interface and control architecture are correlated, not identical, and conflating them is the first thing that goes wrong.

Multi-agent isn't exclusive to the autonomous tier. "Finds other agents and tools" implies runtime discovery — the system locating capability it wasn't wired to ahead of time. But a workflow graph can fan out to five specialist sub-agents in a fixed, known sequence and still be fully deterministic at the topology level. That's more nodes, not less governance. Fixed-roster multi-agent and dynamic-discovery multi-agent are different risk classes entirely, and the original three-tier model buries that distinction inside tier three.

The real battleground is the "semi-dynamic" graph. This is the qualifier doing the most work in the original framing, and for good reason: graphs where node selection is LLM-decided but the set of possible transitions is fixed and known ahead of time give you most of the flexibility people want from "agentic" without losing testability or an audit trail. My bet is that most production enterprise systems live here for years — not because the technology can't go further, but because this is where you get a bounded blast radius.
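To make "LLM-decided path, fixed transition set" concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the node names, the `TRANSITIONS` table, and `stub_chooser`, which stands in for whatever model call would actually pick the next node. The point is only that the model proposes and the graph disposes.

```python
from typing import Callable, Dict, List, Set

# Hypothetical claims-style graph: every node and every legal transition is
# written down before deployment. Node names are illustrative only.
TRANSITIONS: Dict[str, Set[str]] = {
    "intake": {"verify_policy", "request_documents"},
    "request_documents": {"verify_policy"},
    "verify_policy": {"approve", "escalate"},
    "approve": set(),   # terminal
    "escalate": set(),  # terminal
}

Chooser = Callable[[str, List[str]], str]


def run_graph(start: str, choose_next: Chooser, max_steps: int = 10) -> List[str]:
    """Walk the graph, letting `choose_next` pick among the allowed edges only."""
    path = [start]
    current = start
    for _ in range(max_steps):
        allowed = sorted(TRANSITIONS[current])
        if not allowed:  # terminal node reached
            return path
        proposed = choose_next(current, allowed)
        if proposed not in allowed:
            # The chooser asked for an edge that does not exist: refuse it.
            raise ValueError(f"illegal transition {current!r} -> {proposed!r}")
        path.append(proposed)
        current = proposed
    raise RuntimeError("step budget exhausted before reaching a terminal node")


def stub_chooser(current: str, allowed: List[str]) -> str:
    """Stand-in for an LLM call (assumption): takes the alphabetically first edge."""
    return allowed[0]


audit_trail = run_graph("intake", stub_chooser)
print(audit_trail)
```

The returned path doubles as the audit trail, and the `ValueError` branch is where the bounded blast radius comes from: a bad model output can pick the wrong legal edge, but it cannot invent a new one.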

"Autonomous" in practice usually means autonomous inside a box. "Give it a goal and let it figure it out" describes the capability correctly, but enterprise deployments almost always scope that goal tightly — "resolve this billing issue using any of these twelve tools" — rather than truly open-ended pursuit across the organization. The approval checkpoints aren't an exception bolted onto autonomy; they're the mechanism that defines the box. Governance hasn't disappeared in tier three — it's moved from the control-flow layer to a policy and approval layer sitting outside it.
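One way to picture governance moving out of the control flow is a gate that every tool call has to pass through, whatever sequence the agent plans. The sketch below is a rough illustration under my own assumptions: `PolicyGate`, the two billing tools, and the refund threshold are all hypothetical, not taken from any real policy engine.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List


@dataclass
class PolicyGate:
    """Sits outside the agent's planning loop and defines the box (hypothetical)."""
    allowed_tools: FrozenSet[str]
    needs_approval: FrozenSet[str]
    approver: Callable[[str, dict], bool]
    log: List[str] = field(default_factory=list)

    def call(self, tools: Dict[str, Callable[..., object]], name: str, **kwargs) -> object:
        if name not in self.allowed_tools:
            self.log.append(f"denied:{name}")
            raise PermissionError(f"tool {name!r} is outside the box")
        if name in self.needs_approval and not self.approver(name, kwargs):
            self.log.append(f"rejected:{name}")
            raise PermissionError(f"approval refused for {name!r}")
        self.log.append(f"ran:{name}")
        return tools[name](**kwargs)


# Illustrative tools only; a real deployment would wrap actual billing APIs.
TOOLS: Dict[str, Callable[..., object]] = {
    "lookup_invoice": lambda invoice_id: {"invoice_id": invoice_id, "amount": 120.0},
    "issue_refund": lambda invoice_id, amount: f"refunded {amount:.2f} on {invoice_id}",
}


def small_refunds_only(name: str, args: dict) -> bool:
    """Stand-in for a human or policy approval step (assumed threshold: 50)."""
    return args.get("amount", 0) <= 50


gate = PolicyGate(
    allowed_tools=frozenset(TOOLS),
    needs_approval=frozenset({"issue_refund"}),
    approver=small_refunds_only,
)

invoice = gate.call(TOOLS, "lookup_invoice", invoice_id="INV-1")
receipt = gate.call(TOOLS, "issue_refund", invoice_id="INV-1", amount=20.0)
try:
    gate.call(TOOLS, "issue_refund", invoice_id="INV-1", amount=500.0)
except PermissionError as exc:
    blocked = str(exc)

print(receipt, "|", blocked)
print(gate.log)
```

The agent is free to call the tools in any order it likes. What it cannot do is call something off the allowlist or skip the checkpoint, and that is the sense in which the checkpoints define the box rather than interrupt the autonomy.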

The Fault Line That Actually Matters: Can You Enumerate It?

Once you stress-test the three tiers against these edge cases, a single distinction does almost all of the explanatory work: can you enumerate, ahead of time, everything the system might call?

Bounded scopes, fixed graphs, and bounded-autonomy agents with a known toolset all share this property — you can write down, before deployment, the complete space of actions the system is capable of taking. That's what makes pre-deployment testing possible and what makes a post-incident audit mean something. The moment a system can discover and call an agent or tool it wasn't given at design time, that property is gone. You're no longer reviewing a system; you're reviewing a system's capacity to acquire systems. That's not "more autonomous" — it's a different category of governance problem, and almost none of the tooling for it (inter-agent trust, capability negotiation, cost containment on recursive delegation) is mature yet.
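The enumeration test can be written down almost literally. In this sketch, which assumes a made-up `Component` record with a `discovers_at_runtime` flag, `action_surface` either returns the complete set of callable tools or returns `None` to say the set cannot be written down.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Component:
    """Hypothetical description of an agent or graph node and what it may call."""
    name: str
    tools: FrozenSet[str] = frozenset()
    children: Tuple["Component", ...] = ()
    discovers_at_runtime: bool = False


def action_surface(component: Component) -> Optional[FrozenSet[str]]:
    """Return every tool reachable from `component`, or None if it can't be listed."""
    if component.discovers_at_runtime:
        return None
    surface = set(component.tools)
    for child in component.children:
        child_surface = action_surface(child)
        if child_surface is None:
            return None  # one open-ended child makes the whole tree open-ended
        surface |= child_surface
    return frozenset(surface)


# Illustrative components; names and tools are assumptions.
refunds = Component("refunds_agent", tools=frozenset({"issue_refund", "lookup_invoice"}))
billing = Component(
    "billing_agent",
    tools=frozenset({"lookup_invoice", "send_email"}),
    children=(refunds,),
)
broker = Component("broker_agent", discovers_at_runtime=True)
mixed = Component("mixed_agent", tools=frozenset({"send_email"}), children=(billing, broker))

print(sorted(action_surface(billing)))
print(action_surface(mixed))
```

Note how the property propagates: `billing` is enumerable even though it delegates, while `mixed` loses the property because a single child can discover capability at runtime.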

Four Tiers, Redrawn

flowchart TD
    A["Can you\nenumerate ahead\nof time every\ntool, agent, or\nsystem this\ncomponent\nmight call?"]
    F{{"What's the topology?"}}
    B["Tier 1\nBounded\nResponse\nfixed scope\nno graph"]
    C["Tier 2\nStructured\nGraph\nfixed topology\nLLM picks\nthe path"]
    D["Tier 3\nBounded\nAutonomy\nfixed toolset\nagent plans\nthe sequence"]
    E["Tier 4\nOpen-Ended\nEmergent\nruntime discovery\nrecursive\ndelegation"]
    style A fill:#4A90D9,stroke:#2c6fad,color:#fff
    style F fill:#777777,stroke:#444444,color:#fff
    style B fill:#5BAD6F,stroke:#3d8a52,color:#fff
    style C fill:#F5A623,stroke:#c97d00,color:#fff
    style D fill:#E05C5C,stroke:#b83c3c,color:#fff
    style E fill:#7B68EE,stroke:#5A4FCF,color:#fff
    A -- "Yes" --> F
    F -- "Single scope" --> B
    F -- "Fixed graph" --> C
    F -- "Fixed roster + goal" --> D
    A -- "No" --> E

Figure: the dividing line that predicts governance burden isn't chatbot-vs-workflow-vs-autonomous — it's whether the action surface can be enumerated ahead of time.

Tier	What controls the path	Where governance lives	Example
1. Bounded response	A fixed scope, not a graph — retrieve, reason once, respond or act within a tightly defined surface	Prompt and retrieval scope	FAQ resolution, document Q&A, single-tool lookups
2. Structured graph	A fixed, known set of nodes and transitions; the path through them can be LLM-decided, but the set of possible paths is fixed and testable in advance	The graph topology itself	Claims processing, onboarding, multi-step approvals
3. Bounded autonomy	A goal plus a fixed, known toolset or agent roster within a domain; the agent plans its own sequence but can't reach outside the box	A policy and approval engine sitting outside the control flow	"Resolve this billing issue using these twelve tools," with checkpoints on refunds or escalations
4. Open-ended / emergent	A goal plus runtime discovery of tools or agents not wired in at design time; recursive delegation is possible	Has to live in an inter-agent trust and negotiation protocol — and this layer mostly doesn't exist yet in mature form	An agent discovers and calls another agent it wasn't given access to at build time

Tiers 1 through 3 are different points on a complexity curve. Tier 4 is a different kind of object.
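The decision flow and table above reduce to a very small function. The sketch below is one possible encoding, with names of my own choosing (`PathControl`, `classify`, `GOVERNANCE`); the tier numbers and governance locations are copied from the table.

```python
from enum import Enum
from typing import Dict, Optional


class PathControl(Enum):
    """What controls the path, for components whose action surface is enumerable."""
    SINGLE_SCOPE = "single scope"
    FIXED_GRAPH = "fixed graph"
    FIXED_ROSTER_AND_GOAL = "fixed roster + goal"


# Where governance lives, per tier (wording condensed from the table).
GOVERNANCE: Dict[int, str] = {
    1: "prompt and retrieval scope",
    2: "the graph topology itself",
    3: "a policy and approval engine outside the control flow",
    4: "an inter-agent trust and negotiation protocol",
}

_TIER_BY_CONTROL: Dict[PathControl, int] = {
    PathControl.SINGLE_SCOPE: 1,
    PathControl.FIXED_GRAPH: 2,
    PathControl.FIXED_ROSTER_AND_GOAL: 3,
}


def classify(enumerable: bool, control: Optional[PathControl] = None) -> int:
    """Return the tier number for a component.

    The enumerability question is asked first; path control only matters
    once the answer is yes.
    """
    if not enumerable:
        return 4
    if control is None:
        raise ValueError("an enumerable component still needs a path-control answer")
    return _TIER_BY_CONTROL[control]


tier = classify(True, PathControl.FIXED_GRAPH)
print(tier, "->", GOVERNANCE[tier])
print(classify(False), "->", GOVERNANCE[classify(False)])
```

The asymmetry in the signature is deliberate: for a non-enumerable component the second argument is ignored, which is the code-level version of "tier 4 is a different kind of object."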

Two Axes That Cut Across All Four Tiers

Two properties looked, at first, like they belonged inside the tiers. They don't — they're orthogonal, and folding them into the tier definition is what made the original three-bucket model leak at the edges.

Interface or trigger. Chat, event-driven, scheduled, API-invoked. Any of the four tiers can sit behind any of these. Stop using "chatbot" as a stand-in for "low complexity" — it's a front door, not a floor plan.
Topology. Single-agent, fixed-roster multi-agent, or dynamic-discovery multi-agent. Fixed-roster multi-agent — a graph node that fans out to five known sub-agents — inherits tier 2's governance story even though it "looks" multi-agent. Dynamic discovery is what actually pushes a system into tier 4, regardless of how the goal was framed.
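Orthogonality is easy to check in code. This sketch assumes a hypothetical `Deployment` record whose `effective_tier` never reads the interface and reads the topology only to detect dynamic discovery. All names here are illustrative.

```python
from dataclasses import dataclass
from enum import Enum


class Interface(Enum):
    CHAT = "chat"
    EVENT = "event-driven"
    SCHEDULED = "scheduled"
    API = "api-invoked"


class Topology(Enum):
    SINGLE_AGENT = "single-agent"
    FIXED_ROSTER = "fixed-roster multi-agent"
    DYNAMIC_DISCOVERY = "dynamic-discovery multi-agent"


@dataclass(frozen=True)
class Deployment:
    """Hypothetical record: the tier a system was designed as, plus the two axes."""
    designed_tier: int
    interface: Interface
    topology: Topology

    @property
    def effective_tier(self) -> int:
        # The interface is deliberately never consulted: it is a front door.
        if self.topology is Topology.DYNAMIC_DISCOVERY:
            return 4  # runtime discovery overrides how the goal was framed
        return self.designed_tier


# Same tier 2 graph with fixed-roster fan-out, behind every possible front door.
tiers_by_interface = {
    Deployment(2, interface, Topology.FIXED_ROSTER).effective_tier
    for interface in Interface
}
print(tiers_by_interface)

# Swap in dynamic discovery and the tier changes regardless of the chat interface.
print(Deployment(2, Interface.CHAT, Topology.DYNAMIC_DISCOVERY).effective_tier)
```

If a profile like this ever needed the interface to compute risk, that would be a sign the interface had leaked back into the tier definition.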

Why This Distinction Earns Its Place in a Design Doc

Most agentic AI taxonomies answer "where does this fit in our roadmap." This one answers "how much of this can I stop watching." Those are different questions, and only one of them changes how you build the system.

In practice, this reframing changes three things on a project:

It changes what you eval before launch. Pre-build evals are tractable for tiers 1 through 3 precisely because the action space is enumerable — you can write scenarios against a known surface. For tier 4, "eval before launch" is a much weaker guarantee, because the surface itself can change after launch.
It changes where you put your interface investment. A chat front end tells you nothing about the backend's risk profile. The query-handling problems covered in "designing a robust chatbot" exist regardless of which tier sits behind the window.
It changes your default architecture choice. Liability wants a bounded blast radius. That's a strong, durable reason to expect most production enterprise systems to live in tier 2 or tier 3 for a long time — not because tier 4 is technically out of reach, but because "I can enumerate everything this could do" is worth more to a risk committee than "it's smarter."
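The first point, that evals are tractable when the action space is enumerable, can be shown directly for a tier 2 graph: every path through a fixed, acyclic graph can be listed and turned into a scenario. The graph below is a made-up example and `all_paths` is a plain depth-first enumeration, not a reference to any particular eval framework.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical fixed graph; node names are illustrative only.
GRAPH: Dict[str, List[str]] = {
    "intake": ["verify", "request_docs"],
    "request_docs": ["verify"],
    "verify": ["approve", "escalate"],
    "approve": [],
    "escalate": [],
}


def all_paths(
    graph: Dict[str, List[str]], node: str, prefix: Tuple[str, ...] = ()
) -> Iterator[Tuple[str, ...]]:
    """Yield every path from `node` to a terminal node, depth first."""
    if node in prefix:
        # A cycle would make the path list infinite; this sketch refuses it.
        raise ValueError(f"cycle detected at {node!r}")
    path = prefix + (node,)
    successors = graph[node]
    if not successors:
        yield path
        return
    for nxt in successors:
        yield from all_paths(graph, nxt, path)


scenarios = list(all_paths(GRAPH, "intake"))
for scenario in scenarios:
    print(" -> ".join(scenario))
print(len(scenarios), "paths to cover before launch")
```

Each tuple is a candidate eval scenario. For a tier 4 system there is no `GRAPH` to pass in, which is exactly why the pre-launch guarantee gets weaker there.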

Conclusion

The taxonomy that survives contact with real systems isn't chatbot, workflow, or autonomous. It's: can you write down, before you ship it, everything this thing might call? Three tiers answer yes. One doesn't. Everything else — interface choice, topology, how tightly you scope the goal — is a design decision you make within that boundary, not a new category in its own right.