The industry conversation around AI agents has moved quickly from research prototypes to production deployments. Analyst projections suggest that by the end of 2026, roughly 40% of enterprise applications will incorporate some form of task-specific AI agent. Whether or not the exact number holds, the direction is clear: organisations are decomposing monolithic AI applications into cooperating agents, each responsible for a bounded slice of work.
This is a distributed systems problem. The moment you have two agents that must coordinate, you inherit the same challenges that have shaped service-oriented and microservice architectures for decades: partial failure, ordering, observability, and the operational cost of managing many moving parts. The difference is that agents are non-deterministic by nature, which makes some of these challenges harder.
The sections below cover what multi-agent means in concrete engineering terms, the coordination patterns available, how failure boundaries work between agents, the operational complexity that comes with the territory, when a monolithic design remains the better choice, and a checklist for evaluating multi-agent proposals.
What multi-agent means in practice
The word "agent" is overloaded. In this context, an agent is a unit of work with its own execution context, access to specific tools or APIs, a defined scope of responsibility, and an independent failure boundary. It receives a task (or detects one), reasons about how to accomplish it, invokes tools or sub-tasks, and produces a result.
A monolithic AI application, by contrast, routes all tasks through a single model invocation or a single orchestration loop. The model may be large and capable, but the application treats it as one component. All context is shared. All failures are correlated. Scaling means scaling the whole thing.
Decomposing into agents means drawing boundaries. A customer-support system might have one agent that classifies incoming tickets, another that retrieves relevant documentation, a third that drafts a response, and a fourth that checks the draft against policy. Each can be developed, tested, scaled, and updated independently. Each has its own prompt, its own tool access, and its own failure modes.
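The boundary-drawing above can be made concrete as declarative agent specs. Everything in this sketch (the `AgentSpec` class, the version strings, the tool identifiers) is hypothetical, but it shows how each agent carries its own prompt version and its own tool scope:

```python
from dataclasses import dataclass

# Hypothetical declarative specs for the support-pipeline agents described
# above; names, versions, and tool identifiers are illustrative.
@dataclass(frozen=True)
class AgentSpec:
    name: str
    prompt_version: str        # prompts are versioned like code
    tools: frozenset           # the only tools this agent may invoke

SPECS = [
    AgentSpec("classifier", "v1.4", frozenset({"ticket_db.read"})),
    AgentSpec("retriever", "v2.0", frozenset({"docs.search"})),
    AgentSpec("drafter", "v1.1", frozenset({"docs.search", "templates.read"})),
    AgentSpec("policy_check", "v3.2", frozenset()),  # pure model call, no tools
]

def tool_allowed(spec: AgentSpec, tool: str) -> bool:
    """Enforce the boundary: an agent may only invoke tools in its own scope."""
    return tool in spec.tools
```

Making tool access part of the spec, rather than ambient, is what gives each agent an independent failure boundary: a misbehaving drafter cannot touch the ticket database.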
This is analogous to the shift from monolithic services to microservices, but with a critical difference: agents make decisions that are probabilistic, not deterministic. Two identical inputs may produce different outputs. This affects testing, debugging, and every form of operational reasoning that depends on reproducibility.
Coordination patterns
When agents cooperate, something must govern the flow of work between them. Three patterns appear repeatedly in production systems.
- Orchestrator: A central coordinator dispatches tasks to agents, collects results, and decides what to do next. The orchestrator holds the overall plan and each agent is stateless with respect to the broader workflow. This is the simplest pattern to reason about, debug, and monitor. Its weakness is that the orchestrator is a single point of failure and a potential bottleneck.
- Choreography: Agents communicate through events or messages. Each agent reacts to signals from others and emits signals in turn. There is no central coordinator. This pattern scales well and avoids a single bottleneck, but reasoning about the overall behaviour of the system is harder. Debugging requires reconstructing causality from event logs, which is non-trivial when agents are non-deterministic.
- Blackboard: A shared data structure (the blackboard) holds the current state of the problem. Agents read from and write to the blackboard. Each agent watches for conditions it can act on. This pattern is useful when the problem is naturally decomposed into opportunistic contributions rather than a fixed sequence of steps. It requires careful concurrency control over the shared state.
In practice, most production systems use the orchestrator pattern or a hybrid where an orchestrator manages the top-level flow and individual agents use choreography for sub-tasks. Pure choreography among non-deterministic agents is difficult to operate because the emergent behaviour is hard to predict and harder to bound.
The choice of coordination pattern determines how you reason about consensus and agreement between agents. An orchestrator can enforce sequential consistency trivially. Choreography requires explicit mechanisms for ordering and conflict resolution.
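Stripped to its essentials, the orchestrator pattern is a loop that holds central state and dispatches to stateless steps. The `Step`/`run_pipeline` names and the toy lambda agents below are illustrative, not a real framework:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical agent call signature: each agent takes a dict of inputs and
# returns a dict of outputs. Real agents would wrap LLM and tool calls.
Agent = Callable[[dict], dict]

@dataclass
class Step:
    name: str
    agent: Agent
    input_keys: list  # which keys of the shared state this step may read

def run_pipeline(steps: list, request: dict) -> dict:
    """Minimal orchestrator: sequential dispatch, central state.

    The orchestrator holds the plan; each agent sees only the keys it
    declares, so agents stay stateless with respect to the workflow.
    """
    state = dict(request)
    for step in steps:
        inputs = {k: state[k] for k in step.input_keys}
        result = step.agent(inputs)  # one failure boundary per step
        state.update(result)
    return state

# Toy deterministic agents standing in for LLM-backed ones.
classify = lambda d: {"category": "billing" if "invoice" in d["ticket"] else "other"}
draft = lambda d: {"draft": f"[{d['category']}] reply to: {d['ticket']}"}

steps = [Step("classify", classify, ["ticket"]),
         Step("draft", draft, ["ticket", "category"])]
out = run_pipeline(steps, {"ticket": "invoice question"})
```

Because every result passes through the loop, sequential consistency falls out for free; the cost is that every result also passes through a single process.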
Failure boundaries and retry semantics
Each agent is a failure boundary. When an agent fails, the question is: what happens to the work it was doing, and what happens to agents that depend on its output?
In a monolithic application, a failure typically means the entire request fails. In a multi-agent system, failures are partial. The classification agent might succeed while the drafting agent times out. The orchestrator (if present) must decide whether to retry the failed agent, substitute a fallback, or fail the overall request.
Retries are particularly tricky with agents. A deterministic service retried with the same input will produce the same output. An agent retried with the same input may produce a different output. This means that idempotency guarantees require careful thought. If the drafting agent produced a partial response before failing, retrying it may produce a completely different draft. The system must decide whether to discard the partial result, attempt to resume, or accept the inconsistency.
Key failure patterns to design for:
- Timeout with no output: The agent did not respond in time. Safe to retry if the agent's tool calls are idempotent. Unsafe if the agent may have triggered side effects (sent an email, written to a database) before timing out.
- Malformed output: The agent responded but the output does not conform to the expected schema. A structured output parser rejects it. Retry is usually safe, but repeated malformed outputs suggest a prompt or model issue, not a transient failure.
- Hallucinated tool call: The agent attempts to invoke a tool that does not exist or passes invalid arguments. The tool layer rejects the call, but the agent may have consumed context window budget reasoning about the invalid path.
- Cascading slowdown: One agent is slow, causing downstream agents to queue. Backpressure propagates through the system. Without explicit timeouts and circuit breakers at each boundary, the whole pipeline stalls.
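A minimal sketch of how the first two failure modes map to retry decisions. It assumes a hypothetical `agent_fn` that raises `TimeoutError` when it misses its deadline (deadline enforcement itself, e.g. via `asyncio.wait_for`, is omitted) and returns a dict on success:

```python
class AgentError(Exception):
    pass

def call_with_policy(agent_fn, payload, *, schema_keys, idempotent, max_retries=2):
    """Illustrative retry policy for one agent boundary."""
    last_err = None
    for attempt in range(max_retries + 1):
        try:
            result = agent_fn(payload)
        except TimeoutError as e:
            # Timeout with no output: retry only when the agent's tool
            # calls are idempotent; otherwise surface to the orchestrator.
            if not idempotent:
                raise AgentError("timeout with possible side effects") from e
            last_err = e
            continue
        # Malformed output: structural check before accepting the result.
        if isinstance(result, dict) and all(k in result for k in schema_keys):
            return result
        last_err = AgentError(f"malformed output: {result!r}")
    # Repeated failures suggest a prompt/model issue, not a transient fault.
    raise AgentError("retries exhausted") from last_err

# Demo: an agent that returns malformed output once, then a valid result.
attempts = {"n": 0}
def flaky_agent(payload):
    attempts["n"] += 1
    return {"draft": "ok"} if attempts["n"] > 1 else {"unexpected": 1}

result = call_with_policy(flaky_agent, {}, schema_keys=["draft"], idempotent=True)
```

The `idempotent` flag is the crux: it has to be declared per agent, because nothing about an LLM call tells you whether a tool invocation already fired.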
The design principle is the same as in any distributed system: make failure boundaries explicit, define what happens at each boundary when things go wrong, and test those paths. The difference is that non-determinism means you cannot rely on exact replay for debugging. You need traces, not just logs.
Operational complexity
Multi-agent systems have a higher operational surface area than monolithic ones. Each agent is a deployment unit with its own model version, prompt version, tool configuration, and scaling parameters. The combinatorial space of versions across agents grows quickly.
Specific operational concerns:
- Versioning: An orchestrator built against agent v1.2 prompts may behave differently when an agent upgrades to v1.3. Prompt changes are code changes. They need the same versioning, review, and rollback discipline as any other code.
- Observability: A single user request may fan out to five agents, each making multiple LLM calls and tool invocations. Distributed tracing must capture the full tree, including token counts, latencies, and costs per step. Standard service meshes do not instrument LLM calls natively; custom instrumentation is needed.
- Cost attribution: LLM inference is priced per token. A multi-agent system where agents pass context to each other can amplify token consumption. Understanding which agent (and which user request) drove a cost spike requires per-call cost tagging, which is straightforward in principle but tedious to implement correctly.
- Testing: Unit testing an agent in isolation is possible but insufficient. Integration tests must cover agent interactions, and because agents are non-deterministic, assertions must be probabilistic or structural ("the output contains a valid JSON object with fields X and Y") rather than exact equality checks.
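The structural-assertion style from the testing point can look like this in a test suite. The JSON field names are illustrative; the point is that two different runs of the same non-deterministic agent both pass, because the test checks shape, not wording:

```python
import json

def assert_valid_draft(output: str):
    """Structural assertion for a non-deterministic agent's output."""
    obj = json.loads(output)                      # must be valid JSON at all
    assert isinstance(obj, dict)
    assert set(obj) >= {"category", "draft"}      # required fields present
    assert isinstance(obj["draft"], str) and obj["draft"].strip()

# Two different runs of the same agent can both pass:
assert_valid_draft('{"category": "billing", "draft": "Thanks for reaching out."}')
assert_valid_draft('{"category": "billing", "draft": "Hello! About your invoice:"}')
```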
Organisations that have operated microservice architectures will find some of this familiar. The additional burden comes from non-determinism and the fact that LLM-based agents are sensitive to input phrasing in ways that traditional services are not.
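The observability requirement above (the full request tree, with token counts and latency per step) can be sketched with hand-rolled spans. A production system would build this on OpenTelemetry or similar, but the token counts must be attached manually from LLM response metadata either way; the numbers here are placeholders:

```python
import time
import uuid
from contextlib import contextmanager

TRACE = []  # flat list of span records; parent links encode the tree

@contextmanager
def span(name, parent=None):
    rec = {"id": uuid.uuid4().hex[:8], "parent": parent, "name": name,
           "tokens": 0, "start": time.monotonic()}
    TRACE.append(rec)
    try:
        yield rec
    finally:
        rec["latency_s"] = time.monotonic() - rec["start"]

# One request fanning out to two agent spans.
with span("request") as req:
    with span("classify", parent=req["id"]) as s:
        s["tokens"] = 1200   # would come from the LLM response metadata
    with span("draft", parent=req["id"]) as s:
        s["tokens"] = 3400

total_tokens = sum(r["tokens"] for r in TRACE)
```

Summing tokens per subtree is what makes cost attribution per agent (and per user request) possible later.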
When workloads span infrastructure boundaries, hybrid pipeline constraints compound operational complexity further. An agent running on-prem against sovereign data and another running in the cloud against public APIs need coordinated observability across trust boundaries.
When monolithic is still right
Not every AI application benefits from decomposition into agents. A monolithic design is preferable when:
- The task is simple enough that a single model call handles it end to end. Adding an orchestrator and multiple agents to a task that a single well-prompted model can solve introduces latency, cost, and failure modes with no compensating benefit.
- The team is small. Operating a multi-agent system requires expertise in distributed systems, observability, and deployment automation. A team of three engineers is better served by a monolithic application they can fully understand than a multi-agent system they can partially observe.
- Latency is the primary constraint. Each agent boundary adds at least one network hop and one LLM invocation. For applications where response time is critical, the sequential overhead of multi-agent coordination may be unacceptable. Parallelising agents helps but introduces coordination complexity.
- The problem does not decompose naturally. If the agents you define have tightly coupled responsibilities, requiring extensive context sharing and frequent back-and-forth, the boundaries are artificial. You have a monolith with extra network calls.
The pragmatic test is whether the boundaries you draw reduce cognitive load and blast radius. If they do, decompose. If they do not, keep it simple.
Evaluation checklist
Before committing to a multi-agent architecture, work through the following:
- Boundary justification: For each proposed agent, articulate why it needs to be a separate unit. Valid reasons include independent scaling, independent deployment, distinct tool access, or distinct failure handling. "It feels cleaner" is not sufficient.
- Coordination pattern selection: Choose orchestrator, choreography, or blackboard deliberately. Document the trade-offs. If you cannot explain how a failed agent is handled under your chosen pattern, the design is incomplete.
- Failure mode enumeration: List at least five failure scenarios specific to the agent interactions (not just individual agent failures). Define the expected system behaviour for each. Automate tests for at least the top three.
- Observability plan: Specify how a single user request will be traced across all agents. Include token counts, tool invocations, and latency at each step. If the observability tooling does not exist yet, budget for building it before launching.
- Cost model: Estimate the per-request cost of the multi-agent design versus the monolithic alternative. Include LLM token costs, infrastructure costs, and the engineering time for ongoing operations. Multi-agent is often 3x to 10x more expensive per request in LLM costs alone.
- Rollback plan: Define how to roll back a single agent to a previous version without affecting others. If rollback requires coordinated deployment across agents, the boundaries are coupled and the operational benefit of decomposition is diminished.
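A back-of-envelope version of the cost-model item can be done in a few lines. The token counts and the per-token price below are made-up placeholders, not real model pricing; the structure of the calculation is the point:

```python
# Assumed blended input+output price, USD per 1k tokens (placeholder).
PRICE_PER_1K_TOKENS = 0.01

def request_cost(calls):
    """calls: list of (input_tokens, output_tokens) per LLM invocation."""
    total = sum(i + o for i, o in calls)
    return total / 1000 * PRICE_PER_1K_TOKENS

# Monolithic: one call carrying the full context.
mono = request_cost([(3000, 800)])

# Multi-agent: four agents, each re-reading shared context plus its own prompt.
multi = request_cost([(2000, 200), (2500, 400), (3500, 900), (2800, 300)])

ratio = multi / mono  # context re-reading is what drives the multiplier
```

With these placeholder numbers the multi-agent variant lands at roughly 3x the monolithic cost, at the low end of the 3x-to-10x range above; agents that forward large contexts to each other push the ratio up fast.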
Multi-agent architectures are a powerful decomposition strategy for complex AI applications. They are also a significant increase in operational surface area. The decision to adopt them should be driven by concrete engineering requirements, not by the appeal of the pattern itself. The same principles of failure handling, observability, and specification discipline that govern any distributed system apply here, and ignoring them is more costly when the components are non-deterministic.
Related notes
- Failure modes, retries, and idempotency: Designing operations so retry behaviour is safe across agent boundaries.
- Consensus without the hype: When agents need agreement and when they do not.
- Hybrid AI pipelines and data sovereignty: Architectural constraints when AI workloads span environments.