The growing deployment of agentic AI systems raises a question that is more operational than philosophical: how much should an agent be allowed to do before a human has to look at it? This matters for distributed systems because agent actions propagate across services, trigger side effects, and accumulate costs in ways that are difficult to reverse. It matters for formal methods because the constraints we place on agents need to be precise enough to enforce and clear enough to audit.

This article examines human-in-the-loop patterns for AI agent systems. It covers why full autonomy is rarely desirable, a practical taxonomy of autonomy levels, patterns for setting boundaries, escalation protocols for when agents should stop and ask, the real cost of human supervision, and how to design handoffs that do not break the system.

Why full autonomy is rarely the goal

There is a persistent assumption that the ideal end state for any AI system is full autonomy. Remove the human, let the model run, and optimise for speed. In practice, very few production systems actually aim for this, and for good reason.

The first constraint is liability. When an agent takes an action with financial, legal, or safety implications, someone needs to be accountable. Fully autonomous agents create an accountability gap. If a procurement agent commits to a vendor contract that violates compliance policy, the organisation still bears the consequence. The agent does not.

The second constraint is correctness. Large language models and planning agents operate probabilistically. They are capable of producing plausible but wrong outputs, and the failure modes are not always predictable. In systems where actions are irreversible, or where the cost of a mistake is high, a human checkpoint is not overhead. It is risk management.

The third constraint is trust. Trust is built incrementally through demonstrated reliability. Most organisations have not yet accumulated enough operational history with agentic systems to trust them with high-stakes decisions. This is rational, not conservative. As explored in the discussion of agentic AI and system complexity, governance requirements grow in proportion to agent capability.

Full autonomy is a point on a spectrum, not the default destination. The engineering task is to find the right point on that spectrum for each class of action.

Levels of autonomy

It helps to have a shared vocabulary. The following levels describe a progression from fully manual operations to fully autonomous agent behaviour. Not every system needs to reach the highest level, and many systems will operate at different levels for different task types simultaneously.

  • Level 0: Fully manual. A human performs every action. The system provides information but takes no independent steps. This is the baseline.
  • Level 1: Agent-suggested. The agent proposes actions. A human reviews and explicitly approves each one before execution. The agent has no execution authority.
  • Level 2: Human-approved. The agent queues actions and executes them only after human approval. The difference from Level 1 is that the agent handles execution mechanics. The human decides what happens; the agent decides how.
  • Level 3: Human-monitored. The agent executes actions independently within defined boundaries. A human monitors outcomes and intervenes when something falls outside expected parameters. This is where most production agentic systems operate today.
  • Level 4: Exception-based oversight. The agent operates autonomously for the vast majority of tasks. Humans are involved only when the agent encounters a situation it cannot resolve or when a predefined escalation threshold is crossed.
  • Level 5: Fully autonomous. The agent operates without any human involvement. In practice, this level is appropriate only for low-stakes, highly constrained, and well-tested task domains.

The key insight is that different actions within the same system can and should operate at different levels. A customer service agent might handle routine refunds at Level 3 but escalate account closures to Level 1. The level is a property of the action, not the system.
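A per-action policy can be expressed directly in code. The following minimal sketch assumes nothing beyond the levels defined above; the action names and their assigned levels are illustrative, not from any particular system:

```python
from enum import IntEnum

class AutonomyLevel(IntEnum):
    FULLY_MANUAL = 0
    AGENT_SUGGESTED = 1
    HUMAN_APPROVED = 2
    HUMAN_MONITORED = 3
    EXCEPTION_BASED = 4
    FULLY_AUTONOMOUS = 5

# Hypothetical policy table: the level is a property of the action,
# not of the system as a whole.
ACTION_LEVELS = {
    "issue_refund": AutonomyLevel.HUMAN_MONITORED,
    "close_account": AutonomyLevel.AGENT_SUGGESTED,
    "send_status_update": AutonomyLevel.EXCEPTION_BASED,
}

def requires_human_approval(action: str) -> bool:
    """Levels 0-2 need explicit human sign-off before execution.
    Unknown actions default to the safest level."""
    level = ACTION_LEVELS.get(action, AutonomyLevel.FULLY_MANUAL)
    return level <= AutonomyLevel.HUMAN_APPROVED
```

Defaulting unknown actions to Level 0 is a deliberate fail-safe choice: an action the policy table has never seen should not inherit any execution authority.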

Boundary-setting patterns

Boundaries define what an agent is allowed to do. Good boundaries are explicit, enforceable, and auditable. They should be expressed as specs, invariants, and contracts wherever possible, not as informal guidelines buried in prompt text.

  • Scope limits. Restrict the agent to a defined set of actions. An agent that can read customer data but not modify it has a clear scope boundary. Scope limits are the simplest form of boundary and the easiest to enforce through API permissions and role-based access control.
  • Budget caps. Set hard limits on resource consumption. This includes financial spend, API call volume, compute time, and token usage. Budget caps prevent runaway costs from agent loops or unexpected input patterns. They should be enforced at the infrastructure level, not by the agent itself.
  • Approval gates. Require human sign-off before specific action types. Gates work well for actions that are irreversible, expensive, or externally visible. The design challenge is keeping the gate fast enough that it does not become a bottleneck for the entire workflow.
  • Kill switches. Provide immediate shutdown capability. A kill switch halts the agent entirely, not just the current action. This is a safety mechanism for situations where the agent is behaving unpredictably and the fastest correct response is to stop everything.
  • Output validators. Check agent outputs against a schema or constraint set before they are acted upon. Validators catch malformed actions, out-of-range values, and policy violations. They operate independently of the agent and cannot be overridden by it.

These patterns layer on top of each other. A well-designed system uses scope limits as the base, adds budget caps for resource control, inserts approval gates for high-risk actions, and keeps a kill switch available for emergencies. Output validators run continuously across all levels.
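A minimal sketch of that layering, with hypothetical action names, costs, and limits: the scope limit and output validator run first, and the budget cap lives in an object the agent never controls:

```python
class BudgetExceeded(Exception):
    pass

class BudgetCap:
    """Hard spend limit enforced outside the agent; the agent cannot reset it."""
    def __init__(self, limit: float):
        self.limit = limit
        self.spent = 0.0

    def charge(self, amount: float) -> None:
        if self.spent + amount > self.limit:
            raise BudgetExceeded(f"cap of {self.limit} would be exceeded")
        self.spent += amount

# Scope limit: the only action names this agent may emit (illustrative).
ALLOWED_ACTIONS = {"read_customer", "draft_reply"}

def validate_action(action: dict) -> bool:
    """Output validator: schema and policy checks before anything executes."""
    return (
        action.get("name") in ALLOWED_ACTIONS
        and isinstance(action.get("cost"), (int, float))
        and action["cost"] >= 0
    )

def execute(action: dict, budget: BudgetCap) -> str:
    """The only path to execution runs through validation, then the cap."""
    if not validate_action(action):
        return "rejected: validation failed"
    budget.charge(action["cost"])
    return "executed"
```

The point of the structure is that each layer fails independently: a validation failure never touches the budget, and a budget failure raises rather than silently degrading.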

Escalation protocols

An escalation protocol defines when an agent should stop acting and ask for help. This is distinct from boundary enforcement, which prevents actions. Escalation is about the agent recognising the limits of its own competence.

Good escalation protocols are triggered by measurable conditions, not by vague uncertainty thresholds. Concrete triggers include:

  • confidence scores below a defined threshold on classification tasks,
  • responses that fail output validation more than a set number of times in sequence,
  • input patterns that were not represented in the training or evaluation data, and
  • actions that would exceed a budget allocation for the current period.
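Measurable triggers can be collapsed into a single pure function. In this sketch the threshold defaults and trigger names are illustrative, not recommendations:

```python
def should_escalate(confidence: float,
                    consecutive_validation_failures: int,
                    projected_spend: float,
                    budget_remaining: float,
                    *,
                    min_confidence: float = 0.8,
                    max_failures: int = 3) -> list[str]:
    """Return the names of every trigger that fired; an empty list means proceed."""
    reasons = []
    if confidence < min_confidence:
        reasons.append("low_confidence")
    if consecutive_validation_failures >= max_failures:
        reasons.append("repeated_validation_failure")
    if projected_spend > budget_remaining:
        reasons.append("budget_exceeded")
    return reasons
```

Returning the list of fired triggers rather than a bare boolean matters for the next step: the human reviewing the escalation sees exactly which conditions tripped.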

The protocol itself needs to specify what information the agent passes to the human. A bare "I don't know" is not useful. The escalation message should include the original request, the actions considered, the reason for escalation, and any partial results. This is structurally similar to the observability signals discussed in observability for distributed systems, where context-rich error reporting accelerates diagnosis.
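One way to structure that payload, sketched as a hypothetical dataclass whose field names are assumptions rather than a standard schema:

```python
from dataclasses import dataclass, field

@dataclass
class EscalationRequest:
    """Context-rich escalation payload: more useful than a bare 'I don't know'."""
    original_request: str
    actions_considered: list[str]
    reason: str
    partial_results: dict = field(default_factory=dict)

    def summary(self) -> str:
        """One-line rendering for a review queue or alert channel."""
        considered = ", ".join(self.actions_considered) or "none"
        return (f"Escalating: {self.reason}. "
                f"Request: {self.original_request!r}. Considered: {considered}.")
```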

Escalation also needs a timeout. If a human does not respond within a defined window, the system needs a fallback: queue the task, apply a safe default, or reject the request. Escalation that blocks indefinitely is a liveness hazard in any concurrent system.
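A timeout-with-fallback can be sketched with a standard blocking queue; the fallback string here is a placeholder for whichever policy (queue the task, apply a safe default, reject) the system chooses:

```python
import queue

def escalate_with_timeout(approvals: queue.Queue,
                          timeout_s: float,
                          fallback: str) -> str:
    """Wait for a human decision, then fall back so the workflow stays live."""
    try:
        return approvals.get(timeout=timeout_s)
    except queue.Empty:
        return fallback  # e.g. "queued", "safe_default", or "rejected"
```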

The cost of supervision

Human oversight is not free. It consumes attention, introduces latency, and scales poorly. Understanding these costs is essential for designing systems that are both safe and practical.

Alert fatigue is the most immediate problem. When agents escalate frequently, humans learn to approve without reviewing. Studies in clinical alert systems, aviation, and security monitoring consistently show that high-volume alerts degrade human performance. The same dynamic applies to agent supervision. If ninety-five percent of escalations are routine, the human will miss the five percent that matter.

Latency is the second cost. Every approval gate adds time to the end-to-end workflow. For systems with latency budgets measured in hundreds of milliseconds, a human-in-the-loop step measured in minutes or hours changes the architecture fundamentally. Batch-oriented approval workflows are one mitigation, but they reduce the responsiveness that agentic systems are often deployed to improve.

Staffing is the third cost. Supervision requires humans with domain expertise, available during the hours the system operates, and trained to evaluate agent outputs. For systems that run continuously across time zones, this means dedicated teams. The operational expense of those teams needs to be weighed against the risk reduction they provide.

The goal is not to eliminate supervision but to make it efficient. This means investing in better escalation quality so that humans receive fewer, higher-signal alerts. It means designing review interfaces that surface the right context quickly. And it means tracking supervision metrics, such as approval rate, review time, and override frequency, to identify where the process can be tuned.
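The three metrics named above are straightforward to compute from a review log. The record fields in this sketch are assumed for illustration, not a standard schema:

```python
def supervision_metrics(reviews: list[dict]) -> dict:
    """Aggregate approval rate, mean review time, and override frequency
    from a list of review records."""
    n = len(reviews)
    approved = sum(1 for r in reviews if r["decision"] == "approve")
    overridden = sum(1 for r in reviews if r.get("overrode_agent", False))
    total_time = sum(r["review_seconds"] for r in reviews)
    return {
        "approval_rate": approved / n if n else 0.0,
        "mean_review_seconds": total_time / n if n else 0.0,
        "override_frequency": overridden / n if n else 0.0,
    }
```

An approval rate near 1.0 with near-zero review time is itself a signal worth alerting on: it suggests rubber-stamping rather than review.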

Designing graceful handoffs

The transition from agent to human, and from human back to agent, is where many systems fail. A handoff is not just a notification. It is a transfer of context, authority, and responsibility.

A graceful handoff requires the agent to package its current state in a format the human can understand quickly. This means a summary of what has been done, what remains, what constraints are active, and what the agent would have done next if it had been allowed to continue. The handoff should not require the human to replay the agent's entire decision history to understand the current situation.
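A hypothetical shape for that package; the field and method names are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class HandoffPacket:
    """What the human needs at a glance, without replaying the agent's history."""
    completed_steps: list[str]
    remaining_steps: list[str]
    active_constraints: list[str]
    proposed_next_action: str

    def render(self) -> str:
        """Human-readable summary for a review interface."""
        return (
            f"Done: {'; '.join(self.completed_steps) or 'nothing yet'}\n"
            f"Remaining: {'; '.join(self.remaining_steps) or 'none'}\n"
            f"Constraints: {'; '.join(self.active_constraints) or 'none'}\n"
            f"Agent would do next: {self.proposed_next_action}"
        )
```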

The reverse handoff, from human back to agent, is equally important. Once a human resolves an escalation, the system needs a clean mechanism to resume automated processing. This is analogous to the failure and retry patterns in distributed systems: the resumed workflow must not duplicate actions already taken, and it must pick up from the correct state.

Handoff interfaces should be tested as carefully as any other system boundary. Edge cases include simultaneous escalation and human intervention, handoffs during partial failures, and situations where the human's decision contradicts the agent's active constraints. Each of these can create inconsistent state if the handoff protocol does not account for them.

A practical pattern is to treat the handoff as a state machine transition with explicit pre-conditions and post-conditions. The agent enters a "suspended" state on escalation, the human takes ownership, and the system re-enters the automated state only when the human explicitly releases it. Audit logging at each transition point provides the traceability needed for governance and post-incident review, an area covered more broadly in the context of resilience benchmarking.
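A minimal sketch of that state machine, with the state and event names invented for illustration; illegal transitions raise rather than silently succeeding, and every transition is audit-logged:

```python
import datetime

class HandoffStateMachine:
    """RUNNING -> SUSPENDED -> HUMAN_OWNED -> RUNNING, with audit logging."""

    TRANSITIONS = {
        ("RUNNING", "escalate"): "SUSPENDED",
        ("SUSPENDED", "take_ownership"): "HUMAN_OWNED",
        ("HUMAN_OWNED", "release"): "RUNNING",
    }

    def __init__(self):
        self.state = "RUNNING"
        self.audit_log: list[tuple[str, str, str]] = []  # (timestamp, from, to)

    def fire(self, event: str) -> str:
        """Apply an event; the precondition is encoded in the transition table."""
        nxt = self.TRANSITIONS.get((self.state, event))
        if nxt is None:
            raise ValueError(f"illegal transition: {event!r} from {self.state}")
        stamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
        self.audit_log.append((stamp, self.state, nxt))
        self.state = nxt
        return nxt
```

Because the transition table is the only path between states, the "human explicitly releases" requirement is enforced structurally: there is no event that moves the system from HUMAN_OWNED back to RUNNING except `release`.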

Designing for human oversight is not a sign that the technology is immature. It is a sign that the system design is honest about the conditions under which autonomous action is safe, and serious about the conditions under which it is not.
