Runtime Governance: 10 Policies No Agent Should Reach Production Without

Offline evals tell you how an agent behaves on inputs you imagined. Runtime governance protects you from the inputs, loops, tools, and failures you did not.


Core idea: production agents need a measure-and-act policy layer between the agent and the systems it can affect, covering tools, credentials, sensitive data, outbound communication, runtime duration, and cost.

Production Reality

Why launch-day safety checks are not enough

Most teams treat agent safety as a launch-day checklist: red-team the prompts, evaluate on a benchmark, ship. That model breaks the moment an agent runs in production. Evals tell you how the agent behaves on inputs you imagined. Runtime governance is what protects you from inputs you didn't.

An agent in production is not just a model; it is a long-running process with credentials, tools, a budget, and the autonomy to do real work on real systems. The unit of failure is no longer a bad answer; it's a runaway loop, a leaked secret, a hijacked context, or a tool call to the wrong API at 3am. Offline evals are not designed to surface any of these. All of them are policy problems, and all of them need to be measured and acted on while the agent is running, not after.

What follows is the minimum set of runtime policies every agent should enforce before it touches a production workload. Each one maps to a concrete failure that has already happened to someone, and each one can be implemented as a measure-and-act loop sitting between the agent and the rest of your systems.

Checklist

The 10 policies

Runtime Policy	What to Measure	Automated Action	Failure Mode if Absent
1. Tool call rate limiting	Tool invocations/minute, retries, bursts	Slow down, queue, deny further calls	Downstream API abuse, throttling, self-inflicted DoS
2. Maximum agent duration / TTL	Total runtime, step count, reasoning iterations	Terminate or force escalation	Runaway agent, stuck sessions, unbounded spend
3. Autonomous action budget	Number/value of actions taken	Require approval after threshold	Large blast radius from a single bad decision
4. AI-generated content safety check	Toxicity, unsafe content, policy categories	Block, redact, rewrite, escalate	Brand/reputational harm, policy violations
5. Prompt injection detection	Injection score, instruction override attempts	Isolate context, strip instructions, switch mode	Hijacked agent, exfiltration, unauthorized actions
6. Data sensitivity / PII policy	PII detection, secret leakage, sensitive entities	Mask, redact, prevent transmission	Data exfiltration, GDPR/HIPAA/PCI compliance breach
7. Tool permission boundary checks	Requested action vs role/authorization	Deny tool access and alert	Privilege escalation, unauthorized writes
8. Repetition / infinite-loop detection	Repeated tool patterns, cyclic reasoning	Kill loop and summarize state	Wasted compute, stuck workflows, cost blowout
9. External communication policy	Domain reputation, outbound destinations	Block unknown destinations	Data exfiltration to attacker-controlled endpoints
10. Cost and resource guardrails	Token burn, API cost, memory growth	Degrade gracefully or stop	Budget overruns, OOM crashes, noisy-neighbor failures

Policy Detail

Why each one matters in production

1. Tool call rate limiting

A model that decides to "just retry" can put a thousand calls into a downstream API in under a minute. Rate limiting keeps one misbehaving agent from taking down a service shared by every other agent and human user.
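One common way to implement this kind of limit is a token bucket. The sketch below is illustrative only: the class name, the parameters, and the injectable clock are assumptions made for this example, not part of any particular framework.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow roughly `rate_per_minute` calls per minute, with bursts up to `burst`.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """

    def __init__(self, rate_per_minute: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate_per_second = rate_per_minute / 60.0
        self.capacity = float(burst)
        self.tokens = float(burst)      # start with a full bucket
        self.clock = clock
        self.last_refill = clock()

    def allow(self) -> bool:
        """Consume one token and return True, or return False if the bucket is empty."""
        now = self.clock()
        elapsed = now - self.last_refill
        self.last_refill = now
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_second)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A governance layer would call `allow()` before forwarding each tool invocation and slow down, queue, or deny the call when it returns False. In practice one bucket per agent and per downstream tool is a reasonable starting point.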

2. Maximum agent duration / TTL

Agents fail open on time: nothing in the agent itself stops a session from running indefinitely. TTLs convert "we have no idea what happened" into "the agent hit its budget and escalated," which is the difference between an incident and a ticket.
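A minimal sketch of a combined time and step budget, assuming the agent loop can call a check before every step. `RunBudget` and `BudgetExceeded` are hypothetical names introduced here for illustration.

```python
import time
from typing import Callable


class BudgetExceeded(Exception):
    """Raised when the agent exhausts its time or step budget."""


class RunBudget:
    """Hypothetical per-session budget; limits are assumptions for illustration."""

    def __init__(self, max_seconds: float, max_steps: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_seconds = max_seconds
        self.max_steps = max_steps
        self.clock = clock
        self.started_at = clock()
        self.steps = 0

    def check_step(self) -> None:
        """Call once before each agent step; raises when either limit is exceeded."""
        self.steps += 1
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step limit of {self.max_steps} exceeded")
        elapsed = self.clock() - self.started_at
        if elapsed > self.max_seconds:
            raise BudgetExceeded(f"TTL of {self.max_seconds}s exceeded")
```

The caller catches `BudgetExceeded`, terminates the session, and escalates with whatever state the agent has accumulated, so the outcome is recorded rather than discovered later.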

3. Autonomous action budget

Budgeting actions by count, value, and reversibility forces a human into the loop exactly where the blast radius gets large, without slowing the agent down on routine work.
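As a rough illustration, the three dimensions (count, value, reversibility) can be checked in a single decision function. The `Action` fields, the thresholds, and the rule that every irreversible action needs approval are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    value: float        # value at stake, in whatever unit the business uses (assumed)
    reversible: bool


class ActionBudget:
    """Hypothetical budget: routine actions pass, risky ones need a human."""

    def __init__(self, max_actions: int, max_total_value: float) -> None:
        self.max_actions = max_actions
        self.max_total_value = max_total_value
        self.actions_taken = 0
        self.value_spent = 0.0

    def decide(self, action: Action) -> str:
        """Return "allow" or "require_approval"; only allowed actions consume budget."""
        if not action.reversible:
            return "require_approval"   # assumption: irreversible always escalates
        if self.actions_taken + 1 > self.max_actions:
            return "require_approval"
        if self.value_spent + action.value > self.max_total_value:
            return "require_approval"
        self.actions_taken += 1
        self.value_spent += action.value
        return "allow"
```

For example, with `ActionBudget(max_actions=2, max_total_value=100)`, a reversible action worth 40 is allowed, while a following reversible action worth 70 would push the total past 100 and is routed to approval.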

4. AI-generated content safety check

The model vendor's built-in safety layer is tuned for general harm, not for your brand, regulatory regime, or customers. Runtime content checks enforce the rules specific to your business.

5. Prompt injection detection

Any content the agent reads can be adversarial input. Without detection and quarantine, the agent's permissions effectively belong to whoever can get text in front of it.

6. Data sensitivity / PII policy

Agents move data across boundaries humans never would. Detecting and masking PII, secrets, and regulated data at egress is the realistic control point.
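A deliberately simple sketch of masking at egress, using regular expressions. The three patterns are illustrative assumptions and far from complete; a production deployment would typically rely on a dedicated PII and secret detection service rather than a handful of regexes.

```python
import re

# Illustrative patterns only (assumptions, not an exhaustive detector).
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "AWS_ACCESS_KEY_ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace each match with a [LABEL] placeholder and count matches per label."""
    counts: dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            counts[label] = n
    return text, counts
```

Running `redact` on every outbound payload gives both the masked text to transmit and a per-category count that can feed alerts or block the transmission outright.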

7. Tool permission boundary checks

The agent's identity, not the prompt's intent, should determine what a tool can do. Authorization at the tool layer prevents clever inputs from becoming privilege escalation.
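A minimal sketch of authorization at the tool layer, assuming each agent runs under a named role. The role names and the `tool.operation` strings are hypothetical.

```python
# Hypothetical role-to-permission mapping; unknown roles get no permissions.
ROLE_PERMISSIONS: dict[str, frozenset[str]] = {
    "research-agent": frozenset({"search.read", "tickets.read"}),
    "support-agent": frozenset({"tickets.read", "tickets.write"}),
}


class PermissionDenied(Exception):
    """Raised when an agent role requests an action outside its boundary."""


def authorize(agent_role: str, tool_action: str) -> None:
    """Raise PermissionDenied unless the role is explicitly granted the action."""
    allowed = ROLE_PERMISSIONS.get(agent_role, frozenset())
    if tool_action not in allowed:
        raise PermissionDenied(f"{agent_role!r} may not perform {tool_action!r}")
```

The check takes the agent's role and the requested action as inputs and never sees the prompt, so no wording in the input can widen what the tool will do.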

8. Repetition / infinite-loop detection

Loops are common, boring, and expensive. Detecting cyclic patterns and killing the loop pays for itself the first time it fires.
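One simple heuristic, sketched below, is to count identical tool calls within a sliding window. The window size and repeat threshold are assumptions, and the approach only catches exact repeats of the same tool with the same JSON-serializable arguments; near-duplicate calls would need fuzzier matching.

```python
import json
from collections import Counter, deque


class LoopDetector:
    """Hypothetical detector: flags a call repeated too often in recent history."""

    def __init__(self, window: int = 20, max_repeats: int = 3) -> None:
        self.recent: deque = deque(maxlen=window)
        self.max_repeats = max_repeats

    def record(self, tool: str, args: dict) -> bool:
        """Record a call; return True if it now exceeds max_repeats in the window."""
        # Sort keys so argument order does not change the signature.
        signature = (tool, json.dumps(args, sort_keys=True))
        self.recent.append(signature)
        return Counter(self.recent)[signature] > self.max_repeats
```

Because the count covers the whole window rather than consecutive calls, alternating patterns such as A, B, A, B are also flagged once either call exceeds the threshold.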

9. External communication policy

An allowlist of approved domains and channels turns "the agent exfiltrated data" into "the agent tried to and was blocked." The cost is a config file.
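A sketch of the allowlist check using the standard library's `urllib.parse.urlsplit`. The host names are placeholders, and requiring HTTPS and exact host matches are assumptions made here for a conservative default.

```python
from urllib.parse import urlsplit

# Placeholder hosts for illustration; in practice this comes from configuration.
ALLOWED_HOSTS = frozenset({"api.example.com", "hooks.example.org"})


def is_destination_allowed(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Return True only for HTTPS URLs whose host is exactly on the allowlist."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # `hostname` excludes userinfo and port and is lowercased by urlsplit.
    host = (parts.hostname or "").rstrip(".")
    return host in allowed_hosts
```

Matching on the parsed hostname rather than on a substring of the URL matters: both `https://api.example.com.evil.test/` and `https://api.example.com@evil.test/` contain an approved name but resolve to a different host, and both are rejected here.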

10. Cost and resource guardrails

Token burn, context growth, and memory pressure compound silently. Per-session and per-tenant limits keep a single bad session from consuming the rest of the month's budget.
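The per-session and per-tenant limits can be sketched as a small accounting class. The token limits, the 80% degrade threshold, and the three return values are assumptions for illustration; what "degrade" means (a smaller model, a shorter context, fewer tools) is left to the caller.

```python
from collections import defaultdict


class CostGuard:
    """Hypothetical token accountant with per-session and per-tenant limits."""

    def __init__(self, session_limit: int, tenant_limit: int,
                 degrade_at: float = 0.8) -> None:
        self.session_limit = session_limit
        self.tenant_limit = tenant_limit
        self.degrade_at = degrade_at    # fraction of a limit that triggers degradation
        self.session_tokens: dict = defaultdict(int)
        self.tenant_tokens: dict = defaultdict(int)

    def record(self, tenant: str, session: str, tokens: int) -> str:
        """Add usage and return "ok", "degrade", or "stop"."""
        self.session_tokens[(tenant, session)] += tokens
        self.tenant_tokens[tenant] += tokens
        # The tighter of the two limits decides the outcome.
        usage = max(
            self.session_tokens[(tenant, session)] / self.session_limit,
            self.tenant_tokens[tenant] / self.tenant_limit,
        )
        if usage >= 1.0:
            return "stop"
        if usage >= self.degrade_at:
            return "degrade"
        return "ok"
```

Tracking both levels means a single session cannot exhaust the tenant's allowance, and a tenant near its limit stops new sessions even if each one is individually small.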

Adoption

How to think about adoption

These ten aren't ten separate projects. They share a common shape: a measurement at the boundary between the agent and something it can affect (a tool, a user, a model, a network), and a policy that decides what to do when the measurement crosses a line. Build that measure-and-act layer once, and adding policies becomes configuration rather than engineering.
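The shared shape can be sketched as a single evaluation function that runs every configured policy against an event and applies the most restrictive decision. Everything here (`Event`, `Decision`, `evaluate`) is a hypothetical interface for illustration, and denying when no policies are configured or when a policy crashes is an assumption in the spirit of failing closed.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Sequence


class Decision(IntEnum):
    ALLOW = 0
    ESCALATE = 1
    DENY = 2


@dataclass
class Event:
    agent_id: str
    kind: str          # e.g. "tool_call" or "outbound_request" (assumed labels)
    payload: dict


# A policy measures one event and returns a decision.
Policy = Callable[[Event], Decision]


def evaluate(event: Event, policies: Sequence[Policy]) -> Decision:
    """Return the most restrictive decision across all policies."""
    if not policies:
        return Decision.DENY            # assumption: nothing configured means deny
    worst = Decision.ALLOW
    for policy in policies:
        try:
            decision = policy(event)
        except Exception:
            decision = Decision.DENY    # a crashing policy must not fail open
        worst = max(worst, decision)
    return worst
```

With this in place, a new policy is one more function in the list, which is what makes adding policies closer to configuration than to engineering.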

Not all ten are equally urgent for every agent. A read-only research agent doesn't need an action budget the way a payments agent does. But every agent needs some answer for each of these — even if the answer is "not applicable, here's why."
Defaults matter more than ceilings. A policy that's only enforced when someone remembers to configure it is not a policy. The framework should fail closed: no rate limit configured means a conservative default, not unlimited.
Observability is the prerequisite. You can't enforce what you can't measure. If you don't yet have per-agent traces of tool calls, token usage, and outbound destinations, that's the first investment — the policies are what you build on top.
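The "defaults matter more than ceilings" point above can be sketched as a configuration resolver in which a missing setting falls back to a conservative value rather than to "unlimited". The setting names and numbers are assumptions chosen for illustration.

```python
# Hypothetical conservative defaults; real values depend on the workload.
CONSERVATIVE_DEFAULTS = {
    "tool_calls_per_minute": 10,
    "max_runtime_seconds": 300,
    "max_tokens_per_session": 50_000,
}


def resolve_limits(configured: dict) -> dict:
    """Merge explicit settings over conservative defaults; never return 'unlimited'."""
    limits = dict(CONSERVATIVE_DEFAULTS)
    for key, value in configured.items():
        if key not in CONSERVATIVE_DEFAULTS:
            raise ValueError(f"unknown limit: {key!r}")
        if value is None:
            continue                    # unset means "use the default", not "no limit"
        limits[key] = value
    return limits
```

Rejecting unknown keys is a deliberate choice in this sketch: a misspelled setting raises an error instead of silently leaving the intended limit unconfigured.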

Production rule: in offline evaluation, the agent is the system under test. In production, the agent is one component inside a larger system, and runtime governance is what makes that larger system safe to operate.