Runtime Governance: 10 Policies No Agent Should Reach Production Without
Offline evals tell you how an agent behaves on inputs you imagined. Runtime governance protects you from the inputs, loops, tools, and failures you did not.
Core idea: production agents need a measure-and-act policy layer between the agent and the systems it can affect, covering tools, credentials, sensitive data, outbound communication, runtime duration, and cost.
Why launch-day safety checks are not enough
Most teams treat agent safety as a launch-day checklist: red-team the prompts, evaluate on a benchmark, ship. That model breaks the moment an agent runs in production, where it meets inputs, tool responses, and failure conditions that nobody wrote a test for.
An agent in production is not just a model. It is a long-running process with credentials, tools, a budget, and the autonomy to do real work on real systems. The unit of failure is no longer a bad answer; it's a runaway loop, a leaked secret, a hijacked context, or a tool call to the wrong API at 3 a.m. Offline evals rarely surface these failures. All of them are policy problems, and all of them need to be measured and acted on while the agent is running, not after.
What follows is the minimum set of runtime policies every agent should enforce before it touches a production workload. Each one maps to a concrete failure that has already happened to someone, and each one can be implemented as a measure-and-act loop sitting between the agent and the rest of your systems.
The 10 policies
| Runtime Policy | What to Measure | Automated Action | Failure Mode if Absent |
|---|---|---|---|
| 1. Tool call rate limiting | Tool invocations/minute, retries, bursts | Slow down, queue, deny further calls | Downstream API abuse, throttling, self-inflicted DoS |
| 2. Maximum agent duration / TTL | Total runtime, step count, reasoning iterations | Terminate or force escalation | Runaway agent, stuck sessions, unbounded spend |
| 3. Autonomous action budget | Number/value of actions taken | Require approval after threshold | Large blast radius from a single bad decision |
| 4. AI-generated content safety check | Toxicity, unsafe content, policy categories | Block, redact, rewrite, escalate | Brand/reputational harm, policy violations |
| 5. Prompt injection detection | Injection score, instruction override attempts | Isolate context, strip instructions, switch mode | Hijacked agent, exfiltration, unauthorized actions |
| 6. Data sensitivity / PII policy | PII detection, secret leakage, sensitive entities | Mask, redact, prevent transmission | Data exfiltration, GDPR/HIPAA/PCI compliance breach |
| 7. Tool permission boundary checks | Requested action vs role/authorization | Deny tool access and alert | Privilege escalation, unauthorized writes |
| 8. Repetition / infinite-loop detection | Repeated tool patterns, cyclic reasoning | Kill loop and summarize state | Wasted compute, stuck workflows, cost blowout |
| 9. External communication policy | Domain reputation, outbound destinations | Block unknown destinations | Data exfiltration to attacker-controlled endpoints |
| 10. Cost and resource guardrails | Token burn, API cost, memory growth | Degrade gracefully or stop | Budget overruns, OOM crashes, noisy-neighbor failures |
Why each one matters in production
1. Tool call rate limiting
A model that decides to "just retry" can put a thousand calls into a downstream API in under a minute. Rate limiting keeps one misbehaving agent from taking down a service shared by every other agent and human user.
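A minimal sketch of one way to enforce this, assuming a token-bucket limiter kept per agent. The class name `TokenBucket` and its parameters are illustrative and not taken from any particular library; the clock is injectable so the behavior can be checked without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket limiter for one agent's tool calls (illustrative sketch)."""

    def __init__(self, rate_per_sec: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec        # tokens refilled per second
        self.capacity = float(burst)    # largest burst the bucket can hold
        self.tokens = float(burst)      # start with a full bucket
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        """Consume one token and return True if the tool call may proceed."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

When `allow()` returns `False`, the caller chooses the action from the table: delay, queue, or deny the call. The rate and burst values are settings to tune per tool, not recommendations.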
2. Maximum agent duration / TTL
Agents fail open on time: without a limit, a stuck or confused agent simply keeps running. TTLs convert "we have no idea what happened" into "the agent hit its budget and escalated," which is the difference between an incident and a ticket.
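One possible shape for this check, assuming the agent loop calls a guard before every step. `RunLimits` and `AgentBudgetExceeded` are hypothetical names introduced for illustration.

```python
import time
from typing import Callable


class AgentBudgetExceeded(Exception):
    """Raised when the agent exceeds its wall-clock or step limit."""


class RunLimits:
    """Tracks elapsed time and step count for a single agent run."""

    def __init__(self, max_seconds: float, max_steps: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_seconds = max_seconds
        self.max_steps = max_steps
        self.clock = clock
        self.started = clock()
        self.steps = 0

    def check(self) -> None:
        """Call once before each agent step; raises when a limit is hit."""
        self.steps += 1
        if self.steps > self.max_steps:
            raise AgentBudgetExceeded(f"step limit {self.max_steps} reached")
        if self.clock() - self.started > self.max_seconds:
            raise AgentBudgetExceeded(f"TTL of {self.max_seconds}s exceeded")
```

The surrounding loop would catch `AgentBudgetExceeded` and terminate or escalate with the current state attached, so the outcome is a ticket with context rather than a silent hang.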
3. Autonomous action budget
Budgeting actions by count, value, and reversibility forces a human into the loop exactly where the blast radius gets large, without slowing the agent down on routine work.
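As a rough sketch, a budget along these three dimensions could look like the following. The `Action` fields, the thresholds, and the rule that irreversible actions always need approval are assumptions made for the example, not a prescribed policy.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    value: float       # monetary or business value at stake
    reversible: bool   # can the action be undone cheaply?


@dataclass
class ActionBudget:
    max_actions: int
    max_total_value: float
    actions_taken: int = 0
    value_spent: float = 0.0

    def decide(self, action: Action) -> str:
        """Return "allow" or "needs_approval"; only allowed actions are charged."""
        if not action.reversible:
            return "needs_approval"   # assumption: irreversible always escalates
        if self.actions_taken + 1 > self.max_actions:
            return "needs_approval"
        if self.value_spent + action.value > self.max_total_value:
            return "needs_approval"
        self.actions_taken += 1
        self.value_spent += action.value
        return "allow"
```

Routine, cheap, reversible actions pass straight through; the human is pulled in only when the count, cumulative value, or irreversibility crosses the line.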
4. AI-generated content safety check
The model vendor's built-in safety layer is tuned for general harm, not for your brand, regulatory regime, or customers. Runtime content checks enforce the rules specific to your business.
5. Prompt injection detection
Any content the agent reads can be adversarial input. Without detection and quarantine, the agent's permissions effectively belong to whoever can get text in front of it.
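To make the quarantine idea concrete, here is a deliberately simple sketch. The regular expressions are illustrative placeholders with narrow coverage, and `screen_untrusted` is a hypothetical helper; a real detector would likely combine a trained classifier with structural isolation of untrusted content.

```python
import re
from typing import Any, Dict, List

# Illustrative patterns only; easy to evade and not a complete detector.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"disregard (the |your )?(system|previous) prompt", re.I),
    re.compile(r"reveal (the |your )?system prompt", re.I),
]


def injection_hits(text: str) -> List[str]:
    """Return the patterns that match the given untrusted text."""
    return [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(text)]


def screen_untrusted(text: str) -> Dict[str, Any]:
    """Wrap retrieved content as untrusted data and pick an agent mode."""
    hits = injection_hits(text)
    return {
        "trusted": False,                        # retrieved text is never trusted
        "flagged": bool(hits),
        "hits": hits,
        "mode": "read_only" if hits else "normal",
        "content": text,
    }
```

The point of the wrapper is the action, not the detector: flagged content switches the agent to a mode where it cannot call write-capable tools, whatever the text asks for.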
6. Data sensitivity / PII policy
Agents move data across boundaries humans never would. Detecting and masking PII, secrets, and regulated data at egress is the realistic control point.
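A minimal sketch of masking at egress, assuming regex-based detection for three example entity types. The patterns are simplified for illustration and will miss many real-world formats; production systems typically layer dedicated PII and secret scanners on top.

```python
import re
from typing import Dict, Tuple

# Simplified example patterns; not exhaustive.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "AWS_KEY_ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def redact(text: str) -> Tuple[str, Dict[str, int]]:
    """Mask sensitive entities and report how many of each were found."""
    counts: Dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            counts[label] = n
    return text, counts
```

Running `redact` on every outbound payload (tool arguments, messages, logs) gives both the masked text to send and a count that can feed alerts when an agent starts handling more sensitive data than expected.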
7. Tool permission boundary checks
The agent's identity, not the prompt's intent, should determine what a tool can do. Authorization at the tool layer prevents clever inputs from becoming privilege escalation.
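One way to express this, sketched under the assumption that each agent runs with a named role and every tool call is tagged with a tool and an action. The roles and tools in `ROLE_PERMISSIONS` are made up for the example.

```python
from typing import Dict, FrozenSet, Tuple

Permission = Tuple[str, str]   # (tool, action)

# Hypothetical role-to-permission mapping; unknown roles get nothing.
ROLE_PERMISSIONS: Dict[str, FrozenSet[Permission]] = {
    "support-agent": frozenset({("crm", "read"), ("tickets", "read"),
                                ("tickets", "write")}),
    "research-agent": frozenset({("search", "read")}),
}


class ToolDenied(PermissionError):
    """Raised when a role is not authorized for the requested tool action."""


def authorize(role: str, tool: str, action: str) -> None:
    """Deny by default: raise unless the role explicitly holds the permission."""
    allowed = ROLE_PERMISSIONS.get(role, frozenset())
    if (tool, action) not in allowed:
        raise ToolDenied(f"{role!r} may not {action} {tool!r}")
```

Because the check reads only the role and the requested action, nothing in the prompt or retrieved content can widen what the agent is allowed to do.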
8. Repetition / infinite-loop detection
Loops are common, boring, and expensive. Detecting cyclic patterns and killing the loop pays for itself the first time it fires.
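A simple detector might fingerprint each tool call and count repeats inside a sliding window, as in this sketch. The window size and repeat threshold are assumptions to tune, and the approach only catches exact repeats of tool name and arguments, not near-duplicates.

```python
import hashlib
import json
from collections import Counter, deque
from typing import Any, Deque, Dict


class LoopDetector:
    """Flags a tool call that recurs too often within the recent window."""

    def __init__(self, window: int = 10, max_repeats: int = 3) -> None:
        self.recent: Deque[str] = deque(maxlen=window)
        self.max_repeats = max_repeats

    def record(self, tool: str, args: Dict[str, Any]) -> bool:
        """Record a call; return True if it looks like a loop."""
        # Canonical JSON so argument order does not change the fingerprint.
        payload = json.dumps([tool, args], sort_keys=True, default=str)
        key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        self.recent.append(key)
        return Counter(self.recent)[key] >= self.max_repeats
```

Counting within a window rather than looking only at consecutive calls also catches short cycles such as search, fetch, search, fetch. When `record` returns `True`, the runtime would stop the loop and summarize state for whoever picks it up.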
9. External communication policy
An allowlist of approved domains and channels turns "the agent exfiltrated data" into "the agent tried to and was blocked." The cost is a config file.
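A sketch of what that check can look like, assuming HTTPS-only egress and an allowlist of hostnames where subdomains of a listed host are also accepted. The hosts in `ALLOWED_HOSTS` are placeholders.

```python
from urllib.parse import urlsplit

# Placeholder allowlist; in practice this would be loaded from configuration.
ALLOWED_HOSTS = {"api.example.com", "example.org"}


def is_allowed(url: str) -> bool:
    """Return True only for HTTPS URLs whose host is on the allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False                      # malformed URL: block
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower().rstrip(".")
    # Exact match, or a true subdomain (note the leading dot in the suffix).
    return any(host == allowed or host.endswith("." + allowed)
               for allowed in ALLOWED_HOSTS)
```

Matching on the parsed hostname with a dot-prefixed suffix avoids two common mistakes: substring checks that accept `example.org.evil.test`, and checks fooled by credentials in the URL such as `https://api.example.com@evil.test/`.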
10. Cost and resource guardrails
Token burn, context growth, and memory pressure compound silently. Per-session and per-tenant limits keep a single bad session from consuming the rest of the month's budget.
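For illustration, a per-session guard with a soft and a hard limit could be sketched as below. The `CostGuard` name, the two-tier design, and the per-1,000-token prices passed in by the caller are assumptions; actual prices depend on the model and provider.

```python
from dataclasses import dataclass


@dataclass
class CostGuard:
    """Tracks spend for one session and maps it to an action."""
    soft_limit_usd: float    # above this, degrade (e.g. smaller model, fewer steps)
    hard_limit_usd: float    # above this, stop the session
    spent_usd: float = 0.0

    def record(self, input_tokens: int, output_tokens: int,
               usd_per_1k_input: float, usd_per_1k_output: float) -> str:
        """Add the cost of one model call; return "ok", "degrade", or "stop"."""
        self.spent_usd += (input_tokens / 1000 * usd_per_1k_input
                           + output_tokens / 1000 * usd_per_1k_output)
        if self.spent_usd >= self.hard_limit_usd:
            return "stop"
        if self.spent_usd >= self.soft_limit_usd:
            return "degrade"
        return "ok"
```

The same structure extends to per-tenant limits by keeping one guard per tenant and checking both before each call.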
How to think about adoption
These ten aren't ten separate projects. They share a common shape: a measurement at the boundary between the agent and something it can affect (a tool, a user, a model, a network), and a policy that decides what to do when the measurement crosses a line. Build that measure-and-act layer once, and adding policies becomes configuration rather than engineering.
- Not all ten are equally urgent for every agent. A read-only research agent doesn't need an action budget the way a payments agent does. But every agent needs some answer for each of these — even if the answer is "not applicable, here's why."
- Defaults matter more than ceilings. A policy that's only enforced when someone remembers to configure it is not a policy. The framework should fail closed: no rate limit configured means a conservative default, not unlimited.
- Observability is the prerequisite. You can't enforce what you can't measure. If you don't yet have per-agent traces of tool calls, token usage, and outbound destinations, that's the first investment — the policies are what you build on top.
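The shared shape described above, including the fail-closed default, can be sketched in a few lines. `Governor`, `rate_policy`, the event fields, and the default of 30 calls per minute are all hypothetical names and values chosen for the example.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]             # a measurement taken at the agent boundary
Policy = Callable[[Event], str]    # returns "allow", "deny", or "escalate"

# Hypothetical conservative default used when nothing is configured.
DEFAULT_MAX_CALLS_PER_MINUTE = 30


def rate_policy(config: Dict[str, Any]) -> Policy:
    """Build a rate policy; a missing setting means the default, not unlimited."""
    limit = config.get("max_calls_per_minute", DEFAULT_MAX_CALLS_PER_MINUTE)

    def check(event: Event) -> str:
        return "deny" if event["calls_last_minute"] >= limit else "allow"

    return check


class Governor:
    """Runs every policy against an event; the first non-allow decision wins."""

    def __init__(self, policies: List[Policy]) -> None:
        self.policies = policies

    def evaluate(self, event: Event) -> str:
        for policy in self.policies:
            try:
                decision = policy(event)
            except Exception:
                return "deny"      # fail closed if a policy cannot decide
            if decision != "allow":
                return decision
        return "allow"
```

With this layer in place, adding a policy means appending another callable to the list, and a missing measurement results in a denial rather than an unchecked action.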
Production rule: in offline evaluation, the agent is the system under test. In production, the agent is one component inside a larger system, and runtime governance is what makes that larger system safe to operate.
Part of the evals series
- What Are Evals? A Practical Introduction to Evaluating AI Systems
- Testing vs Evals: How AI Quality Differs from Deterministic Software Quality
- Build-Time Evals: Regression, CI/CD, and Release Gates for AI Systems
- Runtime Evals and Observability for Agentic Systems
- Safety Evals and Red Teaming for AI Systems
- EvalOps Operating Model for AI Systems