Shipping an agent demo takes an afternoon. Shipping one that survives a quarter in production is a different job β and the gap is almost never the model. It's three boring things that are usually missing entirely.
I maintain an open, MIT-licensed Agentic Product Standard, and v2.0 was mostly about turning those three things from advice into code you can run. Here they are, with the actual code.
- Security is structural, not a filter The most common mistake is treating safety as a guardrail β an input/output filter near the edge. The problem is that filters have a ceiling. The best content classifiers run around 97% accuracy, which means ~3% of prompt-injection attempts land by design. That's not a bug you tune away; it's the nature of filtering.
Real safety comes from architecture. The check I reach for first is Simon Willison's lethal trifecta: an agent becomes an exfiltration tool the moment it has all three of β
access to private data,
exposure to untrusted content, and
the ability to communicate externally.
Any one is fine. All three together is a data-exfiltration channel waiting for a payload hidden in a retrieved document. The fix is never "better filter" β it's break a leg: gate egress, quarantine untrusted input, or scope the data.
Here's the gate, as a CI step. You declare what your agent can touch; the build fails if the trifecta is unmitigated:
agent-capabilities.json
{ "name": "support-agent",
"private_data": true, "untrusted_content": true, "external_comms": true,
"mitigations": [{"leg": "external_comms", "control": "egress allow-list + approval"}] }
LEGS = ("private_data", "untrusted_content", "external_comms")
def evaluate(spec):
present = [l for l in LEGS if spec.get(l) is True]
if len(present) < 3:
return 0, f"OK: only {len(present)}/3 legs present"
broken = {m["leg"] for m in spec.get("mitigations", []) if m.get("control")}
if broken & set(LEGS):
return 0, f"OK: trifecta present but broken at {', '.join(broken)}"
return 1, "FAIL: lethal trifecta, no leg broken"
The second structural control is MCP supply chain. Community MCP servers are untrusted code, and a server can hand you a benign tool description at approval time, then mutate it later (a "rug pull," or tool-definition poisoning). So: pin tool definitions by hash and alert on change. A few lines of bash in CI catches it:
canonical hash per tool, sorted so reordering doesn't trip it but content changes do
jq -cS '(.tools)|sort_by(.name)[]|{name,description,inputSchema}' tools.json \
| while read -r t; do printf '%s %s\n' "$(printf '%s' "$t"|sha256sum|cut -d' ' -f1)" \
"$(printf '%s' "$t"|jq -r .name)"; done > current.lock
diff -u tools.lock current.lock || { echo "::error::MCP tool def changed β possible rug pull"; exit 1; }
- Cost is a circuit breaker, not a dashboard Agents burn tokens at a rate chat never did β Anthropic has put multi-agent systems at roughly 15Γ a chat interaction. Most published multi-agent architectures leave the cost ceiling wide open: there's no breaker on a runaway loop.
v2.0 makes cost a hard control:
a per-run token/cost ceiling enforced in code (a breaker, checked before each model call), not a bill you read later;
prompt/KV caching on stable prefixes (system prompt, tool schemas);
model routing β small model for classification, flagship for reasoning;
and the economics rule people learn the expensive way: only pay the 15Γ for multi-agent when the task value justifies it. If one agent clears the bar, the orchestra is waste.
The point is that "cost" stops being a surprise the moment it's a number your code enforces and your traces record per task.
- Delete your own scaffolding This is the one I think about most. Every rule you write to patch a model's weakness becomes dead weight the day that weakness is gone β and worse, it competes for the model's attention and degrades the rules that still matter.
So the standard adds a maintenance doctrine (adapted, with credit, from Daniel Miessler's PAI): audit every instruction on a cadence with one question β
Would a smarter model make this rule unnecessary?
If yes, it's scaffolding, not architecture β cut it. Tag rules anti-fragile (eval sets, verification harnesses, tool contracts, real failure gotchas β keep) vs fragile (chain-of-thought orchestrators, output parsers, retry cascades β cut or re-test on the next model upgrade). A standard that only grows is one that rots.
Making it checkable
Principles are easy to agree with and impossible to audit. So v2.0 ships a self-assessment scorecard β a binary Yes/No maturity check (M0 Prototype β M1 Shippable β M2 Production β M3 Autonomous-ready), mapped to an autonomy ladder. Your level is the highest band where every gate passes; the first "No" is your next task. "Is it production-ready?" becomes a checklist instead of a vibe.
Alongside it: the red-team kit above plus a CI workflow template that blocks merges when the eval pass-rate slips, and a 2026 refresh (MCP + OAuth 2.1, A2A at the Linux Foundation, OpenTelemetry GenAI tracing, trajectory + pass^k eval metrics).
The one rule under all of it
The model is the variable. The harness is the constant. Invest proportionally.
β¦with the v2.0 twist: the harness isn't something you accumulate forever. You curate it β growing the parts that compound, deleting the parts that only propped up a weaker model.
It's MIT and vendor-neutral (deliberately not a framework), with an optional Claude Code skill set. I'd rather be told what's wrong with it than collect stars β the deferred list is where I'm least sure.
Repo: https://github.com/Moai-Team-LLC/agentic-product-standard