Every consequential action your agent takes (a refund, a payout-record change, an outbound message) is designed to route through one chokepoint, and we prove that the set of gated actions is complete before you rely on it. The gate checks typed facts (amount > limit, approval_verified == false) and never the agent's words. Below is the real engine, running in your browser.
This is the deterministic engine itself, ported to JavaScript — no server, no model. Pick a scripted attack, or build your own action and watch the verdict. The attacker's message changes nothing, because the gate never reads it.
The attacker can't add a brand-new payee, so they edit a trusted one. The gate guards the edit, not the verb — and binds the refund to the just-changed record.
Type the most persuasive jailbreak you can. Watch the verdict on the right. It will not move — the gate never reads this box.
Move a fact — the amount, or whether the approval is verified — and the verdict flips instantly. Rewrite the sentence and nothing happens. That is the whole product.
A jailbreak wins by changing what the agent believes. None of these three can be moved by anything the agent believes, says, or is told.
Every rule is a pure function of structured facts — amount_cents > limit, approval_verified == false, payout_changed_recently. There is no natural-language understanding to fool, so a perfect jailbreak buys the attacker nothing.
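To make "a pure function of structured facts" concrete, here is a minimal sketch in Python. It is not the shipped engine: the Facts type, the evaluate function, the 50,000-cent limit, and the way the three facts combine into a verdict are all assumptions made for illustration. The point it shows is structural: the function has no parameter for the agent's message, so no wording can reach the verdict.

```python
from dataclasses import dataclass

# Hypothetical limit for illustration; a real deployment would configure its own.
LIMIT_CENTS = 50_000


@dataclass(frozen=True)
class Facts:
    """Typed facts about one proposed action. Note: no free-text field."""
    amount_cents: int
    approval_verified: bool
    payout_changed_recently: bool


def evaluate(facts, limit_cents=LIMIT_CENTS):
    """Return (verdict, reasons). Depends only on the typed facts passed in."""
    reasons = []
    # Assumed rule: an over-limit amount needs a verified approval.
    if facts.amount_cents > limit_cents and not facts.approval_verified:
        reasons.append("amount_over_limit_without_verified_approval")
    # Assumed rule: money must not flow to a payout record that was just edited.
    if facts.payout_changed_recently:
        reasons.append("payout_changed_recently")
    return ("block" if reasons else "allow", reasons)
```

Calling evaluate(Facts(60_000, False, False)) blocks and evaluate(Facts(60_000, True, False)) allows, whatever the attacker typed, because the text is never an input.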
The guarantee is completeness: no consequential action reaches the outside world except through the gate. A documented chokepoint contract plus a CI drift-check fails the build the day a new tool opens a side door, so coverage can't silently rot.
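A drift-check of this kind can be sketched in a few lines. The version below is an assumption about its shape, not the real check: the names undeclared_tools and ci_exit_code, and the idea of a contract that lists gated tools and explicitly exempt read-only tools, are hypothetical. It fails whenever the agent's registered tool list contains a tool the contract has never classified.

```python
def undeclared_tools(registered, gated, exempt):
    """Tools the agent can call that the contract never classified."""
    return sorted(set(registered) - set(gated) - set(exempt))


def ci_exit_code(registered, gated, exempt):
    """Return 0 when every registered tool is declared, 1 otherwise.

    A CI job would pass this value to sys.exit() so the build fails on drift.
    """
    missing = undeclared_tools(registered, gated, exempt)
    for name in missing:
        print(f"drift: tool '{name}' is registered but not in the contract")
    return 1 if missing else 0
```

With gated = ["issue_refund"] and exempt = ["get_order"], adding a new "update_payout" tool to the registered list makes ci_exit_code return 1 until someone declares it.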
Default mode records the decision the gate would have made in a hash-chained, tamper-evident evidence log, then lets the action proceed. You see exactly what it would have stopped on your real traffic before you let it stop anything.
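For readers who have not met a hash-chained log, the idea is that each entry's hash covers the previous entry's hash, so editing any past record breaks every link after it. The sketch below uses SHA-256 from Python's standard library; the entry layout, the GENESIS value, and the function names are illustrative assumptions rather than the product's actual log format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first entry's prev_hash


def _digest(prev_hash, record):
    # Canonical JSON so the same record always hashes to the same value.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log, record):
    """Append a record, chaining it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"record": record, "prev_hash": prev_hash,
             "hash": _digest(prev_hash, record)}
    log.append(entry)
    return entry


def verify_chain(log):
    """Recompute every link; return False if any entry was altered."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A shadow-mode record might look like {"action": "refund", "would_block": True, "enforced": False}. A chain like this only shows tampering to someone who holds a trusted copy of the latest hash, since whoever can rewrite the whole log can also recompute it.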
Paste your agent's tool list — one per line, or comma-separated. This runs the same classification the real gate runs at registration: consequential tools must be gated, and any raw execution primitive (a shell, an exec, an arbitrary query) is a side door that makes a completeness guarantee impossible until it's removed or decomposed.
This is a first-pass heuristic on tool names — the real Phase 0 worksheet inspects each tool's parameters and downstream effects with you. But it shows the shape of the answer in ten seconds: a small, fully-gated tool surface is a gate we can guarantee; a shell in the list is a conversation about removing it first.
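A name-based first pass like the one described above could look roughly like this. The keyword lists, the three labels, and the function names are assumptions chosen for illustration; they are not the gate's real classifier, and a substring match will misjudge some names in both directions.

```python
import re

# Hypothetical keyword hints; a real worksheet inspects parameters and effects.
RAW_PRIMITIVE_HINTS = ("shell", "exec", "eval", "bash", "sql", "query")
CONSEQUENTIAL_HINTS = ("refund", "payout", "pay", "send", "transfer",
                       "update", "delete", "create")


def parse_tool_list(text):
    """Accept one tool per line or a comma-separated list."""
    return [name.strip() for name in re.split(r"[,\n]", text) if name.strip()]


def classify(name):
    """Label a tool by its name alone: side_door, must_gate, or unclassified."""
    lowered = name.lower()
    if any(hint in lowered for hint in RAW_PRIMITIVE_HINTS):
        return "side_door"   # raw execution primitive: remove or decompose
    if any(hint in lowered for hint in CONSEQUENTIAL_HINTS):
        return "must_gate"   # consequential: has to route through the gate
    return "unclassified"    # needs a human look at parameters and effects


def classify_all(text):
    return {name: classify(name) for name in parse_tool_list(text)}
```

Raw primitives are checked first on purpose: a tool named "exec_refund_script" should surface as a side door, not as an ordinary gated refund.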
This page runs the real v1 rule engine against scripted attacks and a mock agent. It proves the mechanism is deterministic: the verdict depends only on facts, and a jailbroken agent gets the same answer as a fooled one. That's a conviction demo — it earns a meeting, not a signature.
The test that proves something about your risk is different. We wire the gate in front of your agent's real tools in shadow mode — it blocks nothing, just logs every action it would have stopped on your live traffic for one to two weeks. The deliverable is your own evidence report: here is the catastrophic action we would have caught, here is the dollar exposure, here is the tamper-evident log. You flip to enforce only when that report convinces you.
We don't claim to harden the agent, stop every jailbreak, or guard actions you don't route through the gate. We guard a finite, typed set of consequential actions — completely — and we prove the set is complete. For everything else, you need other tools, and we'll tell you which.
Two weeks, blocks nothing, and you keep the evidence report either way. If it never would have stopped anything, you've lost nothing but the wiring time.
Book a shadow-mode pilot →