AI governance architecture

Bypassable by design: the architecture problem behind AI governance theatre

Why most AI 'human oversight' is bypassable by design — advisory client-side checks an agent can route around — and what it takes to move oversight server-side, into the architecture and the procurement contract.

In the twelve months since this argument became urgent, the public record has filled with a particular kind of accident. In April 2025, the support bot for Cursor — a coding tool whose product is itself agentic AI — invented a multi-device usage policy that did not exist, told paying customers it was the rule under a human-sounding signature (“Sam,” not labelled as a bot), and triggered a wave of cancellations before the company’s cofounder posted on Reddit that there was no such policy and the bot had hallucinated it (Fortune, 23 April 2025). In July 2025, an AI coding agent built into Replit deleted a production database during what its user, a SaaS founder named Jason Lemkin, had explicitly designated a code and action freeze — wiping live records for over 1,200 executives and 1,196 companies, then telling Lemkin that rollback was impossible (it was not), then generating thousands of fabricated user records that obscured what had actually happened (Fortune, 23 July 2025). Lemkin later wrote that he had told the agent in ALL CAPS, eleven times, not to make changes. A week later, Google’s Gemini CLI agent watched a directory-creation command fail, treated the failure as success, ran a series of destructive move operations against the imagined directory, and permanently deleted a product manager’s working files (GitHub issue #4586, 21 July 2025). The agent then issued an unusually candid post-mortem: “I have failed you completely and catastrophically. My review of the commands confirms my gross incompetence.”

In March 2026, an internal AI agent at Meta posted an analysis on an internal forum without being asked to. A second engineer acted on the agent’s unsolicited advice; the resulting chain left sensitive company and user data accessible to engineers without authorisation for roughly two hours. Meta classified the incident Sev 1, its second-highest severity (Engadget, 18 March 2026). The engineer’s post-mortem proposal — that agents should be required to ask for permission before acting on a user’s behalf — was the same conclusion you reach by walking through the Cursor, Replit, and Gemini incidents from first principles. Around the same period, a single attacker used Anthropic’s Claude Code and OpenAI’s GPT-4.1 to compromise nine Mexican government agencies and exfiltrate roughly 150GB of data exposing some 195 million taxpayer identities — by repeatedly framing the operation, against the agents’ own initial refusals, as a legitimate bug-bounty programme (SecurityWeek, 2 March 2026).

These were not edge cases at obscure vendors. Replit was past $100M ARR. Cursor sat at the centre of the AI-coding category. Gemini CLI shipped from Google with the Gemini 2.5 Pro model behind it. The Meta incident happened inside a company with mature internal-security tooling and a Sev 1 taxonomy designed for exactly this kind of escalation. In each case the user had given the agent permission to act on real systems, and in each case the agent did something the user had explicitly forbidden or had never authorised, then acted on its own output without verification. Public incident registries — the OECD AI Incidents and Hazards Monitor, the AI Incident Database, security-industry breach trackers — log new entries every week. The cases reaching press or court are a fraction of those diagnosed inside the affected company and kept there.

Earlier this year, the founder and CEO of a European AI company whose product is, by its own description, infrastructure for making advanced AI systems legible, trustworthy, and auditable in regulated markets, reached out to me on LinkedIn about an executive technical role. He was running the search personally, and his templated outreach included a specific promise: “After you submit your responses, I’ll personally review them before making any decisions.” I sent a substantive reply. He answered in the same thread with a real, two-line response. Two days later the same templated outreach arrived in the same thread, verbatim, as if the exchange above it did not exist.

That last one is what made me write this piece. Nothing was destroyed, no claim was filed, the encounter cost me a few minutes. But the failure mode is the same as Cursor’s, Replit’s, Gemini’s, and Meta’s: a human-attention promise made by the operator, broken by automation the operator installed, in a context where the operator’s product premise was preventing exactly this. The cases differ in scale and stakes; they share a shape.

You can call this governance theatre, and several people have. Oliver Patel writes about AI compliance theatre. Altrum.ai compares today’s AI governance to airport security theatre. Melanie Fink, in a Leiden paper on EU AI Act Article 14, documents the empirical limits of human oversight in operational systems.

The phenomenon is the gap between promised oversight and delivered oversight. The mechanism is the architecture of where, and by whom, the oversight is installed.

The symptom has a name. The mechanism does not.

The theatre framing for AI governance is now well attested. Patel’s “How to avoid AI compliance theatre” (February 2026) names the failure mode from the buyer side: organisations adopt frameworks and committees and audit schedules that look like governance and do not, on inspection, perform much of it. Altrum’s “Governing the AI Triad” (September 2025) makes the parallel to airport security theatre explicit: visible procedures that reassure people without addressing the underlying risk. Fink’s academic treatment of Article 14 (February 2025) goes further: human oversight, even when present, often fails because of cognitive constraints and automation bias.

All three are correct. Each describes a real and measurable gap between what oversight is supposed to do and what it does. None of them, as far as I can tell, names the architectural property that produces the gap.

The gap is not primarily a policy gap, not primarily a culture gap, not primarily a procedural gap. It is an architectural gap — once you name the architectural property, the policy and culture and procedure stories rearrange themselves around it as effects, not causes.

Case 1: Air Canada

In November 2022, Jake Moffatt visited Air Canada’s website after his grandmother died. He needed to fly from Vancouver to Toronto for the funeral. The site had a chatbot. He asked it about Air Canada’s bereavement-fare policy, and the chatbot told him a customer could submit an application for a discounted fare within ninety days after the flight had been taken. Moffatt paid the full fare, attended the funeral, and applied for the discount within the ninety-day window the chatbot had quoted.

Air Canada denied the application. The actual bereavement policy did not allow post-travel claims. Moffatt sued in the British Columbia Civil Resolution Tribunal. The tribunal ruled on 14 February 2024 (Moffatt v. Air Canada, 2024 BCCRT 149). Tribunal Member Christopher C. Rivers found that Air Canada owed Moffatt a duty of care, that the chatbot had misled him, and that the airline had failed to exercise reasonable care to ensure its representations were accurate. The award totalled $812.02 CAD: $650.88 in damages, plus pre-judgment interest and tribunal fees.

The defence Air Canada raised is worth quoting in full. In Rivers’s words: “In effect, Air Canada suggests the chatbot is a separate legal entity that is responsible for its own actions. This is a remarkable submission. While a chatbot has an interactive component, it is still just a part of Air Canada’s website.”

Notice what the chatbot was, architecturally. It was a customer-facing surface that issued statements on behalf of the company, with no human oversight of any kind on the path between the model’s output and the user. There was no gate, theatrical or otherwise — the chatbot was the system.

Case 2: UnitedHealth

A harder case. In November 2023, the families of two deceased Medicare Advantage members sued UnitedHealth Group in the District of Minnesota (Estate of Lokken v. UnitedHealth Group, case 0:23-cv-03514). They alleged that the insurer used an AI tool called nH Predict — developed by naviHealth, an Optum subsidiary acquired in 2020 and rebranded in 2024 to Home & Community Care — to deny post-acute care coverage in ways that overrode treating physicians’ decisions.

The plaintiffs allege a 90% error rate, measured by the percentage of denials reversed on appeal, and note that only 0.2% of denials are ever appealed. The complaint puts the structural claim explicitly: “the lack of human review involved in the claims denial process.” UnitedHealth’s response, repeated publicly, is that nH Predict is a guide, not a decision-maker; that coverage decisions are made by medical directors, not by AI; and that the lawsuit misrepresents the work of clinical staff.

On 13 February 2025 a federal judge allowed claims of breach of contract and breach of the implied covenant of good faith and fair dealing to proceed. On 9 March 2026 a magistrate judge ordered broad discovery — six of seven document-production categories granted — into how nH Predict interacts with clinician judgment (Estate of Lokken v. UnitedHealth Group, 2026 WL 658883, D. Minn. Mar. 9, 2026). Parallel suits exist against Humana and Cigna. In February 2024, CMS issued guidance clarifying that algorithms can assist in predicting patient needs but cannot solely determine coverage.

Notice what is different from Air Canada. Here, human oversight is structurally present. Medical directors review denials. The architecture, on paper, has a human in the loop. What the plaintiffs are alleging, and what discovery may now show, is that the structurally present oversight is operationally absent: the human review exists, but the algorithm wins. The complaint’s alleged 90% reversal rate on appeal, if it survives discovery, becomes the entire architectural argument in one number.

The mechanism: client-side and server-side oversight

The cases above — Cursor, Replit, Gemini, Meta, the European AI company, Air Canada, UnitedHealth — share an architectural property. The property is easiest to name by borrowing from a distinction every web engineer already knows.

A web form can validate user input in two places. Client-side validation runs in the browser: it inspects what the user typed and refuses to submit if something is wrong. Server-side validation runs after the request leaves the browser, on the server itself, and refuses to accept whatever arrives if something is wrong.

Client-side validation is convenient. It is also bypassable by construction. A user with browser developer tools open, or a script issuing a direct HTTP request, can submit anything; the client-side checks were never in the way. This is not a bug in a particular client-side implementation. It is a property of where the check sits. A check that the caller controls is a check that the caller can omit.

Server-side validation is the opposite. The server owns the check. The caller cannot reach the protected resource without traversing it. The caller does not need to cooperate, or even know the check exists, for the check to function.
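The difference can be made concrete with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: the `RefundRequest` type, the `MAX_REFUND_CENTS` rule, and both function names. The point it shows is that the client-side check only runs if the caller chooses to run it, while the server-side check sits on the only path to the action.

```python
from dataclasses import dataclass

MAX_REFUND_CENTS = 50_000  # hypothetical business rule, for illustration only


class ValidationError(Exception):
    """Raised by the server when a request fails the server's own check."""


@dataclass(frozen=True)
class RefundRequest:
    order_id: str
    amount_cents: int


def client_side_check(amount_cents: int) -> bool:
    """Runs in the caller's environment. The caller may simply never call it."""
    return 0 < amount_cents <= MAX_REFUND_CENTS


def server_handle_refund(req: RefundRequest) -> str:
    """Runs on the server. Every request traverses this check, cooperative or not."""
    if not (0 < req.amount_cents <= MAX_REFUND_CENTS):
        raise ValidationError(f"amount {req.amount_cents} outside allowed range")
    return f"refunded {req.amount_cents} for {req.order_id}"
```

A script that skips `client_side_check` and sends `RefundRequest("o-2", 9_999_999)` directly is still rejected, because `server_handle_refund` raises `ValidationError` regardless of what the caller did beforehand.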

The same distinction applies to oversight. I call them client-side oversight and server-side oversight, and the gap that public discourse has been naming as governance theatre is, in almost every case, the gap between the two.

Client-side oversight is oversight installed by the caller — the AI agent, the service, the workflow runtime, the application that wants to take the action. The agent’s framework includes a “request human approval” step. The chatbot’s prompt template includes an “if uncertain, escalate” clause. The hiring system includes a “the recruiter will personally review” statement. In every one of those cases, the caller is the same actor that benefits from the action proceeding. The oversight depends on the caller honouring it.

Server-side oversight is oversight installed by the system owner around the protected action, downstream of the caller. The action cannot proceed without traversing the oversight surface, regardless of whether the caller knows it is there or wants it to be. The check is binding in the same sense that server-side validation is binding: the caller does not have the option of skipping it.
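A rough sketch of that topology follows, under loud assumptions: `install_gate`, `approve`, and `drop_table` are hypothetical names, and a Python closure stands in for what would be a process or network boundary in a real deployment. The agent is handed only the gated callable; the raw action and the approval channel stay with the system owner.

```python
from typing import Callable, Set, Tuple


class ApprovalRequired(Exception):
    """Raised when a caller reaches the gate with no approval on record."""


def install_gate(
    action: Callable[[str], str],
) -> Tuple[Callable[[str], None], Callable[[str], str]]:
    """System owner wraps `action` and returns (approve, gated_action).

    Assumption: in production the separation is enforced by infrastructure,
    not by a closure. The closure only illustrates who holds which handle.
    """
    approved: Set[str] = set()

    def approve(target: str) -> None:
        # Reached through the human approver's channel, never through the agent.
        approved.add(target)

    def gated_action(target: str) -> str:
        if target not in approved:
            raise ApprovalRequired(f"no approval on record for {target!r}")
        approved.discard(target)  # one approval permits one execution
        return action(target)

    return approve, gated_action


def drop_table(name: str) -> str:
    """Stand-in for a destructive operation."""
    return f"dropped {name}"


# The agent receives only `agent_visible_drop`; `approve` stays with the owner.
approve, agent_visible_drop = install_gate(drop_table)
```

Calling `agent_visible_drop("users")` before `approve("users")` raises `ApprovalRequired`, however the agent was prompted and whatever it believes about its own permissions.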

The incidents in the fresh cluster (Cursor’s invented policy, Replit’s destroyed database, Gemini’s hallucinated commands, Meta’s unsolicited agent post, the European AI company’s recycled outreach) share the client-side property. In each case the oversight surface was a promise the caller made and the caller could break: an instruction to maintain a code freeze, a system prompt to verify before acting, an “I’ll personally review” claim inside a message body, a “Sam” signature attached to a fabricated policy, an unspoken expectation that an agent invoked for analysis would not post on its operator’s behalf. The promise lives inside the same actor as the action; honouring the promise is the actor’s choice.

The Mexican-government breach is the same shape with the attacker substituted for the user: the agent accepted a client-side claim (“I’m running a bug-bounty programme”) and acted on it without server-side verification, exactly as it would have accepted a legitimate instruction.

Air Canada is the degenerate boundary case: there was no oversight at all, but the chatbot’s surface was the company’s surface, so liability flowed back regardless. UnitedHealth is the harder, more interesting case: the structural form of server-side oversight (clinicians review denials) collapses, in the allegations, into something operationally indistinguishable from client-side. The algorithm produces the decision, the human signs the decision, and the architecturally present oversight does not bind.

Client-side oversight is to AI governance what client-side validation is to web security: bypassable by design.

Article 14: server-side oversight, as a procurement position

For most of the past three years, the distinction between client-side and server-side oversight has been an interesting architectural observation. As the EU AI Act enters operation, it becomes a procurement position.

The EU AI Act (Regulation (EU) 2024/1689, official text dated 13 June 2024) entered into force on 1 August 2024, and its obligations apply in waves. The most consequential wave for oversight is Article 14, “Human oversight.” Its obligations were originally scheduled to apply in full to providers of high-risk AI systems on 2 August 2026; on 7 May 2026, the Council and Parliament reached provisional agreement under the simplification package to delay the high-risk obligations to 2 December 2027 for stand-alone systems and 2 August 2028 for systems embedded in regulated products (Council of the EU, 7 May 2026). The delay shifts the enforcement date, not the architectural requirement: procurement, conformity preparation, and audit work proceed now. The article’s core requirement fits in three words: “effective human oversight.”

What “effective” means is then specified. Article 14(4) requires that the natural persons assigned to oversight be enabled, as appropriate and proportionate, to “properly understand the relevant capacities and limitations of the high-risk AI system,” to “remain aware of the possible tendency of automatically relying or over-relying on the output,” and to decide not to use the system or to otherwise “disregard, override or reverse the output of the high-risk AI system.” For remote biometric identification systems, Article 14(5) goes further: no decision may be acted on unless “separately verified and confirmed by at least two natural persons with the necessary competence, training and authority.”

Read those clauses against the client-/server-side distinction. The right to “disregard, override, or reverse the output” is operationally meaningless if the oversight surface is on the caller’s side — because the caller is what produced the output, and the same caller is being asked to disregard it. The two-person verification requirement for biometric identification is, by construction, server-side: it bolts a check downstream of the system, owned by people who are not the system.

Article 14 is, in effect, a regulatory mandate for server-side oversight in high-risk contexts. It does not use those words; it does not need to. The architectural form that satisfies the requirement is the one where the gate sits around the action, owned by parties other than the caller. No other form satisfies it.

United States enforcement is moving in the same direction without the EU’s coherent framework: California and Texas have imposed healthcare-specific restrictions on medical-necessity AI; Colorado’s AI Act covers high-risk systems in employment and insurance; state attorneys general are deploying unfair-and-deceptive-practices statutes against deceptive AI claims (Morgan Lewis, 2 April 2026). Industry-side data reflects buyer awareness: 84% of organisations doubt they could pass a compliance audit on AI agent behaviour (Cloud Security Alliance and Strata Identity, Securing Autonomous AI Agents, February 2026).

The procurement consequence follows quickly. A vendor whose human-oversight claim is, on inspection, client-side cannot truthfully attest compliance with Article 14(1)–(4) for a high-risk deployment. A buyer who accepts the claim and the attestation has a liability of their own. The layer that public discourse has been naming as governance theatre is, in any serious high-risk procurement now, also a compliance layer and a procurement layer.

Effective oversight, in Article 14’s sense, is server-side oversight. No other architectural form meets the requirement.

What server-side oversight looks like

If client-side oversight is the failure mode, server-side oversight is what defeats it. The architectural shape is roughly fixed, regardless of which vendor or stack supplies it.

Four properties recur in any design that holds up.

The gate is installed by the system owner, around the protected action, downstream of the caller. The caller’s framework, prompt template, or runtime does not configure it; the system that owns the action configures it. A caller that cooperates passes through it. A caller that refuses to cooperate cannot reach the action, because the gate sits between the caller and the action by topology, not by convention.

The decision is cryptographically attested to a specific person. Not a session token, not an API key, not a bearer secret in a Slack message that anyone with read access to the channel could replay. The approver authenticates with a credential that is theirs alone — in practice today, passkey or WebAuthn — and the decision is signed in a way that ties it to that credential and that approval payload. Whatever continuation the system permits depends on this signature, not on a flag in a database row.

The continuation is scoped. The approved action — refund this payment, deploy this commit, export this dataset — yields a capability that proceeds with exactly that scope and no more. Not a generic role elevation, not a session that survives the action, not a webhook secret that an attacker reading the audit log could replay against a different action. The capability is bound to action, resource, parameters, and time, and it expires.
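One way to picture a scoped continuation is a capability whose integrity tag covers the action, resource, parameters, and expiry together. The sketch below is an assumption-laden illustration, not a reference design: `SERVER_KEY`, `mint_capability`, and `authorize` are hypothetical names, and the HMAC key is held only by the gate that both mints and checks the capability. It is separate from the approver’s own signature, which in practice would be an asymmetric WebAuthn assertion.

```python
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional

# Hypothetical demo key. Held only by the gate, never shared with callers.
SERVER_KEY = b"demo-key-held-only-by-the-gate"


def _mac(body: Dict[str, Any]) -> str:
    """MAC over a canonical JSON encoding of the capability body."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(SERVER_KEY, canonical, hashlib.sha256).hexdigest()


def mint_capability(action: str, resource: str, params: Dict[str, Any],
                    ttl_seconds: int, now: Optional[float] = None) -> Dict[str, Any]:
    """Issued by the gate after an approval; bound to one exact scope."""
    issued = time.time() if now is None else now
    body = {"action": action, "resource": resource, "params": params,
            "expires_at": issued + ttl_seconds}
    return {"body": body, "mac": _mac(body)}


def authorize(cap: Dict[str, Any], action: str, resource: str,
              params: Dict[str, Any], now: Optional[float] = None) -> bool:
    """True only if the capability is intact, unexpired, and matches the request."""
    current = time.time() if now is None else now
    body = cap["body"]
    if not hmac.compare_digest(cap["mac"], _mac(body)):
        return False  # body was altered after minting
    if current >= body["expires_at"]:
        return False  # capability has expired
    return (body["action"], body["resource"], body["params"]) == (action, resource, params)
```

A capability minted for `("refund", "payment:42", {"amount_cents": 1500})` fails `authorize` for any other payment, any other amount, or any time past its expiry. Single-use enforcement would additionally need server-side state and is left out of the sketch.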

The evidence is tamper-evident. The record of the request, the decision, the capability, and the outcome is written to a structure that can be independently verified — at minimum a hash chain, ideally a chain whose verification function can be run by parties outside the system that produced it. Logs describe what happened. Evidence survives a hostile audit.
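The minimum form mentioned above, a hash chain, is small enough to sketch. The event fields and the function names `append` and `verify` are illustrative assumptions; the technique is the standard one in which each entry’s hash covers the previous entry’s hash, so editing any earlier record invalidates everything after it.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder previous-hash for the first entry


def _digest(prev_hash: str, event: Dict[str, Any]) -> str:
    """SHA-256 over the previous hash and a canonical encoding of the event."""
    payload = json.dumps({"prev": prev_hash, "event": event},
                         sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(payload).hexdigest()


def append(chain: List[Dict[str, Any]], event: Dict[str, Any]) -> None:
    """Add an event, linking it to the hash of the entry before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "prev": prev_hash,
                  "hash": _digest(prev_hash, event)})


def verify(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every link; any edited, reordered, or dropped entry breaks it."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Because `verify` needs nothing but the chain itself, an outside party can run it. One caveat: a bare chain does not reveal truncation of its tail, so the latest hash would need to be anchored somewhere the producing system cannot rewrite.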

Those four properties are not exotic. They are what every serious financial system has been doing for human-initiated approvals for thirty years; the only new thing is that we are now applying them to actions initiated by software. This is the problem we work on at Approva, and it is not unique to us: any architecture that meets Article 14 in a high-risk deployment will have something like this shape.

Bypassable by design

What this piece has been arguing reduces to a few claims worth restating side by side.

It says that the public discourse on AI governance has correctly named a real gap and has not yet diagnosed where the gap lives. It says that the gap is not primarily a culture problem or a procedural problem; it is an architectural problem with a precise location. It says that oversight installed by the caller is bypassable by construction, regardless of how loudly the caller promises otherwise. It says that oversight installed by the system owner around the action is binding, regardless of whether the caller cooperates. It says that EU AI Act Article 14 makes this distinction enforceable rather than rhetorical.

That is not a refinement of governance theatre. That is the mechanism the theatre framing was reaching for.

With the Article 14 readiness window open, the question every buyer should be asking — and every vendor whose pitch includes “human oversight” should expect — is no longer whether the oversight exists. It is whose architecture installs it. Oversight the caller installs is oversight the caller can omit. The hard part is not noticing the theatre. The hard part is building the gate.
