AI · Agentic AI · Governance · Financial Services · Australia

The Control Stack Every AI Agent Needs

AI agents in financial services do not fail because they are too autonomous. They fail because firms give them too much ambiguity and too little governance.

The risk is not that your AI agent acts alone

The risk is that it acts inside a mess.

That is where most governance conversations in financial services go wrong. Leaders hear "AI agent" and jump straight to failure modes: a bad decision, the wrong client email, a regulatory incident. Those things matter. They are usually not the first problem.

The first problem is ordinary.

An agent is pointed at unclear workflows, inconsistent data and half-written policies. Then the firm acts surprised when the output is hard to trust.

We keep seeing the same pattern in advice and wealth businesses. The ambition is real. The controls are vague. Compliance teams are asked to review a tool after someone has already fallen in love with the demo.

Friction.

In Australia, regulators are not waiting for a bespoke "AI agent law" before caring. They are applying the obligations that already exist. ASIC has been explicit that technology-neutral duties still apply. Norton Rose Fulbright's February 2026 compliance primer makes the same point across ASIC, APRA, OAIC and AUSTRAC: if an AI system sits inside a regulated process, the existing rules still bite.

That means the real job is not deciding whether agents are allowed to exist. It is deciding what kind of operating environment they are allowed to exist inside.

Agent governance is not one control. It is a stack.

Most firms talk about AI governance as though it is a single document.

It is not. An agent that can read, decide, draft, escalate and trigger actions crosses multiple control surfaces at once. One policy and a vendor checklist will not cover the behaviour.

A better mental model is this: every AI agent in financial services sits on a control stack.

Four components matter most.

  1. Mandate
  2. Evidence
  3. Containment
  4. Escalation

Miss one, and the others start compensating badly.

A firm with good containment but no mandate creates a locked-down system that still does the wrong work. A firm with strong evidence but no escalation gets well-documented mistakes.

That is why the conversation needs to move past "do we have an AI policy?"

The useful question is whether the stack exists.

Component 1: Mandate

Every agent needs a job description.

Not a vague aspiration. Not "help advisers be more productive". A real mandate: what the agent is for, what workflow step it owns, what decisions it may support, what decisions it may never make, and what outcome counts as success.

This is also where firms get lazy.

A note-taker can be given a broad brief and still do something useful. An agent cannot. Once a system starts chaining tasks together, ambiguity compounds. If the mandate is fuzzy, the agent fills the gap with whatever the prompt and data happen to produce.

That is wishful thinking with a user interface.

For Australian financial services providers, this matters because the legal duties attach to the function, not the novelty of the tool. ASIC's position is clear: section 912A obligations to provide financial services efficiently, honestly and fairly do not disappear because an algorithm touched the workflow. Nor does misleading conduct risk. Nor do directors' duties.

So the first control is simple: write down the mandate in operational terms.

For example:

  • An agent may assemble a first draft of a file note from approved meeting transcripts and CRM data.
  • It may not determine product suitability.
  • It may prepare a review pack for an adviser.
  • It may not send client-facing recommendations without human approval.
  • It may flag suspicious transaction patterns for analyst review.

That level of specificity forces the firm to separate support work from judgement.

That distinction matters more with agents than with copilots. A copilot waits for instructions. An agent keeps moving until it hits a boundary.
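To make this concrete, a mandate can be written as an explicit allow/deny policy that an orchestration layer checks before every agent action. This is a minimal, hypothetical sketch: the class and action names are illustrative, not a real framework, and a production version would sit inside whatever policy engine the firm already runs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mandate:
    """A machine-checkable job description for one agent."""
    purpose: str
    may_do: frozenset        # work the agent owns end to end
    may_support: frozenset   # work requiring human approval first
    never: frozenset         # hard prohibitions, checked before anything else

    def check(self, action: str) -> str:
        if action in self.never:
            return "refuse"
        if action in self.may_do:
            return "allow"
        if action in self.may_support:
            return "require_human_approval"
        # Anything not written down is out of mandate by default.
        return "escalate"

# Illustrative mandate mirroring the bullet list above
file_note_agent = Mandate(
    purpose="Assemble first-draft file notes from approved transcripts and CRM data",
    may_do=frozenset({"draft_file_note", "prepare_review_pack", "flag_suspicious_pattern"}),
    may_support=frozenset({"send_client_recommendation"}),
    never=frozenset({"determine_product_suitability"}),
)

print(file_note_agent.check("draft_file_note"))               # allow
print(file_note_agent.check("determine_product_suitability"))  # refuse
print(file_note_agent.check("rebalance_portfolio"))            # escalate: not in the mandate
```

The design choice worth noting is the default: an action the mandate never mentions escalates rather than proceeds. That is the boundary an agent keeps moving until it hits.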

Component 2: Evidence

An agent is only as reliable as the record it can see.

This is where a lot of "agentic AI" enthusiasm crashes into the plumbing of a real business. The demo assumes a neat world. The live environment is not neat. Client names differ across systems. Risk profiles are current in one place and stale in another. Advisers keep important nuance in emails, PDFs and half-structured notes.

In one recent stream of work around compliance mapping, the interesting question was how much of a conversation could still be trusted once it met the actual record. That is the issue. Extraction is easy. Evidence is harder.

Agents make this sharper because they draft from data, then act from it.

If a file-note agent draws from a transcript, CRM fields and prior review documents, the firm needs to know which source takes priority when they conflict. If a transaction-monitoring workflow uses machine learning to surface suspicious patterns, AUSTRAC still expects the reporting entity to maintain effective systems, controls and implementation plans that manage money laundering, terrorism financing and proliferation financing risk through change.

AUSTRAC's 2025-26 expectations are blunt on this point. Reporting entities were expected to act from 31 March 2026: continue existing controls, document implementation plans, show sustained progress and strengthen frameworks now. "We are transitioning" is not a defence if controls are ineffective.

For agent governance, evidence means three things:

  • Approved sources: the systems an agent is allowed to read from.
  • Source hierarchy: what wins when records conflict.
  • Traceability: what the agent used, when, and in what version.

Without that, review becomes theatre.

The human sees a polished draft, but cannot tell whether it came from the right facts.
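The three evidence controls above can be sketched in a few lines: an explicit source hierarchy, a resolver that applies it, and a trace of what was consulted. The source names, fields and priority order here are assumptions for illustration only.

```python
# Illustrative priority order: earlier sources win when records conflict.
SOURCE_PRIORITY = ["crm", "review_docs", "transcript"]

def resolve_field(field_name, records):
    """records maps source name -> {field: value}.
    Returns (value, trace): the winning value plus a log of every
    source consulted, so review can see which facts the draft used."""
    trace = []
    for source in SOURCE_PRIORITY:
        value = records.get(source, {}).get(field_name)
        trace.append({"source": source, "field": field_name, "value": value})
        if value is not None:
            return value, trace
    # Missing mandatory evidence is an escalation condition, not a guess.
    return None, trace

records = {
    "transcript": {"risk_profile": "balanced"},
    "crm": {"risk_profile": "growth"},
}
value, trace = resolve_field("risk_profile", records)
print(value)  # "growth": the CRM outranks the transcript on conflict
```

The point is not the code; it is that the hierarchy and the trace are written down before the agent drafts anything, so a reviewer can answer "which record won, and why" without archaeology.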

Component 3: Containment

Containment is the part firms usually mistake for governance.

Permissions. Access controls. Vendor due diligence. Security review. Environment separation. Human approval gates. All necessary. None sufficient on their own.

This layer matters because agents widen the blast radius of ordinary operational weaknesses. A chatbot that drafts text is one thing. An agent that reads internal systems, calls APIs, updates records and triggers tasks is a different category of exposure.

APRA's standards make this concrete for prudentially regulated firms. CPS 234 pulls information assets, model inputs and outputs into the information security frame. CPS 230 pulls operational resilience and service-provider dependency into the picture. Norton Rose Fulbright highlighted a practical deadline many boards will care about: AI vendor contracts need to be updated to align with CPS 230 by the next renewal and no later than 1 July 2026.

Even where a wealth firm is not directly APRA-regulated, the operating lesson still holds. Agent governance fails quickly when third-party dependencies, access privileges and failure modes are treated as technical footnotes.

Containment should answer a short list of questions:

  • What systems can this agent access?
  • What actions can it trigger?
  • What credentials does it use?
  • What happens if a downstream vendor fails or returns a bad result?
  • Can the agent be switched to read-only mode?
  • Can the workflow be paused without breaking the business?

Privacy belongs here too.

OAIC guidance has been consistent: commercially available AI products can infer and generate personal information beyond what a user thinks they provided. That matters in advice, where client context is sensitive by default.

A good containment model makes the safe path the easy path.

A bad one leaves the shortcuts wide open.

Component 4: Escalation

This is the layer most firms skip.

They assume a human review step equals control.

It does not. If the agent has no explicit conditions for when to stop, ask, or hand over, then every exception arrives at the human in the same shape: too late, half-processed, and harder to untangle.

Agents need escalation rules the same way staff do.

A competent paraplanner knows when a case is routine and when it needs to be pushed up the chain. A compliance analyst knows when something is outside tolerance. Agents need the same operating logic built in.

That means defining escalation triggers such as:

  • conflicting client data across approved systems
  • missing mandatory evidence
  • low-confidence extraction from a meeting transcript
  • recommendations touching high-risk product areas
  • unusual transaction patterns that meet threshold conditions
  • model or vendor failures during a multi-step task
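Triggers like these only work if they are checked before an agent finalises a step, and if each one names its condition so the exception reaches the human in a recognisable shape. A minimal sketch, with made-up field names and thresholds:

```python
# Illustrative thresholds: a real firm would set these per workflow.
CONFIDENCE_FLOOR = 0.8
HIGH_RISK_PRODUCTS = {"leveraged", "derivatives"}

def escalation_reasons(step):
    """Evaluate a step (a dict of signals) against the trigger list.
    Returns every reason that fires, not just the first, so the
    reviewer sees the full shape of the exception."""
    reasons = []
    if step.get("data_conflict"):
        reasons.append("conflicting client data across approved systems")
    if step.get("missing_evidence"):
        reasons.append("missing mandatory evidence")
    if step.get("extraction_confidence", 1.0) < CONFIDENCE_FLOOR:
        reasons.append("low-confidence extraction from transcript")
    if step.get("product_area") in HIGH_RISK_PRODUCTS:
        reasons.append("recommendation touches a high-risk product area")
    if step.get("vendor_error"):
        reasons.append("model or vendor failure mid-task")
    return reasons  # empty list means the step stays in its lane

step = {"extraction_confidence": 0.55, "product_area": "leveraged"}
print(escalation_reasons(step))
```

An empty return is the routine path; anything else is a designed hand-over rather than a surprise. That is the difference between a human resolving an exception and a human rubber-stamping a queue.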

This is where "human in the loop" becomes a design choice.

Not every human check is equal. Some are cosmetic. Some are genuine control points. The difference is whether the human is reviewing a routine output, or resolving an exception the system was designed to recognise.

The second one scales. The first one becomes admin.

That is the practical shift from copilot thinking to agent governance.

Do not ask a human to rubber-stamp everything. Ask the system to know when it has reached the edge of its authority.

The assembled view

Mandate. Evidence. Containment. Escalation.

That is the control stack.

Most firms do not need a 40-page responsible AI framework before they can govern an agent properly. They need a clearer operating model.

When the stack is in place, useful things happen.

Mandate reduces ambiguity. Evidence reduces fabricated confidence. Containment reduces blast radius. Escalation reduces silent failure.

The components reinforce each other. A clean mandate makes escalation easier to design. Good evidence makes human review faster and more defensible. Strong containment makes vendor risk visible early. Clear escalation rules stop staff from treating every polished output as trustworthy.

That is how governance becomes usable.

By creating enough structure that agents can do bounded work inside regulated environments without relying on luck.

This is the free consulting bit

If you are an advice firm, platform, licensee or wealth business considering agents, start here:

  1. Pick one workflow, not ten. File notes, review-pack preparation, onboarding checks, AML triage. One lane only.
  2. Write the mandate in one page. What the agent may do, may support, and may never do.
  3. Name the evidence sources. Which systems are approved, which record wins, what gets logged.
  4. Map containment before rollout. Access, vendors, credentials, failure modes, read-only fallback.
  5. Design escalation conditions up front. Do not wait for the first ugly edge case to teach you where the boundary was.

If a firm cannot do those five things, it is not ready for an agent in a regulated workflow.

It may be ready for experimentation. It is not ready for production.

That is the difference between curiosity and control.

Australian financial services firms do not need to fear AI agents. But they do need to stop treating governance as a document produced after procurement.

Because the hard part is not getting an agent to act.

The hard part is making sure it knows exactly where to stop.