Edward Roozenboom: AI agents as a prime target for hackers

This column was originally written in Dutch. This is an English translation.

By Edward Roozenboom, Senior Risk Management Consultant, Probability & Partners

If you’re not careful with AI agents, you’ll be handing them the keys to the pension fund and letting them write down the safe combination straight away. Just what a hacker needs!

Let’s take a quick look into the near future: a pension fund deploys an AI agent to carry out control tests on processes relating to benefit awards and payments. The agent reads files, goes through log files, compares decisions with policy, interprets the results of the tests and assesses the residual risk. It does all this faster and more accurately than the human control tester has managed so far. This is precisely the kind of development we see in organisations that are taking AI more seriously: the shift from assistant to actor.

That is where the temptation begins. Anyone wishing to automate control testing properly must recognise that anomalies rarely occur within neat, well-defined datasets. They arise precisely at the edges of the process: exceptions, corrections, historical decisions and manual workarounds. An agent that sees only part of the reality misses exactly where things go wrong. The question soon arises: can’t we give the agent just a little more access?

That step feels logical. After all, to interpret anomalies properly, an agent needs context. That means insight into the entire process, access to historical data and the ability to establish links between datasets. And there’s something else: this agent not only carries out checks, but also assesses the residual risk. You’ll want to ensure that assessment isn’t based on a limited view. The reasoning is simple: the more data, the better the analysis. AI excels at processing large amounts of raw data and recognising patterns that are difficult for humans to spot.

Before you know it, the scope expands. First a limited dataset, then additional sources ‘for better context’, followed by links to other systems. The agent gains access to participant records, benefit files, audit trails, and sometimes even to communications and internal notes. Not because anyone is reckless, but because it helps to produce better analyses. The logic is consistent. So is the outcome: an agent with broad access rights.

This creates a risk that is fundamentally different from what we are used to. It not only increases the possibilities for attack, but also centralises them. Whereas data and permissions are normally spread across systems and roles, they are now brought together in a single component designed to oversee everything. It is precisely this that makes such an agent, on the one hand, highly attractive to the organisation and, on the other, particularly attractive to the hacker.

It is easy to imagine what happens if this agent falls into the wrong hands due to a vulnerability in a tool being used, a leak in an API connection or a misconfigured account. A hacker then no longer needs to move step by step through the organisation. The agent is already doing that: legitimately, efficiently and with the correct permissions. Instead of having to breach different systems individually, an attacker gains access via a single point of entry to precisely the information that is normally fragmented and protected: participants’ personal data, details of benefits, exception files and audit information. Not because everything is stored centrally without ‘firewalls’, but because a single agent establishes the connections and has the rights to use them.

In practice, this is not an exceptional scenario. In fact, we see that AI applications develop organically. First a small-scale application, then expansion to other processes, and finally integration into the core of the organisation. Governance and authorisation policies regularly lag behind the technological possibilities. That is understandable. The value of AI is immediately apparent, whilst the risks only manifest themselves later.

In light of these developments, it is necessary to rethink authorisation policy. This is particularly true now that AI agents are increasingly able to perform actions independently and even generate code.

What should you focus on in this regard? First and foremost, access must be task-oriented and temporary. Not ‘everything that might be relevant’, but only what is necessary for a specific check, and only for as long as is necessary. Second, the separation of duties must also apply to agents. An agent that checks, analyses and advises combines roles that you would deliberately keep separate in the case of humans. Third, every action taken by the agent must be transparent and traceable. Without clear logging and auditability, the output quickly loses its evidential value. Finally, technical measures are needed to limit the impact should things go wrong. Think of sandbox environments and clear boundaries on what an agent is and is not permitted to do.
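
To make this concrete, here is a minimal sketch in Python of task-oriented, temporary access combined with logging. It assumes a hypothetical access broker that sits between the agent and the data sources. The names `Grant` and `AccessBroker`, the resource labels and the 30-minute lifetime are illustrative assumptions, not a reference to any existing product or to a particular fund's systems.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    """Access for one specific check (hypothetical model)."""
    agent_id: str
    task_id: str
    resources: frozenset   # only what this check needs
    expires_at: datetime   # access ends by itself


@dataclass
class AccessBroker:
    """Decides every request and records the decision (hypothetical)."""
    audit_log: list = field(default_factory=list)

    def issue(self, agent_id, task_id, resources, ttl, now):
        # A grant is bound to one task and a fixed lifetime.
        return Grant(agent_id, task_id, frozenset(resources), now + ttl)

    def authorise(self, grant, resource, now):
        if now >= grant.expires_at:
            decision = "denied: grant expired"
        elif resource not in grant.resources:
            decision = "denied: outside task scope"
        else:
            decision = "allowed"
        # Log allowed and denied requests alike, so the trail is complete.
        self.audit_log.append({
            "time": now.isoformat(),
            "agent": grant.agent_id,
            "task": grant.task_id,
            "resource": resource,
            "decision": decision,
        })
        return decision == "allowed"


# Illustrative usage with made-up identifiers.
now = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
broker = AccessBroker()
grant = broker.issue(
    agent_id="control-test-agent",
    task_id="benefit-check-001",
    resources={"benefit_files"},
    ttl=timedelta(minutes=30),
    now=now,
)

in_scope = broker.authorise(grant, "benefit_files", now)
out_of_scope = broker.authorise(grant, "participant_records", now)
after_expiry = broker.authorise(grant, "benefit_files", now + timedelta(hours=1))
```

In this sketch the agent never holds standing rights: a request for participant records is refused because it falls outside the check at hand, and the same request for benefit files is refused once the grant has lapsed. Separation of duties and sandboxing are not covered here; they would need to be enforced around such a broker, for example by issuing grants to a checking agent and an advising agent separately.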

Broad access rights for agents also exacerbate other AI risks, such as data breaches, prompt injection and supply chain issues. They also amplify the consequences of the absence of a human in the loop. AI is excellent at identifying and analysing, but the interpretation of residual risks remains context-dependent. Fully automating that responsibility is a step too far for organisations.

The call is therefore to actively prepare your authorisation policy (and, of course, other information security measures) for AI use. What we see in practice is that the appropriate control measures depend heavily on what is already in place and on the characteristics of the organisation. There is no one-size-fits-all solution. And it is precisely this that leaves room to deploy AI in such a way that opportunities are exploited and risks are mitigated where necessary.

One thing is certain: the more powerful the AI agent, the more critical it becomes to know what it is allowed to see and, above all, what it is not. And… the more attractive it becomes to the hacker.