In this Help Net Security interview, Gidi Cohen, CEO at Bonfy.AI, addresses what he sees as the most pressing gap in AI agent security: data-layer risk. While the industry focuses on prompt injection and model behavior, Cohen argues the deeper threat is autonomous AI agents operating across systems with no visibility into what data they access, combine, or expose.
He explains how Bonfy.AI approaches this through three areas: controlling what data agents can access for grounding, monitoring content as it moves through tool calls and MCP servers, and letting agents query Bonfy in real time to check whether an action is safe before they take it. The conversation covers threat modeling, anomaly detection, multi-agent delegation, model versioning, and practical advice for CISOs navigating pressure to deploy AI at scale.

When we talk about “AI agent security,” most people immediately think about prompt injection or jailbreaks. What’s the threat vector that keeps you up at night that almost nobody in the industry is preparing for?
The threat that keeps us up at night isn’t another clever jailbreak: it’s autonomous data misuse by AI agents operating across systems the enterprise doesn’t fully see, understand, or govern yet.
Most of the conversation today is still “LLM-centric”: prompt injection, jailbreaks, model behavior. But in large organizations, the real risk is shifting to the data layer of increasingly autonomous workflows: agents that can read from many systems, call tools and MCP servers, and then take actions (like send emails, update records, publish content) without a human in the loop at every step. Once you have that, any mistake in how data is accessed, combined, or shared quickly becomes a systemic exposure problem, not just a bad answer on a chat screen.
What almost nobody is prepared for is that these agents don’t live on a single endpoint or inside a neat perimeter. They run in Microsoft, Google, Salesforce, custom app frameworks, MCP-based toolchains, often as “system agents” that aren’t even tied to a specific user session. Traditional DLP, DSPM, and browser-centric controls were never designed to watch data as it flows through a multi-hop chain of LLM calls, vector stores, MCP servers, and downstream automations. So organizations end up effectively flying blind: they don’t know which sensitive content is feeding agents, which tools receive it, or where AI-generated outputs with regulated or customer-specific data land.
That’s the vector we focus on at Bonfy: protecting the organization’s data throughout the full lifecycle of AI and agents, not just protecting the model from bad prompts. Our platform applies the same contextual, entity-aware controls to humans, systems, and AI agents, across email, SaaS apps, collaboration tools, Copilot, MCP-connected agents, and custom GenAI workflows. We control what data is available for grounding, inspect what goes into prompts and tools, monitor what comes out into emails, files, and knowledge bases, and now even let agents call our MCP server during their own reasoning to ask, “Is this safe to share?” before they act.
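That inline “ask before you act” pattern can be sketched in a few lines. This is an illustrative toy only: the sensitivity labels, the classifier, and the policy table are assumptions standing in for Bonfy’s actual entity-aware engine, not its real API.

```python
# Hypothetical sketch of an agent checking "Is this safe to share?"
# before handing content to a destination. Labels and rules are invented.

SENSITIVE_MARKERS = {"PHI", "EU_PII", "DEAL_TERMS"}  # example label set

def classify(content: str) -> set:
    """Toy classifier: detect example sensitivity markers in the text."""
    return {m for m in SENSITIVE_MARKERS if m in content}

def is_safe_to_share(content: str, destination: str, allowed: dict) -> bool:
    """Allow the action only if every detected label is permitted
    for this destination."""
    return classify(content) <= allowed.get(destination, set())

# Example policy: deal terms may go to the internal CRM, nothing
# sensitive may go to an external MCP server.
policy = {"internal_crm": {"DEAL_TERMS"}, "external_mcp": set()}

print(is_safe_to_share("Q3 DEAL_TERMS summary", "internal_crm", policy))  # True
print(is_safe_to_share("Q3 DEAL_TERMS summary", "external_mcp", policy))  # False
```

The point of the sketch is the placement of the check, inside the agent’s own action loop rather than at the network edge.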
If you assume agents will be everywhere, and that they will eventually touch your most sensitive customer, employee, and IP data, the critical security question is no longer “Can someone jailbreak the model?” but “Do we have data-layer guardrails that work at the speed and scale of autonomous AI?”
A traditional application has a relatively predictable blast radius when it’s compromised. An AI agent that can browse the web, write files, call APIs, and send emails does not. How do you even begin to threat-model something that dynamic?
You cannot threat-model AI agents the way you threat-model a single web app; you have to threat-model the data flows and actors around them, end-to-end.
For us, the starting point is to stop thinking “one application, one blast radius” and instead map a chain of four things:
- What data the agent can be grounded on
- Which tools and MCP servers it can call
- Which humans and systems it is effectively impersonating
- All the outbound channels where its outputs can land
Once you can see that full multi‑hop path, you can assign risk not just to the model, but to specific agents, tools, users, and data sets.
In practice, we do three concrete things here. First, we control the grounding step with granular, contextual labeling and access control on the underlying data sources, so you can define which content is even eligible to be pulled into an agent workflow for a given business context. Second, we monitor upstream and downstream traffic, prompts, retrieved docs, emails, files, SaaS updates, across channels, so you can see when an agent’s behavior creates a confidentiality, integrity, or privacy incident in the real world. Third, we plug into the agent’s reasoning loop via our own MCP server, so agents can ask us in real time, “Is this content safe to send to this tool, this user, or this destination?” before they act.
That gives you a very different kind of threat model: instead of trying to predict every possible action of a dynamic agent, you define and enforce guardrails on the data that can flow through it, the entities it can impact, and the points where that flow must be inspected or stopped. Over time, because Bonfy tracks humans, systems, and AI agents as first‑class risk entities, you can see which agents consistently operate near dangerous trust boundaries and tighten controls there, rather than treating “AI” as one monolithic, uncontrollable blast radius.
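The four-part chain above can be written down as a simple record per agent, which is enough to start asking blast-radius questions. This is a hedged sketch; the class and its fields are illustrative assumptions, not a real framework.

```python
# Hypothetical threat-model record for one agent, following the
# four-part chain: grounding data, tools, impersonated actors, outputs.
from dataclasses import dataclass

@dataclass
class AgentThreatModel:
    grounding_sources: set   # what data the agent can be grounded on
    tools: set               # which tools / MCP servers it can call
    acts_as: set             # which humans or systems it impersonates
    outbound_channels: set   # where its outputs can land

    def trust_boundary_crossings(self, sensitive_sources: set) -> set:
        """If the agent can ground on sensitive data, every outbound
        channel is a potential exposure path worth inspecting."""
        if self.grounding_sources & sensitive_sources:
            return self.outbound_channels
        return set()

tm = AgentThreatModel(
    grounding_sources={"crm", "sharepoint"},
    tools={"calendar_api", "email_mcp"},
    acts_as={"svc-agent"},
    outbound_channels={"email", "slack"},
)
print(sorted(tm.trust_boundary_crossings({"crm"})))  # ['email', 'slack']
```

Enumerating agents this way turns “one uncontrollable blast radius” into a list of specific source-to-channel paths that can each get a guardrail.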
When an agent chains together multiple tools, each tool call potentially exposes data to the next step in the chain. Is anyone auditing those intermediate states, and what does that audit look like?
Right now, almost nobody is truly auditing those intermediate states, and that’s exactly where a lot of the real risk hides.
When an agent chains tools together, each call is effectively a mini data‑sharing event: the agent is taking some slice of context, handing it to a calendar API, then to a CRM MCP server, then maybe to an email‑sending service. Each of those hops potentially exposes data to the next step in the chain, but most of today’s “agent security” focuses on configuration (what tools are allowed), not on the actual content flowing between those tools.
Our view is that you have to treat those intermediate states as first‑class audit points. That’s why we expose Bonfy as an MCP server the agent can call during reasoning: instead of blindly passing context from Tool A to Tool B, the agent can invoke Bonfy in between, “Is this safe to share with this specific tool or destination, given who the data belongs to and where it’s going?” Every one of those checks is logged with what was inspected, which policies fired, what entities (customers, employees, consumers) were involved, and what decision was made, so you have an auditable trail across the entire chain – not just at the first prompt and the final email.
In practice, that audit looks less like a traditional API log and more like a data‑plane journal for the agent’s workflow: step‑by‑step records of the content the agent read, what it tried to send to each tool, the risk rating and labels Bonfy applied, and whether we allowed, modified, or blocked the action. Because it’s the same entity‑aware engine we use for email and SaaS, security teams can answer questions like “Which agents exposed EU customer data to external MCP servers last week?” with real evidence, instead of hoping the agent framework’s configuration pages tell the whole story.
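A minimal version of such a data-plane journal is easy to picture. The field names and query below are assumptions for illustration, not Bonfy’s actual schema.

```python
# Hypothetical per-tool-call journal record for an agent workflow.
import datetime

def journal_entry(agent, step, tool, labels, entities, decision):
    """One auditable record per tool call in the agent's chain."""
    return {
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "agent": agent,
        "step": step,
        "tool": tool,          # where the content was sent
        "labels": labels,      # sensitivity labels on the content
        "entities": entities,  # customers/employees the data refers to
        "decision": decision,  # "allowed" / "modified" / "blocked"
    }

journal = [
    journal_entry("quote-bot", 1, "crm_mcp", ["DEAL_TERMS"], ["acme"], "allowed"),
    journal_entry("quote-bot", 2, "external_mcp", ["EU_PII"], ["jane"], "blocked"),
]

# "Which agents exposed EU data to external MCP servers?" as a query:
hits = [e["agent"] for e in journal
        if e["tool"] == "external_mcp" and "EU_PII" in e["labels"]]
print(hits)  # ['quote-bot']
```

The difference from a plain API log is that each record carries content labels and affected entities, so the question above is answerable with evidence rather than inference.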
Traditional SIEM and log analysis is built around human actors with consistent behavioral baselines. What needs to change about anomaly detection when your “actor” can spin up, complete a goal, and disappear in under 30 seconds?
When your “actor” is an AI agent that lives for 30 seconds, you still can’t anchor anomaly detection in long‑term agent behavior alone; you have to anchor it in content, context, and the human or system behind that agent instance.
Traditional SIEM assumes stable identities and patterns over days, weeks, and months; you baseline a human, then look for deviations. With agents, the pattern is inverted: the agent identity is ephemeral, but the data it touches, the user it may be acting on behalf of, and the trust boundaries it crosses are very real and often persistent. So anomaly detection has to move from “Is Alice behaving strangely today?” to “Is this combination of content, destination, and actor – human, system, or agent instance – acceptable for our business context right now?”
That’s exactly where Bonfy focuses. We analyze the unstructured content itself, enriched with entity awareness – which customer, which consumer, which product line, which regulatory regime – and correlate that with who or what is acting: an employee, a service account, a Copilot scenario, or a short‑lived AI agent, plus the relationship between them. Even if the agent spins up and down in under a minute, the data trail it creates across email, SaaS apps, collaboration tools, and AI systems is visible through a single, contextual lens.
We then model both humans and agents – and the links between them – as first‑class entities in our Knowledge Graph, so you can attribute risky patterns not just to a transient agent ID, but to the user behind it (where applicable), specific agents or agent classes, and their role in the broader business context. Over time, you’re no longer flying blind with thousands of invisible bots; you’re managing a portfolio of human and non‑human actors and their relationships, all evaluated through the same data‑centric risk model.
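The inversion described above, from per-identity baselines to per-event context checks, can be sketched as follows. The rules and event shape are illustrative assumptions, not a real detection engine.

```python
# Hypothetical content-and-context anomaly check: judge each
# (content, destination, actor) event on its own, since the agent
# identity may only exist for seconds.

def evaluate(event: dict, rules: list) -> str:
    """Flag an event when a sensitivity label reaches a destination
    forbidden for it, regardless of the actor's history."""
    for rule in rules:
        if (rule["label"] in event["labels"]
                and event["destination"] in rule["forbidden"]):
            return "anomalous"
    return "normal"

rules = [{"label": "PHI", "forbidden": {"external_email", "public_kb"}}]

ephemeral_agent_event = {
    "actor": "agent-7f3a (ephemeral, acting for alice)",
    "labels": {"PHI"},
    "destination": "external_email",
}
print(evaluate(ephemeral_agent_event, rules))  # anomalous
```

No baseline for `agent-7f3a` is needed: the verdict comes from the content, the destination, and the human the instance acts for.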
Multi-agent systems, where one agent orchestrates several others, introduce a delegation chain. How do you prevent a compromised sub-agent from poisoning the trust relationship with the orchestrator?
The uncomfortable answer is that in most multi‑agent systems today, nobody is really protecting that delegation chain; they’re trusting that if the orchestrator is “good,” everything downstream will behave. From Bonfy’s point of view, you have to flip that: you treat every sub‑agent call as untrusted from a data perspective and give the supervising agent the tools to inspect what goes in and what comes out before it accepts or forwards anything.
Concretely, the orchestrator should never blindly consume a sub‑agent’s output. At least from a data perspective, we give the supervising agent a Bonfy MCP tool it can call inline to inspect the sub‑agent’s input and output and verify it does not violate any policy, including confidentiality, privacy, and data‑integrity checks such as “does this summary suddenly include another customer’s data or unexpected PHI?” The orchestrator’s prompt literally encodes this behavior: “Delegate to sub‑agents, but before acting on their results or passing them on, verify with Bonfy that the content is safe for this destination and business context.”
Because Bonfy looks at the content itself, enriched with entity awareness, which customer, which consumer, which product line, which jurisdiction, it can flag when a compromised or mis‑behaving sub‑agent tries to inject sensitive or inconsistent data into the chain, even if all the agents are short‑lived and share a generic identity in the framework. All of those checks are logged on the same platform we use for email and SaaS: you get an audit trail of which orchestrator called which sub‑agent, what data flowed, what policies triggered, and whether Bonfy allowed, modified, or blocked the orchestrator’s next step. In other words, we’re not trying to “trust” the delegation chain into behaving – we’re instrumenting it so that any sub‑agent output has to pass a data‑centric policy gate before it can poison the rest of the workflow.
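The orchestrator-side gate can be sketched in miniature. Here `policy_check` is a toy stand-in for an inline call to a data-policy service; the customer-name heuristic is purely illustrative.

```python
# Hypothetical orchestrator gate: sub-agent output must pass a
# policy check before the orchestrator acts on it or forwards it.

def policy_check(text: str, expected_customer: str) -> bool:
    """Toy integrity check: reject output that names a customer other
    than the one this workflow is about."""
    other_customers = {"acme", "globex"} - {expected_customer}
    return not any(c in text for c in other_customers)

def orchestrate(sub_agent, task: str, customer: str):
    """Delegate, then verify the result before consuming it."""
    result = sub_agent(task)
    if not policy_check(result, customer):
        return None  # quarantine instead of forwarding poisoned output
    return result

# A compromised sub-agent leaks another customer's data:
compromised = lambda task: "summary for globex: margin 40%"
print(orchestrate(compromised, "summarize deal", "acme"))  # None
```

The key design choice is that the gate sits in the orchestrator’s loop, so a bad sub-agent result is quarantined before it can influence the next step.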
AI agents frequently rely on MCP servers, plugins, and third-party tool integrations. That ecosystem is growing faster than anyone can vet. Are we sleepwalking into a supply chain crisis?
We’re not just sleepwalking into a supply chain crisis; in many enterprises, we’re already there. We’ve just decided to trust whatever tool an agent feels like calling.
An MCP server or plugin is effectively a black-box micro‑vendor that your agents can hand sensitive data to in the middle of a workflow. In a typical environment you can have dozens or hundreds of these tools (internal services, third‑party APIs, enrichment feeds) all being orchestrated dynamically by LLMs with no human in the loop and very little security review. From a data‑security perspective, every one of those tools is now part of your AI supply chain, but very few organizations treat them that way.
Most of the early “AI agent security” market is focused on configuration posture: what agents exist, which tools they can see, which permissions they’re granted. That’s necessary, but it’s not sufficient, because just like any other software, those tools can be used safely or abused depending on what data flows through them. We deliberately focus on the data layer instead of just the configuration layer: what content is being sent to which MCP server, which entities it refers to, which jurisdictions it touches, and whether that combination is acceptable for your business and regulatory context.
Concretely, we give organizations three levers. First, we control grounding at the data source with granular, contextual labeling so you can prevent certain classes of information, say PHI, EU PII, or customer‑specific deal terms, from ever being eligible for a given agent or tool in the first place. Second, we monitor and enforce on the way out, analyzing emails, files, SaaS updates, and other outputs generated via agents, regardless of which plugins they used along the way. And third, through our own MCP server, we let agents ask us in real time, “Is this safe to send to this tool or this destination?” before data is handed off to a third‑party service.
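The first lever, controlling grounding eligibility, is the simplest to picture. This sketch assumes a toy label model; the document labels and blocked classes are invented for illustration.

```python
# Hypothetical grounding filter: exclude whole classes of data before
# an agent can ever retrieve them, rather than after it has.

docs = [
    {"id": "d1", "labels": set()},          # unlabeled / public
    {"id": "d2", "labels": {"PHI"}},        # health data
    {"id": "d3", "labels": {"EU_PII"}},     # EU personal data
]

def eligible_for_grounding(docs, blocked_labels: set):
    """Return only documents carrying none of the blocked labels."""
    return [d["id"] for d in docs if not (d["labels"] & blocked_labels)]

# An agent whose context forbids PHI and EU PII can only ground on d1:
print(eligible_for_grounding(docs, {"PHI", "EU_PII"}))  # ['d1']
```

Filtering at the source means the downstream plugins never receive the blocked content, whichever tools the agent chains.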
So yes, there is an AI‑era supply chain problem building up around MCP servers and plugins, but the way out is not to freeze innovation or somehow vet every tool in the ecosystem. It’s to put data‑centric guardrails in place so that, no matter how fast the agent ecosystem grows, sensitive content is governed consistently across every agent, every tool, and every workflow.
Model providers update their weights, sometimes silently. An agent that behaved one way on Monday may behave differently on Friday with no change to your own code. How should security teams be thinking about model versioning as a compliance and risk issue?
Security teams need to assume the model is a moving part of the supply chain, not a fixed component they can fully certify once and forget.
You may not control when your provider tweaks weights, safety layers, or routing, but you can control the data guardrails around whatever model happens to sit behind an endpoint. For us, that starts with treating model versioning as a compliance‑relevant change: you want to know which classes of data each application can send to “an LLM,” and you want evidence that, regardless of whether that’s Model X on Monday or Model Y on Friday, the same policies are being enforced on prompts, retrieved documents, tool calls, and outputs.
Our approach is intentionally model‑agnostic. We don’t embed ourselves into a specific customer model; we operate as a customer‑agnostic AI data‑security layer that inspects content in and out of agents, copilots, and LLMs using our own entity‑aware engine. Because Bonfy operates with customer‑agnostic AI models, you can apply the same safeguards even if your underlying LLM usage changes over time, knowingly or not: the policies live in our platform, not in a particular model checkpoint.
From a risk and compliance perspective, that gives you two critical things. First, a stable, auditable layer: you can show regulators and auditors a consistent record of which sensitive, regulated, or customer‑specific data was allowed or blocked at the data plane, even as model versions evolved behind the scenes. Second, a way to detect when model behavior shifts in risky ways – for example, suddenly including more granular customer details in summaries – because Bonfy continues to classify, label, and enforce policies on the content itself, independent of which model produced it.
Some vendors are marketing “secure AI agents” almost as a feature checkbox. What does rigorous agent security look like, and how does a security buyer cut through the noise?
“Secure AI agents” is not a checkbox; it’s an end‑to‑end discipline that has to follow the data wherever agents read, reason, call tools, and write.
From our perspective, rigorous agent security has three pillars. First, you control the grounding: which content an agent is allowed to see in the first place, using granular, contextual labeling and access rules on systems like SharePoint, email, CRM, and other SaaS apps. Second, you protect data in‑use during the agent’s reasoning: when it calls MCP servers, plugins, or internal APIs, you need inline inspection that can tell you whether it’s about to hand PHI, customer‑specific details, or regulated content to the wrong tool or third party. Third, you govern the outputs: emails, files, tickets, and other artifacts the agent generates must be checked for leakage and policy violations before they hit a human or an external system.
Where buyers get lost is that a lot of “agent security” offerings stop at configuration posture – listing agents, toggling tools, managing permissions – without ever truly seeing what data flows through those automations. That’s necessary hygiene, but it won’t save you from an agent that’s perfectly “configured” and still exfiltrates customer data via an allowed MCP plugin. Bonfy deliberately focuses on the data layer instead of just the control plane: the same entity‑aware engine we use for email and SaaS applies to agent prompts, retrieved documents, MCP calls, and outputs, with one set of policies governing humans, systems, and AI agents alike.
If you’re a security buyer trying to cut through the noise, we’d suggest three simple tests. Ask vendors: Can you see and classify the actual content flowing into and out of my agents, across all my major channels – not just log which tools they’re allowed to call? Can you enforce policy consistently for both humans and agents, so that “this customer’s data cannot leave this boundary” is true everywhere? And can my agents query your platform in real time – for example via an MCP server – to check whether a given action is compliant before they execute it? If the answer to any of those is no, you’re looking at a checkbox, not rigorous agent security.
If you could change one thing about how the security industry is approaching AI agent risk, before we reach a major public breach that forces the conversation, what would it be?
I’d change one thing: stop treating AI agent risk as an abstract “future AI problem” and start treating it as a very concrete data problem that is already in production today.
Right now, AI adoption is outpacing governance; agents are already reading, transforming, and generating sensitive content across email, SaaS apps, internal systems, and MCP‑connected services, while most organizations have no unified visibility into what data those agents touch. The industry is pouring energy into models, prompts, and configuration posture, but far less into a basic question: where is my confidential, regulated, or customer‑specific information flowing as these automations execute multi‑step workflows?
From Bonfy’s perspective, the shift we need is to put data at the center of the agent‑risk conversation. That means building systems that can see and classify unstructured content wherever it moves, understand the people, customers, and jurisdictions behind that content, and apply consistent policy whether the actor is a human, a SaaS app, or an ephemeral AI agent. It also means giving agents a way – via mechanisms like our MCP server interface – to ask in real time, “Is this safe to send or store here?” instead of assuming their toolchain will do the right thing by default.
If we make that mental shift now, we don’t have to wait for a headline‑grabbing breach to discover that we were effectively flying blind while AI automated the movement of our most sensitive information.
For a CISO who is being pressured by the business to deploy AI agents at scale while simultaneously being held responsible for data security outcomes, what is the most honest advice you can give them?
The most honest thing we can tell a CISO in that position is: do not accept “deploy first, figure out the data risk later” as the operating model, even if that’s the pressure you’re under.
You’re not going to stop AI agents; you can, however, insist on a phased rollout where the first deliverable is visibility, not automation. Start by instrumenting the channels where agents will read and write – email, collaboration, SaaS apps, internal systems, MCP‑connected tools – so you can see what sensitive, regulated, or customer‑specific content they would touch if you turned them fully loose. That real data gives you the leverage to have an adult conversation with the business: “Here is where we can safely automate today, here is where we need guardrails, and here are the use cases that stay human‑in‑the‑loop for now.”
From there, move deliberately from visibility to policy to prevention. Use entity‑aware controls so the same policies apply whether the actor is a person, a SaaS workflow, or an AI agent, and give agents a way to call into a service like Bonfy’s MCP interface to check content in‑flight rather than trusting static configuration alone. That lets you say “yes” to AI at scale, but on your own terms – with measurable controls, auditable decisions, and a defensible story when the board or regulators ask how you kept data safe in an agent‑driven world.
If you’re a CISO being told to “deploy AI agents now and keep all the data safe,” don’t argue AI versus no AI – insist on the order of operations. First you turn on deep visibility into where sensitive, regulated, and customer‑specific content flows across email, SaaS, collaboration, and agents; then you layer in policies; only then do you allow large‑scale automation with agents calling a service like Bonfy in‑flight to check “Is this safe to send or store here?” before they act. That’s how you say yes to AI at scale without betting the company on blind trust in someone else’s configuration.