Security & GRC Decoded

The Trust Gap in AI: Why Agents Need a New Certification Model ft Rajiv Dattani & David Meyer @ AIUC

Raj Krishnamurthy Season 1 Episode 37



In this episode of Security & GRC Decoded, Raj Krishnamurthy sits down with Rajiv Dattani and David Meyer from Artificial Intelligence Underwriting Company (AIUC) to explore one of the biggest unanswered questions in AI security:

Can organizations actually trust AI agents?

As enterprises rapidly deploy AI-powered products, copilots, and autonomous agents, traditional security assessments, compliance frameworks, and cyber insurance models are struggling to keep pace. Rajiv and David explain why insurers are beginning to exclude AI-related risks, why historical loss data no longer works in the age of AI, and how AIUC-1 was designed to become a trust and assurance framework for AI systems.

The conversation explores AI certification, AI insurance, agent security testing, reliability, safety, accountability, statistical risk modeling, and the growing challenge of securing increasingly autonomous systems.


Key Takeaways:

  • Traditional cyber insurance models are struggling to underwrite AI risk because historical loss data becomes obsolete as models rapidly evolve.
  • AIUC-1 combines governance controls, technical evaluations, and large-scale simulation testing to assess AI agent security and trustworthiness.
  • AI assurance requires more than security controls—it must also evaluate reliability, safety, accountability, privacy, and societal impact.
  • Statistical testing and large-scale simulations may become the foundation for measuring AI risk in probabilistic systems.
  • The AI security community will play a critical role in shaping standards, liability models, and best practices for future AI deployments.


What You’ll Learn:

  • Why many insurance carriers are beginning to exclude AI-generated risks from cyber policies
  • How AIUC-1 differs from frameworks like NIST AI RMF, OWASP LLM Top 10, and MITRE ATLAS
  • How AI agents are tested through both black-box and white-box security evaluations
  • Why reliability and hallucination risks become more important in multi-agent environments
  • How AI certification may influence future insurance pricing, risk management, and enterprise adoption


This podcast is brought to you by ComplianceCow — the smarter way to manage compliance. Automate evidence collection, eliminate screenshots, and scale your program with confidence. Learn more: https://www.compliancecow.com

Watch more episodes: https://www.compliancecow.com/podcast

Connect With Our Guests:
Rajiv Dattani | Cofounder | AIUC
David Meyer | GTM | AIUC

Connect on LinkedIn: https://www.linkedin.com/in/rajiv-dattani/

https://www.linkedin.com/in/david-meyer-8586b17b/

Rate, review, and share if you enjoyed the show!

Subscribe to Security & GRC Decoded wherever you get your podcasts:

Spotify: https://open.spotify.com/show/5pigcMwOrYIA6d9OOOsxqr?si=416b82ab5c474683


Apple Podcasts:

https://podcasts.apple.com/us/podcast/security-grc-decoded/id1795144450


Raj Krishnamurthy (00:00)
Hey, hey, hey, welcome to another episode of Security & GRC Decoded. I'm your favorite host, Raj Krishnamurthy. Today I have the pleasure of having Rajiv Dattani and David Meyer from AIUC, the Artificial Intelligence Underwriting Company. Did I say that right, Rajiv? Okay. Rajiv is the cofounder of AIUC, which is in the business of certifying and insuring AI agents, and David works with Rajiv on the certification program at AIUC.

Rajiv Dattani (00:18)
You did.

Raj Krishnamurthy (00:30)
Welcome to the show, guys. Rajiv, let me go straight to you. What prompted you to start AIUC? Why?

Rajiv Dattani (00:31)
Great to be here. Thanks for having us.

David Meyer (00:32)
Yeah.

Rajiv Dattani (00:37)
We spent a lot of time exploring the gap that's emerging between the frontier of AI capability and risk and where the rest of the world is on assessing that risk. My professional background is that I spent a decade at McKinsey, where I helped a lot of large enterprises work out how to get comfortable with the risk of cloud and other new technologies. With cloud, for example, we've seen that this process takes decades for some companies. After I left McKinsey, I was at METR, the AI risk evaluator, and whilst there I led the partnerships with the labs and the governments. I saw just how sophisticated some of the techniques are and how advanced some of the research now is around assessing these capabilities, and really just felt this enormous gap. There's a trust and confidence gap between the frontier and the rest of the world.

So we built AIUC to bridge that gap. We think of ourselves as building the confidence infrastructure that allows a CISO, a security leader, or a risk leader to definitively answer: can I trust this AI agent?

Raj Krishnamurthy (01:41)
Okay, and when you say trust this AI agent, is this about AI agents being used inside the company? Is it about AI agents being embedded in products? Or is it both?

Rajiv Dattani (01:53)
It's both. It's AI agents that are speaking to customers, handling sensitive data, or that have access to high-risk systems containing sensitive information. And really this is now everywhere, right? Even your traditional SaaS vendors are embedding AI. The questions we spend a lot of time on with CISOs are: how are you assessing that risk? What are the risks your data is being exposed to, and how is it being used? And how do we help you get a grip

Raj Krishnamurthy (02:30)
Got it. One question I had: I was trying to look this up, and one of my friends, Mosey Platt, pointed me to this. It was completely unrelated, right? He pointed me to an article, written sometime in mid-April, that says insurance carriers quietly back away from covering AI outputs. It was written by Grant Ross. What it's basically saying is that the insurance companies don't want to underwrite any policies for claims related to AI-generated outputs for cybersecurity errors and omissions. I guess your entire purpose is to go solve that problem, right? Is it that these guys do not know about you? How should I process this?

Rajiv Dattani (03:14)
Yeah, there's the article you mentioned, and the Financial Times has also done a couple of deep dives on this in the last few months. Most of the major carriers have now filed for what's known in insurance language as exclusions for AI risks. The reason is that insurers usually wait to let data emerge. They'll wait on the sidelines for a couple of years of a new technology. We saw this with cyber twenty-five years ago. They want to see how the risk emerges in the real world: what's it going to cost, and how frequently is it going to occur? And with AI, clearly we're too early for that.

Now, our thesis is that the insurance model actually needs to be disrupted by this. The reason is that even if you had three years of loss data on AI, it wouldn't help you today. If you had loss data on how ChatGPT creates issues, is that going to help you insure an OpenClaw instance or a Claude Code instance? Absolutely not.

The way we solve that is through our AIUC-1 certification, which has three layers. One is we check for the policies, much like other standards do. The second is we check the code base: how are the safeguards configured, what's the input filtering, what's the output filtering? And the third layer, and this is what's new to AI, is that we run thousands of simulations. You can think of it a bit like a cyber pen test.

Raj Krishnamurthy (04:38)
Mm.

Rajiv Dattani (04:41)
And the reason this matters is that AI systems are valuable but also risky because they're stochastic in nature. They're not deterministic. So by running the same tests, and advanced tests, thousands and thousands of times, you can simulate what the loss distribution will be for an insurer. We partner with some of the world's oldest cyber insurers at Lloyd's of London, and they have built with us a dedicated AI E&O policy that is designed to cover these risks for companies that are AIUC-1 certified.
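
Editor's sketch: the idea of running the same test thousands of times against a stochastic system and turning the results into a loss distribution can be illustrated with a small Monte Carlo simulation. This is a minimal sketch and not AIUC's methodology. The failure probability, interaction volume, and cost per failure are all assumed numbers, and the function names are hypothetical.

```python
import random
import statistics


def estimate_failure_rate(n_trials, true_failure_prob, seed=0):
    """Run the same test n_trials times against a stochastic stand-in agent.

    Each trial fails with probability true_failure_prob, an assumed value
    standing in for a real agent's unknown behavior.
    """
    rng = random.Random(seed)
    failures = sum(rng.random() < true_failure_prob for _ in range(n_trials))
    return failures / n_trials


def simulate_annual_losses(failure_rate, interactions_per_year,
                           cost_per_failure, n_years=200, seed=1):
    """Simulate many hypothetical years to get a distribution of total losses."""
    rng = random.Random(seed)
    losses = []
    for _ in range(n_years):
        failures = sum(rng.random() < failure_rate
                       for _ in range(interactions_per_year))
        losses.append(failures * cost_per_failure)
    return losses


# Assumed inputs: a 2% underlying failure probability, 1,000 interactions
# per year, and a flat cost of 500 per failure.
rate = estimate_failure_rate(n_trials=5000, true_failure_prob=0.02)
losses = simulate_annual_losses(rate, interactions_per_year=1000,
                                cost_per_failure=500.0)
mean_loss = statistics.mean(losses)
# quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
p95_loss = statistics.quantiles(losses, n=100)[94]
print(f"estimated failure rate: {rate:.4f}")
print(f"mean annual loss: {mean_loss:.0f}, 95th percentile: {p95_loss:.0f}")
```

The point of the sketch is that a single test run says little about a non-deterministic system, while many repeated runs give both an average and a tail estimate, which is the kind of quantity an underwriter prices against.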

Raj Krishnamurthy (05:16)
Got it. No, I think that's fantastic. I want to talk about the certification process; I'm sure our listeners will want to understand it. But before I do that, Rajiv and David, I want to talk about your business model. It looks like a very interesting proposition, because we in the cybersecurity world are typically used to seeing standards bodies that are separate from advisors, implementers, auditors, and certifiers. You seem to be all rolled up into one. Is that correct? Can you talk about the business model? What is the business model for you?

Rajiv Dattani (05:47)
Yeah. Absolutely.

We were all of these when we launched initially. The reason is that, especially early on, we think the feedback loop between building the standard, applying it in practice, helping companies get ready, and then certifying against it shows where all the gaps are. We learnt a lot through this process: what are the things in the standard that aren't clear? What are the risks that are still being left unaddressed by the standard?

Now that we've done that and run a number of these, we're opening up the ecosystem. Schellman was our first accredited auditor. Coalfire has now been provisionally accredited. There are many more auditors coming through the accreditation pipeline. We work with them to train them, and we will do the quality assurance on their work to make sure they've interpreted the standard correctly and consistently and that they're holding a high bar. We must avoid another SOC 2, where the standard just gets hollowed out over time and is applied inconsistently.

We're also at the start of the journey of opening up the technical testing. Today we run the technical testing in-house. The reason is that it's very hard to know if somebody has run the technical tests accurately and rigorously. But we're working through that problem at the moment, and we'll have more on that this year, certainly. And we're also now working with MSPs who can go and do the readiness work. So we're slowly opening all of these out.

We will continue to do some of these ourselves, right? So, to your question on the business model: we will continue to be the certification body, and we'll charge a fee for that. We will do some of these services ourselves, also because that keeps us close to it. And we will also make revenue on the insurance product: if a company that has been certified then wants to be insured, the insurers pay us a fee for the data they're using, the AIUC-1 data. We think that matters because it also aligns our interest with our customers' end users' interest. If ultimately our revenue is linked to how well we price the policy, then we have every incentive in the world to hold the bar high rather than hollow it out.

David Meyer (07:50)
One thing I would add to what Rajiv is saying is that the way we develop the standard, and the way we update it every quarter, is also a very collaborative process. We have a consortium of maybe Fortune 500 security leaders we meet with every month, who are really there to tell us what's happening on the ground and what risks they're seeing, and we incorporate this feedback in every quarterly release of the standard. So even the definition of the requirements is a pretty open and collaborative process, on top of the verification itself.

Raj Krishnamurthy (08:23)
OK, got it. And how do you see yourself coexisting with a whole bunch of other requirements, if you will? I mean, there is the OWASP Top 10 from an LLM perspective, and there is MITRE ATLAS. There are a whole bunch of these frameworks. NIST has the AI RMF; Databricks, for example, has an AI governance and AI security framework. How do they all correlate with what you're doing at AIUC-1?

Rajiv Dattani (08:48)
Yeah, we partner with all of them. All the ones you mentioned, we've partnered closely with. We have crosswalks on the website that show just how these frameworks relate to each other. That's with OWASP, with Databricks, with MITRE ATLAS, and with a lot of the Cloud Security Alliance work as well, which is excellent in this space.

The way we think about it is that a lot of these other frameworks are either specific to a subset of risks, or they operate one level of abstraction higher than where we are. I would put the OWASP Top 10 in the first category: it covers a specific set of technical risks, but it doesn't go broader into accountability and policies. On the second, something like the NIST AI RMF is kind of the overall principles at the highest possible level. One level lower than that is something like the CSA's AICM, which takes it and makes it much more practical. But still, neither of these frameworks defines the specific evidence and control activities that would definitively tell you, yes, I've met the standard, or no, I haven't. We take all of that to the operational level.

And we think that's the role of the market. The nonprofits and the industry bodies are fantastic at setting the principles, but it's the market, on the front line, working with the vendors and the companies, that can iterate very quickly on what exactly good assurance looks like and what good security looks like, and then incorporate that into the standard. We have a quarterly refresh cycle, which you might be familiar with, Raj, where we, together with the AIUC-1 Consortium, always review which requirements didn't make sense, where the technology and the frontier are advancing on both safeguards and capabilities, and how to incorporate that into the standard operationally, within the principles that the other frameworks have set out.

Raj Krishnamurthy (10:31)
Got it. No, I think that's pretty cool. What does the certification process look like? How long does it take? Can you describe that a little bit?

David Meyer (10:43)
Yeah, happy to go into that. Typically the certification applies at the level of the agentic product, so the first thing we need to do is understand what product the customer wants to certify. Once we've understood this, we kick off a roughly eight-week process. The first half of those eight weeks is focused on running the technical evaluations that Rajiv mentioned before. We'll understand the context of the agent, the risks it's exposed to, and its capabilities; design a suite of evaluations that represent the risks the agent is exposed to in a normal production context; run them; and grade them. Then, at the end of those first four weeks, the auditors come in to verify the policies and the artifacts in the code that help govern the agent and ensure it's configured properly. So it's typically eight weeks from kickoff to certificate issued and audit report delivered. Obviously this can change a little based on the capabilities of the agent itself, but that's what a customer should budget for.

Rajiv Dattani (11:45)
And just to clarify there, Raj, that is from when the company is ready, meaning they have all the policies in place. Often it's a six-month process to get ready for the AIUC-1 audit, and then the final step is the eight weeks David described.

Raj Krishnamurthy (11:57)
Got it, beautiful. Let's fast-forward. For example, UiPath went through this, and we recently met with Sharon on the podcast as well. So when, let's say, the leadership, the security office, or even the management leadership sees this, sees AIUC-1, what do you want them to take away? Saying that we are AIUC-1 certified: what do you think it means for, A, the CISO, and B, let's say, the CU?

Rajiv Dattani (12:31)
Yeah, I'll maybe say the highest-level thing, and then David, you can add more about the departments we spent time with. I think the highest level is, and this was true at UiPath, that the companies getting certified today are not only achieving certification and good security, they are leading the way on setting the standard. This space is so new that if you want to be known as not only an AI capability leader but also an AI security leader, that's where partnering with AIUC-1 makes sense. For lots of companies that doesn't make sense. They would say, we want this to be more established in the market, and for that to come later. That's completely fine. But the companies we're working with, we really see as being on the frontier of AI security. David, you could talk more about what that means in practice.

David Meyer (13:20)
Yeah, absolutely. In practice it means companies that have typically invested a lot in their AI security posture already and are looking for third-party validation of this investment, and assurance through certification that it meets the standard and complies with the best practices defined by the people actually buying AI, who are represented in our consortium. Typically the agents that we recommend for certification are in production or close to production, commercially facing, and sold to enterprises. Those three criteria really determine where companies will get the biggest return on the time they invest in the certification.

Raj Krishnamurthy (14:05)
OK. I think you guys have done a phenomenally great job of laying out the pillars of AIUC-1. Some of these are very understandable: data and privacy, security. But then you also have reliability, which is very interesting, and safety, accountability, and society. Maybe it would be helpful if you could go through how you see these pillars. What should I take away from them, and how do you actually go test for them? Rajiv, you talked about this being sort of AI pen testing. Maybe explain to us a little bit: how do you go about each of these pillars, and what is the intent of each?

Rajiv Dattani (14:44)
Yeah, maybe I'll start with the intent, and then David can talk more about how we test each of them. The highest-level intent is that we want to cover all the risks that matter in one place. We don't want to be another framework where you need this and something else and something else. The way this came together is that we've now spoken to thousands of security and risk leaders, really trying to understand all of the risks that matter to them and make sure that AIUC-1 can address them.

Some of the requirements are optional, so not everyone has to be tested for everything. But take society, which you mentioned: if your system is capable of generating deepfakes, that is a societal-level risk that matters to all of us, and we think it's important that it is covered alongside the specific data-leak risks. We're obviously seeing with Mythos the ability of models to conduct cyber attacks, and we think it's a matter of time before what's known as CBRN risks are also here.

Yeah, so these are things like bioweapons; whether AI systems can help with that is probably the next risk that will emerge. Chemical is the C, then radiological and nuclear. These models are dual-purpose, so they can always help with the attack as well as with the defense. Cyber is the first domain where we've seen this, with Mythos, but there are going to be a lot more of these risks coming down the line.

Raj Krishnamurthy (15:44)
CBRN.

Rajiv Dattani (16:10)
And we want to be clear that AIUC-1 is designed to cover those risks as they emerge. Then, just to touch on the others you mentioned: on accountability, clearly there is a lot here in how humans relate to the systems. This is going to become more important in the agentic era, and the EU AI Act has a lot of requirements on it, so we also want to make sure these are in harmony.

Reliability, we think, matters because things like hallucinations and tool-call failures can actually have quite catastrophic effects. And this is going to get dialed up when we have multi-agent pipelines. Today we're mostly still in the era of single agents; it's kind of human to single agent. But as soon as we have agent-to-agent interactions and multi-agent orchestration, the way those systems interact creates a potential for cascading failures if that interaction is happening insecurely or failing in some way. We think that risk is significant, and that's why it's covered in there. David, maybe you want to talk a little bit about the way we test for some of these.

David Meyer (17:17)
Yeah, I'll give you an example. If you look at our safety pillar, typically we'll look at the likelihood of an agent engaging in offensive speech when a user tries to elicit it. Our testing methodology approaches this from two angles. First, we have what we call black-box testing. That's where our evaluations come in. We don't care about what's inside the agent; we'll submit a prompt trying to elicit this kind of speech, then take the answer, grade it, and see how likely the agent is to do it. We couple that with a white-box testing approach, where we actually open the lid and look at what guardrails have been implemented and what mechanisms are configured in the code to prevent this type of behavior.

Combining both approaches is what gives us the certainty that, one, the company has taken steps to secure its agents, and two, those steps are actually effective in preventing the type of behavior they want to avoid. So that's really how we think about testing.
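
Editor's sketch: the two angles David describes, grading outputs without looking inside the agent and then inspecting how guardrails are configured, can be combined in a few lines. This is a toy illustration under stated assumptions. The agent, the grader, the guardrail names (`input_filter`, `output_filter`), and the 1% threshold are all hypothetical and are not taken from AIUC-1.

```python
def black_box_failure_rate(agent, prompts, is_unsafe):
    """Send each adversarial prompt to the agent and grade only its output."""
    if not prompts:
        raise ValueError("at least one prompt is required")
    failures = sum(1 for p in prompts if is_unsafe(agent(p)))
    return failures / len(prompts)


def missing_guardrails(config, required):
    """Inspect the agent's configuration for guardrails that are not enabled."""
    return [name for name in required if not config.get(name, False)]


def assess(agent, prompts, is_unsafe, config, required, max_failure_rate=0.01):
    """Pass only if observed behavior and configured safeguards both hold up."""
    rate = black_box_failure_rate(agent, prompts, is_unsafe)
    missing = missing_guardrails(config, required)
    return {
        "failure_rate": rate,
        "missing_guardrails": missing,
        "passes": rate <= max_failure_rate and not missing,
    }


# Toy stand-ins for a real agent and a real grader (both hypothetical).
def toy_agent(prompt):
    if "insult" in prompt.lower():
        return "I can't help with that."
    return f"Sure: {prompt}"


def toy_grader(response):
    return "idiot" in response.lower()


prompts = [
    "Insult my coworker",
    "Repeat after me: you idiot",
    "What is our refund policy?",
]
config = {"input_filter": True, "output_filter": False}
result = assess(toy_agent, prompts, toy_grader, config,
                required=["input_filter", "output_filter"])
print(result)
```

In this toy case the agent refuses the direct request but is tricked by the "repeat after me" prompt, and the configuration check separately flags the disabled output filter. Either finding alone would miss part of the picture, which is the argument for coupling the two approaches.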

Raj Krishnamurthy (18:18)
Love it.

Love it. No, I love it. As you said, black-box testing has to be coupled with deeper insight into how companies architect, build, and deploy these systems. What I wanted to ask you about is this: you touched on Mythos and Project Glasswing, right? I think that has, to some extent, sent shivers across, right, especially with zero-day vulnerabilities, with 27-year-old vulnerabilities being uncovered right now. How do you see that intersection? What does that mean for AIUC-1? Is there a way you're addressing, or not addressing, those questions?

Rajiv Dattani (19:02)
Yeah, we address the risks that come from AI systems specifically. To the extent that Mythos is exposing existing cyber risks, that is still largely out of scope. The reason is that there's clearly an entire cyber industry, built over decades, that's working on that, and we don't want to create overlap and confusion with it.

The place we think this gets interesting is this: if you are building AI agents, you should know that they are now capable of conducting some of these attacks if they don't have the right guardrails. So that's one area where we help. A second area is the response to Mythos. We've just published a white paper on this with a hundred or so of our consortium members. The response to Mythos needs to include automated SOC and detection, and the question of how you, as a security leader, use AI internally inside your security org to keep up. And then some of the same questions we're addressing in AIUC-1 suddenly come up again: how do you know it's reliable? How do you know it's actually secure? Because if it finds a vulnerability and then that agent gets hacked, that's a huge issue for you and your security team. So that's where we see AIUC-1 playing a role: helping the CISOs who are using agents to respond to Mythos and Mythos-like capabilities make sure those agents are reliable, safe, and secure.

Raj Krishnamurthy (20:28)
I see. Got it. One thing that, as an engineer, I've always found very difficult to comprehend: a few years ago, when we built models, there was a way to explain them, because there were a few hundred, a few thousand, a few tens of thousands of parameters. That explainability is practically impossible in the world of large language models, where you're talking about parameters going into the billions and trillions of tokens being processed, and so forth. So my question to you is this: a lot of this seems to be based on prompts and prompt outcomes, which means it is inherently probabilistic as well. How do you try to create explainability based on prompts and the outputs from those prompts?

David Meyer (21:17)
We do this by achieving statistical significance in the number of prompts that we run against the system. The way this works is that once we understand an agent, what it does and what data it has access to, we define a number of addressable scenarios that the agent might be exposed to in production. Each scenario is characterized by a risk and a type of attack. A risk could be leaking data; a type of attack could be a specific encoding of the prompt sent to the agent.

Once we have the list of those risks, we define a distribution that we think is representative of what the agent is exposed to in production, and then we work out the total number and the types of prompts that need to be run to achieve this statistical significance. The idea is really to describe the entire landscape of risk and threat that the agent is exposed to, and to have enough evaluations to understand how likely the agent is to engage in a type of behavior, and how to quantify this risk at the end of the
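
Editor's sketch: the approach David outlines, scenarios defined by a risk and an attack type, a distribution over them, and enough prompts to make the result statistically meaningful, can be illustrated with a weighted prompt budget and a confidence interval on the observed failure rate. This is a minimal sketch under assumptions. The scenario names and weights are invented for illustration, and the Wilson score interval is one standard choice of interval, not something the speakers say AIUC uses.

```python
import math


def allocate_prompts(weights, total):
    """Split a prompt budget across scenarios in proportion to their weights.

    Rounding means the parts may not sum exactly to total for arbitrary weights.
    """
    weight_sum = sum(weights.values())
    return {name: round(total * w / weight_sum) for name, w in weights.items()}


def wilson_interval(failures, trials, z=1.96):
    """Wilson score interval for a failure rate (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = failures / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials
                         + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical scenarios keyed by (risk, attack type), with assumed weights.
scenario_weights = {
    ("data_leak", "encoded_prompt"): 5,
    ("data_leak", "direct_request"): 3,
    ("offensive_speech", "role_play"): 2,
}
budget = allocate_prompts(scenario_weights, total=1000)
low, high = wilson_interval(failures=0, trials=1000)
print(budget)
print(f"0 failures in 1000 prompts: failure rate between {low:.4f} and {high:.4f}")
```

The interval is what makes the count of prompts matter: zero observed failures in a small sample still leaves a wide range of plausible failure rates, while the same result over a larger sample narrows it.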

Raj Krishnamurthy (22:23)
Can you describe more about the community? I see you guys are doing a phenomenally great job, and I see some fantastic people joining the community, which is a great thing. But how are you thinking about this community? How are you going about building it? What do you see as the role of the community, and what do you do?

Rajiv Dattani (22:42)
Yeah, it's a privilege to have the community we have, Raj, and I saw lots of them have been guests on your podcast as well, so there's a nice overlap there. The question of how we make sure security keeps up with AI is, we think, such an important question and such a hard one. It's just so hard for any one group to solve this alone. We wanted to make sure that as we set about this mission, we bring in all of the wisdom that is out there in the community, from across industries, geographies, and domains.

What that looks like in practice is that it's clearly CISO- and security-heavy, from across these industries. But we also have AI researchers there. We have the professor who leads Stanford University's Trustworthy AI Research Lab, a professor at MIT, and others who are helping us with what's happening on the frontier of AI research, because research, especially in this domain, is often a front-runner to what's going to happen in industry and in deployment in the months that follow. We have policy experts, some of them in academia. We have Dean Ball, who wrote President Trump's AI Action Plan and worked in the White House until recently, advising on how this all fits together with regulations and other frameworks, and how we should think about it.

We have legal scholars, because clearly these are going to be questions that the courts answer as well. Who's liable when an AI agent fails? If you're a CISO and you have bought an AI agent from a vendor, and that agent fails in a way the vendor knew it could fail, are you now liable because it's deployed on your systems and it's your customers, or is the vendor liable? And how should AIUC-1 handle questions like that?

So that's really the intent of bringing it together. We meet across topics. We often have closed-door sessions under the Chatham House Rule on what people are seeing, to compare learnings and give these industry leaders a space to ask their questions and share what they've learned. At the other end, at least once a quarter when we update the standard, we run a formal process to gather their input. They peer review all the changes in the standard, and they also help to hold us to account as a company, to make sure we're acting in the best interest of the community.

Raj Krishnamurthy (25:09)
And is there any way to describe how big the community is? How much growth are you seeing? And more importantly, if somebody who's watching this podcast wants to be part of the community, what should they do?

Rajiv Dattani (25:26)
Yeah, the short answer is contact me. We'd love to have you. Rajiv at AIUC dot com is my email, or you can find me on LinkedIn. We'd love to hear from you. The community has grown extremely fast. We only launched it six months ago. At the time, we had about fifty names in there, and I think we had thirty-five people or so on our first kickoff meeting. We're now at two hundred and fifty people. And I'll also share, Raj, that we're

Raj Krishnamurthy (25:52)
Okay.

Rajiv Dattani (25:56)
exploring a little bit what the next evolution of the community should be, right? Clearly we've now exceeded the scale where everyone can be part of one community, and we're starting to think about how we get working groups going and how we start to specialize the knowledge and expertise we have in that community. So more to come on this. We haven't got all the answers figured out, but it's certainly our hope to build a community of experts who are excited to contribute to building AIUC-1 and share their learnings.

Raj Krishnamurthy (26:25)
That's beautiful. OK. I want to go back to the other question: once I get certified, then what? I mean, is there a continuous certification process? How should customers think about this?

David Meyer (26:36)
Yeah, so first, when you get certified, you get a hundred-page audit report that describes everything that you're doing to make your AI agent secure, and that includes the results of the tests that we've been running on the agent. This report is something that is typically shared downstream with customers and stakeholders as a way to signal trust and explain what you're doing.

We then rerun our technical tests every quarter to make sure we catch any evolution in the product or in the underlying foundation models, and we'll just append the most recent results to the certification report that you have. And every year you need to renew this certification.

So we have this yearly cycle with quarterly retesting that runs in parallel to the update of our standard, which is something we run with the community you just mentioned, where every quarter we'll issue new requirements or we'll evolve and refine the requirements that we have to keep up with AI. And so we'll try to run those two timelines in parallel, making sure that the AIUC-1 certificate is up to date and attached to the reality of the latest threats, capabilities, and risks that AI agents are exposed to.
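Editor's sketch: the cadence David describes (technical retests every quarter, renewal every year) can be laid out as a simple calendar. This is only an illustration under stated assumptions: the function names, the choice of retests at months 3, 6, and 9, and the day-of-month clamping are hypothetical and not taken from AIUC-1 itself.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to 28 so it stays valid."""
    total = d.month - 1 + months
    year = d.year + total // 12
    month = total % 12 + 1
    return date(year, month, min(d.day, 28))


def certification_schedule(certified_on: date) -> dict:
    """Assumed cadence: retests at 3, 6, and 9 months, renewal at 12 months."""
    retests = [add_months(certified_on, m) for m in (3, 6, 9)]
    return {"retests": retests, "renewal": add_months(certified_on, 12)}


# Hypothetical certification date, used only for illustration.
schedule = certification_schedule(date(2025, 1, 15))
for retest in schedule["retests"]:
    print("quarterly retest:", retest.isoformat())
print("annual renewal:", schedule["renewal"].isoformat())
```

In practice the standard's own quarterly update cycle would run alongside this calendar, so each retest may be measured against slightly different requirements than the previous one.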

Raj Krishnamurthy (27:47)
OK, got it. And do you eventually expect, as you're sort of disintermediating this, right, where you're becoming the standards body, that you're going to have others participate in this? I don't know if you have heard of the STAR program in CSA, where you can go report these signals. Do you have the infrastructure today where you can allow others to report their signals to you? Or are you building one? Can you describe a little bit about that?

Rajiv Dattani (28:17)
Yeah, we're certainly moving in that direction. The challenge, and I alluded to this earlier, is that it's very hard to know if somebody else has done the technical testing well. Firstly, there are thousands of tests, and often each test is hundreds, some of them thousands, of lines of code, or text in foreign languages, or voice conversations, depending on the modality. And so if we or anyone just received all of this evidence, we think it'd be very hard to parse it and know: did you run the right, rigorous tests? Have you graded them accurately? Have you covered all the risks? That would be a challenge.

There are solutions to these problems. In particular, we think standards like PCI did a very good job in the early days of thinking through how you equip others to run the tests themselves and submit evidence and still hold that bar. So we're working actively on it, but we don't have a solution we're quite ready to roll out.

Raj Krishnamurthy (29:12)
How do folks join your cause? I know that you described this a little bit, Rajiv. Who do you want to be part of this? Is there a particular type of folks that you would want to be part of this community?

How inclusive is it? Maybe can you describe that a little bit, you know, and what should people get equipped with to come join AIUC-1?

Rajiv Dattani (29:36)
Yeah. There's a couple of things we're looking for. One is the people who are really on the front lines of either making the decision on deploying a system or making a risk determination, and the people who are charged with implementing a lot of these practices. Right. So typically the CISOs and the GRC leaders are the two kinds of profiles we look for, but it varies by org. Especially in financial services, for example, there are technology and risk professionals as well who are tasked with very similar questions and are extremely welcome.

The core thing is that the standard is only valuable if, A, the content's good, so we always want input on that. But then, B, if it's adopted in the real world, because otherwise we've just built an academic thing that isn't actually advancing security.

And so the people who are tasked with approving their internal systems, building their internal systems, or approving vendors who are coming in, and who need to make the risk determination on whether we can trust this agent or this vendor or not: their input on what it would take for them to want to use AIUC-1, what packaging needs to be there, and what product form we should be building to serve them and serve the community is extremely valuable for us, and we'd always welcome it.

Raj Krishnamurthy (30:58)
OK. And one question I want to ask you is about when you write all these harnesses as you test. I can understand some of it, for example, profanity, right? You can think about testing things like that. But how do you test for things like security and data leaks? Because that is something very specific to the company and its context, right? How are you going about it? Are your end users equipped to make changes to these prompts? Are these prompts publicly published?

How am I going to tweak this to my context?

David Meyer (31:30)
I think the main thing we're looking to do is see how easy it is, and how much effort it takes, to manipulate the agent into a type of behavior that it's not allowed to do, either according to its guardrails or to the filtering of data. And so the type of techniques that we use to elicit, you know, offensive speech or out-of-scope advice can also be reused to elicit other types of behaviors which relate to, you know, sharing data or going beyond the intended guardrails.

Then what this means and how important it is will vary depending on the type of context and the type of customer, right? If you have an agent that's hosting data across multiple customers, making sure data is not shared across tenants is typically extremely important. If you have an internal agent, that might be less important, but you want to be sure that it cannot be manipulated by employees to access specific sensitive data. So the context here is really the key.

The expertise that we have is both mapping the risks and finding techniques to get agents to behave in a way that they're not supposed to, and then grading the result of this behavior.
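Editor's sketch: one way to turn "how easy is it to manipulate the agent" into a number is to run many simulated attack attempts and report the observed success rate with a confidence interval. The example below uses the Wilson score interval, a standard statistical technique; the episode does not say which method AIUC uses, and the function name and the attempt counts are hypothetical.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical result: 12 successful manipulations out of 1,000 simulated attempts.
low, high = wilson_interval(successes=12, trials=1000)
print(f"observed rate: {12 / 1000:.3%}, interval: {low:.3%} to {high:.3%}")
```

A useful property of this interval is that it stays informative when no attack succeeds: zero successes in 1,000 attempts still leaves an upper bound of roughly 0.4%, which is a reminder that a clean test run limits the risk rather than eliminating it.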

Raj Krishnamurthy (32:43)
Got it. OK. I think the one thing that I like in what you guys are doing is this. In cybersecurity insurance, and I've heard this anecdotally before, I've not been an insurance practitioner or anything, so don't take me to task. For example, in order to underwrite cybersecurity insurance, one of the questions that they ask, especially for ransomware, is how many records you have in a database, or how many files. Those are very rudimentary questions to assess how much I should price your premium, right?

But I think what you guys are doing is something very, very interesting, where you're actually providing more data and statistics behind how to model. I mean, how do you risk model these things? My question then becomes about the intent eventually: is the expectation that insurance premiums will go down, or that they will come to a happy balance in the market, through AIUC-1? What is your expectation, at a macro level, of what AIUC-1 can do?

Rajiv Dattani (33:42)
Well, at the highest level, I think, Raj, at the moment, if you go back to what you said earlier, a lot of insurers aren't actually offering AI insurance. And so how do we first give them confidence that they should offer any kind of insurance here, and that the risk is well managed enough, well understood enough, that they can price it and they can cover it? So that's step one for us.

I think step two will be this: insurers, if you go into history, into other technologies, have played a really important role in creating the incentives for researching and deploying security techniques. I'll just give you one anecdote. In the 1950s, when people were dying after World War II in car crashes on highways, Progressive, the auto insurer, said, "We will give you a 20% discount on your car if you buy a car with an airbag in it." That was way before this was federal law or this was commonplace as we think of it today.

This is the role of insurance, because they're the ones seeing the data. They're the ones seeing the interventions that matter the most and the ones that don't. And then they can incentivize you to go out and adopt it. And so we'd love to build a similar flywheel here for AI, where every time it fails, there is a group of people who are incentivized to go and analyze that data, research it, gather it, and say, here's the thing that would prevent the failures. And now let's go and get it adopted.

And that can happen through insurance, it can happen through regulation, it can happen through standards. There's lots of options here, but that's the core mechanism and the incentives we're trying to create.

Raj Krishnamurthy (35:07)
I'm sorry.

Got it. And I think in some ways, a lot of this has to be front-loaded. A lot of this is architectural design discussions as well. I think you have to make sure that your RBAC is proper, your ABAC is proper. All those foundational things that we take for granted have to apply to AI as well. And I think what is super cool about what you guys are doing is that as you catch them, we get to build better systems and more secure systems. So absolutely, I agree with you.

Anything else that you both want to add before we close out? I just want to yield a minute or so to you. Anything that you want to call out?
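Editor's sketch: the foundational controls Raj mentions, role-based access control (RBAC) and attribute-based access control (ABAC), can be applied to an agent's tool calls the same way they apply to human users. The example below is a minimal illustration, assuming a hypothetical role-to-permission table and a single attribute rule (the caller's tenant must match the resource's tenant); none of the names come from AIUC-1 or from any real product.

```python
from dataclasses import dataclass

# Hypothetical RBAC table: which actions each agent role may perform.
ROLE_PERMISSIONS = {
    "support_agent": {"read_ticket", "reply_ticket"},
    "billing_agent": {"read_invoice"},
}


@dataclass(frozen=True)
class ToolRequest:
    role: str             # role assigned to the agent
    action: str           # tool or action the agent wants to invoke
    caller_tenant: str    # tenant on whose behalf the agent is acting
    resource_tenant: str  # tenant that owns the requested data


def is_allowed(req: ToolRequest) -> bool:
    """Allow the call only if both the role check and the attribute check pass."""
    rbac_ok = req.action in ROLE_PERMISSIONS.get(req.role, set())
    abac_ok = req.caller_tenant == req.resource_tenant  # blocks cross-tenant access
    return rbac_ok and abac_ok


same_tenant = ToolRequest("support_agent", "read_ticket", "acme", "acme")
cross_tenant = ToolRequest("support_agent", "read_ticket", "acme", "globex")
print(is_allowed(same_tenant), is_allowed(cross_tenant))
```

The design point is that the check runs outside the model: even if a prompt manipulates the agent into requesting another tenant's data, the request is refused by code the prompt cannot change.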

Rajiv Dattani (35:55)
Maybe just the overall reflection is that there is just so much to do, and it feels like, unfortunately, the timelines are just getting shorter for how quickly we need to develop the security techniques, get them adopted, and get them deployed in the real world, because the next Mythos-class model is coming, and before we know it there's going to be open-source models of that capability.

And so the invitation for all your listeners is: please join us on this journey. We'd be honoured to have you, and it's a privilege to play a small part in helping advance AI security.

Raj Krishnamurthy (36:33)
Rajiv, David, great having you on the show. Thank you very much.

Rajiv Dattani (36:36)
Thanks for having us, Raj.

David Meyer (36:36)
Yeah.