Your AI Vendor Passed the Security Audit. But Did Anyone Check the Model?

How the OWASP Top 10 for LLM Applications gives procurement teams the language they have been missing

Procurement teams in regulated organisations have spent years getting good at evaluating cloud vendors. They know how to read a SOC 2 report. They know what to look for in a Data Processing Agreement. They can spot a vague sub-processor clause at fifty paces.

None of that prepares them for what happens when the product being procured is an AI system built on a large language model.

The attack surface of an LLM application is structurally different from a conventional SaaS platform. The model itself introduces vulnerabilities that have no equivalent in traditional software. No ISO certification was designed to catch them. No penetration test was scoped to look for them. The standard evaluation toolkit does not fail because it is bad. It fails because it was built for a different kind of product.

There is a framework that was designed for the model layer specifically. The OWASP Top 10 for LLM Applications names the risks that traditional security checklists leave completely unaddressed, and most procurement teams in Europe have never encountered it.

A security framework built for the model layer

OWASP, the Open Worldwide Application Security Project, has been publishing its Top 10 list of web application vulnerabilities for more than two decades. If you work in IT security or software development, you have almost certainly come across it. The web version is a de facto industry standard, referenced in audit frameworks and compliance requirements worldwide.

In 2023, OWASP published a companion list specifically for applications built on large language models. The reasoning was simple: LLM applications introduce vulnerability categories that do not exist in traditional software, and the original OWASP Top 10 was never going to cover them. The LLM list was updated to its 2025 edition in late 2024, which tells you something about how fast this space is moving.

The ten entries span prompt injection (manipulating a model into ignoring its instructions), sensitive information disclosure (the model revealing data it should not have access to), supply chain risks (unaudited third party components in the AI stack), excessive agency (the model taking actions beyond its intended scope), system prompt leakage, vector and embedding weaknesses, data poisoning, and others.

What makes it particularly useful for a procurement audience is that OWASP is a nonprofit, community driven project. No vendor wrote this list. No consultancy is licensing it. It is freely available, well documented, and maintained by practitioners with no commercial stake in how your evaluation turns out. That independence matters when you need a standard the vendor did not author themselves.

What your current vendor assessment does not cover

SOC 2 Type II evaluates whether a vendor’s internal controls around security, availability, and confidentiality operate effectively over time. ISO 27001 certifies that a vendor has implemented an information security management system. Both are valuable. Neither was designed to answer the question: can this model be manipulated into leaking personal data through a carefully worded prompt?

That is a question about scope, not quality. SOC 2 and ISO 27001 evaluate infrastructure and process controls. They cover how data is stored, who has access, how incidents get handled. What they do not cover is the behaviour of a probabilistic system that processes natural language, produces unpredictable outputs, and may have been granted access to sensitive data or downstream tools.

The practical result is that a vendor can hold every relevant certification, pass every audit, and still deploy an LLM application with serious vulnerabilities at the model layer. If you have sat through a vendor evaluation where the security section consisted entirely of certification logos on a slide, you know how this goes. The building passes inspection. Nobody checks what the tenant is doing inside it.

Five risks procurement teams should care about

Not all ten entries on the OWASP LLM list carry equal weight for someone writing an RFP or evaluating a vendor proposal. Some are primarily relevant to the development team building the application. Others land squarely on the desk of whoever is deciding whether to let this system into the organisation. Here are the five with the most direct consequences for procurement.

Prompt injection sits at the top of the list. It describes the class of attacks where a user or adversary crafts input that causes the model to override its original instructions. The direct version might trick a customer facing chatbot into revealing its system instructions or internal data. The indirect version is sneakier: malicious instructions embedded in a document the model retrieves as part of a search or summarisation task. The vendor should be able to explain their layered defence strategy for this. If they look uncomfortable when you ask, that tells you something.
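To make "layered defence" concrete, here is a minimal sketch of one such layer: a pattern-based input filter. The patterns and function name are hypothetical, and a regex filter alone is easy to bypass, which is exactly why OWASP recommends combining it with output validation, privilege separation, and monitoring rather than relying on it.

```python
import re

# Hypothetical examples of known injection phrasings. Real deployments
# use much larger rule sets plus ML classifiers; this layer only catches
# the most obvious attempts and must be backed by other controls.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    re.compile(r"reveal your system prompt", re.I),
]

def flag_suspicious_input(user_text: str) -> bool:
    """Return True if the input matches a known injection phrasing."""
    return any(p.search(user_text) for p in INJECTION_PATTERNS)
```

A vendor who can explain what sits behind this first filter, and what happens when it inevitably misses, is giving you the layered answer the question is probing for.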

Sensitive information disclosure is where the list intersects most directly with GDPR. If a model can surface personal data in its outputs, whether from its training set, its retrieval context, or its conversation history, the data controller has a problem. Under GDPR, the controller is liable regardless of whether the disclosure was intentional. The vendor needs to explain how personal data enters and exits the model at both training and inference, and what data minimisation measures are actually in place. “We take data privacy seriously” is not an answer to this question.

Supply chain vulnerabilities deserve more attention than they typically get. Almost no enterprise AI deployment is built from scratch. The vendor’s application probably depends on a base model from one provider, an embedding model from another, a vector database, retrieval plugins, fine tuning datasets sourced externally, and evaluation tools from somewhere else entirely. The procurement team evaluates the vendor. But who evaluates the vendor’s vendors? An RFP should require disclosure of the AI supply chain. If the vendor cannot produce one, you are buying a system whose foundations you have no way to inspect.

Excessive agency becomes relevant the moment an AI system can do more than answer questions. If the model can send emails, modify records, trigger workflows, or execute code, the issue is no longer just what the model says. It is what the model does. OWASP identifies three root causes: the model has access to tools it does not need, it operates with higher privileges than required, or it takes high impact actions without human approval. Procurement teams should ask for a clear map of what the system can do, what permissions it holds, and where a human being has to sign off before something irreversible happens.
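The permission map described above can be sketched as an explicit tool allowlist with a human-approval gate. The tool names and policy fields here are assumptions for illustration, not any vendor's actual API; the point is that every tool the model can invoke is enumerated, and high-impact actions are denied unless a person has signed off.

```python
# Hypothetical tool registry: each tool the model may call declares
# whether it is privileged and whether a human must approve the action
# before it runs (OWASP's mitigation for excessive agency).
TOOL_POLICY = {
    "search_documents": {"privileged": False, "needs_approval": False},
    "send_email":       {"privileged": True,  "needs_approval": True},
    "delete_record":    {"privileged": True,  "needs_approval": True},
}

def authorize(tool: str, human_approved: bool = False) -> bool:
    """Deny anything not on the allowlist; gate high-impact tools on approval."""
    policy = TOOL_POLICY.get(tool)
    if policy is None:          # tool not enumerated: deny by default
        return False
    if policy["needs_approval"] and not human_approved:
        return False
    return True
```

A vendor who cannot produce the equivalent of this table for their own system cannot credibly claim the model's agency is bounded.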

System prompt leakage is a newer addition to the 2025 list and one that many organisations underestimate. System prompts often contain operational logic, internal rules, access configurations, and in some cases even API keys. Because LLMs are probabilistic, there is no reliable way to guarantee that a system prompt stays confidential once it is part of the model’s context. OWASP is blunt about this: system prompts are not security controls. If a secret is in the prompt, treat it as exposed. Ask the vendor whether any credentials, internal logic, or access rules live inside the system prompt. The correct answer is no.
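The question "do any credentials live in the system prompt?" can even be checked mechanically. This is a minimal sketch with hypothetical patterns; real secret scanners use far larger rule sets, and passing this check does not make a prompt confidential, it only confirms the obvious failures are absent.

```python
import re

# Hypothetical secret patterns for illustration. A match means the
# prompt contains something that should never have been there, since
# OWASP's position is that system prompts are not security controls.
SECRET_PATTERNS = [
    re.compile(r"(api[_-]?key|secret|token)\s*[:=]\s*\S+", re.I),
    re.compile(r"sk-[A-Za-z0-9]{20,}"),  # common API key shape
]

def prompt_contains_secret(system_prompt: str) -> bool:
    """Return True if the system prompt matches any known secret pattern."""
    return any(p.search(system_prompt) for p in SECRET_PATTERNS)
```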

Turning the list into an evaluation instrument

The value of the OWASP LLM Top 10 is that it gives procurement officers a structured set of questions that security, legal, and compliance teams can all work from. Nobody needs to become a security engineer to use it. The framework does the translation work.

In practice, this means taking each relevant OWASP entry and turning it into an RFP question the vendor must answer, an evaluation criterion the assessment team can score, or a contract clause that creates an enforceable obligation once the deal is signed. For prompt injection, the RFP question might ask the vendor to describe their layered defence strategy including input filtering, output validation, privilege separation, and monitoring. For sensitive information disclosure, it might ask for a data flow diagram showing how personal data moves through the system at training and inference.
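The mapping from OWASP entries to scored RFP criteria can be sketched as a simple structure. The question wordings and the 0–2 scale below are assumptions for illustration (0 = no answer, 1 = attestation only, 2 = architectural evidence provided); the entry names match the 2025 list.

```python
# Hypothetical mapping of OWASP LLM Top 10 entries to RFP questions.
# Each vendor answer is scored 0-2; unanswered questions score 0.
RFP_QUESTIONS = {
    "LLM01 Prompt Injection":
        "Describe your layered defence strategy, including input filtering, "
        "output validation, privilege separation, and monitoring.",
    "LLM02 Sensitive Information Disclosure":
        "Provide a data flow diagram showing how personal data moves "
        "through the system at training and inference.",
    "LLM03 Supply Chain":
        "Disclose all third-party models, datasets, and components in the AI stack.",
}

def score_vendor(answers: dict[str, int]) -> float:
    """Average the per-question scores; missing answers count as zero."""
    total = sum(answers.get(q, 0) for q in RFP_QUESTIONS)
    return total / len(RFP_QUESTIONS)
```

The same structure works as a shared scoring sheet: security scores the evidence, legal maps each entry to a contract clause, and procurement compares vendors on identical criteria.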

Some organisations are beginning to create what might be called an LLM Security Schedule as an annex to their Data Processing Agreements. It is a dedicated section that addresses model layer risks separately from the infrastructure risks the standard DPA already covers. This is still uncommon, but the organisations doing it report that it reduces ambiguity on both sides and makes it considerably easier to hold the vendor accountable if something goes wrong.

Checkbox compliance versus structural verification

There is a meaningful difference between a vendor saying they address prompt injection and a vendor showing you how. Self attestation is where most vendor security assessments start, and too often, where they stop. The vendor fills in a questionnaire, attaches a policy document, and procurement checks the box.

For traditional IT risks, that approach has known weaknesses but broadly works. A firewall rule either exists or it does not. An encryption standard is implemented or it is not. The controls are deterministic and auditable. LLM risks are different. A vendor can implement input filtering for prompt injection and still be vulnerable to techniques that bypass those filters, because the model’s behaviour cannot be fully predicted.

This is why procurement should push for architectural evidence. Data flow diagrams that show how information actually moves through the system. Documentation of model access controls and inference isolation. Evidence that the vendor has run red team testing against OWASP LLM categories. Hosting jurisdiction documentation for both the model and the data it processes. None of this makes the procurement process adversarial. It makes it specific.

Where infrastructure architecture reduces the attack surface

Several risks on the OWASP LLM list get worse depending on how the underlying infrastructure is set up. Multi tenant inference environments, where multiple customers share the same model infrastructure, widen the surface area for sensitive information disclosure. Cross border data routing creates uncertainty about which privacy regime applies to any given interaction. Opaque third party model dependencies make supply chain risks harder to evaluate because the vendor themselves may not have full visibility into what they depend on.

EU hosted, single tenant infrastructure removes some of these risks at the architecture level. When inference runs in an isolated environment within a single jurisdiction, certain categories of cross contamination and jurisdictional exposure go away by design. That does not make prompt injection or excessive agency or system prompt leakage disappear. Those are application layer problems that need application layer controls regardless of where the infrastructure sits. But it does mean the infrastructure is not making them worse.

Procurement as the last line of defence

Procurement is no longer just buying software. When an organisation procures an AI system, it is deciding which model architectures, which data flows, and which risk profiles will operate inside its perimeter. That decision has consequences under GDPR if data leaks. It has operational consequences if the system behaves in ways nobody anticipated. And it has reputational consequences that no policy document or insurance contract will fully absorb.

The OWASP Top 10 for LLM Applications does not make those risks go away. What it does is give procurement a shared vocabulary with the security team, the legal team, and the DPO. It provides a framework for asking the right questions before the contract is signed, not after the incident report lands on someone’s desk. Traditional certifications still matter and should remain part of every evaluation. But they need a companion that covers the model layer. OWASP provides one. The organisations that integrate it now will spend less time and money cleaning up problems they could have caught at the front door.

References

OWASP Top 10 for LLM Applications 2025 genai.owasp.org/llm-top-10/

Full PDF - OWASP Top 10 for LLMs v2025 owasp.org/www-project-top-10-for-large-language-model-applications/assets/PDF/OWASP-Top-10-for-LLMs-v2025.pdf

OWASP GenAI Security Project owasp.org/www-project-top-10-for-large-language-model-applications/

OWASP Top 10 for Agentic AI Applications genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/

GitHub repository github.com/OWASP/www-project-top-10-for-large-language-model-applications

About GLBNXT

GLBNXT provides sovereign AI solutions built for regulated industries. Our platform is 100% EU-hosted, GDPR-compliant by design, and built with zero training on client data. We work with legal practices, government advisory firms, and enterprises that need AI they can trust. To learn more or schedule a demonstration, visit www.glbnxt.com.


This website and its contents are the exclusive property of GLBNXT. No part of this site, including text, images, or software, may be copied, reproduced, or distributed without prior written consent from GLBNXT B.V. located at Druivenstraat 5-7, 4816 KB Breda, The Netherlands, registered with the Dutch Chamber of Commerce (KvK) under number 95536779. VAT identification number (VAT ID) NL867171716B01. All rights reserved.
