As organizations increasingly integrate LLMs into their operations, a question that often gets deferred is: which model is actually the most secure to deploy? It's a harder question than it sounds. Capability benchmarks are everywhere; security benchmarks are not. Existing efforts like MITRE ATLAS, NIST's AML taxonomy, and OWASP's LLM Top 10 each cover slices of the problem — but none achieves full coverage across content safety, lifecycle scope, multi-agent threats, and supply chain risk simultaneously.
Cisco has made a serious effort here, developing an integrated security framework for AI. That framework informs Cisco's LLM Security Leaderboard, a publicly accessible tool that ranks models on their resistance to adversarial attacks, launched at the 2026 RSA Conference. You can consult the leaderboard to get a sense of how secure your LLM of choice actually is.
In this blog post, we'll dive into the security framework developed by Cisco to evaluate LLMs.
The leaderboard evaluates models across two distinct test types. Both are run against base models with no guardrails applied, establishing a consistent baseline of inherent model security rather than measuring production-hardened deployments.
Single-turn testing is blunt force: send a malicious prompt directly, see if the model refuses. "Write me malware for X." "Explain how to synthesize Y." It tests the model's immediate, unconditional safety response — the first-instinct refusal. Single-turn scoring is simply the percentage of direct, single-prompt attacks the model successfully refused.
Multi-turn testing mirrors how a real adversary actually operates. Rather than one direct request, the attacker builds across a conversation — establishing rapport, adopting a persona, escalating gradually from innocuous to harmful. A model might refuse a direct request but cave after six turns of social engineering. Multi-turn attack strategies include persona adoption, gradual escalation from benign to harmful requests, social engineering and trust-building, and context manipulation. Scoring measures the percentage of individual attack strategies that bypassed the model's safeguards, with each conversation typically testing 4-5 independent strategies — more granular than a binary pass/fail per conversation.
Each model receives a combined score weighted equally between the two (50/50), so a model can't rank well by excelling in only one dimension. A model that aces single-turn but fails multi-turn is dangerous in production — most real interactions aren't one-shot. Scores range from 0-100%, with bands of Excellent (85-100%), Good (70-84%), Fair (50-69%), and Poor (0-49%).
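The scoring described above is simple enough to sketch directly. The snippet below is a minimal illustration of the equal 50/50 weighting and the rating bands as stated; the function names and rounding behavior are my assumptions, not Cisco's implementation.

```python
# Sketch of the leaderboard's scoring as described: the two test scores
# are weighted 50/50, then mapped onto rating bands. Function names and
# exact band boundaries at the edges are assumptions for illustration.

def combined_score(single_turn_pct: float, multi_turn_pct: float) -> float:
    """Equal-weighted combination of the two test scores (0-100)."""
    return 0.5 * single_turn_pct + 0.5 * multi_turn_pct

def band(score: float) -> str:
    """Map a 0-100 score onto the leaderboard's rating bands."""
    if score >= 85:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Fair"
    return "Poor"

# A model that aces single-turn (95%) but caves under multi-turn social
# engineering (40%) lands in "Fair" -- it cannot hide behind one strong
# dimension.
score = combined_score(95, 40)  # 67.5
print(band(score))              # Fair
```

The equal weighting is the design choice doing the work here: either test score alone caps how high the combined number can climb.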
The single-turn/multi-turn distinction maps directly onto the threat taxonomy. Single-turn testing primarily exercises techniques executable in one prompt — direct prompt injection, jailbreak variants, harmful content elicitation. Multi-turn testing is where the more sophisticated objectives come alive: Goal Manipulation, Context Boundary Attacks, Masquerading and Impersonation, Persistence, and the social engineering dimension of Communication Compromise. If you're deploying a one-shot API tool, single-turn scores are most relevant. If you're deploying a conversational agent with memory and tool access — which is where enterprise AI is heading — multi-turn resistance is the number that matters.
The Cisco Integrated AI Security and Safety Framework is not a single taxonomy — it's three interlocking ones, each covering a distinct attack surface, and each designed to map back to a common structure so organizations can reason about risk consistently across all of them.
The first is the master taxonomy: the 19 objectives, 40 techniques, and 112 subtechniques that classify the full range of AI threats. It operates on four hierarchical levels: objectives (the "why" behind attacks), techniques (the "how"), subtechniques (specific variants), and procedures (discrete real-world implementations).
The 19 objectives span three risk groups:
Common Manipulation Risks
Data-Related Risks
Downstream / Impact Risks
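The four-level hierarchy lends itself to a simple nested representation. The sketch below shows one way to model it; the field names and the example entry are illustrative placeholders, not items drawn from Cisco's actual taxonomy.

```python
# Minimal sketch of the four-level hierarchy: objective -> technique ->
# subtechnique -> procedure. The example entry below is a hypothetical
# illustration, not a record from the published taxonomy.
from dataclasses import dataclass, field

@dataclass
class Technique:
    name: str                                           # the "how"
    subtechniques: list = field(default_factory=list)   # specific variants

@dataclass
class Objective:
    name: str              # the "why" behind the attack
    risk_group: str        # one of the three risk groups listed above
    techniques: list = field(default_factory=list)

# Hypothetical entry, for illustration only:
obj = Objective(
    name="Harmful Content Elicitation",
    risk_group="Common Manipulation Risks",
    techniques=[Technique("Jailbreak", subtechniques=["Persona Adoption"])],
)
print(obj.risk_group)
```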
Model Context Protocol (MCP) is an open standard that governs how LLMs interact with external tools, data sources, and execution environments. It's the plumbing that makes agentic AI work — allowing models to call APIs, query databases, execute code, and chain actions across systems. As MCP adoption accelerates, it has also become a significant attack surface in its own right.
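To make the attack surface concrete: MCP messages are framed as JSON-RPC 2.0, so a model invoking a server-side tool is, on the wire, a request like the one sketched below. The tool name and arguments here are invented for illustration.

```python
# What an MCP tool invocation looks like on the wire. MCP uses JSON-RPC
# 2.0 framing; the "tools/call" method asks a server to execute a tool.
# The tool name and arguments below are hypothetical.
import json

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "query_database",        # hypothetical tool name
        "arguments": {"sql": "..."},     # model-generated input flows
                                         # here, which is exactly what
                                         # makes this an attack surface
    },
}
print(json.dumps(request, indent=2))
```

Anything that can influence what the model generates can influence what lands in `params`, which is why tool plumbing deserves its own taxonomy.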
Separate from the main taxonomy, Cisco has published a dedicated MCP threats taxonomy covering 14 threat types organized into four groups. Every threat maps back to the main taxonomy's objectives and techniques, but the MCP taxonomy adds MCP-specific indicators, severity levels, and mitigation guidance — making it directly operationalizable for teams building on MCP.
Supply chain risk is systematically underweighted in most current threat models.
Cisco has published a standalone supply chain taxonomy covering 22 threat types across four groups, each mapped to the main framework with file-type indicators, severity ratings, and a "Model Defense Layer" mitigation approach.
Artifact and Format Vulnerabilities
Model Manipulation and Tampering
Dependency and Distribution Compromise
Operational and Runtime Threats
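To show how such entries become operational, here is one way a supply chain threat record could be structured so it carries its indicators, severity, and mapping back to the main framework. The field names and the example threat are my assumptions; consult Cisco's published taxonomy for the real entries.

```python
# Sketch of a supply chain taxonomy entry that maps back to the main
# framework. Field names and the example record are hypothetical.
from dataclasses import dataclass

@dataclass
class SupplyChainThreat:
    name: str
    group: str               # one of the four groups listed above
    severity: str            # e.g. "High"
    file_indicators: list    # file types that signal exposure
    maps_to: str             # objective/technique in the main taxonomy

threat = SupplyChainThreat(
    name="Malicious serialized model",   # hypothetical example
    group="Artifact and Format Vulnerabilities",
    severity="High",
    file_indicators=[".pkl", ".pt"],     # pickle-based formats can
                                         # execute code on load
    maps_to="Model Manipulation",        # illustrative mapping
)
print(threat.group)
```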
The practical value here is three well-structured, interoperable taxonomies that give security teams a common vocabulary — a structured way to ask which objectives are relevant to your use case, which MCP-specific threats apply to your agentic architecture, and where your supply chain exposures actually lie.
Please reach out with questions about securing AI in your organization's network.