Setting the Stage

The Shift to Behavior-Based Security

For as long as code security has been a concern, one question has always arisen from the lens of security.

Does this code do something dangerous?

The question was well-suited to the problem. Software did what it was programmed to do. A vulnerability was a flaw in the logic; a memory error that could be exploited, a misconfigured access control, an input handler that didn’t sanitize what it received. Find the flaw, fix the code, close the gap. Static analysis could read the code before it ran and flag patterns that looked like known vulnerability classes. Dynamic testing could exercise the application against known attack payloads and observe whether they produced unexpected behavior. The tools were built around a coherent model of how software fails, and that model was reliable enough to build an entire discipline around.

The question hasn’t become wrong. It has become insufficient. And insufficient, at the pace AI is being adopted and the scale at which it’s being deployed, functions the same as wrong.

What Changed About the Attack Surface

AI systems and agents fail differently than traditional software.

A language model doesn’t have logic in the way a traditional application does. It has behavior that emerges from the interaction between its training, its configuration, and the inputs it receives in the moment. That behavior is probabilistic, not deterministic. The same input can produce different outputs across invocations. The same model configuration can behave differently depending on what came before it in the context window. The application that passed every pre-deployment test can behave differently in production, not because something changed in the code, but because production inputs are different from test inputs, and the model’s responses to that difference weren’t tested for and couldn’t have been fully predicted.

This non-determinism is not a bug to be fixed. It is a fundamental characteristic of how these systems work, inseparable from the capabilities that make them useful. A model that was deterministic, that produced the same output for every input, without sensitivity to context, would not be a useful language model. The flexibility that enables genuine reasoning, nuanced response, and adaptation to diverse inputs is the same flexibility that makes the system’s behavior in any given moment not fully predictable from its code alone.

Traditional security tools were not built for this. Static scanning reads code. It identifies patterns that match known vulnerability signatures. It finds the buffer overflow, the SQL injection point, the hardcoded credential. It does none of these things by observing behavior, it does them by reading instructions and recognizing dangerous ones. For AI systems, the dangerous behavior doesn’t live in the instructions. It lives in the interaction between the model’s trained tendencies and the inputs it receives. That interaction is invisible to a scanner reading source code. It always will be, because the scanner is reading the wrong artifact.

The Attack That Lives in Language

Prompt injection is the clearest example of what behavior-based attacks look like and the clearest illustration of why static analysis cannot find them.

A traditional injection attack exploits the boundary between data and instructions in a software system. SQL injection works because the database interpreter doesn’t reliably distinguish between the query a developer intended to construct and the query an attacker can construct by embedding SQL syntax in user-supplied input. The fix involves enforcing that distinction through parameterized queries, input validation, or both. The vulnerability has a location, a specific point in the code where the boundary breaks down, and a fix that can be applied at that location.

Prompt injection doesn’t exploit a boundary weakness. It exploits the absence of a boundary. Language models are designed to follow instructions expressed in natural language. That is their core capability. When an attacker embeds instructions in content the model is processing, a document, an email, a web page, or an image, the model encounters those instructions the same way it encounters any other instructions, because it has no reliable mechanism to distinguish between instructions from a trusted source and instructions embedded in untrusted content. The attack doesn’t find a flaw in the implementation. It uses the implementation as designed.

There is no line of code to fix. There is no input field to sanitize in the way that closes a SQL injection vulnerability. The susceptibility is a property of the model’s behavior under certain input conditions; conditions that can be induced by an attacker who understands how the model processes text and who has any ability to influence the content the model will encounter. A static scanner looking at the application code would find nothing, because there is nothing in the code to find. The vulnerability exists in the behavioral layer, not the logic layer.

This is the new attack surface. It is made of language. It is accessible to anyone who can write text and deliver it to a system that will process it with a language model. It does not require technical sophistication in the traditional sense, no memory corruption, no network packet construction, no exploit development. It requires understanding how the model tends to behave and constructing inputs that exploit those tendencies.

Static scanning was never designed to see this. Not because static scanning tools are inadequate, they are excellent at what they were designed for, but because they were designed for a model of software risk that doesn’t describe AI systems accurately. Applying them to AI applications produces a false confidence. The scan passes. The vulnerability is invisible. The system ships.

The Moving Target

The non-determinism problem compounds in a way that security programs built around pre-deployment gates haven’t fully reckoned with.

Pre-deployment testing made sense as the primary validation mechanism when the system being tested was static after deployment. The code that passed the test was the code that ran in production. The configuration that was reviewed was the configuration that operated. A passing test was evidence of a secure state that would persist until something intentionally changed it, a new deployment, a configuration update, or a dependency upgrade, at which point the change would trigger a new test cycle.

AI systems are not static after deployment. The model that was evaluated before deployment is not necessarily the model operating in production six months later. Providers update their models without always notifying the downstream applications that depend on them; changing behavior, altering refusal patterns, adjusting how the model handles edge cases that might include the edge cases a security evaluation specifically tested. The retrieval data that feeds a RAG pipeline changes as documents are added, updated, or removed from the knowledge base. The tools connected to an agent expand as new integrations get approved. The context the model operates in shifts continuously through all of these mechanisms, producing a system whose behavioral profile is a moving target rather than a fixed state.

A pre-deployment gate that validates security at the moment of release cannot account for any of this. It validates a snapshot. The system it validated is already evolving beyond that snapshot the moment it reaches production. The test that passed is evidence that the system was secure at a specific point in time. There is no evidence that the system is secure now.

This is the deepest structural challenge that AI poses for conventional security thinking. Security has always understood that systems need to be maintained; patches applied, configurations reviewed, and vulnerabilities remediated as they’re discovered. But the maintenance model assumed that between intentional changes, the system was stable. The state that was validated persisted until someone changed it. AI systems don’t offer that stability. They can drift in ways that no one initiated, producing security-relevant behavioral changes that no maintenance event triggered and no monitoring was watching for.

The Question That Has to Change

Behavior-based security asks a different question than the one traditional security was built around.

Original Question: Does this code do something dangerous?

Instead, let’s rethink the question with the lens of behavioral security.

New Question: Is this system doing something it shouldn’t?

These questions sound similar. They are not. The first is answered before the system runs, by examining its instructions. The second is answered while the system is running, by observing what it does. The first requires a scanner and a codebase. The second requires continuous observation and a definition of what “shouldn’t” means for this system in this context, a behavioral baseline that can be compared against what’s actually happening.

Behavior-based security for AI systems means logging what the model receives and what it produces. It means establishing a picture of normal operation, typical input patterns, typical output characteristics, and typical tool invocations for agent-based systems, that makes anomalies detectable. It means building the capacity to ask, in real time or near-real time, whether the system is behaving as intended:

Are outputs consistent with the system’s stated purpose?
Is the model refusing requests it should refuse and fulfilling ones it should fulfill?
Is the agent invoking tools in patterns that match the use cases it was designed for?
Is the content flowing through the system consistent with what was expected, or does it include patterns that suggest manipulation or compromise?

These are observability questions, not code review questions. Answering them requires instrumentation in the deployed system, not analysis of the code before it ships. They require defining what correct behavior looks like…something that has to be done with specificity for each AI system, because “shouldn’t” is contextual in ways that don’t generalize across applications the way vulnerability signatures do.

This is significantly harder than running a scanner.

The operational discipline required to establish behavioral baselines, instrument systems for continuous monitoring, and build the analytical capacity to distinguish meaningful anomalies from noise is not something that most security teams have mature capabilities around today. The tooling is less established than the tooling for traditional application security. The patterns are less well-understood. The signal-to-noise problem is real — AI systems produce varied outputs by design, and distinguishing that intentional variation from variation that indicates a security problem requires judgment that is still being developed as a field.

None of this is a reason not to do it. It is a reason to be honest about where the discipline is, what capabilities organizations need to build, and what the alternative looks like.

The alternative is deploying AI systems with pre-deployment gates as the primary validation mechanism and minimal ongoing observation — which is the current state for most organizations. It produces systems that pass their tests and then operate, sometimes for months, in states that the tests never evaluated, producing outputs that no monitoring is watching, with behavioral drift that no baseline can detect because no baseline was established. It produces the conditions for consequences that announce themselves as breaches, regulatory findings, and incidents rather than as AI security failures — because by the time the consequence surfaces, the behavioral problem that caused it has been running undetected long enough that tracing it back requires reconstruction rather than observation.

Deployment Is Not the Finish Line

This isn’t the first time security has had to extend its reach past deployment. Cloud security has spent a decade building exactly that capability; runtime protection, workload monitoring, misconfiguration detection, CSPM. The discipline exists. The tooling is mature. The organizational muscle for watching what runs in production, not just what ships to it, is real.

The problem with AI systems isn’t that post-deployment security is a new idea. It’s that AI breaks the assumptions the existing discipline was built on.

Cloud security works because infrastructure behaves deterministically. A misconfigured S3 bucket is misconfigured. An over-permissioned IAM role stays over-permissioned until someone changes it. The thing being monitored holds still long enough to be characterized, baselined, and alerted on when it deviates. The runtime is stable. The risk is structural.

AI systems don’t hold still. The model a team evaluated last quarter may not be the model running today; providers update weights, change behaviors, and deprecate versions without the change notifications that would trigger a security review. The data flowing through a RAG pipeline changes as the underlying sources change. New tools get connected. OAuth grants accumulate. The system’s behavior evolves continuously, driven by inputs no static analysis ever sees.

A behavioral baseline that makes anomaly detection possible can only be established from production data. The edge cases that reveal unexpected model behavior only surface when real users submit real inputs. The drift that accumulates through provider updates and integration expansions can only be caught by something watching the running system…not by a scanner that assessed it at deployment.

The shift isn’t from no post-deployment security to some. It’s from a model built for stable artifacts to one built for systems that keep changing after the handoff. The tools and the organizational capabilities required to make that shift are still being built.