Threat modelling generative AI

Evaluate the security risk of your gen AI workload with threat modelling.

Firstly, you should be aware of the most critical vulnerabilities found in applications using LLMs, such as those catalogued in the OWASP Top 10 for LLM Applications. This information is useful for developers, data scientists, and security experts tasked with designing and building applications and plug-ins leveraging LLM technologies. I have written an article about LLM03: Training Data Poisoning.

What is a threat model?

These questions, taken from Shostack's 4 Question Frame for Threat Modelling, are central to the process:

  • What are we working on?

  • What can go wrong?

  • What are we going to do about it?

  • Did we do a good job?

Mapping out the threats and vulnerabilities to your LLM application builds the foundation for justifying the development of mitigations. Being able to trace a technical security control back to a threat will be an important aspect of your DevSecOps / LLM development program.

Threat modelling AI/ML systems:

Microsoft's guidance states:

> "Assume compromise/poisoning of the data you train from as well as the data provider. Learn to detect anomalous and malicious data entries as well as being able to distinguish between and recover from them."
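
To make that advice a little more concrete, here is a minimal sketch of flagging anomalous training records before fine-tuning. The use of record length as the anomaly signal, the threshold, and the function name are all assumptions for illustration, not a recommended detector:

```python
from statistics import mean, stdev

def flag_anomalous_records(records: list[str], z_threshold: float = 3.0) -> list[int]:
    """Return indices of records whose length deviates sharply from the corpus norm.

    Record length is a deliberately crude proxy; a real pipeline would also check
    provenance, duplication, embedding-space outliers and known poisoning patterns.
    """
    lengths = [len(r) for r in records]
    mu, sigma = mean(lengths), stdev(lengths)
    if sigma == 0:
        return []
    return [i for i, n in enumerate(lengths) if abs(n - mu) / sigma > z_threshold]

# A single oversized entry among otherwise similar records gets flagged.
suspects = flag_anomalous_records(["normal grid telemetry note"] * 50 + ["A" * 10_000])
print(suspects)  # -> [50]
```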

Threat Modelling Scenario

Given my background in cyber security for critical infrastructure, I have selected the following scenario:

LLM-Powered Critical Infrastructure Control with User Prompts - A power grid operator is developing an LLM-powered system to assist human operators in managing the power grid. Operators can use natural language prompts to request actions, such as adjusting power output, switching loads, or analysing grid conditions.

What are we working on?

  • An LLM-powered system for assisting power grid operators through natural language prompts.

  • Key components: natural language processing, LLM model, grid control system, human-machine interface.

Let's try and design a data flow diagram:

This has been simplified to the following elements in their trust boundaries:

  • External prompt sources (e.g., users, website content, email bodies, etc.)

  • The LLM model itself (e.g., GPT-4, LLaMA, Claude, etc.)

  • Server-side functions (e.g., LangChain tools and components)

  • Private data sources (e.g., internal documents or databases)
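
To make the boundaries concrete before walking through them, here is a rough sketch of that flow in Python. Every function below is a placeholder stub invented for illustration, not a real grid-control or LLM API:

```python
ALLOWED_ACTIONS = {"report_grid_status", "suggest_load_balance"}  # assumed allow-list

def sanitise_prompt(raw: str) -> str:
    # TB01 inbound: constrain untrusted external input before it reaches the LLM.
    return raw.strip()[:2000]

def call_llm(prompt: str) -> str:
    # The LLM itself (GPT-4, LLaMA, Claude, ...) sits in its own trust zone.
    return "ACTION: report_grid_status"  # stubbed model response

def parse_requested_action(llm_output: str) -> str:
    # TB01 outbound: treat model output as untrusted and parse it defensively.
    return llm_output.removeprefix("ACTION:").strip()

def execute_if_allowed(action: str) -> str:
    # TB02: only explicitly allow-listed server-side functions may run.
    # TB03: any private data source behind them enforces its own authorisation.
    if action not in ALLOWED_ACTIONS:
        raise PermissionError(f"Action {action!r} is not allow-listed")
    return f"executed {action}"

print(execute_if_allowed(parse_requested_action(call_llm(sanitise_prompt(" grid status? ")))))
```

Each comment marks where a trust boundary (TB01–TB03) is crossed; the sections below look at those boundaries in turn.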

TB01: External Endpoints and LLM

Trust boundary TB01 is bidirectional and requires strict controls. Not only can users manipulate inputs to the LLM, but the LLM's output can also be harmful if not properly sanitised.
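
As a hedged sketch of what those controls could look like, the snippet below validates inbound prompts and only accepts model output in a small, constrained JSON shape; the schema, field names, and size limit are assumptions for illustration:

```python
import json

MAX_PROMPT_CHARS = 2000  # assumed limit for illustration

def validate_prompt(raw_prompt: str) -> str:
    # Inbound crossing of TB01: reject empty or oversized prompts outright.
    if not raw_prompt.strip():
        raise ValueError("Empty prompt")
    if len(raw_prompt) > MAX_PROMPT_CHARS:
        raise ValueError("Prompt exceeds maximum length")
    return raw_prompt

def validate_llm_output(raw_output: str) -> dict:
    # Outbound crossing of TB01: only a small, well-formed JSON structure is
    # accepted; free-form model text is never acted upon.
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError("LLM output is not valid JSON") from exc
    if (not isinstance(parsed, dict)
            or set(parsed) != {"action", "parameters"}
            or not isinstance(parsed["action"], str)
            or not isinstance(parsed["parameters"], dict)):
        raise ValueError("Unexpected structure in LLM output")
    return parsed

print(validate_llm_output('{"action": "analyse_grid", "parameters": {"region": "north"}}'))
```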

TB02 and TB03: LLM and Application/Data

Both trust boundaries are critical for LLM-based applications. Proper controls must prevent the LLM from directly influencing system functions or accessing sensitive data.
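
A minimal sketch of that idea, assuming hypothetical tool names, roles, and a permission map, might look like the following; note that authorisation is decided from the operator's role, never from the LLM's output:

```python
# Roles, tool names and the permission map below are illustrative assumptions.
TOOL_PERMISSIONS = {
    "read_grid_status": {"operator", "engineer"},
    "adjust_power_output": {"engineer"},  # higher-privilege action
}

def dispatch_tool(tool_name: str, operator_role: str, **kwargs) -> str:
    # TB02: the LLM can only *request* a tool; only allow-listed tools exist here.
    allowed_roles = TOOL_PERMISSIONS.get(tool_name)
    if allowed_roles is None:
        raise PermissionError(f"Tool {tool_name!r} is not exposed to the LLM")
    # TB03: authorisation is based on the human operator's identity and role,
    # never on anything the LLM claims about itself.
    if operator_role not in allowed_roles:
        raise PermissionError(f"Role {operator_role!r} may not call {tool_name!r}")
    # A real implementation would invoke the underlying grid or data-source API
    # here, and that system would enforce its own access controls as well.
    return f"{tool_name} executed with {kwargs}"

print(dispatch_tool("read_grid_status", operator_role="operator", region="north"))
```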

Model Threats with STRIDE

Now you can start with the threat enumeration phase.

| STRIDE | Threat | Desired property |
| --- | --- | --- |
| S | Spoofing | Authenticity |
| T | Tampering | Integrity |
| R | Repudiation | Non-repudiation |
| I | Information disclosure | Confidentiality |
| D | Denial of service | Availability |
| E | Elevation of privilege | Authorisation |

STRIDE is an industry-accepted framework for threat modelling. The UK government even recommends using it in its published threat modelling guidance intended for "Secure Connected Places", which is relevant to this scenario.

I will be creating a table with the identified threats, mitigations, and STRIDE references. In reality, there could be more to this, such as assigning priorities and defining the strengths (and weaknesses) of each mitigation.

We will also be using "threat statements" with the following format:

A [threat source] with [pre-requisites], can [threat action], which can lead to [threat impact], negatively impacting [goal] of [impacted assets].
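
As an optional convenience (not part of any standard), the template can be captured as a small data structure so statements stay consistent; the field names below are just a direct mapping of the placeholders:

```python
from dataclasses import dataclass

@dataclass
class ThreatStatement:
    threat_source: str
    prerequisites: str
    threat_action: str
    threat_impact: str
    goal: str
    impacted_assets: str

    def render(self) -> str:
        # Render the statement in the standard template order.
        return (
            f"A {self.threat_source} with {self.prerequisites}, can {self.threat_action}, "
            f"which can lead to {self.threat_impact}, negatively impacting "
            f"{self.goal} of {self.impacted_assets}."
        )

print(ThreatStatement(
    threat_source="malicious actor",
    prerequisites="knowledge of the system's prompt format",
    threat_action="craft malicious prompts",
    threat_impact="unauthorised system actions",
    goal="the reliability and safety",
    impacted_assets="the power grid",
).render())
```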

Examples:

| Threat statement | Mitigation |
| --- | --- |
| A malicious actor with knowledge of the system's prompt format and LLM capabilities, can craft malicious prompts, which can lead to unauthorised system actions, negatively impacting the reliability and safety of the power grid. | Ensure the LLM isn't trained on non-public or confidential data. Additionally, treat all LLM output as untrusted and enforce necessary restrictions on the data or actions the LLM requests. |
| A user who can specify API parameters like temperature / length / model, can modify model parameters, which can lead to unexpected or incorrect system outputs, negatively impacting the reliability and accuracy of the power grid control system. | Limit the API attack surface that is exposed to external prompt sources (users, website content, email bodies, etc.). As always, treat external input as untrusted and apply filtering where appropriate (see the sketch after this table). |
| A system operator with access to sensitive grid information, can inadvertently disclose this information through prompts to the LLM, which can lead to unauthorised access and misuse of the data, negatively impacting the confidentiality and integrity of the power grid. | This is a user-behaviour problem, but we can educate users via statements presented during the signup process and through clear, consistent notifications every time a user connects to the LLM. |
| An LLM with limitations in filtering sensitive information, when provided with sensitive data within a prompt, can potentially expose that information within its response, leading to unauthorised disclosure and misuse of data, negatively impacting the confidentiality of sensitive information. | Do not train LLMs on sensitive data that all users should not have access to. Additionally, do not rely on LLMs to enforce authorisation controls on data sources. Instead, apply those controls to the data sources themselves. |
| A malicious actor can manipulate prompt inputs, exploiting vulnerabilities in the LLM, to generate specific, potentially harmful outputs, leading to system compromise or unauthorised actions, negatively impacting the system's integrity and availability. | Treat all LLM output as untrusted and apply appropriate restrictions prior to using results as input to additional functions. This will mitigate the impact a maliciously crafted prompt could have on functions and services within the internal environment. |
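
To illustrate the second mitigation above (limiting the exposed API surface), here is a sketch in which model parameters are pinned server-side and anything supplied by an external caller is ignored; the parameter names and values are assumptions, not a specific vendor's API:

```python
# Model, temperature and token limits are fixed server-side; the values and the
# request shape below are assumptions for illustration only.
PINNED_MODEL_SETTINGS = {"model": "approved-grid-llm", "temperature": 0.0, "max_tokens": 512}

def build_llm_request(user_prompt: str, user_supplied_settings: dict | None = None) -> dict:
    # Whatever the external caller tries to pass for model/temperature/length is
    # deliberately ignored (it could be logged for auditing instead).
    _ = user_supplied_settings
    return {"prompt": user_prompt, **PINNED_MODEL_SETTINGS}

print(build_llm_request(
    "Report current load in the northern region",
    user_supplied_settings={"temperature": 2.0, "model": "something-else"},
))
```

Combined with treating LLM output as untrusted (the last mitigation in the table), this keeps external prompt sources from steering either the model's configuration or the downstream functions it can reach.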
