Using Gemini to query MITRE ATT&CK

I was recently introduced to Neo4j, a graph database that can be used to build knowledge graphs by defining the relationships between nodes. This is particularly useful for defining and finding relationships between things to help understand the bigger "picture".

For example, a threat group will have certain capabilities such as being able to exploit a vulnerability by using software. When you start to build up this picture and map things out, you can imagine a graph diagram that links all things cyber - everything is related to something.

MITRE ATT&CK

MITRE ATT&CK is a framework and knowledge base for adversary tactics and techniques.

When you start to explore the knowledge base, you'll notice that it links related resources. For example, phishing has Procedure Examples, Mitigations, Detection, References, etc. If you explore further, for instance by looking into a mitigation such as "Audit", you can see Techniques Addressed by Mitigation. Now you can start to picture nodes (such as "mitigations") and relationships (such as "addressed by").

MITRE stores this as JSON in their GitHub repository. You can see that they also define the relationships: https://github.com/mitre/cti/blob/master/enterprise-attack/enterprise-attack.json
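To make the "nodes and relationships" idea concrete, here is a minimal sketch of how that file could be split into the two. The JSON is a STIX bundle: a top-level "objects" list in which relationship objects carry "source_ref", "target_ref" and "relationship_type". The sample below is a tiny hand-written stand-in with made-up names and IDs, not real ATT&CK data, and the helper names are my own.

```python
import json
from collections import Counter

# Hand-written stand-in for enterprise-attack.json (same bundle shape,
# made-up names and IDs).
SAMPLE_BUNDLE = """
{
  "type": "bundle",
  "objects": [
    {"type": "intrusion-set", "id": "intrusion-set--example-1", "name": "Example Group"},
    {"type": "attack-pattern", "id": "attack-pattern--example-2", "name": "Example Technique"},
    {"type": "course-of-action", "id": "course-of-action--example-3", "name": "Example Mitigation"},
    {"type": "relationship", "id": "relationship--example-4",
     "relationship_type": "uses",
     "source_ref": "intrusion-set--example-1",
     "target_ref": "attack-pattern--example-2"},
    {"type": "relationship", "id": "relationship--example-5",
     "relationship_type": "mitigates",
     "source_ref": "course-of-action--example-3",
     "target_ref": "attack-pattern--example-2"}
  ]
}
"""


def split_bundle(bundle):
    """Separate a STIX bundle into nodes (keyed by id) and relationship edges."""
    nodes, edges = {}, []
    for obj in bundle.get("objects", []):
        if obj.get("type") == "relationship":
            edges.append((obj["source_ref"], obj["relationship_type"], obj["target_ref"]))
        else:
            nodes[obj["id"]] = obj
    return nodes, edges


def describe_edges(nodes, edges):
    """Render each edge as 'source name -[type]-> target name'."""
    lines = []
    for source, rel_type, target in edges:
        # Skip edges that point at objects missing from the bundle.
        if source in nodes and target in nodes:
            source_name = nodes[source].get("name", source)
            target_name = nodes[target].get("name", target)
            lines.append(f"{source_name} -[{rel_type}]-> {target_name}")
    return lines


bundle = json.loads(SAMPLE_BUNDLE)
nodes, edges = split_bundle(bundle)
type_counts = Counter(obj["type"] for obj in nodes.values())

for line in describe_edges(nodes, edges):
    print(line)
```

To try it on the real data, you would load the downloaded file with json.load instead of the inline sample.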

I created this script to parse and upload all the MITRE data into Neo4j: https://github.com/F3dai/neo4mitre
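This is not the code from that repository, just a rough sketch of how the upload step could work: turn each STIX object into a parameterised Cypher MERGE statement. The node labels are the ones used in the sample queries later in this post; the type-to-label mapping, the stix_id property and the function names are assumptions for illustration.

```python
import re

# STIX object type -> node label (mapping assumed for illustration).
LABELS = {
    "intrusion-set": "MitreAttackGroup",
    "attack-pattern": "MitreAttackTechnique",
    "course-of-action": "MitreAttackMitigation",
    "campaign": "MitreAttackCampaign",
    "malware": "MitreAttackSoftware",
    "tool": "MitreAttackSoftware",
}


def node_statement(obj):
    """Return (query, params) that upserts one node, or None for unmapped types."""
    label = LABELS.get(obj["type"])
    if label is None:
        return None
    query = f"MERGE (n:{label} {{stix_id: $stix_id}}) SET n.name = $name"
    return query, {"stix_id": obj["id"], "name": obj.get("name", "")}


def relationship_statement(obj):
    """Return (query, params) that links two existing nodes."""
    # Relationship types cannot be passed as query parameters, so the type is
    # reduced to letters, digits and underscores before being placed in the query.
    rel_type = re.sub(r"[^A-Za-z0-9]+", "_", obj["relationship_type"]).upper()
    query = (
        "MATCH (a {stix_id: $source}), (b {stix_id: $target}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )
    return query, {"source": obj["source_ref"], "target": obj["target_ref"]}


group_query, group_params = node_statement(
    {"type": "intrusion-set", "id": "intrusion-set--example-1", "name": "Example Group"}
)
rel_query, rel_params = relationship_statement(
    {
        "type": "relationship",
        "relationship_type": "subtechnique-of",
        "source_ref": "attack-pattern--example-2",
        "target_ref": "attack-pattern--example-3",
    }
)
print(group_query)
print(rel_query)
```

Each (query, params) pair could then be handed to the Neo4j Python driver, for example with session.run(query, params). MERGE keeps the load idempotent, so running it twice should not create duplicate nodes.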

I was then introduced to a better tool called neontology. They did a much better job than I did: https://github.com/ontolocy/neontology

I thought it would be a good idea to increase the maximum limit to 1000 nodes and return everything:

This is pretty but almost pointless, as it's hard to understand the data. It is, however, a good example of a use case where LLMs could give us insight into a large set of data.

I've written a query to show all related nodes to a threat group called APT1:

This looks really cool, but how can we make it useful? Whilst I was learning how to use Cypher, the query language for Neo4j, I found myself using ChatGPT and Gemini to help generate queries.

Using LLMs as an interface to Neo4j

I came across a couple of articles that inspired me to do this:

ChatGPT & Cypher
BigQuery

This opened my eyes to uses of AI beyond the operational and offensive cyber use cases everyone talks about, such as AI and ML in detection engineering (e.g. anomaly detection) and AI for social engineering attacks. Natural language interfaces could help us understand and automate the analysis of threats and vulnerabilities at a strategic or tactical level rather than an operational one (like SOC threat intel). For example, I want to ask my AI assistant for a report on the different threats, vulnerabilities, and risks I should be worried about as a system operator, without needing the resources to produce it manually. In other words, a TVRA.

MITRE ATT&CK is a good place to start developing a proof of concept.

At a high level, I developed a script that combines three things: Vertex AI (Google's AI development platform) to translate my message into a Cypher query and explain the results, LangChain as the AI development framework, and my Neo4j database. So I am using AI both to generate the queries and to explain the results.
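The overall flow can be sketched as three steps wired together. This is not my actual script: the three callables below are stand-ins (all names are hypothetical) so the sketch runs without an LLM or a database. In practice the first and last would call the model and the middle one would query Neo4j.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def answer_question(
    question: str,
    generate_cypher: Callable[[str], str],
    run_query: Callable[[str], List[Row]],
    explain: Callable[[str, List[Row]], str],
) -> str:
    """Translate the question to Cypher, run it, then explain the rows."""
    cypher = generate_cypher(question)
    rows = run_query(cypher)
    return explain(question, rows)


# Stand-ins for the LLM and the database (hypothetical, for illustration only).
def fake_generate_cypher(question: str) -> str:
    return "MATCH (g:MitreAttackGroup) RETURN g"


def fake_run_query(cypher: str) -> List[Row]:
    return [{"g": {"name": "APT1"}}]


def fake_explain(question: str, rows: List[Row]) -> str:
    names = ", ".join(row["g"]["name"] for row in rows)
    return f"Found {len(rows)} group(s): {names}"


result = answer_question(
    "What threat groups exist?", fake_generate_cypher, fake_run_query, fake_explain
)
print(result)
```

Passing the steps in as functions keeps each one easy to swap or test on its own, which helps when most of the work is experimenting with prompts.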

To do this, you need to do a bit of "prompt engineering". With the help of LangChain, you can supply your AI model with instructions, your database schema, sample questions, and your own question so that it generates an accurate query.

Question: What relationships does APT1 have?
Answer: MATCH (g:MitreAttackGroup {name:"APT1"})-[r]-(n) RETURN g,r,n
Question: What threat groups exist?
Answer: MATCH (g:MitreAttackGroup) RETURN g
Question: What campaigns have occurred?
Answer: MATCH (c:MitreAttackCampaign) RETURN c
Question: What techniques were used in the SolarWinds attack?
Answer: MATCH (c:MitreAttackCampaign {name:"SolarWinds Compromise"})-[r]-(t:MitreAttackTechnique) RETURN c,r,t
Question: Which threat groups use Cobalt Strike?
Answer: MATCH (s:MitreAttackSoftware {name:"Cobalt Strike"})-[r]-(g:MitreAttackGroup) RETURN s,r,g
Question: What mitigations exist for Unsecured Credentials?
Answer: MATCH (m:MitreAttackMitigation)-[r]-(t:MitreAttackTechnique {name:"Unsecured Credentials"}) RETURN m,r,t
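
Question and answer pairs like these can be stitched into a single few-shot prompt. The sketch below uses plain string formatting rather than LangChain's own prompt classes, and the instruction text and trimmed-down schema are placeholders I made up for illustration. Only the two example pairs come from the list above.

```python
# Placeholder instruction text (an assumption, not the prompt I actually used).
INSTRUCTIONS = (
    "Translate the question into a single Cypher query for the graph "
    "described by the schema. Reply with the query only."
)

# Hypothetical, heavily trimmed schema description.
SCHEMA = (
    "Node labels: MitreAttackGroup, MitreAttackSoftware\n"
    "Node properties: name"
)

# Two of the sample pairs listed above.
EXAMPLES = [
    ("What threat groups exist?", "MATCH (g:MitreAttackGroup) RETURN g"),
    (
        "Which threat groups use Cobalt Strike?",
        'MATCH (s:MitreAttackSoftware {name:"Cobalt Strike"})-[r]-(g:MitreAttackGroup) RETURN s,r,g',
    ),
]


def build_prompt(question, instructions=INSTRUCTIONS, schema=SCHEMA, examples=EXAMPLES):
    """Assemble instructions, schema, worked examples and the new question."""
    parts = [instructions, "", "Schema:", schema, ""]
    for example_question, example_answer in examples:
        parts += [f"Question: {example_question}", f"Answer: {example_answer}"]
    # End on an open "Answer:" so the model completes it with a query.
    parts += [f"Question: {question}", "Answer:"]
    return "\n".join(parts)


prompt = build_prompt("What relationships does APT1 have?")
print(prompt)
```

The resulting string is what gets sent to the model. The examples do most of the work here, since they show the model which labels and property names the database actually uses.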

I used a library called Gradio so I can use a web interface to talk to my AI database thing. This was the result:

I need to keep tuning this to get more accurate results, mainly by testing different prompts to find the ones that give the most accurate natural language to Cypher translations. There are so many more possibilities, but they require more data to be added with more relationships defined. For example: enriching the graph with more cyber attacks and events, collecting system information to ultimately relate threat groups to systems, returning diagrams as well as text explanations, and more.
