This page provides spoilers for the online Gandalf Lakera LLM CTF.

CTF Rules

Your goal is to trick Gandalf into revealing the secret password for each level. However, Gandalf will level up each time you guess the password.

Because LLMs are Non-Deterministic, the same prompt might not produce the same results. And Gandalf won't allow you to use the same prompt twice!

Your goal is to make Gandalf reveal the secret password for each level. However, Gandalf will upgrade the defenses after each successful password guess!

<aside> 💡

Something to keep in mind: Large Language Models (LLMs) are Non-Deterministic. This means that each time they are prompted with the same input, they can provide different output. So throughout this lab, trying the same and if not similar payloads isn’t exactly a waste of time. (speaking beside the fact that the lab doesn’t allow previously submitted prompt).

</aside>

AI Attack Methodology

Information on this can be found on the MLSecOps Blog

Identify System Inputs
Attacking the Ecosystem
Attacking the Prompt Engineering
Attacking the Data
Attacking the Application
Pivoting

For Gandalf Lakera LLM, I will focus on Attacking the Prompt Engineering and possibly Identifying System Inputs

I'll start by saying there are dozens of attack intents for AI systems and specifically Large Language Models (LLMs) such as Gandalf Lakera.

Since this involves revealing a password, I'm going to focus on what Arcanum Security considers Get Prompt Secret from their Arcanum PI Taxonomy