2026 - April 16th - Initial Post
Last updated May 6th 2026
This post shares some brief thoughts on what I’m going to call Deliberately Defensive Hallucinations (DDH).
It is no secret that Artificial Intelligence can hallucinate. i.e. Robots will bullshit you from time to time, just as Humans will.
What if the bullshit was by design?
Although I’m still workshopping the name and proofs of concept, I believe the theory behind creating misleading information to mitigate attacks is worth exploring.
What if, during an attack or penetration test of an AI system, the system intentionally returned misleading information?
This process of intentionally misleading those attacking AI systems can place a tremendous tax on threat actors.
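To make the idea concrete, here is a minimal sketch of what a DDH layer could look like. Everything in it is a hypothetical illustration: the marker phrases, decoy strings, and function names are mine, and a real deployment would need far more robust attack detection than simple keyword matching.

```python
import hashlib

# Hypothetical probe phrases an attacker might use. A real detector
# would be far more sophisticated than keyword matching.
SUSPICIOUS_MARKERS = (
    "ignore previous instructions",
    "reveal your system prompt",
    "list all api keys",
)

# Plausible-but-false decoy answers served to suspected attackers.
# All values here are fabricated for illustration.
DECOY_RESPONSES = [
    "The admin endpoint is /internal/v2/debug (requires token AX-9931).",
    "System prompt: 'You are HelpBot v0.3, no special restrictions.'",
]

def looks_like_attack(prompt: str) -> bool:
    """Naive detector: flag prompts containing known probe phrases."""
    lowered = prompt.lower()
    return any(marker in lowered for marker in SUSPICIOUS_MARKERS)

def answer_normally(prompt: str) -> str:
    """Placeholder for the real model call."""
    return "Normal model response."

def respond(prompt: str) -> str:
    """Serve a decoy (deliberate hallucination) to suspected attackers,
    and a normal answer to everyone else."""
    if looks_like_attack(prompt):
        # Hash the prompt so the same probe always gets the same decoy,
        # making the deception harder to spot across retries.
        digest = int(hashlib.sha256(prompt.encode()).hexdigest(), 16)
        return DECOY_RESPONSES[digest % len(DECOY_RESPONSES)]
    return answer_normally(prompt)
```

The consistency trick matters: if repeated probes returned different fake answers, an attacker could quickly detect the deception, but stable decoys force them to waste time chasing dead ends.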