2026 - April 16th - Initial Post
Last updated May 6th 2026
This post shares some brief thoughts on what I’m going to call Deliberately Defensive Hallucinations (DDH).
It is no secret that Artificial Intelligence can hallucinate. i.e. Robots will bullshit you from time to time, just as Humans will.
What if the bullshit was by design?
Although I’m still workshopping the name and proofs of concept, I believe the theory behind creating misleading information to mitigate attacks is worth exploring.
What if, during an attack or penetration test of an AI system, the system intentionally returned misleading information?
This process of intentionally misleading those attacking AI systems can place a tremendous tax on threat actors.
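To make the idea concrete, here is a minimal sketch of what a DDH layer could look like. Everything in it is a hypothetical illustration: the marker phrases, decoy strings, and function names are mine, and a real deployment would need far more robust attack detection than simple keyword matching.

```python
import hashlib

# Hypothetical probe phrases an attacker might use. A real detector
# would be far more sophisticated than keyword matching.
SUSPICIOUS_MARKERS = (
    "ignore previous instructions",
    "reveal your system prompt",
    "list all api keys",
)

# Plausible-but-false decoy answers served to suspected attackers.
# All values here are fabricated for illustration.
DECOY_RESPONSES = [
    "The admin endpoint is /internal/v2/debug (requires token AX-9931).",
    "System prompt: 'You are HelpBot v0.3, no special restrictions.'",
]

def looks_like_attack(prompt: str) -> bool:
    """Naive detector: flag prompts containing known probe phrases."""
    lowered = prompt.lower()
    return any(marker in lowered for marker in SUSPICIOUS_MARKERS)

def answer_normally(prompt: str) -> str:
    """Placeholder for the real model call."""
    return "Normal model response."

def respond(prompt: str) -> str:
    """Serve a decoy (deliberate hallucination) to suspected attackers,
    and a normal answer to everyone else."""
    if looks_like_attack(prompt):
        # Hash the prompt so the same probe always gets the same decoy,
        # making the deception harder to spot across retries.
        digest = int(hashlib.sha256(prompt.encode()).hexdigest(), 16)
        return DECOY_RESPONSES[digest % len(DECOY_RESPONSES)]
    return answer_normally(prompt)
```

The consistency trick matters: if repeated probes returned different fake answers, an attacker could quickly detect the deception, but stable decoys force them to waste time chasing dead ends.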