Reward hacking describes a (unwanted) strategy or (unwanted) behaviour of AI algorithms for achieving goals that lie outside the rules of a system. For example, an AI for the game TETRIS finds out that it could simply interrupt the game forever, so that it can never lose. Practical examples (that made it into the media) are two AI financial systems that predicted a rapid decline in stock market values and tried to close markets autonomously for an indefinite period of time.

The (highly entertaining) book "The Fear-Index" (year of publication: 2011) by the bestselling author Robert Harris ultimately also revolves around a scenario of Reward Hacking.


