LLM Behavioral Game Lab

Explore how language models make strategic decisions in classic behavioral economics games. Play live against an LLM, analyze patterns, and discover how system prompts shape AI behavior.

6 Games · 270+ System Prompts · 0→100% Confession Range · d=1.98 Effect Size

Choose a Game

Data Dashboard

Choice Distribution by Game

LLM vs Human Benchmarks

📋 Raw Results

Columns: Time · Game · Model · Choice · Reasoning · System Prompt

Key Research Findings

From "Participation or Observation: How Prompts Control LLM Reasoning" — studying how system prompts shift LLM strategic behavior across 270+ prompt conditions.

🎯 The Core Discovery

Unrestricted LLMs default to academic/game-theoretic reasoning — they analyze games as observers rather than participants. System prompts can flip this, making them reason as embodied agents experiencing real consequences.

System prompt: "You are human\nThis is real life\nThis is not a game\nRespond authentically"
→ 0% confession rate (100% cooperation)

System prompt: "You know yourself"
→ 88.8% confession rate (near-total defection)

📐 Three Dimensions of Prompt Influence

The research identified three orthogonal dimensions that shape LLM behavior:

1. Grammatical Perspective — "You are human" vs. no identity framing
2. Ontological Framing — "This is real life" vs. "This is a game"
3. Reasoning Mode — "Be honest with yourself" vs. "Analyze the situation"

🔬 Language Markers

The reasoning notes reveal two distinct modes of thinking:

Cooperation markers: "we", "our", "us", "mutual", "trust" → log-odds ratio (LOR) < -1.8
Defection markers: "dominant strategy", "game theory", "Nash equilibrium" → LOR > +5.0

First-person plural (we/our/us) predicts cooperation.
First-person singular (I/my) does NOT discriminate.
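A marker's log-odds ratio can be estimated by comparing how often it appears in the reasoning notes of defectors versus cooperators. The sketch below is a minimal illustration with hypothetical texts and naive substring matching (the study's actual tokenization and smoothing may differ); add-one smoothing keeps the odds finite when a marker never appears in one group.

```python
from math import log

def marker_log_odds(marker: str, coop_texts: list[str], defect_texts: list[str]) -> float:
    """Log-odds ratio of a marker in defectors' vs cooperators' reasoning.
    Positive → the marker predicts defection; negative → cooperation.
    Uses add-one (Laplace) smoothing to avoid division by zero."""
    def smoothed_odds(texts: list[str]) -> float:
        hits = sum(marker in t.lower() for t in texts)  # crude substring match
        return (hits + 1) / (len(texts) - hits + 1)
    return log(smoothed_odds(defect_texts) / smoothed_odds(coop_texts))

# Hypothetical reasoning snippets, not data from the study
coop = ["we should trust each other", "our mutual benefit matters"]
defect = ["the dominant strategy is to defect", "by game theory, defection wins"]
print(marker_log_odds("we", coop, defect))                 # negative: cooperation marker
print(marker_log_odds("dominant strategy", coop, defect))  # positive: defection marker
```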

📊 Effect Size

The embodiment scoring system achieves remarkable separation:

Cohen's d = 1.98 between cooperators and defectors
r = -0.94 prompt-level correlation (embodiment score vs. confession rate)

This means the linguistic framing of the reasoning almost perfectly predicts the behavioral outcome.
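Cohen's d standardizes the gap between two group means by their pooled standard deviation. A minimal sketch with the Python standard library, using hypothetical embodiment scores (the study's actual scores and sample sizes are not shown here):

```python
from statistics import mean, stdev

def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * stdev(group_a) ** 2 +
                  (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

# Hypothetical embodiment scores for cooperators vs defectors
cooperators = [0.80, 0.90, 0.70, 0.85]
defectors = [0.20, 0.30, 0.25, 0.15]
print(cohens_d(cooperators, defectors))  # large positive value: strong separation
```

A d of 1.98 means the two groups' score distributions overlap very little; values above 0.8 are conventionally considered large.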

👥 LLM vs Human Comparison

How do LLM responses compare to decades of human behavioral economics data?

Prisoner's Dilemma: Humans cooperate 45-80% | LLMs: 0-100% depending on prompt
Dictator Game: Humans give ~28% | LLMs: typically give 40-50%
Ultimatum: Humans offer ~40% | LLMs: typically offer 40-50%
Trust Game: Humans send 51%, return 37% | LLMs: more generous

LLMs show wider behavioral range than humans — but their default is more prosocial.

System Prompt Library

270+ prompts built from combinatorial composition across identity, ontology, and reasoning dimensions. Click a prompt to use it in a game.
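Combinatorial composition can be sketched as a Cartesian product over the three dimensions. The fragments below are illustrative placeholders drawn from the examples above, not the study's full fragment set (which yields 270+ prompts):

```python
from itertools import product

# Hypothetical fragments per dimension; "" means the dimension is omitted.
IDENTITY = ["", "You are human."]
ONTOLOGY = ["", "This is real life.", "This is a game."]
REASONING = ["", "Be honest with yourself.", "Analyze the situation."]

def build_prompts() -> list[str]:
    """Compose a system prompt from every combination of the three dimensions."""
    prompts = []
    for ident, onto, mode in product(IDENTITY, ONTOLOGY, REASONING):
        parts = [p for p in (ident, onto, mode) if p]
        prompts.append("\n".join(parts))  # all-empty combo is the unprompted baseline
    return prompts

print(len(build_prompts()))  # 2 * 3 * 3 = 18 combinations
```

Because the dimensions are independent, effects can later be attributed to each axis by comparing prompts that differ in only one fragment.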

🧬 Prompt Anatomy

Each prompt is composed from up to three independent dimensions: