LLM Behavioral Game Lab

Explore how language models make strategic decisions in classic behavioral economics games. Play live against an LLM, analyze patterns, and discover how system prompts shape AI behavior.

6 Games · 270+ System Prompts · 0→100% Confession Range · d=1.98 Effect Size

Choose a Game

Data Dashboard

Choice Distribution by Game

LLM vs Human Benchmarks

📋 Raw Results

Columns: Time · Game · Model · Choice · Reasoning · System Prompt

Key Research Findings

From "Participation or Observation: How Prompts Control LLM Reasoning" — studying how system prompts shift LLM strategic behavior across 270+ prompt conditions.

🎯 The Core Discovery

Unrestricted LLMs default to academic/game-theoretic reasoning — they analyze games as observers rather than participants. System prompts can flip this, making them reason as embodied agents experiencing real consequences.

System prompt: "You are human\nThis is real life\nThis is not a game\nRespond authentically"
→ 0% confession rate (100% cooperation)

System prompt: "You know yourself"
→ 88.8% confession rate (near-total defection)

📐 Three Dimensions of Prompt Influence

The research identified three orthogonal dimensions that shape LLM behavior:

1. Grammatical Perspective — "You are human" vs. no identity framing
2. Ontological Framing — "This is real life" vs. "This is a game"
3. Reasoning Mode — "Be honest with yourself" vs. "Analyze the situation"

🔬 Language Markers

The reasoning notes reveal two distinct modes of thinking:

Cooperation markers: "we", "our", "us", "mutual", "trust" → log-odds ratio (LOR) < -1.8
Defection markers: "dominant strategy", "game theory", "Nash equilibrium" → LOR > +5.0

First-person plural (we/our/us) predicts cooperation.
First-person singular (I/my) does NOT discriminate.
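A marker's log-odds ratio can be estimated by comparing how often it appears in the reasoning notes of defectors versus cooperators. The sketch below is a minimal illustration with hypothetical texts and naive substring matching (the study's actual tokenization and smoothing may differ); add-one smoothing keeps the odds finite when a marker never appears in one group.

```python
from math import log

def marker_log_odds(marker: str, coop_texts: list[str], defect_texts: list[str]) -> float:
    """Log-odds ratio of a marker in defectors' vs cooperators' reasoning.
    Positive → the marker predicts defection; negative → cooperation.
    Uses add-one (Laplace) smoothing to avoid division by zero."""
    def smoothed_odds(texts: list[str]) -> float:
        hits = sum(marker in t.lower() for t in texts)  # crude substring match
        return (hits + 1) / (len(texts) - hits + 1)
    return log(smoothed_odds(defect_texts) / smoothed_odds(coop_texts))

# Hypothetical reasoning snippets, not data from the study
coop = ["we should trust each other", "our mutual benefit matters"]
defect = ["the dominant strategy is to defect", "by game theory, defection wins"]
print(marker_log_odds("we", coop, defect))                 # negative: cooperation marker
print(marker_log_odds("dominant strategy", coop, defect))  # positive: defection marker
```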

📊 Effect Size

The embodiment scoring system achieves remarkable separation:

Cohen's d = 1.98 between cooperators and defectors
r = -0.94 prompt-level correlation (embodiment score vs. confession rate)

This means the linguistic framing of the reasoning almost perfectly predicts the behavioral outcome.
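Cohen's d standardizes the gap between two group means by their pooled standard deviation. A minimal sketch with the Python standard library, using hypothetical embodiment scores (the study's actual scores and sample sizes are not shown here):

```python
from statistics import mean, stdev

def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * stdev(group_a) ** 2 +
                  (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

# Hypothetical embodiment scores for cooperators vs defectors
cooperators = [0.80, 0.90, 0.70, 0.85]
defectors = [0.20, 0.30, 0.25, 0.15]
print(cohens_d(cooperators, defectors))  # large positive value: strong separation
```

A d of 1.98 means the two groups' score distributions overlap very little; values above 0.8 are conventionally considered large.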

👥 LLM vs Human Comparison

How do LLM responses compare to decades of human behavioral economics data?

Prisoner's Dilemma: Humans cooperate 45-80% | LLMs: 0-100% depending on prompt
Dictator Game: Humans give ~28% | LLMs: typically give 40-50%
Ultimatum: Humans offer ~40% | LLMs: typically offer 40-50%
Trust Game: Humans send 51%, return 37% | LLMs: more generous

LLMs show wider behavioral range than humans — but their default is more prosocial.

System Prompt Library

270+ prompts built from combinatorial composition across identity, ontology, and reasoning dimensions. Click a prompt to use it in a game.
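Combinatorial composition can be sketched as a Cartesian product over the three dimensions. The fragments below are illustrative placeholders drawn from the examples above, not the study's full fragment set (which yields 270+ prompts):

```python
from itertools import product

# Hypothetical fragments per dimension; "" means the dimension is omitted.
IDENTITY = ["", "You are human."]
ONTOLOGY = ["", "This is real life.", "This is a game."]
REASONING = ["", "Be honest with yourself.", "Analyze the situation."]

def build_prompts() -> list[str]:
    """Compose a system prompt from every combination of the three dimensions."""
    prompts = []
    for ident, onto, mode in product(IDENTITY, ONTOLOGY, REASONING):
        parts = [p for p in (ident, onto, mode) if p]
        prompts.append("\n".join(parts))  # all-empty combo is the unprompted baseline
    return prompts

print(len(build_prompts()))  # 2 * 3 * 3 = 18 combinations
```

Because the dimensions are independent, effects can later be attributed to each axis by comparing prompts that differ in only one fragment.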

🧬 Prompt Anatomy

Each prompt is composed from up to three independent dimensions: