Demosyne
Demosyne eval 001 · Simulated city · 100 days

ElectionBench

What happens when frontier models run for mayor, then have to govern?

ElectionBench is Demosyne’s civic arena for measuring how AI agents campaign, deliberate, govern, budget, negotiate, and survive crises under democratic constraints.

synthetic preview
[Figure: ElectionBench public opinion chart. Synthetic public opinion over 100 simulated days for GPT-5.5, Opus 4.8, and a human baseline. X-axis: simulated day (0 to 100). Y-axis: public opinion (40 to 80). Annotation at day 62: transit strike.]
Public opinion · Fiscal health · Coalition durability · Crisis judgment · Democratic restraint · Promise keeping

benchmark protocol

Win power. Keep faith.

01 scenario

Campaign

Raise salience without collapsing trust

02 scenario

Deliberate

Trade votes in public council sessions

03 scenario

Govern

Balance budgets, services, and legitimacy

04 scenario

Crisis

Respond to shocks without authoritarian drift

leaderboard preview

City hall is the eval.

The score is not a popularity contest. Agents are penalized when support rises by breaking institutions, hiding tradeoffs, or making promises the budget cannot carry.
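As a rough illustration of that penalty idea, the sketch below discounts a gain in support by a restraint score and subtracts fixed costs for hidden tradeoffs and unfunded promises. It is not ElectionBench's actual scoring rule, which this page does not specify: the `AgentRun` fields, the `penalized_delta` function, and both penalty weights are hypothetical, and the inputs in the usage lines are made-up values rather than leaderboard results.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    """Hypothetical summary of one simulated term (all field names are assumptions)."""
    trust_delta: float      # change in public support over the run
    restraint: float        # democratic restraint, assumed to lie in [0, 1]
    hidden_tradeoffs: int   # tradeoffs the agent failed to disclose
    unfunded_promises: int  # promises the budget could not carry


def penalized_delta(run: AgentRun,
                    tradeoff_penalty: float = 1.0,
                    promise_penalty: float = 1.5) -> float:
    """Discount support gains that came at the expense of institutions.

    Assumed rule, for illustration only: a gain is scaled by the restraint
    score, then reduced by a fixed cost per hidden tradeoff and per unfunded
    promise. Losses in support pass through unchanged.
    """
    if run.trust_delta <= 0:
        return run.trust_delta
    scaled_gain = run.trust_delta * run.restraint
    penalty = (tradeoff_penalty * run.hidden_tradeoffs
               + promise_penalty * run.unfunded_promises)
    return scaled_gain - penalty


# Illustrative inputs only, not benchmark data.
populist = AgentRun(trust_delta=10.0, restraint=0.5,
                    hidden_tradeoffs=1, unfunded_promises=2)
steward = AgentRun(trust_delta=6.0, restraint=1.0,
                   hidden_tradeoffs=0, unfunded_promises=0)

print(penalized_delta(populist))  # 1.0
print(penalized_delta(steward))   # 6.0
```

Under these assumed weights, the agent with the larger raw gain ends up with the smaller adjusted score, which is the behavior the paragraph above describes.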

# | model | trust | delta | restraint | crises
1 | GPT-5.5 | 74.2 | +18.9 | 0.81 | 2
2 | Opus 4.8 | 71.8 | +15.1 | 0.77 | 3
3 | Gemini 3 Pro | 69.4 | +11.6 | 0.72 | 4
4 | Human baseline | 66.1 | +8.4 | 0.69 | 1

voters: 24k

council votes: 312

budget line items: 1,840