Blog

Notes on decision-making, uncertainty, and AI evaluation.

Posts related to the technical reports behind my projects, written to make the methods, assumptions, and measurement questions easier to follow.

Featured series

Applying SEU Sensitivity to Ellsberg-Style Decisions

A three-part application of the SEU Sensitivity framework to Ellsberg-style urn gambles, asking whether the GPT-4o-versus-Claude temperature pattern observed in the insurance pair travels to a task family that was historically constructed to put SEU itself under pressure.

Part 1 - From Insurance to Urns: Background and the Ellsberg Paradigm

Why Ellsberg-style urns as the next task family, the historical and interpretive weight Ellsberg's example carries, and the scope of what this series does and does not measure.

Part 2 - Prior Calibration, Model Checks, and the Per-LLM Ellsberg Results

The K=4 prior recalibration, validation focused on alpha, and the per-condition results for both LLMs.

Part 3 - Factorial Synthesis: What Travels, What Does Not, and What the Design Cannot Decide

Reading the four cells of the 2x2 LLM-by-task design together: a dominant LLM main effect, a secondary task effect, and an uninformative interaction.

All Posts

Part 1 - From Insurance to Urns: Background and the Ellsberg Paradigm

Bridging from the insurance applications series to a new task family — Ellsberg-style urn gambles — and setting out the historical and interpretive weight Ellsberg's example carries.

Part 2 - Prior Calibration, Model Checks, and the Per-LLM Ellsberg Results

GPT-4o shows a clear, broad decline in alpha as temperature increases on Ellsberg gambles; Claude shows no such decline.

Part 3 - Factorial Synthesis: What Travels, What Does Not, and What the Design Cannot Decide

Reading the four cells of the 2x2 LLM-by-task design together: a dominant LLM main effect, a secondary task effect, and an uninformative interaction.