---
title: "Prior Predictive Analysis"
subtitle: "Foundational Report 3"
description: |
Analysis of the prior predictive distribution for model m_0, examining
what range of choice behaviors the prior permits before seeing any data.
categories: [foundations, validation, m_0]
execute:
cache: true
---
```{python}
#| label: setup
#| include: false
import sys
import os
# Add parent directories to path
sys.path.insert(0, os.path.join(os.getcwd(), '..'))
project_root = os.path.dirname(os.path.dirname(os.getcwd()))
sys.path.insert(0, project_root)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import warnings
warnings.filterwarnings('ignore')
# Set random seed for reproducibility
np.random.seed(42)
```
## Introduction
Having established the abstract formulation ([Report 1](01_abstract_formulation.qmd)) and its concrete implementation ([Report 2](02_concrete_implementation.qmd)), we now examine the **prior predictive distribution**: what range of choice behaviors does the model permit before observing any data?
Prior predictive analysis serves several purposes:
1. **Prior validation**: Do the priors produce sensible behaviors?
2. **Model understanding**: What parameter combinations are a priori plausible?
3. **Experimental design**: Is the design rich enough to distinguish different behaviors?
::: {.callout-note}
## The Prior Predictive Distribution
The prior predictive distribution is the distribution over observables (choices) induced by:
1. Drawing parameters from the prior: $(\alpha, \boldsymbol{\beta}, \boldsymbol{\delta}) \sim p(\theta)$
2. Computing derived quantities: $\boldsymbol{\psi}, \boldsymbol{\upsilon}, \boldsymbol{\eta}, \boldsymbol{\chi}$
3. Simulating choices: $y_m \sim \text{Categorical}(\boldsymbol{\chi}_m)$
This gives us a distribution over possible datasets *before* conditioning on actual observations; a minimal simulation sketch follows this note.
:::
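To make these steps concrete, here is a minimal NumPy sketch of the generative process for a single toy decision problem. The prior families match those stated in this report, but the toy dimensions, the feature matrix, the cumulative-sum utility construction, and the softmax choice rule $\boldsymbol{\chi} \propto \exp(\alpha \boldsymbol{\eta})$ are illustrative assumptions, not code taken from `m_0_sim.stan`:
```{python}
#| label: prior-predictive-sketch
# Sketch: one pass through the three-step prior predictive process (toy problem).
import numpy as np

rng = np.random.default_rng(0)
K, D, N_alts = 3, 5, 4                    # consequences, features, alternatives (toy)
X = rng.normal(size=(N_alts, D))          # illustrative feature matrix

# Step 1: draw parameters from the prior
alpha = rng.lognormal(mean=0.0, sigma=1.0)     # sensitivity ~ Lognormal(0, 1)
beta = rng.normal(0.0, 1.0, size=(K, D))       # weights ~ Normal(0, 1)
delta = rng.dirichlet(np.ones(K - 1))          # utility increments ~ Dirichlet(1, 1)

# Step 2: derived quantities (constructions assumed here for illustration)
upsilon = np.concatenate([[0.0], np.cumsum(delta)])    # utilities: υ₁ = 0, υ_K = 1
logits = X @ beta.T                                    # (N_alts, K)
psi = np.exp(logits - logits.max(axis=1, keepdims=True))
psi /= psi.sum(axis=1, keepdims=True)                  # subjective probabilities ψ
eta = psi @ upsilon                                    # expected utilities η
chi = np.exp(alpha * eta - (alpha * eta).max())
chi /= chi.sum()                                       # choice probabilities χ

# Step 3: simulate a choice
y = rng.choice(N_alts, p=chi)
print(f"α = {alpha:.2f}, η = {np.round(eta, 3)}, χ = {np.round(chi, 3)}, choice = {y + 1}")
```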
## Study Design
We use a moderately complex study design to illustrate the prior predictive distribution:
```{python}
#| label: study-design-config
#| echo: true
# Study design configuration
config = {
"M": 25, # Number of decision problems
"K": 3, # Number of consequences
"D": 5, # Feature dimensions
"R": 15, # Number of distinct alternatives
"min_alts_per_problem": 2, # Minimum alternatives per problem
"max_alts_per_problem": 5, # Maximum alternatives per problem
"feature_dist": "normal", # Feature distribution
"feature_params": {"loc": 0, "scale": 1}
}
print(f"Study Design Configuration:")
print(f" Decision problems (M): {config['M']}")
print(f" Consequences (K): {config['K']}")
print(f" Feature dimensions (D): {config['D']}")
print(f" Distinct alternatives (R): {config['R']}")
print(f" Alternatives per problem: {config['min_alts_per_problem']}-{config['max_alts_per_problem']}")
```
::: {.callout-note}
## Design Rationale
The dimensions are chosen purposefully:
- **K=3 consequences**: The minimal interesting case for ordered utilities, allowing meaningful intermediate preferences
- **D=5 features**: Provides sufficient variation for diverse feature-to-probability mappings without excessive dimensionality
- **M=25 problems**: Represents a plausible small-study scenario with reasonable statistical power
- **R=15 alternatives**: Creates varied choice sets while keeping computational burden manageable
- **2–5 alternatives per problem**: Reflects realistic variation in typical behavioral experiments
Note that prior predictive distributions depend on study design; different designs may yield different behavioral ranges. Report 4 examines whether this design supports accurate parameter recovery.
:::
```{python}
#| label: generate-study-design
#| output: false
from utils.study_design import StudyDesign
# Create and generate the study design
study = StudyDesign(
M=config["M"],
K=config["K"],
D=config["D"],
R=config["R"],
min_alts_per_problem=config["min_alts_per_problem"],
max_alts_per_problem=config["max_alts_per_problem"],
feature_dist=config["feature_dist"],
feature_params=config["feature_params"],
design_name="prior_analysis"
)
study.generate()
```
```{python}
#| label: design-summary
#| echo: false
# Extract design properties
w = np.array(study.w)
I = np.array(study.I)
N = I.sum(axis=1)
print(f"\nGenerated Study Design:")
print(f" Total alternatives across problems: {I.sum()}")
print(f" Alternatives per problem: min={N.min()}, max={N.max()}, mean={N.mean():.1f}")
print(f" Alternative appearances: min={I.sum(axis=0).min()}, max={I.sum(axis=0).max()}, mean={I.sum(axis=0).mean():.1f}")
```
### Design Characteristics
```{python}
#| label: fig-design-characteristics
#| fig-cap: "Study design characteristics. Left: Number of alternatives in each decision problem. Right: Frequency with which each alternative appears across problems."
M = config["M"]
R = config["R"]
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
# Alternatives per problem
axes[0].bar(range(1, M+1), N, color='steelblue', alpha=0.7)
axes[0].axhline(y=N.mean(), color='red', linestyle='--', label=f'Mean = {N.mean():.1f}')
axes[0].set_xlabel('Decision Problem')
axes[0].set_ylabel('Number of Alternatives')
axes[0].set_title('Alternatives per Problem')
axes[0].legend()
# Alternative frequency
alt_freq = I.sum(axis=0)
axes[1].bar(range(1, R+1), alt_freq, color='coral', alpha=0.7)
axes[1].axhline(y=alt_freq.mean(), color='red', linestyle='--', label=f'Mean = {alt_freq.mean():.1f}')
axes[1].set_xlabel('Alternative')
axes[1].set_ylabel('Frequency (# problems)')
axes[1].set_title('Alternative Appearance Frequency')
axes[1].legend()
plt.tight_layout()
plt.show()
```
## Prior Predictive Simulation Using m_0_sim.stan
We use the simulation model `m_0_sim.stan` to generate prior predictive samples. This ensures that our analysis matches exactly how the model generates data, including all numerical details of the Stan implementation.
```{python}
#| label: run-prior-predictive
#| output: false
from analysis.prior_predictive import PriorPredictiveAnalysis
import tempfile
# Create a temporary output directory for this analysis
output_dir = tempfile.mkdtemp(prefix="prior_pred_")
# Prior predictive configuration
# - 200 parameter configurations: sufficient for central statistics (MC SE ≈ 0.07 at SD=1)
# - 5 choice samples per config: captures choice variability given parameters
# - Total: 1,000 samples for behavioral diagnostics
n_param_samples = 200
n_choice_samples = 5
# Run prior predictive analysis using m_0_sim.stan
analysis = PriorPredictiveAnalysis(
model_path=None, # Uses default m_0_sim.stan
study_design=study,
output_dir=output_dir,
n_param_samples=n_param_samples,
n_choice_samples=n_choice_samples
)
# Run the analysis (this calls Stan)
samples = analysis.run()
```
```{python}
#| label: samples-info
#| echo: false
print(f"Prior Predictive Simulation Complete:")
print(f" Parameter configurations sampled: {samples['param_set'].nunique()}")
print(f" Total samples: {len(samples)}")
print(f" Columns available: {len(samples.columns)}")
```
## Prior Distribution of Parameters
### Sensitivity Parameter ($\alpha$)
The sensitivity parameter is sampled from a Lognormal(0, 1) prior in `m_0_sim.stan`:
```{python}
#| label: fig-alpha-prior
#| fig-cap: "Prior distribution of the sensitivity parameter α sampled from m_0_sim.stan. The distribution has median ≈ 1 with substantial probability mass on both low sensitivity (random-like choice) and high sensitivity (near-optimal choice)."
# Extract unique alpha values (one per parameter set)
alpha_samples = samples.groupby('param_set')['alpha'].first().values
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
# Histogram with truncation for visualization
alpha_plot = alpha_samples[alpha_samples < 15]
axes[0].hist(alpha_plot, bins=40, density=True, alpha=0.7, color='steelblue', edgecolor='white')
axes[0].axvline(x=np.median(alpha_samples), color='red', linestyle='--', linewidth=2,
label=f'Median = {np.median(alpha_samples):.2f}')
axes[0].axvline(x=1, color='orange', linestyle=':', linewidth=2, label='α = 1')
axes[0].set_xlabel('Sensitivity (α)')
axes[0].set_ylabel('Density')
axes[0].set_title('Prior Distribution of α')
axes[0].legend()
axes[0].set_xlim(0, 15)
# Log scale to see full distribution
axes[1].hist(np.log10(alpha_samples + 0.01), bins=40, density=True, alpha=0.7,
color='steelblue', edgecolor='white')
axes[1].axvline(x=0, color='red', linestyle='--', linewidth=2, label='α = 1')
axes[1].set_xlabel('log₁₀(α)')
axes[1].set_ylabel('Density')
axes[1].set_title('Prior Distribution of α (log scale)')
axes[1].legend()
plt.tight_layout()
plt.show()
# Compute Monte Carlo standard errors
n_alpha = len(alpha_samples)
mc_se_mean = np.std(alpha_samples) / np.sqrt(n_alpha)
mc_se_median = 1.253 * np.std(alpha_samples) / np.sqrt(n_alpha)  # Normal-theory asymptotic SE for the median (approximate for lognormal draws)
print(f"\nAlpha summary statistics (n={n_alpha} samples):")
print(f" Mean: {np.mean(alpha_samples):.2f} (MC SE: {mc_se_mean:.2f})")
print(f" Median: {np.median(alpha_samples):.2f} (MC SE: {mc_se_median:.2f})")
print(f" Std: {np.std(alpha_samples):.2f}")
print(f" 95% interval: [{np.percentile(alpha_samples, 2.5):.2f}, {np.percentile(alpha_samples, 97.5):.2f}]")
```
The lognormal prior covers a wide range of sensitivity values:
- **Low α (< 0.5)**: Near-random choice, weak sensitivity to SEU
- **Moderate α (0.5–3)**: Probabilistic choice with clear SEU preference
- **High α (> 3)**: Near-deterministic choice, strong SEU maximization
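How much prior mass falls in each regime can be computed in closed form: if $\alpha = e^Z$ with $Z \sim \text{Normal}(0, 1)$, then $P(\alpha < c) = \Phi(\log c)$. A quick check with `scipy.stats`, using the cutoffs listed above:
```{python}
#| label: alpha-regime-probs
# Closed-form prior probabilities of the three sensitivity regimes.
import numpy as np
from scipy import stats

prior = stats.lognorm(s=1.0, scale=np.exp(0.0))   # Lognormal(0, 1)
print(f"P(α < 0.5)     = {prior.cdf(0.5):.3f}   (near-random)")
print(f"P(0.5 ≤ α ≤ 3) = {prior.cdf(3.0) - prior.cdf(0.5):.3f}   (moderate)")
print(f"P(α > 3)       = {prior.sf(3.0):.3f}   (near-deterministic)")
```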
### Utility Parameters
```{python}
#| label: fig-utility-prior
#| fig-cap: "Prior distribution of utilities under Dirichlet(1,1) as sampled from m_0_sim.stan. Left: Distribution of the middle utility υ₂. Right: Distribution of utility increments δ."
# Extract unique delta and upsilon values
delta_cols = [col for col in samples.columns if col.startswith('delta[')]
upsilon_cols = [col for col in samples.columns if col.startswith('upsilon[')]
# Get one value per parameter set
param_samples = samples.groupby('param_set').first()
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
# Middle utility distribution (upsilon[2])
if 'upsilon[2]' in param_samples.columns:
upsilon_2 = param_samples['upsilon[2]'].values
axes[0].hist(upsilon_2, bins=40, density=True, alpha=0.7, color='forestgreen', edgecolor='white')
axes[0].axvline(x=0.5, color='red', linestyle='--', linewidth=2, label=f'Mean ≈ 0.5')
axes[0].set_xlabel('Middle Utility (υ₂)')
axes[0].set_ylabel('Density')
axes[0].set_title('Prior Distribution of υ₂')
axes[0].legend()
axes[0].set_xlim(0, 1)
# Delta distribution
if delta_cols:
delta_1 = param_samples[delta_cols[0]].values
delta_2 = param_samples[delta_cols[1]].values if len(delta_cols) > 1 else 1 - delta_1
axes[1].scatter(delta_1, delta_2, alpha=0.3, s=20, c='forestgreen')
axes[1].plot([0, 1], [1, 0], 'r--', linewidth=2, label='δ₁ + δ₂ = 1')
axes[1].set_xlabel('δ₁ (first increment)')
axes[1].set_ylabel('δ₂ (second increment)')
axes[1].set_title('Prior Distribution on Utility Simplex')
axes[1].set_xlim(-0.05, 1.05)
axes[1].set_ylim(-0.05, 1.05)
axes[1].legend()
axes[1].set_aspect('equal')
plt.tight_layout()
plt.show()
```
The symmetric Dirichlet(1,1) prior induces a **uniform distribution** over the simplex of valid utility configurations.
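The increment-to-utility mapping can be made explicit. The sketch below assumes the construction implied by the figure ($\upsilon_1 = 0$, $\upsilon_K = 1$, interior utilities as cumulative sums of $\boldsymbol{\delta}$), under which $\upsilon_2 = \delta_1 \sim \text{Beta}(1, 1)$, i.e., uniform on $[0, 1]$ for $K = 3$:
```{python}
#| label: utility-construction-sketch
# Sketch: ordered utilities from Dirichlet increments (assumed construction).
import numpy as np

rng = np.random.default_rng(1)
K = 3
delta = rng.dirichlet(np.ones(K - 1), size=10_000)   # increments, shape (draws, K-1)
upsilon = np.column_stack([
    np.zeros(len(delta)),        # υ₁ = 0
    np.cumsum(delta, axis=1),    # υ₂, ..., υ_K (the last column is exactly 1)
])
print(f"mean(υ₂) = {upsilon[:, 1].mean():.3f}  (theory: 0.500)")
```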
### Feature-to-Probability Weights ($\boldsymbol{\beta}$)
The matrix $\boldsymbol{\beta} \in \mathbb{R}^{K \times D}$ maps alternative features to (unnormalized) log-probabilities over consequences. In `m_0_sim.stan`, each element is drawn independently from $\text{Normal}(0, \sigma_\beta)$ with $\sigma_\beta = 1$ by default:
```{python}
#| label: fig-beta-prior
#| fig-cap: "Prior distribution of the feature-to-probability weights β sampled from m_0_sim.stan. Left: Distribution of all β coefficients. Right: Heatmap of median β values by position."
# Extract beta coefficients
beta_cols = [col for col in samples.columns if col.startswith('beta[')]
# Get unique values per parameter set
param_samples = samples.groupby('param_set').first()
# Collect all beta values
all_betas = []
for col in beta_cols:
all_betas.extend(param_samples[col].values)
all_betas = np.array(all_betas)
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
# Distribution of all beta coefficients
axes[0].hist(all_betas, bins=50, density=True, alpha=0.7, color='purple', edgecolor='white')
# Overlay theoretical normal
x_range = np.linspace(all_betas.min(), all_betas.max(), 100)
axes[0].plot(x_range, stats.norm.pdf(x_range, 0, 1), 'r-', linewidth=2, label='Normal(0,1)')
axes[0].axvline(x=0, color='gray', linestyle='--', alpha=0.5)
axes[0].set_xlabel('β coefficient value')
axes[0].set_ylabel('Density')
axes[0].set_title('Prior Distribution of β Coefficients')
axes[0].legend()
# Heatmap of median beta values (not absolute value)
K, D = config['K'], config['D']
beta_medians = np.zeros((K, D))
for k in range(K):
for d in range(D):
col = f'beta[{k+1},{d+1}]'
if col in param_samples.columns:
beta_medians[k, d] = np.median(param_samples[col].values)
im = axes[1].imshow(beta_medians, cmap='RdBu_r', aspect='auto', vmin=-1, vmax=1)
axes[1].set_xlabel('Feature Dimension (d)')
axes[1].set_ylabel('Consequence (k)')
axes[1].set_title('Median β by Position')
axes[1].set_xticks(range(D))
axes[1].set_xticklabels([f'{d+1}' for d in range(D)])
axes[1].set_yticks(range(K))
axes[1].set_yticklabels([f'{k+1}' for k in range(K)])
plt.colorbar(im, ax=axes[1], label='Median β')
plt.tight_layout()
plt.show()
print(f"\nBeta coefficient summary:")
print(f" Mean: {np.mean(all_betas):.3f}")
print(f" Std: {np.std(all_betas):.3f}")
print(f" 95% interval: [{np.percentile(all_betas, 2.5):.2f}, {np.percentile(all_betas, 97.5):.2f}]")
```
The prior on $\boldsymbol{\beta}$ is symmetric and weakly informative: it does not favor any particular feature-consequence relationship, allowing the data to determine how features predict consequences. See [Report 2](02_concrete_implementation.qmd) for the interpretation of the $\text{Normal}(0,1)$ prior in terms of odds ratios (log-odds differences up to $\pm 2$, corresponding to probability ratios up to $e^2 \approx 7.4$).
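As a quick reference for this interpretation, the sketch below converts log-odds differences into the corresponding probability ratios (the magnitudes shown are illustrative):
```{python}
#| label: beta-odds-ratio-sketch
# Under the softmax, log(ψ_rk / ψ_rk') = (β_k − β_k')ᵀ x_r, so a log-odds
# difference of c corresponds to a probability ratio of exp(c).
import numpy as np

for log_odds in [0.5, 1.0, 2.0]:
    print(f"log-odds difference {log_odds:.1f} → probability ratio {np.exp(log_odds):.2f}")
```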
### Subjective Probabilities Over Consequences ($\boldsymbol{\psi}$)
The weights $\boldsymbol{\beta}$ interact with alternative features to produce **subjective probabilities** over consequences via the softmax transformation:
$$
\psi_{rk} = \frac{\exp(\boldsymbol{\beta}_k^\top \mathbf{x}_r)}{\sum_{k'=1}^K \exp(\boldsymbol{\beta}_{k'}^\top \mathbf{x}_r)}
$$
We compute $\boldsymbol{\psi}$ for each of the $R$ distinct alternatives from the sampled $\boldsymbol{\beta}$ values.
```{python}
#| label: fig-psi-prior
#| fig-cap: "Mean subjective probability for each distinct alternative and consequence, averaged over all prior draws."
# Compute psi from beta for the R distinct alternatives
# (The psi in Stan's generated quantities is vectorized over problems, causing repetition)
K = config['K']
R = config['R']
D = config['D']
# Feature matrix for distinct alternatives
w_array = np.array(study.w) # Shape: (R, D)
# Get unique parameter sets
param_sets = samples['param_set'].unique()
n_param_samples = len(param_sets)
# For each parameter sample, compute psi for all R distinct alternatives
# Shape: (n_param_samples, R, K)
psi_array = np.zeros((n_param_samples, R, K))
for p_idx, param_set in enumerate(param_sets):
# Get one row from this param_set (parameters are same within a set)
row = samples[samples['param_set'] == param_set].iloc[0]
# Extract beta matrix
beta = np.zeros((K, D))
for k in range(K):
for d in range(D):
col = f'beta[{k+1},{d+1}]'
if col in samples.columns:
beta[k, d] = row[col]
# Compute psi for each distinct alternative
for r in range(R):
logits = beta @ w_array[r] # K-dimensional
psi_array[p_idx, r] = np.exp(logits - np.max(logits)) # numerically stable softmax
psi_array[p_idx, r] /= psi_array[p_idx, r].sum()
# Compute mean psi across samples for each (alternative, consequence)
mean_psi = psi_array.mean(axis=0) # Shape: (R, K)
all_psi_values = psi_array.flatten()
fig, ax = plt.subplots(figsize=(10, 4))
# Heatmap of mean psi for each distinct alternative and consequence
im = ax.imshow(mean_psi.T, cmap='YlOrRd', aspect='auto', vmin=0, vmax=1)
ax.set_xlabel('Distinct Alternative (r)')
ax.set_ylabel('Consequence (k)')
ax.set_title('Mean ψ by Alternative and Consequence')
ax.set_yticks(range(K))
ax.set_yticklabels([f'{k+1}' for k in range(K)])
# Use odd numbers for x-axis labels (1, 3, 5, 7, ...)
xtick_positions = list(range(0, R, 2)) # 0, 2, 4, 6, ... (indices)
ax.set_xticks(xtick_positions)
ax.set_xticklabels([f'{x+1}' for x in xtick_positions]) # 1, 3, 5, 7, ... (labels)
plt.colorbar(im, ax=ax, label='Mean ψ')
plt.tight_layout()
plt.show()
print(f"\nSubjective probability summary (from {n_param_samples} prior parameter draws):")
print(f" Mean ψ: {np.mean(all_psi_values):.3f} (theoretical uniform = {1/K:.3f})")
print(f" Std ψ: {np.std(all_psi_values):.3f}")
print(f" P(ψ < 0.1): {np.mean(all_psi_values < 0.1):.3f}")
print(f" P(ψ > 0.9): {np.mean(all_psi_values > 0.9):.3f}")
```
::: {.callout-important}
## Interpretation of Subjective Probabilities
The vector $\boldsymbol{\psi}_r$ represents the decision-maker's **subjective beliefs** about which consequence will obtain if alternative $r$ is chosen. The Normal(0,1) prior on $\boldsymbol{\beta}$ induces a prior on $\boldsymbol{\psi}$ that:
1. **Centers on uniformity** when features are balanced
2. **Allows extreme beliefs** ($\psi_{rk} \approx 0$ or $\approx 1$) when features strongly predict consequences
3. **Varies across alternatives** based on their feature profiles
This captures the idea that alternatives with different feature profiles may induce quite different probability distributions over consequences.
:::
```{python}
#| label: fig-psi-across-alternatives
#| fig-cap: "Heatmaps showing subjective probability distributions across the R distinct alternatives for six individual prior draws. Each heatmap shows how ψ varies across alternatives (columns) and consequences (rows) under a single parameter configuration."
#| fig-height: 8
# Show how psi varies across distinct alternatives for a few parameter draws
# We already computed psi_array above with shape (n_param_samples, R, K)
fig, axes = plt.subplots(2, 3, figsize=(14, 8))
axes = axes.flatten()
# Select 6 diverse parameter draws
np.random.seed(123)
sample_indices = np.random.choice(n_param_samples, min(6, n_param_samples), replace=False)
for idx, (ax, sample_idx) in enumerate(zip(axes, sample_indices)):
# Get psi matrix for this parameter draw (already computed above)
psi_matrix = psi_array[sample_idx] # Shape: (R, K)
# Heatmap
im = ax.imshow(psi_matrix.T, cmap='YlOrRd', aspect='auto', vmin=0, vmax=1)
ax.set_xlabel('Distinct Alternative (r)')
ax.set_ylabel('Consequence')
ax.set_title(f'Prior Draw {idx+1}', fontsize=11)
ax.set_yticks(range(K))
ax.set_yticklabels([f'k={k+1}' for k in range(K)])
# Show fewer x-ticks for readability
n_xticks = min(8, R)
xtick_positions = np.linspace(0, R-1, n_xticks, dtype=int)
ax.set_xticks(xtick_positions)
ax.set_xticklabels([f'{x+1}' for x in xtick_positions])
# Add a shared colorbar
fig.subplots_adjust(right=0.88)
cbar_ax = fig.add_axes([0.91, 0.15, 0.02, 0.7])
fig.colorbar(im, cax=cbar_ax, label='ψ (subjective probability)')
plt.suptitle('Subjective Probability Distributions Under Individual Prior Draws',
fontsize=13, y=0.98)
plt.tight_layout(rect=[0, 0, 0.88, 0.95])
plt.show()
```
The diversity of $\boldsymbol{\psi}$ profiles across alternatives—driven by variation in $\boldsymbol{\beta}$—is what allows the model to capture differences in how decision-makers perceive distinct alternatives.
## Expected Utilities Under the Prior
The expected utilities $\eta_r = \boldsymbol{\psi}_r^\top \boldsymbol{\upsilon}$ depend on both the subjective probabilities ($\boldsymbol{\psi}$, derived from $\boldsymbol{\beta}$) and the utility vector ($\boldsymbol{\upsilon}$).
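With the endpoint convention assumed above ($\upsilon_1 = 0$, $\upsilon_3 = 1$ for $K = 3$), this reduces to $\eta_r = \psi_{r2}\,\upsilon_2 + \psi_{r3}$, so each $\eta_r$ is bounded between $\psi_{r3}$ and $1 - \psi_{r1}$. A minimal sketch with illustrative draws:
```{python}
#| label: eta-structure-sketch
# Sketch: η is bilinear in (ψ, υ); the toy draws below are illustrative.
import numpy as np

rng = np.random.default_rng(2)
psi_r = rng.dirichlet(np.ones(3))     # a toy subjective probability vector
upsilon_2 = rng.uniform()             # middle utility, uniform under Dirichlet(1,1)
eta_r = psi_r @ np.array([0.0, upsilon_2, 1.0])
assert psi_r[2] <= eta_r <= 1 - psi_r[0]
print(f"ψ_r = {np.round(psi_r, 3)}, υ₂ = {upsilon_2:.3f}, η_r = {eta_r:.3f}")
```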
```{python}
#| label: fig-expected-utilities
#| fig-cap: "Distribution of expected utilities across a subset of decision problems. Each boxplot shows the distribution of η for alternatives in that problem across prior samples."
# Extract problem_etas columns
eta_cols = [col for col in samples.columns if 'problem_etas' in col]
# Get data for first 10 problems
n_problems_to_show = min(10, config["M"])
fig, axes = plt.subplots(2, 5, figsize=(15, 8))
axes = axes.flatten()
for m in range(n_problems_to_show):
ax = axes[m]
# Get columns for this problem (1-indexed in Stan)
problem_cols = [col for col in eta_cols if f'problem_etas[{m+1},' in col]
# Filter out padding values (very negative)
valid_data = []
valid_labels = []
for col in problem_cols:
data = samples[col].values
if np.median(data) > -1e9: # Not padding
valid_data.append(data)
alt_idx = col.split(',')[1].rstrip(']')
valid_labels.append(alt_idx)
if valid_data:
bp = ax.boxplot(valid_data, patch_artist=True)
for patch in bp['boxes']:
patch.set_facecolor('steelblue')
patch.set_alpha(0.7)
ax.set_xticklabels(valid_labels)
ax.set_title(f'Problem {m+1}')
ax.set_xlabel('Alternative')
ax.set_ylabel('η')
ax.set_ylim(0, 1)
plt.suptitle('Prior Distribution of Expected Utilities by Problem', fontsize=14)
plt.tight_layout()
plt.show()
```
## SEU Maximizer Selection
A key diagnostic is the **probability of selecting the SEU-maximizing alternative** under the prior. The simulation model `m_0_sim.stan` tracks this directly via the `selected_seu_max` variable.
::: {.callout-note}
## The Random Baseline
For each decision problem with $N_m$ alternatives, a random chooser would select the SEU maximizer with probability $1/N_m$ (assuming a unique maximizer). We use this as a baseline: values above $1/N_m$ indicate that the prior allows for above-chance SEU-maximizing behavior. The baseline varies across problems because problems have different numbers of alternatives.
:::
### By Decision Problem
```{python}
#| label: fig-seu-max-by-problem
#| fig-cap: "Probability of selecting the SEU-maximizing alternative for each decision problem, computed from m_0_sim.stan samples. The coral bars show the random choice baseline (1/N for each problem)."
# Extract SEU maximizer selection for each problem
seu_max_cols = [col for col in samples.columns if 'selected_seu_max' in col and 'total' not in col]
M = config["M"]
prob_seu_max = np.zeros(M)
for m in range(M):
col = f'selected_seu_max[{m+1}]'
if col in samples.columns:
prob_seu_max[m] = samples[col].mean()
# Random baseline
random_baseline = 1.0 / N
fig, ax = plt.subplots(figsize=(12, 5))
x = np.arange(1, M+1)
width = 0.35
bars1 = ax.bar(x - width/2, prob_seu_max, width, label='Prior Predictive',
color='steelblue', alpha=0.7)
bars2 = ax.bar(x + width/2, random_baseline, width, label='Random Choice Baseline',
color='coral', alpha=0.7)
ax.set_xlabel('Decision Problem')
ax.set_ylabel('P(SEU Maximizer Selected)')
ax.set_title('Probability of Selecting SEU Maximizer by Problem')
ax.set_xticks(x)
ax.legend()
ax.set_ylim(0, 1)
plt.tight_layout()
plt.show()
print(f"\nSummary:")
print(f" Mean P(SEU max): {prob_seu_max.mean():.3f}")
print(f" Mean random baseline: {random_baseline.mean():.3f}")
print(f" Ratio (observed/random): {prob_seu_max.mean() / random_baseline.mean():.2f}x")
```
### Total SEU Maximizers Selected
```{python}
#| label: fig-total-seu-max
#| fig-cap: "Distribution of the total number of SEU maximizers selected (out of 25 problems) across prior samples. This is computed directly by m_0_sim.stan."
# Get total SEU max selected
if 'total_seu_max_selected' in samples.columns:
total_seu_max = samples['total_seu_max_selected'].values
else:
# Compute from individual columns
total_seu_max = np.zeros(len(samples))
for m in range(M):
col = f'selected_seu_max[{m+1}]'
if col in samples.columns:
total_seu_max += samples[col].values
fig, ax = plt.subplots(figsize=(10, 5))
ax.hist(total_seu_max, bins=range(0, M+2), density=True, alpha=0.7,
color='steelblue', edgecolor='white', align='left')
# Expected under random choice
expected_random = sum(1/n for n in N)
ax.axvline(x=expected_random, color='coral', linestyle='--', linewidth=2,
label=f'Expected (random): {expected_random:.1f}')
ax.axvline(x=total_seu_max.mean(), color='red', linestyle='-', linewidth=2,
label=f'Prior mean: {total_seu_max.mean():.1f}')
ax.set_xlabel(f'Number of SEU Maximizers Selected (out of {M})')
ax.set_ylabel('Density')
ax.set_title('Prior Predictive Distribution of Total SEU Maximizer Selections')
ax.legend()
plt.tight_layout()
plt.show()
print(f"\nTotal SEU Maximizer Selection Summary:")
print(f" Mean: {total_seu_max.mean():.1f}")
print(f" Std: {total_seu_max.std():.1f}")
print(f" Min: {total_seu_max.min():.0f}")
print(f" Max: {total_seu_max.max():.0f}")
print(f" Expected under random: {expected_random:.1f}")
```
### Relationship with $\alpha$
```{python}
#| label: fig-seu-max-vs-alpha
#| fig-cap: "Relationship between sensitivity α and the proportion of SEU maximizers selected. Higher α leads to more consistent SEU maximization, as predicted by the theoretical properties from Report 1."
# Get alpha and total SEU max for each sample
alpha_per_sample = samples['alpha'].values
seu_max_per_sample = total_seu_max / M # Proportion
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
# Scatter plot
axes[0].scatter(alpha_per_sample, seu_max_per_sample, alpha=0.2, s=15, c='steelblue')
axes[0].set_xlabel('Sensitivity (α)')
axes[0].set_ylabel('Proportion SEU Max Selected')
axes[0].set_title('SEU Maximizer Selection vs. Sensitivity')
axes[0].set_xlim(0, min(15, np.percentile(alpha_per_sample, 99)))
# Binned averages
alpha_bins = [0, 0.25, 0.5, 1, 2, 3, 5, 10, 100]
bin_centers = []
prop_means = []
prop_stds = []
for i in range(len(alpha_bins) - 1):
mask = (alpha_per_sample >= alpha_bins[i]) & (alpha_per_sample < alpha_bins[i+1])
if mask.sum() > 5:
bin_centers.append((alpha_bins[i] + alpha_bins[i+1])/2)
prop_means.append(seu_max_per_sample[mask].mean())
prop_stds.append(seu_max_per_sample[mask].std())
axes[1].errorbar(bin_centers, prop_means, yerr=prop_stds, fmt='o-',
color='steelblue', linewidth=2, markersize=8, capsize=5)
axes[1].axhline(y=1/np.mean(N), color='coral', linestyle='--',
label=f'Random baseline: {1/np.mean(N):.2f}')
axes[1].set_xlabel('Sensitivity (α)')
axes[1].set_ylabel('Mean Proportion SEU Max Selected')
axes[1].set_title('SEU Maximizer Selection vs. α (Binned)')
axes[1].set_xscale('log')
axes[1].legend()
axes[1].set_ylim(0, 1)
plt.tight_layout()
plt.show()
```
This plot empirically confirms **Property 1 (Monotonicity)** from Report 1: as $\alpha$ increases, the probability of selecting SEU-maximizing alternatives increases.
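The mechanism is easy to isolate. Assuming the softmax choice rule $\boldsymbol{\chi} \propto \exp(\alpha \boldsymbol{\eta})$, a single toy problem shows the monotone effect of $\alpha$ along with the random ($\alpha \to 0$) and near-deterministic (large $\alpha$) limits:
```{python}
#| label: monotonicity-sketch
# Sketch: P(SEU max) as a function of α for one toy problem (η values illustrative).
import numpy as np

eta = np.array([0.3, 0.5, 0.7])                  # toy expected utilities
for alpha in [0.0, 0.5, 1.0, 5.0, 25.0]:
    chi = np.exp(alpha * eta - (alpha * eta).max())
    chi /= chi.sum()
    print(f"α = {alpha:5.1f} → P(SEU max) = {chi[eta.argmax()]:.3f}")
```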
## Prior Predictive Checks: Are the Priors Reasonable?
Based on the prior predictive analysis using `m_0_sim.stan`, we can assess whether the default priors produce sensible behaviors:
### ✓ The Prior Covers the Full Range of Behaviors
```{python}
#| label: behavior-range-check
# Summarize the range of behaviors
prop_seu_max_by_sample = total_seu_max / M
print("Proportion of SEU Maximizers Selected (across prior samples):")
print(f" Minimum: {prop_seu_max_by_sample.min():.2%}")
print(f" 5th percentile: {np.percentile(prop_seu_max_by_sample, 5):.2%}")
print(f" Median: {np.median(prop_seu_max_by_sample):.2%}")
print(f" 95th percentile: {np.percentile(prop_seu_max_by_sample, 95):.2%}")
print(f" Maximum: {prop_seu_max_by_sample.max():.2%}")
```
The prior places substantial probability on:
- **Near-random choice** (~20–30% SEU max selection)
- **Moderate SEU-sensitivity** (~40–60% SEU max selection)
- **Strong SEU-maximization** (~80–100% SEU max selection)
### ✓ No Pathological Behaviors
The prior does not generate:
- Negative expected utilities (impossible given $\upsilon \in [0,1]$)
- Degenerate choice probabilities (softmax always assigns positive probability)
- Systematic biases toward particular alternatives (symmetric prior on $\boldsymbol{\beta}$)
### ✓ Alignment with Theoretical Properties
The simulations confirm the three properties from Report 1:
1. **Monotonicity**: Higher $\alpha$ → higher P(SEU max selection) ✓
2. **Perfect rationality limit**: Very high $\alpha$ → near-deterministic SEU max selection ✓
3. **Random choice limit**: Very low $\alpha$ → near-uniform choice ✓
## Prior Hyperparameter Sensitivity
::: {.callout-note}
## Scope Limitation
This report examines the prior predictive distribution under the **default** hyperparameters: $\text{Lognormal}(0, 1)$ for $\alpha$, $\text{Normal}(0, 1)$ for $\boldsymbol{\beta}$, and $\text{Dirichlet}(1, \ldots, 1)$ for $\boldsymbol{\delta}$. A complete prior sensitivity analysis—varying these hyperparameters to examine how the prior predictive distribution responds—is beyond scope here but is an important consideration for applied work.
The most impactful hyperparameter is the scale of the lognormal prior on $\alpha$. Reducing the scale (e.g., $\text{Lognormal}(0, 0.5)$) would concentrate mass on moderate sensitivity and reduce the probability of extreme near-random or near-deterministic behaviors. Increasing the scale (e.g., $\text{Lognormal}(0, 2)$) would spread mass more widely, placing more probability on extreme behaviors. Practitioners should adjust this based on domain knowledge about the expected range of sensitivity in their application; the quantile comparison after this note illustrates the effect of the scale.
:::
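For concreteness, the exact lognormal quantiles can be compared across the three scales mentioned in the note (a sketch of prior implications, not a fitted quantity):
```{python}
#| label: alpha-scale-sketch
# Central 95% prior intervals for α under alternative lognormal scales.
from scipy import stats

for scale in [0.5, 1.0, 2.0]:
    prior = stats.lognorm(s=scale, scale=1.0)   # Lognormal(0, scale); median = 1
    lo, hi = prior.ppf([0.025, 0.975])
    print(f"Lognormal(0, {scale}): 95% interval [{lo:.2f}, {hi:.2f}]")
```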
## Detecting Prior-Data Conflict
The prior predictive analysis establishes what behaviors are *a priori* plausible. When data arrives, we can check for **prior-data conflict**—situations where the observed data lies in a region assigned low prior predictive probability. Key indicators would include:
- **SEU maximizer selection rate** outside the prior predictive 95% interval (approximately 20%–90% under default priors)
- **Choice entropy** substantially higher or lower than the prior predictive range
- **Observed $\hat{\alpha}$** falling in the extreme tails of the Lognormal(0,1) prior (e.g., $\alpha < 0.05$ or $\alpha > 20$)
Such conflicts would suggest either (a) the priors are poorly calibrated for the application domain, or (b) the model is misspecified. Either warrants further investigation before trusting posterior inferences.
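A simple operational check is to locate the observed statistic within the prior predictive samples. The sketch below uses a hypothetical observed SEU-maximizer selection rate (`observed_rate` is illustrative, not data from this study):
```{python}
#| label: conflict-check-sketch
# Sketch: prior predictive tail probability for a hypothetical observed rate.
import numpy as np

observed_rate = 0.95                  # hypothetical; a real check uses the data
prior_rates = total_seu_max / M       # prior predictive rates computed above
tail_prob = min((prior_rates >= observed_rate).mean(),
                (prior_rates <= observed_rate).mean())
print(f"95% prior predictive interval: "
      f"[{np.percentile(prior_rates, 2.5):.2f}, {np.percentile(prior_rates, 97.5):.2f}]")
print(f"Tail probability of observed rate: {tail_prob:.3f}")
```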
## Summary
The prior predictive analysis using `m_0_sim.stan` reveals that the default priors for model m_0 produce a sensible range of choice behaviors:
| Aspect | Finding |
|--------|---------|
| Sensitivity ($\alpha$) | Lognormal(0,1) covers low, moderate, and high sensitivity |
| SEU max selection | Prior mean ~50%, ranging from ~20% to ~100% |
| Choice probabilities | Neither too concentrated nor too diffuse |
| Theoretical alignment | All three fundamental properties confirmed empirically |
The prior is **weakly informative**—it does not strongly constrain parameters but rules out implausible behaviors. This is appropriate for a default prior when we have limited domain knowledge about the specific decision context.
::: {.callout-note}
## Design Considerations
The study design used here (M=25, K=3, D=5, R=15) is moderately complex. Key features:
- **25 decision problems**: Provides reasonable statistical power
- **5 feature dimensions**: Allows rich feature-to-probability mapping
- **15 alternatives**: Creates varied choice sets across problems
- **2–5 alternatives per problem**: Realistic range for typical experiments
The [next report](04_parameter_recovery.qmd) examines whether this design supports accurate parameter recovery.
:::
```{python}
#| label: cleanup
#| include: false
# Clean up temporary directory
import shutil
import logging
try:
shutil.rmtree(output_dir)
except OSError as e:
logging.warning(f"Failed to clean up temporary directory {output_dir}: {e}")
```
## References
::: {#refs}
:::