---
title: "Prior Predictive Analysis"
subtitle: "Foundational Report 3"
description: |
Analysis of the prior predictive distribution for model m_0, examining
what range of choice behaviors the prior permits before seeing any data.
categories: [foundations, validation, m_0]
execute:
cache: true
---
```{python}
#| label: setup
#| include: false
import sys
import os
# Add parent directories to path
sys.path.insert(0, os.path.join(os.getcwd(), '..'))
project_root = os.path.dirname(os.path.dirname(os.getcwd()))
sys.path.insert(0, project_root)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import warnings
warnings.filterwarnings('ignore')
# Set random seed for reproducibility
np.random.seed(42)
```
## Introduction
Having established the abstract formulation ([Report 1](01_abstract_formulation.qmd)) and its concrete implementation ([Report 2](02_concrete_implementation.qmd)), we now examine the **prior predictive distribution**: what range of choice behaviors does the model permit before observing any data?
Prior predictive analysis serves several purposes:
1. **Prior validation**: Do the priors produce sensible behaviors?
2. **Model understanding**: What parameter combinations are a priori plausible?
3. **Experimental design**: Is the design rich enough to distinguish different behaviors?
::: {.callout-note}
## The Prior Predictive Distribution
The prior predictive distribution is the distribution over observables (choices) induced by:
1. Drawing parameters from the prior: $(\alpha, \boldsymbol{\beta}, \boldsymbol{\delta}) \sim p(\theta)$
2. Computing derived quantities: $\boldsymbol{\psi}, \boldsymbol{\upsilon}, \boldsymbol{\eta}, \boldsymbol{\chi}$
3. Simulating choices: $y_m \sim \text{Categorical}(\boldsymbol{\chi}_m)$
This gives us a distribution over possible datasets *before* conditioning on actual observations; a minimal simulation sketch follows this note.
:::
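To make these steps concrete, here is a minimal NumPy sketch of the generative process for a single toy decision problem. The prior families match those stated in this report, but the toy dimensions, the feature matrix, the cumulative-sum utility construction, and the softmax choice rule $\boldsymbol{\chi} \propto \exp(\alpha \boldsymbol{\eta})$ are illustrative assumptions, not code taken from `m_0_sim.stan`:
```{python}
#| label: prior-predictive-sketch
# Sketch: one pass through the three-step prior predictive process (toy problem).
import numpy as np

rng = np.random.default_rng(0)
K, D, N_alts = 3, 5, 4                    # consequences, features, alternatives (toy)
X = rng.normal(size=(N_alts, D))          # illustrative feature matrix

# Step 1: draw parameters from the prior
alpha = rng.lognormal(mean=0.0, sigma=1.0)     # sensitivity ~ Lognormal(0, 1)
beta = rng.normal(0.0, 1.0, size=(K, D))       # weights ~ Normal(0, 1)
delta = rng.dirichlet(np.ones(K - 1))          # utility increments ~ Dirichlet(1, 1)

# Step 2: derived quantities (constructions assumed here for illustration)
upsilon = np.concatenate([[0.0], np.cumsum(delta)])    # utilities: υ₁ = 0, υ_K = 1
logits = X @ beta.T                                    # (N_alts, K)
psi = np.exp(logits - logits.max(axis=1, keepdims=True))
psi /= psi.sum(axis=1, keepdims=True)                  # subjective probabilities ψ
eta = psi @ upsilon                                    # expected utilities η
chi = np.exp(alpha * eta - (alpha * eta).max())
chi /= chi.sum()                                       # choice probabilities χ

# Step 3: simulate a choice
y = rng.choice(N_alts, p=chi)
print(f"α = {alpha:.2f}, η = {np.round(eta, 3)}, χ = {np.round(chi, 3)}, choice = {y + 1}")
```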
## Study Design
We use a moderately complex study design to illustrate the prior predictive distribution:
```{python}
#| label: study-design-config
#| echo: true
# Study design configuration
config = {
"M": 25, # Number of decision problems
"K": 3, # Number of consequences
"D": 5, # Feature dimensions
"R": 15, # Number of distinct alternatives
"min_alts_per_problem": 2, # Minimum alternatives per problem
"max_alts_per_problem": 5, # Maximum alternatives per problem
"feature_dist": "normal", # Feature distribution
"feature_params": {"loc": 0, "scale": 1}
}
print(f"Study Design Configuration:")
print(f" Decision problems (M): {config['M']}")
print(f" Consequences (K): {config['K']}")
print(f" Feature dimensions (D): {config['D']}")
print(f" Distinct alternatives (R): {config['R']}")
print(f" Alternatives per problem: {config['min_alts_per_problem']}-{config['max_alts_per_problem']}")
```
::: {.callout-note}
## Design Rationale
The dimensions are chosen purposefully:
- **K=3 consequences**: The minimal interesting case for ordered utilities, allowing meaningful intermediate preferences
- **D=5 features**: Provides sufficient variation for diverse feature-to-probability mappings without excessive dimensionality
- **M=25 problems**: Represents a plausible small-study scenario with reasonable statistical power
- **R=15 alternatives**: Creates varied choice sets while keeping computational burden manageable
- **2–5 alternatives per problem**: Reflects realistic variation in typical behavioral experiments
Note that prior predictive distributions depend on study design; different designs may yield different behavioral ranges. Report 4 examines whether this design supports accurate parameter recovery.
:::
```{python}
#| label: generate-study-design
#| output: false
from utils.study_design import StudyDesign
# Create and generate the study design
study = StudyDesign(
M=config["M"],
K=config["K"],
D=config["D"],
R=config["R"],
min_alts_per_problem=config["min_alts_per_problem"],
max_alts_per_problem=config["max_alts_per_problem"],
feature_dist=config["feature_dist"],
feature_params=config["feature_params"],
design_name="prior_analysis"
)
study.generate()
```
```{python}
#| label: design-summary
#| echo: false
# Extract design properties
w = np.array(study.w)
I = np.array(study.I)
N = I.sum(axis=1)
print(f"\nGenerated Study Design:")
print(f" Total alternatives across problems: {I.sum()}")
print(f" Alternatives per problem: min={N.min()}, max={N.max()}, mean={N.mean():.1f}")
print(f" Alternative appearances: min={I.sum(axis=0).min()}, max={I.sum(axis=0).max()}, mean={I.sum(axis=0).mean():.1f}")
```
### Design Characteristics
```{python}
#| label: fig-design-characteristics
#| fig-cap: "Study design characteristics. Left: Number of alternatives in each decision problem. Right: Frequency with which each alternative appears across problems."
M = config["M"]
R = config["R"]
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
# Alternatives per problem
axes[0].bar(range(1, M+1), N, color='steelblue', alpha=0.7)
axes[0].axhline(y=N.mean(), color='red', linestyle='--', label=f'Mean = {N.mean():.1f}')
axes[0].set_xlabel('Decision Problem')
axes[0].set_ylabel('Number of Alternatives')
axes[0].set_title('Alternatives per Problem')
axes[0].legend()
# Alternative frequency
alt_freq = I.sum(axis=0)
axes[1].bar(range(1, R+1), alt_freq, color='coral', alpha=0.7)
axes[1].axhline(y=alt_freq.mean(), color='red', linestyle='--', label=f'Mean = {alt_freq.mean():.1f}')
axes[1].set_xlabel('Alternative')
axes[1].set_ylabel('Frequency (# problems)')
axes[1].set_title('Alternative Appearance Frequency')
axes[1].legend()
plt.tight_layout()
plt.show()
```
## Prior Predictive Simulation Using m_0_sim.stan
We use the simulation model `m_0_sim.stan` to generate prior predictive samples. This ensures that our analysis matches exactly how the model generates data, including all numerical details of the Stan implementation.
```{python}
#| label: run-prior-predictive
#| output: false
from analysis.prior_predictive import PriorPredictiveAnalysis
import tempfile
# Create a temporary output directory for this analysis
output_dir = tempfile.mkdtemp(prefix="prior_pred_")
# Prior predictive configuration
# - 200 parameter configurations: sufficient for central statistics (MC SE ≈ 0.07 at SD=1)
# - 5 choice samples per config: captures choice variability given parameters
# - Total: 1,000 samples for behavioral diagnostics
n_param_samples = 200
n_choice_samples = 5
# Run prior predictive analysis using m_0_sim.stan
analysis = PriorPredictiveAnalysis(
model_path=None, # Uses default m_0_sim.stan
study_design=study,
output_dir=output_dir,
n_param_samples=n_param_samples,
n_choice_samples=n_choice_samples
)
# Run the analysis (this calls Stan)
samples = analysis.run()
```
```{python}
#| label: samples-info
#| echo: false
print(f"Prior Predictive Simulation Complete:")
print(f" Parameter configurations sampled: {samples['param_set'].nunique()}")
print(f" Total samples: {len(samples)}")
print(f" Columns available: {len(samples.columns)}")
```
## Prior Distribution of Parameters
### Sensitivity Parameter ($\alpha$)
The sensitivity parameter is sampled from a Lognormal(0, 1) prior in `m_0_sim.stan`:
```{python}
#| label: fig-alpha-prior
#| fig-cap: "Prior distribution of the sensitivity parameter α sampled from m_0_sim.stan. The distribution has median ≈ 1 with substantial probability mass on both low sensitivity (random-like choice) and high sensitivity (near-optimal choice)."
# Extract unique alpha values (one per parameter set)
alpha_samples = samples.groupby('param_set')['alpha'].first().values
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
# Histogram with truncation for visualization
alpha_plot = alpha_samples[alpha_samples < 15]
axes[0].hist(alpha_plot, bins=40, density=True, alpha=0.7, color='steelblue', edgecolor='white')
axes[0].axvline(x=np.median(alpha_samples), color='red', linestyle='--', linewidth=2,
label=f'Median = {np.median(alpha_samples):.2f}')
axes[0].axvline(x=1, color='orange', linestyle=':', linewidth=2, label='α = 1')
axes[0].set_xlabel('Sensitivity (α)')
axes[0].set_ylabel('Density')
axes[0].set_title('Prior Distribution of α')
axes[0].legend()
axes[0].set_xlim(0, 15)
# Log scale to see full distribution
axes[1].hist(np.log10(alpha_samples + 0.01), bins=40, density=True, alpha=0.7,
color='steelblue', edgecolor='white')
axes[1].axvline(x=0, color='red', linestyle='--', linewidth=2, label='α = 1')
axes[1].set_xlabel('log₁₀(α)')
axes[1].set_ylabel('Density')
axes[1].set_title('Prior Distribution of α (log scale)')
axes[1].legend()
plt.tight_layout()
plt.show()
# Compute Monte Carlo standard errors
n_alpha = len(alpha_samples)
mc_se_mean = np.std(alpha_samples) / np.sqrt(n_alpha)
mc_se_median = 1.253 * np.std(alpha_samples) / np.sqrt(n_alpha)  # Normal-theory asymptotic SE for the median (approximate for lognormal draws)
print(f"\nAlpha summary statistics (n={n_alpha} samples):")
print(f" Mean: {np.mean(alpha_samples):.2f} (MC SE: {mc_se_mean:.2f})")
print(f" Median: {np.median(alpha_samples):.2f} (MC SE: {mc_se_median:.2f})")
print(f" Std: {np.std(alpha_samples):.2f}")
print(f" 95% interval: [{np.percentile(alpha_samples, 2.5):.2f}, {np.percentile(alpha_samples, 97.5):.2f}]")
```
The lognormal prior covers a wide range of sensitivity values:
- **Low α (< 0.5)**: Near-random choice, weak sensitivity to SEU
- **Moderate α (0.5–3)**: Probabilistic choice with clear SEU preference
- **High α (> 3)**: Near-deterministic choice, strong SEU maximization
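How much prior mass falls in each regime can be computed in closed form: if $\alpha = e^Z$ with $Z \sim \text{Normal}(0, 1)$, then $P(\alpha < c) = \Phi(\log c)$. A quick check with `scipy.stats`, using the cutoffs listed above:
```{python}
#| label: alpha-regime-probs
# Closed-form prior probabilities of the three sensitivity regimes.
import numpy as np
from scipy import stats

prior = stats.lognorm(s=1.0, scale=np.exp(0.0))   # Lognormal(0, 1)
print(f"P(α < 0.5)     = {prior.cdf(0.5):.3f}   (near-random)")
print(f"P(0.5 ≤ α ≤ 3) = {prior.cdf(3.0) - prior.cdf(0.5):.3f}   (moderate)")
print(f"P(α > 3)       = {prior.sf(3.0):.3f}   (near-deterministic)")
```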
### Utility Parameters
```{python}
#| label: fig-utility-prior
#| fig-cap: "Prior distribution of utilities under Dirichlet(1,1) as sampled from m_0_sim.stan. Left: Distribution of the middle utility υ₂. Right: Distribution of utility increments δ."
# Extract unique delta and upsilon values
delta_cols = [col for col in samples.columns if col.startswith('delta[')]
upsilon_cols = [col for col in samples.columns if col.startswith('upsilon[')]
# Get one value per parameter set
param_samples = samples.groupby('param_set').first()
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
# Middle utility distribution (upsilon[2])
if 'upsilon[2]' in param_samples.columns:
upsilon_2 = param_samples['upsilon[2]'].values
axes[0].hist(upsilon_2, bins=40, density=True, alpha=0.7, color='forestgreen', edgecolor='white')
axes[0].axvline(x=0.5, color='red', linestyle='--', linewidth=2, label=f'Mean ≈ 0.5')
axes[0].set_xlabel('Middle Utility (υ₂)')
axes[0].set_ylabel('Density')
axes[0].set_title('Prior Distribution of υ₂')
axes[0].legend()
axes[0].set_xlim(0, 1)
# Delta distribution
if delta_cols:
delta_1 = param_samples[delta_cols[0]].values
delta_2 = param_samples[delta_cols[1]].values if len(delta_cols) > 1 else 1 - delta_1
axes[1].scatter(delta_1, delta_2, alpha=0.3, s=20, c='forestgreen')
axes[1].plot([0, 1], [1, 0], 'r--', linewidth=2, label='δ₁ + δ₂ = 1')
axes[1].set_xlabel('δ₁ (first increment)')
axes[1].set_ylabel('δ₂ (second increment)')
axes[1].set_title('Prior Distribution on Utility Simplex')
axes[1].set_xlim(-0.05, 1.05)
axes[1].set_ylim(-0.05, 1.05)
axes[1].legend()
axes[1].set_aspect('equal')
plt.tight_layout()
plt.show()
```
The symmetric Dirichlet(1,1) prior induces a **uniform distribution** over the simplex of valid utility configurations.
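The increment-to-utility mapping can be made explicit. The sketch below assumes the construction implied by the figure ($\upsilon_1 = 0$, $\upsilon_K = 1$, interior utilities as cumulative sums of $\boldsymbol{\delta}$), under which $\upsilon_2 = \delta_1 \sim \text{Beta}(1, 1)$, i.e., uniform on $[0, 1]$ for $K = 3$:
```{python}
#| label: utility-construction-sketch
# Sketch: ordered utilities from Dirichlet increments (assumed construction).
import numpy as np

rng = np.random.default_rng(1)
K = 3
delta = rng.dirichlet(np.ones(K - 1), size=10_000)   # increments, shape (draws, K-1)
upsilon = np.column_stack([
    np.zeros(len(delta)),        # υ₁ = 0
    np.cumsum(delta, axis=1),    # υ₂, ..., υ_K (the last column is exactly 1)
])
print(f"mean(υ₂) = {upsilon[:, 1].mean():.3f}  (theory: 0.500)")
```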
### Feature-to-Probability Weights ($\boldsymbol{\beta}$)
The matrix $\boldsymbol{\beta} \in \mathbb{R}^{K \times D}$ maps alternative features to (unnormalized) log-probabilities over consequences. In `m_0_sim.stan`, each element is drawn independently from $\text{Normal}(0, \sigma_\beta)$ with $\sigma_\beta = 1$ by default:
```{python}
#| label: fig-beta-prior
#| fig-cap: "Prior distribution of the feature-to-probability weights β sampled from m_0_sim.stan. Left: Distribution of all β coefficients. Right: Heatmap of median β values by position."
# Extract beta coefficients
beta_cols = [col for col in samples.columns if col.startswith('beta[')]
# Get unique values per parameter set
param_samples = samples.groupby('param_set').first()
# Collect all beta values
all_betas = []
for col in beta_cols:
all_betas.extend(param_samples[col].values)
all_betas = np.array(all_betas)
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
# Distribution of all beta coefficients
axes[0].hist(all_betas, bins=50, density=True, alpha=0.7, color='purple', edgecolor='white')
# Overlay theoretical normal
x_range = np.linspace(all_betas.min(), all_betas.max(), 100)
axes[0].plot(x_range, stats.norm.pdf(x_range, 0, 1), 'r-', linewidth=2, label='Normal(0,1)')
axes[0].axvline(x=0, color='gray', linestyle='--', alpha=0.5)
axes[0].set_xlabel('β coefficient value')
axes[0].set_ylabel('Density')
axes[0].set_title('Prior Distribution of β Coefficients')
axes[0].legend()
# Heatmap of median beta values (not absolute value)
K, D = config['K'], config['D']
beta_medians = np.zeros((K, D))
for k in range(K):
for d in range(D):
col = f'beta[{k+1},{d+1}]'
if col in param_samples.columns:
beta_medians[k, d] = np.median(param_samples[col].values)
im = axes[1].imshow(beta_medians, cmap='RdBu_r', aspect='auto', vmin=-1, vmax=1)
axes[1].set_xlabel('Feature Dimension (d)')
axes[1].set_ylabel('Consequence (k)')
axes[1].set_title('Median β by Position')
axes[1].set_xticks(range(D))
axes[1].set_xticklabels([f'{d+1}' for d in range(D)])
axes[1].set_yticks(range(K))
axes[1].set_yticklabels([f'{k+1}' for k in range(K)])
plt.colorbar(im, ax=axes[1], label='Median β')
plt.tight_layout()
plt.show()
print(f"\nBeta coefficient summary:")
print(f" Mean: {np.mean(all_betas):.3f}")
print(f" Std: {np.std(all_betas):.3f}")
print(f" 95% interval: [{np.percentile(all_betas, 2.5):.2f}, {np.percentile(all_betas, 97.5):.2f}]")
```
The prior on $\boldsymbol{\beta}$ is symmetric and weakly informative: it does not favor any particular feature-consequence relationship, allowing the data to determine how features predict consequences. See [Report 2](02_concrete_implementation.qmd) for the interpretation of the $\text{Normal}(0,1)$ prior in terms of odds ratios (log-odds differences up to $\pm 2$, corresponding to probability ratios up to $e^2 \approx 7.4$).
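As a quick reference for this interpretation, the sketch below converts log-odds differences into the corresponding probability ratios (the magnitudes shown are illustrative):
```{python}
#| label: beta-odds-ratio-sketch
# Under the softmax, log(ψ_rk / ψ_rk') = (β_k − β_k')ᵀ x_r, so a log-odds
# difference of c corresponds to a probability ratio of exp(c).
import numpy as np

for log_odds in [0.5, 1.0, 2.0]:
    print(f"log-odds difference {log_odds:.1f} → probability ratio {np.exp(log_odds):.2f}")
```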
### Subjective Probabilities Over Consequences ($\boldsymbol{\psi}$)
The weights $\boldsymbol{\beta}$ interact with alternative features to produce **subjective probabilities** over consequences via the softmax transformation:
$$
\psi_{rk} = \frac{\exp(\boldsymbol{\beta}_k^\top \mathbf{x}_r)}{\sum_{k'=1}^K \exp(\boldsymbol{\beta}_{k'}^\top \mathbf{x}_r)}
$$
We compute $\boldsymbol{\psi}$ for each of the $R$ distinct alternatives from the sampled $\boldsymbol{\beta}$ values.
```{python}
#| label: fig-psi-prior
#| fig-cap: "Mean subjective probability for each distinct alternative and consequence, averaged over all prior draws."
# Compute psi from beta for the R distinct alternatives
# (The psi in Stan's generated quantities is vectorized over problems, causing repetition)
K = config['K']
R = config['R']
D = config['D']
# Feature matrix for distinct alternatives
w_array = np.array(study.w) # Shape: (R, D)
# Get unique parameter sets
param_sets = samples['param_set'].unique()
n_param_samples = len(param_sets)
# For each parameter sample, compute psi for all R distinct alternatives
# Shape: (n_param_samples, R, K)
psi_array = np.zeros((n_param_samples, R, K))
for p_idx, param_set in enumerate(param_sets):
# Get one row from this param_set (parameters are same within a set)
row = samples[samples['param_set'] == param_set].iloc[0]
# Extract beta matrix
beta = np.zeros((K, D))
for k in range(K):
for d in range(D):
col = f'beta[{k+1},{d+1}]'
if col in samples.columns:
beta[k, d] = row[col]
# Compute psi for each distinct alternative
for r in range(R):
logits = beta @ w_array[r] # K-dimensional
psi_array[p_idx, r] = np.exp(logits - np.max(logits)) # numerically stable softmax
psi_array[p_idx, r] /= psi_array[p_idx, r].sum()
# Compute mean psi across samples for each (alternative, consequence)
mean_psi = psi_array.mean(axis=0) # Shape: (R, K)
all_psi_values = psi_array.flatten()
fig, ax = plt.subplots(figsize=(10, 4))
# Heatmap of mean psi for each distinct alternative and consequence
im = ax.imshow(mean_psi.T, cmap='YlOrRd', aspect='auto', vmin=0, vmax=1)
ax.set_xlabel('Distinct Alternative (r)')
ax.set_ylabel('Consequence (k)')
ax.set_title('Mean ψ by Alternative and Consequence')
ax.set_yticks(range(K))
ax.set_yticklabels([f'{k+1}' for k in range(K)])
# Use odd numbers for x-axis labels (1, 3, 5, 7, ...)
xtick_positions = list(range(0, R, 2)) # 0, 2, 4, 6, ... (indices)
ax.set_xticks(xtick_positions)
ax.set_xticklabels([f'{x+1}' for x in xtick_positions]) # 1, 3, 5, 7, ... (labels)
plt.colorbar(im, ax=ax, label='Mean ψ')
plt.tight_layout()
plt.show()
print(f"\nSubjective probability summary (from {n_param_samples} prior parameter draws):")
print(f" Mean ψ: {np.mean(all_psi_values):.3f} (theoretical uniform = {1/K:.3f})")
print(f" Std ψ: {np.std(all_psi_values):.3f}")
print(f" P(ψ < 0.1): {np.mean(all_psi_values < 0.1):.3f}")
print(f" P(ψ > 0.9): {np.mean(all_psi_values > 0.9):.3f}")
```
::: {.callout-important}
## Interpretation of Subjective Probabilities
The vector $\boldsymbol{\psi}_r$ represents the decision-maker's **subjective beliefs** about which consequence will obtain if alternative $r$ is chosen. The Normal(0,1) prior on $\boldsymbol{\beta}$ induces a prior on $\boldsymbol{\psi}$ that:
1. **Centers on uniformity** when features are balanced
2. **Allows extreme beliefs** ($\psi_{rk} \approx 0$ or $\approx 1$) when features strongly predict consequences
3. **Varies across alternatives** based on their feature profiles
This captures the idea that alternatives with different feature profiles may induce quite different probability distributions over consequences.
:::
```{python}
#| label: fig-psi-across-alternatives
#| fig-cap: "Heatmaps showing subjective probability distributions across the R distinct alternatives for six individual prior draws. Each heatmap shows how ψ varies across alternatives (columns) and consequences (rows) under a single parameter configuration."
#| fig-height: 8
# Show how psi varies across distinct alternatives for a few parameter draws
# We already computed psi_array above with shape (n_param_samples, R, K)
fig, axes = plt.subplots(2, 3, figsize=(14, 8))
axes = axes.flatten()
# Select 6 diverse parameter draws
np.random.seed(123)
sample_indices = np.random.choice(n_param_samples, min(6, n_param_samples), replace=False)
for idx, (ax, sample_idx) in enumerate(zip(axes, sample_indices)):
# Get psi matrix for this parameter draw (already computed above)
psi_matrix = psi_array[sample_idx] # Shape: (R, K)
# Heatmap
im = ax.imshow(psi_matrix.T, cmap='YlOrRd', aspect='auto', vmin=0, vmax=1)
ax.set_xlabel('Distinct Alternative (r)')
ax.set_ylabel('Consequence')
ax.set_title(f'Prior Draw {idx+1}', fontsize=11)
ax.set_yticks(range(K))
ax.set_yticklabels([f'k={k+1}' for k in range(K)])
# Show fewer x-ticks for readability
n_xticks = min(8, R)
xtick_positions = np.linspace(0, R-1, n_xticks, dtype=int)
ax.set_xticks(xtick_positions)
ax.set_xticklabels([f'{x+1}' for x in xtick_positions])
# Add a shared colorbar
fig.subplots_adjust(right=0.88)
cbar_ax = fig.add_axes([0.91, 0.15, 0.02, 0.7])
fig.colorbar(im, cax=cbar_ax, label='ψ (subjective probability)')
plt.suptitle('Subjective Probability Distributions Under Individual Prior Draws',
fontsize=13, y=0.98)
plt.tight_layout(rect=[0, 0, 0.88, 0.95])
plt.show()
```
The diversity of $\boldsymbol{\psi}$ profiles across alternatives—driven by variation in $\boldsymbol{\beta}$—is what allows the model to capture differences in how decision-makers perceive distinct alternatives.
## Expected Utilities Under the Prior
The expected utilities $\eta_r = \boldsymbol{\psi}_r^\top \boldsymbol{\upsilon}$ depend on both the subjective probabilities ($\boldsymbol{\psi}$, derived from $\boldsymbol{\beta}$) and the utility vector ($\boldsymbol{\upsilon}$).
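With the endpoint convention assumed above ($\upsilon_1 = 0$, $\upsilon_3 = 1$ for $K = 3$), this reduces to $\eta_r = \psi_{r2}\,\upsilon_2 + \psi_{r3}$, so each $\eta_r$ is bounded between $\psi_{r3}$ and $1 - \psi_{r1}$. A minimal sketch with illustrative draws:
```{python}
#| label: eta-structure-sketch
# Sketch: η is bilinear in (ψ, υ); the toy draws below are illustrative.
import numpy as np

rng = np.random.default_rng(2)
psi_r = rng.dirichlet(np.ones(3))     # a toy subjective probability vector
upsilon_2 = rng.uniform()             # middle utility, uniform under Dirichlet(1,1)
eta_r = psi_r @ np.array([0.0, upsilon_2, 1.0])
assert psi_r[2] <= eta_r <= 1 - psi_r[0]
print(f"ψ_r = {np.round(psi_r, 3)}, υ₂ = {upsilon_2:.3f}, η_r = {eta_r:.3f}")
```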
```{python}
#| label: fig-expected-utilities
#| fig-cap: "Distribution of expected utilities across a subset of decision problems. Each boxplot shows the distribution of η for alternatives in that problem across prior samples."
# Extract problem_etas columns
eta_cols = [col for col in samples.columns if 'problem_etas' in col]
# Get data for first 10 problems
n_problems_to_show = min(10, config["M"])
fig, axes = plt.subplots(2, 5, figsize=(15, 8))
axes = axes.flatten()
for m in range(n_problems_to_show):
ax = axes[m]
# Get columns for this problem (1-indexed in Stan)
problem_cols = [col for col in eta_cols if f'problem_etas[{m+1},' in col]
# Filter out padding values (very negative)
valid_data = []
valid_labels = []
for col in problem_cols:
data = samples[col].values
if np.median(data) > -1e9: # Not padding
valid_data.append(data)
alt_idx = col.split(',')[1].rstrip(']')
valid_labels.append(alt_idx)
if valid_data:
bp = ax.boxplot(valid_data, patch_artist=True)
for patch in bp['boxes']:
patch.set_facecolor('steelblue')
patch.set_alpha(0.7)
ax.set_xticklabels(valid_labels)
ax.set_title(f'Problem {m+1}')
ax.set_xlabel('Alternative')
ax.set_ylabel('η')
ax.set_ylim(0, 1)
plt.suptitle('Prior Distribution of Expected Utilities by Problem', fontsize=14)
plt.tight_layout()
plt.show()
```
## SEU Maximizer Selection
A key diagnostic is the **probability of selecting the SEU-maximizing alternative** under the prior. The simulation model `m_0_sim.stan` tracks this directly via the `selected_seu_max` variable.
::: {.callout-note}
## The Random Baseline
For each decision problem with $N_m$ alternatives, a random chooser would select the SEU maximizer with probability $1/N_m$ (assuming a unique maximizer). We use this as a baseline: values above $1/N_m$ indicate that the prior allows for above-chance SEU-maximizing behavior. The baseline varies across problems because problems have different numbers of alternatives.
:::
### By Decision Problem
```{python}
#| label: fig-seu-max-by-problem
#| fig-cap: "Probability of selecting the SEU-maximizing alternative for each decision problem, computed from m_0_sim.stan samples. The coral bars show the random choice baseline (1/N for each problem)."
# Extract SEU maximizer selection for each problem
seu_max_cols = [col for col in samples.columns if 'selected_seu_max' in col and 'total' not in col]
M = config["M"]
prob_seu_max = np.zeros(M)
for m in range(M):
col = f'selected_seu_max[{m+1}]'
if col in samples.columns:
prob_seu_max[m] = samples[col].mean()
# Random baseline
random_baseline = 1.0 / N
fig, ax = plt.subplots(figsize=(12, 5))
x = np.arange(1, M+1)
width = 0.35
bars1 = ax.bar(x - width/2, prob_seu_max, width, label='Prior Predictive',
color='steelblue', alpha=0.7)
bars2 = ax.bar(x + width/2, random_baseline, width, label='Random Choice Baseline',
color='coral', alpha=0.7)
ax.set_xlabel('Decision Problem')
ax.set_ylabel('P(SEU Maximizer Selected)')
ax.set_title('Probability of Selecting SEU Maximizer by Problem')
ax.set_xticks(x)
ax.legend()
ax.set_ylim(0, 1)
plt.tight_layout()
plt.show()
print(f"\nSummary:")
print(f" Mean P(SEU max): {prob_seu_max.mean():.3f}")
print(f" Mean random baseline: {random_baseline.mean():.3f}")
print(f" Ratio (observed/random): {prob_seu_max.mean() / random_baseline.mean():.2f}x")
```
### Total SEU Maximizers Selected
```{python}
#| label: fig-total-seu-max
#| fig-cap: "Distribution of the total number of SEU maximizers selected (out of 25 problems) across prior samples. This is computed directly by m_0_sim.stan."
# Get total SEU max selected
if 'total_seu_max_selected' in samples.columns:
total_seu_max = samples['total_seu_max_selected'].values
else:
# Compute from individual columns
total_seu_max = np.zeros(len(samples))
for m in range(M):
col = f'selected_seu_max[{m+1}]'
if col in samples.columns:
total_seu_max += samples[col].values
fig, ax = plt.subplots(figsize=(10, 5))
ax.hist(total_seu_max, bins=range(0, M+2), density=True, alpha=0.7,
color='steelblue', edgecolor='white', align='left')
# Expected under random choice
expected_random = sum(1/n for n in N)
ax.axvline(x=expected_random, color='coral', linestyle='--', linewidth=2,
label=f'Expected (random): {expected_random:.1f}')
ax.axvline(x=total_seu_max.mean(), color='red', linestyle='-', linewidth=2,
label=f'Prior mean: {total_seu_max.mean():.1f}')
ax.set_xlabel(f'Number of SEU Maximizers Selected (out of {M})')
ax.set_ylabel('Density')
ax.set_title('Prior Predictive Distribution of Total SEU Maximizer Selections')
ax.legend()
plt.tight_layout()
plt.show()
print(f"\nTotal SEU Maximizer Selection Summary:")
print(f" Mean: {total_seu_max.mean():.1f}")
print(f" Std: {total_seu_max.std():.1f}")
print(f" Min: {total_seu_max.min():.0f}")
print(f" Max: {total_seu_max.max():.0f}")
print(f" Expected under random: {expected_random:.1f}")
```
### Relationship with $\alpha$
```{python}
#| label: fig-seu-max-vs-alpha
#| fig-cap: "Relationship between sensitivity α and the proportion of SEU maximizers selected. Higher α leads to more consistent SEU maximization, as predicted by the theoretical properties from Report 1."
# Get alpha and total SEU max for each sample
alpha_per_sample = samples['alpha'].values
seu_max_per_sample = total_seu_max / M # Proportion
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
# Scatter plot
axes[0].scatter(alpha_per_sample, seu_max_per_sample, alpha=0.2, s=15, c='steelblue')
axes[0].set_xlabel('Sensitivity (α)')
axes[0].set_ylabel('Proportion SEU Max Selected')
axes[0].set_title('SEU Maximizer Selection vs. Sensitivity')
axes[0].set_xlim(0, min(15, np.percentile(alpha_per_sample, 99)))
# Binned averages
alpha_bins = [0, 0.25, 0.5, 1, 2, 3, 5, 10, 100]
bin_centers = []
prop_means = []
prop_stds = []
for i in range(len(alpha_bins) - 1):
mask = (alpha_per_sample >= alpha_bins[i]) & (alpha_per_sample < alpha_bins[i+1])
if mask.sum() > 5:
bin_centers.append((alpha_bins[i] + alpha_bins[i+1])/2)
prop_means.append(seu_max_per_sample[mask].mean())
prop_stds.append(seu_max_per_sample[mask].std())
axes[1].errorbar(bin_centers, prop_means, yerr=prop_stds, fmt='o-',
color='steelblue', linewidth=2, markersize=8, capsize=5)
axes[1].axhline(y=1/np.mean(N), color='coral', linestyle='--',
label=f'Random baseline: {1/np.mean(N):.2f}')
axes[1].set_xlabel('Sensitivity (α)')
axes[1].set_ylabel('Mean Proportion SEU Max Selected')
axes[1].set_title('SEU Maximizer Selection vs. α (Binned)')
axes[1].set_xscale('log')
axes[1].legend()
axes[1].set_ylim(0, 1)
plt.tight_layout()
plt.show()
```
This plot empirically confirms **Property 1 (Monotonicity)** from Report 1: as $\alpha$ increases, the probability of selecting SEU-maximizing alternatives increases.
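The mechanism is easy to isolate. Assuming the softmax choice rule $\boldsymbol{\chi} \propto \exp(\alpha \boldsymbol{\eta})$, a single toy problem shows the monotone effect of $\alpha$ along with the random ($\alpha \to 0$) and near-deterministic (large $\alpha$) limits:
```{python}
#| label: monotonicity-sketch
# Sketch: P(SEU max) as a function of α for one toy problem (η values illustrative).
import numpy as np

eta = np.array([0.3, 0.5, 0.7])                  # toy expected utilities
for alpha in [0.0, 0.5, 1.0, 5.0, 25.0]:
    chi = np.exp(alpha * eta - (alpha * eta).max())
    chi /= chi.sum()
    print(f"α = {alpha:5.1f} → P(SEU max) = {chi[eta.argmax()]:.3f}")
```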
## Prior Predictive Checks: Are the Priors Reasonable?
Based on the prior predictive analysis using `m_0_sim.stan`, we can assess whether the default priors produce sensible behaviors:
### ✓ The Prior Covers the Full Range of Behaviors
```{python}
#| label: behavior-range-check
# Summarize the range of behaviors
prop_seu_max_by_sample = total_seu_max / M
print("Proportion of SEU Maximizers Selected (across prior samples):")
print(f" Minimum: {prop_seu_max_by_sample.min():.2%}")
print(f" 5th percentile: {np.percentile(prop_seu_max_by_sample, 5):.2%}")
print(f" Median: {np.median(prop_seu_max_by_sample):.2%}")
print(f" 95th percentile: {np.percentile(prop_seu_max_by_sample, 95):.2%}")
print(f" Maximum: {prop_seu_max_by_sample.max():.2%}")
```
The prior places substantial probability on:
- **Near-random choice** (~20–30% SEU max selection)
- **Moderate SEU-sensitivity** (~40–60% SEU max selection)
- **Strong SEU-maximization** (~80–100% SEU max selection)
### ✓ No Pathological Behaviors
The prior does not generate:
- Negative expected utilities (impossible given $\upsilon \in [0,1]$)
- Degenerate choice probabilities (softmax always assigns positive probability)
- Systematic biases toward particular alternatives (symmetric prior on $\boldsymbol{\beta}$)
### ✓ Alignment with Theoretical Properties
The simulations confirm the three properties from Report 1:
1. **Monotonicity**: Higher $\alpha$ → higher P(SEU max selection) ✓
2. **Perfect rationality limit**: Very high $\alpha$ → near-deterministic SEU max selection ✓
3. **Random choice limit**: Very low $\alpha$ → near-uniform choice ✓
## Prior Hyperparameter Sensitivity
::: {.callout-note}
## Scope Limitation
This report examines the prior predictive distribution under the **default** hyperparameters: $\text{Lognormal}(0, 1)$ for $\alpha$, $\text{Normal}(0, 1)$ for $\boldsymbol{\beta}$, and $\text{Dirichlet}(1, \ldots, 1)$ for $\boldsymbol{\delta}$. A complete prior sensitivity analysis—varying these hyperparameters to examine how the prior predictive distribution responds—is beyond scope here but is an important consideration for applied work.
The most impactful hyperparameter is the scale of the lognormal prior on $\alpha$. Reducing the scale (e.g., $\text{Lognormal}(0, 0.5)$) would concentrate mass on moderate sensitivity and reduce the probability of extreme near-random or near-deterministic behaviors. Increasing the scale (e.g., $\text{Lognormal}(0, 2)$) would spread mass more widely, placing more probability on extreme behaviors. Practitioners should adjust this based on domain knowledge about the expected range of sensitivity in their application; the quantile comparison after this note illustrates the effect of the scale.
:::
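For concreteness, the exact lognormal quantiles can be compared across the three scales mentioned in the note (a sketch of prior implications, not a fitted quantity):
```{python}
#| label: alpha-scale-sketch
# Central 95% prior intervals for α under alternative lognormal scales.
from scipy import stats

for scale in [0.5, 1.0, 2.0]:
    prior = stats.lognorm(s=scale, scale=1.0)   # Lognormal(0, scale); median = 1
    lo, hi = prior.ppf([0.025, 0.975])
    print(f"Lognormal(0, {scale}): 95% interval [{lo:.2f}, {hi:.2f}]")
```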
## Detecting Prior-Data Conflict
The prior predictive analysis establishes what behaviors are *a priori* plausible. When data arrives, we can check for **prior-data conflict**—situations where the observed data lies in a region assigned low prior predictive probability. Key indicators would include:
- **SEU maximizer selection rate** outside the prior predictive 95% interval (approximately 20%–90% under default priors)
- **Choice entropy** substantially higher or lower than the prior predictive range
- **Observed $\hat{\alpha}$** falling in the extreme tails of the Lognormal(0,1) prior (e.g., $\alpha < 0.05$ or $\alpha > 20$)
Such conflicts would suggest either (a) the priors are poorly calibrated for the application domain, or (b) the model is misspecified. Either warrants further investigation before trusting posterior inferences.
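A simple operational check is to locate the observed statistic within the prior predictive samples. The sketch below uses a hypothetical observed SEU-maximizer selection rate (`observed_rate` is illustrative, not data from this study):
```{python}
#| label: conflict-check-sketch
# Sketch: prior predictive tail probability for a hypothetical observed rate.
import numpy as np

observed_rate = 0.95                  # hypothetical; a real check uses the data
prior_rates = total_seu_max / M       # prior predictive rates computed above
tail_prob = min((prior_rates >= observed_rate).mean(),
                (prior_rates <= observed_rate).mean())
print(f"95% prior predictive interval: "
      f"[{np.percentile(prior_rates, 2.5):.2f}, {np.percentile(prior_rates, 97.5):.2f}]")
print(f"Tail probability of observed rate: {tail_prob:.3f}")
```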
## Summary
The prior predictive analysis using `m_0_sim.stan` reveals that the default priors for model m_0 produce a sensible range of choice behaviors:
| Aspect | Finding |
|--------|---------|
| Sensitivity ($\alpha$) | Lognormal(0,1) covers low, moderate, and high sensitivity |
| SEU max selection | Prior mean ~50%, ranging from ~20% to ~100% |
| Choice probabilities | Neither too concentrated nor too diffuse |
| Theoretical alignment | All three fundamental properties confirmed empirically |
The prior is **weakly informative**—it does not strongly constrain parameters but rules out implausible behaviors. This is appropriate for a default prior when we have limited domain knowledge about the specific decision context.
::: {.callout-note}
## Design Considerations
The study design used here (M=25, K=3, D=5, R=15) is moderately complex. Key features:
- **25 decision problems**: Provides reasonable statistical power
- **5 feature dimensions**: Allows rich feature-to-probability mapping
- **15 alternatives**: Creates varied choice sets across problems
- **2–5 alternatives per problem**: Realistic range for typical experiments
The [next report](04_parameter_recovery.qmd) examines whether this design supports accurate parameter recovery.
:::
```{python}
#| label: cleanup
#| include: false
# Clean up temporary directory
import shutil
import logging
try:
shutil.rmtree(output_dir)
except OSError as e:
logging.warning(f"Failed to clean up temporary directory {output_dir}: {e}")
```
## References
::: {#refs}
:::