Hierarchical Prior Predictive Analysis

Foundational Report 10

Categories: foundations, validation, h_m01

Prior predictive validation of the h_m01 hierarchical model at the alignment-study factorial scale (J=18 cells from a 6×3 design). Establishes the tightened priors and verifies that implied cell-level sensitivities, choice distributions, and SEU-maximizer rates are scientifically plausible.

Author: Jeff Helzner

Published: May 12, 2026

0.1 Introduction

Report 9 described the Stan implementation of the hierarchical h_m01 model. Before fitting the model to real alignment-study data we repeat, at the hierarchical scale, the prior-predictive validation we performed for m_0 in Report 3.

The questions are the same in spirit:

  1. Prior validation: Do the chosen hyperpriors produce sensible cell-level sensitivities?
  2. Model understanding: What range of behaviours is a priori plausible across the 18 experimental cells?
  3. Experimental design: Is the 6 × 3 factorial rich enough to distinguish the regression effects of interest?

Note: The Hierarchical Prior Predictive Distribution

The prior predictive distribution for h_m01 is induced by:

  1. Drawing population hyperparameters: \(\gamma_0 \sim \mathcal{N}(2.5, 0.5)\), \(\boldsymbol{\gamma} \sim \mathcal{N}(0, 0.5)\), \(\sigma_{\text{cell}} \sim \text{half-}\mathcal{N}(0, 0.3)\), \(\boldsymbol{\beta}_j \sim \mathcal{N}(0, 1)\), \(\boldsymbol{\delta} \sim \text{Dirichlet}(1)\).
  2. Deriving cell-level sensitivities via the non-centred parameterisation \(\log\alpha_j = \gamma_0 + \mathbf{x}_j^\top \boldsymbol{\gamma} + \sigma_{\text{cell}}\, z_{\alpha,j}\).
  3. Computing the choice probabilities \(\chi_{j,m}\) for every problem in every cell.
  4. Simulating choices \(y_{j,m} \sim \text{Categorical}(\chi_{j,m})\).

This gives a joint distribution over (i) population-level effects, (ii) cell-level sensitivities, (iii) per-cell \(\boldsymbol{\beta}\)-matrices, and (iv) choice behaviours — before conditioning on any observed data.
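The four steps above can be sketched end-to-end in NumPy. This is a minimal illustration, not the project's `h_m01_sim.stan` code: the inline design-matrix builder and the \(\mathcal{N}(0,1)\) stand-in SEU values are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
J, P, n_alts = 18, 7, 3

# Illustrative treatment-coded design matrix for the 6x3 factorial
# (dummies only; the intercept is absorbed by gamma0).
X = np.zeros((J, P))
for j in range(J):
    llm, prompt = divmod(j, 3)          # last factor varies fastest
    if llm > 0:
        X[j, llm - 1] = 1.0             # columns 0..4: LLM dummies
    if prompt > 0:
        X[j, 5 + prompt - 1] = 1.0      # columns 5..6: prompt dummies

# Step 1: hyperparameter draws under the tightened priors.
gamma0 = rng.normal(2.5, 0.5)
gamma = rng.normal(0.0, 0.5, size=P)
sigma_cell = np.abs(rng.normal(0.0, 0.3))   # half-normal(0, 0.3)

# Step 2: non-centred cell-level sensitivities.
z = rng.normal(size=J)
alpha = np.exp(gamma0 + X @ gamma + sigma_cell * z)

# Steps 3-4: softmax choice probabilities and one simulated choice per
# cell, using stand-in N(0,1) SEU values for a single 3-alternative problem.
seu = rng.normal(size=(J, n_alts))
logits = alpha[:, None] * seu
chi = np.exp(logits - logits.max(axis=1, keepdims=True))
chi /= chi.sum(axis=1, keepdims=True)
y = np.array([rng.choice(n_alts, p=chi[j]) for j in range(J)])
```

Repeating the draw many times yields the joint prior predictive described above.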

0.2 Study Design: The Alignment Factorial

All hierarchical validation reports use the same design: the 6 × 3 factorial that will be applied in the alignment study, treatment-coded relative to a reference cell.

import numpy as np

from utils.study_design_hierarchical import HierarchicalStudyDesign

# Factors: factor 0 = LLM (6 levels), factor 1 = prompt style (3 levels)
# Reference cell (index 0): first LLM × first prompt
X, labels, cell_levels = HierarchicalStudyDesign.treatment_design_matrix(
    factors=[6, 3], reference_indices=[0, 0]
)

print(f"J (cells) = {X.shape[0]}")
print(f"P (regression columns, treatment-coded) = {X.shape[1]}")
print(f"Column labels: {labels}")
print(f"Design-matrix rank: {np.linalg.matrix_rank(X)} "
      f"(= P, so the regression is identified)")
print(f"\nFirst 6 rows of X (cells 0..5, i.e. LLM 0 × prompt 0..2, "
      f"LLM 1 × prompt 0..2):")
print(X[:6])
J (cells) = 18
P (regression columns, treatment-coded) = 7
Column labels: ['f0_lv1', 'f0_lv2', 'f0_lv3', 'f0_lv4', 'f0_lv5', 'f1_lv1', 'f1_lv2']
Design-matrix rank: 7 (= P, so the regression is identified)

First 6 rows of X (cells 0..5, i.e. LLM 0 × prompt 0..2, LLM 1 × prompt 0..2):
[[0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 1.]
 [1. 0. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 1. 0.]
 [1. 0. 0. 0. 0. 0. 1.]]
Note: Treatment Coding Convention

With factors=[6, 3] and reference_indices=[0, 0], the reference cell is (LLM₀, prompt₀). The implicit intercept is absorbed by \(\gamma_0\); the seven treatment dummies in \(\mathbf{x}_j\) toggle on for each non-reference level of each factor. Therefore \(\gamma_0\) = log-sensitivity of the reference cell; \(\gamma_1,\ldots,\gamma_5\) = LLM main effects; \(\gamma_6, \gamma_7\) = prompt main effects. Cells are ordered in Kronecker / row-major form (last factor varies fastest).
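The coding convention can be checked with a hypothetical re-derivation of the matrix (this loop is an illustration of the convention, not the actual `HierarchicalStudyDesign` implementation):

```python
import numpy as np

factors = [6, 3]
J = factors[0] * factors[1]               # 18 cells
P = (factors[0] - 1) + (factors[1] - 1)   # 5 LLM + 2 prompt dummies = 7

X = np.zeros((J, P))
for j in range(J):
    llm, prompt = divmod(j, factors[1])   # prompt index cycles fastest (row-major)
    if llm > 0:
        X[j, llm - 1] = 1.0               # f0_lv1 .. f0_lv5
    if prompt > 0:
        X[j, factors[0] - 1 + prompt - 1] = 1.0   # f1_lv1, f1_lv2
```

Row 0 (the reference cell) is all zeros, and each subsequent row toggles exactly the dummies for its non-reference levels, matching the printed `X[:6]` above.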

From the command line, the prior predictive analysis is executed via scripts/run_hierarchical_prior_predictive.py; inside this report we run the same analysis programmatically, with the tightened priors and the 6 × 3 factorial. Results are cached to disk by Quarto, so the analysis re-runs only when this report changes.

import os
import tempfile

from IPython.display import Image

from analysis.hierarchical_prior_predictive import HierarchicalPriorPredictiveAnalysis

# Build the factorial design (J=18, P=7, treatment-coded).
study = HierarchicalStudyDesign.from_factorial(
    factors=[6, 3],
    reference_indices=[0, 0],
    K=3, D=2, R=10, M_per_cell=20,
    min_alts_per_problem=2, max_alts_per_problem=4,
    feature_dist="normal", feature_params={"loc": 0, "scale": 1},
    design_name="h_m01_prior_analysis",
)
study.generate()

# Tightened hyperparameters established in Report 9.
hyperparams = {
    "gamma0_mean": 2.5,
    "gamma0_sd": 0.5,
    "gamma_sd": 0.5,
    "sigma_cell_sd": 0.3,
    "beta_sd": 1.0,
}

output_dir = tempfile.mkdtemp(prefix="h_m01_prior_")

analysis = HierarchicalPriorPredictiveAnalysis(
    study_design=study,
    output_dir=output_dir,
    n_param_samples=200,
    n_choice_samples=5,
    hyperparams=hyperparams,
)
_ = analysis.run()

def _img(path, width=720):
    """Display a PNG from the prior-predictive output directory."""
    return Image(filename=os.path.join(output_dir, path), width=width)
Hyperparameters used:
  gamma0_mean = 2.5
  gamma0_sd = 0.5
  gamma_sd = 0.5
  sigma_cell_sd = 0.3
  beta_sd = 1.0

Dimensions: J=18, P=7, K=3, D=2, R=10, M_total=360

0.3 Prior Distributions of the Hyperparameters

0.3.1 Grand log-sensitivity \(\gamma_0\) and cell-level noise \(\sigma_{\text{cell}}\)

\(\gamma_0\) is the log-sensitivity of the reference cell. The tightened prior \(\mathcal{N}(2.5, 0.5)\) places 95 % of its mass on \(\gamma_0 \in [1.5, 3.5]\), corresponding to a reference-cell sensitivity \(\alpha_{\text{ref}} \in [4.5, 33]\) — a scientifically reasonable range for “moderately rational” agents.
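The quoted interval is a two-line arithmetic check: exponentiate the central 95 % interval of \(\mathcal{N}(2.5, 0.5)\).

```python
import math

gamma0_mean, gamma0_sd = 2.5, 0.5
lo = gamma0_mean - 1.96 * gamma0_sd   # ≈ 1.52
hi = gamma0_mean + 1.96 * gamma0_sd   # ≈ 3.48
alpha_lo, alpha_hi = math.exp(lo), math.exp(hi)   # roughly 4.6 and 32.5
```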

_img("regression_parameters/gamma0_dist.png")
Figure 1: Prior distribution of \(\gamma_0\) from the simulation block of h_m01_sim.stan.
_img("regression_parameters/sigma_cell_dist.png")
Figure 2: Prior distribution of the cell-level noise SD \(\sigma_\text{cell}\). The half-normal(0, 0.3) keeps the cell-level deviations modest — 95th percentile \(\approx 0.59\) — so that unexplained between-cell variation in \(\log\alpha\) does not swamp the regression signal.

0.3.2 Regression coefficients \(\boldsymbol{\gamma}\)

Each of the seven treatment dummies is given \(\mathcal{N}(0, 0.5)\) independently. On the log-\(\alpha\) scale, a single coefficient of \(0.5\) corresponds to a ~65 % multiplicative increase in \(\alpha\) for cells where the dummy is on (and a coefficient of \(-0.5\) to a ~40 % decrease).
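The multiplicative reading follows directly from exponentiating the coefficient:

```python
import math

# A coefficient of +/-0.5 on the log-alpha scale acts multiplicatively on alpha.
up = math.exp(0.5) - 1.0     # ~0.649: a ~65 % increase when the dummy is on
down = 1.0 - math.exp(-0.5)  # ~0.393: a ~40 % decrease for gamma = -0.5
```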

_img("regression_parameters/gamma_dist.png")
Figure 3: Prior distributions of the seven regression coefficients \(\gamma_1,\ldots,\gamma_7\).
   param   mean    sd    q05   q95
  gamma0  2.503 0.497  1.671 3.324
gamma[1] -0.010 0.503 -0.808 0.779
gamma[2] -0.030 0.489 -0.810 0.786
gamma[3]  0.004 0.496 -0.856 0.794
gamma[4] -0.013 0.506 -0.830 0.820
gamma[5] -0.027 0.499 -0.843 0.767
gamma[6]  0.011 0.487 -0.771 0.812
gamma[7]  0.006 0.481 -0.795 0.812

0.3.3 Utility simplex \(\boldsymbol{\delta}\)

The symmetric Dirichlet(1) prior keeps the shared utility increments \(\boldsymbol{\delta}\) uniform on the simplex, matching the convention of the flat model (Report 3).
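A sketch of the draw, under one loudly flagged assumption: that the induced utilities \(\boldsymbol{\upsilon}\) are cumulative sums of the increments \(\boldsymbol{\delta}\) (a common convention for monotone utilities over ordered outcome levels; the flat model's exact construction is in Report 3).

```python
import numpy as np

rng = np.random.default_rng(1)
K = 3  # outcome levels

# delta ~ Dirichlet(1): uniform over the K-simplex.
delta = rng.dirichlet(np.ones(K))

# ASSUMPTION: utilities as cumulative sums of the increments, so upsilon is
# nondecreasing and tops out at 1.
upsilon = np.cumsum(delta)
```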

_img("utilities/delta_dist.png")
Figure 4: Prior on \(\boldsymbol{\delta}\) (left) and the induced utilities \(\boldsymbol{\upsilon}\) (right).

0.4 Prior Distributions over Cell-Level Sensitivities \(\boldsymbol{\alpha}\)

The real scientific object in h_m01 is the vector of cell-level sensitivities. A priori each \(\alpha_j\) is approximately lognormal around \(e^{\gamma_0} \approx 12\), stretched by the regression effects and cell-level noise.
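This marginal can be checked by Monte Carlo. For a non-reference cell with one LLM dummy and one prompt dummy active (the assumption made here), \(\log\alpha_j\) is a sum of \(\gamma_0\), two coefficients, and the scale-mixture noise term, so its median should sit near \(e^{2.5} \approx 12\) with a 95th percentile in the range reported in the table below.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Marginal prior on log(alpha_j) for a cell with two active dummies:
# gamma0 + gamma_llm + gamma_prompt + sigma_cell * z.
gamma0 = rng.normal(2.5, 0.5, n)
g_llm = rng.normal(0.0, 0.5, n)
g_prompt = rng.normal(0.0, 0.5, n)
sigma = np.abs(rng.normal(0.0, 0.3, n))   # half-normal(0, 0.3)
z = rng.normal(size=n)
alpha = np.exp(gamma0 + g_llm + g_prompt + sigma * z)

med, q95 = np.median(alpha), np.quantile(alpha, 0.95)
```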

_img("cell_alphas/alpha_by_cell.png")
Figure 5: Prior distribution of cell-level \(\alpha_j\) for each of the 18 cells. Dispersion across cells reflects the combined effect of the seven regression dummies and of \(\sigma_\text{cell}\).
Summary of prior marginals for the cell-level \(\alpha_j\) (quantiles on the \(\alpha\) scale):

     cell  median  q05   q95  mean    sd
 alpha[1]   12.31 4.69 30.91 14.59 10.07
 alpha[2]   12.48 3.51 41.17 16.29 13.11
 alpha[3]   12.89 3.66 43.31 16.72 16.61
 alpha[4]   12.03 3.59 44.09 16.38 15.14
 alpha[5]   12.22 2.51 56.22 18.55 19.38
 alpha[6]   11.93 2.58 59.36 19.02 23.11
 alpha[7]   11.77 3.64 42.14 15.80 13.91
 alpha[8]   12.06 2.89 52.96 17.49 17.26
 alpha[9]   11.69 2.84 56.99 18.06 21.96
alpha[10]   12.10 3.36 40.43 16.04 14.01
alpha[11]   12.53 2.95 52.12 18.35 19.77
alpha[12]   12.39 2.86 48.49 18.17 20.12
alpha[13]   12.31 3.31 43.65 16.64 16.53
alpha[14]   11.97 2.61 60.71 18.99 21.80
alpha[15]   11.62 2.62 61.59 19.27 23.99
alpha[16]   12.21 3.24 38.02 15.67 13.18
alpha[17]   12.37 2.64 49.35 17.56 18.77
alpha[18]   12.27 2.79 51.70 18.63 26.94

Key features:

  • Medians are tightly clustered at \(11.6\)–\(12.9\) across all 18 cells — the cells differ only modestly in a priori centrality, as intended given \(\boldsymbol{\gamma} \sim \mathcal{N}(0, 0.5)\).
  • 95th percentiles stay in the \(30\)–\(60\) range; no cell’s prior spills into the “hyper-sharp” regime (\(\alpha > 100\)) where the softmax saturates. This is the key difference from the original, wider priors — under which the upper prior tail routinely exceeded \(140\) and induced divergent transitions during sampling (see Report 11).
  • 90 % intervals are comfortably bounded, covering the behaviourally meaningful range from weak to strong SEU sensitivity without committing to either near-random or near-deterministic choice.

0.5 Prior-Predictive Choice Behaviour

0.5.1 Chosen-alternative distributions

For each parameter draw we simulated \(R = 10\) choices per cell. Aggregating across draws and cells, the prior predicts a balanced but not uniform spread over the shared alternatives, with moderate concentration on the alternatives that happen to dominate the SEU ordering in a given draw.

_img("choices/choice_index_by_cell.png")
Figure 6: Distribution of choice indices across the 18 cells under the prior predictive. Each row is a cell; columns are the (within-problem) alternative indices.

0.5.2 SEU-maximiser selection rates

A particularly interpretable summary is the probability that the chosen alternative is the SEU-maximiser for its problem — i.e. whether the agent acts optimally under its own beliefs and utilities. Under the tightened priors:

Overall P(SEU-maximiser selected) = 0.768
Problems analysed: 360
Total SEU-max selected per simulation: mean=276.5, sd=32.6  (out of 360)

Per-cell mean P(SEU-maximiser):
  cell  1: mean=0.787, sd=0.075
  cell  2: mean=0.750, sd=0.081
  cell  3: mean=0.785, sd=0.067
  cell  4: mean=0.800, sd=0.081
  cell  5: mean=0.746, sd=0.072
  cell  6: mean=0.796, sd=0.071
  cell  7: mean=0.770, sd=0.055
  cell  8: mean=0.768, sd=0.058
  cell  9: mean=0.777, sd=0.066
  cell 10: mean=0.764, sd=0.064
  cell 11: mean=0.772, sd=0.064
  cell 12: mean=0.752, sd=0.076
  cell 13: mean=0.758, sd=0.079
  cell 14: mean=0.761, sd=0.051
  cell 15: mean=0.752, sd=0.055
  cell 16: mean=0.778, sd=0.066
  cell 17: mean=0.766, sd=0.072
  cell 18: mean=0.742, sd=0.072
_img("seu_maximizer_selection/prob_seu_max_by_cell.png")
Figure 7: Per-cell prior probability that the SEU-maximiser is chosen.

The overall rate (~0.77) sits in the intended middle range: high enough to make genuine SEU-maximising behaviour the modal outcome, low enough to leave substantial room for sub-optimal (and, in the alignment context, super-optimal) treatment effects to be detected.
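How the diagnostic is computed can be sketched for a single problem: given the softmax choice probabilities \(\chi\) and the SEU values, P(SEU-maximiser selected) is just \(\chi\) evaluated at the argmax of the SEUs. The SEU scale used here (sd 0.2) is an illustrative assumption, not the project's actual utility scale.

```python
import numpy as np

rng = np.random.default_rng(3)

alpha = 12.0                              # a typical prior-median sensitivity
seu = rng.normal(0.0, 0.2, size=3)        # illustrative SEU values (assumption)
logits = alpha * seu
chi = np.exp(logits - logits.max())       # numerically stable softmax
chi /= chi.sum()

# Probability that the SEU-maximising alternative is the one chosen.
p_seu_max = chi[np.argmax(seu)]
```

Because the softmax is monotone in SEU, this probability is always the largest entry of \(\chi\), and hence at least \(1/|\text{alts}|\).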

Tip: Scientific Interpretation

A prior that concentrated P(SEU-max) near 1.0 would be too committed to optimality — the posterior would struggle to update toward “this LLM is noisy” without a great deal of data. A prior that placed P(SEU-max) near \(1/|\text{alts}| \approx 0.3\) would be effectively uninformative about rationality. The tightened priors leave the entire alignment-relevant range identifiable.

0.6 Comparison with the Pre-Tightening Priors

The original h_m01 defaults were \(\gamma_0 \sim \mathcal{N}(3, 1)\), \(\boldsymbol{\gamma} \sim \mathcal{N}(0, 1)\), \(\sigma_{\text{cell}} \sim \text{half-}\mathcal{N}(0, 0.5)\), and \(\boldsymbol{\beta} \sim \mathcal{N}(0, 1)\). Under those priors the 97.5 th percentile of the prior predictive for \(\alpha\) frequently exceeded \(140\), pushing the softmax into near-deterministic regimes where the likelihood flattens and posterior exploration becomes unstable. The tightened priors (used throughout this and subsequent reports) preserve the scientifically meaningful range while restoring sampler geometry to a regime where HMC can operate reliably.
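The tail comparison is easy to verify for the reference cell, ignoring the regression and cell-noise terms (which widen both tails further):

```python
import math

# 97.5th percentile of alpha_ref implied by the gamma0 prior alone.
old_q975 = math.exp(3.0 + 1.96 * 1.0)   # original N(3, 1): ~143, softmax-saturating
new_q975 = math.exp(2.5 + 1.96 * 0.5)   # tightened N(2.5, 0.5): ~32, well below 100
```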

Note: Provenance and Reproducibility

This report runs HierarchicalPriorPredictiveAnalysis live on every build (Quarto caches the result so it re-runs only when the report itself changes). The same analysis can be reproduced from the command line via

python scripts/run_hierarchical_prior_predictive.py \
    --config configs/h_m01_prior_analysis_config.json

The design uses factors=[6, 3] and reference_indices=[0, 0] (treatment coding), giving \(J = 18\) cells and \(P = 7\) regression columns.

0.7 Summary

The hierarchical prior predictive analysis establishes three things:

  1. Cell-level sensitivities are well-behaved: \(\alpha_j\) medians cluster near 12 across cells, with 95th percentiles below 60 — avoiding the flat-likelihood regime that caused pathological sampling under the original priors.
  2. Implied choice behaviour is scientifically reasonable: the overall SEU-maximiser selection rate ≈ 0.77 leaves room for both sub-optimal and super-optimal treatment effects to be detected.
  3. The treatment-coded design is identified: \(\mathbf{X}\) has full column rank 7, with \(\gamma_0\) plus five LLM dummies and two prompt dummies parameterising the main effects of the \(6 \times 3\) factorial.

Report 11 takes the next step: can we recover the known hyperparameters when we simulate data from this prior and fit h_m01 back to it?

Reuse

Citation

BibTeX citation:
@online{helzner2026,
  author = {Helzner, Jeff},
  title = {Hierarchical {Prior} {Predictive} {Analysis}},
  date = {2026-05-12},
  url = {https://jeffhelzner.github.io/seu-sensitivity/foundations/10_hierarchical_prior_analysis.html},
  langid = {en}
}
For attribution, please cite this work as:
Helzner, Jeff. 2026. “Hierarchical Prior Predictive Analysis.” SEU Sensitivity Project, May 12. https://jeffhelzner.github.io/seu-sensitivity/foundations/10_hierarchical_prior_analysis.html.