---
title: "Hierarchical Formulation of Multi-Cell SEU Sensitivity"
subtitle: "Foundational Report 8"
description: |
Abstract formulation of the hierarchical SEU sensitivity model. Extends the
single-agent softmax framework to multi-cell experimental designs where
sensitivity varies systematically across cells via regression on log(α).
categories: [foundations, theory, h_m01]
execute:
cache: true
---
```{python}
#| label: setup
#| include: false
import sys
import os
sys.path.insert(0, os.path.join(os.getcwd(), '..'))
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import softmax
from scipy.stats import norm, lognorm
```
## Introduction
[Report 1](01_abstract_formulation.qmd) and [Report 2](02_concrete_implementation.qmd) established the single-agent SEU sensitivity model: a sensitivity parameter $\alpha$ governs how consistently an agent's choices track expected utility via the softmax rule. [Reports 5–7](05_adding_risky_choices.qmd) extended the parameter space to accommodate risky alternatives, separate sensitivity, and proportional sensitivity. This report extends the framework in a different direction: from single-agent to **multi-cell experimental designs**.
The motivation for this extension is twofold:
1. **Prospective (primary).** The natural next step for this research program is a large-scale factorial experiment — the alignment study — crossing multiple LLMs with multiple prompt conditions (6 models × 3 prompts = 18 cells). Analyzing such a design requires a model that can estimate cell-level sensitivity within a single unified framework, with formal regression coefficients quantifying how experimental factors shift $\alpha$. Independent per-cell fits would not provide the formal effect estimates, partial pooling, or principled shrinkage that a study of this scale demands.
2. **Retrospective.** The 2×2 factorial synthesis ([Report](../applications/factorial_synthesis/01_factorial_synthesis.qmd)) already demonstrated the limitations of the independent-fits approach. That report explicitly noted: "A more statistically coherent analysis would fit a single hierarchical model with LLM, task, and temperature as factors." The factorial pilot data could itself be re-analyzed under the hierarchical framework, but the primary impetus was the prospective alignment study.
This report presents a hierarchical model — `h_m01` — that enables formal inference about how experimental factors (e.g., LLM identity, prompt framing, task domain) affect SEU sensitivity, within a single unified Bayesian analysis.
::: {.callout-note}
## Why Hierarchical?
When data are gathered across $J$ experimental cells (e.g., different LLMs, different prompt conditions), fitting each cell independently discards information. A hierarchical model:
1. **Shares statistical strength**: Cells with less data borrow from the group-level estimate
2. **Enables formal comparisons**: Regression coefficients directly quantify how factors shift sensitivity
3. **Provides principled shrinkage**: Extreme per-cell estimates are regularized toward the group mean
:::
## From Single-Agent to Multi-Cell Designs
Recall the single-agent model from [Report 1](01_abstract_formulation.qmd): a single sensitivity parameter $\alpha$ governs how choices track expected utility via the softmax rule $P(\text{choose } r) \propto \exp(\alpha \cdot \eta_r)$. All $M$ observations are generated by one agent with one $\alpha$.
Now consider the experimental design setting: $J$ cells indexed by $j \in \{1, \ldots, J\}$, each with its own population of observations. Cells correspond to distinct experimental conditions — for example, a particular LLM under a particular prompt. Each cell $j$ has its own sensitivity parameter $\alpha_j$.
The key question: how should we relate the $\alpha_j$ across cells?
Three possible approaches are:
| Approach | Model | Strengths | Weaknesses |
|----------|-------|-----------|------------|
| Independent fits | Fit `m_01` separately per cell | Simple, no assumptions about cross-cell structure | No information sharing; between-cell comparisons are post-hoc |
| Exchangeable hierarchy | $\alpha_j \sim \text{LogNormal}(\mu, \sigma)$ | Partial pooling, shrinkage | No formal link to experimental design; effects estimated post-hoc |
| Regression hierarchy | $\log(\alpha_j) = \gamma_0 + \mathbf{X}_j \boldsymbol{\gamma} + \sigma_{\text{cell}} \cdot z_j$ | Formal effect estimates; partial pooling; directly interpretable coefficients | Requires specifying a design matrix |
: Approaches to multi-cell sensitivity estimation. {#tbl-approaches}
The `h_m01` model takes the **regression hierarchy** approach, which nests the other two as special cases: setting $P = 0$ (no predictors) recovers an exchangeable hierarchy, while $\sigma_{\text{cell}} \to \infty$ effectively recovers independent fits.
## Notation for the Hierarchical Model
::: {.callout-note}
## Notation Summary (Hierarchical Extension)
| Symbol | Description |
|--------|-------------|
| $J$ | Number of experimental cells |
| $j \in \{1, \ldots, J\}$ | Cell index |
| $\alpha_j$ | Sensitivity parameter for cell $j$ |
| $\mathbf{X} \in \mathbb{R}^{J \times P}$ | Design matrix (no intercept column) |
| $\gamma_0 \in \mathbb{R}$ | Intercept of log-$\alpha$ regression |
| $\boldsymbol{\gamma} \in \mathbb{R}^P$ | Regression coefficients |
| $\sigma_{\text{cell}} \geq 0$ | Cell-level residual SD |
| $z_j \sim \mathcal{N}(0, 1)$ | Standardized cell deviations |
| $\boldsymbol{\beta}_j \in \mathbb{R}^{K \times D}$ | Cell-specific feature-to-probability mapping |
| $\boldsymbol{\delta} \in \Delta^{K-2}$ | Shared utility increments (unchanged from m_01) |
| $M_{\text{total}}$ | Total observations across all cells |
| $\text{cell}[m] \in \{1, \ldots, J\}$ | Cell membership for observation $m$ |
**Unchanged from Reports 01–02:** $K$ (consequences), $D$ (feature dimensions), $R$ (distinct alternatives), $\mathbf{w}_r$ (feature vectors), $I_{m,r}$ (availability indicators), $y_m$ (observed choices), $\boldsymbol{\psi}_r$ (subjective probabilities), $\boldsymbol{\upsilon}$ (utilities), $\eta_r$ (expected utilities).
:::
## The Regression on Log-Sensitivity {#sec-regression}
This section presents the core theoretical contribution: a regression structure that links cell-level sensitivity to experimental factors.
The hierarchical structure is:
$$
\log(\alpha_j) = \gamma_0 + \mathbf{X}_j \boldsymbol{\gamma} + \sigma_{\text{cell}} \cdot z_j, \quad z_j \sim \mathcal{N}(0, 1)
$$ {#eq-regression}
Equivalently:
$$
\alpha_j = \exp\!\bigl(\gamma_0 + \mathbf{X}_j \boldsymbol{\gamma} + \sigma_{\text{cell}} \cdot z_j\bigr)
$$ {#eq-alpha-exp}
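As a generative sketch, @eq-alpha-exp can be simulated directly in a few lines. The design matrix, $\gamma$ values, and $\sigma_{\text{cell}}$ below are illustrative choices, not estimates:

```{python}
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2x2 factorial: J = 4 cells, P = 2 treatment-coded predictors
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
gamma0 = 3.0                    # intercept: log-sensitivity of the reference cell
gamma = np.array([-0.5, 0.3])   # hypothetical effects on log(alpha)
sigma_cell = 0.3                # residual cell-level SD

z = rng.standard_normal(X.shape[0])              # z_j ~ N(0, 1)
log_alpha = gamma0 + X @ gamma + sigma_cell * z  # the regression on log-sensitivity
alpha = np.exp(log_alpha)                        # strictly positive cell sensitivities
```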
### Why the log scale?
The sensitivity parameter $\alpha$ is strictly positive. Working on the log scale maps to the real line, permitting standard normal regression mechanics. The lognormal prior on $\alpha$ in m_01 ($\text{LogNormal}(3.0, 0.75)$) already operates on the log scale — $\gamma_0$ takes the role of the prior mean (3.0) and $\sigma_{\text{cell}}$ plays a role analogous to the prior SD (0.75), but now conditional on predictors.
### Interpretation of $\boldsymbol{\gamma}$
Each $\gamma_p$ is the additive change in $\log(\alpha)$ associated with a unit change in $X_{j,p}$, all else equal; equivalently, $\exp(\gamma_p)$ is the **multiplicative factor** applied to $\alpha$ itself. If $\gamma_p = 0.5$, then a one-unit increase in predictor $p$ multiplies $\alpha$ by $\exp(0.5) \approx 1.65$, i.e., a 65% increase in sensitivity.
### Non-centered parameterization
We parameterize via $z_j \sim \mathcal{N}(0, 1)$ and construct $\log(\alpha_j) = \gamma_0 + \mathbf{X}_j \boldsymbol{\gamma} + \sigma_{\text{cell}} \cdot z_j$, rather than directly placing a normal prior on $\log(\alpha_j)$. These two formulations are mathematically equivalent, but the non-centered form performs better computationally when $\sigma_{\text{cell}}$ is small relative to the data information, because it avoids the "funnel" geometry that causes divergent transitions in the $(\sigma_{\text{cell}}, z_j)$ joint posterior [@carpenter2017].
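The prior-level equivalence of the two parameterizations is easy to check by simulation. This sketch only compares the implied prior draws; the computational advantage of the non-centered form appears during MCMC conditioning on data, not here:

```{python}
import numpy as np

rng = np.random.default_rng(1)
gamma0, sigma_cell = 3.0, 0.3   # hypothetical values
n = 200_000

# Centered: draw log(alpha) directly from its normal prior
centered = rng.normal(gamma0, sigma_cell, size=n)

# Non-centered: draw z ~ N(0, 1), then shift and scale
z = rng.standard_normal(n)
non_centered = gamma0 + sigma_cell * z

# Both target the same N(gamma0, sigma_cell) distribution
print(centered.mean(), non_centered.mean())   # both close to 3.0
print(centered.std(), non_centered.std())     # both close to 0.3
```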
::: {.callout-important}
## Interpreting Regression Coefficients
With a treatment-coded design matrix:
- **$\gamma_0$** is the log-sensitivity of the reference cell (all predictors = 0)
- **$\gamma_p$** is the log-ratio of sensitivity between the treatment and reference level for factor $p$
- **$\exp(\gamma_p)$** is the multiplicative effect on $\alpha$
Example: If $\gamma_0 = 3.0$ and $\gamma_1 = -0.5$ (for an LLM indicator), the reference LLM has median $\alpha \approx 20$ and the treatment LLM has median $\alpha \approx 20 \cdot \exp(-0.5) \approx 12$. The treatment LLM is approximately 40% less sensitive to SEU maximization.
:::
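The arithmetic in the example above can be checked directly:

```{python}
import numpy as np

gamma0, gamma1 = 3.0, -0.5           # values from the example above
alpha_ref = np.exp(gamma0)           # median alpha of the reference LLM, ~20.1
alpha_trt = np.exp(gamma0 + gamma1)  # median alpha of the treatment LLM, ~12.2
ratio = np.exp(gamma1)               # multiplicative effect, ~0.61 (~39% lower)
```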
```{python}
#| label: fig-regression-illustration
#| fig-cap: "Illustration of how regression coefficients map to cell-level sensitivity distributions in a 2×2 factorial design. Left: design matrix structure. Right: resulting log(α) distributions for each cell, showing how γ coefficients shift the location."
fig, axes = plt.subplots(1, 2, figsize=(12, 5), gridspec_kw={'width_ratios': [1, 1.5]})
# Left panel: Design matrix visualization
ax = axes[0]
# 2x2 factorial: LLM (A vs B) × Prompt (neutral vs directive)
cell_labels = ['A × neutral', 'A × directive', 'B × neutral', 'B × directive']
X_example = np.array([
[0, 0], # reference: LLM A, neutral
[0, 1], # LLM A, directive
[1, 0], # LLM B, neutral
[1, 1], # LLM B, directive
])
im = ax.imshow(X_example, cmap='Blues', aspect='auto', vmin=-0.5, vmax=1.5)
ax.set_xticks([0, 1])
ax.set_xticklabels(['$X_1$ (LLM B)', '$X_2$ (directive)'], fontsize=10)
ax.set_yticks(range(4))
ax.set_yticklabels(cell_labels, fontsize=10)
for i in range(4):
    for j_col in range(2):
        ax.text(j_col, i, str(X_example[i, j_col]),
                ha='center', va='center', fontsize=14, fontweight='bold',
                color='white' if X_example[i, j_col] == 1 else 'black')
ax.set_title('Design Matrix $\\mathbf{X}$', fontsize=12)
# Right panel: Resulting log(alpha) distributions
ax = axes[1]
gamma0 = 3.0
gamma1 = -0.5 # LLM B effect
gamma2 = 0.3 # directive prompt effect
sigma_cell = 0.3
x_range = np.linspace(0.5, 5.5, 300)
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728']
for i, (label, x_row) in enumerate(zip(cell_labels, X_example)):
    mu = gamma0 + x_row[0] * gamma1 + x_row[1] * gamma2
    density = norm.pdf(x_range, loc=mu, scale=sigma_cell)
    ax.plot(x_range, density, color=colors[i], linewidth=2, label=f'{label}\n$\\mu = {mu:.1f}$')
    ax.fill_between(x_range, density, alpha=0.15, color=colors[i])
ax.set_xlabel('$\\log(\\alpha_j)$', fontsize=12)
ax.set_ylabel('Density', fontsize=12)
ax.set_title(f'Cell-level $\\log(\\alpha)$ distributions\n($\\gamma_0={gamma0}$, $\\gamma_1={gamma1}$, $\\gamma_2={gamma2}$, $\\sigma_{{\\mathrm{{cell}}}}={sigma_cell}$)', fontsize=11)
ax.legend(fontsize=9, loc='upper right')
ax.set_xlim(0.5, 5.5)
plt.tight_layout()
plt.show()
```
## What Is Shared, What Varies
A key design decision in any hierarchical model is which parameters are shared across cells and which vary. In `h_m01`, these choices are:
| Component | Shared/Varies | Rationale |
|-----------|:-------------:|-----------|
| $\alpha_j$ (sensitivity) | Varies by cell (via regression) | Primary quantity of interest |
| $\boldsymbol{\beta}_j$ (feature → probability) | Varies by cell | Different LLMs/prompts may form different beliefs |
| $\boldsymbol{\delta}$ (utility increments) | Shared across cells | Consequence structure is constant across cells |
| $\boldsymbol{\upsilon}$ (utilities) | Shared (derived from $\boldsymbol{\delta}$) | Same as above |
| $\mathbf{w}_r$ (features) | Shared | Same alternative pool across all cells |
: Shared vs. cell-specific components in h_m01. {#tbl-shared-varies}
### Cell-specific $\boldsymbol{\beta}_j$
Each cell represents a different agent (LLM) under a different prompt. There is no reason to assume they form identical subjective probabilities from the same features. In fact, differences in $\boldsymbol{\beta}_j$ capture an important part of how LLMs differ — two models may agree on the utility ranking of consequences but disagree about which consequences follow from a given alternative.
### Shared $\boldsymbol{\delta}$
The utility structure represents the consequence rank-ordering. For a fixed task (e.g., insurance triage), the consequences and their relative desirability are the same regardless of which LLM is deciding or what prompt was used. Sharing $\boldsymbol{\delta}$ introduces a useful constraint and improves identification: the shared utility scale provides a common frame of reference against which cell-specific sensitivities and beliefs are estimated.
### Shared alternatives
The experimental design uses a common pool of decision problems across all cells, so that differences in behavior are attributable to the experimental factors, not to different stimuli. This is a standard feature of factorial designs.
## The Stacked Data Structure {#sec-stacked-data}
The hierarchical model requires a specific data format: all $M_{\text{total}}$ observations across all cells are **concatenated** into a single dataset, with a cell-membership vector $\text{cell}[m]$ identifying which cell observation $m$ belongs to.
This stacked structure is more efficient than separate per-cell fits because:
- The shared parameters ($\boldsymbol{\delta}$) are estimated jointly
- The regression structure ($\gamma_0, \boldsymbol{\gamma}, \sigma_{\text{cell}}$) pools information across cells
- A single MCMC run produces joint posterior samples, enabling coherent inference about all parameters simultaneously
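A minimal sketch of assembling the stacked arrays — the per-cell observation counts and choice codes here are hypothetical:

```{python}
import numpy as np

# Hypothetical choice data: one array of observed choices per cell
cell_data = {
    1: np.array([0, 2, 1, 0, 1]),   # cell 1: M_1 = 5 observations
    2: np.array([1, 1, 0]),         # cell 2: M_2 = 3
    3: np.array([2, 0, 0, 1]),      # cell 3: M_3 = 4
}

# Stack all observations and record cell membership cell[m]
y = np.concatenate([cell_data[j] for j in sorted(cell_data)])
cell = np.concatenate([np.full(len(cell_data[j]), j) for j in sorted(cell_data)])

M_total = len(y)   # 12 stacked observations
# cell = [1 1 1 1 1 2 2 2 3 3 3 3]
```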
```{python}
#| label: fig-stacked-data
#| fig-cap: "Schematic of the stacked data structure. Observations from J cells are concatenated into a single data array. The cell[m] vector indexes back to the cell of origin, enabling shared and cell-specific parameter estimation within a single model."
fig, ax = plt.subplots(figsize=(10, 4))
# Draw stacked blocks
J = 4
M_per_cell = [8, 12, 6, 10]
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728']
cell_labels = [f'Cell {j+1}\n($M_{j+1}={M_per_cell[j]}$)' for j in range(J)]
total = sum(M_per_cell)
left = 0
for j in range(J):
    width = M_per_cell[j] / total
    rect = plt.Rectangle((left, 0.3), width, 0.3, facecolor=colors[j], edgecolor='black', linewidth=1.5, alpha=0.7)
    ax.add_patch(rect)
    ax.text(left + width / 2, 0.45, cell_labels[j], ha='center', va='center', fontsize=9, fontweight='bold')
    left += width
# Label the full array
ax.annotate('', xy=(0, 0.25), xytext=(1, 0.25),
arrowprops=dict(arrowstyle='<->', color='black', lw=1.5))
ax.text(0.5, 0.18, f'$M_{{\\mathrm{{total}}}} = {total}$ stacked observations', ha='center', fontsize=11)
# cell[m] vector
ax.text(0.5, 0.72, 'cell[$m$] = [1, 1, ..., 1, 2, 2, ..., 2, 3, 3, ..., 3, 4, 4, ..., 4]',
ha='center', fontsize=10, family='monospace',
bbox=dict(boxstyle='round,pad=0.3', facecolor='lightyellow', edgecolor='gray'))
ax.annotate('', xy=(0.5, 0.65), xytext=(0.5, 0.63),
arrowprops=dict(arrowstyle='->', color='gray', lw=1))
# Arrows from cell vector to blocks
for j in range(J):
    x_center = sum(M_per_cell[:j]) / total + M_per_cell[j] / (2 * total)
    ax.annotate('', xy=(x_center, 0.6), xytext=(x_center, 0.63),
                arrowprops=dict(arrowstyle='->', color=colors[j], lw=1.5))
ax.set_xlim(-0.05, 1.05)
ax.set_ylim(0.05, 0.85)
ax.axis('off')
ax.set_title('Stacked Data Structure for Hierarchical Model', fontsize=13, pad=15)
plt.tight_layout()
plt.show()
```
## Relationship to the Base Model {#sec-reduction}
The hierarchical model `h_m01` nests the base model `m_01` as a special case, and interpolates between complete pooling and independent estimation.
::: {.callout-note appearance="minimal"}
### Proposition (Reduction to m_01)
If $J = 1$ and $P = 0$, then `h_m01` reduces to `m_01` with $\alpha \sim \text{LogNormal}(\gamma_0, \sigma_{\text{cell}})$.
:::
::: {.callout-tip collapse="true"}
### Proof
With $J = 1$, the design matrix $\mathbf{X}$ is empty (no columns), so:
$$
\log(\alpha_1) = \gamma_0 + \sigma_{\text{cell}} \cdot z_1, \quad z_1 \sim \mathcal{N}(0, 1)
$$
This is equivalent to $\log(\alpha_1) \sim \mathcal{N}(\gamma_0, \sigma_{\text{cell}})$, i.e., $\alpha_1 \sim \text{LogNormal}(\gamma_0, \sigma_{\text{cell}})$.
Since there is only one cell and one $\boldsymbol{\beta}_1$, the model reduces to a single-cell softmax choice model — exactly `m_01`.
:::
::: {.callout-note appearance="minimal"}
### Proposition (Independent cells)
As $\sigma_{\text{cell}} \to \infty$, the regression supplies vanishingly little prior information about each $\log(\alpha_j)$, and the per-cell $\alpha_j$ estimates approach those of independent per-cell fits.
:::
The regression hierarchy thus interpolates between **complete pooling** ($\sigma_{\text{cell}} = 0$, all cells have the same $\alpha$ determined by the regression) and **no pooling** ($\sigma_{\text{cell}} \to \infty$, each cell's $\alpha$ is essentially unconstrained). The data determine where along this continuum the model settles — this is the essence of partial pooling [@gelman2007].
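The pooling continuum can be illustrated with a simplified normal-normal approximation on the $\log(\alpha)$ scale. This is a caricature of the actual joint posterior, but it shows how $\sigma_{\text{cell}}$ governs the weight given to the regression prediction versus a cell's own data:

```{python}
import numpy as np

def shrunk_log_alpha(m_hat, se, mu_j, sigma_cell):
    """Posterior mean of log(alpha_j) under a normal-normal approximation:
    a precision-weighted average of the cell's own estimate (m_hat, se)
    and the regression prediction (mu_j, sigma_cell)."""
    w_prior = (1 / sigma_cell**2) / (1 / sigma_cell**2 + 1 / se**2)
    return w_prior * mu_j + (1 - w_prior) * m_hat

mu_j = 3.0   # hypothetical regression prediction for this cell

# A noisy cell (large se) is pulled strongly toward the regression line ...
noisy = shrunk_log_alpha(m_hat=4.0, se=1.0, mu_j=mu_j, sigma_cell=0.3)
# ... a precisely estimated cell barely moves ...
precise = shrunk_log_alpha(m_hat=4.0, se=0.05, mu_j=mu_j, sigma_cell=0.3)
# ... and large sigma_cell approaches the independent (no-pooling) fit.
no_pool = shrunk_log_alpha(m_hat=4.0, se=1.0, mu_j=mu_j, sigma_cell=1e6)
```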
## Properties of the Hierarchical Extension
The three fundamental properties from [Report 1](01_abstract_formulation.qmd) — monotonicity, perfect optimization limit, and random choice limit — continue to hold **within each cell**. Conditional on $\alpha_j$, the choice model for cell $j$ is exactly the softmax model from Report 1. The proofs are identical; no new derivations are needed.
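The three properties can be verified numerically for a single cell, using hypothetical expected utilities:

```{python}
import numpy as np
from scipy.special import softmax

eta = np.array([1.0, 0.5, 0.0])   # hypothetical expected utilities

def choice_probs(alpha):
    """Softmax choice rule: P(choose r) proportional to exp(alpha * eta_r)."""
    return softmax(alpha * eta)

p_random = choice_probs(0.0)    # random-choice limit: uniform [1/3, 1/3, 1/3]
p_mid = choice_probs(5.0)       # monotonicity: ordered as eta is ordered
p_opt = choice_probs(100.0)     # perfect-optimization limit: mass on the argmax
```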
The new structural contribution is the **cross-cell** relationship: the regression structure enables testing hypotheses about what drives sensitivity differences across experimental conditions.
::: {.callout-note}
## Example Testable Hypotheses
1. **LLM main effect**: $\gamma_{\text{LLM}} \neq 0$ — different LLMs have systematically different sensitivity
2. **Prompt main effect**: $\gamma_{\text{prompt}} \neq 0$ — prompt framing shifts sensitivity
3. **Interaction**: A coefficient on the product term — the prompt effect differs by LLM
4. **Heterogeneity**: $\sigma_{\text{cell}} > 0$ — there is residual cell-level variation beyond what the design matrix explains
:::
## Design Matrix Construction {#sec-design-matrix}
The design matrix $\mathbf{X} \in \mathbb{R}^{J \times P}$ encodes the experimental structure. Common coding schemes include:
- **Treatment coding** (dummy/indicator coding): One reference cell; each predictor is an indicator for a treatment level. $\gamma_p$ measures the difference from the reference on the log-$\alpha$ scale. This is the default in `h_m01`.
- **Effect coding**: Coefficients represent deviations from the grand mean rather than from a reference level.
- **Interaction terms**: Products of main-effect columns capture non-additive effects on log-sensitivity.
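For example, in a treatment-coded 2×2 design the interaction column is simply the elementwise product of the two main-effect columns — a sketch, with coding mirroring the factorial examples in this report:

```{python}
import numpy as np

# Main-effect columns for a treatment-coded 2x2 factorial
X_main = np.array([
    [0, 0],   # reference cell
    [0, 1],
    [1, 0],
    [1, 1],
])

# Interaction column: elementwise product of the two main-effect columns
interaction = (X_main[:, 0] * X_main[:, 1]).reshape(-1, 1)
X_full = np.hstack([X_main, interaction])   # J x 3 design matrix
# Only the cell with both treatments gets a 1 in the interaction column
```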
The following figure illustrates a treatment-coded design matrix for a 6-model × 3-prompt factorial study ($J = 18$ cells, $P = 7$ predictors: 5 model dummies + 2 prompt dummies), with GPT-4o under the neutral prompt as the reference cell.
```{python}
#| label: fig-design-matrix
#| fig-cap: "Treatment-coded design matrix for a 6-model × 3-prompt factorial study (J=18 cells, P=7 predictors). GPT-4o under the neutral prompt is the reference cell (all zeros). Each row corresponds to one experimental cell."
models = ['GPT-4o', 'GPT-4o-mini', 'Claude 3.5S', 'Claude 3.5H', 'Gemini 1.5P', 'Gemini 1.5F']
prompts = ['neutral', 'chain-of-thought', 'role-play']
# Build design matrix: 5 model dummies + 2 prompt dummies
J = len(models) * len(prompts)
P = (len(models) - 1) + (len(prompts) - 1) # 5 + 2 = 7
X = np.zeros((J, P))
cell_labels = []
row = 0
for i, model in enumerate(models):
    for k, prompt in enumerate(prompts):
        # Model dummies (GPT-4o is reference = 0)
        if i > 0:
            X[row, i - 1] = 1
        # Prompt dummies (neutral is reference = 0)
        if k > 0:
            X[row, len(models) - 1 + k - 1] = 1
        cell_labels.append(f'{model} × {prompt}')
        row += 1
fig, ax = plt.subplots(figsize=(10, 8))
im = ax.imshow(X, cmap='Blues', aspect='auto', vmin=-0.3, vmax=1.3)
# Column labels
col_labels = [f'$\\gamma_{{{i+1}}}$\n{models[i+1]}' for i in range(len(models)-1)]
col_labels += [f'$\\gamma_{{{len(models)+k}}}$\n{prompts[k+1]}' for k in range(len(prompts)-1)]
ax.set_xticks(range(P))
ax.set_xticklabels(col_labels, fontsize=8, ha='center')
ax.set_yticks(range(J))
ax.set_yticklabels(cell_labels, fontsize=8)
# Add text values
for i in range(J):
    for j_col in range(P):
        val = int(X[i, j_col])
        color = 'white' if val == 1 else 'gray'
        ax.text(j_col, i, str(val), ha='center', va='center', fontsize=9, color=color, fontweight='bold')
ax.set_title('Treatment-Coded Design Matrix (6 × 3 Factorial)', fontsize=12)
ax.set_xlabel('Predictors ($\\gamma_0$ intercept not shown)', fontsize=11)
# Highlight reference row
ax.axhspan(-0.5, 0.5, color='gold', alpha=0.15)
ax.text(P + 0.3, 0, '← reference', va='center', fontsize=9, color='goldenrod', fontstyle='italic',
clip_on=False)
plt.tight_layout()
plt.show()
```
## Summary
This report extended the single-agent SEU sensitivity framework to multi-cell experimental designs. The key contributions are:
1. **Regression on log-sensitivity.** A hierarchical structure (@eq-regression) formally links cell-level $\alpha_j$ to experimental factors via a design matrix, producing interpretable regression coefficients.
2. **Shared-vs-varying parameter choices.** Sensitivity and beliefs vary by cell; utilities and alternatives are shared — reflecting the structure of factorial experiments where the task is constant but the decision-maker (LLM × prompt) varies.
3. **Nesting of the base model.** The hierarchical framework reduces to `m_01` when $J = 1$ and $P = 0$ (@sec-reduction), and interpolates between complete pooling and independent estimation via $\sigma_{\text{cell}}$.
4. **Preservation of fundamental properties.** The three properties of softmax choice (monotonicity, perfect optimization limit, random choice limit) hold within each cell, unchanged.
5. **Design matrix construction.** Treatment coding provides directly interpretable coefficients for factorial experiments (@sec-design-matrix).
[Report 9](09_hierarchical_implementation.qmd) presents the concrete implementation of this formulation in Stan, detailing the data structure, parameterization, and generated quantities.
::: {.callout-note}
## Connection to Earlier Reports
- [Report 1](01_abstract_formulation.qmd): The three properties of softmax choice continue to hold within each cell
- [Report 2](02_concrete_implementation.qmd): The concrete implementation choices (softmax beliefs, incremental utilities) carry forward in h_m01
- [Reports 5–7](05_adding_risky_choices.qmd): The risky-choice extensions (m_1, m_2, m_3) could in principle also be hierarchicalized; h_m01 extends m_01 specifically
- [Factorial Synthesis](../applications/factorial_synthesis/01_factorial_synthesis.qmd): Retrospective motivation — demonstrated limitations of independent per-cell fits
:::