PRISM
Policy Reuse via Interpretable Strategy Mapping
A reinforcement learning framework for zero-shot strategy transfer between agents. Train a PPO agent, transfer its strategy to a DQN agent — no retraining required.
The Problem
RL agents spend millions of steps learning strategy, but that knowledge is locked inside continuous neural network weights. You can't read it, extract it, or move it to a different agent.
Training a second agent on the same task means starting from scratch. PRISM fixes that by forcing agents to reason through a shared layer of discrete, interpretable concepts — and using those concepts as a transfer interface.
How It Works
1. Baseline training. Train an RL agent (PPO, DQN, or DAgger) with a CNN encoder that outputs 128-dimensional feature vectors.
2. Concept discovery. Freeze the encoder and run K-means on its features across gameplay episodes. Each board position maps to one of 64 discrete concept IDs.
3. Bottleneck policy. Train a micro-policy on concept IDs only — no raw observations, just integers. Transfer by aligning the two agents' concept spaces via Hungarian matching.
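Step 3's transfer interface can be sketched on toy data: treat each agent's K-means centroids as its concept space, then find the minimum-cost one-to-one correspondence between them with Hungarian matching. Sizes here are toy (the paper uses K=64 concepts and 128-D features), and all variable names are illustrative, not from the PRISM codebase.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
K, D = 8, 16                                     # toy concept count and feature dim

src_centroids = rng.normal(size=(K, D))          # source agent's K-means centroids
true_perm = rng.permutation(K)                   # hidden concept correspondence
# Target centroids: same concepts, relabeled, plus small encoder noise.
tgt_centroids = src_centroids[true_perm] + 0.01 * rng.normal(size=(K, D))

# Cost matrix: squared Euclidean distance between every (source, target) pair.
cost = ((src_centroids[:, None, :] - tgt_centroids[None, :, :]) ** 2).sum(axis=-1)

# Hungarian matching: minimum-cost one-to-one assignment of concepts.
row_ind, col_ind = linear_sum_assignment(cost)

# Map each target concept ID to its matched source concept ID.
tgt_to_src = np.empty(K, dtype=int)
tgt_to_src[col_ind] = row_ind                    # recovers true_perm when noise is small
```

With the mapping in hand, the target agent relabels its concept IDs through `tgt_to_src` and reuses the source's bottleneck policy directly, which is what makes the transfer zero-shot.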
Zero-Shot Transfer Results
Six source→target pairs among three independently trained agents, evaluated against GnuGo on Go 7×7, 10 seeds × 100 games. Random agent baseline: 3.5%. No-alignment baseline: 9.2%.
| Source | Target | Win Rate | Notes |
|---|---|---|---|
| BC | DQN | 76.4% ± 3.4% | Expert-trained source transfers well |
| PPO | DQN | 69.5% ± 3.2% | Strong RL source, functional target |
| DQN | PPO | 49.8% ± 2.6% | Undertrained source, not above 50% |
| DAgger | PPO | 41.5% ± 6.0% | Below 50% — imitation source fails |
| DQN | BC | 38.7% ± 4.9% | Degenerate target encoder |
| PPO | BC | 0.0% | BC encoder collapses — always passes |
Scope
- Go 7×7 — primary domain, CNN encoder, K=64 concepts, action masking
- Atari Breakout — boundary condition: bottleneck collapses to random floor
PRISM requires domains where strategic state is naturally discrete. Continuous-dynamics environments like Breakout fall outside scope — confirmed empirically.
Key Findings
Source quality is what matters
Alignment quality (centroid similarity after matching) does not predict transfer success — R² ≈ 0 across all transfer pairs. The operative variable is whether the source policy is strong. Spend compute on training the source, not on the alignment method.
Concepts are causally real
Overriding a state's concept assignment changes the chosen action 69.4% of the time (p = 8.6 × 10⁻⁸⁶, 2500 interventions). The concepts drive behavior — they're not just correlated with it.
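The intervention protocol can be sketched with a stand-in for the bottleneck policy. Because the micro-policy sees only a discrete concept ID, a deterministic version reduces to a concept-to-action lookup table; the `policy` table and action count below are hypothetical placeholders, not the trained model.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 64          # concept vocabulary size, as in the paper
N_ACTIONS = 50  # illustrative: 7x7 board (49 moves) + pass

# Hypothetical stand-in for the trained bottleneck micro-policy:
# a deterministic map from concept ID to action.
policy = rng.integers(0, N_ACTIONS, size=K)

def intervene_once(rng):
    """Override a state's concept with a random different one; report whether
    the chosen action changed."""
    c = int(rng.integers(K))
    c_new = int(rng.integers(K))
    while c_new == c:                 # force a genuinely different concept
        c_new = int(rng.integers(K))
    return policy[c_new] != policy[c]

flips = [intervene_once(rng) for _ in range(2500)]  # paper ran 2500 interventions
flip_rate = sum(flips) / len(flips)
```

A flip rate well above what correlation alone would produce is the evidence that concepts causally drive the action choice rather than merely co-occurring with it.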
Frequency ≠ importance
C47 (most-used at 33% of states) causes only a 9.4% win-rate drop when ablated. C16 (15.4% frequency) collapses win rate from 100% to 51.8%. Concept usage does not predict strategic importance.
K tradeoff
Transfer win rate peaks at K=32 (76% in the ablation sweep). Direct performance favors K=64 at full training. Lower K generalizes better for transfer; higher K captures finer-grained structure for direct play.
Read the Paper
13 pages covering the three-stage pipeline, causal intervention, concept ablation, alignment method comparison, fine-tuning curves, K sensitivity sweep, and the Atari Breakout boundary condition.