I replicated David Ng's RYS method (https://dnhkng.github.io/posts/rys/) on consumer AMD GPUs
(RX 7900 XT + RX 6950 XT) and found something I didn't expect.
Transformers appear to have discrete "reasoning circuits" — contiguous blocks of 3-4 layers that
act as indivisible cognitive units. Duplicate the right block and the model runs its reasoning
pipeline twice. No weights change. No training. The model just thinks longer.
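To make "duplicate the right block" concrete, here is a minimal sketch of what the routing amounts to. It assumes layer indices are 0-based and inclusive, and that a repeated index reuses the same layer object (so no weights are copied). The names `duplicate_block` and `run_route` and the 8-layer toy stack are hypothetical, not taken from the repo.

```python
def duplicate_block(n_layers, start, end, extra_passes=1):
    """Execution order in which layers start..end (inclusive) run extra_passes more times."""
    if not 0 <= start <= end < n_layers:
        raise ValueError("block must lie inside the layer stack")
    if extra_passes < 0:
        raise ValueError("extra_passes must be non-negative")
    block = list(range(start, end + 1))
    # Run up to the end of the block, replay the block, then continue as normal.
    return list(range(end + 1)) + block * extra_passes + list(range(end + 1, n_layers))


def run_route(hidden, layers, route):
    """Apply layers in route order; a repeated index reuses the same layer object."""
    for idx in route:
        hidden = layers[idx](hidden)
    return hidden


# Toy stand-in for a model (assumption): each "layer" only records that it ran.
toy_layers = [lambda trace, i=i: trace + [i] for i in range(8)]
route = duplicate_block(n_layers=8, start=2, end=4)
trace = run_route([], toy_layers, route)
print(trace)
```

The route is longer than the layer list, but it only ever refers to the original eight layers, which is the sense in which no weights change.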
The results on standard benchmarks (lm-evaluation-harness, n=50):
Devstral-24B, layers 12-14 duplicated once:
- BBH Logical Deduction: 0.22 → 0.76
- GSM8K (strict): 0.48 → 0.64
- MBPP (code gen): 0.72 → 0.78
- None of the benchmarks I ran degraded
Qwen2.5-Coder-32B, layers 7-9 duplicated once:
- Reasoning probe: 76% → 94%
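For a sense of scale at n=50, here is a rough sketch of the sampling noise on the Devstral deltas above. It assumes each benchmark item is an independent pass/fail trial and treats the before and after runs as independent samples. Both are assumptions, since the post does not say how the items were drawn or scored. The Qwen probe is left out because its sample size is not stated.

```python
import math


def binomial_se(acc, n):
    """Standard error of an accuracy estimated from n independent pass/fail items."""
    return math.sqrt(acc * (1.0 - acc) / n)


def delta_with_se(before, after, n):
    """Accuracy change and its standard error, treating the two runs as independent."""
    delta = after - before
    se = math.sqrt(binomial_se(before, n) ** 2 + binomial_se(after, n) ** 2)
    return delta, se


N = 50  # items per task, as stated in the post
reported = {
    "BBH Logical Deduction": (0.22, 0.76),
    "GSM8K (strict)": (0.48, 0.64),
    "MBPP (code gen)": (0.72, 0.78),
}
for task, (before, after) in reported.items():
    delta, se = delta_with_se(before, after, N)
    print(f"{task}: delta={delta:+.2f}  se={se:.3f}  delta/se={delta / se:.1f}")
```

If both runs used the same 50 items, a paired comparison would be the better tool and would usually give a tighter estimate than this unpaired one.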
The weird part: different duplication patterns create different cognitive "modes" from the same
weights. Double-pass boosts math. Triple-pass boosts emotional reasoning. Interleaved doubling
(13,13,14,14,15,15,16) creates a pure math specialist. Same model, same VRAM, different routing.
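The patterns above can be written down as plain routing lists. This is a sketch under two assumptions: the pattern replaces the contiguous span of layers it covers, and every layer outside that span runs exactly once. The 20-layer depth and the `[13, 14, 15]` block used for the double and triple pass are hypothetical placeholders. Only the interleaved pattern is copied from the post.

```python
def splice_route(n_layers, pattern):
    """Full execution order: the span covered by pattern is replaced by pattern itself."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    lo, hi = min(pattern), max(pattern)
    if lo < 0 or hi >= n_layers:
        raise ValueError("pattern must lie inside the layer stack")
    return list(range(lo)) + list(pattern) + list(range(hi + 1, n_layers))


N_LAYERS = 20          # hypothetical model depth
BLOCK = [13, 14, 15]   # hypothetical block, for illustration only

routes = {
    "baseline": splice_route(N_LAYERS, BLOCK),
    "double_pass": splice_route(N_LAYERS, BLOCK * 2),
    "triple_pass": splice_route(N_LAYERS, BLOCK * 3),
    "interleaved": splice_route(N_LAYERS, [13, 13, 14, 14, 15, 15, 16]),
}
for name, route in routes.items():
    print(f"{name:12s} steps={len(route):2d} route={route}")
```

Every route draws on the same 20 layers, so the weights are shared. Only the number of forward steps differs between modes.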
The circuit boundaries are sharp — shift by one layer and the effect disappears or inverts.
Smaller models (24B) have tighter circuits (3 layers) than larger ones (Ng found 7 layers in 72B).
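One way to picture a search for such boundaries is a sliding-window sweep: try every contiguous block of a given width and score each one. The sketch below is not the repo's tooling. `score_fn` is a hypothetical callback that would run a benchmark with the given block duplicated, and `toy_score` is a made-up placeholder that only mimics a sharp boundary.

```python
def candidate_blocks(n_layers, width):
    """All contiguous (start, end) blocks of the given width, end inclusive."""
    return [(s, s + width - 1) for s in range(n_layers - width + 1)]


def sweep(n_layers, width, score_fn):
    """Score every candidate block and return (block, score) pairs, best first."""
    results = [((s, e), score_fn(s, e)) for s, e in candidate_blocks(n_layers, width)]
    return sorted(results, key=lambda item: item[1], reverse=True)


def toy_score(start, end):
    """Placeholder score (assumption): one block helps, every shifted block does not."""
    return 1.0 if (start, end) == (12, 14) else 0.0


ranked = sweep(n_layers=20, width=3, score_fn=toy_score)
best_block, best_score = ranked[0]
print("best block:", best_block, "score:", best_score)
print("one layer later:", dict(ranked)[(13, 15)])
```

In a real sweep each score would be a noisy benchmark result, so a peak that really is one layer wide should survive re-running with different seeds or items.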
Tools to find circuits in any GGUF model and apply arbitrary layer routing are in the repo.
The whole thing — sweep, discovery, validation — took one evening.
Happy to answer questions.
Considering this, and again assuming the benchmarks themselves are sound, I think the most plausible explanation for the observations is: (1) the duplicated layers are close to the identity function on most inputs; (2) something happened to the model in training (RLHF?) that forcefully degraded its reasoning performance; (3) the mechanism causing the degradation involves the duplicated layers, so duplicating them breaks that mechanism (e.g. by clobbering a "refusal" "circuit" that emerged in post-training).
More concisely, I'm positing that this is an approach that can only ever break things, and rather than boosting reasoning, it is selectively breaking things deleterious to reasoning.
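Point (1) is testable in principle. A minimal sketch of one possible check: compare the hidden state entering the block with the hidden state leaving it, and see how small the change is. The vectors below are made-up placeholders. Real ones would have to be captured from the model, and the helper names are hypothetical.

```python
import math


def relative_update(h_in, h_out):
    """||h_out - h_in|| / ||h_in||; values near 0 mean the block barely changes its input."""
    diff = math.sqrt(sum((o - i) ** 2 for i, o in zip(h_in, h_out)))
    norm = math.sqrt(sum(i * i for i in h_in))
    return diff / norm


def cosine(h_in, h_out):
    """Cosine similarity between the block's input and output hidden states."""
    dot = sum(i * o for i, o in zip(h_in, h_out))
    norm_in = math.sqrt(sum(i * i for i in h_in))
    norm_out = math.sqrt(sum(o * o for o in h_out))
    return dot / (norm_in * norm_out)


# Made-up hidden states (assumption), standing in for captured activations.
h_in = [1.0, 2.0, -1.0, 0.5]
h_near_identity = [1.01, 1.98, -1.0, 0.52]
h_transformed = [-2.0, 0.5, 3.0, 1.0]

for label, h_out in [("near-identity", h_near_identity), ("transformed", h_transformed)]:
    print(f"{label}: rel_update={relative_update(h_in, h_out):.3f} cos={cosine(h_in, h_out):.3f}")
```

If the duplicated blocks showed a relative update near zero across many prompts, that would be consistent with (1). A large update would count against it.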