Posts

Capability scoping through cheap, post-hoc robust unlearning may be impossible. Distillation offers a practical path forward because fresh initialization breaks the transmission of latent structure.
December 2025
Why do neural networks appear chaotic at the neuron level yet produce high-level representations that can be probed linearly? Strange attractors may offer useful intuition for this paradox.
August 2025
Why become a scientist? What makes science attractive?
November 2024
Can we do activation steering with fewer side effects? Can we program model behavior instead of optimizing it? This post introduces a technique for identifying and manipulating specific activation patterns to steer model outputs.
September 2024