└─
Why do neural networks appear chaotic at the neuron level yet produce high-level representations that can be probed linearly?
Strange attractors may offer useful intuition for resolving this paradox: in a chaotic system, individual trajectories are unpredictable, yet they remain confined to a low-dimensional structure that is simple to describe globally.
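To make the "probed linearly" half of the paradox concrete, here is a minimal sketch of a linear probe on hidden activations. The data is synthetic (random vectors whose class means differ along one direction), not from any real network, and the least-squares probe is just one common recipe:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for hidden activations: two classes of
# high-dimensional vectors whose means differ along one direction.
d = 64
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
acts_with = rng.normal(size=(200, d)) + 2.0 * direction  # concept present
acts_without = rng.normal(size=(200, d)) - 2.0 * direction  # concept absent

X = np.vstack([acts_with, acts_without])
y = np.array([1] * 200 + [0] * 200)

# The probe is a single linear map fit by least squares; the sign of
# its output predicts whether the concept is present.
w, *_ = np.linalg.lstsq(X, y - 0.5, rcond=None)
preds = (X @ w > 0).astype(int)
accuracy = (preds == y).mean()
print(f"probe accuracy: {accuracy:.2f}")
```

Even though each coordinate of the activations looks like noise, the concept is recoverable with one linear readout — the kind of regularity the post is pointing at.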
└─
Can we do activation steering with fewer side effects?
Can we program, instead of optimize, model behavior?
This post introduces a technique to identify and manipulate specific activation patterns to steer model outputs.
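As a point of reference for what "manipulating activation patterns" looks like, here is a sketch of one standard approach: a difference-of-means steering vector added to a hidden state. The activations below are synthetic, and this is a common baseline recipe, not necessarily the technique this post develops:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic activations from prompts with and without a target behavior.
d = 64
pos_acts = rng.normal(size=(100, d)) + 1.5  # behavior present
neg_acts = rng.normal(size=(100, d))        # behavior absent

# Difference-of-means steering vector: the direction separating the
# two sets of activations.
steer = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def apply_steering(h, alpha=1.0):
    """Add the steering vector to hidden state h with strength alpha."""
    return h + alpha * steer

# Steering a "behavior absent" state moves it toward the target cluster.
h = neg_acts[0]
h_steered = apply_steering(h, alpha=1.0)

pos_mean = pos_acts.mean(axis=0)
before = np.linalg.norm(h - pos_mean)
after = np.linalg.norm(h_steered - pos_mean)
print(f"distance to target mean: {before:.2f} -> {after:.2f}")
```

The side effects the post asks about come from `steer` not being a clean direction: adding it shifts every coordinate of `h`, including ones unrelated to the target behavior.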