Neural Networks, Strange Attractors, and Orderliness in Chaos
└─ Why do neural networks appear chaotic at the neuron level yet produce high-level representations that can be probed linearly? Strange attractors might give us good intuition for understanding this paradox in neural network behavior.
August 2025
November 2024
Conditional Activation Steering: Mechanistically Programming a Language Model's Behavior
└─ Can we do activation steering with fewer side effects? Can we program, instead of optimize, model behavior? This post introduces a technique to identify and manipulate specific activation patterns to steer model outputs.
September 2024