Tagged "interpretability"

Notes on "A Mathematical Framework for Transformer Circuits"

Close-reading a classic interpretability paper and trying to make sense of it