Tagged "notes"

Notes on "A Mathematical Framework for Transformer Circuits"

Close-reading a classic interpretability paper and trying to make sense of it


10 Autoencoders in a Trenchcoat, part 1

Notes on the core sections of Anthropic's "Toy Models of Superposition".


Understanding the Parameter Decomposition papers

Understanding attribution-based and stochastic parameter decomposition methods


What's different about a Matryoshka SAE?

Brief notes from the Matryoshka SAEs paper.
