10 Autoencoders in a Trenchcoat, part 1
Notes on the core sections of Anthropic's Toy Models of Superposition.
Notes on "A Mathematical Framework for Transformer Circuits"
Close-reading a classic interpretability paper and trying to make sense of it.
What's different about a Matryoshka SAE?
Brief notes from the Matryoshka SAEs paper.