Research Interests

Hi! I'm Logan. This is a document describing my current research interests, in case it increases the likelihood of serendipitous coincidences — someone seeing this site, saying "oh, I'm working on that!" or "oh, I want to do a project like that!" and reaching out. As described before, a website is a long and complex search query (etc).

My main interest right now is mechanistic interpretability. I find it both technically fascinating and philosophically interesting, or at least interesting in something like a "for the purposes of general world-modeling" sense.

Some things I've spent reasonable amounts of time thinking about:

  • Sparse autoencoders (SAEs).
  • Parameter decomposition, especially in relation to similar-looking approaches (e.g. cross-layer transcoders).
  • As of very recently, I'm working on a project related to interpreting model diffs.
  • I'd also be interested in projects related to reasoning/chain-of-thought faithfulness.

If you'd be interested in collaborating on research in these domains, reach out! (me at logan graves dot com, lgngrvs on Discord, lll.55 on Signal)


Here are some other, non-mech-interp research topics I'd still be very interested in:

  1. Mathematical models of agent coordination
    1. Abstract models (e.g. game theory)
    2. In-the-wild models (e.g. mechanism design)
    3. Qualitative models (e.g. history)
  2. Computational neuroscience, particularly projects that draw on effective methodologies from AI (e.g. mechanistic interpretability)
    1. Neuron modeling
    2. Modeling neural circuitry (this one especially)
  3. Intellectual history
    1. 20th-century intellectual history; how ideology shaped responses to crises
    2. Longer-term undercurrents in human thought, e.g. comparative classics research
  4. International relations/international AI policy
    1. The ties between technology and great-power competition
    2. Nuclear proliferation and other related dynamics
    3. Clever treaty enforcement mechanisms, especially technological ones (e.g. methods for monitoring GPUs for treaty enforcement)
  5. Theology in/and mathematics
  6. Cyborgism-y things, e.g. what Neuralink is doing, as well as Clark and Chalmers's "The Extended Mind"