Wenrui Xu

ML interpretability researcher, ex-astrophysicist
xuwenrui26 [at] gmail.com


I like studying complex systems. There is something deeply satisfying about discovering new dynamics that emerge from complexity, and about distilling complex systems into simple models. I also believe that our ability to conceptualize, predict, and control complex systems lies at the heart of solving some of the most pressing scientific and societal challenges of our time.

Over the years, my interest in complex systems has led me into two areas of research: astrophysics and machine learning interpretability. I spent the first decade of my career in astrophysics, focusing mainly on planet formation and astrophysical dynamics. You can find a list of my astrophysics papers here. Although I have left academia, I continue to do some astrophysics research as a hobby.

Currently, I work as a research scientist at Anthropic, where I focus on interpretability research. I try to build more interpretable large language models by understanding how knowledge is stored in models and how it is used in their "thinking" process.