Hi, I'm Jack! I'm interested in understanding the cognition of modern language models, so that we can make them more reliable and aligned with human values. Currently, I lead the "Model Psych" team at Anthropic. We study the internal basis of higher-level cognitive phenomena in LLMs, such as introspection, situational awareness, personas, and representations of emotion. We apply our techniques to audit Anthropic's production models, for instance by monitoring their neural activity for signatures of deception, manipulation, or awareness of being evaluated. Previously, I did my PhD in the Center for Theoretical Neuroscience at Columbia University. For a list of my publications, see my Google Scholar profile.