On a quest to understand intelligence and ensure that advanced AGI is safe and beneficial.
Hi! I’m a Research Scientist at the AI Security Institute (AISI), a directorate of the UK Department for Science, Innovation and Technology (DSIT). My research focuses on frontier alignment, security, interpretability, and reinforcement learning.
Previously, as an independent researcher, I worked on RL for efficient multi-turn exploration at the Center for Human-Compatible AI (CHAI) at UC Berkeley. I was a scholar in the ML Alignment & Theory Scholars (MATS) program, working with Adrià Garriga-Alonso on frontier deception and with Nandi Schoots on feature geometry and modularity, and I completed Neel Nanda’s MATS training program on mechanistic interpretability.
Before deciding to focus full-time on AI safety, I worked at Microsoft Research on language models. Prior to that, I was an Associate Research Scientist at Wadhwani AI working on AI for Social Good and Healthcare.
Drop me an email at zsatvik@gmail.com to discuss research and collaboration!
Research
I study intelligence (via its emergence and expression in neural networks) to ensure that advanced AGI is safe, beneficial, and useful. This involves working on alignment, security, interpretability, and reinforcement learning for frontier AI systems and agents. Here is some of my recent work:

Auditing Games for Sandbagging
Jordan Taylor, Sid Black, Dillon Bowen, Thomas Read, Satvik Golechha, Alex Z-M., Oliver M., Connor K., Kola A., Jacob M., Sam Marks, Chris Cundy, Joseph Bloom
2025, UK AISI (in collaboration with FAR AI)

Among Us: A Sandbox for Measuring and Detecting Agentic Deception
Satvik Golechha, Adrià Garriga-Alonso
NeurIPS 2025 (Spotlight) (MATS)

A is for Absorption: Studying Feature Splitting and Absorption in SAEs
David Chanin, James W.S., Tomáš D., Hardik B., Satvik Golechha, Joseph Bloom
NeurIPS 2025 (Oral) (MATS)

ABBEL: Acting through Belief Bottlenecks Expressed in Language
Aly Lidayan, Jakob Bjorner, Satvik Golechha, Kartik Goyal, Alane Suhr
NeurIPS 2025 (Spotlight, LAW workshop) (CHAI, UC Berkeley)

Auditing Language Models for Hidden Objectives
Samuel Marks, Johannes Treutlein, ..., Satvik Golechha, ..., Evan Hubinger
2025, Anthropic (external collaboration)

Who’s the Evil Twin? Differential Auditing for Undesired Behavior
Ishwar B., Hasith V., Greta K., Ronan A., Satvik Golechha
Mentored at SPAR 2025. Under review.

Intricacies of Feature Geometry in Large Language Models
Satvik Golechha, Lucius Bushnaq, Euan Ong, Neeraj Kayal, Nandi Schoots
ICLR 2025 (poster) (best blog award)

Studying Cross-cluster Modularity in Neural Networks
Satvik Golechha, Maheep C., Joan V., Alessandro Abate, Nandi Schoots
NeurIPS 2024: Workshop on Science of Deep Learning

Some Lessons from the OpenAI-FrontierMath Debacle
Satvik Golechha
Some investigative journalism that became pretty popular :)

Progress Measures for Grokking on Real-world Tasks
Satvik Golechha
ICML 2024: Workshop on High-Dim. Learning Dynamics (independent)

Challenges in Mechanistically Interpreting Harmful Representations
Satvik Golechha, James Dao
ICML 2024: Workshop on Mechanistic Interpretability (independent)

NICE: To Optimize In-Context Examples or Not?
Pragya Srivastava*, Satvik Golechha*, Amit Deshpande, Amit Sharma
ACL 2024 (main, poster) (work done at Microsoft Research)

BYoEB: An LLM-Powered Expert-in-the-Loop Chat System
Pragnya R.*, Bhuvan S.*, Satvik Golechha*, Mohit Jain, and others
UbiComp 2025 (work done at Microsoft Research)

Predicting Treatment Adherence of Tuberculosis Patients at Scale
Mihir Kulkarni*, Satvik Golechha*, Rishi R.*, Jithin S.*, Alpan Raval
NeurIPS 2022 (work done at Wadhwani AI)
Poetry
Writing metaphorical poetry opens a channel into emotions that could not be expressed any other way. Check out my poetry page!
Almost done with my first poetry book, Anuswaad!

Fiction
A beautiful thing happens when fiction is written. A good story reflects back to us aspects of ourselves that we’re not aware of. Really, it is the story that’s writing us.

Algebra to Zombies
A 29-week curriculum covering the foundational math required to do AI research. It accompanies a study group I used to run at Microsoft Research India.

Research Blog
Some notes on AI research. For my research, please see my research statement and Scholar profile.
PS: For a more general (and hopefully fun) introduction to the less-taught parts of AI, check out Alice!

Other Stuff
Intelligence: I write about intelligence and a number of interesting ideas in my fiction and research. I plan to bundle these into a blog series someday.
School: I’m writing a book (or a series of posts) on my version of an ideal school — I believe good schooling is highly impactful, undervalued, and achievable.
Like Winds & Dystop.ai: Slowly working on finishing these novels but aah so little time!
Infinite Jest: Reading this epic book; it will take more than a couple of months.
Exploring London: I’ve moved to London for the first time, HMU!