On a quest to understand intelligence and ensure that advanced AGI is safe and beneficial.

Satvik Golechha

Hi! I’m a Research Scientist at the AI Security Institute (AISI), a directorate of the UK Department for Science, Innovation, and Technology (DSIT). My research focuses on frontier alignment, security, interpretability, and reinforcement learning.

Previously, as an independent researcher, I worked on RL for efficient multi-turn exploration at the Center for Human-Compatible AI (CHAI) at UC Berkeley. I was a scholar in the ML Alignment & Theory Scholars (MATS) program, working with Adrià Garriga-Alonso on frontier deception and with Nandi Schoots on feature geometry and modularity, and I completed Neel Nanda’s MATS training program on mechanistic interpretability.

Before deciding to focus full-time on AI safety, I worked at Microsoft Research on language models. Prior to that, I was an Associate Research Scientist at Wadhwani AI working on AI for Social Good and Healthcare.

Writing fiction and poetry along the way!

Drop me an email at zsatvik@gmail.com to discuss research and collaboration!

Research

I study intelligence (via its emergence and expression in neural networks) to ensure that advanced AGI is safe, beneficial, and useful. This involves working on alignment, security, interpretability, and reinforcement learning for frontier AI systems and agents. Here is some of my recent work:

Jordan Taylor, Sid Black, Dillon Bowen, Thomas Read, Satvik Golechha, Alex Z-M., Oliver M., Connor K., Kola A., Jacob M., Sam Marks, Chris Cundy, Joseph Bloom

Satvik Golechha, Adrià Garriga-Alonso

David Chanin, James W.S., Tomáš D., Hardik B., Satvik Golechha, Joseph Bloom

Aly Lidayan, Jakob Bjorner, Satvik Golechha, Kartik Goyal, Alane Suhr

Samuel Marks, Johannes Treutlein, . . ., Satvik Golechha, . . ., Evan Hubinger

Ishwar B., Hasith V., Greta K., Ronan A., Satvik Golechha

Satvik Golechha, Lucius Bushnaq, Euan Ong, Neeraj Kayal, Nandi Schoots

Satvik Golechha, Maheep C., Joan V., Alessandro Abate, Nandi Schoots

Satvik Golechha

Satvik Golechha

Satvik Golechha, James Dao

Pragya Srivastava*, Satvik Golechha*, Amit Deshpande, Amit Sharma

Pragnya R.*, Bhuvan S.*, Satvik Golechha*, Mohit Jain, and others

Mihir Kulkarni*, Satvik Golechha*, Rishi R.*, Jithin S.*, Alpan Raval

Poetry

Writing metaphorical poetry opens a channel into emotions that could not be expressed any other way. Check out my poetry page!

Almost done with my first poetry book, Anuswaad!

Fiction

A beautiful thing happens when fiction is written. A good story reflects back to us aspects of ourselves that we’re not aware of. Really, it is the story that’s writing us.

Algebra to Zombies

A 29-week curriculum that covers foundational math required to do AI research. This accompanies a study group I used to run at Microsoft Research in India.

Research Blog

Some notes around AI research. For my research, please see my research statement and Scholar profile.

PS: For a more general (and hopefully fun) introduction to the less-taught parts of AI, check out Alice!

Other Stuff

Intelligence: I write about intelligence and a number of interesting ideas in my fiction and research. I plan to bundle it into a blog series someday.

School: I’m writing a book (or a series of posts) on my version of an ideal school — I believe good schooling is highly impactful, undervalued, and achievable.

Like Winds & Dystop.ai: Slowly working on finishing these novels but aah so little time!

Infinite Jest: Reading this epic book; will take more than a couple months.

Exploring London: I’ve moved to London for the first time, HMU!

All life is bound together by mutual support and interdependence.

Acharya Umaswati