Publications

Preprints

Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models

Mickel Liu*, Liwei Jiang*, Yancheng Liang, Simon Shaolei Du, Yejin Choi, Tim Althoff*, Natasha Jaques*

AI Agents: Capabilities and Safety (AIA) Workshop @ Conference on Language Modeling (COLM) 2025 (Outstanding Paper Award, Oral Presentation)

Paper

RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments

Zeng, Z., Ivison, H., Wang, Y., Yuan, L., Li, S., Ye, Z., Li, S., He, J., Zhou, R., Chen, T., Zhao, C., Tsvetkov, Y., Du, S., Natasha Jaques, Peng, H., Koh, P., & Hajishirzi, H.

Paper

Evaluating & Reducing Deceptive Dialogue From Language Models with Multi-turn RL

Marwa Abdulhai, Ricky Cheng, Apoorv Shrivastava, Natasha Jaques, Yarin Gal, Sergey Levine

Paper

AgenticRed: Optimizing Agentic Systems for Automated Red-teaming

Jiayi Yuan, Jonas Nöther, Natasha Jaques, Goran Radanović

Paper

2026

Learning to Summarize User Information for Personalized Reinforcement Learning from Human Feedback

Haeone Nam, Yanming Wan, Mickel Liu, Jiachen Lian, Paul Ahnn, Natasha Jaques

International Conference on Learning Representations (ICLR) 2026

Paper

Generative Adversarial Post-Training Mitigates Reward Hacking in Live Human-AI Music Interaction

Yusong Wu, Sara Brade, Tua Ma, Tucker Fowler, Ed Yang, Berkay Banar, Aaron Courville, Natasha Jaques*, Anna Huang*

International Conference on Learning Representations (ICLR) 2026

Paper

AutoCode: LLMs as Problem Setters for Competitive Programming

Shangxi Zhou, Zheng Zheng, Kaixin Liu, Zhanke Shen, Zekun Cheng, Zekun Chen, Hao He, Jiali Yao, Haoyuan Mao, Qinyuan Mang, Tao Fu, Binhang Li, Dongding Li, Wei Chai, Zhiliang Liu, Aleksandra Korolova, Peter Henderson, Natasha Jaques, Pramod Viswanath, Saining Xie, Jingbo Shang

International Conference on Learning Representations (ICLR) 2026

Paper

Improving Human-AI Coordination through Online Adversarial Training and Generative Models

Paresh Chaudhary, Yancheng Liang, Daphne Chen, Simon S. Du, Natasha Jaques

International Conference on Learning Representations (ICLR) 2026

Paper Website

SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning

Bo Liu*, Leon Guertler*, Simon Yu*, Zichen Liu*, Penghui Qi, Daniel Balcells, Mickel Liu, Cheston Tan, Weiyan Shi, Min Lin, Wee Sun Lee, Natasha Jaques

International Conference on Learning Representations (ICLR) 2026

Paper

Modeling Others' Minds as Code

Kunal Jha, Aydan Yuenan Huang, Eric Ye, Natasha Jaques*, Max Kleiman-Weiner*

International Conference on Learning Representations (ICLR) 2026; Best Paper Award and Oral Presentation at the NeurIPS 2025 LAW Workshop

Paper

2025

Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward

Yanming Wan*, Jiaxing Wu*, Marwa Abdulhai, Lior Shani, Natasha Jaques

Neural Information Processing Systems (NeurIPS) 2025
Madrona Prize @ Paul G. Allen School of Computer Science & Engineering

Paper Website

Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning

Marwa Abdulhai, Ricky Cheng, Danny Clay, Tim Althoff, Sergey Levine, Natasha Jaques

Neural Information Processing Systems (NeurIPS) 2025

Paper

Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia

Chandler Smith, Marwa Abdulhai, et al., Natasha Jaques, José Hernández-Orallo, Joel Leibo

Neural Information Processing Systems (NeurIPS) 2025

Paper

Achieving Human Level Competitive Robot Table Tennis

David D’Ambrosio, Saminda Wishwajith Abeyruwan, Laura Graesser, Atil Iscen, Heni Ben Amor, Alex Bewley, Barney J. Reed, Krista Reymann, Leila Takayama, Yuval Tassa, Krzysztof Choromanski, Erwin Coumans, Deepali Jain, Navdeep Jaitly, Natasha Jaques, Satoshi Kataoka, Yuheng Kuang, Nevena Lazic, Reza Mahjourian, Sherry Moore, Kenneth Oslund, Anish Shankar, Vikas Sindhwani, Vincent Vanhoucke, Grace Vesom, Peng Xu, Pannag Sanketi

IEEE International Conference on Robotics and Automation (ICRA) 2025 (Best Paper Finalist)

Paper

Cross-environment Cooperation Enables Zero-shot Multi-agent Coordination

Kunal Jha, Wilka Carvalho, Yancheng Liang, Simon S. Du, Max Kleiman-Weiner*, Natasha Jaques*

International Conference on Machine Learning (ICML) 2025 (Oral Paper - Top 1%) and CogSci 2025

Paper Website

Multi Agent Reinforcement Learning for Sequential Satellite Assignment Problems

Josh Holder, Natasha Jaques, Mehran Mesbahi

AAAI Conference on Artificial Intelligence (AAAI) 2025 (Oral Paper - Top 5%)

Paper

Infer Human’s Intentions Before Following Natural Language Instructions

Yanming Wan, Yue Wu, Yiping Wang, Jiayuan Mao*, Natasha Jaques*

AAAI Conference on Artificial Intelligence (AAAI) 2025

Paper Website

ReaLJam: Real-Time, Synchronous Human-AI Music Jamming with Reinforcement Learning-Tuned Transformers

Alexander Scarlatos, Yusong Wu, Ian Simon, Adam Roberts, Tim Cooijmans, Natasha Jaques, Cassie Tarakajian, Anna Huang

Extended Abstracts of The ACM Conference on Human Factors in Computing Systems (CHI) 2025

Paper

An Efficient Open World Benchmark for Multi-Agent Reinforcement Learning

Eric Ye, Ruohan Tao, Natasha Jaques

NeurIPS Open World Agents Workshop 2025

Paper

Generative Modeling for Robust Deep Reinforcement Learning on the Traveling Salesman Problem

Michael Li, Eunice Bae, Christian Haberland, Natasha Jaques*

NeurIPS MATH.AI Workshop 2025

Paper

InvestESG: A multi-agent reinforcement learning benchmark for studying climate investment as a social dilemma

Xiaoxuan Hou, Jiayi Yuan, Joel Z. Leibo, Natasha Jaques

International Conference on Learning Representations (ICLR) 2025

Paper Website

2024

Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning

Sriyash Poddar, Yanming Wan, Hamish Ivison, Abhishek Gupta*, Natasha Jaques*

Neural Information Processing Systems (NeurIPS) 2024 (Spotlight - Top 2%)

Paper Website

Learning to Cooperate with Humans Using Generative Agents

Yancheng Liang, Daphne Chen, Abhishek Gupta, Simon Du, Natasha Jaques

Neural Information Processing Systems (NeurIPS) 2024

Paper Website

Impossibility theorems for feature attribution

Blair Bilodeau, Natasha Jaques, Pang-Wei Koh, Been Kim

Proceedings of the National Academy of Sciences (PNAS) 2024

Paper

The Concordia Contest: Advancing the Cooperative Intelligence of Language Agents

Chandler Smith, Rishabh Trivedi, Julian Clifton, Lewis Hammond, Akbir Khan, Alexander Sasha Vezhnevets, John P. Agapiou, Edgar A. Duéñez-Guzmán, Joel Matyas, Danny Karmon, Marwa Abdulhai, Dylan Hadfield-Menell, Natasha Jaques, Joel Leibo, Oliver Slumbers, Tim Baarslag, Michael Chang

Neural Information Processing Systems (NeurIPS) Competition Track 2024

Paper

Moral Foundations of Large Language Models

Marwa Abdulhai, Clément Crepy, Daria Valter, John Canny, Sergey Levine, Natasha Jaques

Empirical Methods in Natural Language Processing (EMNLP) 2024 (Best Paper, AAAI Workshop on Representation Learning for Responsible Human-Centric AI)

Paper

Adaptive Accompaniment with ReaLchords

Yusong Wu, Tim Cooijmans, Kyle Kastner, Adam Roberts, Ian Simon, Alexander Scarlatos, Chris Donahue, Cassie Tarakajian, Shayegan Omidshafiei, Aaron Courville, Pablo Samuel Castro, Natasha Jaques, Cheng-Zhi Anna Huang

International Conference on Machine Learning (ICML) 2024

Paper