Preprints

Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models
Mickel Liu*, Liwei Jiang*, Yancheng Liang, Simon Shaolei Du, Yejin Choi, Tim Althoff*, Natasha Jaques*
AI Agents: Capabilities and Safety (AIA) Workshop @ Conference on Language Modeling (COLM) 2025 (Outstanding Paper Award, Oral Presentation)
RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments
Zeng, Z., Ivison, H., Wang, Y., Yuan, L., Li, S., Ye, Z., Li, S., He, J., Zhou, R., Chen, T., Zhao, C., Tsvetkov, Y., Du, S., Natasha Jaques, Peng, H., Koh, P., & Hajishirzi, H.
Evaluating & Reducing Deceptive Dialogue From Language Models with Multi-turn RL
Marwa Abdulhai, Ricky Cheng, Apoorv Shrivastava, Natasha Jaques, Yarin Gal, Sergey Levine
AgenticRed: Optimizing Agentic Systems for Automated Red-teaming
Jiayi Yuan, Jonas Nöther, Natasha Jaques, Goran Radanović

2026

Learning to Summarize User Information for Personalized Reinforcement Learning from Human Feedback
Haeone Nam, Yanming Wan, Mickel Liu, Jiachen Lian, Paul Ahnn, Natasha Jaques
International Conference on Learning Representations (ICLR) 2026
Generative Adversarial Post-Training Mitigates Reward Hacking in Live Human-AI Music Interaction
Yusong Wu, Sara Brade, Tua Ma, Tucker Fowler, Ed Yang, Berkay Banar, Aaron Courville, Natasha Jaques*, Anna Huang*
International Conference on Learning Representations (ICLR) 2026
AutoCode: LLMs as Problem Setters for Competitive Programming
Shangxi Zhou, Zheng Zheng, Kaixin Liu, Zhanke Shen, Zekun Cheng, Zekun Chen, Hao He, Jiali Yao, Haoyuan Mao, Qinyuan Mang, Tao Fu, Binhang Li, Dongding Li, Wei Chai, Zhiliang Liu, Aleksandra Korolova, Peter Henderson, Natasha Jaques, Pramod Viswanath, Saining Xie, Jingbo Shang
International Conference on Learning Representations (ICLR) 2026
Improving Human-AI Coordination through Online Adversarial Training and Generative Models
Paresh Chaudhary, Yancheng Liang, Daphne Chen, Simon S. Du, Natasha Jaques
International Conference on Learning Representations (ICLR) 2026
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
Bo Liu*, Leon Guertler*, Simon Yu*, Zichen Liu*, Penghui Qi, Daniel Balcells, Mickel Liu, Cheston Tan, Weiyan Shi, Min Lin, Wee Sun Lee, Natasha Jaques
International Conference on Learning Representations (ICLR) 2026
Modeling Others' Minds as Code
Kunal Jha, Aydan Yuenan Huang, Eric Ye, Natasha Jaques*, Max Kleiman-Weiner*
International Conference on Learning Representations (ICLR) 2026 (Best Paper Award and Oral Presentation at the NeurIPS 2025 LAW Workshop)

2025

Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward
Yanming Wan*, Jiaxing Wu*, Marwa Abdulhai, Lior Shani, Natasha Jaques
Neural Information Processing Systems (NeurIPS) 2025
Madrona Prize @ Paul G. Allen School of Computer Science & Engineering
Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning
Marwa Abdulhai, Ricky Cheng, Danny Clay, Tim Althoff, Sergey Levine, Natasha Jaques
Neural Information Processing Systems (NeurIPS) 2025
Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia
Chandler Smith, Marwa Abdulhai, et al., Natasha Jaques, José Hernández-Orallo, Joel Leibo
Neural Information Processing Systems (NeurIPS) 2025
Achieving Human Level Competitive Robot Table Tennis
David D’Ambrosio, Saminda Wishwajith Abeyruwan, Laura Graesser, Atil Iscen, Heni Ben Amor, Alex Bewley, Barney J. Reed, Krista Reymann, Leila Takayama, Yuval Tassa, Krzysztof Choromanski, Erwin Coumans, Deepali Jain, Navdeep Jaitly, Natasha Jaques, Satoshi Kataoka, Yuheng Kuang, Nevena Lazic, Reza Mahjourian, Sherry Moore, Kenneth Oslund, Anish Shankar, Vikas Sindhwani, Vincent Vanhoucke, Grace Vesom, Peng Xu, Pannag Sanketi
IEEE International Conference on Robotics and Automation (ICRA) 2025 (Best Paper Finalist)
Cross-environment Cooperation Enables Zero-shot Multi-agent Coordination
Kunal Jha, Wilka Carvalho, Yancheng Liang, Simon S. Du, Max Kleiman-Weiner*, Natasha Jaques*
International Conference on Machine Learning (ICML) 2025 (Oral Paper - Top 1%) and CogSci 2025
Multi Agent Reinforcement Learning for Sequential Satellite Assignment Problems
Josh Holder, Natasha Jaques, Mehran Mesbahi
AAAI Conference on Artificial Intelligence (AAAI) 2025 (Oral Paper - Top 5%)
Infer Human’s Intentions Before Following Natural Language Instructions
Yanming Wan, Yue Wu, Yiping Wang, Jiayuan Mao*, Natasha Jaques*
AAAI Conference on Artificial Intelligence (AAAI) 2025
ReaLJam: Real-Time, Synchronous Human-AI Music Jamming with Reinforcement Learning-Tuned Transformers
Alexander Scarlatos, Yusong Wu, Ian Simon, Adam Roberts, Tim Cooijmans, Natasha Jaques, Cassie Tarakajian, Anna Huang
Extended Abstracts of The ACM Conference on Human Factors in Computing Systems (CHI) 2025
An Efficient Open World Benchmark for Multi-Agent Reinforcement Learning
Eric Ye, Ruohan Tao, Natasha Jaques
NeurIPS Open World Agents Workshop 2025
Generative Modeling for Robust Deep Reinforcement Learning on the Traveling Salesman Problem
Michael Li, Eunice Bae, Christian Haberland, Natasha Jaques*
NeurIPS MATH.AI Workshop 2025
InvestESG: A multi-agent reinforcement learning benchmark for studying climate investment as a social dilemma
Xiaoxuan Hou, Jiayi Yuan, Joel Z. Leibo, Natasha Jaques
International Conference on Learning Representations (ICLR) 2025

2024

Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning
Sriyash Poddar, Yanming Wan, Hamish Ivison, Abhishek Gupta*, Natasha Jaques*
Neural Information Processing Systems (NeurIPS) 2024 (Spotlight - Top 2%)
Learning to Cooperate with Humans Using Generative Agents
Yancheng Liang, Daphne Chen, Abhishek Gupta, Simon Du, Natasha Jaques
Neural Information Processing Systems (NeurIPS) 2024
Impossibility theorems for feature attribution
Blair Bilodeau, Natasha Jaques, Pang-Wei Koh, Been Kim
Proceedings of the National Academy of Sciences (PNAS) 2024
The Concordia Contest: Advancing the Cooperative Intelligence of Language Agents
Chandler Smith, Rishabh Trivedi, Julian Clifton, Lewis Hammond, Akbir Khan, Alexander Sasha Vezhnevets, John P. Agapiou, Edgar A. Duéñez-Guzmán, Joel Matyas, Danny Karmon, Marwa Abdulhai, Dylan Hadfield-Menell, Natasha Jaques, Joel Leibo, Oliver Slumbers, Tim Baarslag, Michael Chang
Neural Information Processing Systems (NeurIPS) Competition Track 2024
Moral Foundations of Large Language Models
Marwa Abdulhai, Clément Crepy, Daria Valter, John Canny, Sergey Levine, Natasha Jaques
Empirical Methods in Natural Language Processing (EMNLP) 2024 (Best Paper, AAAI Workshop on Representation Learning for Responsible Human-Centric AI)
Adaptive Accompaniment with ReaLchords
Yusong Wu, Tim Cooijmans, Kyle Kastner, Adam Roberts, Ian Simon, Alexander Scarlatos, Chris Donahue, Cassie Tarakajian, Shayegan Omidshafiei, Aaron Courville, Pablo Samuel Castro, Natasha Jaques, Cheng-Zhi Anna Huang
International Conference on Machine Learning (ICML) 2024