Preprints
Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models
AI Agents: Capabilities and Safety (AIA) Workshop @ Conference on Language Modeling (COLM) 2025 (Outstanding Paper Award, Oral Presentation)
RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments
Evaluating & Reducing Deceptive Dialogue From Language Models with Multi-turn RL
AgenticRed: Optimizing Agentic Systems for Automated Red-teaming
2026
Learning to Summarize User Information for Personalized Reinforcement Learning from Human Feedback
International Conference on Learning Representations (ICLR) 2026
Generative Adversarial Post-Training Mitigates Reward Hacking in Live Human-AI Music Interaction
International Conference on Learning Representations (ICLR) 2026
AutoCode: LLMs as Problem Setters for Competitive Programming
International Conference on Learning Representations (ICLR) 2026
Improving Human-AI Coordination through Online Adversarial Training and Generative Models
International Conference on Learning Representations (ICLR) 2026
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
International Conference on Learning Representations (ICLR) 2026
Modeling Others' Minds as Code
International Conference on Learning Representations (ICLR) 2026 (Best Paper Award and Oral Presentation @ NeurIPS 2025 LAW Workshop)
2025
Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward
Neural Information Processing Systems (NeurIPS) 2025
Madrona Prize @ Paul G. Allen School of Computer Science & Engineering
Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning
Neural Information Processing Systems (NeurIPS) 2025
Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia
Neural Information Processing Systems (NeurIPS) 2025
Achieving Human Level Competitive Robot Table Tennis
IEEE International Conference on Robotics and Automation (ICRA) 2025 (Best Paper Finalist)
Cross-environment Cooperation Enables Zero-shot Multi-agent Coordination
International Conference on Machine Learning (ICML) 2025 (Oral Paper - Top 1%) and CogSci 2025
Multi Agent Reinforcement Learning for Sequential Satellite Assignment Problems
AAAI Conference on Artificial Intelligence (AAAI) 2025 (Oral Paper - Top 5%)
Infer Human’s Intentions Before Following Natural Language Instructions
AAAI Conference on Artificial Intelligence (AAAI) 2025
ReaLJam: Real-Time, Synchronous Human-AI Music Jamming with Reinforcement Learning-Tuned Transformers
Extended Abstracts of The ACM Conference on Human Factors in Computing Systems (CHI) 2025
An Efficient Open World Benchmark for Multi-Agent Reinforcement Learning
NeurIPS Open World Agents Workshop 2025
Generative Modeling for Robust Deep Reinforcement Learning on the Traveling Salesman Problem
NeurIPS MATH.AI Workshop 2025
InvestESG: A multi-agent reinforcement learning benchmark for studying climate investment as a social dilemma
International Conference on Learning Representations (ICLR) 2025
2024
Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning
Neural Information Processing Systems (NeurIPS) 2024 (Spotlight - Top 2%)
Learning to Cooperate with Humans Using Generative Agents
Neural Information Processing Systems (NeurIPS) 2024
Impossibility theorems for feature attribution
Proceedings of the National Academy of Sciences (PNAS) 2024
The Concordia Contest: Advancing the Cooperative Intelligence of Language Agents
Neural Information Processing Systems (NeurIPS) Competition Track 2024
Moral Foundations of Large Language Models
Empirical Methods in Natural Language Processing (EMNLP) 2024 (Best Paper, AAAI Workshop on Representation Learning for Responsible Human-Centric AI)
Adaptive Accompaniment with ReaLchords
International Conference on Machine Learning (ICML) 2024