Xiaoxuan (Mandy) Wang

About Me

I am a fourth-year Ph.D. student in Computer Science at the University of California, Los Angeles (UCLA), advised by Prof. Wei Wang and collaborating closely with Prof. Yizhou Sun.

My research centers on agentic reinforcement learning and LLM agents, with a focus on building stable, scalable agents for real-world interactions. I am also interested in LLM reasoning, spanning planning, tool use, and scientific problem-solving.

I organize the UCLA Data Mining reading group. Before UCLA, I earned my B.S. in Computer Science from the University of Illinois Urbana-Champaign in May 2022.

News

Jun 2026 Released new work, HarnessBridge.
May 2026 ARLArena is accepted to ICML 2026, and MatSciBench is accepted to KDD 2026.
Mar 2026 Invited talk on ARLArena at NICE (Nexus for IntelligenCE).
Oct 2025 EAST is accepted to the NeurIPS 2025 MathAI workshop.

Selected Projects

ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning 2025 – Present

Project Lead

Leading a large-scale project on agentic LLMs: principled training and inference recipes across 8+ tasks; a unified, extensible pipeline supporting 16+ policy optimization algorithms with extensions to asynchronous agent training and MoE models; a multi-agent system for interactive agentic tasks; and extensive empirical analyses of training stability, scalability, and system efficiency across NVIDIA and AMD GPUs.
SciBench: Evaluating Scientific Problem-Solving in Large Language Models 2023 – 2024

Developed SciBench to evaluate college-level scientific reasoning in LLMs, with evaluation protocols and analyses of model capabilities and failure modes across scientific domains. Published at ICML 2024; covered in a Nature News Feature.

Publications

Agentic RL & LLM Agents

Preprint 2026 HarnessBridge: Learnable Bidirectional Controller for LLM Agent Harness Xiaoxuan Wang*, Haixin Wang*, Alex Taylor, Jason Cong, Yizhou Sun, Wei Wang [Paper] [Code] [Model]
ICML 2026 ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning Xiaoxuan Wang*, Han Zhang*, Haixin Wang*, Yidan Shi†, Ruoyan Li†, Kaiqiao Han†, Chengyi Tong, Haoran Deng, Alex Taylor, Yanqiao Zhu, Renliang Sun, Jason Cong, Yizhou Sun, Wei Wang [Paper] [Code] [Model]

LLM Reasoning, Post-Training & Evaluation

ICML 2024 SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models Xiaoxuan Wang*, Ziniu Hu*, Pan Lu*, Yanqiao Zhu*, Jieyu Zhang, Satyen Subramaniam, Arjun R. Loomba, Shichang Zhang, Yizhou Sun, Wei Wang Media coverage: Nature News Feature [Paper] [Code] [Website]
NeurIPS MathAI 2025 EAST: Entropy-Based Adaptive Weighting for Self-Training Xiaoxuan Wang, Yihe Deng, Mingyu Derek Ma, Wei Wang [Paper] [Code]
Preprint 2025 From Solving to Verifying: A Unified Objective for Robust Reasoning in LLMs Xiaoxuan Wang, Bo Liu, Song Jiang, Jingzhou Liu, Jingyuan Qi, Xia Chen, Baosheng He [Paper]
KDD 2026 MatSciBench: Benchmarking the Reasoning Ability of Large Language Models in Materials Science Junkai Zhang*, Jingru Gan*, Xiaoxuan Wang, Zian Jia, Changquan Gu, Jianpeng Chen, Yanqiao Zhu, Mingyu Derek Ma, Dawei Zhou, Ling Li, Wei Wang [Paper] [Code] [Dataset]

Clinical & Domain Applications

AAAI 2025 Memorize and Rank: Evaluating LLMs for Clinical Diagnosis Prediction Mingyu Derek Ma, Xiaoxuan Wang, Yijia Xiao, Anthony Cuturrufo, Vijay S. Nori, Eran Halperin, Wei Wang Also at NeurIPS GenAI4Health Workshop 2024 [Paper]
Preprint 2024 CliBench: Multifaceted Evaluation of LLMs in Clinical Decision Making Mingyu Derek Ma, Chenchen Ye, Yu Yan, Xiaoxuan Wang, Peipei Ping, Timothy S. Chang, Wei Wang [Paper] [Code] [Website]

Earlier Work (NLP & Speech)

AAAI 2024 STAR: Boosting Low-Resource Event Extraction via Structure-to-Text Generation Mingyu Derek Ma, Xiaoxuan Wang, Po-Nien Kung, P. Jeffrey Brantingham, Nanyun Peng, Wei Wang [Paper]
EMNLP Findings 2023 Learning under Label Proportions for Text Classification Jatin Chauhan, Xiaoxuan Wang, Wei Wang [Paper]
Preprint 2022 Global Responses to the COVID-19 Pandemic: Evidence Finding and Verification Rotem Dror, Xiaoxuan Wang, Dan Roth
Speech Communication 2022 Seamless Equal Accuracy Ratio for Inclusive CTC Speech Recognition Heting Gao, Xiaoxuan Wang, Sunghun Kang, Rusty Mina, Dias Issa, John Harvill, Leda Sari, Mark Hasegawa-Johnson, Chang D. Yoo [Paper]

Invited Talks

ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning March 2026

NICE (Nexus for IntelligenCE)

Internship Experience

Preference Model Summer 2026

Research Intern
Meta Platforms, Inc. Summer 2025

Research Intern (AI)

Developed GRPO-Verif, a reinforcement learning algorithm that jointly optimizes solution generation and self-verification in LLMs via a unified loss formulation, improving self-verification while preserving reasoning performance.
Amazon Development Center U.S., Inc. Summer 2024

Applied Scientist Intern

Implemented and evaluated RLHF methods (REST-EM, Iterative DPO, and variants) for tool-integrated problem solving in LLMs; showed that regularization mitigates overfitting from self-generated data and improves tool usage.

Education

University of California, Los Angeles Sept. 2022 – Present

Ph.D. in Computer Science · Advisor: Wei Wang
University of Illinois Urbana-Champaign Aug. 2018 – May 2022

B.S. in Computer Science, Grainger College of Engineering

Minor in Mathematics · James Scholar Honor · Dean's List

Teaching

CS 245: Big Data Analytics, UCLA Fall 2023

Teaching Assistant
CS 446: Machine Learning, UIUC Spring 2022

Course Assistant