About Me
I am a fourth-year Ph.D. student in Computer Science at the University of California, Los Angeles (UCLA),
advised by Prof. Wei Wang and collaborating closely with
Prof. Yizhou Sun.
My research centers on agentic reinforcement learning and LLM agents, with a focus on building stable,
scalable agents for real-world interactions. I am also interested in LLM reasoning,
spanning planning, tool use, and scientific problem-solving.
I organize the UCLA Data Mining reading group. Before UCLA, I earned my B.S. in Computer Science from the
University of Illinois Urbana-Champaign in May 2022.
News
- Jun 2026: Released a new preprint, HarnessBridge.
- May 2026: ARLArena is accepted to ICML 2026, and MatSciBench is accepted to KDD 2026.
- Mar 2026: Invited talk on ARLArena at NICE (Nexus for IntelligenCE).
- Oct 2025: EAST is accepted to the NeurIPS 2025 MathAI Workshop.
Selected Projects
- Project Lead
Leading a large-scale project on agentic LLMs: principled training and inference recipes across 8+ tasks;
a unified, extensible pipeline supporting 16+ policy optimization algorithms with extensions to
asynchronous agent training and MoE models; a multi-agent system for interactive agentic tasks; and
extensive empirical analyses of training stability, scalability, and system efficiency across NVIDIA
and AMD GPUs.
- Developed SciBench to evaluate college-level scientific reasoning in LLMs, with evaluation protocols and
analyses of model capabilities and failure modes across scientific domains. Published at ICML 2024 and
covered in a Nature News Feature.
Publications
Agentic RL & LLM Agents
- Preprint 2026
HarnessBridge: Learnable Bidirectional Controller for LLM Agent Harness
Xiaoxuan Wang*, Haixin Wang*, Alex Taylor, Jason Cong, Yizhou Sun, Wei Wang
[Paper]
[Code]
[Model]
- ICML 2026
ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning
Xiaoxuan Wang*, Han Zhang*, Haixin Wang*, Yidan Shi†, Ruoyan Li†, Kaiqiao Han†, Chengyi Tong, Haoran Deng, Alex Taylor, Yanqiao Zhu, Renliang Sun, Jason Cong, Yizhou Sun, Wei Wang
[Paper]
[Code]
[Model]
LLM Reasoning, Post-Training & Evaluation
- ICML 2024
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models
Xiaoxuan Wang*, Ziniu Hu*, Pan Lu*, Yanqiao Zhu*, Jieyu Zhang, Satyen Subramaniam, Arjun R. Loomba, Shichang Zhang, Yizhou Sun, Wei Wang
Media coverage: Nature News Feature
[Paper]
[Code]
[Website]
- NeurIPS MathAI 2025
EAST: Entropy-Based Adaptive Weighting for Self-Training
Xiaoxuan Wang, Yihe Deng, Mingyu Derek Ma, Wei Wang
[Paper]
[Code]
- Preprint 2025
From Solving to Verifying: A Unified Objective for Robust Reasoning in LLMs
Xiaoxuan Wang, Bo Liu, Song Jiang, Jingzhou Liu, Jingyuan Qi, Xia Chen, Baosheng He
[Paper]
- KDD 2026
MatSciBench: Benchmarking the Reasoning Ability of Large Language Models in Materials Science
Junkai Zhang*, Jingru Gan*, Xiaoxuan Wang, Zian Jia, Changquan Gu, Jianpeng Chen, Yanqiao Zhu, Mingyu Derek Ma, Dawei Zhou, Ling Li, Wei Wang
[Paper]
[Code]
[Dataset]
Clinical & Domain Applications
- AAAI 2025
Memorize and Rank: Evaluating LLMs for Clinical Diagnosis Prediction
Mingyu Derek Ma, Xiaoxuan Wang, Yijia Xiao, Anthony Cuturrufo, Vijay S. Nori, Eran Halperin, Wei Wang
Also presented at the NeurIPS 2024 GenAI4Health Workshop
[Paper]
- Preprint 2024
CliBench: Multifaceted Evaluation of LLMs in Clinical Decision Making
Mingyu Derek Ma, Chenchen Ye, Yu Yan, Xiaoxuan Wang, Peipei Ping, Timothy S. Chang, Wei Wang
[Paper]
[Code]
[Website]
Earlier Work (NLP & Speech)
- AAAI 2024
STAR: Boosting Low-Resource Event Extraction via Structure-to-Text Generation
Mingyu Derek Ma, Xiaoxuan Wang, Po-Nien Kung, P. Jeffrey Brantingham, Nanyun Peng, Wei Wang
[Paper]
- EMNLP Findings 2023
Learning under Label Proportions for Text Classification
Jatin Chauhan, Xiaoxuan Wang, Wei Wang
[Paper]
- Preprint 2022
Global Responses to the COVID-19 Pandemic: Evidence Finding and Verification
Rotem Dror, Xiaoxuan Wang, Dan Roth
- Speech Communication 2022
Seamless Equal Accuracy Ratio for Inclusive CTC Speech Recognition
Heting Gao, Xiaoxuan Wang, Sunghun Kang, Rusty Mina, Dias Issa, John Harvill, Leda Sari, Mark Hasegawa-Johnson, Chang D. Yoo
[Paper]
Invited Talks
Internship Experience
- Research Intern
- Research Intern (AI)
Developed GRPO-Verif, a reinforcement learning algorithm that jointly optimizes solution generation and
self-verification in LLMs via a unified loss formulation, improving self-verification while preserving
reasoning performance.
- Applied Scientist Intern
Implemented and evaluated RLHF methods (REST-EM, Iterative DPO, and variants) for tool-integrated problem
solving in LLMs; showed that regularization mitigates overfitting to self-generated data and improves
tool usage.
Education
- Ph.D. in Computer Science · Advisor: Wei Wang
- B.S. in Computer Science, Grainger College of Engineering
Minor in Mathematics · James Scholar Honor · Dean's List
Teaching
- Teaching Assistant
- Course Assistant