Junze (Tony) Ye
I work on the data side of LLM post-training and AI agents, including on-policy data curation, SFT-RL recipes, and evaluation signal quality. My research brings a perspective from applied probability and sequential decision-making to the analysis and optimization of these machine learning systems.
Faculty Advisors
Research Interests
- Applied Probability
- Sequential Decision-Making
- Data
- Post-Training
Publications
Working Papers
- arXiv preprint: https://arxiv.org/abs/2512.19691
- Code and data release: https://github.com/junzeye/validate-medcalc-labels
Accepted to the ICML 2026 workshop RL from World Feedback (RLxF). TL;DR: When finetuning LLM agents, spending teacher labels on broader student-context coverage can be more effective than spending them on longer or more heavily filtered teacher completions.
A poster version was presented at CS 329A's poster session on December 12, 2025.