Junze (Tony) Ye

PhD Student, Operations, Information & Technology
PhD Program Office, Graduate School of Business, Stanford University, 655 Knight Way, Stanford, CA 94305

I work on the data side of LLM post-training and AI agents, including on-policy data curation, SFT-RL recipes, and evaluation signal quality. My research brings a perspective from applied probability and sequential decision-making to the analysis and optimization of these machine learning systems.

Faculty Advisors

Research Interests

  • Applied Probability
  • Sequential Decision-Making
  • Data
  • Post-Training

Working Papers

- arXiv preprint: https://arxiv.org/abs/2512.19691
- Code and data release: https://github.com/junzeye/validate-medcalc-labels

Accepted to the ICML 2026 workshop RL from World Feedback (RLxF). TL;DR: When fine-tuning LLM agents, spending teacher labels on broader student-context coverage can be more effective than spending them on longer or more heavily filtered teacher completions.

A poster version was presented at CS 329A's poster session on December 12, 2025.