Yuzhi Zhao

Researcher, ByteDance

Shenzhen, China

Short Bio

I received the Ph.D. degree in Electronic Engineering from the Department of Electronic Engineering, City University of Hong Kong, in February 2023, and the B.Eng. degree in Electronic and Information Engineering from the School of Electronic and Information Engineering (Qiming College), Huazhong University of Science and Technology, in June 2018. My research spans low-level vision restoration for intelligent mobile devices, multimodal understanding of content and user intent, GUI agent systems, and benchmark construction. My current interests include post-training for Multimodal Large Language Models (MLLMs), AI agents, computational photography, and generative models. I have published 12 papers as first or corresponding author at leading international conferences and journals, and have more than 1,800 Google Scholar citations.

Work Experience

Since December 2025, I have been a Researcher at ByteDance, working on GUI test agents that automate app testing from GUI operation knowledge.

From April 2023 to December 2025, I was a Researcher at Huawei Hong Kong Research Center. I led a research group that built a unified MLLM for content moderation and an app-testing agent for the HarmonyOS ecosystem.

Previously, I was a Student Researcher at Tencent from January to March 2023 and at SenseTime from November 2019 to May 2022. At SenseTime, I worked on joint denoising and deblurring, as well as hyperspectral image reconstruction.

Selected Publications

^*: corresponding author

LLM/MLLM Training (e.g., RLVR and continual training)

Donglai Xu, Hongzheng Yang, Yuzhi Zhao^*, Pingping Zhang, Jinpeng Chen, Wenao Ma, Zhijian Hou, Mengyang Wu, Xiaolei Li, Senkang Hu, Ziyi Guan, Jason Chun Lok Li, Lai-Man Po. From Exploration to Exploitation: A Two-Stage Entropy RLVR Approach for Noise-Tolerant MLLM Training. CVPR, 2026 (PDF) (Code) (URL)

Senkang Hu, Yong Dai, Yuzhi Zhao, Yihang Tao, Yu Guo, Zhengru Fang, Sam Kwong, Yuguang Fang. Optimizing Agentic Reasoning with Retrieval via Synthetic Semantic Information Gain Reward. ICML, 2026 (URL)

Hieu Trung Nguyen, Bao Nguyen, Wenao Ma, Yuzhi Zhao, Ruifeng She, Viet Anh Nguyen. Adaptive Rollout Allocation for Online Reinforcement Learning with Verifiable Rewards. ICLR, 2026 (PDF) (Code) (URL)

Jinpeng Chen, Runmin Cong, Yuzhi Zhao^*, Hongzheng Yang, Guangneng Hu, Horace Ip, Sam Kwong. SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning. ICML, 2025 (PDF) (Code) (URL)

AI Agent, LLM/MLLM Applications

Ziyi Guan, Jason Chun Lok Li, Zhijian Hou, Pingping Zhang, Donglai Xu, Pengfei Xian, Mengyang Wu, Jinpeng Chen, Wenao Ma, Yuzhi Zhao^*, Shengchao Qin, Graziano Chesi, Ngai Wong. KG-RAG: Enhancing GUI Agent Decision-Making via Knowledge Graph-Driven Retrieval-Augmented Generation. EMNLP, 2025 (PDF) (Code) (URL)

Mengyang Wu, Yuzhi Zhao^*, Jialun Cao, Mingjie Xu, Zhongming Jiang, Xuehui Wang, Qinbin Li, Guangneng Hu, Shengchao Qin, Chi-Wing Fu. ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation. AAAI, 2025 (PDF) (Code) (URL)

Mingjie Xu, Mengyang Wu, Yuzhi Zhao^*, Jason Chun Lok Li, Weifeng Ou. LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations. WACV, 2025 (PDF) (Code) (URL)

LLM/MLLM Benchmarks

Mingjie Xu, Jinpeng Chen, Yuzhi Zhao^*, Jason Chun Lok Li, Yue Qiu, Zekang Du, Mengyang Wu, Pingping Zhang, Kun Li, Hongzheng Yang, Wenao Ma, Jiaheng Wei, Qinbin Li, Kangcheng Liu, Wenqiang Lei. VP-Bench: A Comprehensive Benchmark for Visual Prompting in Multimodal Large Language Models. AAAI, 2026 (PDF) (URL)

Kun Li, Lai-Man Po, Hongzheng Yang, Xuyuan Xu, Kangcheng Liu, Yuzhi Zhao^*. AesBiasBench: Evaluating Bias and Alignment in Multimodal Language Models for Personalized Image Aesthetic Assessment. EMNLP, 2025 (PDF) (URL)

Low-level Vision and Computational Photography

Yuzhi Zhao^*, Lai-Man Po, Xin Ye, Yongzhe Xu, Qiong Yan. Modeling Dual-Exposure Quad-Bayer Patterns for Joint Denoising and Deblurring. IEEE Transactions on Image Processing, 2024 (PDF) (Code) (URL)

Yuzhi Zhao^*, Lai-Man Po, Kangcheng Liu, Xuehui Wang, Wing-Yin Yu. SVCNet: Scribble-Based Video Colorization Network with Temporal Aggregation. IEEE Transactions on Image Processing, 2023 (PDF) (Code) (URL)

Yuzhi Zhao^*, Lai-Man Po, Tingyu Lin, Qiong Yan, Wei Liu, Pengfei Xian. HSGAN: Hyperspectral Reconstruction from RGB Images with Generative Adversarial Network. IEEE Transactions on Neural Networks and Learning Systems, 2023 (PDF) (Code) (URL)

Yuzhi Zhao^*, Yongzhe Xu, Qiong Yan, Dingdong Yang, Xuehui Wang, Lai-Man Po. D2HNet: Joint Denoising and Deblurring with Hierarchical Network for Robust Night Image Restoration. ECCV, 2022 (PDF) (Code/Dataset) (URL)

Yuzhi Zhao^*, Lai-Man Po, Wing-Yin Yu, Yasar Abbas Ur Rehman, Mengyang Liu, Yujia Zhang, Weifeng Ou. VCGAN: Video Colorization with Hybrid Generative Adversarial Network. IEEE Transactions on Multimedia, 2022 (PDF) (Code) (URL)

Yuzhi Zhao^*, Lai-Man Po, Kwok-Wai Cheung, Wing-Yin Yu, Yasar Abbas Ur Rehman. SCGAN: Saliency Map-guided Colorization with Generative Adversarial Network. IEEE Transactions on Circuits and Systems for Video Technology, 2020 (PDF) (Code) (URL)

Generative Models

Hongzheng Yang, Jason Chun Lok Li, Kun Li, Wenao Ma, Mingjie Xu, Yuzhi Zhao^*, Lai-Man Po. RSTFA: Efficient Training-Free Human Preference Alignment via Rejection Sampling for Text-to-Image Diffusion Models. IEEE Transactions on Image Processing, 2026 (URL)

Wing-Yin Yu, Lai-Man Po, Ray C.C. Cheung, Yuzhi Zhao, Yu Xue, Kun Li. Bidirectionally Deformable Motion Modulation For Video-based Human Pose Transfer. ICCV, 2023 (PDF) (Code) (URL)

Wing-Yin Yu, Lai-Man Po, Jingjing Xiong, Yuzhi Zhao, Pengfei Xian. ShaTure: Shape and Texture Deformation for Human Pose and Attribute Transfer. IEEE Transactions on Image Processing, 2022 (URL)

Yuzhi Zhao^*, Lai-Man Po, Xuehui Wang, Qiong Yan, Wei Shen, et al. ChildPredictor: A Child Face Prediction Framework with Disentangled Learning. IEEE Transactions on Multimedia, 2022 (PDF) (Code/Dataset) (URL)

Representation Learning

Xuehui Wang, Chongjie Si, Xue Yang, Yuzhi Zhao, Wenhai Wang, Xiaokang Yang, Wei Shen. OPMapper: Enhancing Open-Vocabulary Semantic Segmentation with Multi-Guidance Information. NeurIPS, 2025 (PDF) (URL)

Kangcheng Liu, Yuzhi Zhao, Qiang Nie, Zhi Gao, Ben M. Chen. WS3D: Weakly Supervised 3D Scene Segmentation with Region-Level Boundary Awareness and Instance Discrimination. ECCV, 2022 (PDF) (Code) (URL)

Yujia Zhang, Lai-Man Po, Xuyuan Xu, Mengyang Liu, Yexin Wang, Weifeng Ou, Yuzhi Zhao, Wing-Yin Yu. Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation. AAAI, 2022 (PDF) (Code) (URL)