Hello! I am currently a PhD student in Computer Science and Technology at Zhejiang University, supervised by Prof. Jianke Zhu. I obtained my B.Eng. (with Honors) in Computer Science and Technology from Wuhan University, supervised by Prof. Zheng Wang. I was previously a summer research intern at McGill University and Mila-Quebec AI Institute in Montreal, Canada, under the supervision of Prof. Xujie Si. Prior to that, I was a visiting student at KAIST in Daejeon, Korea, supervised by Prof. Chang D. Yoo.
My research interests include 2D/3D Multimodal LLMs, Visual/Scene Understanding, and Spatial Intelligence, particularly in:
1. Native multimodal foundation models, including unified 2D and 3D understanding within a single backbone.
2. Spatial-temporal understanding with MLLMs, including streaming interaction and embodied scene understanding.
3. Efficient and effective MLLMs, including visual token compression and lightweight MLLM design.
If you are interested in any form of academic collaboration, please feel free to email me at hanxun.yu@zju.edu.cn.
🔥 News
- 2026.01: 🎉🎉 One paper was accepted to ICLR 2026.
- 2025.02: 🎉🎉 One paper was accepted to CVPR 2025 as a Highlight. (2.9%, 387/13008)
- 2024.07: 🎉🎉 One paper was accepted to IEEE TPAMI 2024.
- 2023.07: 🎉🎉 One paper was accepted to ACM MM 2023.
- 2023.06: 🎉🎉 I won the National Scholarship at Wuhan University. (Top 2%)
- 2022.06: 🎉🎉 Accepted to the Mitacs Globalink Research Internship 2022 program. (200/year Nationwide)
📝 Selected Publications

VisionTrim: Unified Vision Token Compression for Training-Free MLLM Acceleration
Hanxun Yu*, Wentong Li*, Xuan Qu*, Song Wang, Junbo Chen, Jianke Zhu
ICLR 2026
- An efficient vision token compression framework with two modules, Dominant Vision Token Selection (DVTS) and Text-Guided Vision Complement (TGVC).

Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning
Hanxun Yu*, Wentong Li*, Song Wang, Junbo Chen, Jianke Zhu
CVPR 2025 (Highlight, Top 2.9%)
- A unified and effective instance-aware 3D Large Multi-modal Model for multi-task 3D scene understanding through coupled 2D-3D modality encoding.

Moiré Backdoor Attack (MBA): A Novel Trigger for Pedestrian Detectors in the Physical World
Hui Wei*, Hanxun Yu*, Kewei Zhang, Zhixiang Wang, Jianke Zhu, Zheng Wang
ACM MM 2023
- This paper focuses on AI safety-critical tasks and proposes moiré patterns as backdoor attack triggers for pedestrian detection models.
📚 Other Publications

N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models
Yuxin Wang, Lei Ke, Boqiang Zhang, Tianyuan Qu, Hanxun Yu, Zhenpeng Huang, Meng Yu, Dan Xu, Dong Yu
arXiv 2025
- A unified framework that uses native 3D grounding to enable accurate spatial reasoning in Vision-Language Models.

StreamingAssistant: Efficient Visual Token Pruning for Accelerating Online Video Understanding
Xinqi Jin*, Hanxun Yu*, Bohan Yu, Kebin Liu, Jian Liu, Keda Tao, Yixuan Pei, Huan Wang, Fan Dang, Jiangchuan Liu, Weiqiang Wang
arXiv 2025
- A token pruning method designed to reduce both spatial and temporal redundancy in online video understanding.

Physical Adversarial Attack meets Computer Vision: A Decade Survey
Hui Wei, Hao Tang, Xuemei Jia, Zhixiang Wang, Hanxun Yu, Zhubo Li, Shin’ichi Satoh, Luc Van Gool, Zheng Wang
IEEE TPAMI 2024
- This survey aims to summarize existing physical adversarial attack methods, providing insights toward the development of trustworthy AI systems.
🎖 Honors and Awards
- 2024 The Chiang Chen Scholarship, China.
- 2024, 2025 The First Prize of Excellent Graduate Scholarship, Zhejiang University.
- 2023 The National Scholarship, China. (Top 2%)
- 2023 Outstanding Undergraduate Dissertation Award, Wuhan University.
- 2023 Outstanding Graduate, Wuhan University.
- 2022 Mitacs-CSC Globalink Research Internship Scholarship, China. (200/year Nationwide)
- 2020, 2021, 2022 The First Prize of Excellent Undergraduate Scholarship, Wuhan University.
📖 Education
2023.09 - now, Ph.D., Zhejiang University.
2019.09 - 2023.06, B.Eng. (with Honors), Wuhan University.
💻 Internships
- 2025.09 - Present, Tencent Hunyuan LLM, Shenzhen, China.
  Mentor: Lei Ke
  Research Intern (Ph.D.)
- 2025.04 - 2025.09, AntGroup, Hangzhou, China.
  Mentor: Jian Liu
  Research Intern (Ph.D.)
- 2022.06 - 2022.10, McGill University and Mila-Quebec AI Institute, Montreal, Canada. [Certificate]
  Supervisor: Prof. Xujie Si
  Research Intern (Undergraduate)
- 2021.12 - 2022.02, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea. [Certificate]
  Supervisor: Prof. Chang D. Yoo
  Research Intern (Undergraduate)
💬 Academic Services
- Journal Reviewer: IEEE TPAMI
- Conference Reviewer: ECCV 2026, CVPR 2026, NeurIPS 2026, ICLR 2026, ICML 2026