Hello! I am currently a student at Zhejiang University (浙江大学), majoring in Artificial Intelligence and supervised by Prof. Jianke Zhu. I obtained my B.Eng. (with Honors) from Wuhan University (武汉大学), majoring in Computer Science and Technology under the supervision of Prof. Zheng Wang. I was previously an in-person summer research intern at McGill University and Mila-Quebec AI Institute in Montreal, Canada, under the supervision of Prof. Xujie Si. Prior to that, I was a remote visiting student at KAIST in Korea, supervised by Prof. Chang D. Yoo.
My research interests include 2D/3D Multimodal LLMs, Visual/Scene Understanding and Embodied AI, particularly in:
1. Enabling MLLMs to perform common visual tasks, including open-vocabulary visual grounding for images, videos, and 3D scenes.
2. Embodied scene understanding and reasoning, including 3D question answering, 3D dense captioning, and embodied dialogue/planning.
3. Efficient and effective MLLMs, including visual token compression and lightweight MLLMs.
I am currently seeking a long-term research internship opportunity. If you are interested in collaborating with me, please feel free to email me at hanxun.yu@zju.edu.cn.
🔥 News
- 2025.02: 🎉🎉 One paper was accepted to CVPR 2025.
- 2024.07: 🎉🎉 One paper was accepted to IEEE TPAMI 2024.
- 2023.07: 🎉🎉 One paper was accepted to ACM MM 2023.
- 2023.06: 🎉🎉 I won the National Scholarship at Wuhan University (Top 2%).
- 2022.06: 🎉🎉 Accepted to the Mitacs Globalink Research Internship 2022 program (in Canada).
📝 Publications

Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning.
Rating Score: 5/5/4
Hanxun Yu*, Wentong Li*, Song Wang, Junbo Chen, Jianke Zhu
- This paper proposes an effective instance-aware Large Multi-modal Model for 3D scene understanding. We develop an MCMF module for 2D/3D cross-modal feature fusion, which generates fine-grained instance-level tokens, and a 3D-ISR module that captures the complex spatial relationships among objects, producing informative scene-level tokens. Inst3D-LMM is designed to require fewer computational resources while delivering faster training and inference.

Moiré Backdoor Attack (MBA): A Novel Trigger for Pedestrian Detectors in the Physical World
Hui Wei*, Hanxun Yu*, Kewei Zhang, Zhixiang Wang, Jianke Zhu, Zheng Wang
- This paper focuses on AI safety-critical tasks and introduces the Moiré Backdoor Attack (MBA), which is the first to integrate Moiré-based triggers into pedestrian detection models. MBA enables individuals wearing clothes with Moiré patterns to evade detection in real-world scenarios while maintaining considerable stealthiness.
- TPAMI 2024, Physical Adversarial Attack meets Computer Vision: A Decade Survey, Hui Wei, Hao Tang, Xuemei Jia, Zhixiang Wang, Hanxun Yu, Zhubo Li, Shin'ichi Satoh, Zheng Wang [Paper] [Code]
- Preprint, Aesthetic Yet Customizable Adversarial Patches Towards Physical Attacks, Hui Wei*, Hanxun Yu*, Zhixiang Wang, Shin'ichi Satoh, Hao Tang, Zheng Wang [Paper]
🎖 Honors and Awards
- 2024 The Chiang Chen Scholarship, China.
- 2024 The First Prize of Excellent Graduate Scholarship, Zhejiang University.
- 2023 The National Scholarship, China. (Top 2%)
- 2023 Outstanding Undergraduate Dissertation Award, Wuhan University.
- 2023 Outstanding Graduate, Wuhan University.
- 2022 Mitacs-CSC Globalink Research Internship Scholarship, China. (200 per year nationwide)
- 2020, 2021, 2022 The First Prize of Excellent Undergraduate Scholarship, Wuhan University.
📖 Education
- 2023.09 - now, Zhejiang University.
- 2019.09 - 2023.06, Bachelor (with Honors), Wuhan University.
💻 Internships
- 2022.06 - 2022.10, McGill University and Mila-Quebec AI Institute, Montreal, Canada, advised by Prof. Xujie Si. [Certificate]
- 2021.12 - 2022.02, KAIST, Daejeon, Korea, advised by Prof. Chang D. Yoo. [Certificate]
💬 Academic Services
- Conference Reviewer: ICML 2025, CVPR 2025, AAAI 2025, ACM MM 2023-2024