Hello! I am currently a Ph.D. student at the College of Computer Science and Technology, Zhejiang University, supervised by Prof. Jianke Zhu. Before that, I received my B.Eng. (with Honors) from the School of Computer Science, Wuhan University, supervised by Prof. Zheng Wang. During my undergraduate studies, I was a summer research intern at the School of Computer Science, McGill University and Mila-Quebec AI Institute in Montreal, Canada, advised by Prof. Xujie Si. Earlier, I was a visiting student at the School of Electrical Engineering, KAIST in Daejeon, Korea, advised by Prof. Chang D. Yoo.

My research interests include 2D/3D Multimodal LLMs, Visual/Scene Understanding, and Spatial Intelligence, particularly in:

1. Native multimodal foundation models, including unified 2D and 3D understanding within a single backbone.

2. Spatial-temporal understanding with MLLMs, including streaming interaction and embodied scene understanding.

3. Efficient and effective MLLMs, including visual token compression and lightweight MLLM design.

If you are interested in any form of academic cooperation, please feel free to email me at hanxun.yu@zju.edu.cn.

🔥 News

2026.06: 🎉🎉 One paper is accepted by ECCV 2026.
2026.01: 🎉🎉 One paper is accepted by ICLR 2026.
2025.02: 🎉🎉 One paper is accepted by CVPR 2025 as a Highlight (2.9%, 387/13008).
2024.07: 🎉🎉 One paper is accepted by IEEE TPAMI 2024.
2023.07: 🎉🎉 One paper is accepted by ACM MM 2023.
2023.06: 🎉🎉 I won the National Scholarship at Wuhan University. (Top 2%)
2022.06: 🎉🎉 I was accepted to the Mitacs Globalink Research Internship 2022 program (200/year nationwide).

📝 Selected Publications

* indicates equal contribution

arXiv 2026

Unlocking Dense Metric Depth Estimation in VLMs

Hanxun Yu*, Xuan Qu*, Yuxin Wang, Jianke Zhu, Lei Ke

arXiv 2026

[Project Page] [Paper] [Code]

A unified foundation model for both low-level dense geometry prediction and high-level multimodal understanding, while achieving substantially faster inference than existing VLM-based approaches such as DepthLM and Youtu-VL.

ECCV 2026

Stream3D-VLM: Online 3D Spatial Understanding with Incremental Geometry Priors

Hanxun Yu*, Xuan Qu*, Lei Ke, Boqiang Zhang, Yuxin Wang, Jianke Zhu, Dong Yu

ECCV 2026

[Project Page] [Paper] [Code]

The first online 3D vision-language model that supports real-time spatial understanding directly from streaming video, enabling efficient and continuous 3D scene comprehension through incremental geometry integration and geometry-adaptive voxel compression.

ICLR 2026

VisionTrim: Unified Vision Token Compression for Training-free MLLM Acceleration

Hanxun Yu*, Wentong Li*, Xuan Qu*, Song Wang, Junbo Chen, Jianke Zhu

ICLR 2026

[Paper] [Code]

An efficient vision token compression framework with two modules, Dominant Vision Token Selection (DVTS) and Text-Guided Vision Complement (TGVC).

CVPR 2025 (Highlight)

Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning

Hanxun Yu*, Wentong Li*, Song Wang, Junbo Chen, Jianke Zhu

CVPR 2025 (Highlight, Top 2.9%)

[Paper] [Code]

A unified and effective instance-aware 3D Large Multi-modal Model for multi-task 3D scene understanding through coupled 2D-3D modality encoding.

ACM MM 2023

Moiré Backdoor Attack (MBA): A Novel Trigger for Pedestrian Detectors in the Physical World

Hui Wei*, Hanxun Yu*, Kewei Zhang, Zhixiang Wang, Jianke Zhu, Zheng Wang

ACM MM 2023

[Paper] [Code]

This paper focuses on safety-critical AI applications and introduces moiré patterns as a novel backdoor attack trigger for pedestrian detection models in the physical world.

TPAMI 2024

Physical Adversarial Attack meets Computer Vision: A Decade Survey

Hui Wei, Hao Tang, Xuemei Jia, Zhixiang Wang, Hanxun Yu, Zhubo Li, Shin’ichi Satoh, Luc Van Gool, Zheng Wang

IEEE TPAMI 2024

[Paper] [Code]

This survey aims to summarize existing physical adversarial attack methods, providing insights toward the development of trustworthy AI systems.

📚 Other Publications

arXiv 2025

N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models

Yuxin Wang, Lei Ke, Boqiang Zhang, Tianyuan Qu, Hanxun Yu, Zhenpeng Huang, Meng Yu, Dan Xu, Dong Yu

arXiv 2025

[Project Page] [Paper] [Code]

A unified framework that empowers native 3D grounding to enable accurate spatial reasoning in Vision-Language Models.

arXiv 2025

StreamingAssistant: Efficient Visual Token Pruning for Accelerating Online Video Understanding

Xinqi Jin*, Hanxun Yu*, Bohan Yu, Kebin Liu, Jian Liu, Keda Tao, Yixuan Pei, Huan Wang, Fan Dang, Jiangchuan Liu, Weiqiang Wang

arXiv 2025

[Paper] [Code]

A token pruning method designed to reduce both spatial and temporal redundancy in online video understanding.

🎖 Honors and Awards

2024 The Chiang Chen Scholarship, China.
2024, 2025 The First Prize of Excellent Graduate Scholarship, Zhejiang University.
2023 The National Scholarship, China. (Top 2%)
2023 Outstanding Undergraduate Dissertation Award, Wuhan University.
2023 Outstanding Graduate, Wuhan University.
2022 Mitacs-CSC Globalink Research Internship Scholarship, China. (200/year Nationwide)
2020, 2021, 2022 The First Prize of Excellent Undergraduate Scholarship, Wuhan University.

📖 Education

2023.09 - Present, Ph.D., Zhejiang University.
2019.09 - 2023.06, B.Eng. (with Honors), Wuhan University.

💻 Internships

2026.07 - Present, Super Intelligence Team, Xiaohongshu, Hangzhou, China.
Research Intern (Ph.D.)
2025.09 - 2026.06, Tencent Hunyuan LLM, Shenzhen, China.
Mentor: Lei Ke
Research Intern (Ph.D.)
2025.04 - 2025.09, AntGroup, Hangzhou, China.
Mentor: Jian Liu
Research Intern (Ph.D.)
2022.06 - 2022.10, McGill University and Mila-Quebec AI Institute, Montreal, Canada. [Certificate]
Supervisor: Prof. Xujie Si
Research Intern (Undergraduate)
2021.12 - 2022.02, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea. [Certificate]
Supervisor: Prof. Chang D. Yoo
Research Intern (Undergraduate)

💬 Academic Services

Journal Reviewer: IEEE TPAMI
Conference Reviewer: ECCV 2026, CVPR 2026, NeurIPS 2026, ICLR 2026, ICML 2026, ACL 2026

Hanxun Yu · 于瀚勋