Haoning Wu 「吴浩宁」

I am currently a 4th-year PhD candidate at Shanghai Jiao Tong University (SJTU), where I am fortunate to be advised by Prof. Weidi Xie and Prof. Ya Zhang. Before that, I received my B.S. degree in EE (IEEE Pilot Class), also from SJTU, in 2022.

I am generally interested in multi-modal learning, especially generative models, spatial intelligence, and AI4Sports. My ultimate goal is to build an artificial general intelligence that surpasses humans in perception, thinking, and practical abilities.

I am always eager to communicate and collaborate, so feel free to contact me!

By the way, I am currently open to research internship opportunities related to multi-modal generation and understanding. Feel free to connect via email or WeChat.

Email: haoningwu3639 at gmail.com WeChat: haoningwu_

Email / CV / Google Scholar / GitHub / Zhihu / LinkedIn

News

[2026.02] Two papers (SpatialScore and SoccerMaster) have been accepted by CVPR 2026.
[2025.11] One paper (SceneGen) has been accepted by 3DV 2026.
[2025.08] I gave an invited talk on our paper (MRGen) at NICE. Discussion is welcome!
[2025.08] I will attend ICCV 2025 in person. See you in Hawaii!
[2025.07] One paper (SoccerAgent) has been accepted by ACM Multimedia 2025.
[2025.06] One paper (MRGen) has been accepted by ICCV 2025.
[2025.02] One paper (UniSoccer) has been accepted by CVPR 2025.
[2024.11] I have been recognized as a BMVC 2024 Outstanding Reviewer.
[2024.10] One paper (MegaFusion) has been accepted by WACV 2025.
[2024.09] One paper (MatchTime) has been accepted by EMNLP 2024 and selected as Oral.
[2024.04] I will attend CVPR 2024 in person. See you in Seattle!
[2024.04] I have passed my PhD Qualification Examination, and I am a PhD candidate now.
[2024.03] I started an on-site research internship at Shanghai AI Laboratory, Shanghai.
[2024.02] One paper (StoryGen) has been accepted by CVPR 2024.
[2023.10] One paper (NeRF-SDP) has been accepted by ACM Multimedia Asia 2023 and selected as Oral.
[2023.08] One paper (VFIAdapter) has been accepted by BMVC 2023 and selected as Oral.

Publications

* denotes equal contribution, and ^† denotes corresponding author.

SpatialScore: Towards Comprehensive Evaluation for Spatial Intelligence
Haoning Wu*, Xiao Huang*, Yaohui Chen, Ya Zhang, Yanfeng Wang^†, Weidi Xie^†
CVPR, 2026. (NEW)
project page / arXiv / code

In this work, we investigate a critical question: to what extent do existing MLLMs possess spatial intelligence, encompassing both spatial perception and spatial understanding?

SoccerMaster: A Vision Foundation Model for Soccer Understanding
Haolin Yang, Jiayuan Rao, Haoning Wu, Weidi Xie^†
CVPR, 2026. (NEW)
project page / arXiv / code

In this work, we present SoccerMaster, the first soccer-specific vision foundation model that unifies diverse understanding tasks within a single framework.

SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass
Yanxu Meng*, Haoning Wu*, Ya Zhang, Weidi Xie^†
3DV, 2026. (NEW)
project page / arXiv / code

In this work, we propose a feedforward 3D scene generation model that can simultaneously synthesize multiple 3D assets from a single image.

Multi-Agent System for Comprehensive Soccer Understanding
Jiayuan Rao*, Zifeng Li*, Haoning Wu, Ya Zhang, Yanfeng Wang^†, Weidi Xie^†
ACM Multimedia, 2025. (NEW)
project page / arXiv / code

In this work, we present SoccerBench, the largest and most comprehensive soccer-specific benchmark, along with a multi-agent system, SoccerAgent, for soccer understanding.

MRGen: Segmentation Data Engine for Underrepresented MRI Modalities
Haoning Wu*, Ziheng Zhao*, Ya Zhang, Yanfeng Wang^†, Weidi Xie^†
ICCV, 2025. (NEW)
project page / arXiv / code

In this work, we establish a novel paradigm for generative models in medical applications: controllably synthesizing data for underrepresented modalities.

Towards Universal Soccer Video Understanding
Jiayuan Rao*, Haoning Wu*, Hao Jiang, Ya Zhang, Yanfeng Wang^†, Weidi Xie^†
CVPR, 2025.
project page / arXiv / code

In this work, we present the first visual-language foundation model tailored for soccer video understanding, which can be applied to various downstream tasks.

MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning
Haoning Wu*, Shaocheng Shen*, Qiang Hu, Xiaoyun Zhang^†, Ya Zhang, Yanfeng Wang
WACV, 2025.
project page / arXiv / code

In this work, we propose a tuning-free strategy that extends existing diffusion models towards higher-resolution image generation.

MatchTime: Towards Automatic Soccer Game Commentary Generation
Jiayuan Rao*, Haoning Wu*, Chang Liu, Yanfeng Wang^†, Weidi Xie^†
EMNLP, 2024. (Oral Presentation)
project page / arXiv / code

In this work, we focus on building a visual-language model for automatic soccer game commentary generation.

Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models
Chang Liu*, Haoning Wu*, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang^†, Weidi Xie^†
CVPR, 2024.
project page / arXiv / code

In this work, we focus on the task of generating a coherent image sequence based on a given storyline, denoted as open-ended visual storytelling.

NeRF-SDP: Efficient Generalizable Neural Radiance Field with Scene Depth Perception
Qiuwen Wang, Shuai Guo, Haoning Wu, Rong Xie, Li Song^†, Wenjun Zhang
ACM Multimedia Asia, 2023. (Oral Presentation)
paper / code

In this work, we propose a novel framework, termed NeRF-SDP, to address the challenge of balancing rendering speed and quality in generalizable NeRF.

Boost Video Frame Interpolation via Simple Motion Adaptation
Haoning Wu, Xiaoyun Zhang^†, Weidi Xie, Ya Zhang, Yanfeng Wang^†
BMVC, 2023. (Oral Presentation)
project page / arXiv / code

In this work, we propose a novel optimization-based VFI method that can adapt to unseen motions at test time and boost existing pre-trained models.

LAR-SR: A Local Autoregressive Model for Image Super-Resolution
Baisong Guo*, Xiaoyun Zhang*^†, Haoning Wu, Yu Wang, Ya Zhang, Yanfeng Wang^†
CVPR, 2022.
paper / code

In this work, we propose LAR-SR for super-resolution based on a Local AutoRegressive module, achieving superior performance among generative models for SR.

Reviewer Service

Computer Vision and Pattern Recognition (CVPR 2023, 2024, 2025, 2026)
International Conference on Computer Vision (ICCV 2023, 2025)
European Conference on Computer Vision (ECCV 2024, 2026)
ACM Multimedia (ACM MM 2024, 2025, 2026)
British Machine Vision Conference (BMVC 2024, 2025) (Outstanding Reviewer in 2024)
AAAI Conference on Artificial Intelligence (AAAI 2025, 2026)
Conference on Neural Information Processing Systems (NeurIPS 2025)
Winter Conference on Applications of Computer Vision (WACV 2026)
International Conference on 3D Vision (3DV 2026)
IEEE Transactions on Visualization and Computer Graphics (TVCG)

Awards

[2024] BMVC 2024 Outstanding Reviewer
[2021] China National Scholarship (for Undergraduates)
[2021] School Scholarship B Prize
[2020] School Scholarship C Prize

Updated in February 2026.

Thanks to Jon Barron for this amazing website template.