Haoning Wu 「吴浩宁」

I am currently a fourth-year PhD candidate at Shanghai Jiao Tong University (SJTU), where I am fortunate to be advised by Prof. Weidi Xie and Prof. Ya Zhang. Before that, I received my B.S. degree in EE (IEEE Pilot Class), also from SJTU, in 2022.

I am generally interested in multi-modal learning, especially generative models, spatial intelligence, and AI4Sports. My ultimate goal is to build artificial general intelligence that surpasses humans in perception, thinking, and practical abilities.

I am always eager to communicate and cooperate, so feel free to contact me!!!

By the way, I am currently open to research internship opportunities related to multi-modal generation and understanding. Feel free to connect via email or WeChat.

Email: haoningwu3639 at gmail.com           WeChat: haoningwu_

Email  /  CV  /  Google Scholar  /  Github  /  Zhihu  /  LinkedIn

News
Preprints

* denotes equal contribution, and denotes corresponding author.

SoccerMaster: A Vision Foundation Model for Soccer Understanding
Haolin Yang, Jiayuan Rao, Haoning Wu, Weidi Xie
arXiv, 2025.   (NEW)
project page / arXiv / code

In this work, we present SoccerMaster, the first soccer-specific vision foundation model that unifies diverse understanding tasks within a single framework.

SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding
Haoning Wu*, Xiao Huang*, Yaohui Chen, Ya Zhang, Yanfeng Wang, Weidi Xie
arXiv, 2025.   (NEW)
project page / arXiv / code

In this work, we investigate a critical question: to what extent do existing MLLMs possess spatial intelligence, encompassing both spatial perception and spatial understanding?

Publications

* denotes equal contribution, and denotes corresponding author.

SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass
Yanxu Meng*, Haoning Wu*, Ya Zhang, Weidi Xie
3DV, 2026.   (NEW)
project page / arXiv / code

In this work, we propose a feedforward 3D scene generation model that can simultaneously synthesize multiple 3D assets from a single image.

Multi-Agent System for Comprehensive Soccer Understanding
Jiayuan Rao*, Zifeng Li*, Haoning Wu, Ya Zhang, Yanfeng Wang, Weidi Xie
ACM Multimedia, 2025.   (NEW)
project page / arXiv / code

In this work, we present SoccerBench, the largest and most comprehensive soccer-specific benchmark, along with a multi-agent system, SoccerAgent, for soccer understanding.

MRGen: Segmentation Data Engine for Underrepresented MRI Modalities
Haoning Wu*, Ziheng Zhao*, Ya Zhang, Yanfeng Wang, Weidi Xie
ICCV, 2025.   (NEW)
project page / arXiv / code

In this work, we establish a novel paradigm for generative models in medical applications: controllably synthesizing data for underrepresented modalities.

Towards Universal Soccer Video Understanding
Jiayuan Rao*, Haoning Wu*, Hao Jiang, Ya Zhang, Yanfeng Wang, Weidi Xie
CVPR, 2025.   (NEW)
project page / arXiv / code

In this work, we present the first visual-language foundation model tailored for soccer video understanding, which can be applied to various downstream tasks.

MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning
Haoning Wu*, Shaocheng Shen*, Qiang Hu, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang
WACV, 2025.
project page / arXiv / code

In this work, we propose a tuning-free strategy that extends existing diffusion models to higher-resolution image generation.

MatchTime: Towards Automatic Soccer Game Commentary Generation
Jiayuan Rao*, Haoning Wu*, Chang Liu, Yanfeng Wang, Weidi Xie
EMNLP, 2024.   (Oral Presentation)
project page / arXiv / code

In this work, we focus on building a visual-language model for automatic soccer game commentary generation.

Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models
Chang Liu*, Haoning Wu*, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang, Weidi Xie
CVPR, 2024.
project page / arXiv / code

In this work, we focus on the task of generating a coherent image sequence based on a given storyline, which we term open-ended visual storytelling.

NeRF-SDP: Efficient Generalizable Neural Radiance Field with Scene Depth Perception
Qiuwen Wang, Shuai Guo, Haoning Wu, Rong Xie, Li Song, Wenjun Zhang
ACM Multimedia Asia, 2023.   (Oral Presentation)
paper / code

In this work, we propose a novel framework, termed NeRF-SDP, to address the challenge of balancing rendering speed and quality in generalizable NeRF.

Boost Video Frame Interpolation via Simple Motion Adaptation
Haoning Wu, Xiaoyun Zhang, Weidi Xie, Ya Zhang, Yanfeng Wang
BMVC, 2023.   (Oral Presentation)
project page / arXiv / code

In this work, we propose a novel optimization-based VFI method that can adapt to unseen motions at test time and boost existing pre-trained models.

LAR-SR: A Local Autoregressive Model for Image Super-Resolution
Baisong Guo*, Xiaoyun Zhang*, Haoning Wu, Yu Wang, Ya Zhang, Yanfeng Wang
CVPR, 2022.
paper / code

In this work, we propose LAR-SR for super-resolution based on a Local AutoRegressive module, achieving superior performance among generative models for SR.

Reviewer Service
  • Computer Vision and Pattern Recognition (CVPR 2023, 2024, 2025, 2026)
  • International Conference on Computer Vision (ICCV 2023, 2025)
  • European Conference on Computer Vision (ECCV 2024)
  • ACM Multimedia (ACM MM 2024, 2025)
  • British Machine Vision Conference (BMVC 2024, 2025) (Outstanding Reviewer in 2024)
  • AAAI Conference on Artificial Intelligence (AAAI 2025, 2026)
  • Conference on Neural Information Processing Systems (NeurIPS 2025)
  • Winter Conference on Applications of Computer Vision (WACV 2026)
  • International Conference on 3D Vision (3DV 2026)
Awards
  • [2024] BMVC 2024 Outstanding Reviewer
  • [2021] China National Scholarship (for Undergraduates)
  • [2021] School Scholarship B Prize
  • [2020] School Scholarship C Prize

Updated in December 2025

Thanks to Jon Barron for this amazing website template.