Haoning Wu 「吴浩宁」

I am currently a 4th-year PhD candidate at Shanghai Jiao Tong University (SJTU), fortunately advised by Prof. Weidi Xie and Prof. Ya Zhang. Previously, I received my B.S. degree in EE (IEEE Pilot Class) also from SJTU in June 2022.

I'm generally interested in multi-modal learning, especially generative models, spatial intelligence, AI4Sports, and AI4Science. I firmly believe that perception will be the next key step in achieving AGI, and my ultimate goal is to build a general artificial intelligence that surpasses humans in both thinking and practical abilities.

I'm always eager to communicate and cooperate, so feel free to contact me!!!

Email: haoningwu3639 at gmail.com           WeChat: haoningwu_

Email  /  CV  /  Google Scholar  /  Github  /  Zhihu  /  LinkedIn

Preprints

* denotes equal contribution.

SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass
Yanxu Meng*, Haoning Wu*, Ya Zhang, Weidi Xie
arXiv, 2025.   (NEW)
project page / arXiv / code

In this work, we propose a feedforward 3D scene generation model that can simultaneously synthesize multiple 3D assets from a single image.

SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding
Haoning Wu*, Xiao Huang*, Yaohui Chen, Ya Zhang, Yanfeng Wang, Weidi Xie
arXiv, 2025.   (NEW)
project page / arXiv / code

In this work, we investigate a critical question: do existing MLLMs possess 3D spatial perception and understanding abilities?

Publications

* denotes equal contribution.

Multi-Agent System for Comprehensive Soccer Understanding
Jiayuan Rao*, Zifeng Li*, Haoning Wu, Ya Zhang, Yanfeng Wang, Weidi Xie
ACM Multimedia, 2025.   (NEW)
project page / arXiv / code

In this work, we present SoccerBench, the largest and most comprehensive soccer-specific benchmark, along with a multi-agent system, SoccerAgent, for soccer understanding.

MRGen: Segmentation Data Engine for Underrepresented MRI Modalities
Haoning Wu*, Ziheng Zhao*, Ya Zhang, Yanfeng Wang, Weidi Xie
ICCV, 2025.   (NEW)
project page / arXiv / code

In this work, we establish a novel paradigm for generative models in medical applications: controllably synthesizing data for underrepresented modalities.

Towards Universal Soccer Video Understanding
Jiayuan Rao*, Haoning Wu*, Hao Jiang, Ya Zhang, Yanfeng Wang, Weidi Xie
CVPR, 2025.   (NEW)
project page / arXiv / code

In this work, we present the first visual-language foundation model tailored for soccer video understanding, which can be applied to various downstream tasks.

MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning
Haoning Wu*, Shaocheng Shen*, Qiang Hu, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang
WACV, 2025.   (NEW)
project page / arXiv / code

In this work, we propose a tuning-free strategy that extends existing diffusion models to higher-resolution image generation.

MatchTime: Towards Automatic Soccer Game Commentary Generation
Jiayuan Rao*, Haoning Wu*, Chang Liu, Yanfeng Wang, Weidi Xie
EMNLP, 2024.   (Oral Presentation)
project page / arXiv / code

In this work, we focus on building a visual-language model for automatic soccer game commentary generation.

Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models
Chang Liu*, Haoning Wu*, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang, Weidi Xie
CVPR, 2024.
project page / arXiv / code

In this work, we focus on the task of generating a coherent image sequence based on a given storyline, denoted as open-ended visual storytelling.

NeRF-SDP: Efficient Generalizable Neural Radiance Field with Scene Depth Perception
Qiuwen Wang, Shuai Guo, Haoning Wu, Rong Xie, Li Song, Wenjun Zhang
ACM Multimedia Asia, 2023.   (Oral Presentation)
paper / code

In this work, we propose a novel framework, termed NeRF-SDP, to address the challenge of balancing rendering speed and quality in generalizable NeRF.

Boost Video Frame Interpolation via Simple Motion Adaptation
Haoning Wu, Xiaoyun Zhang, Weidi Xie, Ya Zhang, Yanfeng Wang
BMVC, 2023.   (Oral Presentation)
project page / arXiv / code

In this work, we propose a novel optimization-based VFI method that can adapt to unseen motions at test time and boost existing pre-trained models.

LAR-SR: A Local Autoregressive Model for Image Super-Resolution
Baisong Guo*, Xiaoyun Zhang*, Haoning Wu, Yu Wang, Ya Zhang, Yanfeng Wang
CVPR, 2022.
paper / code

In this work, we propose LAR-SR for super-resolution based on a Local AutoRegressive module, achieving superior performance among generative models for SR.

Reviewer Service
  • Computer Vision and Pattern Recognition (CVPR 2023, 2024, 2025, 2026)
  • International Conference on Computer Vision (ICCV 2023, 2025)
  • European Conference on Computer Vision (ECCV 2024)
  • ACM Multimedia (ACM MM 2024, 2025)
  • British Machine Vision Conference (BMVC 2024, 2025) (Outstanding Reviewer in 2024)
  • AAAI Conference on Artificial Intelligence (AAAI 2025, 2026)
  • Conference on Neural Information Processing Systems (NeurIPS 2025)
  • Winter Conference on Applications of Computer Vision (WACV 2026)
  • International Conference on 3D Vision (3DV 2026)
Awards
  • [2024] BMVC 2024 Outstanding Reviewer
  • [2021] China National Scholarship (for Undergraduates)
  • [2021] School Scholarship B Prize
  • [2020] School Scholarship C Prize

Updated in September 2025.

Thanks to Jon Barron for this amazing website template.