Boost Video Frame Interpolation via Motion Adaptation

Haoning Wu¹
Xiaoyun Zhang¹
Weidi Xie¹,²
Ya Zhang¹,²
Yanfeng Wang¹,²

¹CMIC, Shanghai Jiao Tong University
²Shanghai AI Lab

BMVC 2023 Oral

Code [GitHub]

Paper [arXiv]

Cite [BibTeX]




High-level idea overview. (a) To address the generalisation challenge that VFI models face due to the domain gap on unseen data, we propose optimisation-based video frame interpolation. By performing test-time motion adaptation on our proposed lightweight adapter, we enable VFI models to generalise across diverse video scenarios and thereby boost their performance. (b) Visual comparison on cases with complex, large-scale motions from the DAVIS dataset. Our method helps VFI models generalise to diverse scenarios and synthesise high-quality frames with clearer structures and fewer distortions.

Abstract

Video frame interpolation (VFI) is a challenging task that aims to generate intermediate frames between two consecutive frames of a video. Existing learning-based VFI methods have achieved great success, but they still suffer from limited generalisation ability due to the restricted motion distributions of their training datasets. In this paper, we propose a novel optimisation-based VFI method that can adapt to unseen motions at test time. Our method is based on a cycle-consistency adaptation strategy that leverages the motion characteristics among video frames. We also introduce a lightweight adapter that can be inserted into the motion estimation module of existing pre-trained VFI models to improve the efficiency of adaptation. Extensive experiments on various benchmarks demonstrate that our method boosts the performance of two-frame VFI models, outperforming existing state-of-the-art methods, even those that use extra input frames.
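To make the adaptation procedure concrete, below is a minimal PyTorch sketch of cycle-consistency test-time adaptation, assuming a generic `vfi_model(frame_a, frame_b)` interface that returns the midpoint frame; the function names, the L1 loss, and the hyper-parameters are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def cycle_adapt(vfi_model, adapter_params, i0, i1, i2, steps=10, lr=1e-4):
    """Cycle-consistency test-time adaptation (minimal sketch).

    i0, i1, i2: three consecutive test frames, each of shape (B, 3, H, W).
    Only the adapter parameters are optimised; the pre-trained VFI
    backbone stays frozen.
    """
    optimizer = torch.optim.Adam(adapter_params, lr=lr)
    for _ in range(steps):
        # Synthesise an intermediate frame between each pair of inputs ...
        i_05 = vfi_model(i0, i1)   # midpoint of (i0, i1)
        i_15 = vfi_model(i1, i2)   # midpoint of (i1, i2)
        # ... and reuse them to re-interpolate the observed middle frame.
        i1_hat = vfi_model(i_05, i_15)
        # Cycle-consistency loss: the re-interpolated frame should match i1.
        loss = F.l1_loss(i1_hat, i1)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

After a handful of such steps on the test sequence itself, the adapted model is used for interpolation as usual.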


Architecture

Proposed cycle-consistency adaptation strategy and plug-in adapter module for efficient test-time adaptation. (a) Cycle-consistency adaptation first synthesises an intermediate frame between each pair of input frames, then reuses these syntheses to re-interpolate the target frame and computes a cycle-consistency loss, fully exploiting the temporal consistency within video sequences. (b) To improve efficiency, we freeze all parameters of the pre-trained VFI model and optimise only the proposed plug-in adapter, which predicts a set of parameters {α, β} from the extracted visual features. The pixel-wise weights α and biases β rectify the estimated flow to fit each video sequence.
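As a concrete illustration of (b), here is a minimal PyTorch sketch of such a plug-in adapter; the layer sizes, the residual parameterisation of α around 1, and the zero initialisation are our assumptions, chosen so the adapter starts as an identity mapping.

```python
import torch.nn as nn

class FlowAdapter(nn.Module):
    """Plug-in adapter (minimal sketch): predicts pixel-wise weights α
    and biases β from visual features and rectifies an estimated flow as
        flow' = α ⊙ flow + β
    """
    def __init__(self, feat_channels, hidden=64):
        super().__init__()
        # Small conv head; 4 output channels: 2 for α, 2 for β
        # (one pair per flow component).
        self.head = nn.Sequential(
            nn.Conv2d(feat_channels, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 4, 3, padding=1),
        )
        # Zero-init the last conv so that initially α = 1 and β = 0,
        # i.e. adaptation starts from the frozen model's behaviour.
        nn.init.zeros_(self.head[-1].weight)
        nn.init.zeros_(self.head[-1].bias)

    def forward(self, feat, flow):
        # feat: (B, C, H, W) visual features; flow: (B, 2, H, W).
        alpha, beta = self.head(feat).chunk(2, dim=1)
        return (1.0 + alpha) * flow + beta  # rectified flow
```

During adaptation, only `FlowAdapter.parameters()` would be passed to the optimiser in the loop sketched above.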


Results

Quantitative Results

Quantitative (PSNR/SSIM) comparison. We compare our boosted models with representative state-of-the-art methods on the Vimeo90K, DAVIS and SNU-FILM benchmarks. Both optimisation approaches yield substantial performance improvements. Note that FLAVR and VFIT take multiple frames as input, yet our boosted models still outperform them. RED: best performance, BLUE: second-best performance.

Qualitative Results

Qualitative comparison against state-of-the-art VFI algorithms. We show visualisations on the Vimeo90K, SNU-FILM and DAVIS benchmarks. Patches for close comparison are marked in red in the original images. Our boosted models generate higher-quality results with clearer structures and fewer distortions.


Ablation Studies

Quantitative (PSNR/SSIM) comparison of adaptation strategies. Experiments on the Vimeo90K dataset show that cycle-consistency adaptation steadily boosts VFI models by fully leveraging inter-frame consistency to learn the motion characteristics of each test sequence.

Ablation study on end-to-end versus plug-in adapter adaptation. Models boosted by our plug-in adapter require only a minimal number of fine-tuned parameters, roughly doubling adaptation efficiency while maintaining comparable inference speed and performance.
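For reference, freezing the backbone while leaving only the adapter trainable might look like the short snippet below; the convention that adapter modules are registered under names containing "adapter" is an assumption for illustration.

```python
# Freeze every pre-trained parameter; leave only the adapter trainable.
for name, p in vfi_model.named_parameters():
    p.requires_grad = "adapter" in name
adapter_params = [p for p in vfi_model.parameters() if p.requires_grad]
```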

Motion field visualisation. A VFI model boosted by our motion adaptation estimates more precise motion fields, thereby synthesising higher-quality frames.


More Visualisations

More visualisations on the Vimeo90K benchmark. Patches for close comparison are marked in red in the original images.

More visualisations on the DAVIS benchmark. Patches for close comparison are marked in red in the original images.

More visualisations on the SNU-FILM benchmark. Patches for close comparison are marked in red in the original images.


Acknowledgements

Based on a template by Phillip Isola and Richard Zhang.