FastFoley: Non-autoregressive Foley Sound Generation Based on Visual Semantics

Abstract

Foley sound in movies and TV episodes is of great importance in bringing a more realistic feeling to the audience. Traditionally, foley artists must use their expertise to create foley sound synchronized with the content occurring in the video, which is laborious and time-consuming. In this paper, we present FastFoley, a Transformer-based non-autoregressive deep-learning method that synthesizes a foley audio track from a silent video clip. Existing cross-modal generation methods are still based on autoregressive models such as the long short-term memory (LSTM) recurrent neural network. FastFoley offers a new non-autoregressive framework for this audio-visual task. Given a video, FastFoley synthesizes the associated audio, outperforming LSTM-based methods in time synchronization, sound quality, and sense of reality. We have also created a dataset called the Audio-Visual Foley Dataset (AVFD) for related foley work and made it open source; it can be downloaded at https://github.com/thuhcsi/icassp2022-FastFoley.
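
The key contrast the abstract draws is between autoregressive LSTM decoding and parallel, non-autoregressive prediction. The PyTorch sketch below illustrates that idea only: the module names, dimensions, and the fixed 4x video-to-mel upsampling are illustrative assumptions, not the authors' architecture (see the repository above for the actual implementation).

# Hedged sketch of a non-autoregressive video-to-audio model in the spirit
# of FastFoley. All names, dimensions, and the upsampling scheme here are
# illustrative assumptions, not the published implementation.
import torch
import torch.nn as nn

class NonAutoregressiveFoley(nn.Module):
    """Maps a sequence of per-frame visual features to a mel-spectrogram.

    Unlike an LSTM decoder, every output frame is predicted in parallel:
    a Transformer encoder contextualizes the visual features, and a linear
    head emits all mel frames in a single forward pass.
    """

    def __init__(self, visual_dim=512, d_model=256, n_mels=80,
                 frames_per_video_frame=4):
        super().__init__()
        self.proj = nn.Linear(visual_dim, d_model)   # visual -> model dim
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=4)
        # Length regulation: audio runs at a higher frame rate than video,
        # so each visual step is expanded to several mel frames (assumed 4x).
        self.upsample = frames_per_video_frame
        self.mel_head = nn.Linear(d_model, n_mels)

    def forward(self, visual_feats):                  # (B, T_video, visual_dim)
        x = self.proj(visual_feats)
        x = self.encoder(x)                           # parallel, no recurrence
        x = x.repeat_interleave(self.upsample, dim=1) # (B, T_video*4, d_model)
        return self.mel_head(x)                       # (B, T_mel, n_mels)

# Usage: 30 video frames of 512-d features -> 120 mel frames in one pass.
model = NonAutoregressiveFoley()
mel = model(torch.randn(2, 30, 512))
print(mel.shape)  # torch.Size([2, 120, 80])

Because the encoder has no recurrence, all mel frames come out of a single forward pass, which is what makes non-autoregressive synthesis fast and easier to keep time-aligned with the video; a separate vocoder (not shown) would then convert the mel-spectrogram to a waveform.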

Citation (APA)

Li, S., Zhang, L., Dong, C., Xue, H., Wu, Z., Sun, L., … Meng, H. (2023). FastFoley: Non-autoregressive Foley Sound Generation Based on Visual Semantics. In Communications in Computer and Information Science (Vol. 1765 CCIS, pp. 252–263). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-99-2401-1_23
