In this paper, we introduce DEEP-AD, a multimodal advertisement insertion framework dedicated to online video platforms. The framework is designed from the viewer's perspective, balancing the contextual relevance of commercials against their degree of intrusiveness. The main contribution of the paper is a novel multimodal algorithm for temporally segmenting video into scenes/stories, which automatically determines the temporal instants that are most appropriate for inserting advertisement clips. The proposed algorithm exploits various deep convolutional neural networks at several stages. The video stream is first divided into shots using a graph-partition method. The shots are then clustered into scene/story units by an agglomerative clustering methodology that takes visual, audio and semantic features as input. Furthermore, to facilitate user access to multimedia documents, a novel thumbnail extraction method is proposed, based on both semantic representativeness and visual quality. Finally, the optimal advertisement insertion points are determined from the ads' temporal distribution, commercial diversity and degree of intrusiveness. Experimental results, carried out on a large dataset of more than 30 videos from French National Television and US TV series, validate the proposed methodology, with average accuracy and recognition rates above 88%. Moreover, compared with other state-of-the-art methods, the proposed temporal video segmentation yields gains of more than 6% in precision and recall.
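The shot-to-scene grouping described above can be sketched as a temporally constrained agglomerative clustering: neighbouring shot groups are repeatedly merged while their fused descriptors remain close enough. The sketch below is a minimal illustration of that idea only; the function name, the merge threshold, the Euclidean distance on centroids and the toy two-dimensional features are all assumptions for demonstration, not the authors' actual configuration or descriptors.

```python
import numpy as np

def cluster_shots(features, threshold):
    """Temporally constrained agglomerative clustering (illustrative sketch).

    Repeatedly merges the most similar pair of *adjacent* shot groups,
    stopping once no neighbouring groups are closer than `threshold`.
    `features` holds one fused (visual/audio/semantic) descriptor per shot.
    """
    groups = [[i] for i in range(len(features))]
    centroids = [features[i].astype(float) for i in range(len(features))]
    while len(groups) > 1:
        # Distance between each pair of temporally adjacent group centroids.
        dists = [np.linalg.norm(centroids[k] - centroids[k + 1])
                 for k in range(len(groups) - 1)]
        k = int(np.argmin(dists))
        if dists[k] > threshold:
            break  # no adjacent pair is similar enough; stop merging
        groups[k] = groups[k] + groups.pop(k + 1)
        centroids[k] = features[groups[k]].mean(axis=0)
        centroids.pop(k + 1)
    return groups

# Toy example: shots 0-2 share one appearance, shots 3-5 another.
shot_features = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                          [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
scenes = cluster_shots(shot_features, threshold=1.0)
print(scenes)  # → [[0, 1, 2], [3, 4, 5]]
```

In the actual framework the descriptors come from deep convolutional networks and the stopping rule would be tuned per dataset; the temporal-adjacency constraint is what distinguishes scene segmentation from generic clustering, since a scene is a contiguous run of shots.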
Tapu, R., Mocanu, B., & Zaharia, T. (2020). DEEP-AD: A Multimodal Temporal Video Segmentation Framework for Online Video Advertising. IEEE Access, 8, 99582–99597. https://doi.org/10.1109/ACCESS.2020.2997949