Lyric video analysis using text detection and tracking

1Citations
Citations of this article
10Readers
Mendeley users who have this article in their library.
Get full text

Abstract

We attempt to recognize and track lyric words in lyric videos. Lyric video is a music video showing the lyric words of a song. The main characteristic of lyric videos is that the lyric words are shown at frames synchronously with the music. The difficulty of recognizing and tracking the lyric words is that (1) the words are often decorated and geometrically distorted and (2) the words move arbitrarily and drastically in the video frame. The purpose of this paper is to analyze the motion of the lyric words in lyric videos, as the first step of automatic lyric video generation. In order to analyze the motion of lyric words, we first apply a state-of-the-art scene text detector and recognizer to each video frame. Then, lyric-frame matching is performed to establish the optimal correspondence between lyric words and the frames. After fixing the motion trajectories of individual lyric words from correspondence, we analyze the trajectories of the lyric words by k-medoids clustering and dynamic time warping (DTW).

Cite

CITATION STYLE

APA

Sakaguchi, S., Kato, J., Goto, M., & Uchida, S. (2020). Lyric video analysis using text detection and tracking. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12116 LNCS, pp. 426–440). Springer. https://doi.org/10.1007/978-3-030-57058-3_30

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free