The advent of affordable wearable devices with video cameras has established a new form of social data, lifelogs, in which people's lives are captured on video. The enormous amount of lifelog data and the need for on-site processing demand new, fast video processing methods. In this work, we experimentally investigate seven hours of lifelogs and report two novel findings: (1) audio cues are exceptionally strong for lifelog processing; (2) cascades of audio and video detectors improve accuracy and enable fast (super-frame-rate) processing. We first construct strong detectors using state-of-the-art audio and visual features: Mel-frequency cepstral coefficients (MFCC), colour (RGB) histograms, and local patch descriptors (SIFT). In the second stage, we construct a cascade of the trained detectors and optimise the cascade parameters. Separating the detector and cascade optimisation stages simplifies training and results in a fast and accurate processing pipeline.
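The cascade idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Stage` class, the `run_cascade` function, the two-threshold early-exit rule, the threshold values, and the toy score functions are all assumptions made for illustration. The sketch only shows why a cascade can be fast: a cheap detector (here audio) decides the clear cases, and a costlier detector (here video) runs only when the earlier stage is undecided.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    """One hypothetical cascade stage: a detector plus two thresholds."""
    name: str
    score: Callable[[dict], float]  # assumed to return a change score in [0, 1]
    reject_below: float             # scores under this end the cascade with "no change"
    accept_above: float             # scores at or over this end it with "change"


def run_cascade(segment: dict, stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Return (scene_change_detected, names of the stages that were evaluated)."""
    if not stages:
        raise ValueError("cascade needs at least one stage")
    used: List[str] = []
    last_score = 0.0
    for stage in stages:
        last_score = stage.score(segment)
        used.append(stage.name)
        if last_score < stage.reject_below:
            return False, used   # confident rejection, later stages are skipped
        if last_score >= stage.accept_above:
            return True, used    # confident detection, later stages are skipped
    # Still undecided after the final stage: compare with the midpoint of its thresholds.
    last = stages[-1]
    return last_score >= (last.reject_below + last.accept_above) / 2, used


# Toy detectors that read precomputed scores; real ones would be trained on
# features such as MFCC, RGB histograms, or SIFT descriptors.
STAGES = [
    Stage("audio", lambda s: s["audio_change"], reject_below=0.2, accept_above=0.8),
    Stage("video", lambda s: s["video_change"], reject_below=0.5, accept_above=0.5),
]

if __name__ == "__main__":
    print(run_cascade({"audio_change": 0.9, "video_change": 0.0}, STAGES))
    print(run_cascade({"audio_change": 0.5, "video_change": 0.7}, STAGES))
```

In this sketch the thresholds are fixed by hand. In a two-stage training setup like the one the abstract describes, they would be the cascade parameters tuned after the individual detectors have been trained.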
CITATION STYLE
Mahkonen, K., Kämäräinen, J. K., & Virtanen, T. (2015). Lifelog scene change detection using cascades of audio and video detectors. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9010, pp. 434–444). Springer Verlag. https://doi.org/10.1007/978-3-319-16634-6_32