Eating sound dataset for 20 food types and sound classification using convolutional neural networks

Abstract

Food identification technology can benefit both the food and media industries and enrich human-computer interaction. We assembled a food classification dataset of 11,141 clips extracted from YouTube videos covering 20 food types. The dataset is freely available on Kaggle. We propose a grouped holdout protocol for assessing model performance. As a first approach, we applied convolutional neural networks to this dataset. Under the grouped holdout protocol, the model obtained an accuracy of 18.5%, whereas under a uniform holdout protocol it obtained an accuracy of 37.58%. When the task was framed as binary classification, the model performed well for most food pairs. In both settings, the method clearly outperformed reasonable baselines. We found that, besides texture properties, differences in eating actions are an important consideration for data-driven eating sound research. Protocols based solely on biting sounds are limited to textural classification and offer less guidance when assembling datasets that capture food differences.
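The gap between the two reported accuracies reflects how the dataset is split. To illustrate the distinction, the following sketch contrasts a grouped holdout (all clips from the same source video go entirely into train or test, preventing same-recording leakage) with a uniform holdout (clips split at random, ignoring their source video). This is an illustrative implementation of the general idea, not the authors' code; the function and variable names are our own.

```python
import random


def grouped_holdout(clips, groups, test_frac=0.2, seed=0):
    """Split clips so that all clips sharing a group (e.g. the same
    source YouTube video) land entirely in train or entirely in test."""
    rng = random.Random(seed)
    unique_groups = sorted(set(groups))
    rng.shuffle(unique_groups)
    n_test = max(1, int(len(unique_groups) * test_frac))
    test_groups = set(unique_groups[:n_test])
    train = [c for c, g in zip(clips, groups) if g not in test_groups]
    test = [c for c, g in zip(clips, groups) if g in test_groups]
    return train, test


def uniform_holdout(clips, test_frac=0.2, seed=0):
    """Split clips uniformly at random, ignoring their source video,
    so clips from one recording can appear in both train and test."""
    rng = random.Random(seed)
    indices = list(range(len(clips)))
    rng.shuffle(indices)
    n_test = max(1, int(len(clips) * test_frac))
    test_idx = set(indices[:n_test])
    train = [c for i, c in enumerate(clips) if i not in test_idx]
    test = [c for i, c in enumerate(clips) if i in test_idx]
    return train, test
```

Because uniform holdout lets acoustically near-identical clips from the same recording appear on both sides of the split, it tends to inflate accuracy relative to the stricter grouped protocol, which is consistent with the 37.58% vs. 18.5% gap reported above.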

CITATION STYLE: APA

Ma, J. S., Gómez Maureira, M. A., & Van Rijn, J. N. (2020). Eating sound dataset for 20 food types and sound classification using convolutional neural networks. In ICMI 2020 Companion - Companion Publication of the 2020 International Conference on Multimodal Interaction (pp. 348–351). Association for Computing Machinery, Inc. https://doi.org/10.1145/3395035.3425656
