Food identification technology potentially benefits both the food and media industries, and can enrich human-computer interaction. We assembled a food classification dataset consisting of 11,141 clips, based on YouTube videos of 20 food types. This dataset is freely available on Kaggle. We suggest a grouped holdout evaluation protocol to assess model performance. As a first approach, we applied Convolutional Neural Networks to this dataset. Under the grouped holdout protocol the model obtained an accuracy of 18.5%, whereas under a uniform holdout protocol it obtained an accuracy of 37.58%. When the task is framed as binary classification, the model performed well for most food pairs. In both settings, the method clearly outperformed reasonable baselines. We found that, besides texture properties, differences in eating actions are an important consideration for data-driven eating sound research. Protocols based only on biting sounds are limited to textural classification and offer less heuristic value for capturing differences between foods.
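Since the abstract contrasts grouped and uniform holdout evaluation, a minimal sketch of the two splitting strategies may help. This is not the authors' evaluation code; it assumes a hypothetical metadata table with columns `clip_path`, `food_type`, and `source_video`, and uses scikit-learn's splitters to illustrate the idea that grouped holdout keeps all clips from the same source video on one side of the split.

```python
# Minimal sketch (not the authors' code) contrasting uniform holdout with
# grouped holdout, assuming a hypothetical metadata file "clips_metadata.csv"
# with columns "clip_path", "food_type", and "source_video".
import pandas as pd
from sklearn.model_selection import train_test_split, GroupShuffleSplit

meta = pd.read_csv("clips_metadata.csv")  # hypothetical metadata table

# Uniform holdout: clips are split at random, so clips cut from the same
# YouTube video can land in both the training and the test set.
train_u, test_u = train_test_split(
    meta, test_size=0.2, stratify=meta["food_type"], random_state=0
)

# Grouped holdout: all clips sharing a source video stay on the same side
# of the split, preventing the model from exploiting recording-specific
# cues; this typically yields a lower but more honest accuracy estimate.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(meta, groups=meta["source_video"]))
train_g, test_g = meta.iloc[train_idx], meta.iloc[test_idx]
```

The gap between the two reported accuracies (18.5% vs. 37.58%) is consistent with the uniform split allowing clips from the same recording to leak between training and test sets.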