Understanding social interactions is relevant across many domains and applications, including psychology, behavioral sciences, human computer interaction, and healthcare. In this paper, we present a practical approach for automatically detecting face-to-face conversations by leveraging the acoustic sensing capabilities of an off-the-shelf, unmodified smartwatch. Our proposed framework incorporates feature representations extracted from different neural network setups and shows the benefits of feature fusion. The framework does not require an acoustic model specifically trained to the speech of the individual wearing the watch or of those nearby. We evaluate our framework with 39 participants in 18 homes in a semi-naturalistic study and with four participants in free living, obtaining an F1 score of 83.2% and 83.3% respectively for detecting user's conversations with the watch. Additionally, we study the real-time capability of our framework by deploying a system on an actual smartwatch and discuss several strategies to improve its practicality in real life. To support further work in this area by the research community, we also release our annotated dataset of conversations.
CITATION STYLE
Liang, D., Zhang, A., & Thomaz, E. (2023). Automated Face-To-Face Conversation Detection on a Commodity Smartwatch with Acoustic Sensing. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 7(3). https://doi.org/10.1145/3610882
Mendeley helps you to discover research relevant for your work.