Self-supervised representation learning has important potential applications in human behaviour understanding. Learning useful representations from large unlabeled datasets by modeling intrinsic properties of the data has proven successful in many areas of machine learning, often outperforming transfer learning or fully supervised training. My research interests lie in applying these ideas to multimodal human-centric data. In this extended abstract, I present the direction of research I have followed during the first half of my PhD, along with ideas and work in progress for the second half. My research so far demonstrates the potential of cross-modal self-supervision for audio representation learning, especially on small downstream datasets. In the second half, I plan to explore similar ideas for visual and multimodal representation learning and apply them to speech and emotion recognition and multimodal question answering.
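To make the idea of cross-modal self-supervision concrete, the following is a minimal PyTorch sketch of one possible formulation, assuming an audio encoder trained to regress visual features extracted from the synchronized video track. The module names (AudioEncoder, VisualPredictor), the architecture, the feature dimensions, and the MSE objective are illustrative assumptions for this sketch, not the specific method used in my published work.

```python
import torch
import torch.nn as nn

# Hypothetical audio encoder: maps a log-mel spectrogram clip to an embedding.
class AudioEncoder(nn.Module):
    def __init__(self, n_mels=80, embed_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_mels, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(128, embed_dim, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool over the time axis
        )

    def forward(self, spec):                 # spec: (batch, n_mels, time)
        return self.conv(spec).squeeze(-1)   # (batch, embed_dim)

# Head that predicts a visual feature vector from the audio embedding.
class VisualPredictor(nn.Module):
    def __init__(self, embed_dim=256, visual_dim=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, 512), nn.ReLU(), nn.Linear(512, visual_dim)
        )

    def forward(self, z):
        return self.mlp(z)

encoder, predictor = AudioEncoder(), VisualPredictor()
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(predictor.parameters()), lr=1e-4
)

# Placeholder batch: in practice these come from synchronized audio/video clips,
# with the visual target produced by a frozen image or face feature extractor.
spec = torch.randn(8, 80, 200)        # log-mel spectrograms
visual_target = torch.randn(8, 512)   # visual features of the matching frames

optimizer.zero_grad()
pred = predictor(encoder(spec))
loss = nn.functional.mse_loss(pred, visual_target)  # cross-modal regression loss
loss.backward()
optimizer.step()

# After pretraining, `encoder` can be fine-tuned or used as a fixed feature
# extractor for small labelled downstream tasks such as emotion recognition.
```

In such a setup, the supervision signal comes entirely from the correspondence between modalities in unlabeled video, which is what makes the learned audio representations useful when downstream labelled data is scarce.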