Purpose: Virtual reality-based simulators have the potential to become an essential part of surgical education. To make full use of this potential, they must be able to automatically recognize activities performed by users and assess them. Since annotation of trajectories by human experts is expensive, there is a need for methods that can learn to recognize surgical activities in a data-efficient way.

Methods: We use self-supervised training of deep encoder–decoder architectures to learn representations of surgical trajectories from video data. These representations allow for semi-automatic extraction of features that capture information about semantically important events in the trajectories. Such features are then used as inputs to an unsupervised surgical activity recognition pipeline.

Results: Our experiments show that hidden semi-Markov models used to recognize activities in a simulated myomectomy scenario perform better when given features extracted from representations learned by training a deep encoder–decoder network to predict the remaining surgery progress.

Conclusion: Our work is an important first step toward making efficient use of features obtained from deep representation learning for surgical activity recognition in settings where only a small fraction of the existing data is annotated by human domain experts and where those annotations are potentially incomplete.
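To make the pretext task described above concrete, the sketch below shows one way an encoder–decoder network could be pretrained, without manual labels, to predict the remaining surgery progress from video frames; the encoder's latent representations would then serve as features for the downstream recognition pipeline. This is a minimal illustration, not the authors' implementation: the `ProgressNet` name, the layer sizes, the 64×64 frame resolution, and the training loop are all assumptions.

```python
"""Sketch of self-supervised pretraining on the remaining-progress
prediction task (illustrative assumptions, not the paper's architecture)."""
import torch
import torch.nn as nn


class ProgressNet(nn.Module):
    """Convolutional encoder + regression decoder predicting remaining progress."""

    def __init__(self, latent_dim: int = 128):
        super().__init__()
        # Encoder: maps a 3x64x64 frame to a latent representation.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, latent_dim),
        )
        # Decoder / regression head: remaining progress in [0, 1].
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(frames))

    def features(self, frames: torch.Tensor) -> torch.Tensor:
        """Latent representations reused by the downstream recognition pipeline."""
        with torch.no_grad():
            return self.encoder(frames)


def pretrain(model: ProgressNet, loader, epochs: int = 10, lr: float = 1e-3):
    """Self-supervised pretraining: targets come from each frame's position
    in the trajectory (remaining progress), so no expert annotations are needed."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for frames, remaining_progress in loader:
            optimizer.zero_grad()
            pred = model(frames).squeeze(1)
            loss = loss_fn(pred, remaining_progress)
            loss.backward()
            optimizer.step()
    return model


if __name__ == "__main__":
    # Toy run on random tensors standing in for video frames.
    model = ProgressNet()
    frames = torch.rand(4, 3, 64, 64)
    targets = torch.tensor([0.9, 0.6, 0.3, 0.1])  # derived from timestamps alone
    pretrain(model, [(frames, targets)], epochs=1)
    feats = model.features(frames)  # inputs to, e.g., a hidden semi-Markov model
    print(feats.shape)              # torch.Size([4, 128])
```

In the pipeline summarized by the abstract, features such as `feats` would feed an unsupervised hidden semi-Markov model that segments the trajectory into surgical activities; that stage is omitted here.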
Paysan, D., Haug, L., Bajka, M., Oelhafen, M., & Buhmann, J. M. (2021). Self-supervised representation learning for surgical activity recognition. International Journal of Computer Assisted Radiology and Surgery, 16(11), 2037–2044. https://doi.org/10.1007/s11548-021-02493-z