Recognition of Instrument-Tissue Interactions in Endoscopic Videos via Action Triplets

Abstract

Recognition of surgical activity is an essential component in developing context-aware decision support for the operating room. In this work, we tackle the recognition of fine-grained activities, modeled as action triplets ⟨instrument, verb, target⟩ representing tool activity. To this end, we introduce a new laparoscopic dataset, CholecT40, consisting of 40 videos from the public dataset Cholec80 in which all frames have been annotated using 128 triplet classes. Furthermore, we present an approach to recognize these triplets directly from the video data. It relies on a module called the class activation guide, which uses the instrument activation maps to guide verb and target recognition. To model the recognition of multiple triplets in the same frame, we also propose a trainable 3D interaction space, which captures the associations between the triplet components. Finally, we demonstrate the significance of these contributions via several ablation studies and comparisons to baselines on CholecT40.
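
The abstract names two architectural ideas: a class activation guide that steers the verb and target branches with instrument activation maps, and a trainable 3D interaction space that scores ⟨instrument, verb, target⟩ associations. The sketch below illustrates one plausible reading of both, assuming a CAM-style instrument branch and an outer-product formulation; every module and parameter name here (ClassActivationGuide, InteractionSpace3D, n_instruments, and so on) is an illustrative assumption, not the paper's actual implementation.

    import torch
    import torch.nn as nn

    class ClassActivationGuide(nn.Module):
        # Hypothetical module: instrument class-activation maps (CAMs)
        # spatially weight the shared features before the verb/target heads.
        def __init__(self, in_channels, n_instruments, n_verbs, n_targets):
            super().__init__()
            # 1x1 conv produces one activation map per instrument class.
            self.instrument_cam = nn.Conv2d(in_channels, n_instruments, 1)
            self.verb_head = nn.Linear(in_channels, n_verbs)
            self.target_head = nn.Linear(in_channels, n_targets)

        def forward(self, feats):                     # feats: (B, C, H, W)
            cams = self.instrument_cam(feats)         # (B, I, H, W)
            inst_logits = cams.mean(dim=(2, 3))       # global average pool -> instrument logits
            # Collapse instrument maps into one spatial mask and use it to
            # guide the verb/target branches.
            attn = torch.sigmoid(cams).max(dim=1, keepdim=True).values  # (B, 1, H, W)
            guided = (feats * attn).mean(dim=(2, 3))  # masked pooling -> (B, C)
            return inst_logits, self.verb_head(guided), self.target_head(guided)

    class InteractionSpace3D(nn.Module):
        # Hypothetical 3D interaction space: a trainable tensor weights the
        # outer product of the three component predictions into triplet scores.
        def __init__(self, n_instruments, n_verbs, n_targets):
            super().__init__()
            self.assoc = nn.Parameter(torch.randn(n_instruments, n_verbs, n_targets) * 0.01)

        def forward(self, inst, verb, targ):          # (B, I), (B, V), (B, T)
            outer = torch.einsum('bi,bv,bt->bivt', inst, verb, targ)
            return outer * self.assoc                 # (B, I, V, T) triplet scores

Under this reading, a multi-label loss over the flattened (I, V, T) grid would allow several triplets to be active in the same frame, which matches the abstract's emphasis on recognizing multiple simultaneous triplets.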

Citation

Nwoye, C. I., Gonzalez, C., Yu, T., Mascagni, P., Mutter, D., Marescaux, J., & Padoy, N. (2020). Recognition of Instrument-Tissue Interactions in Endoscopic Videos via Action Triplets. In Lecture Notes in Computer Science (Vol. 12263, pp. 364–374). Springer. https://doi.org/10.1007/978-3-030-59716-0_35
