These leaderboards are used to track progress in Surgical Gesture Recognition
No benchmarks available.
Use these libraries to find Surgical Gesture Recognition models and implementations
No subtasks available.
The proposed method performs better than state-of-the-art methods in terms of the edit score and is on par in frame-wise accuracy; over-segmentation errors are reduced through the action design and the reward mechanism.
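For context on the edit score mentioned above: in this literature it is the segmental edit score, i.e. the Levenshtein distance between the predicted and ground-truth sequences of segment labels, normalized and reported out of 100, which is why it specifically punishes over-segmentation even when frame-wise accuracy stays high. A minimal plain-Python sketch (not the paper's code; function names are illustrative):

```python
def segments(frame_labels):
    """Collapse a per-frame label sequence into its sequence of segment labels."""
    segs = []
    for lab in frame_labels:
        if not segs or segs[-1] != lab:
            segs.append(lab)
    return segs

def edit_score(pred_frames, gt_frames):
    """Segmental edit score = 100 * (1 - normalized Levenshtein distance)
    between predicted and ground-truth segment sequences."""
    p, g = segments(pred_frames), segments(gt_frames)
    # Standard dynamic-programming Levenshtein distance over segments.
    D = [[0] * (len(g) + 1) for _ in range(len(p) + 1)]
    for i in range(len(p) + 1):
        D[i][0] = i
    for j in range(len(g) + 1):
        D[0][j] = j
    for i in range(1, len(p) + 1):
        for j in range(1, len(g) + 1):
            cost = 0 if p[i - 1] == g[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1, D[i][j - 1] + 1, D[i - 1][j - 1] + cost)
    return 100.0 * (1.0 - D[len(p)][len(g)] / max(len(p), len(g), 1))

# Example: heavy over-segmentation lowers the edit score (60.0 here) even though
# frame-wise accuracy between the two sequences is about 82%.
print(edit_score(list("AAABBABBBCC"), list("AAAABBBBBCC")))
```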
This work proposes a 3D Convolutional Neural Network that learns spatiotemporal features from consecutive video frames and achieves high frame-wise surgical gesture recognition accuracy, outperforming comparable models that either extract only spatial features or model spatial and low-level temporal information separately.
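A minimal PyTorch sketch of this kind of model, mapping a short clip of consecutive RGB frames to a gesture prediction; the clip length, layer widths, and number of gesture classes are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class Tiny3DCNN(nn.Module):
    """Toy 3D CNN: maps a short clip of consecutive RGB frames to gesture logits,
    used as the frame-wise prediction for the clip's central frame."""
    def __init__(self, num_gestures=10):
        super().__init__()
        self.features = nn.Sequential(
            # 3D convolutions mix spatial (H, W) and temporal (T) information jointly.
            nn.Conv3d(3, 16, kernel_size=(3, 3, 3), padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(16, 32, kernel_size=(3, 3, 3), padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),          # global spatiotemporal pooling
        )
        self.classifier = nn.Linear(32, num_gestures)

    def forward(self, clip):                  # clip: (B, 3, T, H, W)
        feats = self.features(clip).flatten(1)
        return self.classifier(feats)         # (B, num_gestures) logits

# Example: a batch of 2 clips, each 16 frames of 112x112 RGB.
logits = Tiny3DCNN()(torch.randn(2, 3, 16, 112, 112))
print(logits.shape)  # torch.Size([2, 10])
```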
A novel temporal convolutional architecture is proposed to automatically detect and segment surgical gestures, with their corresponding boundaries, using only RGB videos; a symmetric dilation structure bridged by a self-attention module encodes and decodes the long-term temporal patterns and establishes the frame-to-frame relationships accordingly.
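A rough PyTorch sketch of the described structure: a dilated temporal-convolution encoder and decoder bridged by a self-attention module over the temporal axis, producing per-frame gesture logits. Channel sizes, the dilation schedule, and the attention settings are assumptions for illustration, not the authors' architecture:

```python
import torch
import torch.nn as nn

class DilatedTCN(nn.Module):
    """Stack of residual 1D dilated convolutions over time (dilation doubles per layer)."""
    def __init__(self, channels, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Conv1d(channels, channels, kernel_size=3,
                      dilation=2 ** i, padding=2 ** i)
            for i in range(num_layers)
        ])

    def forward(self, x):                       # x: (B, C, T)
        for conv in self.layers:
            x = x + torch.relu(conv(x))         # residual dilated block
        return x

class SymmetricDilationWithAttention(nn.Module):
    """Encoder and decoder dilation stacks bridged by temporal self-attention."""
    def __init__(self, in_dim, channels=64, num_gestures=10):
        super().__init__()
        self.proj = nn.Conv1d(in_dim, channels, 1)
        self.encoder = DilatedTCN(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads=4, batch_first=True)
        self.decoder = DilatedTCN(channels)
        self.head = nn.Conv1d(channels, num_gestures, 1)

    def forward(self, frame_feats):             # frame_feats: (B, in_dim, T)
        x = self.encoder(self.proj(frame_feats))
        a = x.transpose(1, 2)                   # (B, T, C) for attention
        a, _ = self.attn(a, a, a)               # frame-to-frame relationships
        x = self.decoder(x + a.transpose(1, 2))
        return self.head(x)                     # (B, num_gestures, T) per-frame logits

# Example: per-frame features (dim 128) from a video of 200 frames.
logits = SymmetricDilationWithAttention(in_dim=128)(torch.randn(1, 128, 200))
print(logits.shape)  # torch.Size([1, 10, 200])
```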
The goal of this study was to learn the performance-delay trade-off and design an MS-TCN++-based algorithm that can exploit this trade-off to achieve significantly better performance than the naive approach.
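One concrete way to expose such a performance-delay trade-off in an MS-TCN++-style dilated layer is to cap how many future frames each convolution may see, so accuracy can be traded against prediction latency. The sketch below is an assumption-laden illustration (the `lookahead` parameter and layer sizes are hypothetical), not the algorithm from the study:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DelayLimitedDilatedConv(nn.Module):
    """Residual dilated 1D conv whose kernel sees the full past but at most
    `lookahead` future frames (hypothetical parameter controlling per-layer delay)."""
    def __init__(self, channels, dilation, lookahead):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=3, dilation=dilation)
        # A kernel of size 3 with dilation d covers offsets {-d, 0, +d}; shift the
        # padding so at most `lookahead` of those taps lie in the future.
        future = min(lookahead, dilation)
        self.pad = (2 * dilation - future, future)   # (past padding, future padding)

    def forward(self, x):                            # x: (B, C, T)
        return x + torch.relu(self.conv(F.pad(x, self.pad)))

# Example stack with doubling dilations; each layer looks at most 2 frames ahead,
# so the stack's total prediction delay is the sum of per-layer windows (1+2+2+2+2 = 9).
layers = nn.Sequential(*[DelayLimitedDilatedConv(64, 2 ** i, lookahead=2)
                         for i in range(5)])
print(layers(torch.randn(1, 64, 300)).shape)  # torch.Size([1, 64, 300])
```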
In order to produce a surgical gesture recognition system that can support a wide variety of procedures, either a very large annotated dataset must be acquired, or fitted models must generalize to new labels (so-called zero-shot capability). In this paper we investigate the feasibility of the latter option. Leveraging the Bridge-Prompt framework, we prompt-tune a pre-trained vision-text model (CLIP) for gesture recognition in surgical videos. This approach can utilize extensive outside video and text data, but also make use of label meta-data and weakly supervised contrastive losses. Our experiments show that the prompt-based video encoder outperforms standard encoders in surgical gesture recognition tasks. Notably, it displays strong performance in zero-shot scenarios, where gestures/tasks that were not provided during encoder training are included at prediction time. Additionally, we measure the benefit of including text descriptions in the feature-extractor training scheme. Bridge-Prompt and similar pre-trained, prompt-tuned video encoder models provide strong visual representations for surgical robotics, especially in gesture recognition tasks. Given the diverse range of surgical tasks (gestures), the ability of these models to transfer zero-shot, without any task- (gesture-)specific retraining, makes them invaluable.
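A minimal sketch of the zero-shot ingredient described here, scoring individual video frames against free-text gesture descriptions with an off-the-shelf CLIP checkpoint via Hugging Face `transformers`; the prompts and checkpoint name are illustrative assumptions, and this omits the Bridge-Prompt prompt-tuning and contrastive training over frame sequences:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Textual descriptions of gestures act as the label space; an unseen gesture can be
# added at inference time simply by writing a new prompt (the zero-shot property).
gesture_prompts = [
    "a surgical instrument positioning a needle",
    "a surgical instrument pushing a needle through tissue",
    "a surgical instrument pulling suture thread",
    "a surgical instrument tying a knot",
]

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def classify_frame(frame: Image.Image) -> int:
    """Return the index of the gesture prompt best matching a single video frame."""
    inputs = processor(text=gesture_prompts, images=frame,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_image: (1, num_prompts) image-text similarity scores.
    return int(outputs.logits_per_image.softmax(dim=-1).argmax(dim=-1))

# Example call with a blank frame, just to show the signature.
print(gesture_prompts[classify_frame(Image.new("RGB", (224, 224)))])
```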