SAR-RARP50: Segmentation of surgical instrumentation and Action Recognition on Robot-Assisted Radical Prostatectomy Challenge (2023-12-31)

TL;DR

This work releases the first multimodal, publicly available, in-vivo dataset for surgical action recognition and semantic instrumentation segmentation, containing 50 suturing video segments of Robot-Assisted Radical Prostatectomy (RARP).

Abstract

Surgical tool segmentation and action recognition are fundamental building blocks in many computer-assisted intervention applications, ranging from surgical skills assessment to decision support systems. Learning-based action recognition and segmentation approaches now outperform classical methods, but they rely on large annotated datasets. Furthermore, action recognition and tool segmentation algorithms are often trained and make predictions in isolation from each other, without exploiting potential cross-task relationships. With the EndoVis 2022 SAR-RARP50 challenge, we release the first multimodal, publicly available, in-vivo dataset for surgical action recognition and semantic instrumentation segmentation, containing 50 suturing video segments of Robot-Assisted Radical Prostatectomy (RARP). The aim of the challenge is twofold: first, to enable researchers to leverage the scale of the provided dataset and develop robust and highly accurate single-task action recognition and tool segmentation approaches in the surgical domain; second, to further explore the potential of multitask-based learning approaches and determine their comparative advantage against their single-task counterparts. A total of 12 teams participated in the challenge, contributing 7 action recognition methods, 9 instrument segmentation techniques, and 4 multitask approaches that integrated both action recognition and instrument segmentation. The complete SAR-RARP50 dataset is available at: https://rdr.ucl.ac.uk/projects/SARRARP50_Segmentation_of_surgical_instrumentation_and_Action_Recognition_on_Robot-Assisted_Radical_Prostatectomy_Challenge/191091

Parts of instrumentation that are fully submerged in fluids are not annotated (Fig. 2d)

Claspers that have holes (i.e., Cadiere forceps and Pro-Grasp forceps) are labeled as if they were not perforated (Fig. 2b)

Tool parts near the edge of frames that are not clearly visible due to illumination are not annotated (Fig. 2e)

When fluids occlude surgical instrumentation by floating on top of or away from them, masks are defined to approximate the expected shape of the occluded object (Fig. 2c)
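
The rule that claspers with holes are labeled as if they were not perforated can be pictured as a hole-filling step on a binary instrument mask. The sketch below is only an illustration under stated assumptions: the mask is a small nested list of 0/1 values, the function name `fill_enclosed_holes` is hypothetical, and the flood-fill approach is one common way to fill holes, not the annotation tooling used for the dataset.

```python
from collections import deque


def fill_enclosed_holes(mask):
    """Return a copy of a binary mask with enclosed background regions set to 1.

    `mask` is a list of equal-length rows of 0/1 values (hypothetical format).
    Background pixels connected to the image border (4-connectivity) stay 0;
    background pixels fully surrounded by foreground are treated as holes.
    """
    rows, cols = len(mask), len(mask[0])
    outside = [[False] * cols for _ in range(rows)]
    queue = deque()

    # Seed the flood fill with every background pixel on the image border.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and mask[r][c] == 0:
                outside[r][c] = True
                queue.append((r, c))

    # Spread through 4-connected background pixels reachable from the border.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                if mask[nr][nc] == 0 and not outside[nr][nc]:
                    outside[nr][nc] = True
                    queue.append((nr, nc))

    # Pixels not reachable from the border are foreground or filled holes.
    return [[0 if outside[r][c] else 1 for c in range(cols)] for r in range(rows)]


# Toy "clasper" mask with a one-pixel hole in the middle (made-up data).
clasper = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
filled = fill_enclosed_holes(clasper)
```

In this toy case the enclosed centre pixel becomes foreground while the surrounding background is left unchanged, which mirrors the "not perforated" labeling convention.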