Online surgical phase recognition: the first 40 videos are used for training and the last 40 for testing.
This paper proposes a novel method for phase recognition that uses a convolutional neural network (CNN) to learn features automatically from cholecystectomy videos, relying solely on visual information.
This work introduces a non-end-to-end training strategy and several multi-stage architecture designs for the surgical phase recognition task, which substantially boost the performance of the current state-of-the-art single-stage model.
This work confronts the problem of learning surgical phase recognition when annotated data is scarce. It proposes a teacher/student approach: a strong predictor (the teacher), trained beforehand on a small dataset of ground-truth-annotated videos, generates synthetic annotations for a larger dataset, from which another model (the student) then learns.
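The teacher/student scheme above amounts to pseudo-labeling. A minimal sketch, with illustrative names and a toy teacher rather than the paper's actual models:

```python
# Sketch of the teacher/student idea (illustrative, not the paper's code):
# a teacher trained on a small annotated set labels a larger unannotated
# set, and the student then trains on those pseudo-labels.

def generate_pseudo_labels(teacher, unlabeled_frames):
    """Have the teacher predict a phase label for each unlabeled frame."""
    return [(frame, teacher(frame)) for frame in unlabeled_frames]

# Toy teacher: thresholds a scalar frame feature into one of two phases.
teacher = lambda feature: "dissection" if feature > 0.5 else "preparation"

unlabeled = [0.2, 0.7, 0.9]
pseudo_labeled = generate_pseudo_labels(teacher, unlabeled)
# The student would now be trained on `pseudo_labeled`
# as if the labels were ground truth.
print(pseudo_labeled)
```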
This paper introduces SKiT, a fast Key information Transformer for phase recognition in videos. Unlike previous methods that rely on complex models to capture long-term temporal information, SKiT recognizes high-level stages of videos using an efficient key pooling operation, which records important key information by retaining the maximum value seen from the beginning of the video up to the current frame, with a time complexity of $\mathcal{O}(1)$. Experimental results on the Cholec80 and AutoLaparo surgical datasets demonstrate the ability of the model to recognize phases in an online manner. SKiT outperforms state-of-the-art methods, with an accuracy of 92.5% on Cholec80 and 82.9% on AutoLaparo, while running its temporal model eight times faster (7 ms vs. 55 ms) than LoViT, which uses ProbSparse to capture global information. Notably, the inference time of SKiT is constant and independent of the input length, making it a stable choice for keeping a record of the important global information that appears in long surgical videos and is essential for phase recognition. In summary, SKiT is an effective and efficient model for surgical phase recognition that leverages key global information, which is intrinsically valuable when performing this task online on long surgical videos in stable real-time surgical recognition systems.
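The key pooling operation described in the SKiT abstract is essentially a running element-wise maximum over per-frame feature vectors, so each update costs O(1) in the number of frames seen so far. A minimal sketch (my naming, not the paper's implementation):

```python
# Running-max "key pooling" sketch: retain, per feature dimension, the
# maximum value observed from the first frame up to the current one.

def init_key_pool(dim):
    """Start at -inf so the first frame always sets the pool."""
    return [float("-inf")] * dim

def update_key_pool(pool, frame_features):
    """O(1) per frame: element-wise max of the pool and the new frame."""
    return [max(p, f) for p, f in zip(pool, frame_features)]

# Usage: stream frames one at a time, as in online recognition.
pool = init_key_pool(3)
for frame in [[0.2, 0.9, 0.1], [0.5, 0.3, 0.4], [0.1, 0.8, 0.7]]:
    pool = update_key_pool(pool, frame)

print(pool)  # [0.5, 0.9, 0.7]
```

Because the pool never grows with the video length, the per-frame cost stays constant no matter how long the surgery runs, which is what makes the inference time independent of input length.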
The results demonstrate the effectiveness of the LoViT approach in achieving state-of-the-art surgical phase recognition performance on two datasets with different surgical procedures and temporal-sequencing characteristics.