Vision-Language Segmentation
This work presents the first systematic study of transferring VLSMs to 2D medical image segmentation, using carefully curated datasets that span diverse modalities together with a range of language prompts and experiments. It finds that, although VLSMs finetuned on limited medical image datasets are competitive with image-only segmentation models, not all VLSMs make use of the additional information in the language prompts.
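One way to probe whether a finetuned VLSM actually uses its language input is to compare segmentation quality under an informative prompt versus an empty one. Below is a minimal PyTorch sketch of that probe; the `model(image, text)` interface and the `dice`/`prompt_sensitivity` helpers are hypothetical, not from the paper:

```python
import torch

@torch.no_grad()
def dice(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> float:
    """Dice coefficient between two binary masks."""
    pred, target = pred.float(), target.float()
    inter = (pred * target).sum()
    return float((2 * inter + eps) / (pred.sum() + target.sum() + eps))

@torch.no_grad()
def prompt_sensitivity(model, loader, prompt: str, null_prompt: str = "") -> float:
    """Mean Dice gap between an informative prompt and an empty one.

    A gap near zero suggests the model effectively ignores the language input.
    Assumes `model(image, text)` returns per-pixel logits.
    """
    gaps = []
    for image, mask in loader:
        with_text = torch.sigmoid(model(image, prompt)) > 0.5
        without = torch.sigmoid(model(image, null_prompt)) > 0.5
        gaps.append(dice(with_text, mask) - dice(without, mask))
    return sum(gaps) / len(gaps)
```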
This study evaluates two popular VLSMs with seven kinds of language prompts derived from attributes automatically extracted from echocardiography images, their segmentation masks, and the associated metadata. It shows improved metrics and faster convergence when the VLSMs are pretrained on SDM-generated synthetic images before being finetuned on real images.
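The prompt construction this describes amounts to templating extracted attributes into sentences of increasing specificity. A sketch under that assumption; the attribute names and template wording are illustrative, not the paper's exact seven prompts:

```python
def make_prompts(attrs: dict) -> list[str]:
    """Build increasingly specific prompts from extracted attributes.

    `attrs` is a hypothetical record, e.g.
    {"target": "left ventricle", "view": "apical four-chamber",
     "size": "large", "location": "center left"}.
    """
    p0 = ""                                              # no language signal
    p1 = attrs["target"]                                 # anatomy name only
    p2 = f"{attrs['target']} in an echocardiography image"
    p3 = f"{attrs['size']} {attrs['target']} at the {attrs['location']}"
    p4 = f"{p3} of a {attrs['view']} echocardiography view"
    return [p0, p1, p2, p3, p4]
```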
A natural non-negative decomposition of mutual information is shown to emerge, making it possible to quantify informative relationships between words and pixels in an image and to measure the effect of selectively editing images through prompt interventions.
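For intuition, one way such a decomposition can arise is via the denoising-diffusion (I-MMSE) view of mutual information, where the information a prompt $y$ carries about an image $\mathbf{x}$ is the conditioning gap of a denoiser and splits into per-pixel squared terms. The sketch below shows that general form under those assumptions (noise-schedule weighting omitted); it is not necessarily the paper's exact estimator:

```latex
% Sketch: conditioning gap of a denoiser, split into per-pixel contributions.
I(\mathbf{x};\, y)
  \;=\; \frac{1}{2} \int \mathbb{E}\!\left[
    \left\| \hat{\boldsymbol{\epsilon}}_t(\mathbf{x}_t)
          - \hat{\boldsymbol{\epsilon}}_t(\mathbf{x}_t, y) \right\|^2
  \right] \mathrm{d}t
  \;=\; \sum_{i} \underbrace{\frac{1}{2} \int \mathbb{E}\!\left[
    \left( \hat{\epsilon}_{t,i}(\mathbf{x}_t)
         - \hat{\epsilon}_{t,i}(\mathbf{x}_t, y) \right)^{2}
  \right] \mathrm{d}t}_{\text{per-pixel term } \ge\, 0}
```

Each per-pixel term is an integral of a square, hence non-negative, which is what allows information to be attributed to individual pixels (and, symmetrically, to individual words).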
VLSM-Adapter, a novel adapter that fine-tunes pretrained vision-language segmentation models using transformer encoders, is introduced; it outperforms the state of the art and is comparable to the upper bound of end-to-end fine-tuning.
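In broad strokes, adapters of this family insert small trainable bottleneck blocks into an otherwise frozen pretrained encoder. A minimal PyTorch sketch of that idea; the `encoder.layers` attribute, bottleneck width, and placement are assumptions, not the paper's exact VLSM-Adapter design:

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Down-project, nonlinearity, up-project, residual connection."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()
        # Zero-init the up-projection so the adapter starts as an identity
        # and training begins from the pretrained model's behavior.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

def add_adapters(encoder: nn.Module, dim: int) -> nn.ModuleList:
    """Freeze the pretrained encoder; only the adapters receive gradients."""
    for p in encoder.parameters():
        p.requires_grad = False
    # One adapter per transformer block (`encoder.layers` is an assumption).
    return nn.ModuleList(BottleneckAdapter(dim) for _ in encoder.layers)
```

Because only the bottleneck blocks are trained, the number of tunable parameters stays small, which is what makes such adapters attractive when full end-to-end fine-tuning is too costly.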