3260 papers • 126 benchmarks • 313 datasets
Medical report generation (MRG) is the task of training AI models to automatically generate professional reports from input medical images. This can help clinicians make faster and more accurate decisions, since report writing is both time-consuming and error-prone even for experienced doctors. Deep neural networks and Transformer-based architectures are currently the most popular methods for this task; however, when pre-trained models are transferred into this domain, their performance often degrades. Some of the reasons why MRG is hard for pre-trained models:

- Language data in this domain can differ substantially from the large general-purpose corpora available on the Internet.
- During the fine-tuning phase, datasets in the medical field are often unevenly distributed.

More recently, multi-modal learning and contrastive learning have shown some inspiring results in this field, but the task remains challenging and requires further attention.

Here are some additional readings to go deeper on the task:

- On the Automatic Generation of Medical Imaging Reports: https://doi.org/10.48550/arXiv.1711.08195
- A scoping review of transfer learning research on medical image analysis using ImageNet: https://arxiv.org/abs/2004.13175
- A Survey on Incorporating Domain Knowledge into Deep Learning for Medical Image Analysis: https://arxiv.org/abs/2004.12150

(Image credit: Transformers in Medical Imaging: A Survey)
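To make the setup concrete, here is a minimal sketch of the encoder-decoder recipe the description above refers to: a pretrained visual backbone encodes the image and a Transformer decoder generates report tokens. All class names and dimensions are illustrative assumptions, not any specific published model.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class ImageToReportModel(nn.Module):
    """Minimal encoder-decoder sketch for medical report generation."""

    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=3):
        super().__init__()
        # Visual encoder: a pretrained CNN whose spatial features
        # serve as the "memory" for the text decoder.
        backbone = resnet50(weights="IMAGENET1K_V2")
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])
        self.proj = nn.Linear(2048, d_model)  # map CNN channels to d_model

        # Text decoder: standard Transformer decoder over report tokens.
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, images, report_tokens):
        # images: (B, 3, H, W); report_tokens: (B, T) shifted-right targets
        feats = self.encoder(images)                    # (B, 2048, h, w)
        feats = feats.flatten(2).transpose(1, 2)        # (B, h*w, 2048)
        memory = self.proj(feats)                       # (B, h*w, d_model)
        tgt = self.embed(report_tokens)                 # (B, T, d_model)
        mask = nn.Transformer.generate_square_subsequent_mask(
            tgt.size(1)).to(tgt.device)                 # causal mask
        out = self.decoder(tgt, memory, tgt_mask=mask)
        return self.lm_head(out)                        # (B, T, vocab_size)
```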
These leaderboards are used to track progress in Medical Report Generation.
Use these libraries to find Medical Report Generation models and implementations.
No subtasks available.
This work builds a multi-task learning framework which jointly performs the prediction of tags and the generation of paragraphs, proposes a co-attention mechanism to localize regions containing abnormalities and generate narrations for them, and develops a hierarchical LSTM model to generate long paragraphs.
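As a rough illustration of the hierarchical decoding idea above (a sentence-level LSTM emits one topic vector per sentence, and a word-level LSTM expands each topic into a sentence), here is a minimal PyTorch sketch; the module names, dimensions, and stop-control head are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class HierarchicalLSTMDecoder(nn.Module):
    """Two-level LSTM sketch: sentence LSTM -> topic vectors -> word LSTM."""

    def __init__(self, vocab_size, ctx_dim=512, hidden=512, max_sents=6):
        super().__init__()
        self.max_sents = max_sents
        self.sent_lstm = nn.LSTMCell(ctx_dim, hidden)   # one step per sentence
        self.topic = nn.Linear(hidden, hidden)          # topic vector head
        self.stop = nn.Linear(hidden, 2)                # continue/stop logits
        self.embed = nn.Embedding(vocab_size, hidden)
        self.word_lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, visual_ctx, sent_tokens):
        # visual_ctx: (B, ctx_dim) pooled image/co-attention context
        # sent_tokens: (B, S, T) gold word ids per sentence (teacher forcing)
        B, S, T = sent_tokens.shape
        h = visual_ctx.new_zeros(B, self.sent_lstm.hidden_size)
        c = torch.zeros_like(h)
        word_logits, stop_logits = [], []
        for s in range(min(S, self.max_sents)):
            h, c = self.sent_lstm(visual_ctx, (h, c))
            stop_logits.append(self.stop(h))            # when to end paragraph
            topic = torch.tanh(self.topic(h))           # (B, hidden)
            emb = self.embed(sent_tokens[:, s])         # (B, T, hidden)
            # Condition the word LSTM on the topic via its initial state.
            h0 = topic.unsqueeze(0)
            out, _ = self.word_lstm(emb, (h0, torch.zeros_like(h0)))
            word_logits.append(self.out(out))           # (B, T, vocab)
        return torch.stack(word_logits, 1), torch.stack(stop_logits, 1)
```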
This work proposes an Auxiliary Signal-Guided Knowledge Encoder-Decoder (ASGK) to mimic radiologists' working patterns and confirms that auxiliary-signal-driven Transformer-based models can outperform previous approaches on both medical terminology classification and paragraph generation metrics.
A novel Visual-Linguistic Causal Intervention (VLCI) framework for MRG is proposed, which consists of a visual deconfounding module (VDM) and a linguistic deconfounding module (LDM), to implicitly mitigate the visual-linguistic confounders by causal front-door intervention.
We present CausalVLR (Causal Visual-Linguistic Reasoning), an open-source toolbox containing a rich set of state-of-the-art causal relation discovery and causal inference methods for various visual-linguistic reasoning tasks, such as VQA, image/video captioning, medical report generation, model generalization and robustness, etc. These methods are included in the toolbox with PyTorch implementations on NVIDIA computing systems. It not only includes training and inference code, but also provides model weights. We believe this toolbox is by far the most complete visual-linguistic causal reasoning toolbox. We hope that the toolbox and benchmark can serve the growing research community by providing a flexible toolkit to re-implement existing methods and develop new causal reasoning methods. Code and models are available at https://github.com/HCPLab-SYSU/CausalVLR. The project is under active development by HCP-Lab's contributors and we will keep this document updated.
An AI-based method is proposed that aims to improve the conventional retinal disease treatment procedure and help ophthalmologists increase diagnostic efficiency and accuracy; it is capable of creating meaningful retinal image descriptions and clinically relevant visual explanations.
It is shown that simple and even naive approaches yield near-SOTA performance on most traditional NLP metrics, and that evaluation methods for this task should be further studied to correctly measure clinical accuracy, with physicians involved to contribute to this end.
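To illustrate the gap flagged here: a BLEU score is trivial to compute with NLTK, yet by itself says nothing about clinical correctness. The example reports below are invented for illustration, and the whitespace tokenization is deliberately naive.

```python
# A high BLEU score rewards lexical overlap, not clinical accuracy.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "no acute cardiopulmonary abnormality . heart size is normal .".split()
candidate = "heart size is normal . no acute abnormality .".split()

smooth = SmoothingFunction().method1  # avoid zero scores on short texts
score = sentence_bleu([reference], candidate, smoothing_function=smooth)
print(f"BLEU-4: {score:.3f}")  # measures n-gram overlap only
```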
This work proposes VisualGPT, which employs a novel self-resurrecting encoder-decoder attention mechanism to quickly adapt the PLM with a small amount of in-domain image-text data and achieves the state-of-the-art result on IU X-ray, a medical report generation dataset.
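Below is a much simplified sketch of the gating idea: complementary gates with a hard zero-out threshold decide how much visual cross-attention versus pretrained language self-attention flows through at each position. The class name, threshold, and fusion details are assumptions, not VisualGPT's exact self-resurrecting attention unit.

```python
import torch
import torch.nn as nn

class GatedCrossAttentionFusion(nn.Module):
    """Simplified complementary gating between a visual cross-attention
    branch and a pretrained LM's self-attention branch."""

    def __init__(self, d_model=768, tau=0.2):
        super().__init__()
        self.gate = nn.Linear(2 * d_model, d_model)
        self.tau = tau  # gates below this threshold are zeroed out

    def forward(self, lang_attn_out, vis_attn_out):
        # lang_attn_out / vis_attn_out: (B, T, d_model)
        g = torch.sigmoid(self.gate(
            torch.cat([lang_attn_out, vis_attn_out], dim=-1)))
        # Complementary gates; hard-zeroing small values lets one branch
        # dominate ("resurrect") while the other is fully suppressed.
        g_vis = g * (g > self.tau).float()
        g_lan = (1 - g) * ((1 - g) > self.tau).float()
        return g_vis * vis_attn_out + g_lan * lang_attn_out
```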
This is the first work to condition a pre-trained transformer on visual and semantic features to generate medical reports, and to include semantic similarity metrics in the quantitative analysis of the generated reports.
This work proposes a novel weakly supervised contrastive loss for medical report generation that outperforms previous work on both clinical correctness and text generation metrics for two public benchmarks.
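As a hedged sketch of what a weakly supervised contrastive objective can look like: a generic supervised-contrastive formulation where samples sharing a weak label (e.g., an automatically extracted finding tag) are treated as positives. This is not necessarily the exact loss of the paper above.

```python
import torch
import torch.nn.functional as F

def weakly_supervised_contrastive_loss(embeddings, weak_labels, tau=0.07):
    """embeddings: (N, D) image-report embeddings; weak_labels: (N,) ints.
    Samples with the same weak label are pulled together, others pushed apart."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / tau                                # (N, N) similarities
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)
    pos = (weak_labels.unsqueeze(0) == weak_labels.unsqueeze(1)) & ~eye
    # log-softmax over all other samples, then average over positives
    sim = sim.masked_fill(eye, float("-inf"))
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    pos_counts = pos.sum(1).clamp(min=1)
    loss = -(log_prob * pos).sum(1) / pos_counts
    return loss[pos.sum(1) > 0].mean()  # skip samples with no positives
```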