Context: Predicting medical codes from clinical notes is a practical and essential need for every healthcare delivery organization in current medical systems. Automating annotation would save human coders significant time and effort. Machines reaching parity with human coders' performance in medical code prediction would mark a meaningful step toward fully autonomous medical coding.
Question: What exactly is the medical code prediction problem?
Answer: Clinical notes contain a great deal of information about what happened during a patient's entire stay. These notes (e.g., discharge summaries) are typically long, loosely structured, written in medical domain language, and sometimes riddled with spelling errors. The medical code prediction problem is to annotate such a clinical note with a subset of codes drawn from nearly 70K total codes (in the current ICD-10 system, for example). It is therefore a highly multi-label classification problem, and the forthcoming ICD-11 standard will add further complexity.
(Image credit: Papersgraph)
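The multi-label framing described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the four-code label space, the helper names, and the 0.5 decision threshold are hypothetical, and a real system would score tens of thousands of codes with a trained model rather than hand-written logits.

```python
import math

# Hypothetical toy label space; the real ICD-10 label space is far larger.
CODES = ["I10", "E11.9", "J18.9", "N17.9"]


def multi_hot(assigned, codes=CODES):
    """Encode the set of codes assigned to one note as a 0/1 target vector."""
    return [1 if code in assigned else 0 for code in codes]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_codes(logits, codes=CODES, threshold=0.5):
    """Score every code independently and keep those at or above the threshold."""
    return [code for code, z in zip(codes, logits) if sigmoid(z) >= threshold]


def micro_f1(gold_sets, pred_sets):
    """Micro-averaged F1 over all (note, code) decisions."""
    tp = sum(len(g & p) for g, p in zip(gold_sets, pred_sets))
    fp = sum(len(p - g) for g, p in zip(gold_sets, pred_sets))
    fn = sum(len(g - p) for g, p in zip(gold_sets, pred_sets))
    if tp + fp + fn == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


if __name__ == "__main__":
    gold = {"I10", "J18.9"}
    predicted = set(predict_codes([2.0, -1.0, 0.3, -3.0]))
    print(multi_hot(gold), sorted(predicted), micro_f1([gold], [predicted]))
```

Because each code is scored independently, a note can receive any number of codes, which is what distinguishes this setting from single-label classification.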
An attentional convolutional network is presented that predicts medical codes from clinical text: a convolutional neural network encodes the text, and an attention mechanism selects the most relevant segments for each of the thousands of possible codes.
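A rough sketch of per-code attention over text segments follows. It is not the authors' implementation: the convolution step is omitted, `H` simply stands in for its per-position output, and the function and variable names, along with the dot-product scoring, are assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def per_label_attention(H, U):
    """Attend over text positions separately for every code.

    H: list of N position vectors (each of dimension d), assumed to come
       from a convolutional encoder.
    U: list of L query vectors (each of dimension d), one per code.
    Returns (weights, contexts): weights[l] is a distribution over the N
    positions, and contexts[l] is the matching weighted sum of rows of H.
    """
    dim = len(H[0])
    weights, contexts = [], []
    for u in U:
        alpha = softmax([dot(h, u) for h in H])
        context = [sum(a * h[k] for a, h in zip(alpha, H)) for k in range(dim)]
        weights.append(alpha)
        contexts.append(context)
    return weights, contexts


if __name__ == "__main__":
    H = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # three positions, d = 2
    U = [[5.0, 0.0], [0.0, 5.0]]              # two hypothetical codes
    weights, _ = per_label_attention(H, U)
    for alpha in weights:
        print([round(a, 3) for a in alpha])
```

Each code gets its own attention distribution, so the positions with the highest weights can be read as the segments most relevant to that code.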
A Multi-Filter Residual Convolutional Neural Network (MultiResCNN) for ICD coding is proposed; it uses a multi-filter convolutional layer to capture text patterns of different lengths and a residual convolutional layer to enlarge the receptive field.
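The two ideas named in this summary, filters of several widths applied in parallel and a residual connection, can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the MultiResCNN architecture itself: the helper names are hypothetical, each filter yields a single channel, and `transform` is a stand-in for the convolutional layers inside a real residual block.

```python
def conv1d_same(X, W):
    """1D convolution with zero padding so the output has one value per token.

    X: list of N token vectors (dimension d).
    W: filter given as k rows of d weights, with k odd.
    """
    k, d = len(W), len(X[0])
    pad = k // 2
    zero = [0.0] * d
    padded = [zero] * pad + X + [zero] * pad
    return [
        sum(W[j][c] * padded[i + j][c] for j in range(k) for c in range(d))
        for i in range(len(X))
    ]


def multi_filter(X, filters):
    """Apply filters of different widths and concatenate their outputs per position."""
    outputs = [conv1d_same(X, W) for W in filters]
    return [[out[i] for out in outputs] for i in range(len(X))]


def residual(features, transform):
    """Residual connection: y = x + f(x), applied element-wise."""
    transformed = transform(features)
    return [[a + b for a, b in zip(x, fx)] for x, fx in zip(features, transformed)]


if __name__ == "__main__":
    X = [[1.0], [2.0], [3.0]]                    # three tokens, d = 1
    filters = [[[1.0]], [[1.0], [1.0], [1.0]]]   # widths 1 and 3
    features = multi_filter(X, filters)
    halve = lambda F: [[0.5 * v for v in row] for row in F]
    print(features)
    print(residual(features, halve))
```

The width-1 filter looks at a single token while the width-3 filter also sees its neighbours, which is the sense in which multiple widths capture patterns of different lengths.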
MIMIC-III (‘Medical Information Mart for Intensive Care’) is a large, single-center database comprising information relating to patients admitted to critical care units at a large tertiary care hospital. Data includes vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, and more. The database supports applications including academic and industrial research, quality improvement initiatives, and higher education coursework.
Design Type(s): data integration objective
Measurement Type(s): Demographics • clinical measurement • intervention • Billing • Medical History Dictionary • Pharmacotherapy • clinical laboratory test • medical data
Technology Type(s): Electronic Medical Record • Medical Record • Electronic Billing System • Medical Coding Process Document • Free Text Format
Factor Type(s):
Sample Characteristic(s): Homo sapiens
Machine-accessible metadata file describing the reported data (ISA-Tab format)
This paper proposes a new label attention model for automatic ICD coding that can handle both the varying lengths and the interdependence of the text fragments related to ICD codes. It also proposes a hierarchical joint learning mechanism that extends the label attention model to handle the data imbalance caused by infrequently used ICD codes.
The HLAN model with label embedding initialisation achieves results for automated coding that are comparable to or better than those of state-of-the-art CNN-based models. It also provides more comprehensive explanations for each label by highlighting key words and sentences in the discharge summaries, in contrast to the n-grams highlighted by the CNN-based models and the downgraded baselines.
The key idea is that, for the automatic ICD coding task, the presence of informative snippets in the clinical text that correlate with each code plays an important role in predicting the codes, and that an informative snippet can be considered a local, low-level feature.
A novel neural network, called the Multitask Balanced and Recalibrated Neural Network, is proposed to solve the imbalanced class problem and outperforms competitive baselines on a real-world clinical dataset called the Medical Information Mart for Intensive Care (MIMIC-III).
A multitask recalibrated aggregation network that shares information across different coding schemes and captures the dependencies between different medical codes is proposed to solve the challenges of encoding lengthy and noisy clinical documents and capturing code associations.
A model based on bidirectional encoder representations from transformers (BERT) with a sequence attention method is proposed for automatic ICD code assignment; it outperforms the state-of-the-art model on the MIMIC-III dataset.
This paper is the first attempt at learning the label set distribution as a reranking module for ICD coding, and is able to improve upon best-performing predictors for medical code prediction on the benchmark MIMIC datasets.