CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark (2021-06-15)

TL;DR

The first Chinese Biomedical Language Understanding Evaluation (CBLUE) benchmark is presented: a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, and single-sentence/sentence-pair classification, together with an associated online platform for model evaluation, comparison, and analysis.

Abstract

Artificial Intelligence (AI), along with the recent progress in biomedical language understanding, is gradually offering great promise for medical practice. With the development of biomedical language understanding benchmarks, AI applications are widely used in the medical field. However, most benchmarks are limited to English, which makes it challenging to replicate many of the successes achieved in English for other languages. To facilitate research in this direction, we collect real-world biomedical data and present the first Chinese Biomedical Language Understanding Evaluation (CBLUE) benchmark: a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, and single-sentence/sentence-pair classification, together with an associated online platform for model evaluation, comparison, and analysis. To establish evaluation on these tasks, we report empirical results with 11 current pre-trained Chinese models, and experimental results show that state-of-the-art neural models perform far below the human ceiling.
