AxCell: Automatic Extraction of Results from Machine Learning Papers (2020-04-29)

TL;DR

AxCell is an automatic machine learning pipeline for extracting results from papers. It uses several novel components, including a table segmentation subtask, to learn structural knowledge that aids extraction, and it significantly improves the state of the art for results extraction.

Abstract

Tracking progress in machine learning has become increasingly difficult with the recent explosion in the number of papers. In this paper, we present AxCell, an automatic machine learning pipeline for extracting results from papers. AxCell uses several novel components, including a table segmentation subtask, to learn relevant structural knowledge that aids extraction. When compared with existing methods, our approach significantly improves the state of the art for results extraction. We also release a structured, annotated dataset for training models for results extraction, and a dataset for evaluating the performance of models on this task. Lastly, we show that our approach can be used for semi-automated results extraction in production, suggesting that our improvements make this task practically viable for the first time. Code is available on GitHub.
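To make the idea of table segmentation concrete, the following is a minimal sketch of how labeling table cells by role can drive results extraction. It is not AxCell's method: the abstract describes learned components, whereas this uses hand-written rules as a stand-in. The names `segment_table`, `extract_results`, and `Result`, the role labels, and the toy table are all hypothetical and chosen only for illustration.

```python
import re
from typing import List, NamedTuple

# Assumption: a "score" cell is a plain number, optionally followed by '%'.
NUMERIC = re.compile(r"^\d+(\.\d+)?%?$")


class Result(NamedTuple):
    model: str   # text of the first cell in the row
    column: str  # text of the header cell above the score
    score: float


def segment_table(table: List[List[str]]) -> List[List[str]]:
    """Label every cell with a coarse role (rule-based stand-in for a learned model)."""
    labels = []
    for r, row in enumerate(table):
        row_labels = []
        for c, cell in enumerate(row):
            text = cell.strip()
            if r == 0:
                row_labels.append("header")   # first row: column headers
            elif c == 0:
                row_labels.append("model")    # first column: model names
            elif NUMERIC.match(text):
                row_labels.append("score")    # numeric body cell
            else:
                row_labels.append("other")    # placeholders such as "-"
        labels.append(row_labels)
    return labels


def extract_results(table: List[List[str]]) -> List[Result]:
    """Turn each cell labeled 'score' into a (model, column, score) record."""
    labels = segment_table(table)
    results = []
    for r, row in enumerate(table):
        for c, cell in enumerate(row):
            if labels[r][c] == "score":
                value = float(cell.strip().rstrip("%"))
                results.append(Result(row[0].strip(), table[0][c].strip(), value))
    return results


# Toy table, invented for this example.
table = [
    ["Model", "Accuracy", "F1"],
    ["Baseline", "81.2", "79.5"],
    ["Ours", "85.0", "-"],
]

for result in extract_results(table):
    print(result)
```

The point of the sketch is the two-step structure: cells are first assigned roles, and only then are scores linked to the row and column context they belong to. A real system would replace the rules with a trained classifier and would also need to link each record to a task, dataset, and metric.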

Authors

Sebastian Ruder

Pontus Stenetorp

Sebastian Riedel
