This paper reduces literature graph construction into familiar NLP tasks, points out research challenges due to differences from standard formulations of these tasks, and reports empirical results for each task.
We describe a deployed scalable system for organizing published scientific literature into a heterogeneous graph to facilitate algorithmic manipulation and discovery. The resulting literature graph consists of more than 280M nodes, representing papers, authors, entities and various interactions between them (e.g., authorships, citations, entity mentions). We reduce literature graph construction into familiar NLP tasks (e.g., entity extraction and linking), point out research challenges due to differences from standard formulations of these tasks, and report empirical results for each task. The methods described in this paper are used to enable semantic features in www.semanticscholar.org.
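As a rough illustration of the heterogeneous graph described above, the following minimal sketch stores typed nodes (papers, authors, entities) and labeled edges (authorships, citations, entity mentions). The class names, identifiers, and relation labels (`LiteratureGraph`, `Node`, `"authored"`, `"cites"`, `"mentions"`) are hypothetical and chosen only for this example; they are not taken from the deployed system.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str  # hypothetical node types: "paper", "author", "entity"


@dataclass
class LiteratureGraph:
    # node_id -> Node
    nodes: dict = field(default_factory=dict)
    # (source node_id, relation label) -> set of target node_ids
    edges: dict = field(default_factory=lambda: defaultdict(set))

    def add_node(self, node_id, kind):
        node = Node(node_id, kind)
        self.nodes[node_id] = node
        return node

    def add_edge(self, src, relation, dst):
        # Refuse edges whose endpoints have not been registered as nodes.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added before linking them")
        self.edges[(src, relation)].add(dst)

    def neighbors(self, src, relation):
        # Sorted list of targets reachable from src via the given relation.
        return sorted(self.edges.get((src, relation), set()))


# Toy data, invented for illustration only.
graph = LiteratureGraph()
graph.add_node("paper:1", "paper")
graph.add_node("paper:2", "paper")
graph.add_node("author:1", "author")
graph.add_node("entity:1", "entity")
graph.add_edge("author:1", "authored", "paper:1")
graph.add_edge("paper:1", "cites", "paper:2")
graph.add_edge("paper:1", "mentions", "entity:1")
print(graph.neighbors("paper:1", "cites"))
```

Keying edges by `(source, relation)` keeps the different interaction types separate, so a citation lookup never has to filter out authorship or mention edges. A graph at the scale reported above would need a dedicated storage layer rather than in-memory dictionaries.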
Ahmed Elgohary
Sebastian Kohlmeier
Dirk Groeneveld
Chandra Bhagavatula
Iz Beltagy
Miles Crawford
Doug Downey
Jason Dunkelberger
Sergey Feldman
Vu A. Ha
Rodney Michael Kinney
Kyle Lo
Tyler C. Murray
Hsu-Han Ooi
Matthew E. Peters
Joanna L. Power
Sam Skjonsberg
Lucy Lu Wang
Christopher Wilhelm
Zheng Yuan