Japanese Lexical Complexity for Non-Native Readers: A New Dataset (2023-06-30)

TL;DR

To study lexical complexity in Japanese, the first Japanese LCP dataset is constructed, and the effectiveness of a BERT-based system for Japanese LCP is demonstrated.

Abstract

Lexical complexity prediction (LCP) is the task of predicting the complexity of words in a text on a continuous scale. It plays a vital role in simplifying or annotating complex words to assist readers. To study lexical complexity in Japanese, we construct the first Japanese LCP dataset. Our dataset provides separate complexity scores for Chinese/Korean annotators and others to address the readers’ L1-specific needs. In the baseline experiment, we demonstrate the effectiveness of a BERT-based system for Japanese LCP.

Authors

Masato Mita

Adam Nohejl

Taro Watanabe
