Native Language Identification (NLI) is the task of determining an author's native language (L1) based only on their writings in a second language (L2).
One study compares n-gram features (covering word, character, POS and word-POS mixed representations) with embedding-based feature representations for Native Language Identification. The embedding-based representations performed relatively poorly compared to n-grams, possibly because embeddings capture semantic similarities whereas L1 differences are more stylistic in nature.
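To make the n-gram representations mentioned above concrete, here is a minimal sketch of extracting character and word n-gram counts from a learner sentence. It is an illustration only, not the pipeline used in the study: the function names (`char_ngrams`, `word_ngrams`, `ngram_features`), the whitespace tokenization, and the `char:`/`word:` feature prefixes are all assumptions, and POS-based n-grams are left out because they would require a tagger.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams in the text (hypothetical helper)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Count overlapping word n-grams, using naive lowercase whitespace tokens."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_features(text, char_n=3, word_n=2):
    """Merge character and word n-gram counts into one sparse feature dict.

    The "char:" and "word:" prefixes are an assumed convention that keeps
    the two feature families from colliding.
    """
    features = Counter()
    for gram, count in char_ngrams(text, char_n).items():
        features["char:" + gram] = count
    for gram, count in word_ngrams(text, word_n).items():
        features["word:" + " ".join(gram)] = count
    return features


if __name__ == "__main__":
    feats = ngram_features("I am agree with you")
    for name, count in sorted(feats.items()):
        print(name, count)
```

Feature dictionaries of this kind would typically be vectorized and passed to a linear classifier; that step is omitted here.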
One study demonstrates that while named entity masking (NEM) consistently reduces false positives when key named entities are mentioned, both masked and unmasked models exhibit increased false positive rates on English sentences written by Russian native speakers, raising ethical considerations that should be addressed in future research.
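The idea of masking named entities before classification (the NEM referred to above, read here as named entity masking) can be sketched as follows. The source does not say how entities are detected, so this example assumes a hand-supplied set of entity strings in place of a real named entity recognizer; the function name `mask_entities` and the `[ENT]` placeholder are likewise hypothetical.

```python
import re


def mask_entities(sentence, entities, placeholder="[ENT]"):
    """Replace each listed entity mention with a placeholder token.

    `entities` is an assumed, hand-supplied collection of entity strings;
    a real system would more likely obtain them from an NER tagger.
    """
    if not entities:
        return sentence
    # Try longer names first so "Saint Petersburg" wins over "Petersburg".
    ordered = sorted(entities, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(e) for e in ordered) + r")\b"
    )
    return pattern.sub(placeholder, sentence)


if __name__ == "__main__":
    known = {"Moscow", "Saint Petersburg", "Petersburg"}
    print(mask_entities("I moved from Moscow to Saint Petersburg last year.", known))
```

Masking in this way removes topical cues such as place names, so a classifier has to rely on how the sentence is written rather than on what it mentions.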
This work proposes a method that represents the latent topical confounds and a model which “unlearns” confounding features by predicting both the label of the input text and the confound. It shows that this model generalizes better and learns features that are indicative of the writing style rather than the content.