3260 papers • 126 benchmarks • 313 datasets
Lexical normalization is the task of translating/transforming non-standard text into a standard register.

Example: "new pix comming tomoroe" → "new pictures coming tomorrow"

Datasets usually consist of tweets, since these naturally contain a fair amount of such phenomena. For lexical normalization, only word-level replacements are annotated. Some corpora include annotations for 1-N and N-1 replacements; however, word insertion/deletion and reordering are not part of the task.
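The word-level replacement scheme described above can be sketched with a toy dictionary lookup. This is only a minimal illustration, assuming a hand-built replacement table; real normalization systems generate and rank candidates rather than relying on a fixed lookup.

```python
# Toy replacement table (assumed for illustration).
# Values are lists so that 1-N replacements (one token -> several) fit
# in the same scheme as ordinary 1-1 replacements.
REPLACEMENTS = {
    "pix": ["pictures"],       # 1-1 replacement
    "comming": ["coming"],
    "tomoroe": ["tomorrow"],
    "gonna": ["going", "to"],  # 1-N replacement
}

def normalize(tokens):
    """Replace each non-standard token; pass standard tokens through."""
    out = []
    for tok in tokens:
        out.extend(REPLACEMENTS.get(tok.lower(), [tok]))
    return out

print(normalize("new pix comming tomoroe".split()))
# → ['new', 'pictures', 'coming', 'tomorrow']
```

Note that word insertion, deletion, and reordering are deliberately absent: every output token is licensed by exactly one input token, matching the task definition.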
(Image credit: Papersgraph)
These leaderboards are used to track progress in lexical normalization
Use these libraries to find lexical normalization models and implementations
MoNoise is a normalization model focused on generalizability and efficiency. It aims to be easily reusable and adaptable, and is based on modular candidate generation in which each module is responsible for a different type of normalization action.
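The modular candidate-generation idea can be sketched as follows. This is a hedged toy in the spirit of MoNoise, not its actual implementation: the two modules, the toy lookup table, and the first-match selection heuristic (standing in for MoNoise's learned ranker) are all assumptions for illustration.

```python
import re

def module_lookup(tok):
    # Module 1: known abbreviation/slang lookup (toy table, assumed).
    table = {"u": "you", "2morrow": "tomorrow"}
    return [table[tok]] if tok in table else []

def module_repeats(tok):
    # Module 2: collapse character repetitions ("sooo" -> "so").
    collapsed = re.sub(r"(.)\1{2,}", r"\1", tok)
    return [collapsed] if collapsed != tok else []

# Each module proposes candidates for one type of normalization action;
# new actions are added by appending modules, which is what makes the
# design easy to extend.
MODULES = [module_lookup, module_repeats]

def normalize_token(tok):
    candidates = [c for m in MODULES for c in m(tok)]
    # First match stands in for a learned candidate ranker.
    return candidates[0] if candidates else tok

print([normalize_token(t) for t in "u are sooo late".split()])
# → ['you', 'are', 'so', 'late']
```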
Experimental results show that a separate normalization component improves the performance of a neural network parser, even when the parser has access to character-level information as well as external word embeddings.
It is argued that processing contextual information is crucial for this task, and a hybrid word-character attention-based encoder-decoder model for social media text normalization is introduced that can serve as a pre-processing step for NLP applications to adapt to noisy social media text.
This paper introduces and demonstrates the online demo as well as the command-line interface of a lexical normalization system (MoNoise) for a variety of languages, and shows how the model can be made more efficient with only a small loss in performance.
A labeled dataset called MultiSenti is presented and a deep learning-based model for sentiment classification of code-switched informal short text is proposed; the results show that the proposed model performs well in general, and that adopting character-based embeddings yields equivalent performance while being computationally more efficient than training word-based domain-specific embeddings.
This work proposes a novel multi-cascaded deep learning model called McM for bilingual SMS classification that achieves high classification accuracy on this dataset and outperforms the previous model for multilingual text classification, highlighting the language independence of McM.
A feature-based clustering framework for the lexical normalization of Roman Urdu corpora is presented; it includes a phonetic algorithm, UrduPhone, a string-matching component, a feature-based similarity function, and a clustering algorithm, Lex-Var, which uses a similarity threshold to balance the number of clusters against their maximum similarity.
This paper proposes three normalization models specifically designed to handle code-switched data, evaluated on two language pairs, Indonesian-English and Turkish-German, and introduces novel normalization layers with their corresponding language ID and POS tags for the dataset.
DAN+, a new multi-domain corpus and annotation guidelines for Danish nested named entities (NEs) and lexical normalization, is introduced to support research on cross-lingual and cross-domain learning for a less-resourced language.
A publicly available Japanese UGT (user-generated text) corpus is constructed, comprising 929 sentences annotated with morphological and normalization information, along with category information the authors defined for frequent UGT-specific phenomena.