Transliteration is a mechanism for converting a word in a source (foreign) language into a target language, and often adopts approaches from machine translation. In machine translation, the objective is to preserve the semantic meaning of the utterance as much as possible while following the syntactic structure of the target language. In transliteration, the objective is to preserve the original pronunciation of the source word as much as possible while following the phonological structures of the target language. For example, the city name “Manchester” has become well known to speakers of languages other than English through transliteration. Such new words are often named entities that are important in cross-lingual information retrieval, information extraction, and machine translation, and they often present out-of-vocabulary challenges to spoken language technologies such as automatic speech recognition, spoken keyword search, and text-to-speech. Source: Phonology-Augmented Statistical Framework for Machine Transliteration using Limited Linguistic Resources
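To make the mechanism concrete, here is a minimal rule-based sketch of transliterating an English word into Cyrillic. The grapheme mapping table and the greedy longest-match strategy are illustrative assumptions, not the method of the cited paper; real systems learn such correspondences from data or phonological rules.

```python
# Minimal rule-based transliteration sketch: English -> Cyrillic.
# RULES is a toy, hand-written grapheme mapping; it is far from complete.
RULES = {
    "ch": "ч", "man": "ман",
    "e": "е", "s": "с", "t": "т", "r": "р", "a": "а", "n": "н",
}

def transliterate(word: str, rules: dict) -> str:
    """Greedy longest-match, left-to-right substitution."""
    word = word.lower()
    keys = sorted(rules, key=len, reverse=True)  # try longer rules first
    out, i = [], 0
    while i < len(word):
        for k in keys:
            if word.startswith(k, i):
                out.append(rules[k])
                i += len(k)
                break
        else:
            out.append(word[i])  # pass unknown characters through unchanged
            i += 1
    return "".join(out)

print(transliterate("Manchester", RULES))  # манчестер
```

Greedy longest match matters here: without it, "ch" would be split into "c" and "h" and the pronunciation correspondence would be lost.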
This paper presents a treebank of Hindi-English code-switching tweets under the Universal Dependencies scheme and proposes a neural stacking model for parsing that efficiently leverages the part-of-speech tag and syntactic tree annotations in the code-switching treebank and the preexisting Hindi and English treebanks.
A new context-independent method for bilingual term mapping using maximised character alignment maps, which allows integrating linguistic resources that significantly increase mapping recall while maintaining stable precision.
This work presents three different methods for cleaning noise from automatically generated bilingual dictionaries: LLR-, pivot-, and translation-based approaches, and shows that all three help to reduce noise.
This work evaluates methods for name matching in Chinese, including both string-matching and learning approaches, based on new representations for Chinese, which improve both name matching and a downstream entity clustering task.
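As a simple illustration of the string-matching side of name matching (not the paper's actual representations), a character-level similarity score can be computed with the standard library; systems for Chinese typically also compare pronunciation-based forms such as pinyin, which this sketch omits:

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] based on longest matching blocks."""
    return SequenceMatcher(None, a, b).ratio()

# Simplified vs. traditional spellings of the same name share most characters
# as written forms, but variant characters only match if identical codepoints.
print(name_similarity("北京大学", "北京大學"))  # 0.75: three of four characters match
```

A surface-form score like this misses pairs that sound alike but are written differently, which is one reason the paper also considers learned representations.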
This work presents LingEval97, a large-scale data set of 97,000 contrastive translation pairs based on the WMT English->German translation task, with errors automatically created with simple rules, and finds that recently introduced character-level NMT systems perform better at transliteration than models with byte-pair encoding segmentation, but perform more poorly at morphosyntactic agreement and at translating discontiguous units of meaning.
This work applies well-established techniques from NMT, such as dropout regularization, model ensembling, rescoring with right-to-left models, and back-translation, to build a strong transliteration system.