Dialectal Arabic Identification
This paper releases "AraCOVID19-MFH", a manually annotated, multi-label Arabic dataset for COVID-19 fake news and hate speech detection that contains 10,828 Arabic tweets annotated with 10 different labels.
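In a multi-label dataset each tweet can carry several labels at once, so labels are commonly stored as multi-hot vectors with one slot per label. The minimal sketch below assumes that representation; the names `label_0` to `label_9` are hypothetical placeholders, not the dataset's actual label set.

```python
from typing import Iterable, List

# Hypothetical placeholder names; the dataset's real 10 labels are not listed here.
LABELS: List[str] = [f"label_{i}" for i in range(10)]
LABEL_INDEX = {name: i for i, name in enumerate(LABELS)}


def multi_hot(tweet_labels: Iterable[str]) -> List[int]:
    """Encode a tweet's labels as a 0/1 vector with one slot per label."""
    vector = [0] * len(LABELS)
    for name in tweet_labels:
        vector[LABEL_INDEX[name]] = 1  # raises KeyError for an unknown label
    return vector


def decode(vector: List[int]) -> List[str]:
    """Recover the label names set to 1 in a multi-hot vector."""
    return [LABELS[i] for i, bit in enumerate(vector) if bit]


example = multi_hot(["label_0", "label_3"])
print(example)  # [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(decode(example))  # ['label_0', 'label_3']
```

Unlike a single-label setup, where a tweet maps to exactly one class index, the multi-hot form lets a classifier predict each label independently.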
This paper publishes GlotLID-M, a language identification (LID) model that satisfies the desiderata of wide coverage, reliability, and efficiency. It identifies 1665 languages, a large increase in coverage compared with prior work.
This work investigates different approaches to dialect identification in Arabic broadcast speech. It uses phonetic and lexical features obtained from a speech recognition system as well as acoustic features from the i-vector framework, and combines these features with a multi-class Support Vector Machine (SVM).
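The feature-combination step can be illustrated with a minimal, dependency-free sketch. It assumes early fusion (concatenating the phonetic, lexical, and acoustic vectors) and a one-vs-rest linear SVM trained with Pegasos-style subgradient descent; neither choice is claimed to be the paper's setup. The toy feature values and the labels `EGY`, `LEV`, `GLF` are hypothetical stand-ins for real recognizer outputs and i-vectors.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def fuse(*views: Sequence[float]) -> Vector:
    """Early fusion: concatenate per-view feature vectors into one vector."""
    fused: Vector = []
    for view in views:
        fused.extend(view)
    return fused


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_binary_svm(X: List[Vector], y: List[int], lam: float = 0.01, epochs: int = 50) -> Vector:
    """Linear SVM (L2-regularized hinge loss) trained with Pegasos-style
    subgradient steps, cycling through the data in order. Labels are +1/-1."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            violated = label * dot(w, x) < 1.0  # inside the margin or misclassified
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink step from the L2 term
            if violated:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]  # hinge subgradient
    return w


def train_one_vs_rest(X: List[Vector], labels: List[str], lam: float = 0.01,
                      epochs: int = 50) -> Dict[str, Vector]:
    """One binary SVM per dialect: that dialect (+1) versus all others (-1)."""
    return {
        c: train_binary_svm(X, [1 if l == c else -1 for l in labels], lam, epochs)
        for c in sorted(set(labels))
    }


def predict(models: Dict[str, Vector], x: Vector) -> str:
    """Return the dialect whose binary classifier gives the highest score."""
    return max(models, key=lambda c: dot(models[c], x))


# Hypothetical toy values for phonetic (2-d), lexical (1-d) and acoustic (1-d) views.
X = [
    fuse([1.0, 0.1], [0.0], [0.1]),
    fuse([0.9, 0.0], [0.1], [0.0]),
    fuse([0.0, 1.0], [0.1], [0.0]),
    fuse([0.1, 0.9], [0.0], [0.1]),
    fuse([0.0, 0.1], [1.0], [0.9]),
    fuse([0.1, 0.0], [0.9], [1.0]),
]
labels = ["EGY", "EGY", "LEV", "LEV", "GLF", "GLF"]
models = train_one_vs_rest(X, labels)
print(predict(models, fuse([0.95, 0.05], [0.05], [0.05])))  # EGY
```

Early fusion is the simplest way to let one classifier see all three views; an alternative would be to train one SVM per view and combine their scores.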
This paper evaluates multiple types of embeddings, built with both count-based and prediction-based architectures on a variety of corpora, on two language-specific tasks: relation evaluation and dialect identification.
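To make "count-based" concrete, the sketch below builds word vectors from co-occurrence counts weighted by positive pointwise mutual information (PPMI), one common count-based scheme; it is not claimed to be one of the paper's specific models. The window size and the toy English corpus are assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def cooccurrence(sentences: List[List[str]], window: int = 2) -> Counter:
    """Count (word, context) pairs within +/- window tokens."""
    pairs: Counter = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs[(word, tokens[j])] += 1
    return pairs


def ppmi_vectors(pairs: Counter) -> Dict[str, Dict[str, float]]:
    """PPMI = max(0, log P(w,c) / (P(w) P(c))), stored sparsely per word."""
    total = sum(pairs.values())
    word_totals: Counter = Counter()
    ctx_totals: Counter = Counter()
    for (w, c), n in pairs.items():
        word_totals[w] += n
        ctx_totals[c] += n
    vectors: Dict[str, Dict[str, float]] = {}
    for (w, c), n in pairs.items():
        pmi = math.log(n * total / (word_totals[w] * ctx_totals[c]))
        if pmi > 0:  # keep only positive associations
            vectors.setdefault(w, {})[c] = pmi
    return vectors


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    num = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return num / norm if norm else 0.0


corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
vectors = ppmi_vectors(cooccurrence(corpus))
print(cosine(vectors["cat"], vectors["dog"]))
```

Here "cat" and "dog" occur in identical contexts, so their similarity is 1 up to floating-point rounding. A prediction-based model such as word2vec would instead learn dense vectors by training to predict contexts.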
This paper describes a submission to the DSL 2016 shared task, which had two sub-tasks (discriminating between similar languages and identifying Arabic dialects), and develops a character-level neural network for the task.
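A character-level network consumes text as a sequence of character ids rather than word ids. The sketch below shows only that input step and not the network itself. It assumes a vocabulary built from the training text, a reserved padding id 0, an unknown-character id 1, and fixed-length right padding; all of these are illustrative choices, and the two training strings are toy examples.

```python
from typing import Dict, List

PAD, UNK = 0, 1  # reserved ids (an assumption of this sketch)


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Assign an integer id to every character seen in the training texts."""
    chars = sorted({ch for text in texts for ch in text})
    return {ch: i + 2 for i, ch in enumerate(chars)}  # ids 0 and 1 are reserved


def encode(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map characters to ids, truncating or right-padding to max_len."""
    ids = [vocab.get(ch, UNK) for ch in text[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


vocab = build_vocab(["ازيك", "كيفك"])  # toy training strings
print(encode("كيف حالك", vocab, max_len=10))  # [5, 6, 4, 1, 1, 2, 1, 5, 0, 0]
```

Working at the character level avoids an out-of-vocabulary problem for unseen dialectal spellings: an unknown word still decomposes into mostly known characters.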
This paper describes the design of CAMeL Tools and the functionality it provides, including utilities for pre-processing, morphological modeling, dialect identification, named entity recognition, and sentiment analysis.
This paper presents the experiments conducted and the models developed by the competing team, Mawdoo3 AI, on the way to the winning solution for subtask 1 of the Nuanced Arabic Dialect Identification (NADI) shared task.
The Arabic MGB-Challenge comprised two tasks: speech transcription and Arabic dialect identification. The dialect identification task, introduced this year, requires distinguishing among four major Arabic dialects (Egyptian, Levantine, North African, and Gulf) as well as Modern Standard Arabic.
Experiments show that machine learning (ML) models can accurately identify the dialects, even at the sentence level and across different domains (news articles vs. tweets), whereas a subjective evaluation shows that human annotators attain much lower accuracy than the ML models.
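Sentence-level dialect identification is often approached with simple classifiers over character n-grams. The sketch below is one such baseline, a multinomial Naive Bayes model with add-one smoothing over character bigrams and trigrams; it is not the paper's model. The four training sentences are tiny toy examples (Egyptian-style and Levantine-style phrases), and the labels `EGY` and `LEV` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def char_ngrams(text: str, sizes: Tuple[int, ...] = (2, 3)) -> List[str]:
    """All overlapping character n-grams of the given sizes (spaces included)."""
    return [text[i:i + n] for n in sizes for i in range(len(text) - n + 1)]


class NaiveBayesDID:
    """Multinomial Naive Bayes over character n-grams with add-one smoothing."""

    def fit(self, texts: List[str], labels: List[str]) -> "NaiveBayesDID":
        self.counts: Dict[str, Counter] = {}
        for text, label in zip(texts, labels):
            self.counts.setdefault(label, Counter()).update(char_ngrams(text))
        self.vocab = {g for c in self.counts.values() for g in c}
        self.totals = {label: sum(c.values()) for label, c in self.counts.items()}
        docs = Counter(labels)
        self.log_prior = {label: math.log(n / len(labels)) for label, n in docs.items()}
        return self

    def score(self, text: str, label: str) -> float:
        """Log prior plus smoothed log-likelihood of the text's n-grams."""
        denom = self.totals[label] + len(self.vocab)
        counts = self.counts[label]
        return self.log_prior[label] + sum(
            math.log((counts[g] + 1) / denom) for g in char_ngrams(text)
        )

    def predict(self, text: str) -> str:
        return max(self.counts, key=lambda label: self.score(text, label))


texts = ["انا عايز كده", "ازيك عامل ايه", "انا بدي هيك", "كيفك شو عم تعمل"]
labels = ["EGY", "EGY", "LEV", "LEV"]
model = NaiveBayesDID().fit(texts, labels)
print(model.predict("عايز كده"))  # EGY
print(model.predict("كيفك شو"))  # LEV
```

Character n-grams capture dialect-specific spellings and affixes without a tokenizer, which is one reason they work even on single short sentences.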