3260 papers • 126 benchmarks • 313 datasets
Hate speech detection is the task of detecting whether communication such as text or audio contains hatred towards, and/or encourages violence against, a person or a group of people. This is usually based on prejudice against 'protected characteristics' such as ethnicity, gender, sexual orientation, religion, or age. Some example benchmarks are ETHOS and HateXplain. Models can be evaluated with metrics like the F-score or F-measure.
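As a minimal sketch of the evaluation metric mentioned above, the F-score can be computed with scikit-learn; the labels below are hypothetical (1 = hateful, 0 = not hateful):

```python
# Scoring a hate speech classifier with the F1 measure.
from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # gold annotations (hypothetical)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # model predictions (hypothetical)

# F1 is the harmonic mean of precision and recall; macro-averaging
# treats both classes equally, which matters for imbalanced data.
print(f1_score(y_true, y_pred))                   # binary F1
print(f1_score(y_true, y_pred, average="macro"))  # macro F1
```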
These leaderboards are used to track progress in Hate Speech Detection.
Use these libraries to find Hate Speech Detection models and implementations.
This work proposes a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can be fine-tuned with good performance on a wide range of tasks like its larger counterparts, and introduces a triple loss combining language modeling, distillation, and cosine-distance losses.
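The triple loss can be sketched in PyTorch as below. This is an illustration, not the authors' code; the logits, hidden states, labels, and equal loss weights are all assumptions:

```python
import torch
import torch.nn.functional as F

def triple_loss(student_logits, teacher_logits,
                student_hidden, teacher_hidden,
                labels, temperature=2.0):
    # 1) Standard masked language modeling cross-entropy for the student.
    lm = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                         labels.view(-1), ignore_index=-100)
    # 2) Distillation: KL divergence between temperature-softened
    #    student and teacher output distributions.
    t = temperature
    distil = F.kl_div(F.log_softmax(student_logits / t, dim=-1),
                      F.softmax(teacher_logits / t, dim=-1),
                      reduction="batchmean") * (t * t)
    # 3) Cosine-distance loss aligning student and teacher hidden states.
    target = torch.ones(student_hidden.size(0) * student_hidden.size(1),
                        device=student_hidden.device)
    cosine = F.cosine_embedding_loss(
        student_hidden.view(-1, student_hidden.size(-1)),
        teacher_hidden.view(-1, teacher_hidden.size(-1)),
        target)
    return lm + distil + cosine  # equal weights here; the paper tunes them
```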
This work uses a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords and labels a sample of these tweets into three categories: those containing hate speech, those with only offensive language, and those with neither.
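A hypothetical sketch of that lexicon-based collection step: keep only tweets matching at least one lexicon keyword, then hand-label a sample. The lexicon entries and tweets here are placeholders, not the study's actual data:

```python
import re

lexicon = {"slur1", "slur2"}  # stand-ins for real lexicon entries
pattern = re.compile(r"\b(" + "|".join(map(re.escape, lexicon)) + r")\b",
                     re.IGNORECASE)

def matches_lexicon(tweet: str) -> bool:
    """True if the tweet contains any lexicon keyword as a whole word."""
    return pattern.search(tweet) is not None

tweets = ["an example tweet with slur1", "a harmless tweet"]
candidates = [t for t in tweets if matches_lexicon(t)]
# `candidates` would then be sampled and manually labelled into the three
# classes: hate speech, offensive-but-not-hate, neither.
```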
Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, is presented, which the authors aim to share fully and responsibly with interested researchers.
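The smaller OPT checkpoints can be loaded with the Hugging Face transformers library; the checkpoint name follows the released "facebook/opt-*" naming, with the size chosen to fit your hardware:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Smallest checkpoint in the suite; larger ones go up to opt-175b (gated).
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

inputs = tokenizer("Hate speech detection is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```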
HateXplain, the first benchmark hate speech dataset covering multiple aspects of the issue, is introduced; evaluating existing state-of-the-art models on it, the authors observe that models which utilize the human rationales for training perform better at reducing unintended bias towards target communities.
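One way to use rationales during training, sketched below, is to add a term that pushes the model's token attention toward the human-annotated rationale mask. This is an illustration in the spirit of rationale-supervised training, not the paper's code; all tensor shapes and the weight `lam` are assumptions:

```python
import torch
import torch.nn.functional as F

def rationale_aware_loss(logits, labels, attn, rationale_mask, lam=0.1):
    # logits: (batch, classes); attn: (batch, seq) attention weights that
    # sum to 1 per row; rationale_mask: (batch, seq), 1 where a human
    # annotator marked the token as part of the rationale.
    cls_loss = F.cross_entropy(logits, labels)
    # Normalize the rationale mask into a distribution over tokens, then
    # penalize divergence between the attention and that distribution.
    target = rationale_mask / rationale_mask.sum(dim=-1, keepdim=True).clamp(min=1)
    rat_loss = F.kl_div(torch.log(attn.clamp(min=1e-9)), target,
                        reduction="batchmean")
    return cls_loss + lam * rat_loss
```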
This paper conducts the first comparative study of various learning models on Hate and Abusive Speech on Twitter, and shows that a bidirectional GRU network trained on word-level features, with Latent Topic Clustering modules, is the most accurate model.
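A minimal PyTorch sketch of the bidirectional-GRU backbone follows; the Latent Topic Clustering module from the paper is omitted for brevity, and all sizes are illustrative:

```python
import torch
import torch.nn as nn

class BiGRUClassifier(nn.Module):
    def __init__(self, vocab_size=30000, emb_dim=128, hidden=128, classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden, batch_first=True,
                          bidirectional=True)
        self.fc = nn.Linear(2 * hidden, classes)

    def forward(self, token_ids):
        x = self.emb(token_ids)              # (batch, seq, emb_dim)
        _, h = self.gru(x)                   # h: (2, batch, hidden)
        h = torch.cat([h[0], h[1]], dim=-1)  # concat both directions
        return self.fc(h)

logits = BiGRUClassifier()(torch.randint(0, 30000, (4, 32)))
print(logits.shape)  # torch.Size([4, 2])
```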
HateCheck, a suite of functional tests for hate speech detection models that specifies 29 model functionalities motivated by a review of previous research and a series of interviews with civil society stakeholders, is introduced.
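A hypothetical harness in the spirit of HateCheck's functional tests: each case pairs a templated input with the label a correct model should produce. The three cases and the `classify` callable are placeholders, not HateCheck's actual test suite:

```python
CASES = [
    ("I hate [GROUP].", "hateful"),       # direct expression of hate
    ("I hate broccoli.", "non-hateful"),  # non-protected target
    ("'I hate [GROUP]' is a terrible thing to say.",
     "non-hateful"),                      # quoted / counter speech
]

def run_functional_tests(classify):
    """Run every case through `classify` and report mismatches."""
    failures = []
    for text, gold in CASES:
        pred = classify(text)
        if pred != gold:
            failures.append((text, gold, pred))
    print(f"{len(CASES) - len(failures)}/{len(CASES)} functional tests passed")
    return failures
```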
A large-scale analysis of multilingual hate speech in 9 languages from 16 different sources shows that in low-resource settings, simple models such as LASER embeddings with logistic regression perform best, while in high-resource settings, BERT-based models perform better.
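The low-resource recipe can be sketched as below, assuming the third-party `laserembeddings` package (installed via pip, with its model files downloaded separately); the texts and labels are placeholders:

```python
from laserembeddings import Laser
from sklearn.linear_model import LogisticRegression

texts = ["example hateful sentence", "example benign sentence"]
labels = [1, 0]

# Requires: python -m laserembeddings download-models
laser = Laser()
# Language-agnostic sentence vectors, so the classifier can transfer
# across languages with little labelled data.
X = laser.embed_sentences(texts, lang="en")

clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(laser.embed_sentences(["another sentence"], lang="en")))
```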
An annotated corpus of hate speech in which context information is preserved is provided, and two types of context-aware hate speech detection models are proposed: a logistic regression model with context features and a neural network model with learning components for context.
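An illustrative sketch (not the paper's model) of the simpler of the two ideas: featurize the preceding context and the target comment separately, concatenate the features, and train a logistic regression. The vectorizer choice and the two-example dataset are assumptions:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

contexts = ["a news title about immigration", "a sports article title"]
comments = ["an example hateful reply", "an example benign reply"]
labels = [1, 0]

# Separate vectorizers so context and comment get distinct feature spaces.
ctx_vec = TfidfVectorizer().fit(contexts)
cmt_vec = TfidfVectorizer().fit(comments)

X = np.hstack([ctx_vec.transform(contexts).toarray(),
               cmt_vec.transform(comments).toarray()])
clf = LogisticRegression().fit(X, labels)
```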
A custom annotation tool has been developed to carry out the manual labelling task which, among other things, allows the annotators to choose whether to read the context of a sentence before labelling it.
The machine learning models developed for the Automatic Misogyny Identification (AMI) shared task at EVALITA 2018 are presented and the winning model is released for public use.