Academic studies estimate that up to 15% of Twitter users are automated bot accounts [1]. The prevalence of Twitter bots, combined with the ability of some bots to give seemingly human responses, has enabled these non-human accounts to garner widespread influence. As a result, detecting automated bot accounts on Twitter with machine learning techniques has become an area of active research interest in the last few years.
[1] https://aaai.org/ocs/index.php/ICWSM/ICWSM17/paper/view/15587
This paper introduces a graphical framework that generalizes existing attacks in discrete domains, accommodates complex cost functions beyond $\ell_p$-norms (including the financial cost incurred when attacking a classifier), and efficiently produces valid adversarial examples with guarantees of minimal adversarial cost.
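To make "minimal adversarial cost" concrete, here is a minimal sketch, not the paper's framework: it assumes binary features, a hypothetical linear bot classifier, and a per-feature flip cost, and it runs plain uniform-cost search (Dijkstra) over the graph whose nodes are feature vectors and whose edges flip one feature. With non-negative costs, the first vector dequeued with a changed label is a cheapest adversarial example. The names `min_cost_adversarial` and `toy_classifier` are illustrative.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

Vector = Tuple[int, ...]


def min_cost_adversarial(
    x: Vector,
    classify: Callable[[Vector], int],
    flip_cost: List[float],
) -> Optional[Tuple[Vector, float]]:
    """Uniform-cost search over binary feature vectors.

    Nodes are feature vectors; an edge flips feature i at cost flip_cost[i].
    With non-negative costs, the first dequeued vector whose label differs
    from classify(x) is a minimum-cost adversarial example.
    """
    original_label = classify(x)
    frontier: List[Tuple[float, Vector]] = [(0.0, x)]
    best: Dict[Vector, float] = {x: 0.0}
    while frontier:
        cost, node = heapq.heappop(frontier)
        if cost > best[node]:
            continue  # stale entry; a cheaper path was already found
        if classify(node) != original_label:
            return node, cost
        for i, c in enumerate(flip_cost):
            neighbor = node[:i] + (1 - node[i],) + node[i + 1:]
            new_cost = cost + c
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(frontier, (new_cost, neighbor))
    return None  # no label change is reachable


# Hypothetical linear bot classifier over three binary features.
def toy_classifier(v: Vector) -> int:
    return 1 if 2 * v[0] + v[1] - v[2] >= 2 else 0


print(min_cost_adversarial((1, 1, 0), toy_classifier, [5.0, 1.0, 1.5]))
# ((1, 0, 1), 2.5)
```

Flipping the expensive first feature alone would also change the label (cost 5.0), but flipping the two cheap features costs only 2.5, which the search finds. The state space grows as 2^n, so this brute-force search only suits small feature sets.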
This paper presents state-of-the-art methods for three important challenges in automated fake news detection: fake news detection, news domain identification, and bot identification in tweets. The proposed solutions achieved first place in a recent international competition on fake news. For fake news detection, we present two models. The winning model in the competition is based on the similarity between the embedding of each article's title and the embeddings of the top five corresponding Google search results. The second model relies on advances in end-to-end deep learning models for Natural Language Understanding (NLU) to identify stylistic differences between legitimate and fake news articles; it was developed after the competition and outperforms the winning approach. For news domain detection, the winning model is a hybrid approach that concatenates named-entity features with semantic embeddings derived from end-to-end models. For Twitter bot detection, we propose the following features: the duration between account creation and tweet date, the presence of a link in the tweet, the presence of the user's location, other tweet features, and the tweet metadata. The experiments provide insights into the importance of the different features, and the results indicate the superior performance of all proposed models.
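The first three Twitter bot features listed above can be computed directly from tweet metadata. The sketch below is hypothetical: the `Tweet` record and its field names are assumptions rather than the real Twitter API schema, and link detection by URL scheme in the text is a simplification of using structured link metadata.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional


@dataclass
class Tweet:
    # Hypothetical minimal record; real Twitter payloads use other field names.
    text: str
    created_at: datetime        # when the tweet was posted
    user_created_at: datetime   # when the posting account was created
    user_location: Optional[str]


def bot_features(tweet: Tweet) -> Dict[str, float]:
    """Turn three of the listed signals into numeric features."""
    account_age = tweet.created_at - tweet.user_created_at
    text = tweet.text.lower()
    # Simplification: detect links by URL scheme in the text.
    has_link = "http://" in text or "https://" in text
    has_location = bool(tweet.user_location and tweet.user_location.strip())
    return {
        "account_age_days": account_age.total_seconds() / 86400.0,
        "has_link": float(has_link),
        "has_location": float(has_location),
    }


example = Tweet(
    text="Huge giveaway, click https://t.co/abc",
    created_at=datetime(2020, 1, 10),
    user_created_at=datetime(2020, 1, 1),
    user_location="",
)
print(bot_features(example))
# {'account_age_days': 9.0, 'has_link': 1.0, 'has_location': 0.0}
```

Encoding booleans as floats lets these values be concatenated with other numeric tweet and metadata features before being passed to any classifier.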
This work proposes a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection, and performs a thorough evaluation of MGTAB and other public datasets.
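A multi-relational graph stores several edge types between the same accounts. The sketch below shows one simple in-memory layout, indexed as relation, then source user, then targets, plus a per-relation degree feature. The relation names and helper functions are hypothetical and are not MGTAB's actual schema or loader.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

# (source_user, relation, target_user); relation names here are hypothetical.
Edge = Tuple[str, str, str]
RelGraph = DefaultDict[str, DefaultDict[str, Set[str]]]


def build_relational_graph(edges: List[Edge]) -> RelGraph:
    """Index edges as relation -> source user -> set of target users."""
    graph: RelGraph = defaultdict(lambda: defaultdict(set))
    for src, rel, dst in edges:
        graph[rel][src].add(dst)
    return graph


def relation_degrees(graph: RelGraph, user: str, relations: List[str]) -> List[int]:
    """Out-degree of `user` under each relation: a simple per-relation feature."""
    return [len(graph[rel].get(user, ())) for rel in relations]


edges = [
    ("alice", "follows", "bob"),
    ("alice", "follows", "carol"),
    ("alice", "mentions", "bob"),
    ("bob", "follows", "alice"),
]
g = build_relational_graph(edges)
print(relation_degrees(g, "alice", ["follows", "mentions", "retweets"]))  # [2, 1, 0]
```

Keeping one adjacency map per relation makes it easy for a relational graph model to treat each edge type with its own parameters.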
This work adopts a supervised machine learning (ML) framework based on the Extreme Gradient Boosting (XGBoost) algorithm, with hyper-parameters tuned via cross-validation, and uses Shapley Additive Explanations (SHAP) to explain the model's predictions by computing feature importance from game-theoretic Shapley values.
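The Shapley value of a feature is its average marginal contribution over all coalitions of the other features. The brute-force sketch below computes it exactly for a small hypothetical scoring function, replacing absent features with a baseline value. It illustrates the definition only; the SHAP library provides faster, model-specific algorithms (such as TreeSHAP for tree ensembles like XGBoost), and `toy_model` is an assumption, not the paper's model.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values by enumerating every feature coalition.

    A feature outside the coalition is set to its baseline value.
    Needs O(n * 2^n) model calls, so it is only practical for a few features.
    """
    n = len(x)

    def value(coalition: FrozenSet[int]) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical bot score with an interaction between the first two features.
def toy_model(z: Sequence[float]) -> float:
    return 2.0 * z[0] + z[0] * z[1] - 0.5 * z[2]


print(shapley_values(toy_model, x=[1.0, 1.0, 2.0], baseline=[0.0, 0.0, 0.0]))
```

The interaction term is split evenly between the first two features, giving values of about 2.5, 0.5 and -1.0, and the values sum to the difference between the model's output at `x` and at the baseline.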
This work proposes TwiBot-22, a comprehensive graph-based Twitter bot detection benchmark that presents the largest dataset to date, provides diverse entities and relations on the Twitter network, and has considerably better annotation quality than existing datasets.
This work shows that simple decision rules (shallow decision trees trained on a small number of features) achieve near-state-of-the-art performance on most available datasets, and that bot detection datasets, even when combined, do not generalize well to out-of-sample datasets.
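The simplest shallow decision tree is a depth-1 stump: one feature, one threshold. The sketch below picks the split with the lowest weighted Gini impurity. The two features and the four labelled rows are hypothetical illustration data, and this is not the paper's training code.

```python
from typing import List, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of binary labels (1 = bot, 0 = human)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def majority(labels: Sequence[int]) -> int:
    """Majority label; ties and empty leaves default to 0 (human)."""
    return int(2 * sum(labels) > len(labels))


def fit_stump(X: List[List[float]], y: List[int]) -> Tuple[int, float, int, int]:
    """Depth-1 decision tree: the (feature, threshold) split with lowest weighted Gini.

    Returns (feature, threshold, label_if_at_most_threshold, label_if_above).
    """
    n = len(y)
    best_score, best_f, best_t = float("inf"), 0, X[0][0]
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best_score, best_f, best_t = score, f, t
    left = [y[i] for i in range(n) if X[i][best_f] <= best_t]
    right = [y[i] for i in range(n) if X[i][best_f] > best_t]
    return best_f, best_t, majority(left), majority(right)


def predict(stump: Tuple[int, float, int, int], row: Sequence[float]) -> int:
    f, t, le_label, gt_label = stump
    return le_label if row[f] <= t else gt_label


# Hypothetical features: [followers_to_friends_ratio, tweets_per_day].
X = [[0.1, 50.0], [0.2, 80.0], [5.0, 3.0], [3.0, 5.0]]
y = [1, 1, 0, 0]
stump = fit_stump(X, y)
print(stump)  # (0, 0.2, 1, 0)
print(predict(stump, [0.15, 40.0]))  # 1
```

A rule this small is easy to audit, which is part of why such baselines are useful for checking whether a dataset is too easy or fails to generalize.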
This work proposes LMBot, a novel Twitter bot detection framework that distills graph knowledge into language models (LMs) so they can be deployed without graph data, addressing the data dependency challenge. LMBot is compatible with both graph-based and graph-less datasets.
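Distillation trains a student to match a teacher's softened predictions. The sketch below uses the standard temperature-scaled KL objective (Hinton-style knowledge distillation) as an assumption; LMBot's exact training objective may differ. It assumes a graph-based teacher and a language-model student that each output hypothetical two-class `[human, bot]` logits.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max avoids overflow."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_logits: Sequence[float],
    student_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical [human, bot] logits for one account.
teacher = [0.5, 2.5]   # e.g. from a graph-based teacher model
student = [1.0, 1.0]   # e.g. from a language-model student
print(distillation_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's distribution and grows as they diverge; the `T^2` factor keeps gradient magnitudes comparable across temperatures.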
This work proposes BIC, a Twitter Bot detection framework with text-graph Interaction and semantic Consistency. BIC consistently outperforms state-of-the-art baselines on two widely adopted datasets, and the analysis reveals that text-graph interaction and semantic consistency modeling are essential improvements that help combat bot evolution.
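BIC models semantic consistency with its own learned components, which are not reproduced here. As a rough, hypothetical proxy for the idea, the sketch below summarizes how semantically similar a user's tweets are to one another as the mean pairwise cosine similarity of their embeddings. The embeddings are toy vectors standing in for the output of any text encoder.

```python
import math
from itertools import combinations
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def consistency_score(tweet_embeddings: List[Sequence[float]]) -> float:
    """Mean pairwise cosine similarity across one user's tweet embeddings."""
    pairs = list(combinations(tweet_embeddings, 2))
    if not pairs:
        raise ValueError("need at least two tweets")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Hypothetical 2-D embeddings for three tweets by one user.
print(consistency_score([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))
```

A score near 1 means the tweets are nearly identical in meaning, while a score near 0 means they share little; such a scalar could be appended to other account features, although a learned module can capture far richer patterns.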