LMBot: Distilling Graph Knowledge into Language Model for Graph-less Deployment in Twitter Bot Detection (2023-06-30)

TL;DR

LMBot is a novel bot detection framework that distills graph knowledge into language models (LMs) for graph-less deployment in Twitter bot detection. It addresses the data dependency challenge of graph-based detectors and is compatible with both graph-based and graph-less datasets.

Abstract

As malicious actors employ increasingly advanced and widespread bots to disseminate misinformation and manipulate public opinion, the detection of Twitter bots has become a crucial task. Though graph-based Twitter bot detection methods achieve state-of-the-art performance, we find that their inference depends on neighbor users multiple hops away from the target users, and that fetching these neighbors is time-consuming and may introduce sampling bias. At the same time, our experiments reveal that after fine-tuning on the Twitter bot detection task, pretrained language models achieve competitive performance without requiring a graph structure during deployment. Inspired by this finding, we propose a novel bot detection framework, LMBot, that distills graph knowledge into language models (LMs) for graph-less deployment in Twitter bot detection to combat the data dependency challenge. Moreover, LMBot is compatible with both graph-based and graph-less datasets. Specifically, we first represent each user as a textual sequence and feed these sequences into the LM for domain adaptation. For graph-based datasets, the output of the LM serves as input features for the GNN, enabling LMBot to optimize for bot detection and distill knowledge back to the LM in an iterative, mutually enhancing process. Armed with the LM, we can perform graph-less inference with graph knowledge, which resolves the graph data dependency and sampling bias issues. For datasets without graph structure, we simply replace the GNN with an MLP, which also shows strong performance. Our experiments demonstrate that LMBot achieves state-of-the-art performance on four Twitter bot detection benchmarks. Extensive studies also show that LMBot is more robust, versatile, and efficient than existing graph-based Twitter bot detection methods.
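The distillation step described above, where the GNN (teacher) passes knowledge back to the LM (student), can be illustrated with a minimal sketch. The abstract does not state the loss used, so this assumes a standard temperature-scaled KL-divergence distillation objective combined with hard-label cross-entropy. The function names and the `alpha` and `temperature` parameters are hypothetical and chosen only for illustration; the logits stand in for model outputs on a single user over the classes (human, bot).

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Assumed objective: alpha * cross-entropy + (1 - alpha) * soft-label KL.

    student_logits: LM outputs for one user (hypothetical values).
    teacher_logits: GNN outputs for the same user (hypothetical values).
    label: index of the ground-truth class (0 = human, 1 = bot).
    """
    # Hard-label term: cross-entropy of the student against the true label.
    student_probs = softmax(student_logits)
    ce = -math.log(student_probs[label] + 1e-12)

    # Soft-label term: match the student's softened distribution to the
    # teacher's. The temperature**2 factor keeps gradient scale comparable.
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    kd = kl_divergence(teacher_soft, student_soft) * temperature ** 2

    return alpha * ce + (1 - alpha) * kd


if __name__ == "__main__":
    gnn_logits = [0.0, 2.0]   # teacher leans towards "bot"
    print(distillation_loss([0.0, 2.0], gnn_logits, label=1))  # student agrees
    print(distillation_loss([2.0, 0.0], gnn_logits, label=1))  # student disagrees
```

When the student already matches the teacher, the soft-label term vanishes and only the cross-entropy term remains, so the loss is smaller than when the student disagrees. At deployment only the student LM would be evaluated, which is what makes inference graph-less.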
