A fast solution to propaganda detection at SemEval-2020 Task 11, based on feature adjustment, using per-token vectorization of features and a simple Logistic Regression classifier to quickly test different hypotheses about the data.
The article describes a fast solution to propaganda detection at SemEval-2020 Task 11, based on feature adjustment. We use per-token vectorization of features and a simple Logistic Regression classifier to quickly test different hypotheses about our data. We arrive at what seems to us the best solution; however, we are unable to align it with the result of the metric suggested by the task organizers. We test how our system handles class and feature imbalance by varying the number of samples of the two classes (Propaganda and None) in the training set, the size of the context window in which a token is vectorized, and the combination of vectorization methods. Our system's result at SemEval-2020 Task 11 is an F-score of 0.37.
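To make the setup concrete, here is a minimal sketch of per-token vectorization inside a context window, class rebalancing, and a logistic regression classifier. It is an illustration under stated assumptions, not the system described in the article: the features (lowercased word identity per window position, an all-caps flag), the function names `window_features`, `downsample`, `train_logreg` and `predict`, the toy sentences, and the small pure-Python logistic regression are all hypothetical stand-ins. Only the Propaganda/None labels, the variable window size, and the varying class ratio come from the text.

```python
import math
import random

def window_features(tokens, i, window=1):
    """Sparse feature dict for token i, built from tokens within +/- window positions."""
    feats = {"bias": 1.0}
    for offset in range(-window, window + 1):
        j = i + offset
        word = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
        feats[f"w[{offset}]={word}"] = 1.0
    # Assumed extra feature: whether the token is written in all capitals.
    feats["is_upper"] = 1.0 if tokens[i].isupper() else 0.0
    return feats

def downsample(samples, ratio=1.0, seed=0):
    """Keep every Propaganda sample (label 1) and at most ratio * that many None samples."""
    rng = random.Random(seed)
    pos = [s for s in samples if s[1] == 1]
    neg = [s for s in samples if s[1] == 0]
    keep = min(len(neg), int(ratio * len(pos)))
    return pos + rng.sample(neg, keep)

def score(weights, feats):
    """Linear score of a sparse feature dict under the current weights."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())

def train_logreg(samples, epochs=200, lr=0.5):
    """Binary logistic regression fitted with plain stochastic gradient descent."""
    weights = {}
    for _ in range(epochs):
        for feats, label in samples:
            z = max(-30.0, min(30.0, score(weights, feats)))  # clamp to avoid overflow
            p = 1.0 / (1.0 + math.exp(-z))
            for k, v in feats.items():
                weights[k] = weights.get(k, 0.0) + lr * (label - p) * v
    return weights

def predict(weights, feats):
    """Return 1 (Propaganda) if the linear score is positive, else 0 (None)."""
    return 1 if score(weights, feats) > 0 else 0

# Toy data (invented for illustration): 1 = Propaganda, 0 = None.
sentences = [
    (["They", "are", "evil", "traitors", "."], [0, 0, 1, 1, 0]),
    (["The", "meeting", "is", "on", "Monday", "."], [0, 0, 0, 0, 0, 0]),
]
samples = [
    (window_features(tokens, i, window=1), label)
    for tokens, labels in sentences
    for i, label in enumerate(labels)
]
balanced = downsample(samples, ratio=1.0)
weights = train_logreg(balanced)
```

Changing `window` and `ratio` in this sketch corresponds to the two knobs the abstract mentions: the size of the context window and the balance between the Propaganda and None classes in the training set.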
Yuliya Bidulya