1
The Document Vectors Using Cosine Similarity Revisited
2
BERTopic: Neural topic modeling with a class-based TF-IDF procedure
3
TimeLMs: Diachronic Language Models from Twitter
4
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
5
Sentiment Analysis of Twitter Data Using Naïve Bayes Classifier
6
Composition and Style Attributes Guided Image Aesthetic Assessment
7
Generating Aesthetic Based Critique For Photographs
8
Sentiment Analysis of Drug Reviews using Transfer Learning
9
MUSIQ: Multi-scale Image Quality Transformer
10
Composing Photos Like a Photographer
11
Mass-scale emotionality reveals human behaviour and marketplace success
12
Training data-efficient image transformers & distillation through attention
13
TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification
14
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
15
EVA: An Explainable Visual Aesthetics Dataset
16
Transformers: State-of-the-Art Natural Language Processing
17
A Unified Framework for Shot Type Classification Based on Subject Centric Lens
18
MovieNet: A Holistic Dataset for Movie Understanding
19
Adaptive Fractional Dilated Convolution Network for Image Aesthetics Assessment
20
The Pushshift Reddit Dataset
21
HuggingFace's Transformers: State-of-the-art Natural Language Processing
22
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
23
Aesthetic Image Captioning From Weakly-Labelled Photographs
24
RoBERTa: A Robustly Optimized BERT Pretraining Approach
25
Aesthetic Attributes Assessment of Images
26
Effective Aesthetics Prediction With Multi-Level Spatially Pooled Features
27
Photographic composition classification and dominant geometric element detection for outdoor scenes
28
Datasheets for datasets
29
Neural Aesthetic Image Reviewer
30
Decoupled Weight Decay Regularization
31
Aesthetic Critiques Generation for Photos
32
NIMA: Neural Image Assessment
33
SemEval-2017 Task 4: Sentiment Analysis in Twitter
34
Twitter sentiment analysis using hybrid cuckoo search method
35
A-Lamp: Adaptive Layout-Aware Multi-patch Deep Convolutional Neural Network for Photo Aesthetic Assessment
36
Sentiment Analysis on Tweets about Diabetes: An Aspect-Level Approach
37
Joint Image and Text Representation for Aesthetics Analysis
38
Comparison of Text Sentiment Analysis Based on Machine Learning
39
Photo Aesthetics Ranking Network with Attributes and Content Adaptation
40
An Image Is Worth More than a Thousand Favorites: Surfacing the Hidden Beauty of Flickr Pictures
41
Adam: A Method for Stochastic Optimization
42
RAPID: Rating Pictorial Aesthetics using Deep Learning
43
Fusion of Multichannel Local and Global Structural Cues for Photo Aesthetics Evaluation
44
AVA: A large-scale database for aesthetic visual analysis
45
Content-based photo quality assessment
46
Assessing the aesthetic quality of photographs using generic image descriptors
47
Scikit-learn: Machine Learning in Python
48
Studying Aesthetics in Photographic Images Using a Computational Approach
49
A Comprehensive Survey on Computational Aesthetic Evaluation of Visual Art Images: Metrics and Challenges
50
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
52
Sentiment Analysis on Twitter Data using KNN and SVM
53
Maintenance • Who is supporting/hosting/maintaining the dataset? RPCD is supported and maintained by ETH MTC and University of Milano-Bicocca
54
• Do any export controls or other regulatory restrictions apply to the dataset or to individual instances? No
55
If the dataset is a sample from a larger set, what was the sampling strategy (e.g., deterministic, probabilistic with specific sampling probabilities)?
56
• Were the individuals in question notified about the data collection? No
57
• Does the dataset relate to people? Yes, but not exclusively
58
• How many instances are there in total (of each type, if appropriate)? RPCD consists of 73,965 data instances. Specifically, there are 73,965 images and 219,790 photo critiques
59
(a) Did you state the full set of assumptions of all theoretical results?
60
Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? [Yes] The main claims are listed at the end of Section 1
61
• Did the individuals in question consent to the collection and use of their data? According to Reddit's Privacy Policy 24, which is accepted by every user upon registration
62
with respect to the random seed after running experiments multiple times)? [No] Training is very compute-intensive. We can only run the training once per experiment using a random seed
63
Collection process • How was the data associated with each instance acquired? The data was directly observable (posts in Reddit stored in Pushshift's and Reddit's servers)
64
(c) Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? [No] Training is very compute-intensive
65
(c) Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation?
66
Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes]
67
Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable?
68
• Is there an erratum? All changes to the dataset will be announced on our Zenodo repository 28
69
Did you discuss any potential negative societal impacts of your work? [Yes] The potential negative societal impacts are described in Section 6