
The Values Encoded in Machine Learning Research

Published in Conference on Fairness, Accountability and Transparency (2021-06-29)


TL;DR

Introduces a method and annotation scheme for studying the values encoded in documents such as research papers, and finds systematic textual evidence that the field's top values are being defined and applied with assumptions and implications that generally support the centralization of power.

Abstract

Machine learning currently exerts an outsized influence on the world, increasingly affecting institutional practices and impacted communities. It is therefore critical that we question vague conceptions of the field as value-neutral or universally beneficial, and investigate what specific values the field is advancing. In this paper, we first introduce a method and annotation scheme for studying the values encoded in documents such as research papers. Applying the scheme, we analyze 100 highly cited machine learning papers published at premier machine learning conferences, ICML and NeurIPS. We annotate key features of papers which reveal their values: their justification for their choice of project, which attributes of their project they uplift, their consideration of potential negative consequences, and their institutional affiliations and funding sources. We find that few of the papers justify how their project connects to a societal need (15%) and far fewer discuss negative potential (1%). Through line-by-line content analysis, we identify 59 values that are uplifted in ML research, and, of these, we find that the papers most frequently justify and assess themselves based on Performance, Generalization, Quantitative evidence, Efficiency, Building on past work, and Novelty. We present extensive textual evidence and identify key themes in the definitions and operationalization of these values. Notably, we find systematic textual evidence that these top values are being defined and applied with assumptions and implications generally supporting the centralization of power. Finally, we find increasingly close ties between these highly cited papers and tech companies and elite universities.
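The abstract's headline percentages come from a simple tally: for each annotated value, count the share of the 100 papers whose text exhibits it. A minimal sketch of that tally, using hypothetical value names and toy data rather than the paper's actual annotation set:

```python
from collections import Counter

# Hypothetical annotations: for each paper, the set of values its text
# was annotated as uplifting. Toy data for illustration only.
annotations = [
    {"Performance", "Novelty", "Efficiency"},
    {"Performance", "Generalization"},
    {"Performance", "Societal need"},
    {"Quantitative evidence", "Building on past work"},
]

def value_frequencies(papers):
    """Return the fraction of papers whose annotations include each value."""
    counts = Counter(value for paper in papers for value in paper)
    n = len(papers)
    return {value: count / n for value, count in counts.items()}

freqs = value_frequencies(annotations)
# With the toy data above, "Performance" appears in 3 of 4 papers (0.75).
```

In the paper itself, the analogous computation over 100 papers and 59 identified values yields figures such as 15% for justifying a societal need and 1% for discussing negative potential.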

Authors

Abeba Birhane

2 Papers

Pratyusha Kalluri

1 Paper

Dallas Card

1 Paper

William Agnew

1 Paper

Ravit Dotan

1 Paper

Michelle Bao

1 Paper

Research Impact

237 Citations
190 References
0 Datasets


Field of Study: Computer Science

Journal: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency

Venue: Conference on Fairness, Accountability and Transparency (conference)
Venue URL: https://facctconference.org/
Alternate Names: FAccT, Conf Fairness Account Transpar