1
Causally motivated shortcut removal using auxiliary labels
2
Formalizing Trust in Artificial Intelligence: Prerequisites, Causes and Goals of Human Trust in AI
3
Measuring and Reducing Gendered Correlations in Pre-trained Models
4
Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension
5
Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension
6
The myth of generalisability in clinical research and machine learning in health care
7
On Robustness and Transferability of Convolutional Neural Networks
8
Measuring Robustness to Natural Distribution Shifts in Image Classification
9
Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data
10
The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization
11
Hyperparameter Ensembles for Robustness and Uncertainty Quantification
12
Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe
14
Beyond Accuracy: Behavioral Testing of NLP Models with CheckList
15
How Can We Accelerate Progress Towards Human-like Linguistic Generalization?
16
Skin Color in Dermatology Textbooks: An Updated Evaluation and Analysis
17
A Human-Centered Evaluation of a Deep Learning System Deployed in Clinics for the Detection of Diabetic Retinopathy
18
StereoSet: Measuring stereotypical bias in pretrained language models
19
Quantifying Gender Bias in Different Corpora
20
Shortcut learning in deep neural networks
21
The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions
22
Understanding and Mitigating the Tradeoff Between Robustness and Accuracy
23
Bayesian Deep Learning and a Probabilistic Perspective of Generalization
24
Understand It in 5 Minutes!? Skimming Famous Papers: Jacob Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
25
Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
26
Doctor XAI: an ontology-based approach to black-box sequential data classification explanations
27
Big Transfer (BiT): General Visual Representation Learning
28
Large Scale Learning of General Visual Representations for Transfer
29
Linear Mode Connectivity and the Lottery Ticket Hypothesis
30
Deep double descent: where bigger models and more data hurt
31
Causality for Machine Learning
32
BERTs of a feather do not generalize together: Large variability in generalization across models with similar test set performance
33
Key challenges for delivering clinical impact with artificial intelligence
34
Dissecting racial bias in an algorithm used to manage the health of populations
35
Hidden stratification causes clinically meaningful failures in machine learning for medical imaging
36
Learning the Difference that Makes a Difference with Counterfactually-Augmented Data
37
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
38
Deep Ensembles: A Loss Landscape Perspective
39
Predictive Multiplicity in Classification
40
A deep learning system for differential diagnosis of skin diseases
41
Artificial intelligence to predict AKI: is it a breakthrough?
42
The Generalization Error of Random Features Regression: Precise Asymptotics and the Double Descent Curve
43
Association Between Surgical Skin Markings in Dermoscopic Images and Diagnostic Performance of a Deep Learning Convolutional Neural Network for Melanoma Recognition
44
A study in Rashomon curves and volumes: A new perspective on generalization and model simplicity in machine learning
45
Feature Robustness in Non-stationary Health Records: Caveats to Deployable Model Performance in Common Clinical Machine Learning Tasks
46
Developing Deep Learning Continuous Risk Models for Early Adverse Event Prediction in Electronic Health Records: an AKI Case Study
47
RoBERTa: A Robustly Optimized BERT Pretraining Approach
48
Analysis of polygenic risk score usage and performance in diverse human populations
49
Natural Adversarial Examples
50
Invariant Risk Minimization
51
A Clinically Applicable Approach to Continuous Prediction of Future Acute Kidney Injury
52
A Fourier Perspective on Model Robustness in Computer Vision
53
XLNet: Generalized Autoregressive Pretraining for Language Understanding
54
Analyzing the role of model uncertainty for electronic health records
55
Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift
56
High-Frequency Component Helps Explain the Generalization of Convolutional Neural Networks
57
Adversarial Examples Are Not Bugs, They Are Features
58
HellaSwag: Can a Machine Really Finish Your Sentence?
59
Learning Robust Global Representations by Penalizing Local Predictive Power
60
Clinical use of current polygenic risk scores may exacerbate health disparities
61
Benchmarking Neural Network Robustness to Common Corruptions and Perturbations
62
Guidelines and recommendations for ensuring Good Epidemiological Practice (GEP): a guideline developed by the German Society for Epidemiology
63
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference
64
Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting
65
Using Electronic Health Records to Identify Adverse Drug Events in Ambulatory Care: A Systematic Review
66
Reconciling modern machine learning and the bias-variance trade-off
67
On Lazy Training in Differentiable Programming
68
Predicting diabetes-related hospitalizations based on electronic health records
69
Machine Learning and Health Care Disparities in Dermatology
70
ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness
71
Counterfactual Fairness in Text Classification through Robustness
72
SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference
73
Reduced signal for polygenic adaptation of height in UK Biobank
74
Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations
75
Stress Test Evaluation for Natural Language Inference
76
Gender Bias in Coreference Resolution
77
Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods
78
Averaging Weights Leads to Wider Optima and Better Generalization
79
Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs
80
A standardized framework for representation of ancestry data in genomics studies, with application to the NHGRI-EBI GWAS Catalog
81
Deep Contextualized Word Representations
82
Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification
83
Universal Language Model Fine-tuning for Text Classification
84
All Models are Wrong, but Many are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously
85
Development and Validation of a Deep Learning System for Diabetic Retinopathy and Related Eye Diseases Using Retinal Images From Multiethnic Populations With Diabetes
86
Grader variability and the importance of reference standards for evaluating machine learning models for diabetic retinopathy
87
The Consciousness Prior
88
Simple Recurrent Units for Highly Parallelizable Recurrence
89
SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation
90
Domain Adaptation by Using Causal Inference to Predict Invariant Conditional Distributions
91
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
92
Invariant Causal Prediction for Nonlinear Models
93
Machine Learning: An Applied Econometric Approach
94
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
95
Counterfactual Fairness
96
Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations
97
Beyond prediction: Using big data for policy problems
98
Dermatologist-level classification of skin cancer with deep neural networks
99
Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs
100
Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
101
GRAM: Graph-based Attention Model for Healthcare Representation Learning
102
Entropy-SGD: biasing gradient descent into wide valleys
103
Capacity and Trainability in Recurrent Neural Networks
104
Genomics is failing on diversity
105
Semantics derived automatically from language corpora contain human-like biases
106
Human demographic history impacts genetic risk prediction across diverse populations
107
Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
108
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
109
Deep Residual Learning for Image Recognition
110
A large annotated corpus for learning natural language inference
111
Prediction Policy Problems
112
UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age
113
Causal inference by using invariant prediction: identification and confidence intervals
114
Efficient Estimation of Word Representations in Vector Space
115
Large-scale association analysis identifies new risk loci for coronary artery disease
116
KDIGO Clinical Practice Guidelines for Acute Kidney Injury
117
Unachievable Region in Precision-Recall Space and Its Effect on Empirical Evaluation
118
Improving disease prediction using ICD-9 ontological features
119
Next generation disparities in human genomics: concerns and remedies
120
Common polygenic variation contributes to risk of schizophrenia and bipolar disorder
121
ImageNet: A large-scale hierarchical image database
122
Eigenvectors of some large sample covariance matrix ensembles
123
Linkage disequilibrium — understanding the evolutionary past and mapping the medical future
124
Random Features for Large-Scale Kernel Machines
125
Prediction of individual genetic risk to disease from genome-wide association studies
126
PLINK: a tool set for whole-genome association and population-based linkage analyses
127
Principal components analysis corrects for stratification in genome-wide association studies
128
OntoNotes: The 90% Solution
129
Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author)
130
Long Short-Term Memory
133
ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models
135
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
136
Acute kidney injury: prevention, detection and management
137
Neural architecture search: A survey
141
Protocol available at Protocol Exchange, version 1, July 2019. doi: 10.21203/RS
143
Correction: Efficacy of Commercial Weight-Loss Programs
145
Intelligible Models for HealthCare
146
Priors for Infinite Networks
147
The use of misclassification costs to learn rule-based decision support models for cost-effective hospital admission strategies