
Definitions, methods, and applications in interpretable machine learning

Published in Proceedings of the National Academy of Sciences (2019-01-14)

On This Page

  • TL;DR
  • Abstract
  • Authors
  • Datasets
  • References

TL;DR

This work defines interpretability in the context of machine learning and introduces the predictive, descriptive, relevant (PDR) framework for discussing interpretations, built around 3 overarching desiderata for evaluation: predictive accuracy, descriptive accuracy, and relevancy.

Abstract

Significance: The recent surge in interpretability research has led to confusion on numerous fronts. In particular, it is unclear what it means to be interpretable and how to select, evaluate, or even discuss methods for producing interpretations of machine-learning models. We aim to clarify these concerns by defining interpretable machine learning and constructing a unifying framework for existing methods which highlights the underappreciated role played by human audiences. Within this framework, methods are organized into 2 classes: model based and post hoc. To provide guidance in selecting and evaluating interpretation methods, we introduce 3 desiderata: predictive accuracy, descriptive accuracy, and relevancy. Using our framework, we review existing work, grounded in real-world studies which exemplify our desiderata, and suggest directions for future work.

Machine-learning models have demonstrated great success in learning complex patterns that enable them to make predictions about unobserved data. In addition to using models for prediction, the ability to interpret what a model has learned is receiving an increasing amount of attention. However, this increased focus has led to considerable confusion about the notion of interpretability. In particular, it is unclear how the wide array of proposed interpretation methods are related and what common concepts can be used to evaluate them. We aim to address these concerns by defining interpretability in the context of machine learning and introducing the predictive, descriptive, relevant (PDR) framework for discussing interpretations. The PDR framework provides 3 overarching desiderata for evaluation: predictive accuracy, descriptive accuracy, and relevancy, with relevancy judged relative to a human audience. Moreover, to help manage the deluge of interpretation methods, we introduce a categorization of existing techniques into model-based and post hoc categories, with subgroups including sparsity, modularity, and simulatability. To demonstrate how practitioners can use the PDR framework to evaluate and understand interpretations, we provide numerous real-world examples. These examples highlight the often underappreciated role played by human audiences in discussions of interpretability. Finally, based on our framework, we discuss limitations of existing methods and directions for future work. We hope that this work will provide a common vocabulary that will make it easier for both practitioners and researchers to discuss and choose from the full range of interpretation methods.
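To make the model-based versus post hoc distinction above concrete, here is a minimal sketch, assuming scikit-learn and synthetic data; it pairs a sparse lasso model (model-based interpretability through sparsity, reference 4 below) with permutation feature importance computed on a random forest (a post hoc interpretation of a black-box model; cf. references 1 and 77). The code is illustrative only and is not taken from the paper; all names and parameter values are ours.

    # Minimal sketch of model-based vs. post hoc interpretation (scikit-learn).
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.inspection import permutation_importance
    from sklearn.linear_model import Lasso

    # Synthetic stand-in data: only 3 of the 10 features are informative.
    X, y = make_regression(n_samples=500, n_features=10, n_informative=3,
                           noise=1.0, random_state=0)

    # Model-based interpretability: build interpretability into the model itself.
    # The lasso's sparsity makes the fitted coefficients directly readable.
    sparse_model = Lasso(alpha=0.1).fit(X, y)
    print("Lasso coefficients (sparse):", np.round(sparse_model.coef_, 2))

    # Post hoc interpretability: fit a flexible black-box model first, then
    # interpret it afterwards, here with permutation feature importance.
    black_box = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
    result = permutation_importance(black_box, X, y, n_repeats=10, random_state=0)
    print("Permutation importances:", np.round(result.importances_mean, 2))

The lasso's near-zero coefficients are read directly off the fitted model, whereas the permutation importances are computed after the fact by measuring how much shuffling each feature degrades the random forest's predictions.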

Authors

W. James Murdoch

1 Paper

Chandan Singh

1 Paper

Karl Kumbier

1 Paper

R. Abbasi-Asl

1 Paper

Bin Yu

1 Paper

Research Impact

1293 Citations
113 References
0 Datasets
5 Authors

References (113 items)

1

Random Forests

2

A survey on deep learning in medical image analysis

3

Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author)

4

Regression Shrinkage and Selection via the Lasso

5

Equality of Opportunity in Supervised Learning

6

Fairness through awareness

7

Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization

8

European Union Regulations on Algorithmic Decision-Making and a "Right to Explanation"

9

The mythos of model interpretability

10

Visualizing and Understanding Convolutional Networks

11

How to Explain Individual Classification Decisions

12

Science and Statistics

13

“Why Should I Trust You?”: Explaining the Predictions of Any Classifier

14

Deep learning for computational biology

15

A Unified Approach to Interpreting Model Predictions

16

Axiomatic Attribution for Deep Networks

17

A Survey of Methods for Explaining Black Box Models

18

Classification and regression trees

19

Histograms of oriented gradients for human detection

20

SmoothGrad: removing noise by adding noise

21

Learning Important Features Through Propagating Activation Differences

22

Understanding Black-box Predictions via Influence Functions

23

Probabilistic Graphical Models - Principles and Techniques

24

Interpretable Learning for Self-Driving Cars by Visualizing Causal Attention

25

Sparse coding with an overcomplete basis set: A strategy employed by V1?

26

Sanity Checks for Saliency Maps

27

Real Time Image Saliency for Black Box Classifiers

28

Principal Component Analysis

29

Please Stop Explaining Black Box Models for High Stakes Decisions

30

Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model

31

Rationalizing Neural Predictions

32

Explanation and understanding

33

The structure and function of explanations

34

ggplot2

35

Disentangled Attribution Curves for Interpreting Random Forests and Boosted Trees

36

The DeepTune framework for modeling and characterizing neurons in visual cortex area V4

37

Refining interaction search through signed iterative Random Forests

38

Can I trust you more? Model-Agnostic Hierarchical Explanations

39

Hierarchical interpretations for neural network predictions

40

Explaining Explanations: An Approach to Evaluating Interpretability of Machine Learning

41

A Theoretical Explanation for Perplexing Behaviors of Backpropagation-based Visualizations

42

Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning

43

A Shared Vision for Machine Learning in Neuroscience

44

Consistent Individualized Feature Attribution for Tree Ensembles

45

Biclustering by sparse canonical correlation analysis

46

Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs

47

Distilling a Neural Network Into a Soft Decision Tree

48

Towards better understanding of gradient-based attribution methods for Deep Neural Networks

49

Interpreting CNN knowledge via an Explanatory Graph

50

Interpretability of deep learning models: A survey of results

51

Iterative random forests to discover predictive and stable high-order interactions

52

Structural Compression of Convolutional Neural Networks Based on Greedy Filter Pruning

53

Detecting Statistical Interactions from Neural Network Weights

54

ggplot2 - Elegant Graphics for Data Analysis (2nd Edition)

55

Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations

56

A Roadmap for a Rigorous Science of Interpretability

57

Visualizing Deep Neural Network Decisions: Prediction Difference Analysis

58

Automatic Rule Extraction from Long Short Term Memory Networks

59

Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks

60

Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems

61

Not Just a Black Box: Learning Important Features Through Propagating Activation Differences

62

Stability-driven nonnegative matrix factorization to interpret spatial gene expression and build local gene networks

63

Superheat: Supervised heatmaps for visualizing complex data

64

Grounding of Textual Phrases in Images by Reconstruction

65

Neural Module Networks

66

Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission

67

Understanding Intra-Class Knowledge Inside CNN

68

Visualizing and Understanding Recurrent Networks

69

Striving for Simplicity: The All Convolutional Net

70

Generalized Additive Models

71

The Emergence of Machine Learning Techniques in Criminology

72

Estimation Stability With Cross-Validation (ESCV)

73

Critical Questions for Big Data

74

Toward a Unified Theory of Visual Area V4

75

Proceedings of the 3rd Innovations in Theoretical Computer Science Conference

76

ggplot2: Elegant Graphics for Data Analysis

77

Permutation importance: a corrected feature importance measure

78

Enriched random forests

79

Predictive Learning via Rule Ensembles

80

Daytime Arctic Cloud Detection Based on Multi-Angle Satellite Data With Case Studies

81

Credit scoring with a data mining approach based on support vector machines

82

Generalized Functional ANOVA Diagnostics for High-Dimensional Functions of Dependent Variables

83

IPython: A System for Interactive Scientific Computing

84

An accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data

85

Multimodel Inference

86

An Information-Maximization Approach to Blind Separation and Blind Deconvolution

87

Feature Visualization

88

Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition

89

Statistical models and shoe leather

90

Factor analysis and AIC

91

Robust Statistics: The Approach Based on Influence Functions

92

Stability

93

Relations Between Two Sets of Variates

94

Seaborn: Statistical Data Visualization

95

Report of David Card

96

Exhibit 33: Report of David Card. https://projects.iq.harvard.edu/files/diverseeducation/files/legal - card report revised filing.pdf (2018)

97

Exhibit a: Expert report of Peter S. Arcidiacono

98

Exhibit 157: Demographics of Harvard College applicants

99

tidyverse: Easily install and load the ‘tidyverse’ (Version 1.2.1)

100

Jupyter Notebooks - a publishing format for reproducible computational workflows

101

Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization

102

RStudio Team

103

RStudio: Integrated Development Environment for R

104

Causal Inference for Statistics, Social, and Biomedical Sciences: Instrumental Variables Analysis of Randomized Experiments with Two-Sided Noncompliance

105

DeepDream - a code example for visualizing neural networks

106

Data Structures for Statistical Computing in Python

107

Conditional variable importance for random forests

108

Using “Annotator Rationales” to Improve Machine Learning for Text Categorization

109

Using TF-IDF to Determine Word Relevance in Document Queries

110

Case-based explanation of non-case-based learning methods

111

Extracting Tree-structured Representations of Trained Networks

112

Robust statistics: the approach based on influence functions

113

Office of Institutional Research

Field of Study

Medicine, Computer Science, Mathematics

Journal Information

Name

Proceedings of the National Academy of Sciences

Volume

116

Venue Information

Name

Proceedings of the National Academy of Sciences of the United States of America

Type

journal

URL

https://www.jstor.org/journal/procnatiacadscie

Alternate Names

  • PNAS
  • PNAS online
  • Proceedings of the National Academy of Sciences of the United States of America.
  • Proc National Acad Sci
  • Proceedings of the National Academy of Sciences
  • Proc National Acad Sci u s Am