
Definitions, methods, and applications in interpretable machine learning

Published in Proceedings of the National Academy of Sciences (2019-01-14)

On This Page

  • TL;DR
  • Abstract
  • Authors
  • Datasets
  • References

TL;DR

This work defines interpretability in the context of machine learning and introduces the predictive, descriptive, relevant (PDR) framework for discussing interpretations, built around 3 overarching desiderata for evaluation: predictive accuracy, descriptive accuracy, and relevancy.

Abstract

Significance: The recent surge in interpretability research has led to confusion on numerous fronts. In particular, it is unclear what it means to be interpretable and how to select, evaluate, or even discuss methods for producing interpretations of machine-learning models. We aim to clarify these concerns by defining interpretable machine learning and constructing a unifying framework for existing methods which highlights the underappreciated role played by human audiences. Within this framework, methods are organized into 2 classes: model based and post hoc. To provide guidance in selecting and evaluating interpretation methods, we introduce 3 desiderata: predictive accuracy, descriptive accuracy, and relevancy. Using our framework, we review existing work, grounded in real-world studies which exemplify our desiderata, and suggest directions for future work.

Machine-learning models have demonstrated great success in learning complex patterns that enable them to make predictions about unobserved data. In addition to using models for prediction, the ability to interpret what a model has learned is receiving an increasing amount of attention. However, this increased focus has led to considerable confusion about the notion of interpretability. In particular, it is unclear how the wide array of proposed interpretation methods are related and what common concepts can be used to evaluate them. We aim to address these concerns by defining interpretability in the context of machine learning and introducing the predictive, descriptive, relevant (PDR) framework for discussing interpretations. The PDR framework provides 3 overarching desiderata for evaluation: predictive accuracy, descriptive accuracy, and relevancy, with relevancy judged relative to a human audience. Moreover, to help manage the deluge of interpretation methods, we introduce a categorization of existing techniques into model-based and post hoc categories, with subgroups including sparsity, modularity, and simulatability. To demonstrate how practitioners can use the PDR framework to evaluate and understand interpretations, we provide numerous real-world examples. These examples highlight the often underappreciated role played by human audiences in discussions of interpretability. Finally, based on our framework, we discuss limitations of existing methods and directions for future work. We hope that this work will provide a common vocabulary that will make it easier for both practitioners and researchers to discuss and choose from the full range of interpretation methods.
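To make the model-based versus post hoc distinction above concrete, here is a minimal sketch, assuming scikit-learn and synthetic data; it pairs a sparse lasso model (model-based interpretability through sparsity, reference 4 below) with permutation feature importance computed on a random forest (a post hoc interpretation of a black-box model; cf. references 1 and 77). The code is illustrative only and is not taken from the paper; all names and parameter values are ours.

    # Minimal sketch of model-based vs. post hoc interpretation (scikit-learn).
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.inspection import permutation_importance
    from sklearn.linear_model import Lasso

    # Synthetic stand-in data: only 3 of the 10 features are informative.
    X, y = make_regression(n_samples=500, n_features=10, n_informative=3,
                           noise=1.0, random_state=0)

    # Model-based interpretability: build interpretability into the model itself.
    # The lasso's sparsity makes the fitted coefficients directly readable.
    sparse_model = Lasso(alpha=0.1).fit(X, y)
    print("Lasso coefficients (sparse):", np.round(sparse_model.coef_, 2))

    # Post hoc interpretability: fit a flexible black-box model first, then
    # interpret it afterwards, here with permutation feature importance.
    black_box = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
    result = permutation_importance(black_box, X, y, n_repeats=10, random_state=0)
    print("Permutation importances:", np.round(result.importances_mean, 2))

The lasso's near-zero coefficients are read directly off the fitted model, whereas the permutation importances are computed after the fact by measuring how much shuffling each feature degrades the random forest's predictions.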

Authors

W. James Murdoch

1 Paper

Chandan Singh

1 Paper

Karl Kumbier

1 Paper

R. Abbasi-Asl

1 Paper

Bin Yu

1 Paper

Research Impact

1293 Citations
113 References
0 Datasets
5 Authors

References (113 items)

1

Random Forests

2

A survey on deep learning in medical image analysis

3

Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author)

4

Regression Shrinkage and Selection via the Lasso

5

Equality of Opportunity in Supervised Learning

6

Fairness through awareness

7

Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization

8

European Union Regulations on Algorithmic Decision-Making and a "Right to Explanation"

9

The mythos of model interpretability

10

Visualizing and Understanding Convolutional Networks

11

How to Explain Individual Classification Decisions

12

Science and Statistics

13

“Why Should I Trust You?”: Explaining the Predictions of Any Classifier

14

Deep learning for computational biology

15

A Unified Approach to Interpreting Model Predictions

16

Axiomatic Attribution for Deep Networks

17

A Survey of Methods for Explaining Black Box Models

18

Classification and regression trees

19

Histograms of oriented gradients for human detection

20

SmoothGrad: removing noise by adding noise

21

Learning Important Features Through Propagating Activation Differences

22

Understanding Black-box Predictions via Influence Functions

23

Probabilistic Graphical Models - Principles and Techniques

24

Interpretable Learning for Self-Driving Cars by Visualizing Causal Attention

25

Sparse coding with an overcomplete basis set: A strategy employed by V1?

26

Sanity Checks for Saliency Maps

27

Real Time Image Saliency for Black Box Classifiers

28

Principal Component Analysis

29

Please Stop Explaining Black Box Models for High Stakes Decisions

30

Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model

31

Rationalizing Neural Predictions

32

Explanation and understanding

33

The structure and function of explanations

34

ggplot2

35

Disentangled Attribution Curves for Interpreting Random Forests and Boosted Trees

36

The DeepTune framework for modeling and characterizing neurons in visual cortex area V4

37

Refining interaction search through signed iterative Random Forests

38

Can I trust you more? Model-Agnostic Hierarchical Explanations

39

Hierarchical interpretations for neural network predictions

40

Explaining Explanations: An Approach to Evaluating Interpretability of Machine Learning

41

A Theoretical Explanation for Perplexing Behaviors of Backpropagation-based Visualizations

42

Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning

43

A Shared Vision for Machine Learning in Neuroscience

44

Consistent Individualized Feature Attribution for Tree Ensembles

45

Biclustering by sparse canonical correlation analysis

46

Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs

47

Distilling a Neural Network Into a Soft Decision Tree

48

Towards better understanding of gradient-based attribution methods for Deep Neural Networks

49

Interpreting CNN knowledge via an Explanatory Graph

50

Interpretability of deep learning models: A survey of results

51

Iterative random forests to discover predictive and stable high-order interactions

52

Structural Compression of Convolutional Neural Networks Based on Greedy Filter Pruning

53

Detecting Statistical Interactions from Neural Network Weights

54

ggplot2 - Elegant Graphics for Data Analysis (2nd Edition)

55

Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations

56

A Roadmap for a Rigorous Science of Interpretability

57

Visualizing Deep Neural Network Decisions: Prediction Difference Analysis

58

Automatic Rule Extraction from Long Short Term Memory Networks

59

Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks

60

Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems

61

Not Just a Black Box: Learning Important Features Through Propagating Activation Differences

62

Stability-driven nonnegative matrix factorization to interpret spatial gene expression and build local gene networks

63

Superheat: Supervised heatmaps for visualizing complex data

64

Grounding of Textual Phrases in Images by Reconstruction

65

Neural Module Networks

66

Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission

67

Understanding Intra-Class Knowledge Inside CNN

68

Visualizing and Understanding Recurrent Networks

69

Striving for Simplicity: The All Convolutional Net

70

Generalized Additive Models

71

The Emergence of Machine Learning Techniques in Criminology

72

Estimation Stability With Cross-Validation (ESCV)

73

Critical Questions for Big Data

74

Toward a Unified Theory of Visual Area V4

75

Proceedings of the 3rd Innovations in Theoretical Computer Science Conference

76

ggplot2: Elegant Graphics for Data Analysis

77

Permutation importance: a corrected feature importance measure

78

Enriched random forests

79

Predictive Learning via Rule Ensembles

80

Daytime Arctic Cloud Detection Based on Multi-Angle Satellite Data With Case Studies

81

Credit scoring with a data mining approach based on support vector machines

82

Generalized Functional ANOVA Diagnostics for High-Dimensional Functions of Dependent Variables

83

IPython: A System for Interactive Scientific Computing

84

An accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data

85

Multimodel Inference

86

An Information-Maximization Approach to Blind Separation and Blind Deconvolution

87

Feature Visualization

88

Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition

89

Statistical models and shoe leather

90

Factor analysis and AIC

91

Robust Statistics: The Approach Based on Influence Functions

92

Stability

93

Relations Between Two Sets of Variates

94

Seaborn: Statistical Data Visualization

95

Report of David Card

96

Exhibit 33: Report of David Card. https://projects.iq.harvard.edu/files/diverseeducation/files/legal - card report revised filing.pdf (2018)

97

Exhibit a: Expert report of Peter S. Arcidiacono

98

Exhibit 157: Demographics of Harvard College applicants

99

tidyverse: Easily install and load the ‘tidyverse’ (Version 1.2.1)

100

Jupyter Notebooks - a publishing format for reproducible computational workflows

101

Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization

102

RStudio Team

103

RStudio: Integrated Development Environment for R

104

Causal Inference for Statistics, Social, and Biomedical Sciences: Instrumental Variables Analysis of Randomized Experiments with Two-Sided Noncompliance

105

DeepDream - a code example for visualizing neural networks

106

Data Structures for Statistical Computing in Python

107

Conditional variable importance for random forests

108

Using “Annotator Rationales” to Improve Machine Learning for Text Categorization

109

Using TF-IDF to Determine Word Relevance in Document Queries

110

Case-based explanation of non-case-based learning methods

111

Extracting Tree-structured Representations of Trained Networks

112

Robust statistics: the approach based on influence functions

113

Office of Institutional Research

Field of Study

Medicine, Computer Science, Mathematics

Journal Information

Name

Proceedings of the National Academy of Sciences

Volume

116

Venue Information

Name

Proceedings of the National Academy of Sciences of the United States of America

Type

journal

URL

https://www.jstor.org/journal/procnatiacadscie

Alternate Names

  • PNAS
  • PNAS online
  • Proceedings of the National Academy of Sciences of the United States of America.
  • Proc National Acad Sci
  • Proceedings of the National Academy of Sciences
  • Proc National Acad Sci u s Am