Research Connect

Interpretable Machine Learning

Published 2019-11-07

TL;DR

This project introduces Robust TCAV, which builds on TCAV and experimentally determines best practices for the method. It is a step toward making TCAV, an already impactful algorithm in interpretability, more reliable and useful for practitioners.

Abstract

Interpretable machine learning has become a popular research direction as deep neural networks (DNNs) have become more powerful and their applications more mainstream, yet DNNs remain difficult to understand. Testing with Concept Activation Vectors (TCAV; Kim et al. 2017) is an approach to interpreting DNNs in a human-friendly way and has recently received significant attention in the machine learning community. The TCAV algorithm achieves a degree of global interpretability for DNNs by using human-defined concepts as explanations. This project introduces Robust TCAV, which builds on TCAV and experimentally determines best practices for the method. The objectives for Robust TCAV are 1) making TCAV more consistent by reducing variance in the TCAV score distribution, and 2) increasing CAV and TCAV score resistance to perturbations. A difference-of-means method for CAV generation was determined to be the best practice for achieving both objectives. Many areas of the TCAV process are explored, including CAV visualization in low dimensions, negative class selection, and activation perturbation in the direction of a CAV. Finally, a thresholding technique is considered to remove noise in TCAV scores. This project is a step toward making TCAV, an already impactful algorithm in interpretability, more reliable and useful for practitioners.
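The two quantities the abstract centers on, a difference-of-means CAV and the TCAV score, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names (`cav_difference_of_means`, `tcav_score`) are hypothetical, and it assumes layer activations and per-example gradients of a class logit have already been extracted as arrays.

```python
import numpy as np

def cav_difference_of_means(concept_acts, negative_acts):
    """Sketch of a difference-of-means CAV: the unit vector pointing from
    the mean activation of negative-class examples toward the mean
    activation of concept examples, in a chosen layer's activation space.

    concept_acts, negative_acts: arrays of shape (n_examples, n_units).
    """
    direction = concept_acts.mean(axis=0) - negative_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)  # unit-normalize

def tcav_score(logit_grads, cav):
    """Sketch of a TCAV score: the fraction of inputs whose class logit
    increases when activations are nudged along the CAV, i.e. whose
    directional derivative along the CAV is positive.

    logit_grads: array of shape (n_examples, n_units), gradients of the
    class logit with respect to the layer activations.
    """
    return float(np.mean(logit_grads @ cav > 0))
```

For example, with concept activations at the ones vector and negatives at the origin, the CAV is the normalized all-ones direction, and a gradient batch split evenly between that direction and its opposite yields a TCAV score of 0.5, i.e. the concept neither helps nor hurts the class.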

Authors

Bradley C. Boehmke

Brandon M. Greenwell

References (165 items)

1

Deep Residual Learning for Image Recognition

2

Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks

3

Going deeper with convolutions

4

ImageNet classification with deep convolutional neural networks

5

Deep learning in neural networks: An overview

6

ImageNet Large Scale Visual Recognition Challenge

7

Datasheets for datasets

8

Intriguing properties of neural networks

9

Machine learning: Trends, perspectives, and prospects

10

Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification

11

Runaway Feedback Loops in Predictive Policing

12

Automated Experiments on Ad Privacy Settings

13

Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI

14

Understanding artificial intelligence ethics and safety

15

Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization

16

European Union Regulations on Algorithmic Decision-Making and a "Right to Explanation"

17

The mythos of model interpretability

18

Synthesizing the preferred inputs for neurons in neural networks via deep generator networks

19

Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

20

Explainable AI: Interpreting, Explaining and Visualizing Deep Learning

21

“Why Should I Trust You?”: Explaining the Predictions of Any Classifier

22

Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead

23

Explaining and Harnessing Adversarial Examples

24

InterpretML: A Unified Framework for Machine Learning Interpretability

25

Explainable machine learning in deployment

26

Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR

27

The global landscape of AI ethics guidelines

28

A Survey of Methods for Explaining Black Box Models

29

Interpretable Explanations of Black Boxes by Meaningful Perturbation

30

SmoothGrad: removing noise by adding noise

31

Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps

32

Auto-Encoding Variational Bayes

33

Model Cards for Model Reporting

34

Describing Textures in the Wild

35

beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework

36

Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)

37

Real Time Image Saliency for Black Box Classifiers

38

Image Style Transfer Using Convolutional Neural Networks

39

Explaining Explanations in AI

40

This looks like that: deep learning for interpretable image recognition

41

Why a Right to Explanation of Automated Decision-Making Does Not Exist in the General Data Protection Regulation

42

A Human-Centered Agenda for Intelligible Machine Learning

43

Machine Learning Explainability for External Stakeholders

44

The EU Commission

45

Interpreting Interpretability: Understanding Data Scientists' Use of Interpretability Tools for Machine Learning

46

Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims

47

Closing the AI accountability gap: defining an end-to-end framework for internal algorithmic auditing

48

Face recognition vendor test part 3

49

Algorithmic Decision-Making and the Control Problem

50

Millions of black people affected by racial bias in health-care algorithms

51

Dissecting racial bias in an algorithm used to manage the health of populations

52

A systematic review of algorithm aversion in augmented decision making

53

AI-Assisted Decision-making in Healthcare

54

Machine Learning Interpretability: A Survey on Methods and Metrics

55

How model accuracy and explanation fidelity influence user trust

56

Administrative law and the machines of government: judicial review of automated public-sector decision-making

57

FactSheets: Increasing trust in AI services through supplier's declarations of conformity

58

Shaping the State of Machine Learning Algorithms within Policing: Workshop Report

59

Principles alone cannot guarantee ethical AI

60

Affinity Profiling and Discrimination by Association in Online Behavioural Advertising

61

From What to How: An Initial Review of Publicly Available AI Ethics Tools, Methods and Research to Translate Principles into Practices

62

The Ethics of AI Ethics: An Evaluation of Guidelines

63

Dirty Data, Bad Predictions: How Civil Rights Violations Impact Police Data, Predictive Policing Systems, and Justice

64

Human-Centered Tools for Coping with Imperfect Algorithms During Medical Decision-Making

65

Automating Interpretability: Discovering and Testing Visual Concepts Learned by Neural Networks

66

Towards a Definition of Disentangled Representations

67

Opening the black box of machine learning.

68

Response to centre for data ethics and innovation consultation

69

Algorithm-assisted decision-making in the public sector: framing the issues using administrative law rules governing discretionary power

70

Explaining Image Classifiers by Counterfactual Generation

71

General Data Protection Regulation

72

Defining Locality for Surrogates in Post-hoc Interpretability

73

Transparent to whom? No algorithmic accountability without a critical audience

74

Explaining Explanations: An Overview of Interpretability of Machine Learning

75

What About Us?

76

The General Data Protection Regulation (GDPR)

77

Rights related to automated decision making including profiling

78

Enslaving the Algorithm: From a “Right to an Explanation” to a “Right to Better Decisions”?

79

Manipulating and Measuring Model Interpretability

80

Net2Vec: Quantifying and Explaining How Concepts are Encoded by Filters in Deep Neural Networks

81

"Meaningful Information" and the Right to Explanation

82

Industrial strategy: building a Britain fit for the future

83

Accountability of AI Under the Law: The Role of Explanation

84

The (Un)reliability of saliency methods

85

Algorithmic risk assessment policing models: lessons from the Durham HART model and ‘Experimental’ proportionality

86

Algorithmic Bias in Autonomous Systems

87

The Equality Act 2010

88

Challenges for Transparency

89

Deep Text Classification Can be Fooled

90

Artificial Neural Networks

91

Towards A Rigorous Science of Interpretable Machine Learning

92

The ethics of algorithms: Mapping the debate

93

Understanding intermediate layers using linear classifier probes

94

Can we open the black box of AI?

95

To predict and serve?

96

Semantics derived automatically from language corpora contain human-like biases

97

Making Tree Ensembles Interpretable

98

Model-Agnostic Interpretability of Machine Learning

99

An FDA for Algorithms

100

Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model

101

Algorithm Aversion: People Erroneously Avoid Algorithms after Seeing Them Err

102

Trade-offs

103

How to Discriminate between Computer-Aided and Computer-Hindered Decisions

104

Data Protection Act

105

The relative influence of advice from human experts and statistical methods on forecast adjustments

106

Discrimination-aware data mining

107

Sparse Principal Component Analysis

108

Weighted support vector machine for data classification

109

Detection of Influential Observation in Linear Regression

110

Explaining Decisions Made with AI

111

White Paper on Artificial Intelligence

112

IEEE Invites Companies, Governments and Other Stakeholders Globally to Expand on Ethics Certification Program for Autonomous and Intelligent Systems (ECPAIS) Work

113

Committee on Standards in Public Life

114

Explainable AI: the basics

115

Examining the Black Box

116

Guidance: Medical device stand-alone software including apps (including IVDMDs)

117

Algorithmic Impact Assessment - Évaluation de l'Incidence Algorithmique

118

Transparency: Motivations and Challenges

119

Materials Informatics: New Materials Development Using Interpretable Machine Learning

120

High-Stakes AI Decisions Need to Be Automatically Audited

121

Discriminating systems

122

This is how AI bias really happens — and why it’s so hard to fix

123

Ethically Aligned Design

124

TED. In: Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (AIES '19)

125

Snapshot Paper - AI and Personal Insurance

126

The balance: Accuracy vs. Interpretability. Data Science Ninja

127

Project explAIn - interim report

128

Big Brother Watch - submission to the CDEI Bias in Algorithmic Decision Making review

129

The Dstl Biscuit Book

130

Human-AI Collaboration Trust Literature Review - Key Insights and Bibliography

131

AI sector deal one year on

132

AI used for first time in job interviews in UK to find best applicants

133

Algorithms for decision making

134

House of Lords Select Committee on AI

135

Algorithmic Impact Assessments. AI Now

136

Debates, awareness, and projects about GDPR and data protection

137

Sanity Checks for Saliency Maps. arXiv:1810.03292 [cs.CV]

138

Factsheets for AI Services: Building Trusted AI - IBM Research

139

Audit the algorithms that are ruling our lives

140

10 principles for public sector use of algorithmic decision making

141

Artificial intelligence and automation in the UK

142

Algorithms in the Criminal Justice System: Assessing the Use of Risk Assessments in Sentencing

143

Slave to the Algorithm? Why a ‘Right to an Explanation’ Is Probably Not the Remedy You Are Looking For

144

AI Now 2017 Report

145

Machine learning: the power and promise of computers that learn by example

146

AI analysis: sizing the prize

147

Learning certifiably optimal rule lists for categorical data

148

Economic

149

Artificial intelligence: opportunities and implications for the future of decision making

150

Machine Bias

151

Rethinking the Inception Architecture for Computer Vision

152

Financial Conduct Authority

153

What is the AI Black Box Problem?

154

Face Recognition Performance: Role of Demographic Information. IEEE Transactions on Information Forensics and Security

155

An Introduction to Machine Learning Interpretability: An Applied Perspective on Fairness, Accountability, Transparency, and Explainable AI

156

Information Commissioner's Office (2020)

157

Data Ethics Framework

158

ICO Big data, artificial intelligence, machine learning and data protection

159

AI Explanations Whitepaper

160

ICO Explaining

161

GitHub interpretml/interpret

162

PricewaterhouseCoopers Explainable AI

163

The Challenges and Opportunities of Explainable AI

164

PricewaterhouseCoopers Opening AI’s black box will become a priority

165

Responsible AI principles from Microsoft

Field of Study

Computer Science

Journal Information

Hands-On Machine Learning with R