Explaining Explanations: An Overview of Interpretability of Machine Learning
By Leilani H. Gilpin, David Bau, Ben Z. Yuan, Ayesha Bajwa, Michael Specter and Lalana Kagal
Massachusetts Institute of Technology
Cambridge, MA 02139
Explainable AI (XAI), interpretable AI, or transparent AI refers to techniques in artificial intelligence (AI) that can be trusted and easily understood by humans. The term contrasts with the concept of the "black box" in machine learning, where even a system's designers cannot explain why it arrived at a specific decision. XAI can be used to implement a social right to explanation. Some claim that transparency rarely comes for free and that there are often tradeoffs between how "smart" an AI is and how transparent it is; these tradeoffs are expected to grow larger as AI systems increase in internal complexity. The technical challenge of explaining AI decisions is sometimes known as the interpretability problem.
Some existing deployed systems and regulations make the need for explanatory systems urgent and timely. With impending regulations like the European Union’s “Right to Explanation”, calls for diversity and inclusion in AI systems, findings that some automated systems may reinforce inequality and bias, and requirements for safe and secure AI in safety-critical tasks, there has been a recent explosion of interest in interpreting the representations and decisions of black-box models. These models are everywhere, and the development of interpretable and explainable models is scattered throughout various disciplines. Examples of general “explainable systems” include interpretable AI, explainable ML, causality, safe AI, computational social science, and automatic scientific discovery.
Explanation
Some say a good explanation depends on the question, and in particular on "why" questions: when what you want to know from an algorithm can be phrased as a why question, there are two why-questions of interest, why and why-should. Similarly to the explainable planning literature, philosophers also consider the why-shouldn't and why-should questions, which can yield the kinds of explainability requirements we want.
Interpretability vs. Completeness
An explanation can be evaluated in two ways: according to its interpretability, and according to its completeness. The goal of interpretability is to describe the internals of a system in a way that is understandable to humans. The success of this goal is tied to the cognition, knowledge, and biases of the user. The goal of completeness is to describe the operation of a system in an accurate way. An explanation is more complete when it allows the behavior of the system to be anticipated in more situations.
Explainability of Deep Networks
An explanation of processing answers “Why does this particular input lead to that particular output?” and is analogous to explaining the execution trace of a program. An explanation about representation answers “What information does the network contain?” and can be compared to explaining the internal data structures of a program.
Explanations of Deep Network Processing
Commonly used deep networks derive their decisions using a large number of elementary operations, far too many for a human to follow step by step; the approaches below therefore explain processing by approximating the network's behavior in simpler, more interpretable terms.
1) Linear Proxy Models
The proxy model approach is exemplified well by the LIME method. With LIME, a black-box system is explained by probing its behavior on perturbations of an input, and then that data is used to construct a local linear model that serves as a simplified proxy for the full model in the neighborhood of the input.
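To make the idea concrete, here is a minimal sketch of a LIME-style local linear proxy in Python (illustrative only, not the LIME package's own API); the callable black_box_predict, the Gaussian perturbation scale, and the exponential proximity kernel are all assumptions made for this example.

    # Minimal sketch of a LIME-style local linear surrogate (illustrative, not the LIME package).
    import numpy as np
    from sklearn.linear_model import Ridge

    def local_linear_explanation(black_box_predict, x, n_samples=500, sigma=0.75, seed=0):
        """Fit a weighted linear surrogate around the input x.

        black_box_predict: assumed callable mapping an array of inputs to scalar scores.
        Returns per-feature coefficients that approximate the model near x.
        """
        rng = np.random.default_rng(seed)
        # Perturb the input with Gaussian noise to probe the model's local behavior.
        X_pert = x + rng.normal(scale=0.1, size=(n_samples, x.shape[0]))
        y_pert = black_box_predict(X_pert)
        # Weight perturbations by proximity to x (an exponential kernel is assumed here).
        dists = np.linalg.norm(X_pert - x, axis=1)
        weights = np.exp(-(dists ** 2) / (2 * sigma ** 2))
        # The coefficients of the weighted linear fit serve as the local explanation.
        surrogate = Ridge(alpha=1.0).fit(X_pert, y_pert, sample_weight=weights)
        return surrogate.coef_

The returned coefficients are only meaningful in the neighborhood of the explained input, which is exactly the completeness a local proxy trades away for interpretability.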
2) Decision Trees
Decision-tree extraction has traditionally focused on shallow networks, and recent work generalizes the process to deep neural networks. A notable example is DeepRED, which demonstrates a way of extending the CRED algorithm (designed for shallow networks) to arbitrarily many hidden layers. DeepRED utilizes several strategies to simplify its decision trees: it uses RxREN to prune unnecessary inputs, and it applies the C4.5 algorithm, a statistical method for creating a parsimonious decision tree.
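The sketch below is not DeepRED itself but illustrates the underlying idea of distilling a black-box model into a single shallow surrogate tree; black_box_predict and the depth limit are assumptions.

    # Sketch of a global decision-tree surrogate distilled from a black-box model
    # (a simplification of the tree-extraction idea, not the DeepRED algorithm).
    from sklearn.tree import DecisionTreeClassifier

    def tree_surrogate(black_box_predict, X, max_depth=4):
        """Fit a shallow decision tree that mimics the black box on the data X.

        black_box_predict is assumed to return class labels for the rows of X.
        """
        y_bb = black_box_predict(X)                   # labels produced by the black box
        tree = DecisionTreeClassifier(max_depth=max_depth).fit(X, y_bb)
        fidelity = (tree.predict(X) == y_bb).mean()   # how faithfully the tree mimics the model
        return tree, fidelity

The fidelity score makes the interpretability/completeness tradeoff explicit: a shallower tree is easier to read but typically mimics the original model less faithfully.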
3) Automatic Rule Extraction
Automatic rule extraction is another well-studied approach for summarizing decisions. Andrews outlines existing rule-extraction techniques and provides a useful taxonomy of five dimensions of rule-extraction methods, including their expressive power, translucency, and the quality of the rules.
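As a toy illustration of the pedagogical end of that spectrum (rules learned by treating the network as a black box), a fitted surrogate tree such as the one above can be flattened into readable if-then rules; the feature names in the usage comment are hypothetical.

    # Toy illustration: turning a fitted surrogate tree into human-readable if-then rules.
    from sklearn.tree import export_text

    def extract_rules(tree, feature_names):
        """Render a fitted DecisionTreeClassifier as nested if/else rules, one per path."""
        return export_text(tree, feature_names=list(feature_names))

    # Example usage (assuming `tree` from tree_surrogate and illustrative feature names):
    # print(extract_rules(tree, ["age", "income", "balance"]))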
4) Salience Mapping
The salience map approach is exemplified by the occlusion procedure of Zeiler, in which a network is repeatedly tested with portions of the input occluded to create a map showing which parts of the data actually have influence on the network output.
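A minimal sketch of such an occlusion map is given below, assuming a callable model_score that returns the scalar score of the class of interest for a single image; the patch size, stride, and fill value are illustrative choices.

    # Sketch of an occlusion salience map in the spirit of Zeiler's procedure.
    import numpy as np

    def occlusion_map(model_score, image, patch=8, stride=8, fill=0.0):
        """Slide a blank patch over the image and record the drop in the class score.

        model_score: assumed callable mapping an image of shape (H, W, C) to a scalar score.
        Returns a coarse map where large values mark regions the prediction depends on.
        """
        base = model_score(image)
        H, W = image.shape[:2]
        heat = np.zeros(((H - patch) // stride + 1, (W - patch) // stride + 1))
        for i, top in enumerate(range(0, H - patch + 1, stride)):
            for j, left in enumerate(range(0, W - patch + 1, stride)):
                occluded = image.copy()
                occluded[top:top + patch, left:left + patch] = fill
                # Salience = how much occluding this region lowers the original score.
                heat[i, j] = base - model_score(occluded)
        return heat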
Explanations of Deep Network Representations
1) Role of Layers
Layers can be understood by testing their ability to help solve problems different from the ones the network was originally trained on.
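One common concrete form of this test is a linear probe: freeze the network, extract a layer's activations, and fit a simple classifier for the new task. The sketch below assumes a callable layer_features that returns those activations for raw inputs X.

    # Sketch of a linear probe: how well do a frozen layer's features solve a new task?
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    def probe_layer(layer_features, X, y_new_task):
        """layer_features: assumed callable mapping raw inputs to the layer's activations.

        A high held-out accuracy suggests the layer already encodes the new task's concept.
        """
        feats = layer_features(X)
        f_tr, f_te, y_tr, y_te = train_test_split(feats, y_new_task, test_size=0.3, random_state=0)
        probe = LogisticRegression(max_iter=1000).fit(f_tr, y_tr)
        return probe.score(f_te, y_te)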
2) Role of Individual Units
The information within a layer can be further subdivided into individual neurons or individual convolutional filters. The role of such individual units can be understood qualitatively, by creating visualizations of the input patterns that maximize the response of a single unit, or quantitatively, by testing the ability of a unit to solve a transfer problem.
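On the quantitative side, a minimal sketch is to score a single unit's activations as a detector for a human-labeled concept, for example with an AUC; the activations and concept labels are assumed to come from a labeled probe dataset.

    # Sketch: scoring one unit as a detector for a human-labeled concept.
    from sklearn.metrics import roc_auc_score

    def unit_concept_score(unit_activations, concept_labels):
        """unit_activations: 1-D array of one unit's responses over a probe dataset.
        concept_labels: binary labels marking whether the concept is present in each input.

        An AUC near 1.0 suggests the unit behaves like a detector for that concept.
        """
        return roc_auc_score(concept_labels, unit_activations)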
3) Role of Representation Vectors
Closely related to the approach of characterizing individual units is characterizing other directions in the representation vector space formed by linear combinations of individual units. Concept Activation Vectors (CAVs) are a framework for interpreting a neural network's representations by identifying and probing directions that align with human-interpretable concepts.
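A rough sketch of the CAV idea (simplified from the full TCAV framework) is shown below: a linear classifier separates activations of concept examples from random examples, its weight vector gives the concept direction, and a finite-difference step stands in for TCAV's directional derivative; head_score, the activation arrays, and the step size are assumptions.

    # Sketch of a CAV-style concept test (simplified from the TCAV framework).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def concept_activation_vector(acts_concept, acts_random):
        """Fit a linear classifier separating concept activations from random activations;
        its normalized weight vector is the concept direction (the CAV)."""
        X = np.vstack([acts_concept, acts_random])
        y = np.concatenate([np.ones(len(acts_concept)), np.zeros(len(acts_random))])
        w = LogisticRegression(max_iter=1000).fit(X, y).coef_[0]
        return w / np.linalg.norm(w)

    def concept_sensitivity(head_score, acts, cav, eps=1e-2):
        """Finite-difference stand-in for TCAV's directional derivative.

        head_score: assumed callable mapping activations to per-example class scores.
        Returns the fraction of inputs whose score rises when activations move a small
        step in the concept direction.
        """
        deltas = head_score(acts + eps * cav) - head_score(acts)
        return float(np.mean(deltas > 0))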
Explanation-Producing Systems
Networks can be trained to use explicit attention as part of their architecture; they can be trained to learn disentangled representations; or they can be directly trained to generate their own explanations.
a) Attention Networks
Attention-based networks learn functions that provide a weighting over inputs or internal features to steer the information visible to other parts of a network. Attention-based approaches have shown remarkable success in solving problems such as allowing natural language translation models to process words in an appropriate nonsequential order, and they have also been applied in domains such as fine-grained image classification and visual question answering.
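The weighting itself is simple to sketch; the scaled dot-product form below is one common choice (not tied to any particular cited system), and the returned attention weights are the quantity typically inspected as the explanation.

    # Sketch of the attention weighting at the core of attention-based explanations.
    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def scaled_dot_product_attention(Q, K, V):
        """Q: (n_queries, d); K, V: (n_items, d).

        Each row of the returned weights is a distribution over the inputs, showing
        which items that query attended to.
        """
        scores = Q @ K.T / np.sqrt(Q.shape[-1])
        weights = softmax(scores, axis=-1)
        return weights @ V, weights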
b) Disentangled Representations
Disentangled representations have individual dimensions that describe meaningful and independent factors of variation. The problem of separating latent factors is an old problem that has previously been attacked using a variety of techniques such as Principal Component Analysis, Independent Component Analysis, and Nonnegative Matrix Factorization. Deep networks can be trained to explicitly learn disentangled representations. One approach that shows promise is Variational Autoencoding, which trains a network to optimize a model to match the input probability distribution according to information-theoretic measures.
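As a sketch of the objective involved, the loss below combines a reconstruction term with a KL term on the latent posterior; the beta weighting is the beta-VAE variant often used to encourage disentanglement, and the squared-error reconstruction term is an assumption about the decoder's output model.

    # Sketch of the (beta-)VAE objective that encourages disentangled latent factors.
    import numpy as np

    def vae_loss(x, x_recon, mu, log_var, beta=4.0):
        """Negative ELBO with a beta weight on the KL term (beta > 1 is the beta-VAE
        variant; beta = 1 recovers the standard VAE objective).

        mu, log_var: parameters of the diagonal Gaussian posterior q(z | x).
        """
        recon = np.sum((x - x_recon) ** 2)                           # reconstruction error
        kl = -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var))  # KL(q(z|x) || N(0, I))
        return recon + beta * kl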
c) Generated Explanations
Finally, deep networks can also be designed to generate their own human-understandable explanations as part of the explicit training of the system. Explanation generation has been demonstrated as part of systems for visual question answering as well as in fine-grained image classification. In addition to solving their primary task, these systems synthesize a “because” sentence that explains the decision in natural language.
Conclusion
We find that the various approaches taken to address different facets of explainability are siloed. Work in the explainability space tends to advance a particular category of technique, with comparatively little attention given to approaches that merge different categories of techniques to achieve more effective explanation. Given the purpose and type of an explanation, it is not obvious what the best explanation metric is or should be. We encourage the use of diverse metrics that align with the purpose and completeness of the targeted explanation. Our view is that, as the community learns to advance its work collaboratively by combining ideas from different fields, the overall state of system explanation will improve dramatically.