By Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin
University of Washington, Seattle, WA 98105, USA
TRUST
I would like to quote a few statements from Stephen M. R. Covey's "The Speed of Trust" here, as they are relevant to trust (executive summary link).
- Simply put, trust means confidence. The opposite of trust — distrust — is suspicion.
- Trust always affects two outcomes: speed and cost. When trust goes down, speed goes down and cost goes up. When trust goes up, speed goes up and cost goes down.
- (Strategy x Execution) x Trust = Results
- Not trusting people is a greater risk.
If users do not trust a model or a prediction, they will not use it.
There are two definitions of trust:
(1) Trusting a prediction, i.e. whether a user trusts an individual prediction sufficiently to take some action based on it.
(2) Trusting a model, i.e. whether the user trusts a model to behave in reasonable ways if deployed. Both are directly impacted by how much the human understands a model’s behaviour, as opposed to seeing it as a black box.
The proxy model approach is exemplified well by the LIME method. With LIME, a black-box system is explained by probing its behavior on perturbations of an input; the resulting data is then used to fit a local linear model that serves as a simplified proxy for the full model in the neighborhood of that input.
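To make this concrete, here is a minimal sketch of the perturb-and-fit loop, assuming a generic black_box_predict function and binary interpretable features (e.g. word presence or absence); the kernel width, sample count, and ridge penalty are illustrative choices, not the values used in the paper.

```python
import numpy as np
from sklearn.linear_model import Ridge

def explain_locally(x, black_box_predict, num_samples=5000, kernel_width=0.75):
    """Sketch of a LIME-style local surrogate.

    x: binary vector of interpretable features (e.g. word presence) for one instance.
    black_box_predict: maps a batch of such vectors to class probabilities.
    """
    d = len(x)
    # Perturb the instance by randomly switching off subsets of its features.
    masks = np.random.randint(0, 2, size=(num_samples, d))
    perturbed = x * masks

    # Query the black box on the perturbations (probability of the class of interest).
    labels = black_box_predict(perturbed)[:, 1]

    # Weight samples by proximity to the original instance (exponential kernel).
    distances = np.sqrt(((perturbed - x) ** 2).sum(axis=1))
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)

    # Fit a weighted linear model as the local, interpretable proxy.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(perturbed, labels, sample_weight=weights)
    return surrogate.coef_  # per-feature contribution to this prediction
```

The returned coefficients play the role of the weights in Figure 1: positive weights support the prediction and negative weights count against it.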
In this paper, they propose providing explanations for individual predictions as a solution to the “trusting a prediction” problem, and selecting multiple such predictions (and explanations) as a solution to the “trusting the model” problem.
Main contributions are summarized as follows.
• LIME, an algorithm that can explain the predictions of any classifier or regressor in a faithful way, by approximating it locally with an interpretable model.
• SP-LIME, a method that selects a set of representative instances with explanations to address the “trusting the model” problem, via submodular optimization (a greedy sketch follows this list).
• Comprehensive evaluation with simulated and human subjects, where the authors measure the impact of explanations on trust and associated tasks.
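As promised above, here is a rough sketch of the submodular pick. It assumes we already have an n-by-d matrix W of explanation weights (one row per explained instance, one column per interpretable feature); the square-root global importance and the greedy coverage loop follow the paper's description, but the variable names and budget are illustrative.

```python
import numpy as np

def submodular_pick(W, budget):
    """Greedy SP-LIME-style pick: choose `budget` instances whose explanations
    together cover the globally important features.

    W: (n_instances, n_features) matrix of explanation weights.
    """
    W = np.abs(W)
    importance = np.sqrt(W.sum(axis=0))         # global importance of each feature
    chosen = []
    covered = np.zeros(W.shape[1], dtype=bool)  # features already covered

    for _ in range(budget):
        # Pick the instance adding the most importance-weighted coverage.
        gains = [
            importance[~covered & (W[i] > 0)].sum() if i not in chosen else -1.0
            for i in range(W.shape[0])
        ]
        best = int(np.argmax(gains))
        chosen.append(best)
        covered |= W[best] > 0
    return chosen
```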
CASE FOR EXPLANATIONS
By “explaining a prediction”, the authors mean presenting textual or visual artifacts that provide a qualitative understanding of the relationship between the instance’s components (e.g. words in text, patches in an image) and the model’s prediction. They argue that explaining predictions is an important aspect of getting humans to trust and use machine learning effectively, provided the explanations are faithful and intelligible.
Explaining individual predictions. A model predicts that a patient has the flu, and LIME highlights the symptoms in the patient’s history that led to that prediction. “Sneeze” and “headache” are portrayed as contributing to the “flu” prediction, while “no fatigue” is evidence against it. With this information, a doctor can make an informed decision about whether to trust the model’s prediction.
The process of explaining individual predictions is illustrated in Figure 1. It is clear that a doctor is much better positioned to make a decision with the help of a model if intelligible explanations are provided. In this case, an explanation is a small list of symptoms with relative weights: symptoms that either contribute to the prediction (in green) or are evidence against it (in red). Humans usually have prior knowledge about the application domain, which they can use to accept (trust) or reject a prediction if they understand the reasoning behind it. It has been observed, for example, that providing explanations can increase the acceptance of movie recommendations and other automated systems.
Every machine learning application also requires a certain measure of overall trust in the model. Development and evaluation of a classification model often consists of collecting annotated data, of which a held-out subset is used for automated evaluation. Although this is a useful pipeline for many applications, evaluation on validation data may not correspond to performance “in the wild”: practitioners often overestimate the accuracy of their models, so trust cannot rely solely on it. Looking at examples offers an alternative way to assess trust in the model, especially if the examples are explained. The paper thus proposes explaining several representative individual predictions of a model as a way to provide a global understanding.
There are several ways a model or its evaluation can go wrong. Data leakage, for example, defined as the unintentional leakage of signal into the training (and validation) data that would not appear when deployed, can artificially inflate measured accuracy. A challenging example cited by Kaufman et al. is one where the patient ID was found to be heavily correlated with the target class in the training and validation data. This issue would be incredibly challenging to identify just by observing the predictions and the raw data, but much easier if explanations such as the one in Figure 1 are provided, as patient ID would be listed as an explanation for predictions. Another particularly hard-to-detect problem is dataset shift, where the training data differs from the test data (the famous 20 newsgroups dataset provides an example, discussed below). The insights given by explanations are particularly helpful in identifying what must be done to convert an untrustworthy model into a trustworthy one, for example removing leaked data or changing the training data to avoid dataset shift.
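The 20 newsgroups case is easy to reproduce with the open-source lime package and scikit-learn. The snippet below is a rough sketch along the lines of the package's text tutorial; the class names, model choice, and number of features shown are illustrative.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from lime.lime_text import LimeTextExplainer

# Train a classifier on the atheism-vs-Christianity split discussed in the paper.
categories = ['alt.atheism', 'soc.religion.christian']
train = fetch_20newsgroups(subset='train', categories=categories)
test = fetch_20newsgroups(subset='test', categories=categories)

pipeline = make_pipeline(TfidfVectorizer(lowercase=False), MultinomialNB(alpha=0.01))
pipeline.fit(train.data, train.target)

# Explain one test prediction; inspect which tokens the model actually relies on.
explainer = LimeTextExplainer(class_names=['atheism', 'christian'])
exp = explainer.explain_instance(test.data[0], pipeline.predict_proba, num_features=6)
print(exp.as_list())
```

In the paper's version of this experiment, header artifacts such as “Posting” and “Host” rank among the top features, which is exactly the kind of untrustworthy signal discussed above.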
Sparse Linear Explanations
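This heading refers to the concrete explanation family used in the paper: a sparse linear model chosen as ξ(x) = argmin over g of L(f, g, π_x) + Ω(g), where L is a proximity-weighted squared loss, π_x is an exponential kernel on the distance to x, and Ω(g) restricts the explanation to at most K non-zero weights. The paper approximates this with K-LASSO: select K features using the LASSO regularization path, then fit weighted least squares on just those features. The sketch below illustrates that idea with scikit-learn's lars_path; the function name and details are illustrative rather than the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import lars_path, LinearRegression

def k_lasso_explanation(Z, y, weights, K):
    """Sketch of K-LASSO: pick K features on the LASSO path, then refit.

    Z: (num_samples, d) array of binary perturbations in the interpretable space.
    y: array of black-box outputs for each perturbation.
    weights: proximity weights pi_x(z) for each perturbation.
    """
    # Fold the proximity weights into the data so the plain LASSO path
    # optimizes the weighted squared loss.
    sw = np.sqrt(weights)
    Zw, yw = Z * sw[:, None], y * sw

    # Walk the regularization path until at least K features are active.
    _, _, coefs = lars_path(Zw, yw, method='lasso')
    for col in range(coefs.shape[1]):
        selected = np.flatnonzero(coefs[:, col])
        if len(selected) >= K:
            break
    selected = selected[:K]  # trim to K (sketch; ties broken by index)

    # Refit a weighted least-squares model on the selected features only.
    model = LinearRegression()
    model.fit(Z[:, selected], y, sample_weight=weights)
    return selected, model.coef_
```

Refitting on only the K selected features keeps the explanation sparse while staying faithful to the proximity-weighted local loss.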
Code and data for replicating the experiments are available at https://github.com/marcotcr/lime-experiments