
Rule Extraction Algorithm for Deep Neural Networks: A Review

-By Tameru Hailesilassie
Department of Computer Science and Engineering
National University of Science and Technology (MISiS)
Moscow, Russia


Today's blog is a continuation of the XAI series: Rule Extraction from Neural Networks.


Abstract—Despite achieving the highest classification accuracy in a wide variety of application areas, the artificial neural network has one disadvantage: the way the network comes to a decision is not easily comprehensible. This lack of explanation ability reduces the acceptability of neural networks in data mining and decision systems, and it is the reason researchers have proposed many rule extraction algorithms. Recently, Deep Neural Networks (DNNs) have been achieving profound results over standard neural networks on classification and recognition problems; deep learning is a hot machine learning area that has proven both useful and innovative. This paper thoroughly reviews various rule extraction algorithms under the classification scheme of decompositional, pedagogical, and eclectic approaches. It also evaluates these algorithms based on the neural network structure with which each algorithm is intended to work. The main contribution of this review is to show that there are only limited studies of rule extraction from DNNs.


RULE EXTRACTION ALGORITHM

Rule Extraction Definition
 "…given a trained neural network and the data on which it was trained, produce a description of the network's hypothesis that is comprehensible yet closely approximates the network's predictive behavior."  

Rule extraction algorithms are useful for experts to verify and cross-check neural network systems. Extracted rules take different forms; we present an overview of some of these logical rules below.

IF-THEN rules: 

It is a conditional statement model that is easily comprehensible. The general form of an IF-THEN rule is:

IF X ∈ S_i THEN Y = Class_i

If the given condition is true, in this case if X is a member of S_i, then the output is labeled with a particular class. As a simple example, a single neuron in a neural network with a linear activation function can be modeled by an IF-THEN logical rule. The weighted sum S_j of the jth neuron is calculated as:

S_j = Σ_i X_i · W_ij

where X_i is an input and W_ij is the corresponding weight of the connection between the ith and jth neurons. The output Y of a neuron is a function of the weighted sum, given as:

Y = f(S_j)

So, the output can be expressed by a simple IF-THEN rule as follows:

IF S_j ≥ α THEN Y = 1, ELSE Y = 0

where α is a threshold value.
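
As a rough illustration of the rule above, here is a minimal Python sketch of a single neuron read as an IF-THEN rule; the weights, inputs, and threshold are made-up values, not taken from the paper.

```python
import numpy as np

def neuron_if_then(x, w, alpha):
    """Model a single neuron as an IF-THEN rule.

    x     -- input vector X_i
    w     -- connection weights W_ij for neuron j
    alpha -- threshold value (the α above)
    """
    s = np.dot(x, w)  # weighted sum S_j = sum_i X_i * W_ij
    # IF the weighted sum reaches the threshold THEN class 1, ELSE class 0
    return 1 if s >= alpha else 0

# Example: two inputs with equal weights and threshold 0.5 behave like a logical OR.
print(neuron_if_then(np.array([1, 0]), np.array([0.6, 0.6]), 0.5))  # -> 1
print(neuron_if_then(np.array([0, 0]), np.array([0.6, 0.6]), 0.5))  # -> 0
```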


M-of-N rules:

This method searches for rules expressed as a Boolean expression that is satisfied when at least M of the N conditions in a set are satisfied. The rule has the following form:

IF M of {N} THEN Z

This method is efficient and general, and an M-of-N rule can also be easily converted into simple IF-THEN rules.
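
A minimal sketch of how an M-of-N rule can be evaluated and expanded into plain IF-THEN rules; the condition names and helper functions are hypothetical, chosen only for illustration.

```python
from itertools import combinations

def m_of_n(conditions, m):
    """Return True when at least m of the boolean conditions hold."""
    return sum(bool(c) for c in conditions) >= m

def m_of_n_to_if_then(names, m):
    """Expand 'IF m of {names} THEN Z' into the equivalent set of plain
    IF-THEN rules, one per m-sized combination (implicitly joined by OR)."""
    return ["IF " + " AND ".join(combo) + " THEN Z"
            for combo in combinations(names, m)]

# "IF 2 of {A, B, C} THEN Z"
print(m_of_n([True, False, True], m=2))       # True: two of the three conditions hold
print(m_of_n_to_if_then(["A", "B", "C"], 2))  # three equivalent IF-THEN rules
```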



Decision tree:

It is the most widely used tree-structured classifier in machine learning and data mining. The model classifies an instance by starting at the root of the tree and following the branches down until a leaf is reached. A decision tree is a white-box model that is easy to explain. A simple decision tree diagram is shown in Figure 2 of the paper.
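
For a concrete feel of this white-box nature, here is a short sketch using scikit-learn (an assumed tool, not one prescribed by the paper) that fits a shallow tree on the Iris dataset and prints its splits, which read as nested IF-THEN conditions from root to leaves.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a shallow decision tree and print its splits as readable conditions.
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["sepal length", "sepal width",
                                       "petal length", "petal width"]))
```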

Andrews, Diederich, and Tickle proposed a multidimensional taxonomy for the various rule extraction algorithms.

The first dimension of the classification is based on the expressive power of the extracted rules, which refers to the form of the rule (e.g., IF-THEN rules and fuzzy rules).

The second scheme considers the relationship between the extracted rules and the architecture of the trained neural network. Accordingly, there are three categories. Rule extraction algorithms that work at the neuron level, rather than at the level of the whole neural network architecture, are called decompositional techniques. If the artificial neural network is treated as a black box irrespective of its architecture, the algorithm falls into the pedagogical category.

The third category, called the eclectic approach, combines both the decompositional and pedagogical approaches.



DECOMPOSITIONAL APPROACH

Decompositional algorithms work by splitting the network down to the neuron level. The results obtained from each neuron are then aggregated to represent the network as a whole.
DIFACON-miner can generate IF-THEN rules from an artificial neural network. Differential evolution (DE) and the touring ant colony optimization (TACO) algorithm are used for training and rule extraction, respectively and simultaneously, which saves the considerable additional time usually spent on rule extraction. However, the algorithm addresses only feedforward artificial neural networks and does not consider rule extraction from DNNs.
CRED extracts both continuous and discrete rules from a pre-trained ANN by using a decision tree. Algorithms that work only with discrete inputs require discretization of continuous attributes, which in turn affects the accuracy of the extracted rules; CRED does not require such discretization. Even though this algorithm is independent of the network structure and training algorithm, it is not possible to apply it directly to DNNs.
FERNN is an algorithm proposed to generate both M-of-N and IF-THEN rules from a feedforward neural network with a single hidden layer. In addition, there is no pruning or retraining of the network, so the extraction process is fast. The applicability of the algorithm to DNNs is not mentioned.
KT extracts IF-THEN rules. It follows a layer-by-layer approach and applies a tree search to find the rules. Despite the fact that the extracted rules are robust, in some cases even better than the neural network itself, DNNs are not taken into consideration. Tsukimoto has introduced a rule extraction algorithm that works with different types of neural networks, such as recurrent neural networks and multilayer perceptrons (MLPs), whose activation functions are monotonic. It extracts IF-THEN rules that are applicable to both continuous and discrete values. Also, the computational complexity of this algorithm is polynomial, whereas that of KT is exponential.
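
To make the neuron-level idea concrete, the following sketch loosely mimics the decompositional strategy (somewhat in the spirit of CRED's use of decision trees) on a hypothetical toy network: each hidden unit is described by a small tree over the inputs, and the output is then described in terms of the hidden units. The toy data, the activation thresholding, and the use of scikit-learn are all assumptions, not any of the published algorithms.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data and network; none of this comes from the cited papers.
rng = np.random.default_rng(0)
X = rng.random((500, 4))
y = (X[:, 0] + X[:, 2] > 1.0).astype(int)
net = MLPClassifier(hidden_layer_sizes=(3,), max_iter=2000, random_state=0).fit(X, y)

# Hidden-layer activations (relu is MLPClassifier's default activation).
H = np.maximum(0, X @ net.coefs_[0] + net.intercepts_[0])

# Neuron level: describe each hidden unit by a small decision tree over the inputs.
for j in range(H.shape[1]):
    active = (H[:, j] > H[:, j].mean()).astype(int)   # crude "unit is on" label
    print(f"hidden unit {j}:")
    print(export_text(DecisionTreeClassifier(max_depth=2).fit(X, active)))

# Output level: describe the prediction in terms of the hidden units; composing
# the two levels of trees would yield rules over the raw inputs.
out_tree = DecisionTreeClassifier(max_depth=2).fit(H, net.predict(X))
print(export_text(out_tree))
```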

PEDAGOGICAL APPROACH

The neural network is treated as a black-box system in pedagogical rule extraction algorithms. The focus of this approach is finding an output for a corresponding input; the weights and internal structure of the artificial neural network are not subjected to analysis. There are different techniques in this approach, among them validity interval analysis (VIA), the sampling approach, and reverse engineering of the neural network.
TREPAN extracts a decision tree with M-of-N split points from an ANN by utilizing a query and sampling approach. Learning with queries is also introduced with the aim of retrieving information that is essential for the learning process. The architecture of the neural network used for the experiment has only one hidden layer.
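
The query-and-sampling idea can be sketched as follows; this is not TREPAN itself (which grows its tree with M-of-N splits and targeted query generation) but a simplified surrogate-tree version under assumed toy data and scikit-learn models.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy network; any trained classifier works, because the
# extractor only ever queries it for predictions.
rng = np.random.default_rng(1)
X = rng.random((300, 2))
y = (X[:, 0] > X[:, 1]).astype(int)
net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=1).fit(X, y)

# Query and sampling: draw fresh inputs, label them with the network's own
# answers, and fit a comprehensible surrogate (here a plain decision tree).
queries = rng.random((2000, 2))
answers = net.predict(queries)                        # the network acts as an oracle
surrogate = DecisionTreeClassifier(max_depth=3).fit(queries, answers)
print(export_text(surrogate, feature_names=["x1", "x2"]))

# Fidelity: how closely the extracted tree mimics the black box on new samples.
test = rng.random((1000, 2))
print("fidelity:", (surrogate.predict(test) == net.predict(test)).mean())
```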
HYPINV is based on a network inversion technique. This algorithm is capable of extracting hyperplane rules; the extracted rule takes the form of a conjunction and disjunction of hyperplanes. The authors used a standard MLP for the experiment. Notwithstanding that the algorithm is independent of the MLP architecture, DNNs are not contemplated.

Three further rule extraction algorithms in this category are described below.

BIO-RE extracts binary rules from a neural network trained with binary inputs. However, the algorithm is tested only on a shallow MLP with four input, six hidden, and three output nodes.

KDRuleEx generates a decision table in the form of a two-dimensional matrix. The training example set and the trained artificial neural network are the inputs to this algorithm. It can work with both discrete and continuous inputs and also handles non-binary inputs.
RxREN is a reverse engineering approach that extracts IF-THEN rules from a neural network. Reverse engineering techniques analyze the output to trace back the components that cause the final result, which makes the search for rules fast. The decompositional technique, in contrast, works layer by layer and as a result may be tedious and time-consuming. Regarding computational cost and execution time, the pedagogical approach is better than the decompositional one, and it also has the advantage of flexibility in terms of ANN architecture.
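
The reverse-engineering intuition, tracing back which inputs drive the output, can be sketched with a simple knock-out probe. This is an illustrative stand-in, not RxREN's actual pruning and rule-construction procedure, and the function name is hypothetical.

```python
import numpy as np

def input_significance(net, X):
    """Crude reverse-engineering probe (illustrative only; not RxREN itself):
    knock out each input in turn by replacing it with its mean value and
    measure how often the network's prediction changes. Inputs whose removal
    changes many predictions are candidates to appear in extracted rules."""
    base = net.predict(X)
    scores = []
    for i in range(X.shape[1]):
        X_knock = X.copy()
        X_knock[:, i] = X[:, i].mean()
        scores.append((net.predict(X_knock) != base).mean())
    return scores

# Usage (hypothetical): scores = input_significance(trained_net, X_train),
# where trained_net is any fitted model exposing a predict() method.
```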


ECLECTIC APPROACH

Hruschka and Ebecken have proposed an algorithm based on the RX algorithm of Lu, Setiono, and Liu. The technique incorporates both the decompositional and pedagogical approaches. A clustering genetic algorithm (CGA) is employed for hidden-unit clustering, and logical rules are subsequently extracted based on the relationship between the inputs and the generated clusters. This algorithm is designed for shallow MLP neural networks. Another eclectic algorithm uses an artificial immune system (AIS); it can generate rules from a trained shallow feedforward neural network and achieves high accuracy.
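
A minimal sketch of the eclectic idea on a hypothetical toy network: cluster the hidden-unit activations (k-means stands in here for the clustering genetic algorithm used in the cited work) and then describe the clusters in terms of the original inputs. The data, network, and library choices are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data and network.
rng = np.random.default_rng(2)
X = rng.random((400, 3))
y = (X.sum(axis=1) > 1.5).astype(int)
net = MLPClassifier(hidden_layer_sizes=(4,), max_iter=2000, random_state=2).fit(X, y)

# Decompositional part: cluster the hidden-unit activation vectors.
H = np.maximum(0, X @ net.coefs_[0] + net.intercepts_[0])
clusters = KMeans(n_clusters=3, n_init=10, random_state=2).fit_predict(H)

# Pedagogical part: express the activation clusters in terms of the raw inputs,
# which is where the comprehensible rules come from.
print(export_text(DecisionTreeClassifier(max_depth=2).fit(X, clusters)))
```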


RULE EXTRACTION FROM DEEP NEURAL NETWORK

Zilke has proposed a rule extraction algorithm for DNNs called DeepRED by extending the decompositional algorithm CRED. The proposed algorithm adds decision trees and intermediate rules for every hidden layer. Rule extraction from a DNN is a step-wise, divide-and-conquer process that describes each layer in terms of the previous one; the per-layer results are then merged to obtain the final rules that explain the whole DNN. The author found that the divide-and-conquer approach helps to reduce memory usage and computational time. Moreover, to facilitate the process of rule extraction, FERNN is used for pruning less important components of the neural network.
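
A simplified sketch of the layer-by-layer, divide-and-conquer idea, under assumed toy data and scikit-learn models; the real DeepRED builds full intermediate rule sets per hidden layer and merges them by substitution, which this sketch only hints at with a single intermediate term.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy "deep" network with two hidden layers.
rng = np.random.default_rng(3)
X = rng.random((600, 5))
y = ((X[:, 0] > 0.5) & (X[:, 3] > 0.5)).astype(int)
net = MLPClassifier(hidden_layer_sizes=(6, 4), max_iter=3000, random_state=3).fit(X, y)

# Keep every layer's activations (relu hidden layers, MLPClassifier's default).
acts = [X]
for W, b in zip(net.coefs_[:-1], net.intercepts_[:-1]):
    acts.append(np.maximum(0, acts[-1] @ W + b))

# Step 1: an intermediate tree describes the network's prediction in terms of
# the deepest hidden layer.
top = DecisionTreeClassifier(max_depth=2).fit(acts[-1], net.predict(X))

# Step 2 (repeated per layer in DeepRED): each hidden-layer term used above is
# itself described by a tree over the previous layer; substituting the per-layer
# rules into one another eventually yields rules over the raw inputs only.
j = int(np.argmax(top.feature_importances_))           # a unit the top tree relies on
term = (acts[-1][:, j] > acts[-1][:, j].mean()).astype(int)
lower = DecisionTreeClassifier(max_depth=2).fit(acts[-2], term)
print(export_text(lower))
```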


Table II in the paper summarizes the rule extraction algorithms along with the form of the extracted rules and the neural network type used for the experiments.


CONCLUSION

Some state-of-the-art algorithms from each category, decompositional, pedagogical, and eclectic, have been discussed. Currently, deep learning provides an acceptable solution for many problems. Even though DNN architectures are complex, pedagogical algorithms can be used to advantage irrespective of the number of hidden layers, because they do not depend on the network architecture; thus, they might fill this gap. Extracting comprehensible rules from DNNs enhances the real-world usability of their promising solutions and can also remove the uncertainty problems associated with neural network software.
