By Md Zahangir Alom, Tarek M. Taha, Chris Yakopcic, Stefan Westberg, Paheding Sidike, Mst Shamima Nasrin,
Brian C. Van Essen, Abdul A. S. Awwal, and Vijayan K. Asari
The paper covers a vast amount of information on deep learning, its history, techniques, and analysis. I present a miniature version of it here; for detailed information, refer to the original paper.
Since the 1950s, a small subset of Artificial Intelligence (AI), often called Machine Learning (ML), has revolutionized several fields in the last few decades. Neural Networks (NN) are a subfield of ML, and it was this subfield that spawned Deep Learning (DL). Since its inception, DL has been creating ever larger disruptions, showing outstanding success in almost every application domain. Fig. 1 of the paper shows the taxonomy of AI. DL (using either deep architectures of learning or hierarchical learning approaches) is a class of ML developed largely from 2006 onward.

Learning is a procedure consisting of estimating the model parameters so that the learned model (algorithm) can perform a specific task. For example, in Artificial Neural Networks (ANN), the parameters are the weight matrices (the w_ij's). DL, on the other hand, consists of several layers between the input and output layers, which allows for many stages of non-linear information processing units with hierarchical architectures that are exploited for feature learning and pattern classification [1, 2]. Learning methods based on representations of data can also be defined as representation learning. Recent literature states that DL-based representation learning involves a hierarchy of features or concepts, where high-level concepts can be defined from low-level ones and low-level concepts can be defined from high-level ones. In some articles, DL has been described as a universal learning approach that is able to solve almost all kinds of problems in different application domains. In other words, DL is not task specific.
Types of DL approaches
The paper categorizes deep learning approaches into deep supervised, semi-supervised, unsupervised, and deep reinforcement learning, and contrasts feature learning, where features are learned automatically from data, with traditional hand-engineered features.
When and where to apply DL
DL is employed in several situations where machine intelligence would be useful:
- Absence of a human expert (navigation on Mars)
- Humans are unable to explain their expertise (speech recognition, vision and language understanding)
- The solution to the problem changes over time (tracking, weather prediction, preference, stock price prediction)
- Solutions need to be adapted to particular cases (biometrics, personalization).
- The problem size is too vast for our limited reasoning capabilities (calculating webpage ranks, matching ads on Facebook, sentiment analysis).
Why Deep Learning
- Universal learning approach
This approach is sometimes called universal learning because it can be applied to almost any application domain.
- Robust
Deep learning approaches do not require features to be designed ahead of time. Features that are optimal for the task at hand are learned automatically. As a result, robustness to natural variations in the data is learned automatically as well.
- Generalization
The same deep learning approach can be used in different applications or with different data types. This approach is often called transfer learning. In addition, this approach is helpful where the problem does not have sufficient available data.
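As a rough illustration of the transfer-learning idea, here is a minimal sketch in PyTorch; the small backbone and new head below are hypothetical placeholders, not a model from the paper:

```python
import torch.nn as nn

# Hypothetical backbone assumed to be already trained on a large source dataset
backbone = nn.Sequential(
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
)

for p in backbone.parameters():
    p.requires_grad = False          # freeze the features learned on the source task

new_head = nn.Linear(32, 5)          # small task-specific layer for the new task
model = nn.Sequential(backbone, new_head)
# Only new_head's parameters are trained on the (possibly small) target dataset.
```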
Challenges of DL
There are several challenges for deep learning:
- Big data analytics using Deep Learning
- Scalability of DL approaches
- Ability to generate data, which is important where data is not available for learning the system (especially for computer vision tasks such as inverse graphics).
- Energy efficient techniques for special purpose devices including mobile intelligence, FPGAs, and so on.
- Multi-task and transfer learning (generalization) or multi-module learning. This means learning from different domains or with different models together.
- Dealing with causality in learning.
Below is a brief history of neural networks highlighting key events:
- 1943: McCulloch & Pitts show that neurons can be combined to construct a Turing machine (using ANDs, ORs, & NOTs).
- 1958: Rosenblatt shows that perceptrons will converge if what they are trying to learn can be represented.
- 1969: Minsky & Papert show the limitations of perceptron’s, killing research in neural networks for a decade.
- 1985: The backpropagation algorithm by Geoffrey Hinton revitalizes the field.
- 1988: Neocognitron by Fukushima: a hierarchical neural network capable of visual pattern recognition.
- 1998: CNNs with backpropagation for document analysis by Yann LeCun.
- 2006: The Hinton lab solves the training problem for DNNs.
- 2012: AlexNet by Alex Krizhevsky.
Artificial neurons, which try to mimic the behaviour of the human brain, are the fundamental components for building ANNs. The basic computational element (neuron) is called a node (or unit); it receives inputs from external sources and has internal parameters (including weights and biases that are learned during training) that produce an output. This unit is called a perceptron.
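To make this concrete, here is a minimal sketch of a single perceptron in NumPy; the step activation and the example weights, bias, and inputs are illustrative assumptions:

```python
import numpy as np

def perceptron(x, w, b):
    """Single artificial neuron: weighted sum of the inputs plus a bias,
    passed through a step (threshold) activation."""
    z = np.dot(w, x) + b            # internal parameters: weights w and bias b
    return 1 if z > 0 else 0

# Example: a 3-input perceptron with hand-picked parameters
x = np.array([1.0, 0.5, -0.2])
w = np.array([0.4, -0.6, 0.9])
b = 0.1
print(perceptron(x, w, b))          # outputs 1 here since the weighted sum is positive
```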
Gradient descent
The gradient descent approach is a first-order optimization algorithm which is used for finding the local minima of an objective function. It has been used successfully for training ANNs over the last couple of decades.
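A minimal sketch of plain gradient descent; the quadratic objective and step size are illustrative assumptions, not from the paper:

```python
def gradient_descent(grad_fn, w0, lr=0.1, steps=100):
    """Repeatedly step against the gradient to find a local minimum."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad_fn(w)     # move opposite to the gradient direction
    return w

# Example objective: f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
grad = lambda w: 2.0 * (w - 3.0)
print(gradient_descent(grad, w0=0.0))   # converges close to the minimizer w = 3
```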
Stochastic Gradient Descent (SGD)
Since a long training time is the main drawback of the traditional gradient descent approach, the SGD approach is used for training Deep Neural Networks (DNNs).
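As a minimal sketch of mini-batch SGD (the synthetic linear-regression data, batch size, and learning rate are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                       # synthetic inputs
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=1000)

w = np.zeros(3)
lr, batch_size = 0.05, 32
for epoch in range(20):
    order = rng.permutation(len(X))                  # shuffle the data each epoch
    for start in range(0, len(X), batch_size):
        batch = order[start:start + batch_size]
        xb, yb = X[batch], y[batch]
        grad = 2 * xb.T @ (xb @ w - yb) / len(xb)    # gradient on the mini-batch only
        w -= lr * grad                               # cheap, noisy update per batch
print(w)                                             # approaches [2.0, -1.0, 0.5]
```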
Back-propagation
DNNs are trained with the popular Back-Propagation (BP) algorithm combined with SGD. The pseudo-code of basic back-propagation is given in Algorithm III of the paper. In the case of MLPs, we can easily represent NN models using computation graphs, which are directed acyclic graphs. For that representation of DL, we can use the chain rule to efficiently calculate the gradients from the top to the bottom layers with BP.
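The paper's Algorithm III is not reproduced here; instead, a minimal sketch of applying the chain rule on a tiny two-layer MLP (the layer sizes, tanh activation, and squared-error loss are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 2))           # 4 samples, 2 features
t = rng.normal(size=(4, 1))           # targets
W1 = rng.normal(size=(2, 3))
W2 = rng.normal(size=(3, 1))

# Forward pass through the computation graph
h = np.tanh(x @ W1)                   # hidden layer
y = h @ W2                            # output layer
loss = 0.5 * np.mean((y - t) ** 2)

# Backward pass: chain rule from the top layer down to the bottom
dy = (y - t) / len(x)                 # dLoss/dy
dW2 = h.T @ dy                        # gradient w.r.t. the top-layer weights
dh = dy @ W2.T                        # error propagated back through W2
dW1 = x.T @ (dh * (1 - h ** 2))       # tanh'(z) = 1 - tanh(z)^2

# One SGD step using the back-propagated gradients
lr = 0.1
W1 -= lr * dW1
W2 -= lr * dW2
```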
Momentum
Momentum is a method which helps to accelerate the training process with the SGD approach. The main idea behind it is to use the moving average of the gradient instead of using only the current value of the gradient.
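A minimal sketch of the momentum update rule (the momentum coefficient 0.9 and the toy quadratic objective are illustrative assumptions):

```python
def momentum_step(w, v, grad, lr=0.1, beta=0.9):
    """Keep a moving average of past gradients (the velocity) and step along it."""
    v = beta * v + grad       # accumulate gradient history
    w = w - lr * v            # update with the smoothed direction
    return w, v

# Example: minimizing f(w) = w^2, whose gradient is 2w
w, v = 5.0, 0.0
for _ in range(100):
    w, v = momentum_step(w, v, grad=2.0 * w)
print(w)                      # approaches the minimum at w = 0
```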
Learning rate (η)
The learning rate is an important component for training DNNs (as explained in Algorithms I and II of the paper). The learning rate is the step size used during training; a larger step can make training faster, but too large a step makes it unstable.
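A tiny illustration of how the step size matters, using the toy objective f(w) = w^2 (the two rates below are illustrative assumptions):

```python
def run(lr, steps=20, w=2.0):
    for _ in range(steps):
        w = w - lr * 2.0 * w    # gradient of f(w) = w^2 is 2w
    return w

print(run(lr=0.1))    # a suitably small step converges toward 0
print(run(lr=1.1))    # too large a step overshoots and diverges
```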
Weight decay
Weight decay is used when training deep learning models as an L2 regularization approach, which helps to prevent overfitting of the network and improves model generalization.
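A minimal sketch of L2 weight decay folded into the gradient step (the decay coefficient and placeholder gradient are illustrative assumptions):

```python
import numpy as np

def sgd_step_with_weight_decay(w, grad, lr=0.01, decay=1e-4):
    """L2 regularization: the penalty 0.5 * decay * ||w||^2 adds decay * w to the
    gradient, shrinking the weights toward zero at every update."""
    return w - lr * (grad + decay * w)

w = np.array([1.0, -2.0, 3.0])
g = np.array([0.1, 0.0, -0.2])        # gradient of the data loss (placeholder values)
print(sgd_step_with_weight_decay(w, g))
```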
Popular CNN architectures
In general, most deep convolutional neural networks are made of a key set of basic layers, including the convolution layer, the sub-sampling (pooling) layer, dense layers, and the soft-max layer. The architectures typically consist of stacks of several convolutional and max-pooling layers followed by fully connected and SoftMax layers at the end. Some examples of such models are LeNet, AlexNet, VGG Net, NiN, and the all-convolutional network (All Conv). Other, more efficient advanced architectures have been proposed, including GoogLeNet with Inception units, Residual Networks, DenseNet, and FractalNet.
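A minimal sketch of such a stack in PyTorch (convolution, max-pooling, dense, and soft-max layers); the layer sizes and 28x28 grayscale input are illustrative assumptions, not any specific architecture from the paper:

```python
import torch
import torch.nn as nn

# Conv -> pool stacks followed by a fully connected layer, in the spirit of
# LeNet/AlexNet-style architectures (much smaller here).
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),   # convolution layer
    nn.ReLU(),
    nn.MaxPool2d(2),                              # sub-sampling (max-pooling) layer
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                    # dense layer for 10 classes
)

x = torch.randn(8, 1, 28, 28)                     # batch of 28x28 grayscale images
logits = model(x)
probs = torch.softmax(logits, dim=1)              # soft-max over the class scores
print(probs.shape)                                # torch.Size([8, 10])
```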
The paper dives deeper into these networks and many other aspects of deep learning.