
The History Began from AlexNet: A Survey on Deep Learning Approaches

-By Md Zahangir Alom, Tarek M. Taha, Chris Yakopcic, Stefan Westberg, Paheding Sidike, Mst Shamima Nasrin, Brian C. Van Essen, Abdul A. S. Awwal, and Vijayan K. Asari



The paper covers a vast amount of information on deep learning: its history, techniques, and analysis. I present a miniature version of it here; for detailed information, refer to the original paper.



Machine Learning (ML), a small subset of Artificial Intelligence (AI) studied since the 1950s, has revolutionized several fields in the last few decades. Neural Networks (NN) are a subfield of ML, and it was this subfield that spawned Deep Learning (DL). Since its inception, DL has been creating ever larger disruptions, showing outstanding success in almost every application domain. Fig. 1 of the paper shows the taxonomy of AI. DL (using either deep architectures of learning or hierarchical learning approaches) is a class of ML developed largely from 2006 onward. Learning is a procedure consisting of estimating the model parameters so that the learned model (algorithm) can perform a specific task. For example, in Artificial Neural Networks (ANN), the parameters are the weight matrices (the wᵢⱼ's). DL, on the other hand, consists of several layers between the input and output layers, which allows many stages of non-linear information processing units with hierarchical architectures to be exploited for feature learning and pattern classification [1, 2]. Learning methods based on representations of data can also be defined as representation learning. Recent literature states that DL-based representation learning involves a hierarchy of features or concepts, where high-level concepts can be defined from low-level ones and low-level concepts can be defined from high-level ones. In some articles, DL has been described as a universal learning approach that is able to solve almost all kinds of problems in different application domains. In other words, DL is not task-specific.

Types of DL approaches





Category of Deep Learning




Feature Learning



When and where to apply DL


DL is employed in several situations where machine intelligence would be useful:

  1. Absence of a human expert (navigation on Mars)
  2. Humans are unable to explain their expertise (speech recognition, vision and language understanding)
  3. The solution to the problem changes over time (tracking, weather prediction, preference prediction, stock price prediction)
  4. Solutions need to be adapted to particular cases (biometrics, personalization).
  5. The problem size is too vast for our limited reasoning capabilities (calculating webpage ranks, matching ads on Facebook, sentiment analysis).

Why Deep Learning


  • Universal learning approach 

This approach is sometimes called universal learning because it can be applied to almost any application domain.

  • Robust

Deep learning approaches do not require features to be designed ahead of time. Features that are optimal for the task at hand are learned automatically. As a result, robustness to natural variations in the data is also learned automatically.

  • Generalization

The same deep learning approach can be used in different applications or with different data types. This approach is often called transfer learning. In addition, it is helpful when the problem does not have sufficient available data.

Challenges of DL


There are several challenges for deep learning:

  • Big data analytics using Deep Learning
  • Scalability of DL approaches
  • Ability to generate data, which is important where data is not available for learning the system (especially for computer vision tasks such as inverse graphics).
  • Energy efficient techniques for special purpose devices including mobile intelligence, FPGAs, and so on.
  • Multi-task and transfer learning (generalization) or multi-module learning. This means learning from different domains or with different models together.
  • Dealing with causality in learning. 

Below is a brief history of neural networks highlighting key events:


  • 1943: McCulloch & Pitts show that neurons can be combined to construct a Turing machine (using ANDs, ORs, & NOTs).
  • 1958: Rosenblatt shows that perceptrons will converge if what they are trying to learn can be represented.
  • 1969: Minsky & Papert show the limitations of perceptrons, killing research in neural networks for a decade.
  • 1985: The backpropagation algorithm by Geoffrey Hinton revitalizes the field.
  • 1988: Neocognitron: Fukushima's hierarchical neural network capable of visual pattern recognition.
  • 1998: CNNs with backpropagation for document analysis by Yann LeCun.
  • 2006: The Hinton lab solves the training problem for DNNs.
  • 2012: AlexNet by Alex Krizhevsky.


Artificial neurons, which try to mimic the behaviour of neurons in the human brain, are the fundamental components for building ANNs. The basic computational element (neuron) is called a node (or unit); it receives inputs from external sources, has internal parameters (weights and biases that are learned during training), and produces an output. This unit is called a perceptron.
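As a minimal sketch of such a unit (the inputs, weights, bias, and step activation below are illustrative, not taken from the paper):

```python
import numpy as np

def perceptron(x, w, b):
    """A single artificial neuron: weighted sum of inputs plus a bias,
    passed through a step (threshold) activation."""
    z = np.dot(w, x) + b        # weighted sum of the inputs
    return 1 if z > 0 else 0    # fire if the sum exceeds the threshold

# toy usage: two inputs with hand-picked parameters
x = np.array([0.5, -1.0])
w = np.array([0.8, 0.2])
b = 0.1
print(perceptron(x, w, b))      # -> 1
```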

Gradient descent

The gradient descent approach is a first-order optimization algorithm used for finding local minima of an objective function. It has been used successfully for training ANNs over the last couple of decades.
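As a rough illustration of the idea (the one-dimensional quadratic objective below is an assumption chosen for simplicity, not from the paper), each iteration steps against the gradient:

```python
# illustrative objective: f(w) = (w - 3)^2, minimized at w = 3
grad = lambda w: 2.0 * (w - 3.0)   # df/dw

w, eta = 0.0, 0.1                  # initial parameter and learning rate
for _ in range(100):
    w -= eta * grad(w)             # move opposite to the gradient
print(round(w, 4))                 # -> approaches 3.0
```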

Stochastic Gradient Descent (SGD)

Since a long training time is the main drawback of the traditional gradient descent approach, the SGD approach is used for training Deep Neural Networks (DNNs).
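A minimal sketch of the difference (the linear-regression objective and mini-batch size below are illustrative assumptions): instead of the full dataset, each update uses the gradient computed on a small randomly drawn mini-batch.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))             # toy inputs
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=1000)

w, eta, batch = np.zeros(3), 0.05, 32
for _ in range(2000):
    idx = rng.integers(0, len(X), size=batch)   # random mini-batch
    err = X[idx] @ w - y[idx]
    g = X[idx].T @ err / batch                  # gradient on the batch only
    w -= eta * g
print(np.round(w, 2))                           # -> close to w_true
```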

Back-propagation

DNNs are trained with the popular Back-Propagation (BP) algorithm combined with SGD. The pseudo code of basic backpropagation is given in Algorithm III of the paper. In the case of MLPs, we can easily represent NN models as computation graphs, which are directed acyclic graphs. With that representation, the chain rule can be used to efficiently calculate the gradients from the top layer down to the bottom layers with BP.
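A minimal sketch of backpropagation through a tiny 2-3-1 MLP (the architecture, sigmoid activations, and squared-error loss are illustrative assumptions, not Algorithm III from the paper):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)   # input -> hidden
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)   # hidden -> output
x, y = np.array([0.5, -0.2]), np.array([1.0])   # one training example

for _ in range(500):
    # forward pass through the computation graph
    h = sigmoid(W1 @ x + b1)
    y_hat = sigmoid(W2 @ h + b2)

    # backward pass: chain rule from the top layer to the bottom
    d_out = (y_hat - y) * y_hat * (1 - y_hat)   # error at the output pre-activation
    dW2, db2 = np.outer(d_out, h), d_out
    d_hid = (W2.T @ d_out) * h * (1 - h)        # error propagated to the hidden layer
    dW1, db1 = np.outer(d_hid, x), d_hid

    # SGD update with a fixed learning rate
    for p, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        p -= 0.5 * g

print(round(float(y_hat[0]), 3))                # -> approaches the target 1.0
```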

Momentum

Momentum is a method that helps accelerate the training process with the SGD approach. The main idea is to use a moving average of past gradients instead of only the current value of the gradient.
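A sketch of the update on the same illustrative quadratic objective as above (the momentum coefficient 0.9 is a common default, not something the post specifies):

```python
grad = lambda w: 2.0 * (w - 3.0)   # same toy objective: f(w) = (w - 3)^2

w, v = 0.0, 0.0
eta, beta = 0.1, 0.9               # learning rate and momentum coefficient
for _ in range(300):
    v = beta * v + grad(w)         # moving average (accumulation) of past gradients
    w -= eta * v                   # step along the smoothed direction
print(round(w, 4))                 # -> approaches 3.0
```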

Learning rate (𝜼)

The learning rate is an important component for training DNNs (as explained in Algorithms I and II of the paper). The learning rate is the step size used during training; choosing it well makes the training process faster.
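To make the role of η concrete, a small sketch (the step-decay schedule is a common illustrative choice, not something the post prescribes): η scales every update, and shrinking it over time lets training take large steps early and small steps later.

```python
grad = lambda w: 2.0 * (w - 3.0)     # same toy objective as above

def eta(step, eta0=0.1, drop=0.5, every=10):
    """Illustrative step decay: halve the learning rate every `every` steps."""
    return eta0 * (drop ** (step // every))

w = 0.0
for t in range(50):
    w -= eta(t) * grad(w)            # the step size shrinks as training proceeds
print(round(w, 3))
```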

Weight decay

Weight decay is used when training deep learning models as an L2 regularization approach, which helps prevent overfitting of the network and improves model generalization.
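A sketch of how L2 weight decay enters a gradient update (the λ value below is illustrative): the penalty adds λ·w to the gradient, shrinking every weight slightly at each step.

```python
import numpy as np

def sgd_step_with_weight_decay(w, grad, eta=0.01, lam=1e-4):
    """One SGD update with L2 regularization: the extra lam * w term
    pulls the weights towards zero, discouraging overfitting."""
    return w - eta * (grad + lam * w)

w = np.array([0.5, -1.2, 3.0])      # current weights
g = np.array([0.1, 0.0, -0.2])      # gradient of the loss w.r.t. the weights
print(sgd_step_with_weight_decay(w, g))
```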



Popular CNN architectures


In general, most deep convolutional neural networks are made of a key set of basic layers, including the convolution layer, the sub-sampling (pooling) layer, dense layers, and the softmax layer. The architectures typically consist of stacks of several convolutional and max-pooling layers followed by fully connected and softmax layers at the end. Some examples of such models are LeNet, AlexNet, VGG Net, NiN, and the All Convolutional Network (All Conv). Other alternative and more efficient advanced architectures have been proposed, including GoogLeNet with Inception units, Residual Networks (ResNet), DenseNet, and FractalNet.
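A minimal sketch of such a stack in PyTorch (the layer sizes, counts, and input shape are illustrative, not any specific architecture from the paper):

```python
import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    """Conv -> pool -> conv -> pool -> dense -> softmax: the generic stack
    described above, with arbitrarily chosen sizes."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                        # sub-sampling (max-pooling) layer
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, num_classes),     # dense (fully connected) layer
        )

    def forward(self, x):
        logits = self.classifier(self.features(x))
        return torch.softmax(logits, dim=1)         # softmax layer

# toy usage on a batch of four 32x32 RGB images
x = torch.randn(4, 3, 32, 32)
print(SmallConvNet()(x).shape)                      # torch.Size([4, 10])
```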


The paper dives deeper into these networks and other aspects.



