Recent advances in deep learning have allowed Artificial Intelligence (AI) to reach near
human-level performance in many sensory, perceptual, linguistic or cognitive tasks.
There is a growing need, however, for novel, brain-inspired cognitive architectures. The
Global Workspace theory refers to a large-scale system integrating and distributing information among networks of specialized modules to create higher-level forms of cognition
and awareness. We argue that the time is ripe to consider explicit implementations
of this theory using deep learning techniques. We propose a roadmap based on unsupervised neural translation between multiple latent spaces (neural networks trained for
distinct tasks, on distinct sensory inputs and/or modalities) to create a unique, amodal
global latent workspace (GLW). Potential functional advantages of GLW are reviewed,
along with neuroscientific implications.
This paper develops an approach to a cognitive framework that has been proposed to underlie perception, executive function and even consciousness: the Global Workspace Theory (GWT).
The GWT, initially proposed by Baars, is a key element of modern cognitive
science (Figure 1A). The theory proposes that the brain is divided into specialized modules for specific functions, with long-distance connections between them. When
warranted by the inputs or by task requirements (through a process of attentional selection), the contents of a specialized module can be broadcast and shared among distinct
modules. According to the theory, the shared information at each moment in time—the
global workspace—is what constitutes our conscious awareness. In functional terms, the global workspace can serve to resolve problems that could not be solved by a single
specialized function, by coordinating multiple specialized modules.
Dehaene and colleagues proposed a neuronal version of the theory, Global
Neuronal Workspace (GNW), which has become one of the major contemporary neuroscientific theories of consciousness. According to GNW, conscious access occurs when
incoming information is made globally available to multiple brain systems through a
network of neurons with long-range axons densely distributed in prefrontal, parietal, temporal, and cingulate cortices (Figure 1B).
A neural signature of this global broadcast
of information is the ignition property: an all-or-none activation of a broad network of
brain regions, likely supported by long-range recurrent connections (Figure 1C).
While Y. Bengio has explicitly linked his recent “consciousness prior” theory to GWT, his proposal focused on novel theoretical principles in machine learning (e.g. sparse factor graphs). Our approach is complementary: we emphasize practical solutions for implementing a global workspace with currently available deep learning components, while always keeping in mind the equivalent mechanisms in the brain.
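To make this deep learning route concrete: the paper's roadmap builds on unsupervised neural translation between latent spaces, trained with a cycle-consistency loss. Below is a minimal sketch of that ingredient, assuming two frozen, pre-trained modules with fixed latent dimensions and using random tensors as placeholders for real encoder outputs; all sizes and architectures are illustrative assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

# Sketch: learn translators between the latent spaces of two frozen,
# pre-trained modules A and B, using only cycle-consistency
# (A -> B -> A and B -> A -> B should reconstruct the original latents).
D_A, D_B = 128, 64  # latent dimensions of modules A and B (assumed)

t_ab = nn.Sequential(nn.Linear(D_A, 256), nn.ReLU(), nn.Linear(256, D_B))
t_ba = nn.Sequential(nn.Linear(D_B, 256), nn.ReLU(), nn.Linear(256, D_A))
opt = torch.optim.Adam([*t_ab.parameters(), *t_ba.parameters()], lr=1e-3)

for step in range(1000):
    # Placeholders: in practice these would be batches of latents
    # produced by the two pre-trained encoders on unpaired data.
    z_a, z_b = torch.randn(32, D_A), torch.randn(32, D_B)
    loss = (nn.functional.mse_loss(t_ba(t_ab(z_a)), z_a)
            + nn.functional.mse_loss(t_ab(t_ba(z_b)), z_b))
    opt.zero_grad(); loss.backward(); opt.step()
```

Because the loss needs no paired examples across modalities, the same recipe extends beyond two modules, which is what would turn pairwise translation into a shared, amodal latent workspace.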
Bengio's paper proposes a new prior for representation learning. It can be combined with other priors to disentangle abstract factors from one another. It is inspired by human consciousness, which can be thought of as a low-dimensional representation of conscious thought. Consciousness is defined as "the perception of what passes in a man's own mind".
This low-dimensionality of the representation is used as a regularizer which encourages the abstract representation to be such that when a sparse attention mechanism focuses on a few elements of the representation, the small set of variables can be combined to make a useful statement about reality or usefully condition an action or policy.
We have a recurrent neural network (RNN), h_t = F(s_t, h_{t-1}), where s is the observed state, F is the representation RNN and h is the representation state. Think of F as the human brain; h is therefore high-dimensional. We want to learn good representation states which contain abstract explanatory factors, and to be able to transform h so we can extract information about any single one of those factors.
In contrast, the conscious state, c, is very low-dimensional, derived from h by an attention mechanism applied to h: c_t = C(h_t, c_{t-1}, z_t), where z is a random noise source. You can think of c as the content of a thought: a small subset of all the information available to us unconsciously, which has been brought to our awareness by the attention mechanism selecting a few elements from h. C is the consciousness RNN. The random noise means the elements that get focused on have some stochasticity. Thus, the consciousness RNN is used to isolate a high-level abstraction and extract information from it. In general, C will aggregate a few factors of information, not just a single factor, into a more complex composed thought.
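A minimal sketch of how F and C might be wired up; the GRU cells, top-k sparse attention, noise scale and all dimensions below are illustrative assumptions rather than the paper's specification:

```python
import torch
import torch.nn as nn

class RepresentationRNN(nn.Module):
    """F: h_t = F(s_t, h_{t-1}), the high-dimensional unconscious state."""
    def __init__(self, s_dim, h_dim):
        super().__init__()
        self.cell = nn.GRUCell(s_dim, h_dim)

    def forward(self, s_t, h_prev):
        return self.cell(s_t, h_prev)

class ConsciousnessRNN(nn.Module):
    """C: c_t = C(h_t, c_{t-1}, z_t), a noisy sparse attention that keeps
    only k elements of h and summarizes them into a low-dimensional c."""
    def __init__(self, h_dim, c_dim, k=4):
        super().__init__()
        self.k = k  # number of attended elements (assumed)
        self.score = nn.Linear(h_dim + c_dim, h_dim)
        self.cell = nn.GRUCell(k, c_dim)

    def forward(self, h_t, c_prev):
        z_t = 0.1 * torch.randn_like(h_t)     # noise makes selection stochastic
        logits = self.score(torch.cat([h_t, c_prev], dim=-1)) + z_t
        _, idx = logits.topk(self.k, dim=-1)  # hard top-k; a Gumbel-softmax
        attended = h_t.gather(-1, idx)        # relaxation would be needed to
        return self.cell(attended, c_prev)    # train the selection end-to-end

# Usage: unroll one step of the two coupled RNNs.
F_net = RepresentationRNN(s_dim=16, h_dim=256)
C_net = ConsciousnessRNN(h_dim=256, c_dim=32)
h, c = torch.zeros(1, 256), torch.zeros(1, 32)
h = F_net(torch.randn(1, 16), h)  # placeholder observation s_t
c = C_net(h, c)
```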
We want this conscious thought to encapsulate a statement about the future. We do this with a verifier network, V(h_t, c_{t-k}), which outputs a scalar value measuring how consistent the current representation h_t is with a conscious state from k steps earlier. The aim is to define an objective (or reward) function through which the attended conscious elements can be quantified and optimized.
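One way to flesh out V as code; the MLP and the contrastive training objective below (scoring true (h_t, c_{t-k}) pairs against mismatched ones, NCE-style) are assumptions for illustration, not the paper's exact loss:

```python
import torch
import torch.nn as nn

class Verifier(nn.Module):
    """V(h_t, c_{t-k}) -> scalar: how consistent the current full state
    is with a conscious state from k steps earlier."""
    def __init__(self, h_dim, c_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(h_dim + c_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, h_t, c_past):
        return self.net(torch.cat([h_t, c_past], dim=-1)).squeeze(-1)

def verifier_loss(V, h_t, c_past, c_negative):
    # True pairs should score high; conscious states taken from an
    # unrelated time step (c_negative) should score low.
    pos, neg = V(h_t, c_past), V(h_t, c_negative)
    bce = nn.functional.binary_cross_entropy_with_logits
    return (bce(pos, torch.ones_like(pos)) +
            bce(neg, torch.zeros_like(neg)))
```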
The two mechanisms which map the high-level state representation to an objective function are:
• the attention mechanism in the consciousness RNN, which selects and combines a few elements from the high-level state representation into a low-dimensional conscious "sub-state" object;
• the predictions or actions derived from the sequence of these conscious sub-states.
The difficulty is finding a way for the algorithm to pay attention to the most useful elements. Some form of entropy regularization may be needed to keep the attention mechanism stochastic.
There is also a link between consciousness and natural language utterances. An externally provided sentence could elicit an associated conscious state, though the conscious state is a richer, higher-dimensional object than the uttered sentence. When mapping a conscious state to a sentence, there is always a loss of information. Some context is also needed, as the same sentence could be interpreted differently depending on the context. This could be done with another RNN, which maps a conscious state to an utterance: u_t = U(c_t, u_{t-1}).
You can think of this as another regularization term, the loss of information from consciousness to utterance. A sentence focuses only on a handful of elements and concepts, unlike our full internal consciousness.
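A minimal sketch of the utterance RNN U; the discrete vocabulary, embedding size and greedy decoding are assumptions. Squeezing c through a softmax over a small vocabulary is one concrete way this consciousness-to-utterance information loss shows up:

```python
import torch
import torch.nn as nn

class UtteranceRNN(nn.Module):
    """U: u_t = U(c_t, u_{t-1}), mapping a conscious state to the next
    word token, conditioned on the previous token."""
    def __init__(self, c_dim, vocab_size=10000, emb_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.cell = nn.GRUCell(emb_dim + c_dim, emb_dim)
        self.out = nn.Linear(emb_dim, vocab_size)

    def forward(self, c_t, u_prev, hidden):
        x = torch.cat([self.embed(u_prev), c_t], dim=-1)
        hidden = self.cell(x, hidden)
        return self.out(hidden), hidden  # logits over the next word

# Usage: greedily decode one word from a conscious state.
U_net = UtteranceRNN(c_dim=32)
c = torch.randn(1, 32)                # placeholder conscious state
u = torch.zeros(1, dtype=torch.long)  # assumed start-token index 0
hid = torch.zeros(1, 64)
logits, hid = U_net(c, u, hid)
u = logits.argmax(-1)                 # next word (greedy)
```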
This can be used in unsupervised reinforcement learning, to test the model's ability to discover high-level abstractions, e.g. using an intrinsic reward that favours discovering how the environment works.
Conclusion and outstanding questions
• A global workspace serves to flexibly connect neural representations arising in
multiple separate modules. Is there a minimal number of modules feeding into
the workspace? When does bimodal, trimodal, multimodal integration become
a “global workspace”?
• Can we identify neurons, e.g. in frontal regions, that incarnate copies of the
various latent spaces? This may explain the numerous reports of sensory and
multimodal neuronal responses in frontal cortex.
• Is cycle-consistency implemented in the brain? If yes, does it correspond to a
form of predictive coding?
• Could synesthesia be the consequence of an exaggerated or overactive translation between domains, crossing the threshold of perception instead of acting
as a background process?
• How does attention learn to select the relevant information to enter the GLW?
What is the corresponding objective function? Many candidates exist and
could be tested: self-prediction, free energy, survival, reward of an RL agent,
metalearning (learning progress), etc.
• How can newly learned tasks or modules be connected to an existing GLW?
Requirements include: a new “internal copy” with a new (learned) attention
mechanism to produce keys for the latent space, new (learned) translations to
the rest of the workspace.