
Deep Learning and the Global Workspace Theory

 

Rufin VanRullen and Ryota Kanai

Paper Link




Abstract

Recent advances in deep learning have allowed Artificial Intelligence (AI) to reach near human-level performance in many sensory, perceptual, linguistic or cognitive tasks. There is a growing need, however, for novel, brain-inspired cognitive architectures. The Global Workspace theory refers to a large-scale system integrating and distributing information among networks of specialized modules to create higher-level forms of cognition and awareness. We argue that the time is ripe to consider explicit implementations of this theory using deep learning techniques. We propose a roadmap based on unsupervised neural translation between multiple latent spaces (neural networks trained for distinct tasks, on distinct sensory inputs and/or modalities) to create a unique, amodal global latent workspace (GLW). Potential functional advantages of GLW are reviewed, along with neuroscientific implications.

The paper presents a deep learning approach to a cognitive framework that has been proposed to underlie perception, executive function and even consciousness: the Global Workspace Theory (GWT).



Timeline of Global Workspace Theory (image courtesy: slideplayer.com)

The GWT, initially proposed by Baars, is a key element of modern cognitive science (Figure 1A). The theory proposes that the brain is divided into specialized modules for specific functions, with long-distance connections between them. When warranted by the inputs or by task requirements (through a process of attentional selection), the contents of a specialized module can be broadcast and shared among distinct modules. According to the theory, the shared information at each moment in time—the global workspace—is what constitutes our conscious awareness. In functional terms, the global workspace can serve to resolve problems that could not be solved by a single specialized function, by coordinating multiple specialized modules.
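As a toy illustration of this selection-and-broadcast cycle, the workspace can be sketched as a set of specialized modules competing for access, with the winner's content made available to all. The module names, salience values and winner-take-all rule below are illustrative assumptions, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical specialized modules, each producing a latent vector
# (names and dimensions are illustrative).
modules = {
    "vision":   rng.normal(size=8),
    "audition": rng.normal(size=8),
    "language": rng.normal(size=8),
}

def attentional_selection(modules, salience):
    """Pick the module whose content wins the competition for the workspace."""
    winner = max(salience, key=salience.get)
    return winner, modules[winner]

def broadcast(workspace_content, modules):
    """Make the selected content globally available to every module."""
    return {name: workspace_content for name in modules}

salience = {"vision": 0.9, "audition": 0.2, "language": 0.4}
winner, content = attentional_selection(modules, salience)
shared = broadcast(content, modules)
```

In this sketch the "global workspace" at time t is simply the single shared vector; in the theory, coordination emerges because every module receives the same broadcast content.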


Dehaene and colleagues proposed a neuronal version of the theory, the Global Neuronal Workspace (GNW), which has become one of the major contemporary neuroscientific theories of consciousness. According to GNW, conscious access occurs when incoming information is made globally available to multiple brain systems through a network of neurons with long-range axons densely distributed in prefrontal, parietal, temporal, and cingulate cortices (Figure 1B).


A neural signature of this global broadcast of information is the ignition property: an all-or-none activation of a broad network of brain regions, likely supported by long-range recurrent connections (Figure 1C).




While Y. Bengio has explicitly linked his recent “consciousness prior” theory to GWT, his proposal focused on novel theoretical principles in machine learning (e.g. sparse factor graphs). Our approach is a complementary one, in which we emphasize practical solutions to implementing a global workspace with currently available deep learning components, while always keeping in mind the equivalent mechanisms in the brain.


Bengio's paper proposed a new prior for representation learning, which can be combined with other priors to help disentangle abstract factors from one another. It is inspired by human consciousness, which can be thought of as a low-dimensional representation of conscious thought. Consciousness here is defined as "the perception of what passes in a man's own mind".

This low-dimensionality of the representation is used as a regularizer which encourages the abstract representation to be such that when a sparse attention mechanism focuses on a few elements of the representation, the small set of variables can be combined to make a useful statement about reality or usefully condition an action or policy.

We have a recurrent neural network (RNN), h_t = F(s_t, h_{t-1}), where s is the observed state, F is the representation RNN and h is the representation state. Think of F as the human brain; h is then the high-dimensional unconscious representation. We want to learn good representation states that contain abstract explanatory factors, and to be able to transform h so we can extract information about any single one of those factors.
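A minimal sketch of the representation RNN h_t = F(s_t, h_{t-1}), using a plain tanh cell (the paper does not prescribe a specific architecture, and the sizes below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, hidden_dim = 4, 16  # illustrative sizes

# Parameters of F, the representation RNN (a simple tanh cell here).
W_s = rng.normal(scale=0.1, size=(hidden_dim, state_dim))
W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

def F(s_t, h_prev):
    """h_t = F(s_t, h_{t-1}): high-dimensional unconscious representation."""
    return np.tanh(W_s @ s_t + W_h @ h_prev + b)

h = np.zeros(hidden_dim)
for _ in range(5):               # roll the RNN over a few observed states
    s = rng.normal(size=state_dim)
    h = F(s, h)
```

In practice F would be a trained GRU/LSTM or transformer; the point is only that h_t summarizes the whole observation history in one high-dimensional vector.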

In contrast, the conscious state, c, is very low-dimensional, derived from h by an attention mechanism applied to h: c_t = C(h_t, c_{t-1}, z_t), where z is a random noise source. You can think of c as the content of a thought: a small subset of all the information available to us unconsciously, brought into awareness by the attention mechanism, which selects a few elements from h. C is the consciousness RNN. The random noise means the elements that get focused on have some stochasticity. The consciousness RNN is thus used to isolate a high-level abstraction and extract information from it. In general, C will aggregate a few factors of information, not just a single factor, into a more complex composed thought.
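One way to sketch C is as a noisy hard top-k selection over the elements of h_t, so that only a few factors survive into the conscious state. The scoring rule below (magnitudes plus noise) is an illustrative assumption, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(1)
hidden_dim, k = 16, 3  # keep only k of the 16 factors "conscious"

def C(h_t, c_prev, z_t, k=3):
    """c_t = C(h_t, c_{t-1}, z_t): noisy sparse attention over h_t.

    Scores mix the current representation, the previous conscious state
    and noise z_t; a hard top-k mask then zeroes everything else.
    """
    scores = np.abs(h_t) + 0.1 * np.abs(c_prev) + 0.1 * z_t
    mask = np.zeros_like(h_t)
    mask[np.argsort(scores)[-k:]] = 1.0   # indices of the k largest scores
    return h_t * mask

h_t = rng.normal(size=hidden_dim)
c_prev = np.zeros(hidden_dim)
z_t = rng.normal(size=hidden_dim)
c_t = C(h_t, c_prev, z_t, k)
```

The returned c_t has the same shape as h_t but only k non-zero entries, which is what makes it "low-dimensional" in the relevant sense.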

We assume that a conscious thought can encapsulate a statement about the future. This is done with a verifier network, V(h_t, c_{t-k}), which outputs a scalar: the objective is to match the current representation h_t against the conscious state from k steps earlier. We want to define an objective (or reward) function that uses the attended conscious elements in a way that can be quantified and optimized.
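The verifier can be sketched as any scalar-valued network over the pair (h_t, c_{t-k}); a bilinear form squashed by tanh is one simple hypothetical choice, not the paper's prescription:

```python
import numpy as np

rng = np.random.default_rng(2)
hidden_dim = 16

# V scores how well a past conscious state c_{t-k} "matches" the
# current representation h_t (bilinear form: an illustrative choice).
W_v = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))

def V(h_t, c_past):
    """Scalar verifier: higher means c_{t-k} was predictive of h_t."""
    return float(np.tanh(c_past @ W_v @ h_t))

h_t = rng.normal(size=hidden_dim)
c_past = rng.normal(size=hidden_dim)
score = V(h_t, c_past)
```

Training would push V's score up for true (h_t, c_{t-k}) pairs and down for mismatched ones, turning "the thought predicted the future" into an optimizable quantity.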

The two mechanisms which map the high-level state representation to an objective function are:

the attention mechanism in the consciousness RNN, which selects and combines a few elements from the high-level state representation into low-dimensional conscious "sub-state" objects

the predictions or actions derived from the sequence of these conscious sub-states

The difficulty is finding a way for the algorithm to pay attention to the most useful elements. Some form of entropy may be needed to make the attention mechanism stochastic.
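One standard way to make the attention stochastic while still favouring high-scoring elements is the Gumbel-max trick; this is a common technique offered here as a sketch, not the specific mechanism proposed in the paper:

```python
import numpy as np

rng = np.random.default_rng(3)

def stochastic_top1(scores, temperature=1.0):
    """Sample an attended index via the Gumbel-max trick.

    Adding Gumbel noise to the scores and taking the argmax draws an
    index from softmax(scores / temperature): attention stays random,
    but high-scoring elements are chosen more often.
    """
    gumbel = -np.log(-np.log(rng.uniform(size=scores.shape)))
    return int(np.argmax(scores / temperature + gumbel))

scores = np.array([2.0, 0.1, 0.5, 1.0])
picks = [stochastic_top1(scores) for _ in range(1000)]
```

Raising the temperature flattens the sampling distribution (more exploration); lowering it approaches a deterministic argmax.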


There is also a link between consciousness and natural language utterances. An externally provided sentence could elicit an associated conscious state, though the conscious state is a richer (higher-dimensional) object than the uttered sentence: when mapping consciousness to sentences, there is always a loss of information. There also needs to be some context, as the same sentence could be interpreted differently depending on the context. This could be done with another RNN, which maps a conscious state to an utterance: u_t = U(c_t, u_{t-1}).
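A toy sketch of the utterance RNN u_t = U(c_t, u_{t-1}): the conscious state and the previous word jointly produce a distribution over the next word. Vocabulary size, the one-hot encoding and the softmax readout are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
hidden_dim, vocab = 16, 50  # toy vocabulary of 50 word ids

# Parameters of U, mapping (conscious state, previous word) -> next word.
W_c = rng.normal(scale=0.1, size=(vocab, hidden_dim))
W_u = rng.normal(scale=0.1, size=(vocab, vocab))

def U(c_t, u_prev_onehot):
    """Return a distribution over the next word given the conscious state."""
    logits = W_c @ c_t + W_u @ u_prev_onehot
    e = np.exp(logits - logits.max())   # stable softmax
    return e / e.sum()

c_t = rng.normal(size=hidden_dim)
u_prev = np.zeros(vocab)
u_prev[0] = 1.0                         # start-of-sentence token
p_next = U(c_t, u_prev)
```

Because the 50-word output is far coarser than the conscious state feeding it, the sketch also makes the information loss in the consciousness-to-utterance mapping concrete.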

You can think of this as another regularization term, the loss of information from consciousness to utterance. A sentence focuses only on a handful of elements and concepts, unlike our full internal consciousness.

This can be used in unsupervised reinforcement learning, testing its ability to discover high-level abstractions, e.g. using an intrinsic reward that favours the discovery of how the environment works.

Conclusion: outstanding questions

• A global workspace serves to flexibly connect neural representations arising in multiple separate modules. Is there a minimal number of modules feeding into the workspace? When does bimodal, trimodal, multimodal integration become a “global workspace”? 

• Can we identify neurons, e.g. in frontal regions, that incarnate copies of the various latent spaces? This may explain the numerous reports of sensory and multimodal neuronal responses in frontal cortex. 

• Is cycle-consistency implemented in the brain? If yes, does it correspond to a form of predictive coding? 

• Could synesthesia be the consequence of an exaggerated or overactive translation between domains, crossing the threshold of perception instead of acting as a background process? 

• How does attention learn to select the relevant information to enter the GLW? What is the corresponding objective function? Many candidates exist and could be tested: self-prediction, free energy, survival, reward of an RL agent, metalearning (learning progress), etc.

• How can newly learned tasks or modules be connected to an existing GLW? Requirements include: a new “internal copy” with a new (learned) attention mechanism to produce keys for the latent space, new (learned) translations to the rest of the workspace. 
