
Deep Learning and the Global Workspace Theory

 

Rufin VanRullen and Ryota Kanai

Paper Link




Abstract

Recent advances in deep learning have allowed Artificial Intelligence (AI) to reach near human-level performance in many sensory, perceptual, linguistic or cognitive tasks. There is a growing need, however, for novel, brain-inspired cognitive architectures. The Global Workspace theory refers to a large-scale system integrating and distributing information among networks of specialized modules to create higher-level forms of cognition and awareness. We argue that the time is ripe to consider explicit implementations of this theory using deep learning techniques. We propose a roadmap based on unsupervised neural translation between multiple latent spaces (neural networks trained for distinct tasks, on distinct sensory inputs and/or modalities) to create a unique, amodal global latent workspace (GLW). Potential functional advantages of GLW are reviewed, along with neuroscientific implications.

The paper takes a deep learning approach to a cognitive framework that has been proposed to underlie perception, executive function and even consciousness: the Global Workspace Theory (GWT).



Timeline of Global workspace theory 

Courtesy: slideplayer.com

The GWT, initially proposed by Baars, is a key element of modern cognitive science (Figure 1A). The theory proposes that the brain is divided into specialized modules for specific functions, with long-distance connections between them. When warranted by the inputs or by task requirements (through a process of attentional selection), the contents of a specialized module can be broadcast and shared among distinct modules. According to the theory, the shared information at each moment in time—the global workspace—is what constitutes our conscious awareness. In functional terms, the global workspace can serve to resolve problems that could not be solved by a single specialized function, by coordinating multiple specialized modules.
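The selection-and-broadcast step described above can be sketched in a few lines of Python. This is only a toy illustration, not the paper's implementation: the `salience` score, the `threshold` and the `inbox` list are hypothetical names standing in for attentional selection and for the long-distance connections between modules.

```python
def broadcast_step(modules, threshold=0.5):
    """Pick the most salient module and share its content with all the others.

    modules: dict mapping a module name to a dict with the keys
             "content", "salience" (float) and "inbox" (list).
    Returns the name of the winning module, or None if no module
    is salient enough to enter the workspace on this step.
    """
    # Attentional selection (assumed here to be a simple argmax over salience).
    winner = max(modules, key=lambda name: modules[name]["salience"])
    if modules[winner]["salience"] < threshold:
        return None

    # Broadcast: every other module receives the winner's content.
    message = (winner, modules[winner]["content"])
    for name, module in modules.items():
        if name != winner:
            module["inbox"].append(message)
    return winner


modules = {
    "vision": {"content": "red ball", "salience": 0.9, "inbox": []},
    "audio": {"content": "faint hum", "salience": 0.4, "inbox": []},
    "language": {"content": None, "salience": 0.1, "inbox": []},
}
winner = broadcast_step(modules)
```

After this call the vision module's content sits in the inbox of the audio and language modules, which is the "shared information at each moment in time" in this toy setting.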


Dehaene and colleagues proposed a neuronal version of the theory, Global Neuronal Workspace (GNW), which has become one of the major contemporary neuroscientific theories of consciousness. According to GNW, conscious access occurs when incoming information is made globally available to multiple brain systems through a network of neurons with long-range axons densely distributed in prefrontal, parieto-temporal, and cingulate cortices (Figure 1B).


A neural signature of this global broadcast of information is the ignition property: an all-or-none activation of a broad network of brain regions, likely supported by long-range recurrent connections (Figure 1C).




While Y. Bengio has explicitly linked his recent “consciousness prior” theory to GWT, his proposal focused on novel theoretical principles in machine learning (e.g. sparse factor graphs). Our approach is a complementary one, in which we emphasize practical solutions to implementing a global workspace with currently available deep learning components, while always keeping in mind the equivalent mechanisms in the brain.


Bengio's consciousness prior paper proposes a new prior for representation learning. It can be combined with other priors to disentangle abstract factors from one another. It is inspired by human consciousness, in which a conscious thought can be seen as a low-dimensional representation. Consciousness is defined as "the perception of what passes in a man's own mind".

This low-dimensionality of the representation is used as a regularizer which encourages the abstract representation to be such that when a sparse attention mechanism focuses on a few elements of the representation, the small set of variables can be combined to make a useful statement about reality or usefully condition an action or policy.

We have a recurrent neural network (RNN), h_t = F(s_t, h_{t-1}), where s is the observed state, F is the representation RNN and h is the representation state. Think of F as the human brain, so h is high-dimensional. We want to learn good representation states that contain abstract explanatory factors, and we want to be able to transform h so that we can extract information about any single one of those factors.
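As a rough sketch of the recurrence h_t = F(s_t, h_{t-1}), here is a tiny tanh RNN in plain Python. The tanh cell, the random untrained weights and the sizes (`obs_dim`, `hidden_dim`) are all assumptions made for illustration; the text does not say what F looks like inside.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Random weight matrix stored as a list of rows (illustrative init only)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class RepresentationRNN:
    """h_t = F(s_t, h_{t-1}) with a plain tanh cell (an assumed form of F)."""

    def __init__(self, obs_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.w_in = make_matrix(hidden_dim, obs_dim, rng)
        self.w_rec = make_matrix(hidden_dim, hidden_dim, rng)
        self.hidden_dim = hidden_dim

    def step(self, s_t, h_prev):
        from_input = matvec(self.w_in, s_t)
        from_past = matvec(self.w_rec, h_prev)
        return [math.tanh(a + b) for a, b in zip(from_input, from_past)]


def run_representation(observations, obs_dim=4, hidden_dim=16):
    """Feed a sequence of observed states s_t through F, returning every h_t."""
    rnn = RepresentationRNN(obs_dim, hidden_dim)
    h = [0.0] * hidden_dim  # h_0: start from an empty representation state
    states = []
    for s_t in observations:
        h = rnn.step(s_t, h)
        states.append(h)
    return states


states = run_representation([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
```

Note that h is much wider than s here (16 values against 4), matching the idea that the representation state is high-dimensional.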

In contrast, the conscious state, c, is very low-dimensional. It is derived from h by an attention mechanism applied to h: c_t = C(h_t, c_{t-1}, z_t), where C is the consciousness RNN and z is a random noise source. You can think of c as the content of a thought. This is a small subset of all the information available to us unconsciously, which has been brought to our awareness by the attention mechanism selecting several elements from h. The random noise means that the choice of elements to focus on has some stochasticity. Thus, the consciousness RNN is used to isolate a high-level abstraction and extract information from it. In general, C will aggregate a few factors of information, not just a single factor, into a more complex composed thought.
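One way to picture c_t = C(h_t, c_{t-1}, z_t) is as noisy top-k attention over the elements of h. The sketch below is a hand-written stand-in, not a trained RNN: scoring elements by magnitude, the `persistence` bonus for elements attended at the previous step, and the parameter `k` are all assumptions made for the sake of the example.

```python
import random


def sample_noise(size, rng, scale=0.1):
    """z_t: one small random perturbation per element of h_t."""
    return [rng.gauss(0.0, scale) for _ in range(size)]


def conscious_step(h_t, c_prev, z_t, k=3, persistence=0.5):
    """Toy version of c_t = C(h_t, c_{t-1}, z_t).

    h_t:    list of floats, the high-dimensional representation state
    c_prev: dict {index: value}, the previous conscious state
    z_t:    list of floats, noise with the same length as h_t
    Returns a dict {index: value} holding only the k attended elements.
    """
    scores = []
    for index, value in enumerate(h_t):
        score = abs(value) + z_t[index]  # salient elements score high, plus noise
        if index in c_prev:
            score += persistence  # assumed bias towards continuing a thought
        scores.append((score, index))
    attended = sorted(scores, reverse=True)[:k]
    return {index: h_t[index] for _, index in attended}


h_t = [0.1, -0.9, 0.05, 0.8, 0.0, 0.3]
no_noise = [0.0] * len(h_t)
c_t = conscious_step(h_t, c_prev={}, z_t=no_noise)

rng = random.Random(0)
c_noisy = conscious_step(h_t, c_prev=c_t, z_t=sample_noise(len(h_t), rng))
```

Whatever the noise does, the conscious state never holds more than k elements, which is the low-dimensionality the prior is about.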

We want to assume that a conscious thought can encapsulate a statement about the future. We do this with a verifier network, V(h_t, c_{t-k}), which outputs a scalar value. Here, the objective is to match the current representation state h_t against the conscious state from k steps earlier. We want to define an objective function or reward function that uses the attended conscious elements in a way that can be quantified and optimized.
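To make the scalar V(h_t, c_{t-k}) concrete, here is a deliberately simple, hand-written verifier. It assumes a conscious state is a dict from element index to predicted value, and it scores agreement as a negative mean squared error. Both choices are assumptions for illustration; in the original proposal V would be a learned network.

```python
def verifier(h_t, c_past):
    """Toy V(h_t, c_{t-k}): higher means the past thought agrees with the present.

    h_t:    list of floats, the current representation state
    c_past: dict {index: value}, the conscious state from k steps ago,
            read here as a prediction about those elements of h_t
    """
    if not c_past:
        return 0.0  # an empty thought makes no claim, so nothing to verify
    squared_error = sum((h_t[index] - value) ** 2 for index, value in c_past.items())
    return -squared_error / len(c_past)


def average_score(h_history, c_history, k):
    """Average verifier score over a trajectory, comparing h_t with c_{t-k}."""
    scores = [verifier(h_history[t], c_history[t - k]) for t in range(k, len(h_history))]
    return sum(scores) / len(scores)


perfect = verifier([0.5, 0.25], {0: 0.5})
wrong = verifier([0.5, 0.25], {1: 1.25})
```

A score like this could then serve as the quantity to optimize: thoughts that keep being contradicted by later representation states get low scores.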

The two mechanisms which map the high-level state representation to an objective function are:

• the attention mechanism in the consciousness RNN, which selects and combines a few elements from the high-level state representation into a low-dimensional conscious "sub-state" object

• the predictions or actions derived from the sequence of these conscious sub-states

The difficulty is finding a way for the algorithm to pay attention to the most useful elements. Some form of entropy may be needed to make the attention mechanism stochastic.


There is also a link between consciousness and natural language utterances. An externally provided sentence could elicit an associated conscious state, though the conscious state is a richer object (possibly higher-dimensional) than the uttered sentence. When mapping consciousness to sentences, there is always a loss of information. There also needs to be some context, as the same sentence could be interpreted differently depending on the context. This could be done with another RNN, which maps a conscious state to an utterance: u_t = U(c_t, u_{t-1}).

You can think of this as another regularization term, the loss of information from consciousness to utterance. A sentence focuses only on a handful of elements and concepts, unlike our full internal consciousness.
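The information loss from conscious state to utterance can be seen even in a toy stand-in for u_t = U(c_t, u_{t-1}). The function below is not an RNN: it just names the strongest attended factor and avoids repeating the previous word. The `vocabulary` mapping and the dict form of c_t are hypothetical, chosen only to make the example runnable.

```python
def utter(c_t, u_prev, vocabulary):
    """Toy u_t = U(c_t, u_{t-1}): name one attended factor.

    c_t:        dict {index: value}, the current conscious state
    u_prev:     the previous utterance (a word) or None
    vocabulary: dict {index: word}, an assumed name for each factor
    """
    # Strongest factor first; the values themselves are dropped from the output.
    ranked = sorted(c_t, key=lambda index: abs(c_t[index]), reverse=True)
    for index in ranked:
        word = vocabulary[index]
        if word != u_prev:  # context: do not simply repeat the last utterance
            return word
    return u_prev


vocabulary = {1: "falling", 3: "ball"}
c_t = {1: -0.9, 3: 0.8}
first = utter(c_t, None, vocabulary)
second = utter(c_t, first, vocabulary)
```

The words keep which factors were attended to but throw away their values, so the conscious state cannot be rebuilt from the utterance alone.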

The consciousness prior can be used in unsupervised reinforcement learning to test its ability to discover high-level abstractions, e.g. using an intrinsic reward that favours the discovery of how the environment works.

Conclusion with outstanding questions

Outstanding questions 

• A global workspace serves to flexibly connect neural representations arising in multiple separate modules. Is there a minimal number of modules feeding into the workspace? When does bimodal, trimodal, multimodal integration become a “global workspace”? 

• Can we identify neurons, e.g. in frontal regions, that incarnate copies of the various latent spaces? This may explain the numerous reports of sensory and multimodal neuronal responses in frontal cortex. 

• Is cycle-consistency implemented in the brain? If yes, does it correspond to a form of predictive coding? 

• Could synesthesia be the consequence of an exaggerated or overactive translation between domains, crossing the threshold of perception instead of acting as a background process? 

• How does attention learn to select the relevant information to enter the GLW? What is the corresponding objective function? Many candidates exist and could be tested: self-prediction, free energy, survival, reward of a RL agent, metalearning (learning progress), etc. 

• How can newly learned tasks or modules be connected to an existing GLW? Requirements include: a new “internal copy” with a new (learned) attention mechanism to produce keys for the latent space, new (learned) translations to the rest of the workspace. 
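For readers unfamiliar with the cycle-consistency idea raised in the questions above, it can be sketched as follows: translate a latent vector from space A to space B and back, then measure how far the round trip lands from the starting point. The fixed linear maps below are an assumption to keep the example small; in practice the translators would be learned networks.

```python
def translate(matrix, vector):
    """Apply a linear translator (list-of-rows matrix) to a latent vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cycle_consistency_loss(a_to_b, b_to_a, latent_a):
    """Mean squared error between latent_a and its round trip A -> B -> A."""
    round_trip = translate(b_to_a, translate(a_to_b, latent_a))
    return sum((x - y) ** 2 for x, y in zip(latent_a, round_trip)) / len(latent_a)


a_to_b = [[2.0, 0.0], [0.0, 4.0]]
good_b_to_a = [[0.5, 0.0], [0.0, 0.25]]  # exact inverse of a_to_b
bad_b_to_a = [[1.0, 0.0], [0.0, 1.0]]    # ignores what a_to_b did

consistent = cycle_consistency_loss(a_to_b, good_b_to_a, [1.0, 1.0])
inconsistent = cycle_consistency_loss(a_to_b, bad_b_to_a, [1.0, 1.0])
```

The loss is zero when the two translators undo each other and grows when they do not. That gives a training signal which needs no paired examples from the two domains.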
