
Reinforcement Learning, Bit by Bit

 

Other learning paradigms are about minimization; 

Reinforcement learning is about maximization.


The statement quoted above has been attributed to Harry Klopf, though it might only be accurate in sentiment. The statement may sound vacuous, since minimization can be converted to maximization simply by negating the objective. However, further reflection reveals a deeper observation. Many learning algorithms aim to mimic observed patterns, minimizing differences between model and data. Reinforcement learning is distinguished by its open-ended view. A reinforcement learning agent learns to improve its behavior over time, without a prescription for eventual dynamics or the limits of performance. If the objective takes non-negative values, minimization suggests a well-defined desired outcome, while maximization conjures pursuit of the unknown.



Video courtesy: bdtechtalks.com

What happens when AI Plays Hide and Seek 500 Times

Paper by: Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband, Zheng Wen

Paper Link


Data Efficiency

In reinforcement learning, the nature of data depends on the agent's behavior. This bears important implications for the need for data efficiency. In supervised and unsupervised learning, data is typically viewed as static or evolving slowly. If data is abundant, as is the case in many modern application areas, the performance bottleneck often lies in model capacity and computational infrastructure. This also holds when reinforcement learning is applied to simulated environments; while data generated in the course of learning does evolve, a slow rate can be maintained, in which case model capacity and computation remain bottlenecks, though data efficiency can be helpful in reducing simulation time. In a real environment, on the other hand, data efficiency often becomes the gating factor.


The paper develops a framework for studying the costs and benefits associated with information. Below I highlight the major reinforcement learning concepts it builds on; the paper discusses each of them in detail.

  • Agents
    • coin tossing
    • dialogue
  • Agent-Environment Interface (see the interaction-loop sketch after this list)
  • Policies and Rewards
  • Agent State
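To make the agent-environment interface concrete, here is a minimal interaction-loop sketch in Python. It reuses the coin-tossing example from the list above, but every name in it (CoinEnvironment, update_agent_state, select_action) is an illustrative stand-in of mine, not a definition from the paper; the paper works abstractly with an observation probability function ρ(·|h, a) over histories h and actions a.

import random

# Minimal sketch of the agent-environment interaction loop.
# All names here are illustrative stand-ins, not taken from the paper.

class CoinEnvironment:
    """Toy environment: each action flips a coin with a different,
    unknown bias, and the observation (0 or 1) is also the reward."""
    def __init__(self):
        self.biases = {0: 0.4, 1: 0.7}  # hidden from the agent

    def step(self, action):
        # Draw an observation; this randomness is what rho(.|h, a) describes.
        return 1 if random.random() < self.biases[action] else 0

def update_agent_state(state, action, observation):
    # The agent state compresses the history: here, (heads, trials) per action.
    heads, trials = state[action]
    state[action] = (heads + observation, trials + 1)
    return state

def select_action(state):
    # Placeholder policy: act greedily on empirical means.
    def mean(a):
        heads, trials = state[a]
        return heads / trials if trials else 0.5
    return max(state, key=mean)

env = CoinEnvironment()
agent_state = {0: (0, 0), 1: (0, 0)}
total_reward = 0
for t in range(1000):
    action = select_action(agent_state)
    observation = env.step(action)   # reward equals the observation here
    total_reward += observation
    agent_state = update_agent_state(agent_state, action, observation)
print("average reward:", total_reward / 1000)

The greedy placeholder policy above never explores deliberately; the action-selection schemes reviewed later (ε-greedy, Thompson sampling, IDS) exist precisely to manage the trade-off between immediate reward and information acquisition.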

Sources of Uncertainty 

The agent should be designed to operate effectively in the face of uncertainty. It is useful to distinguish three potential sources of uncertainty (a small illustrative example follows the list):

  • Algorithmic uncertainty may be introduced through computations carried out by the agent. For example, the agent could apply a randomized algorithm to select actions in a manner that depends on internally generated random numbers. 
  • Aleatoric uncertainty is associated with the unpredictability of observations that persists even when the observation probability function ρ is known. In particular, given a history h and action a, while ρ(·|h, a) assigns probabilities to possible immediate observations, the realization is randomly drawn.


  • Epistemic uncertainty is due to not knowing the environment – this amounts to uncertainty about the observation probability function ρ, since the action and observation sets are inherent to the agent design.
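As a toy illustration of how these three sources differ (my own example, not one from the paper), consider a coin whose bias is unknown. Epistemic uncertainty is uncertainty about the bias, aleatoric uncertainty is the randomness of each flip even once the bias is fixed, and algorithmic uncertainty is any randomness the agent injects itself.

import random

# Illustrative only: a coin with unknown bias.
candidate_biases = [0.3, 0.5, 0.8]                                    # hypotheses about rho
posterior = {b: 1 / len(candidate_biases) for b in candidate_biases}  # epistemic uncertainty
true_bias = 0.8                                                       # unknown to the agent

def flip(bias):
    # Aleatoric: the outcome is random even with the bias fixed.
    return 1 if random.random() < bias else 0

def update(posterior, outcome):
    # Bayesian update: epistemic uncertainty shrinks as flips are observed.
    weights = {b: p * (b if outcome else 1 - b) for b, p in posterior.items()}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}

for _ in range(100):
    posterior = update(posterior, flip(true_bias))

# Algorithmic: the agent may still randomize its own choices, e.g. break ties
# or explore using internally generated random numbers.
print("posterior over the bias:", posterior)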


Environment Proxies



 Learning Targets



Cost-Benefit Analysis 

The paper highlights a number of design decisions. These determine the components of the agent state, the environment proxy, the learning target, and how actions are selected to balance between immediate reward and information acquisition. Choices are constrained by memory and per-timestep computation, and they influence expected return in complex ways. The authors formalize the design problem and establish a regret bound that can facilitate cost-benefit analysis; a generic form of the regret in question is written out below.
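For reference, regret measures the cumulative shortfall of the agent's rewards relative to a baseline policy. A standard way to write it over T timesteps (this is the generic form, not necessarily the paper's exact definition, which is stated relative to the target policy) is

\[
\mathrm{Regret}(T) \;=\; \mathbb{E}\!\left[\sum_{t=0}^{T-1}\bigl(\bar{r}_{*} - R_{t+1}\bigr)\right],
\]

where \(\bar{r}_{*}\) denotes the expected per-timestep reward of the baseline (target) policy and \(R_{t+1}\) is the reward the agent actually receives at time t. Roughly speaking, the paper's bound relates this quantity to the amount of information, in bits, that the agent must acquire about its learning target.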


Sample-Based Action Selection
We specialize our discussion to feed-forward variants of DQN with aleatoric state St = Ot and epistemic state Pt = (θt, Bt). Here θt represents the parameters of an ENN f and Bt an experience replay buffer. The epistemic state is updated according to (14) for θt and a first-in-first-out (FIFO) rule for Bt. To complete the agent definition we need to specify how actions are selected from the agent state Xt = (Zt, St, Pt). With this notation we can concisely review three approaches to action selection (a rough code sketch follows the list):

• ε-greedy: algorithmic state Zt = ∅; select At ∈ arg maxa fθt(St, Zt)[a] with probability 1 − ε, and a uniform random action with probability ε (Mnih et al., 2013).

• Thompson sampling (TS): algorithmic state Zt = Zk, resampled uniformly at random at the start of each episode k; select At ∈ arg maxa fθt(St, Zt)[a] (Osband et al., 2016).

• Information-directed sampling (IDS): algorithmic state Zt = ∅; compute an action distribution νt that minimizes a sample-based estimate of the information ratio using nIDS samples; sample action At from νt.
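As a rough illustration of the first two selection rules (my own sketch, not the authors' code), suppose the epistemic network f is available as a function q_values(state, z) that returns one action-value estimate per action, where different indices z correspond to different plausible value functions. That interface is an assumption made for this example, not the paper's ENN API.

import random

NUM_ACTIONS = 3

def epsilon_greedy(q_values, state, epsilon=0.1):
    # Algorithmic state is empty; the only injected randomness is the coin flip.
    if random.random() < epsilon:
        return random.randrange(NUM_ACTIONS)
    values = q_values(state, z=None)
    return max(range(NUM_ACTIONS), key=lambda a: values[a])

def thompson_sampling_action(q_values, state, z_episode):
    # z_episode is drawn once at the start of the episode and held fixed,
    # so the agent acts greedily under a single sampled hypothesis.
    values = q_values(state, z=z_episode)
    return max(range(NUM_ACTIONS), key=lambda a: values[a])

# Purely illustrative stand-in for an ENN's action-value output.
def fake_q_values(state, z):
    rng = random.Random(hash((state, z)))
    return [rng.random() for _ in range(NUM_ACTIONS)]

z_k = random.randrange(10)  # index drawn at the start of episode k
print("epsilon-greedy action:", epsilon_greedy(fake_q_values, state=0))
print("Thompson-sampling action:", thompson_sampling_action(fake_q_values, state=0, z_episode=z_k))

IDS is omitted above: it would draw several indices z, use them to estimate the information ratio of candidate action distributions, and then sample the action from the minimizing distribution νt rather than acting greedily.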



Conclusion

The concepts and algorithms we have introduced are motivated by an objective to minimize regret. They serve to guide agent design. The resulting agents are unlikely to attain minimal regret, though these concepts may lead to lower regret than otherwise. 

We have taken the learning target to be fixed and treated the target policy as a baseline. An alternative could be to prescribe a class of learning targets, with varying target policy regret. The designer might then balance between the number of bits required, the cost of acquiring those bits, and regret of the resulting target policy. This balance could also be adapted over time to reduce regret further. While the work presents an initial investigation pertaining to very simple bandit environments, leveraging concepts from rate-distortion theory, much remains to be understood about this subject. More broadly, one could consider simultaneous optimization of proxies and learning targets. In particular, for any reward function and distribution over environments, the designer could execute an algorithm that automatically selects a learning target and proxy, possibly from sets she specifies. This topic could be thought of as automated architecture design.

