
EATEN: Entity-aware Attention for Single Shot Visual Text Extraction

He Guo, Xiameng Qin, Jiaming Liu, Junyu Han, Jingtuo Liu, Errui Ding, Department of Computer Vision Technology (VIS), Baidu Inc.

Paper link

Abstract

Extracting entities from images is a crucial part of many OCR applications, such as entity recognition for cards, invoices, and receipts. Most existing works employ the classical detection-and-recognition paradigm. This paper proposes an Entity-aware Attention Text Extraction Network called EATEN, an end-to-end trainable system that extracts entities without any post-processing. In the proposed framework, each entity is parsed by its corresponding entity-aware decoder. Moreover, the authors introduce a state transition mechanism that further improves the robustness of entity extraction. Given the absence of public benchmarks, they construct a dataset of almost 0.6 million images covering three real-world scenarios (train ticket, passport, and business card), which is publicly available at https://github.com/beacandler/EATEN. To the best of the authors' knowledge, EATEN is the first single-shot method to extract entities from images. Extensive experiments on these benchmarks demonstrate its state-of-the-art performance.

Introduction

Recently, scene text detection and recognition, two fundamental tasks in computer vision, have become increasingly popular due to their wide applications, such as scene text understanding and image and video retrieval. Among these applications, extracting Entities of Interest (EoIs) is one of the most challenging and practical problems: it requires identifying the texts that belong to certain entities. Taking a passport as an example, the image contains many entities, such as Country, Name, Birthday, and so forth. In practical applications, we only need to output the texts for some predefined entities, e.g. "China" or "USA" for the entity "Country", "Jack" or "Rose" for the entity "Name". Previous approaches mainly adopt two steps: text information is first extracted via OCR (Optical Character Recognition), and then EoIs are extracted by handcrafted rules or layout analysis. In this detection-and-recognition paradigm, engineers have to develop post-processing steps, i.e. handcrafted rules that determine which part of the recognized text belongs to the predefined EoIs.


It is usually the post-processing steps, rather than the detection and recognition ability, that limit the performance of EoI extraction. For example, if the positions of entities drift slightly from their standard positions, inaccurate entities will be extracted because the template representation is position-sensitive. In this paper, a single-shot Entity-aware Attention Text Extraction Network (EATEN) is proposed to extract EoIs from images within a single neural network. A CNN-based feature extractor first extracts feature maps from the original image. An entity-aware attention network, composed of multiple entity-aware decoders, an initial-state warm-up, and state transitions between decoders, then captures all entities in the image. Compared with traditional methods, EATEN is an end-to-end trainable framework rather than a multi-stage pipeline. Thanks to its spatial attention mechanism, EATEN covers most corner cases, including arbitrary shapes, projective/affine transformations, and position drift, without any correction.
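The decoder chaining described above can be sketched in a few lines. The following is a minimal, illustrative numpy sketch, not the paper's implementation: the class name, dimensions, and the simplified recurrence are assumptions. It shows the two ideas named in the text: each predefined entity gets its own spatially attentive decoder, and the final hidden state of one decoder initializes the next (the state transition).

```python
import numpy as np

rng = np.random.default_rng(0)

D = 32            # hidden/feature size (assumed)
V = 100           # character vocabulary size (assumed)
T = 5             # max decoding steps per entity (assumed)
NUM_ENTITIES = 3  # e.g. Country, Name, Birthday

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class EntityDecoder:
    """One entity-aware decoder: attends over the CNN feature map and
    emits a character sequence for its single predefined entity."""
    def __init__(self):
        self.W_att = rng.standard_normal((D, D)) * 0.1
        self.W_h = rng.standard_normal((D, D)) * 0.1
        self.W_out = rng.standard_normal((D, V)) * 0.1

    def step(self, h, feats):
        # spatial attention: score each feature-map location against h
        scores = feats @ (self.W_att @ h)       # (L,)
        alpha = softmax(scores)                 # attention weights over locations
        ctx = alpha @ feats                     # (D,) context vector
        h_new = np.tanh(self.W_h @ (h + ctx))   # simplified recurrence (assumption)
        logits = h_new @ self.W_out
        return h_new, int(np.argmax(logits))

    def decode(self, h0, feats):
        h, chars = h0, []
        for _ in range(T):
            h, c = self.step(h, feats)
            chars.append(c)
        return h, chars

# feats: flattened CNN feature map, one D-dim vector per spatial location
feats = rng.standard_normal((6 * 6, D))

decoders = [EntityDecoder() for _ in range(NUM_ENTITIES)]
h = np.zeros(D)  # in EATEN, an initial-state warm-up would produce this
entities = []
for dec in decoders:
    # state transition: one decoder's final state initializes the next
    h, chars = dec.decode(h, feats)
    entities.append(chars)
```

Because every entity has a dedicated decoder, no post-processing is needed to decide which output belongs to which entity; the order of the decoders fixes that mapping by construction.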



The synthetic data generation process contains four steps:
  • Text preparing. To make the synthetic images more general, we collected a large corpus including Chinese name, address, etc. by crawling from the Internet.
  • Font rendering. We select one font per scenario, and the EoIs are rendered onto the background template images using an open image library. In particular, for the business card scenario, we prepared more than one hundred template images, including 85 simple-background images and pure-color images with random colors, on which to render text. 
  • Transformation. We rotate the image randomly in a range of [-5, +5] degrees, then resize the image according to its longer side. Elastic transformation is also employed. 
  • Noise. Gaussian noise, blur, average blur, sharpen, brightness, hue, and saturation are applied.
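The transformation and noise steps above can be sketched as follows. This is an illustrative numpy-only sketch under assumed parameter values (noise standard deviation, target size); the paper's rotation, elastic transform, and blur stages are named but not reimplemented here.

```python
import numpy as np

rng = np.random.default_rng(42)

def resize_shape(h, w, longer=512):
    """Scale so the longer side equals `longer` (target size assumed),
    preserving aspect ratio, as in the resize step."""
    scale = longer / max(h, w)
    return max(1, round(h * scale)), max(1, round(w * scale))

def augment(img):
    """Sample a rotation angle in [-5, +5] degrees as in the paper, and
    apply additive Gaussian noise (sigma=8 is an assumption) to a uint8
    grayscale image. Elastic transform and blurs are omitted."""
    angle = rng.uniform(-5.0, 5.0)
    noisy = img.astype(np.float64) + rng.normal(0.0, 8.0, img.shape)
    return angle, np.clip(noisy, 0, 255).astype(np.uint8)

# hypothetical rendered template image
img = rng.integers(0, 256, (60, 100), dtype=np.uint8)
angle, out = augment(img)
new_h, new_w = resize_shape(*img.shape)
```

In practice one would apply the sampled rotation and the brightness/hue/saturation jitter with an image library such as Pillow or imgaug before adding the pixel noise.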


Compared methods. We compare several baseline methods with our approach: 

(1) General OCR. A typical OCR-and-matching paradigm that first detects and reads all the text with an OCR engine, and then extracts EoIs when the text content matches predefined regular expressions or the text position fits designed templates. 
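The post-processing half of this baseline can be illustrated with a toy example. Everything below is hypothetical: the OCR output lines, the entity names, and the regular expressions are made-up stand-ins for the handcrafted rules the text describes, not the rules used in the paper.

```python
import re

# hypothetical OCR output: (recognized text, (x, y) position) pairs
ocr_lines = [
    ("Passport No: G12345678", (40, 20)),
    ("Name: JACK", (40, 60)),
    ("Country: CHINA", (40, 100)),
]

# handcrafted rules mapping each predefined entity to a regular expression
ENTITY_PATTERNS = {
    "Name":    re.compile(r"Name[:\s]+([A-Z]+)"),
    "Country": re.compile(r"Country[:\s]+([A-Z]+)"),
}

def extract_eois(lines):
    """Post-processing step of the General OCR baseline: keep a recognized
    line's content only if it matches some entity's pattern."""
    eois = {}
    for text, _pos in lines:
        for entity, pat in ENTITY_PATTERNS.items():
            m = pat.search(text)
            if m:
                eois[entity] = m.group(1)
    return eois

result = extract_eois(ocr_lines)
```

This makes the paper's criticism concrete: the rules are brittle. If the OCR engine misreads "Name" or the layout drifts so a template position no longer lines up, the entity is silently lost, which is exactly the failure mode EATEN's single-shot design avoids.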

(2) Attention OCR. It reads multiple lines of scene text with an attention mechanism and has achieved state-of-the-art performance on several datasets. We adapt it to transcribe the EoIs sequentially, using special tokens to separate different EoIs. 

(3) EATEN without state transition. This method is used in an ablation study to verify the effectiveness of the proposed state transition.



Conclusion

In this paper, we proposed an end-to-end framework called EATEN for extracting EoIs from images. A dataset with three real-world scenarios was established to verify the effectiveness of the proposed method and to support further research on EoI extraction. In contrast to traditional approaches based on text detection and text recognition, EATEN is trained efficiently without bounding-box or full-text annotations, and directly predicts the target entities of an input image in one shot without any bells and whistles. It shows superior performance in all the scenarios and demonstrates the full capacity of extracting EoIs from images with or without a fixed layout. This study provides a new perspective on text recognition, EoI extraction, and structural information extraction.

