
Science journalism meets artificial intelligence "Robotic Journalism"

By - Raghuram Vadapalli, Bakhtiyar Syed, Nishant Prabhu, Balaji Vasan Srinivasan, Vasudeva Varma.



Summary of Research papers from IIT Hyd





Over the past couple of years, an exciting topic has been gaining traction in the machine learning and artificial intelligence world: the "robot reporter". Today's paper is inspired by that concept. Applying it to science journalism is non-trivial, as that would entail understanding scientific content and translating it into simpler language without distorting the underlying semantics. The paper takes early steps towards addressing a few of these challenges.

The authors built a tool that, given the title and abstract of a research paper, generates a blog title by mimicking a human science journalist. The tool uses a model trained on 87,328 pairs of research papers and their related blogs.





The contributions can be summed up as follows:


1. A new parallel corpus of 87,328 pairs of research paper titles and abstracts and their corresponding blog titles.

2. A demonstration of the web application, which uses a pipeline-based architecture that generates blog titles step by step, while letting the user choose which heuristic function and which neural model to use for generating the blog title.

3. An analysis of the experiments conducted to find the best heuristic function as well as the best network architecture.


Architecture 








Stage 1 


Uses a heuristic function to analyse the input and extract a sequence.

What is a heuristic function?

A heuristic function is a way to inform a search about the direction to a goal. It provides an informed way to guess which neighbor of a node will lead to the goal. There is nothing magical about a heuristic function: it must use only information that can be readily obtained about a node.


Stage 2 


The pointer-generator model is used to generate the output sequence from the intermediate sequence.

Sequence-to-sequence (seq2seq) is a learning model that converts an input sequence into an output sequence. Seq2seq models have achieved great success in fields such as machine translation, dialog systems, and question answering.
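To make the pointer-generator idea concrete, here is a minimal sketch of how such a model usually combines "generating" a word from the vocabulary with "copying" a word from the input. It follows the commonly published pointer-generator formulation, not code from this paper; the function name final_distribution and the toy numbers are assumptions made for illustration.

```python
from collections import defaultdict


def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix a vocabulary distribution with a copy distribution.

    p_gen         -- probability of generating from the vocabulary (0..1)
    vocab_dist    -- dict mapping vocabulary token -> probability
    attention     -- attention weight for each source position
    source_tokens -- source tokens, aligned with `attention`
    """
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must have equal length")

    mixed = defaultdict(float)
    # Generation part: scale the vocabulary distribution by p_gen.
    for token, prob in vocab_dist.items():
        mixed[token] += p_gen * prob
    # Copy part: attention mass on each source position goes to its token,
    # so out-of-vocabulary source words can still be produced.
    for token, weight in zip(source_tokens, attention):
        mixed[token] += (1.0 - p_gen) * weight
    return dict(mixed)


# Toy example (hypothetical numbers, not from the paper).
vocab_dist = {"new": 0.5, "study": 0.3, "finds": 0.2}
source_tokens = ["graphene", "sensors", "study"]
attention = [0.7, 0.2, 0.1]
dist = final_distribution(0.6, vocab_dist, attention, source_tokens)
```

In this toy case "graphene" is not in the vocabulary, yet it still receives probability mass through the copy path, which is what makes this kind of model attractive for titles full of technical terms.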


Blog Title Generation

The heuristic function takes the title and abstract of a research paper as input, H(pt, abs), where pt is the paper title and abs is the paper abstract. Various heuristic functions were explored and are outlined below:
1) pt
2) RP (TF-IDF based)
3) RD (Flesch reading ease based)
4) RPD (normalized combination of RD and RP)
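The four options above can be sketched roughly as follows. This is only an illustration under several assumptions that the summary does not confirm: that each heuristic picks a single abstract sentence, that TF-IDF is computed by treating each abstract sentence as a document, and that RPD adds the min-max normalized RP and RD scores. All function names are hypothetical, and the syllable counter is a crude approximation. The Flesch reading ease formula itself is the standard one.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    return re.findall(r"[a-z]+", sentence.lower())


def count_syllables(word):
    # Rough approximation: count groups of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word)))


def flesch_reading_ease(sentence):
    # Standard formula, applied to a single sentence (sentence count = 1).
    words = tokenize(sentence)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * len(words) - 84.6 * syllables / len(words)


def tfidf_scores(sentences):
    # Assumption: each abstract sentence is treated as one "document".
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        scores.append(
            sum((c / len(d)) * math.log(n / df[t]) for t, c in tf.items())
            if d else 0.0
        )
    return scores


def min_max(values):
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def select_sequence(paper_title, abstract, heuristic="pt"):
    """Return the intermediate sequence for the chosen heuristic."""
    if heuristic == "pt":
        return paper_title
    sentences = split_sentences(abstract)
    if not sentences:
        return paper_title
    rp = min_max(tfidf_scores(sentences))
    rd = min_max([flesch_reading_ease(s) for s in sentences])
    if heuristic == "RP":
        scores = rp
    elif heuristic == "RD":
        scores = rd
    elif heuristic == "RPD":
        # Assumption: combine the two normalized scores by addition.
        scores = [p + d for p, d in zip(rp, rd)]
    else:
        raise ValueError(f"unknown heuristic: {heuristic}")
    best = max(range(len(sentences)), key=scores.__getitem__)
    return sentences[best]
```

For example, with the abstract "The cat sat. Quantum entanglement facilitates teleportation.", the RD option in this sketch would pick the first sentence, since short sentences with short words score higher on reading ease.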


The output of the previous step is fed into a sequence-to-sequence neural generation model in order to generate the title of the blog post.

The system provides two models: a baseline attention network, which defines 'attention' over the input sequence to allow the network to focus on specific parts of the input text, and the pointer-generator network.
The sequence s obtained from the first stage is the input to the neural natural language generation model, which generates bt' as output. The loss function L(bt, bt') is the sum of the cross-entropy losses over all time-steps.
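Written out, a sum of per-time-step cross-entropy losses usually takes the form below. This is the standard formulation for sequence generation, not a copy of the paper's own equation; the symbols T (length of the reference blog title) and bt_t (its t-th token) are notation assumed here, and conditioning on the previous reference tokens assumes teacher forcing.

```latex
L(bt, bt') = -\sum_{t=1}^{T} \log P\left(bt_t \mid bt_1, \ldots, bt_{t-1}, s\right)
```

Here P is the probability the model assigns to the correct token at step t, so the loss is small when the generated title bt' puts high probability on each token of the reference title bt.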










The working prototype lets users experiment with combinations of heuristic functions and model types for generating a blog title.

Link for working site

