
Word Rotator's Distance

Sho Yokoi, Ryo Takahashi, Reina Akama, Jun Suzuki, Kentaro Inui


Abstract

A key principle in assessing textual similarity is measuring the degree of semantic overlap between two texts by considering the word alignment. Such alignment-based approaches are intuitive and interpretable; however, they are empirically inferior to the simple cosine similarity between general-purpose sentence vectors. To address this issue, we focus on and demonstrate the fact that the norm of word vectors is a good proxy for word importance, and their angle is a good proxy for word similarity. Alignment-based approaches do not distinguish them, whereas sentence-vector approaches automatically use the norm as the word importance. Accordingly, we propose a method that first decouples word vectors into their norm and direction, and then computes alignment-based similarity using earth mover’s distance (i.e., optimal transport cost), which we refer to as word rotator’s distance. Besides, we find how to “grow” the norm and direction of word vectors (vector converter), which is a new systematic approach derived from sentence-vector estimation methods. On several textual similarity datasets, the combination of these simple proposed methods outperformed not only alignment-based approaches but also strong baselines. 


Semantic textual similarity (STS) is the task of measuring the degree of semantic equivalence between two sentences. 

For example, the sentences 

    “Two boys on a couch are playing video games.” and

     “Two boys are playing a video game.” 

are mostly equivalent (a similarity score of 4 out of 5),

while the sentences 

    “The woman is playing the violin.” and 

    “The young lady enjoys listening to the guitar.” 

are not equivalent but on the same topic (score of 1). 


System predictions are customarily evaluated by Pearson correlation with the gold scores. Hence, systems are only required to predict relative similarity rather than absolute scores.
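A small sketch of why only relative similarity matters under this evaluation: Pearson correlation does not change when the predictions are rescaled by a positive factor or shifted. The gold scores and predictions below are made-up numbers for illustration, not data from the paper.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two score lists."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc)))

# Hypothetical gold scores on the 0-5 scale and system outputs on a different scale.
gold = [4.0, 1.0, 3.2, 0.5, 2.5]
pred = [0.91, 0.35, 0.80, 0.20, 0.60]

r = pearson(gold, pred)
# Rescaling and shifting the predictions leaves the correlation unchanged.
r_rescaled = pearson(gold, [5.0 * p + 1.0 for p in pred])
```

A system can therefore output a distance or a similarity on any scale, as long as its ranking and spacing track the gold scores.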


There are two major approaches to tackling STS. One is to measure the degree of semantic overlap between texts by considering the word alignment, which we refer to as alignment-based approaches. The other approach involves generating general-purpose sentence vectors from two texts (typically comprising word vectors), and then calculating their similarity, which we refer to as sentence-vector approaches. Alignment-based approaches are consistent with human intuition about textual similarity, and their predictions are interpretable. However, the performance of such approaches is lower than that of sentence-vector approaches.

The paper proposes an STS method that first decouples word vectors into their norms and direction vectors and then aligns the direction vectors using earth mover’s distance (EMD). Here, the key idea is to map the norm and angle of the word vectors to the EMD parameters probability mass and transportation cost, respectively. The proposed method is natural from both optimal transport and word embeddings perspectives, preserves the features of alignment-based methods, and can directly incorporate sentence-vector estimation methods, which results in fairly high performance.
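This mapping can be written compactly. The symbols below are notation assumed for this summary: $w_i$ is the $i$-th word vector of a sentence with $n$ words, $\lambda_i$ is its norm, $u_i$ is its direction vector, $m_i$ is the probability mass assigned to the word, and $c$ is the transportation cost between a word of one sentence and a word $w'_j$ of the other.

```latex
\lambda_i = \lVert w_i \rVert, \qquad
u_i = \frac{w_i}{\lambda_i}, \qquad
m_i = \frac{\lambda_i}{\sum_{k=1}^{n} \lambda_k}, \qquad
c(w_i, w'_j) = 1 - \cos(w_i, w'_j) = 1 - u_i^{\top} u'_j
```

The masses $m_i$ sum to one, so each sentence becomes a probability distribution over direction vectors, and the cost depends only on the angle between two words.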

The contributions are as follows.

• The paper shows that the norm of a word vector implicitly encodes the importance weight of a word and that the angle between word vectors is a good proxy for the dissimilarity of words.

• It proposes a new textual similarity measure, word rotator’s distance (WRD), that separately utilizes the norm and direction of word vectors.

• To enhance WRD, it introduces a new word-vector conversion mechanism, which is formally induced from recent sentence-vector estimation methods.

• It demonstrates that the proposed methods achieve high performance compared to strong baseline methods on several STS tasks.
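The first point can be illustrated with a minimal sketch. An additive sentence vector (the sum of the word vectors) is exactly a sum of direction vectors weighted by their norms, so a word with a larger norm contributes more. The two-dimensional vectors below are toy values assumed for illustration, not trained embeddings.

```python
import numpy as np

# Hypothetical word vectors: a low-norm function word and a high-norm content word.
words = {
    "the": np.array([0.3, 0.4]),      # norm 0.5
    "violin": np.array([4.0, -3.0]),  # norm 5.0
}

def decouple(w):
    """Split a word vector into its norm (scalar) and direction (unit vector)."""
    norm = float(np.linalg.norm(w))
    return norm, w / norm

parts = {token: decouple(w) for token, w in words.items()}

# Additive sentence vector: plain sum of the word vectors.
additive = sum(words.values())
# The same vector, rebuilt as a norm-weighted sum of direction vectors.
recomposed = sum(norm * direction for norm, direction in parts.values())

# Share of the total norm carried by each word.
total = sum(norm for norm, _ in parts.values())
weights = {token: norm / total for token, (norm, _) in parts.items()}
```

In this toy case "violin" carries about 91% of the total norm, which is the sense in which sentence-vector approaches use the norm as word importance without being told to.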


Word Mover’s Distance and its Issues

Earth Mover’s Distance 

    Intuitively, earth mover’s distance is the minimum cost required to turn one pile of dirt into another pile of dirt (Figure 1). 
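The dirt-pile picture can be made concrete as a small linear program. The sketch below is one way to compute EMD between two discrete distributions, assuming SciPy's `linprog` solver is available. The function name `emd` and the pile positions are choices made for this example, not code from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def emd(a, b, cost):
    """Minimum total cost of moving mass a (sources) onto mass b (targets)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cost = np.asarray(cost, dtype=float)
    n, m = cost.shape
    # The plan T (n x m) is flattened row by row; T[i, j] is the mass moved from i to j.
    rows = np.kron(np.eye(n), np.ones((1, m)))  # sum_j T[i, j] = a[i]
    cols = np.kron(np.ones((1, n)), np.eye(m))  # sum_i T[i, j] = b[j]
    res = linprog(
        cost.ravel(),
        A_eq=np.vstack([rows, cols]),
        b_eq=np.concatenate([a, b]),
        bounds=(0, None),
        method="highs",
    )
    return float(res.fun), res.x.reshape(n, m)

# Two piles of dirt on a line: half the dirt at 0 and half at 1 ...
src_pos = np.array([0.0, 1.0])
src_mass = [0.5, 0.5]
# ... must be reshaped into half at 1 and half at 3.
dst_pos = np.array([1.0, 3.0])
dst_mass = [0.5, 0.5]

# Cost of moving one unit of dirt is the distance travelled.
ground_cost = np.abs(src_pos[:, None] - dst_pos[None, :])
total_cost, plan = emd(src_mass, dst_mass, ground_cost)  # total_cost is 1.5 here
```

The two ingredients are the mass at each location and the cost of moving a unit of mass between locations. These are the parameters that WMD and WRD fill in differently.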



Word Mover’s Distance

Word mover’s distance (WMD) is a dissimilarity measure between texts and is a pioneering work that introduced EMD to the natural language processing (NLP) field. This study is strongly inspired by this work. We introduce WMD prior to presenting the proposed method. WMD is the cost of transporting a set of word vectors in an embedding space (Euclidean space) (Figure 2). 
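A minimal sketch of the two inputs WMD hands to EMD, assuming uniform word weights and Euclidean distance as the transportation cost. The helper `wmd_inputs` and the toy vectors are hypothetical. The second sentence has a single word, so the transport plan is forced and no solver is needed.

```python
import numpy as np

def wmd_inputs(X, Y):
    """Masses and cost matrix for WMD, assuming every word gets the same weight."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    mass_x = np.full(len(X), 1.0 / len(X))
    mass_y = np.full(len(Y), 1.0 / len(Y))
    # Euclidean distance between every word of X and every word of Y.
    cost = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    return mass_x, mass_y, cost

# Toy vectors: both words of X point in the same direction as the word in Y,
# but the second word of X has a much smaller norm.
X = [[3.0, 4.0], [0.3, 0.4]]
Y = [[3.0, 4.0]]

mass_x, mass_y, cost = wmd_inputs(X, Y)
# With one target word, all mass from X must flow to Y[0].
wmd = float(mass_x @ cost[:, 0])  # 0.5 * 0.0 + 0.5 * 4.5 = 2.25
```

The low-norm word is given the same weight as the high-norm word, and it is charged a large cost although its direction is identical to the target. This is the kind of mixing of norm and direction that the post's section title refers to as an issue.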

The authors propose a simple yet powerful sentence similarity measure using EMD. The proposed method considers each sentence as a discrete distribution on the unit hypersphere and calculates EMD on this hypersphere (Figure 5). Here, the alignment of the direction vectors corresponds to a rotation on the unit hypersphere; thus, the proposed method is called word rotator’s distance (WRD). Formally, each sentence s is treated as a discrete distribution νs comprising direction vectors weighted by their norm (bag-of-direction-vectors distribution).
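A compact sketch of WRD under these definitions, assuming SciPy's `linprog` as the EMD solver and cosine distance (one minus cosine similarity) as the cost on the hypersphere. The function name `wrd` and the toy vectors are illustrative choices, not the authors' released code.

```python
import numpy as np
from scipy.optimize import linprog

def wrd(X, Y):
    """Word rotator's distance between two sentences given as word-vector arrays."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    norm_x = np.linalg.norm(X, axis=1)
    norm_y = np.linalg.norm(Y, axis=1)
    # Probability mass: each word's norm, normalized to sum to one per sentence.
    a = norm_x / norm_x.sum()
    b = norm_y / norm_y.sum()
    # Transportation cost: cosine distance between direction vectors.
    U = X / norm_x[:, None]
    V = Y / norm_y[:, None]
    cost = np.clip(1.0 - U @ V.T, 0.0, 2.0)
    n, m = cost.shape
    # EMD as a linear program over the flattened transport plan.
    rows = np.kron(np.eye(n), np.ones((1, m)))  # mass leaving each word of X
    cols = np.kron(np.ones((1, n)), np.eye(m))  # mass arriving at each word of Y
    res = linprog(
        cost.ravel(),
        A_eq=np.vstack([rows, cols]),
        b_eq=np.concatenate([a, b]),
        bounds=(0, None),
        method="highs",
    )
    return float(res.fun)

# Same directions, different norms: no rotation is needed, so the distance is ~0.
same_direction = wrd([[3.0, 4.0], [0.3, 0.4]], [[3.0, 4.0]])
# Orthogonal directions: the single word must rotate by 90 degrees, cost 1.
orthogonal = wrd([[1.0, 0.0]], [[0.0, 2.0]])
```

Unlike a Euclidean cost, the cosine cost ignores vector length, while the length re-enters only through the mass each word carries.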


Experiment and Predictions


Conclusion 
In this paper, we first showed
(i) that the norm and angle of word vectors are good proxies for the importance of a word and the dissimilarity between words, respectively, and
(ii) that some previous alignment-based STS methods inappropriately “mix them up”. With these findings, we have proposed word rotator’s distance (WRD), which is a new unsupervised, EMD-based STS metric.

WRD was designed so that the norm and angle of word vectors correspond to the probability mass and transportation cost in EMD, respectively. In addition, we found that the latest powerful sentence vector estimation methods implicitly improve the norm and angle of word vectors and we can exploit this effect as a word vector converter (VC). In experiments on multiple STS tasks, the proposed methods outperformed not only alignment-based methods such as word mover’s distance, but also powerful addition-based sentence vectors.

