Sho Yokoi, Ryo Takahashi, Reina Akama, Jun Suzuki, Kentaro Inui
Abstract
A key principle in assessing textual similarity is measuring the degree of semantic overlap between two texts by considering the word alignment. Such alignment-based approaches are intuitive and interpretable; however, they are empirically inferior to the simple cosine similarity between general-purpose sentence vectors. To address this issue, we focus on and demonstrate that the norm of word vectors is a good proxy for word importance, and their angle is a good proxy for word similarity. Alignment-based approaches do not distinguish the two, whereas sentence-vector approaches automatically use the norm as the word importance. Accordingly, we propose a method that first decouples word vectors into their norm and direction, and then computes alignment-based similarity using earth mover’s distance (i.e., optimal transport cost), which we refer to as word rotator’s distance. In addition, we show how to “grow” the norm and direction of word vectors (vector converter), a new systematic approach derived from sentence-vector estimation methods. On several textual similarity datasets, the combination of these simple proposed methods outperformed not only alignment-based approaches but also strong baselines.
Introduction
Semantic textual similarity (STS) is the task of measuring the degree of semantic equivalence between two sentences.
For example, the sentences
“Two boys on a couch are playing video games.” and
“Two boys are playing a video game.”
are mostly equivalent (a similarity score of 4 out of 5),
while the sentences
“The woman is playing the violin.” and
“The young lady enjoys listening to the guitar.”
are not equivalent but on the same topic (score of 1).
System predictions are customarily evaluated by Pearson correlation with the gold scores. Hence, systems are only required to predict relative similarity rather than absolute scores.
There are two major approaches to tackling STS. One is to measure the degree of semantic overlap between texts by considering the word alignment, which we refer to as alignment-based approaches. The other approach involves generating general-purpose sentence vectors from two texts (typically comprising word vectors), and then calculating their similarity, which we refer to as sentence-vector approaches. Alignment-based approaches are consistent with human intuition about textual similarity, and their predictions are interpretable. However, the performance of such approaches is lower than that of sentence-vector approaches.
To address this issue, we propose an STS method that first decouples word vectors into their norms and direction vectors, and then aligns the direction vectors using earth mover’s distance (EMD). The key idea is to map the norms and angles of the word vectors to the EMD parameters, namely probability mass and transportation cost, respectively. The proposed method is natural from both the optimal transport and word embedding perspectives, preserves the advantages of alignment-based methods, and can directly incorporate sentence-vector estimation methods, resulting in fairly high performance.
Our contributions are as follows.
• We show that the norm of a word vector implicitly encodes the importance weight of the word, and that the angle between word vectors is a good proxy for the dissimilarity of words.
• We propose a new textual similarity measure, word rotator’s distance (WRD), that separately utilizes the norm and direction of word vectors.
• To enhance the proposed WRD, we introduce a new word-vector conversion mechanism (vector converter), which is formally induced from recent sentence-vector estimation methods.
• We demonstrate that the proposed methods achieve high performance compared with strong baseline methods on several STS tasks.
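The decoupling in the first two bullets can be made concrete with a minimal sketch (toy numbers, not tied to any particular embedding model): the norm serves as an importance weight, and the remaining unit vector carries the angular information used for similarity.

```python
import numpy as np

# A word vector decouples into a scalar norm (a proxy for word importance)
# and a unit direction vector (used for similarity via angles/cosine).
v = np.array([3.0, 4.0])          # toy word vector
norm = np.linalg.norm(v)          # 5.0 -> importance weight
direction = v / norm              # unit vector -> compared via angles
print(norm, direction)            # 5.0 [0.6 0.8]
```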
Word Mover’s Distance and its Issues
Earth Mover’s Distance
Intuitively, earth mover’s distance is the minimum cost required to turn one pile of dirt into another pile of dirt (Figure 1).
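This intuition can be illustrated by computing EMD as a small linear program; the sketch below uses a generic LP solver (scipy, not the implementation used in this work) to move one pile of unit mass to another along a line.

```python
import numpy as np
from scipy.optimize import linprog

def emd(a, b, cost):
    """Earth mover's distance between histograms a, b under a cost matrix.

    Solves the transport LP: minimise sum_ij cost[i,j] * T[i,j]
    subject to T >= 0, row sums of T equal a, column sums equal b.
    """
    n, m = cost.shape
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1      # row-sum constraints (mass of a)
    for j in range(m):
        A_eq[n + j, j::m] = 1               # column-sum constraints (mass of b)
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return res.fun

# Two piles of dirt on a line: all mass at position 0 vs. all at position 3.
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
positions = np.array([0.0, 3.0])
cost = np.abs(positions[:, None] - positions[None, :])
print(emd(a, b, cost))  # 3.0: move one unit of mass a distance of 3
```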
Word mover’s distance (WMD) is a dissimilarity measure between texts and a pioneering work that introduced EMD to the natural language processing (NLP) field. This study is strongly inspired by that work, so we introduce WMD prior to presenting the proposed method. WMD is the minimum cost of transporting the set of word vectors of one text onto that of the other in the embedding space (Euclidean space) (Figure 2).
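A minimal WMD sketch under this definition: each sentence places uniform probability mass on its words, and the transport cost is the Euclidean distance between word embeddings. The tiny 2-d embeddings below are hypothetical values for illustration only, and the EMD solver is a generic linear program, not the original implementation.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.spatial.distance import cdist

def emd(a, b, cost):
    # Transport LP: minimise total cost with row sums a and column sums b.
    n, m = cost.shape
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1
    for j in range(m):
        A_eq[n + j, j::m] = 1
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return res.fun

# Toy 2-d word embeddings (hypothetical, for illustration only).
vec = {"boys":   np.array([1.0, 0.0]),
       "play":   np.array([0.0, 1.0]),
       "games":  np.array([0.8, 0.6]),
       "guitar": np.array([-1.0, 0.2])}

def wmd(words1, words2):
    X = np.stack([vec[w] for w in words1])
    Y = np.stack([vec[w] for w in words2])
    a = np.full(len(words1), 1 / len(words1))   # uniform nBOW mass
    b = np.full(len(words2), 1 / len(words2))
    return emd(a, b, cdist(X, Y))               # Euclidean transport cost

# The semantically closer pair yields the smaller distance.
print(wmd(["boys", "play"], ["boys", "games"]))
print(wmd(["boys", "play"], ["guitar"]))
```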
Word Rotator’s Distance
We propose a simple yet powerful sentence similarity measure using EMD. The proposed method considers each sentence as a discrete distribution on the unit hypersphere and calculates the EMD on this hypersphere (Figure 5). Here, the alignment of direction vectors corresponds to a rotation on the unit hypersphere; thus, we refer to the proposed method as word rotator’s distance (WRD). Formally, we consider each sentence s as a discrete distribution νs comprising direction vectors weighted by their norms (a bag-of-direction-vectors distribution).
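The mapping above (norms to probability mass, angles to transport cost) can be sketched as follows; the word vectors here are random placeholders, and the EMD solver is a generic linear program rather than the authors’ implementation.

```python
import numpy as np
from scipy.optimize import linprog

def emd(a, b, cost):
    # Transport LP: minimise total cost with row sums a and column sums b.
    n, m = cost.shape
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1
    for j in range(m):
        A_eq[n + j, j::m] = 1
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return res.fun

def wrd(X, Y):
    """Word rotator's distance between two sentences given as arrays of
    word vectors (one row per word). Norms supply the probability mass;
    cosine distance between direction vectors supplies the transport cost."""
    nx = np.linalg.norm(X, axis=1)
    ny = np.linalg.norm(Y, axis=1)
    a, b = nx / nx.sum(), ny / ny.sum()          # norm-weighted mass
    U, V = X / nx[:, None], Y / ny[:, None]      # unit direction vectors
    cost = 1.0 - U @ V.T                         # cosine distance on the sphere
    return emd(a, b, cost)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                      # placeholder "sentence"
print(wrd(X, X))                                 # ~ 0 for identical sentences
print(wrd(X, rng.normal(size=(3, 8))))           # larger for unrelated vectors
```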
Experiment and Predictions