Sho Yokoi, Ryo Takahashi, Reina Akama, Jun Suzuki, Kentaro Inui
Abstract
A key principle in assessing textual similarity is measuring the degree of semantic overlap between two texts by considering the word alignment. Such alignment-based approaches are intuitive and interpretable; however, they are empirically inferior to the simple cosine similarity between general-purpose sentence vectors. To address this issue, we focus on and demonstrate that the norm of word vectors is a good proxy for word importance, and their angle is a good proxy for word similarity. Alignment-based approaches do not distinguish the two, whereas sentence-vector approaches automatically use the norm as the word importance. Accordingly, we propose a method that first decouples word vectors into their norm and direction, and then computes alignment-based similarity using earth mover’s distance (i.e., optimal transport cost), which we refer to as word rotator’s distance. In addition, we show how to “grow” the norm and direction of word vectors (vector converter), a new systematic approach derived from sentence-vector estimation methods. On several textual similarity datasets, the combination of these simple proposed methods outperformed not only alignment-based approaches but also strong baselines.
Introduction
Semantic textual similarity (STS) is the task of measuring the degree of semantic equivalence between two sentences.
For example, the sentences
“Two boys on a couch are playing video games.” and
“Two boys are playing a video game.”
are mostly equivalent (a similarity score of 4 out of 5),
while the sentences
“The woman is playing the violin.” and
“The young lady enjoys listening to the guitar.”
are not equivalent but on the same topic (score of 1).
System predictions are customarily evaluated by Pearson correlation with the gold scores. Hence, systems are only required to predict relative similarity rather than absolute scores.
There are two major approaches to tackling STS. One is to measure the degree of semantic overlap between texts by considering the word alignment, which we refer to as alignment-based approaches. The other approach involves generating general-purpose sentence vectors from two texts (typically comprising word vectors), and then calculating their similarity, which we refer to as sentence-vector approaches. Alignment-based approaches are consistent with human intuition about textual similarity, and their predictions are interpretable. However, the performance of such approaches is lower than that of sentence-vector approaches.
To address this issue, we propose an STS method that first decouples word vectors into their norms and direction vectors, and then aligns the direction vectors using earth mover’s distance (EMD). The key idea is to map the norms and angles of the word vectors to the EMD parameters, namely probability mass and transportation cost, respectively. The proposed method is natural from both the optimal transport and word embedding perspectives, preserves the advantages of alignment-based methods, and can directly incorporate sentence-vector estimation methods, resulting in fairly high performance.
Our contributions are as follows.
• We show that the norm of a word vector implicitly encodes the importance weight of the word, and that the angle between word vectors is a good proxy for the dissimilarity of words.
• We propose a new textual similarity measure, word rotator’s distance (WRD), that separately utilizes the norm and direction of word vectors.
• To enhance the proposed WRD, we introduce a new word-vector conversion mechanism (vector converter), which is formally induced from recent sentence-vector estimation methods.
• We demonstrate that the proposed methods achieve high performance compared with strong baseline methods on several STS tasks.
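The decoupling in the first two bullets can be made concrete with a minimal sketch (toy numbers, not tied to any particular embedding model): the norm serves as an importance weight, and the remaining unit vector carries the angular information used for similarity.

```python
import numpy as np

# A word vector decouples into a scalar norm (a proxy for word importance)
# and a unit direction vector (used for similarity via angles/cosine).
v = np.array([3.0, 4.0])          # toy word vector
norm = np.linalg.norm(v)          # 5.0 -> importance weight
direction = v / norm              # unit vector -> compared via angles
print(norm, direction)            # 5.0 [0.6 0.8]
```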
Word Mover’s Distance and its Issues
Earth Mover’s Distance
Intuitively, earth mover’s distance is the minimum cost required to turn one pile of dirt into another pile of dirt (Figure 1).
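This intuition can be illustrated by computing EMD as a small linear program; the sketch below uses a generic LP solver (scipy, not the implementation used in this work) to move one pile of unit mass to another along a line.

```python
import numpy as np
from scipy.optimize import linprog

def emd(a, b, cost):
    """Earth mover's distance between histograms a, b under a cost matrix.

    Solves the transport LP: minimise sum_ij cost[i,j] * T[i,j]
    subject to T >= 0, row sums of T equal a, column sums equal b.
    """
    n, m = cost.shape
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1      # row-sum constraints (mass of a)
    for j in range(m):
        A_eq[n + j, j::m] = 1               # column-sum constraints (mass of b)
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return res.fun

# Two piles of dirt on a line: all mass at position 0 vs. all at position 3.
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
positions = np.array([0.0, 3.0])
cost = np.abs(positions[:, None] - positions[None, :])
print(emd(a, b, cost))  # 3.0: move one unit of mass a distance of 3
```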
Word mover’s distance (WMD) is a dissimilarity measure between texts and a pioneering work that introduced EMD to the natural language processing (NLP) field. This study is strongly inspired by that work, so we introduce WMD prior to presenting the proposed method. WMD is the minimum cost of transporting the set of word vectors of one text onto that of the other in the embedding space (Euclidean space) (Figure 2).
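A minimal WMD sketch under this definition: each sentence places uniform probability mass on its words, and the transport cost is the Euclidean distance between word embeddings. The tiny 2-d embeddings below are hypothetical values for illustration only, and the EMD solver is a generic linear program, not the original implementation.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.spatial.distance import cdist

def emd(a, b, cost):
    # Transport LP: minimise total cost with row sums a and column sums b.
    n, m = cost.shape
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1
    for j in range(m):
        A_eq[n + j, j::m] = 1
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return res.fun

# Toy 2-d word embeddings (hypothetical, for illustration only).
vec = {"boys":   np.array([1.0, 0.0]),
       "play":   np.array([0.0, 1.0]),
       "games":  np.array([0.8, 0.6]),
       "guitar": np.array([-1.0, 0.2])}

def wmd(words1, words2):
    X = np.stack([vec[w] for w in words1])
    Y = np.stack([vec[w] for w in words2])
    a = np.full(len(words1), 1 / len(words1))   # uniform nBOW mass
    b = np.full(len(words2), 1 / len(words2))
    return emd(a, b, cdist(X, Y))               # Euclidean transport cost

# The semantically closer pair yields the smaller distance.
print(wmd(["boys", "play"], ["boys", "games"]))
print(wmd(["boys", "play"], ["guitar"]))
```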
Word Rotator’s Distance
We propose a simple yet powerful sentence similarity measure using EMD. The proposed method considers each sentence as a discrete distribution on the unit hypersphere and calculates the EMD on this hypersphere (Figure 5). Here, the alignment of direction vectors corresponds to a rotation on the unit hypersphere; thus, we refer to the proposed method as word rotator’s distance (WRD). Formally, we consider each sentence s as a discrete distribution νs comprising direction vectors weighted by their norms (a bag-of-direction-vectors distribution).
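The mapping above (norms to probability mass, angles to transport cost) can be sketched as follows; the word vectors here are random placeholders, and the EMD solver is a generic linear program rather than the authors’ implementation.

```python
import numpy as np
from scipy.optimize import linprog

def emd(a, b, cost):
    # Transport LP: minimise total cost with row sums a and column sums b.
    n, m = cost.shape
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1
    for j in range(m):
        A_eq[n + j, j::m] = 1
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return res.fun

def wrd(X, Y):
    """Word rotator's distance between two sentences given as arrays of
    word vectors (one row per word). Norms supply the probability mass;
    cosine distance between direction vectors supplies the transport cost."""
    nx = np.linalg.norm(X, axis=1)
    ny = np.linalg.norm(Y, axis=1)
    a, b = nx / nx.sum(), ny / ny.sum()          # norm-weighted mass
    U, V = X / nx[:, None], Y / ny[:, None]      # unit direction vectors
    cost = 1.0 - U @ V.T                         # cosine distance on the sphere
    return emd(a, b, cost)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                      # placeholder "sentence"
print(wrd(X, X))                                 # ~ 0 for identical sentences
print(wrd(X, rng.normal(size=(3, 8))))           # larger for unrelated vectors
```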
Experiment and Predictions