poster
Vector Spaces for Quantifying Disparity of Multiword Expressions in Annotated Text
keywords:
vector space
multiword expression
disparity
diversity
Multiword Expressions (MWEs) make a good case study for linguistic diversity due to their idiosyncratic nature. If MWE canonical forms are defined as types, MWE diversity can be measured notably through disparity, which is based on pairwise distances between types. To this end, we train static MWE-aware word embeddings for verbal MWEs in 14 languages, and we show that these vector spaces have interesting properties. We then use these vector spaces to implement the so-called functional diversity measure, and we apply this measure to the outputs of several MWE identification systems. We find that, although MWE vector spaces are meaningful at a local scale, the disparity measure that aggregates them at a global scale strongly correlates with the number of types. This calls into question its usefulness when simpler diversity metrics, such as variety, are available. We make the vector spaces we generated available.
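To make the difference between variety and a distance-based measure concrete, here is a minimal sketch. It assumes that each MWE type (canonical form) is mapped to a dense vector and that the distance between types is cosine distance. `disparity` is taken here as the mean pairwise distance over distinct types. `rao_quadratic_entropy` is Rao's quadratic entropy, a standard distance-based functional diversity index from ecology, used here as a stand-in. Neither is claimed to be the exact formula used in this work. All function names and the toy vectors are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def variety(type_counts):
    """Variety: the number of distinct MWE types observed."""
    return len(type_counts)


def disparity(vectors):
    """Hypothetical disparity: mean cosine distance over all pairs of distinct types."""
    types = list(vectors)
    if len(types) < 2:
        return 0.0  # disparity is undefined/zero with fewer than two types
    pairs = list(combinations(types, 2))
    return sum(cosine_distance(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def rao_quadratic_entropy(type_counts, vectors):
    """Rao's Q = sum over ordered pairs i != j of p_i * p_j * d(i, j).

    Used here as an example of a distance-based functional diversity index;
    p_i is the relative frequency of type i among all occurrences.
    """
    total = sum(type_counts.values())
    probs = {t: c / total for t, c in type_counts.items()}
    q = 0.0
    for a in probs:
        for b in probs:
            if a != b:
                q += probs[a] * probs[b] * cosine_distance(vectors[a], vectors[b])
    return q


# Toy example: hypothetical 2-d vectors for three verbal MWE types.
vectors = {
    "take_a_walk": [1.0, 0.0],
    "take_a_stroll": [0.9, 0.1],
    "kick_the_bucket": [0.0, 1.0],
}
counts = {"take_a_walk": 3, "take_a_stroll": 1, "kick_the_bucket": 2}

print(variety(counts))
print(round(disparity(vectors), 4))
print(round(rao_quadratic_entropy(counts, vectors), 4))
```

Variety only counts types. The other two measures also use the geometry of the vector space, so near-synonymous types such as the two "take a ..." entries add little. Computing variety and disparity in this way on the outputs of several systems is one way to check how strongly the two measures move together.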