

Workshop paper
Evaluating Semantic Relations in Predicting Textual Labels for Images of Abstract and Concrete Concepts
keywords:
vlm
abstractness
concreteness
semantic relations
multi-modal
psycholinguistics
This study investigates the performance of SigLIP, a state-of-the-art Vision-Language Model (VLM), in predicting labels for images depicting 1,278 concepts. Our analysis of 300 images per concept shows that the model frequently predicts the exact user-tagged labels, but it also often predicts labels that are semantically related to the exact labels in various ways: synonyms, hypernyms, co-hyponyms, and associated words, particularly for abstract concepts. We then examine the diversity of user tags and word associations for abstract versus concrete concepts. Surprisingly, not only abstract but also concrete concepts exhibit substantial variability, challenging the traditional view that representations of concrete concepts are less diverse.