CogSci 2025

August 01, 2025

San Francisco, United States


keywords: concepts and categories, representation, artificial intelligence, vision, neural networks

Whether language is essential, sufficient, or merely a tool for numerical cognition has been hotly debated. Here, we investigate the influence of language on quantity representations by comparing embeddings from vision-only Transformer models (ViTs) and vision-language models (VLMs) exposed to image pairs depicting either the same or different stimulus quantities. If linguistic exposure stabilises quantity representations, VLMs should produce more distinct representations than ViTs for image pairs with differing numerosity, and more similar representations for pairs with identical numerosity. We operationalised this as the variance in cosine similarity in response to either categorical (same/different) or continuous differences in stimulus numerosity. We find that both VLMs and ViTs are sensitive to the numerosity of visual stimuli, that this sensitivity increases with layer depth, and that VLMs exhibit slightly greater sensitivity to image numerosity than ViTs. This work provides initial support for the claim that linguistic exposure can, in principle, stabilise quantity representations.
