keywords:
computer science
perception
vision
neural networks
natural language processing
Research has begun exploring the performance of large vision language models (LVLMs) in recognizing illusions. However, these studies often fail to distinguish between actual and apparent features, leading to ambiguous assessments of machine cognition.
We introduce a visual question answering (VQA) dataset categorized into genuine and fake illusions. Genuine illusions present discrepancies between actual and apparent features, whereas fake illusions have identical actual and apparent features even though they look illusory. We evaluate the performance of LVLMs on genuine and fake illusion VQA tasks and investigate whether the models discern actual from apparent features. Our findings indicate that although LVLMs may appear to recognize illusions by correctly answering questions about both feature types, they predict the same answers for both genuine illusion and fake illusion VQA questions. This suggests that their responses might be based on prior knowledge of illusions rather than genuine visual understanding.
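The core diagnostic described above, checking whether a model gives the same answer to a genuine-illusion question and its fake (non-illusory) counterpart, can be sketched as a simple consistency measure. This is an illustrative sketch, not the paper's actual evaluation code; the function name, toy answers, and pairing scheme are assumptions for exposition.

```python
def answer_consistency(genuine_answers, fake_answers):
    """Fraction of paired questions where the model predicts the same
    answer for a genuine illusion and its fake counterpart.

    Because the correct answers differ between the two categories, a high
    consistency score suggests the model relies on prior knowledge of the
    illusion rather than on the actual visual content of the image.
    """
    assert len(genuine_answers) == len(fake_answers)
    same = sum(g == f for g, f in zip(genuine_answers, fake_answers))
    return same / len(genuine_answers)


# Toy example (hypothetical answers to "which line looks longer?"-style
# questions); real evaluation would use model outputs on the VQA dataset.
genuine = ["equal", "left", "equal", "top"]
fake = ["equal", "left", "right", "top"]
print(answer_consistency(genuine, fake))  # 3 of 4 pairs match -> 0.75
```

In the paper's framing, a model that truly discerns actual from apparent features should answer the two categories differently, driving this score down.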