keywords:
behavioral science
comparative studies
language understanding
linguistics
Large Language Models (LLMs) display an impressive set of capabilities in linguistic understanding. While advanced models outperform humans on certain tasks, LLM reasoning and linguistic competency differ from those of humans (Felin & Holweg, 2024; Mahowald et al., 2024; Niu et al., 2024). In this study, we evaluate humans and GPT-4o on the Winograd Schema Challenge, a pronoun resolution task. We focus on Japanese, a relatively understudied language in the emergent field of human-LLM evaluation. To compare human and LLM performance, we manipulate task demands and content. We report three findings: (i) Humans outperform LLMs in the baseline condition, i.e. the standard pronoun resolution task. (ii) As task demands increase, performance on the task declines for both humans and LLMs (cf. Hu & Frank, 2024). (iii) We find evidence for content effects (cf. Lampinen et al., 2024): LLMs surpass humans when the content of the task is manipulated to favor LLMs.