United States

Creativity assessment in science and engineering is increasingly based on both human and AI judgment, but the cognitive processes and biases behind these evaluations remain poorly understood. We conducted two experiments examining how including example solutions with ratings impacts creativity evaluation, using a fine-grained annotation protocol in which raters explained their originality scores and rated the facets of remoteness (whether the response is "far" from everyday ideas), uncommonness (whether the response is rare), and cleverness. In Study 1, we analyzed creativity ratings from 72 experts with formal science or engineering training, comparing those who received example solutions with ratings (example) to those who did not (no example). Computational text analysis revealed that, compared to experts with examples, no-example experts used more comparative language (e.g., "better/worse") and emphasized solution uncommonness, suggesting they may have relied more on memory retrieval for comparisons. In Study 2, parallel analyses with state-of-the-art LLMs revealed that models prioritized uncommonness and remoteness of ideas when rating originality, suggesting an evaluative process rooted in the semantic similarity of ideas. In the example condition, LLM accuracy in predicting the true originality scores improved, but the correlations of remoteness, uncommonness, and cleverness with originality also increased substantially, to upwards of 0.99, suggesting a homogenization in the LLMs' evaluation of the individual facets. These findings highlight important implications for how humans and AI reason about creativity and suggest diverging preferences for what different populations prioritize when rating.

CogSci 2025

How do Humans and Language Models Reason About Creativity? A Comparative Analysis

comparative analysis

creativity

psychology

artificial intelligence

natural language processing


poster

### Welcome to CogSci Conference 2025!

The 47th Annual Meeting of the Cognitive Science Society was a hybrid meeting held in San Francisco. 


#### About

The Cognitive Science Society brings together researchers from around the world who hold a common goal: understanding the nature of the human mind. The mission of the Society is to promote Cognitive Science as a discipline, and to foster scientific interchange among researchers in various areas of study, including Artificial Intelligence, Linguistics, Anthropology, Psychology, Neuroscience, Philosophy, and Education.

The Society is a non-profit professional organization and its activities include sponsoring an annual conference and publishing the journals Cognitive Science and TopiCS.

#### Our History 

* **Society Creation**<br>
The Society was incorporated as a 501(c)(3) non-profit professional organization in Massachusetts in 1979. The organizing committee included Roger Schank, Allan Collins, Donald Norman, and a number of other scholars from psychology, linguistics, computer science, and philosophy. 
<br><br>
* **Conference Creation**<br>
The first conference on cognitive science was held in La Jolla, California, in August 1979, and the conference has been held annually since then. The proceedings of each conference are published, and those from most years are available through Lawrence Erlbaum Associates, Inc. The annual proceedings of the Cognitive Science Conference represent a major source of information on new work and new ideas in the scientific study of thinking. In 1990, the Society, with help from an anonymous donor, established the David Marr Prize for the best student paper at each annual meeting.
<br><br>
* **Journal Creation**<br>
The journal Cognitive Science began publication in 1976 and is now published by Wiley-Blackwell. The Executive Editor is currently Richard P. Cooper of Birkbeck, University of London, and there are 18 Associate Editors and a 30-member editorial board. The journal serves as the premier outlet for research reports that intersect two or more disciplines. The Governing Board of the Cognitive Science Society voted in late 2006 to found a new journal, Topics in Cognitive Science (topiCS). The Editor in Chief is Wayne Gray, Cognitive Science Department, Rensselaer Polytechnic Institute. The journal seeks to fill a niche not occupied by Cognitive Science or other cognitive science journals. Membership in the Society includes a subscription to Cognitive Science and TopiCS. Copyrights for articles published in both journals are held by the Society.
<br><br>

#### Code of Conduct

By attending the CogSci 2025 Conference, you are required to adhere to the society’s **[Code of Conduct](https://drive.google.com/file/d/1ChPuihLy6jE_BWqfO7J2KKgX35JW2zsM/view?usp=sharing)**.
<br><br>



The 47th Annual Meeting of the Cognitive Science Society presents the latest research across cognitive science and highlights the theme of Cognition in Context.

Work on human reference processing has shown that, in sentences like “Mary asked her daughter Sally if she understood the assignment”, readers overwhelmingly interpret “she” as co-referring with “Sally”. This reflects perspective inference, or reasoning about who possesses at-issue information, and is inconsistent with a statistically-learned bias toward subject antecedent selections. The flexibility of inferencing is evident from the effect of manipulating the object character description (“Mary asked her tutor…”), where readers now prefer Mary as the antecedent. Until recently, these patterns have been largely unaccounted for by large language models (LLMs). Leveraging advancements in LLM interpretability techniques, the present study systematically examines how LLMs fare in relation to human judgments. We determine which layer activations impact these inferences and perturb them to causally link activations to model performance. Finally, we examine performance across training iterations, analyzing the point where subjecthood biases become evident and when more nuanced inferencing emerges.

The emergence of flexible perspective reasoning in large language models

Classifier choice has been widely studied, with previous research highlighting the influence of semantic features such as shape and animacy. This study, however, demonstrates that classifier choice (specifically, the selection between general and specific classifiers) is also influenced by taxonomic categorization, where nouns are divided into three levels based on specificity: basic (e.g., “apple”), superordinate (e.g., “fruit”), and subordinate (e.g., “golden apple”). A picture naming task was conducted, and our findings reveal a tendency for individuals to favor specific classifiers more when nouns are at the basic level than at the subordinate level. This challenges the prevalent assumption that general classifiers are predominantly chosen. We attribute this tendency to a cognitive economy principle and propose a novel explanation for classifier choice based on the theory of Uniform Information Density, a perspective rejected in previous studies. Overall, this research suggests new directions for investigating the cognitive and linguistic factors influencing classifier choice.

The Impact of Taxonomic Levels on Classifier Choice in Mandarin Chinese

Computational modelling offers a powerful tool for formalising psychological theories, making them more transparent, testable, and applicable in digital contexts. Yet, the question often remains: how should one computationally model a theory? We provide a demonstration of how formalisms taken from artificial intelligence can offer a fertile starting point. Specifically, we focus on the "need for competence", postulated as a key basic psychological need within Self-Determination Theory (SDT)—arguably the most influential framework for intrinsic motivation (IM) in psychology. Recent research has identified multiple distinct facets of competence in key SDT texts: effectance, skill use, task performance, and capacity growth. We draw on the computational IM literature in reinforcement learning to suggest that different existing formalisms may be appropriate for modelling these different facets. Using these formalisms, we reveal underlying preconditions that SDT fails to make explicit, demonstrating how computational models can improve our understanding of IM. More generally, our work can support a cycle of theory development by inspiring new computational models, which can then be tested empirically to refine the theory. Thus, we provide a foundation for advancing competence-related theory in SDT and motivational psychology more broadly.

Towards a Formal Theory of the Need for Competence via Computational Intrinsic Motivation

How we update our beliefs when encountering new evidence is the basis of evidential reasoning. Often, this will involve weighing up multiple pieces of evidence communicated to us by several sources (i.e., testimony). However, the testimonies of multiple sources are rarely truly independent; they may have used the same data or evidence, have the same training or background, or simply be repeating the same story as another. The nature of these dependencies among our evidence items is normatively impactful on the conclusions we should draw. Here we investigate whether participants are sensitive to such complex, yet impactful influences on their reasoning. We find a general preference for source diversity that heuristically gels with normative assertions. To our knowledge, this is the first paradigm that integrates shared background, shared evidence, and corroboration in the same design. We discuss challenges in developing and testing the intricacies of such a design.

Impact of sequential reports with different source dependencies

What drives an agent to explore the world while also maintaining control over the environment? From a child at play to scientists in the lab, intelligent agents must balance curiosity (the drive to seek knowledge) with competence (the drive to master and control the environment). Bridging cognitive theories of intrinsic motivation with reinforcement learning, we ask how evolving internal representations mediate the trade-off between curiosity (novelty or information gain) and competence (empowerment). We compare two model-based agents, one using handcrafted state abstractions (Tabular) and one learning an internal world model (Dreamer). The Tabular agent shows that curiosity and competence guide exploration in distinct patterns, while prioritizing both improves exploration. The Dreamer agent reveals a two-way interaction between exploration and representation learning, mirroring the developmental co-evolution of curiosity and competence. Our findings formalize adaptive exploration as a balance between pursuing the unknown and the controllable, offering insights for cognitive theories and efficient reinforcement learning.

From Curiosity to Competence: How World Models Interact with the Dynamics of Exploration

When things are perceived clearly they can be detected with confidence. But under what conditions can one be confident that something is absent? Here we use a meta-perceptual illusion to show that confidence in absence scales not with visibility itself, but with the subjective belief that a stimulus would have been visible, if present. In two pre-registered experiments, participants detected the presence or absence of letters in frames of dynamic noise, and rated their decision confidence. Across trials, stimuli could appear bigger or smaller. Critically, while perceptual sensitivity was increased for smaller stimuli, participants’ meta-perceptual beliefs (measured with post-experiment debriefing and prospective confidence ratings) were that larger letters were easier to detect. Accordingly, while confidence in presence scaled with objective visibility (and was therefore higher for smaller stimuli), confidence in absence scaled with beliefs about counterfactual visibility (and was therefore higher for bigger stimuli). This dissociation between the effect of stimulus size on confidence in presence and absence diminished as the experiment progressed: a sign of meta-perceptual learning. Furthermore, the effect of size on confidence in absence, but not in presence, correlated with a meta-perceptual parameter from an ideal observer model of perceptual detection, fitted to decision and response time data alone. Overall, we conclude that confidence in absence closely tracked participants’ model-derived expectations about the visibility of counterfactual stimuli.

Confidence in absence as confidence in counterfactual visibility

Number-line estimation tasks (NLETs) have been used to assess symbolic numerical skills (SNS; Booth & Siegler, 2008; Lyons & Ansari, 2015) and have also been associated with the approximate number system (ANS; Khanum et al., 2016; Wong et al., 2016). A recent study with 6–7-year-old children in Sweden (Morell-Ruiz et al., 2025) provided evidence that training NLE abilities can help bridge these two numerical systems, suggesting that the ANS may actively scaffold the development of the SNS. Building on this, we designed a novel two-choice NLET compatible with Drift Diffusion Model (DDM) fitting, allowing us to decompose children’s estimation processes into interpretable parameters. Our results show that DDM parameters significantly correlate with performance in both symbolic and nonsymbolic tasks, and that performance on the two-choice and standard NLETs is strongly correlated. These findings validate our paradigm, offering new insights into the cognitive mechanisms linking numerical representations via number-line estimation.

On the Role of Nonsymbolic and Symbolic Numeracy Skills in Number-Line Estimation Processes

Building on Shin (2022), the present study examines how Korean monolingual children comprehend suffixal passive constructions by employing a webcam eye-tracking method, aiming to test two theoretical accounts of grammatical generalisation (gradual vs. early abstraction). Twenty-eight children aged three to six, alongside 20 adults, joined picture-selection experiments paired with eye-gaze measurements. The findings indicate that children’s utilisation of passive-voice heuristics remains limited yet developing, overshadowed by well-entrenched active-voice knowledge. In particular, the eye-gaze data reveal processing challenges related to the passive voice, mainly interpretive difficulties arising from passive morphology. These results replicate those of Shin (2022), offering further support for a moderate version of each account that emphasises the pivotal role of linguistic exposure in mastering linguistic knowledge. From a methodological standpoint, this study enhances the accessibility of webcam eye-tracking research for understudied languages in the field.
