United States

When presented with a yes-no question, humans tend to say 'yes' regardless of the ground truth. This 'yes-bias' can be attributed either to social pressure to agree with an interlocutor or simply to a tendency to mimic the distribution of the input data. Here, we estimate 'yes-no' response bias in language models (LMs), with the goal of distinguishing between these two theories, and explore two strategies for bias correction. We develop two yes-no question datasets derived from existing world-knowledge datasets and test 16 open-weight LMs. We find that LMs often show response bias on yes-no questions, but that this bias is highly variable and deviates from the bias observed in humans. We further present a novel bias correction method, which eliminates bias and improves model performance. Evidence of non-humanlike response bias in LMs informs us about the source of yes-bias in humans, and the efficacy of our bias correction method holds promise for LM evaluation.

CogSci 2025

Estimating and Correcting Yes-No Bias in Language Models

language and thought

computational modeling

artificial intelligence

machine learning

reasoning


poster
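The abstract does not spell out how response bias is estimated or corrected. As a minimal sketch under our own assumptions (not the authors' method), yes-bias can be summarized in two ways: the excess of a model's "yes" rate over the true "yes" rate, and the signal-detection criterion c = -(z(H) + z(F))/2, where H is the hit rate, F is the false-alarm rate, and c < 0 indicates a leaning toward "yes". The hypothetical helper `median_threshold_answers` illustrates one simple correction, shifting the decision threshold on a model's log P("yes") - log P("no") scores to their median; this assumes a dataset with equal numbers of "yes" and "no" items.

```python
from statistics import NormalDist, median


def response_bias(predictions, labels):
    """Summarize yes-no response bias (hypothetical helper).

    predictions, labels: equal-length sequences of booleans (True = "yes").
    Returns (excess_yes, c): the model's "yes" rate minus the true "yes" rate,
    and the signal-detection criterion c (c < 0: leans "yes"; c > 0: leans "no").
    """
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need equal-length, non-empty inputs")
    n_yes = sum(labels)
    n_no = len(labels) - n_yes
    hits = sum(pred and gold for pred, gold in zip(predictions, labels))
    false_alarms = sum(pred and not gold for pred, gold in zip(predictions, labels))
    # Log-linear correction: add 0.5 to each count and 1 to each total so both
    # rates stay strictly between 0 and 1 and the inverse normal CDF is finite.
    hit_rate = (hits + 0.5) / (n_yes + 1)
    fa_rate = (false_alarms + 0.5) / (n_no + 1)
    z = NormalDist().inv_cdf
    c = -(z(hit_rate) + z(fa_rate)) / 2
    excess_yes = (sum(predictions) - n_yes) / len(labels)
    return excess_yes, c


def median_threshold_answers(logodds):
    """Hypothetical correction: answer "yes" only when the score
    log P("yes") - log P("no") exceeds the median score, which yields
    roughly half "yes" answers (assumes a balanced dataset)."""
    threshold = median(logodds)
    return [score > threshold for score in logodds]


# Usage: a model that says "yes" to 7 of 8 questions, only 4 of which are true.
preds = [True] * 7 + [False]
gold = [True] * 4 + [False] * 4
excess, c = response_bias(preds, gold)  # excess == 0.375, c < 0 (yes-bias)
```

Using both numbers separates two things a raw accuracy score conflates: how much more often the model says "yes" than the data warrant, and where its decision criterion sits independently of how well it discriminates true from false items.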

### Welcome to CogSci Conference 2025!

The 47th Annual Meeting of the Cognitive Science Society was a hybrid meeting held in San Francisco. 


#### About

The Cognitive Science Society brings together researchers from around the world who hold a common goal: understanding the nature of the human mind. The mission of the Society is to promote Cognitive Science as a discipline, and to foster scientific interchange among researchers in various areas of study, including Artificial Intelligence, Linguistics, Anthropology, Psychology, Neuroscience, Philosophy, and Education.

The Society is a non-profit professional organization and its activities include sponsoring an annual conference and publishing the journals Cognitive Science and TopiCS.

#### Our History 

* **Society Creation:** The Society was incorporated as a 501(c)(3) non-profit professional organization in Massachusetts in 1979. The organizing committee included Roger Schank, Allan Collins, Donald Norman, and a number of other scholars from psychology, linguistics, computer science, and philosophy.

* **Conference Creation:** The first conference on cognitive science was held in La Jolla, California, in August 1979, and has occurred annually since then. The proceedings of each conference are published, and those from most years are available through Lawrence Erlbaum Associates, Inc. The annual proceedings of the Cognitive Science Conference represent a major source of information on new work and new ideas in the scientific study of thinking. In 1990, the Society, with help from an anonymous donor, established the David Marr Prize for the best student paper at each annual meeting.

* **Journal Creation:** The journal Cognitive Science began publication in 1976 and is now published by Wiley-Blackwell. The Executive Editor is currently Richard P. Cooper of Birkbeck, University of London, and there are 18 Associate Editors and a 30-member editorial board. It serves as the premier outlet for research reports that intersect two or more disciplines. Copyrights for articles published in the journal are held by the Society. In late 2006, the Governing Board of the Cognitive Science Society voted to found a new journal, Topics in Cognitive Science (topiCS). The Editor in Chief is Wayne Gray of the Cognitive Science Department at Rensselaer Polytechnic Institute. topiCS seeks to fill a niche not occupied by Cognitive Science or other cognitive science journals, and copyrights for its articles are likewise held by the Society. Membership in the Society includes a subscription to both Cognitive Science and topiCS.

#### Code of Conduct

By attending the CogSci 2025 Conference, you are required to adhere to the Society’s **[Code of Conduct](https://drive.google.com/file/d/1ChPuihLy6jE_BWqfO7J2KKgX35JW2zsM/view?usp=sharing)**.



The 47th Annual Meeting of the Cognitive Science Society presents the latest research across cognitive science and highlights the theme of Cognition in Context.

To fulfill their goals, humans use cognitive control, a suite of processes for planning and managing thoughts and actions. One such process is response inhibition, which entails stopping a response when an action becomes inappropriate. Traditionally, response inhibition is measured in experimental settings in which humans bear sole responsibility for inhibiting the action. However, in the real world, humans increasingly share control with artificial intelligence (AI), the paradigmatic case being partially automated vehicles. We designed an experiment that captures some aspects of partially automated vehicles and found that when humans share control with an AI that often, but not always, stops, human response inhibition is significantly slowed even when the AI does not intervene. This reveals that sharing control imposes a cost on human cognitive control, suggesting that the benefits of partial automation should be weighed against the costs of impaired human control.

Shared control impairs cognitive control: Human responses inhibition slows when machines fail to inhibit 

Many languages mark either accusative case (for objects of transitives) or ergative case (for subjects of transitives), but some 'split ergative' languages mix the two systems depending on the type of nominal. It has been noted that these languages tend towards marking the less frequent case for each nominal type. This raises the question of what mechanism could underlie the emergence of such an efficient system. We propose a model that can provide an explanation, based on a simple reinforcement learning framework and simple assumptions about asymmetries between the kinds of nominals (e.g., pronouns vs. full noun phrases) that appear in subject vs. object position.

Reinforcement learning produces efficient case-marking systems

Learning language requires learning not only the content of language but also how to use language to communicate. Iterated reference games provide a window into such skills, requiring rich communication as participants converge on mutually understandable names for initially novel referents. Some early experiments have been interpreted as evidence that 4-5-year-old children cannot converge on the mutually understandable names needed to communicate in an iterated reference game. Here, we revisit young children's referential communication abilities using a simpler, child-friendly paradigm. Across 51 pairs of children, we found that 4-5-year-olds successfully established reference with each other: children were 85% accurate, and they often used descriptions similar to their partner's. These findings suggest that children’s capacity to construct effective referring expressions in novel contexts emerges earlier than once thought, consistent with the view that children show early pragmatic competence in supportive contexts.

Preschoolers can form conventional pacts with each other to communicate about novel referents

Previous studies have observed that neural network models develop numerosity-selective units when trained to perform object classification, without explicit training on numerosity. However, this emergentist view was challenged by the finding that selectivity disappears when larger samples are used to evaluate the models. Here, we investigate whether this finding was due to a qualitative visual mismatch between training and evaluation data. We present experiments with three types of neural networks, optimized for object classification, numerosity, or both. Using a novel dataset in which both training and evaluation images include everyday objects, we analyze layer-level and single-unit selectivity across a range of conditions, varying the visual properties of our evaluation images. Our results suggest that numerosity classification performance is exclusive to numerosity-trained networks. Moreover, we observe a discrepancy between single-unit numerosity selectivity and overall network performance. This suggests that numerosity may be represented through different encoding patterns than previously assumed.

CNNs Generalize Numerosity Across Naturalistic Stimuli Without Single-Unit Selectivity

When first meeting somebody, we’re faced with the challenge of “getting to know them.” Why do some questions seem to enable this better than others? In Experiment 1, participants (N=185) evaluated a large bank of conversational questions. We found that questions varied along a reliable latent dimension of interpersonal depth ranging from “small talk” to “deep” questions. In Experiment 2 (N=188), participants answered a subset of these questions along with a number of self-report personality scales. Using a language model to estimate how informative participants’ free responses were, we find that individualized personality predictions were more accurate when incorporating free responses; furthermore, responses to deeper questions supported more accurate personality inferences than small talk. Taken together, results suggest not only that responses contained the statistical information necessary to make abstract social inferences, but also that people have accurate intuitions about which conversational topics enable learning about and connecting with others.

How do we get to know someone? Diagnostic questions for inferring personal traits

Coordination studies reveal that groups can achieve performance exceeding the sum of individual contributions (Bahrami et al., 2010). Further evidence suggests that weak coupling maximizes the benefits of coordinated problem-solving (Abney et al., 2015; Schloesser et al., 2021). This work develops a computational framework to study coordination in coupled systems. We trained two echo state networks (ESNs) to classify cepstrum-coded speech signals from nine native Japanese speakers (Kudo et al., 1999). Coupling ESN feedback during testing reveals a nonlinear relationship between joint performance and coupling: moderate coupling (feedback integrates readout states from both networks) enhances performance, whereas full coupling (feedback is swapped between networks) returns performance to that of independent networks. These results suggest that while interaction between networks can enhance performance, excessive integration may diminish the benefits of independent contributions (cf. Fusaroli et al., 2012). Our model provides a novel, formal framework for explaining interaction dynamics in collective intelligences.

Coupled echo state networks as a model of task-oriented alignment

Recent theories suggest that metacognitive development is affected by cultural context. However, cross-cultural research on metacognition is sparse and often involves verbal assessment (e.g., "How sure are you that your answer is correct?"), which might not have cross-cultural validity. The present study assessed metacognition by coding children’s naturalistic behavior in a problem-solving task. Participants had to assemble objects to build a track according to a model. We compared Kenyan, Chinese, and US children’s metacognitive strategies (N=95; 6-10-year-olds). Results revealed that Chinese children relied more on monitoring strategies (e.g., checking the model) than Kenyan and US children, whereas Kenyan children relied more on control strategies (e.g., organizing workspace) than US and Chinese children. Moreover, in all cultures, the number of metacognitive strategies used increased with age. The results suggest differences and similarities in the preferred metacognitive strategies of children across diverse societies.

Kenyan, Chinese and US children rely on different metacognitive strategies when solving a problem

An important issue in event cognition concerns how activities come to mind when people think about events (e.g., eating at a restaurant). Linear theories suggest that people think of activities in temporal order, whereas hierarchical theories suggest that activities come to mind based on their centrality (i.e., importance). The current study used five network science centrality measures (CheiRank, PageRank, 2D Rank, Betweenness, and Closeness) derived from 80 temporally structured event networks to predict participants’ centrality and standardness rankings and ratings. Participants were presented with 40 events, each with 4-10 activities, and ranked or rated each activity’s centrality or standardness. Linear mixed-effects regression showed that CheiRank, which assigns importance to activities that have many influential outgoing links, was the strongest predictor. This suggests that people’s understanding of centrality relates to the degree to which an activity leads to other activities, supporting hierarchical models and the Event Horizon Model.

Using Network Science to Measure Centrality and Standardness in Event Knowledge
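The event-knowledge abstract above describes CheiRank as assigning importance to activities with many influential outgoing links. As a minimal sketch (not the study's implementation), CheiRank can be computed as PageRank on the event network with every edge reversed. The damping factor of 0.85, the iteration count, the function names, and the five-activity restaurant chain in the usage example are illustrative assumptions.

```python
def pagerank(edges, nodes, damping=0.85, iters=100):
    """Power-iteration PageRank over directed (src, dst) edges.
    Rank held by nodes with no outgoing links is spread uniformly."""
    n = len(nodes)
    out = {v: [] for v in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if not out[v])
        # Teleport share plus the redistributed dangling mass.
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out[v]:
                new[w] += damping * rank[v] / len(out[v])
        rank = new
    return rank


def cheirank(edges, nodes, damping=0.85, iters=100):
    """CheiRank: PageRank on the reversed graph, so an activity scores highly
    when it leads (directly or indirectly) to many other activities."""
    return pagerank([(dst, src) for src, dst in edges], nodes, damping, iters)


# Usage: a hypothetical, strictly temporal restaurant script.
activities = ["enter", "order", "eat", "pay", "leave"]
links = list(zip(activities, activities[1:]))  # enter -> order -> ... -> leave
ranks = cheirank(links, activities)
top = max(ranks, key=ranks.get)  # "enter"
```

In a strictly linear chain like this one, PageRank favors the last activity while CheiRank favors the first, because the first activity leads to all the others; richer event networks with branching links are where the two measures diverge in more interesting ways.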

Humans often infer the state of the world by observing how others interact with it—when crossing a street, for instance, we may follow the movement of others without directly seeing the traffic. This ability to extract hidden information from human interactions with the environment is crucial for adaptive behavior. In this study, we explore how people make such inferences in Spot the Ball, a task where participants predict the location of a masked soccer ball in single-frame images. We created a large dataset by scraping YouTube videos, identifying compelling images using CLIP, and masking the soccer ball through inpainting. Our findings show that human participants rely heavily on pose and gaze cues to infer the ball’s location. While providing this information improves GPT-4o’s performance, it remains significantly below human accuracy. These results highlight the significance of intention inference, with potential applications in self-driving cars, assistive AI, and humanoid robotics.

Spot the ball: Inferring Hidden Information from Human Behavioral Cues

Thought experiments have been credited with generating new knowledge in the history of science. Although many parallels have been drawn between the thinking of scientists and children, it is not clear whether children can generate new knowledge via thought experiments. We tested whether an extreme-case thought experiment can help 6- to 9-year-olds overcome the misconception that heavier, rather than larger, objects displace more water. Seventy children (mean age = 88.94 months) were assigned to either a Control condition or an Extreme Case condition; the latter was designed to elicit children’s existing understanding of solidity, namely that two material objects cannot occupy the same space at the same time. Children received no feedback in either condition. We found that children in the Extreme Case condition performed better on both the Learning and Far Transfer trials, suggesting that thought experiments can serve as a learning tool in childhood.

Downloads

Next from CogSci 2025

Shared control impairs cognitive control: Human responses inhibition slows when machines fail to inhibit

Stay up to date with the latest Underline news!

PRESENTATIONS

CONFERENCES

COMPANY

RESOURCES
