Singapore

Data filtering strategies are a crucial component in developing safe Large Language Models (LLMs), since they support the removal of harmful content from pretraining datasets. However, there is a lack of research on the actual impact of these strategies on groups vulnerable to discrimination, and their effectiveness has not yet been systematically assessed. In this paper we present a benchmark study of data filtering strategies for harm reduction, aimed at providing a systematic evaluation of these approaches. We provide an overview of 55 technical reports of English LMs and LLMs to identify the existing filtering strategies in the literature, and implement an experimental setting to test their impact on vulnerable groups. Our results show that the positive impact these strategies have in reducing harmful content in documents has the side effect of increasing the underrepresentation of groups vulnerable to discrimination in datasets.
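The side effect described in this abstract can be illustrated with a toy word-list filter. This is a minimal sketch and not the paper's experimental setting: the blocklist entries, the group terms, and the six-document corpus are all hypothetical placeholders chosen for illustration.

```python
# Hypothetical placeholders; none of these lists come from the paper.
BLOCKLIST = {"slur_a", "slur_b"}
GROUP_TERMS = {"women", "migrants", "muslims"}

def tokens(doc):
    """Lowercase whitespace tokens with trailing punctuation stripped."""
    return [w.strip(".,!?").lower() for w in doc.split()]

def keep(doc):
    """Word-list filter: drop any document containing a blocklisted token."""
    return not any(t in BLOCKLIST for t in tokens(doc))

def mention_share(docs):
    """Fraction of documents that mention at least one group term."""
    if not docs:
        return 0.0
    hits = sum(1 for d in docs if any(t in GROUP_TERMS for t in tokens(d)))
    return hits / len(docs)

# Made-up corpus: harmful documents tend to mention the groups they target.
corpus = [
    "Migrants opened a new bakery downtown.",
    "The weather was mild this week.",
    "Some slur_a comment aimed at migrants.",
    "Women reported hearing slur_b at work.",
    "Stock prices rose slightly.",
    "The match ended in a draw.",
]
filtered = [d for d in corpus if keep(d)]
before = mention_share(corpus)    # share of documents mentioning a group
after = mention_share(filtered)   # same share after filtering
```

In this toy corpus the filter removes both harmful documents, but the share of documents that mention any group also falls, because one of the removed documents was a report about harassment rather than an instance of it.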

AAAI 2026

What Are They Filtering Out? An Experimental Benchmark of Filtering Strategies for Harm Reduction in Pretraining Datasets

data filtering strategies

pretraining datasets

underrepresentation

bias

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held at Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.




Traffic prediction is a cornerstone of modern intelligent transportation systems and a critical spatio-temporal forecasting task.
Although advanced Spatio-temporal Graph Neural Networks (STGNNs) and pre-trained models have made significant progress in traffic prediction, two critical challenges persist: (i) limited contextual capacity when handling complex spatio-temporal dependencies, and (ii) low predictability at fine-grained spatio-temporal points caused by heterogeneous patterns.
Inspired by Retrieval-Augmented Generation (RAG), we propose **RAST**, a universal framework that integrates retrieval-augmented mechanisms with spatio-temporal modeling to address these challenges.
Our framework consists of three key designs: 1) Decoupled Encoder and Query Generator to capture decoupled spatial and temporal features and construct a fusion query via residual fusion; 2) Spatio-temporal Retrieval Store and Retrievers to maintain and retrieve vectorized fine-grained patterns; and 3) Universal Backbone Predictor that flexibly accommodates pre-trained STGNNs or simple MLP predictors. 
Extensive experiments on 6 real-world traffic networks, including large-scale datasets, demonstrate that RAST achieves superior performance while maintaining computational efficiency.
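The retrieval idea in this abstract can be sketched as a small vector store queried by similarity. This is a hedged illustration only: the class name `RetrievalStore`, the cosine ranking, and the additive `augment` step are assumptions made for this example, not RAST's actual architecture or API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class RetrievalStore:
    """Hypothetical store of (key vector, pattern vector) pairs."""

    def __init__(self):
        self.keys, self.values = [], []

    def add(self, key, value):
        self.keys.append(key)
        self.values.append(value)

    def retrieve(self, query, k=2):
        """Return the k stored patterns whose keys best match the query."""
        ranked = sorted(range(len(self.keys)),
                        key=lambda i: cosine(query, self.keys[i]),
                        reverse=True)
        return [self.values[i] for i in ranked[:k]]

def augment(query, retrieved):
    """Assumed combination rule: query plus the mean retrieved pattern."""
    if not retrieved:
        return list(query)
    mean = [sum(col) / len(retrieved) for col in zip(*retrieved)]
    return [q + m for q, m in zip(query, mean)]

store = RetrievalStore()
store.add([1.0, 0.0], [0.2, 0.0])   # made-up pattern A
store.add([0.9, 0.1], [0.4, 0.0])   # made-up pattern B, close to A
store.add([0.0, 1.0], [0.0, 0.9])   # made-up pattern C, unrelated
query = [1.0, 0.05]
neighbours = store.retrieve(query, k=2)
augmented = augment(query, neighbours)
```

A downstream predictor would then consume `augmented` instead of the raw query. In the abstract's terms this is the role of the backbone predictor, which may be a pre-trained STGNN or a simple MLP.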

A Retrieval Augmented Spatio-Temporal Framework for Traffic Prediction

As machine learning systems become increasingly integrated into human-centered domains such as healthcare, ensuring fairness while maintaining high predictive performance is critical. Existing bias mitigation techniques often impose a trade-off between fairness and accuracy, inadvertently degrading performance for certain demographic groups. In high-stakes domains like clinical diagnosis, such trade-offs are ethically and practically unacceptable. In this study, we propose a fairness-without-harm approach by learning distinct representations for different demographic groups and selectively applying demographic experts consisting of group-specific representations and personalized classifiers through a no-harm constrained selection. We evaluate our approach on three real-world medical datasets—covering eye disease, skin cancer, and X-ray diagnosis—as well as two face datasets. Extensive empirical results demonstrate the effectiveness of our approach in achieving fairness without harm.

Achieving Fairness Without Harm via Selective Demographic Experts

Fairness studies of algorithmic decision-making systems often simplify complex decision processes, such as bail or lending decisions, into binary classification tasks (e.g., approve or not approve). However, these approaches overlook that such decisions are not inherently binary; they also involve non-binary treatment decisions (e.g., loan or bail terms) that can influence the downstream outcomes (e.g., loan repayment or reoffending). We argue that treatment decisions are integral to the decision-making process and, therefore, should be central to fairness analyses.
Consequently, we propose a causal framework that extends and complements existing fairness notions by explicitly distinguishing between decision-subjects’ covariates and the treatment decisions. Our framework leverages path-specific counterfactual reasoning to: (i) measure treatment disparity and its downstream effects in historical data; and (ii) mitigate the impact of past unfair treatment decisions when automating decision-making. We use our framework to empirically analyze four widely used loan approval datasets to reveal potential disparity in non-binary treatment decisions and their discriminatory impact on outcomes, highlighting the need to incorporate treatment decisions in fairness assessments. Finally, by intervening in treatment decisions, we show that our framework effectively mitigates treatment discrimination from historical loan approval data to ensure fair risk score estimation and (non-binary) decision-making processes that benefit all stakeholders.

A Causal Framework to Measure and Mitigate Non-binary Treatment Discrimination

As AI moves into high-stakes, human-centered settings, we still lack clear evidence on when and why these systems succeed or fail. This meta-analysis synthesizes all empirical studies published between 2022 and 2025 that use social-media data to predict depression, quantifying pooled accuracy and testing study-level moderators. By showing how data sources and model architecture shape outcomes, we offer an empirical foundation for a more reliable, socially aware deployment of AI in mental health.

Across 67 studies, overall performance is strong (pooled r ≈ 0.80) and climbs even higher in 2024, driven by deep, transformer-based and multimodal systems. The gains, however, are uneven: post-level binary detectors improve the most, user-level severity estimation still lags, and results hinge as much on label provenance and platform context as on model size—highlighting a persistent gap between leaderboard success and clinically meaningful reliability.

To address that gap, we propose a Psych-Aligned Evaluation Framework that maps predictions onto validated symptom dimensions and adds three deployment-critical tests—PHQ error, temporal stability, and clinician agreement. This framework converts single-number benchmarks into a multidimensional yardstick for real-world, psychologically meaningful depression detection.
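For readers unfamiliar with how a pooled correlation such as the r ≈ 0.80 above is computed, the following is a minimal sketch using Fisher's z-transform with fixed-effect, inverse-variance weights. The abstract does not state which pooling model the authors used (a random-effects model is common in meta-analysis), so the weighting here is a simplifying assumption, and the `(r, n)` pairs are made up.

```python
import math

def pooled_r(studies):
    """Fixed-effect pooled correlation from (r, n) pairs via Fisher's z.

    Each study needs n > 3, since the variance of z is 1 / (n - 3).
    """
    num = den = 0.0
    for r, n in studies:
        z = math.atanh(r)   # Fisher z-transform of the correlation
        w = n - 3           # inverse-variance weight
        num += w * z
        den += w
    return math.tanh(num / den)  # back-transform the weighted mean to r

# Made-up study-level correlations and sample sizes, for illustration only.
studies = [(0.75, 120), (0.82, 300), (0.79, 80)]
estimate = pooled_r(studies)
```

Larger studies pull the estimate toward their own correlation, and the result always lies between the smallest and largest study-level r.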

AI in the Wild: A Meta-Analytic Evaluation of Depression Detection from Social Media Data

Real-world adoption of closed-loop insulin delivery systems (CLIDS) in type 1 diabetes remains low, driven not by technical failure, but by diverse behavioral, psychosocial, and social barriers. We introduce ChatCLIDS, the first benchmark to rigorously evaluate LLM–driven persuasive dialogue for health behavior change. Our framework features a library of expert-validated virtual patients, each with clinically grounded, heterogeneous profiles and realistic adoption barriers, and simulates multi-turn interactions with nurse agents equipped with a diverse set of evidence-based persuasive strategies. ChatCLIDS uniquely supports longitudinal counseling and adversarial social influence scenarios, enabling robust, multi-dimensional evaluation. Our findings reveal that while larger and more reflective LLMs adapt strategies over time, all models struggle to overcome resistance, especially under realistic social pressure. These results highlight critical limitations of current LLMs for behavior change, and offer a high-fidelity, scalable testbed for advancing trustworthy persuasive AI in healthcare and beyond.

ChatCLIDS: Simulating Persuasive AI Dialogues to Promote Closed-Loop Insulin Adoption in Type 1 Diabetes Care

Culture shapes the objects people use and for what purposes, yet mainstream Vision-Language (VL) datasets frequently exhibit cultural biases, disproportionately favoring higher-income, Western contexts. This imbalance reduces model generalizability and perpetuates performance disparities, especially impacting lower-income and non-Western communities. To address these disparities, we propose a novel function-centric framework that categorizes objects by the functions they fulfill, across diverse cultural and economic contexts. We implement this framework by creating the Culture Affordance Atlas, a re-annotated and culturally grounded restructuring of the Dollar Street dataset spanning 46 functions and 288 objects. Through extensive empirical analyses using the CLIP model, we demonstrate that function-centric labels reduce socioeconomic performance gaps between high- and low-income groups by a median of 6 pp (statistically significant), improving model effectiveness for lower-income contexts. Furthermore, our analyses reveal numerous culturally essential objects that are frequently overlooked in prominent VL datasets. Our contributions offer a scalable pathway toward building inclusive VL datasets and equitable AI systems.

Culture Affordance Atlas: Reconciling Object Diversity Through Functional Mapping

Detecting AI-generated images remains a formidable challenge due to the difficulty of generalizing across novel generative models and paradigms. This generalization gap mainly stems from overfitting to semantic content and model-specific patterns. Moreover, many state-of-the-art detectors employ complex architectures and heavy computational procedures, limiting their practicality in real-world deployments. We propose **RealNet**, a novel unsupervised framework that learns a disentangled, forgery-aware representation space solely from real images, mitigating overfitting to both semantic and model-specific information. Our approach extracts semantic-agnostic representations via a dual adversarial denoising mechanism, yielding compact features with low intra-class variance. These are perturbed in feature space to produce pseudo-negative samples for training a lightweight discriminator, enabling robust detection without dependence on fake samples. Extensive evaluation across diverse generative paradigms, including an expanded benchmark of state-of-the-art VAR-based models, demonstrates RealNet’s superior generalization capabilities and robustness. It delivers **4.51%** and **3.93%** average improvements in accuracy and average precision over current state-of-the-art methods, all while incurring low computational cost through its lightweight and unsupervised design. Additionally, we introduce a medically relevant forged image dataset, confirming RealNet’s effectiveness in high-stakes, domain-shifted scenarios. These advantages make RealNet a practical and scalable solution for AI-generated image detection with strong potential for real-world and social impact.

RealNet: Efficient and Unsupervised Detection of AI-Generated Images via Real-Only Representation Learning

Large language models (LLMs) are increasingly used to simulate human behavior in high-stakes social settings such as legal mediation, negotiation, and dispute resolution. However, it remains unclear whether LLM-based models simulating human behavior accurately represent the underlying psychological mechanisms. Human personality, for instance, may shape how individuals navigate social interactions, including strategic choices and behaviors in emotionally charged interactions. This raises a critical question: Can LLMs, when prompted with personality traits, reproduce personality-driven differences in human conflict behavior? To explore this, we introduce an evaluation framework that compares human-human and LLM-LLM behaviors in dispute resolution dialogues with respect to Big Five Inventory personality traits. Our contributions include a novel methodology for creating a dataset of human and LLM dialogues with matched scenarios and personality traits, and a set of interpretable metrics capturing strategic and conflict outcome dynamics. We analyze three recent closed-source LLMs and show significant divergences in how personality manifests in conflict across different LLMs compared to human data, challenging the assumption that personality-prompted agents can serve as reliable behavioral proxies in socially impactful applications. Our work highlights the need for psychological grounding and rigorous validation in AI simulations before real-world use.

Can LLMs Truly Embody Human Personality? Analyzing AI and Human Behavior Alignment in Dispute Resolution

Predictive modeling in high-stakes domains often suffers from limited observed features due to ethical and practical constraints. To address this challenge, we propose a novel approach that formulates latent feature mining as a text-to-text propositional logic reasoning task, facilitating domain knowledge integration and improving the interpretability of latent features. We design FLAME, a domain knowledge-augmented reasoning framework for latent feature mining, offering an efficient training paradigm to strengthen the domain-specific reasoning capabilities of large language models (LLMs) for latent feature extraction. The goal of our framework is to augment observed features with inferred latent features, enhancing the performance of predictive models in downstream machine learning tasks. We validate our approach through two case studies: (1) the criminal justice system, where data collection is ethically challenging and inherently limited, and (2) the healthcare domain, where patient privacy concerns and the complexity of medical data restrict comprehensive feature collection. Experimental results demonstrate that the inferred latent features significantly enhance the performance of downstream classifiers by over 10%.

Enhancing Predictive Model Learning via Domain-Knowledge Augmented Latent Feature Mining

Clinical case reports encode temporal patient trajectories that are often underexploited by traditional machine learning methods relying on structured data. In this work, we introduce the problem of forecasting from textual time series, where timestamped clinical findings, extracted via an LLM-assisted annotation pipeline, serve as the primary input for prediction. We systematically evaluate a diverse suite of models, including fine-tuned decoder-based large language models and encoder-based transformers, on tasks of event occurrence prediction, temporal ordering, and survival analysis. Our experiments reveal that encoder-based models consistently achieve higher F1 scores and superior temporal concordance for short- and long-horizon event forecasting, while fine-tuned masking approaches enhance ranking performance. In contrast, instruction-tuned decoder models demonstrate a relative advantage in survival analysis, especially in early prognosis settings. Our sensitivity analyses further demonstrate the importance of time ordering, which requires constructing clinical time series, as compared to text ordering, the format of the text inputs on which LLMs are classically trained. This highlights the additional benefit that can be obtained from time-ordered corpora, with implications for temporal tasks in the era of widespread LLM use.
