As AI moves into high-stakes, human-centered settings, we still lack clear evidence on when and why these systems succeed or fail. This meta-analysis synthesizes all empirical studies published between 2022 and 2025 that use social-media data to predict depression, quantifying pooled accuracy and testing study-level moderators. By showing how data sources and model architecture shape outcomes, we offer an empirical foundation for a more reliable, socially aware deployment of AI in mental health.
Across 67 studies, overall performance is strong (pooled r ≈ 0.80) and climbs even higher in 2024, driven by deep, transformer-based and multimodal systems. The gains, however, are uneven: post-level binary detectors improve the most, user-level severity estimation still lags, and results hinge as much on label provenance and platform context as on model size—highlighting a persistent gap between leaderboard success and clinically meaningful reliability.
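The pooled r reported above is a meta-analytic average across studies. A minimal sketch of how such a value is typically computed, using the standard Fisher z-transformation with inverse-variance weights (the per-study numbers below are illustrative, not the 67 studies from this meta-analysis, and this fixed-effect sketch omits the between-study heterogeneity a full random-effects model would add):

```python
import math

# Hypothetical per-study results: (correlation r, sample size n).
# Illustrative values only, not data from the meta-analysis.
studies = [(0.78, 200), (0.83, 150), (0.75, 320), (0.85, 120)]

# Fisher z-transform each r; the inverse-variance weight is n - 3.
zs = [0.5 * math.log((1 + r) / (1 - r)) for r, _ in studies]
ws = [n - 3 for _, n in studies]

# Weighted mean in z-space, then back-transform to the r scale.
z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
pooled_r = math.tanh(z_bar)
print(round(pooled_r, 3))
```

Pooling in z-space rather than averaging raw correlations is the usual choice because the z-transform stabilizes the variance of r, so studies of different sizes can be weighted consistently.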
To address that gap, we propose a Psych-Aligned Evaluation Framework that maps predictions onto validated symptom dimensions and adds three deployment-critical tests—PHQ error, temporal stability, and clinician agreement. This framework converts single-number benchmarks into a multidimensional yardstick for real-world, psychologically meaningful depression detection.
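The three deployment-critical tests can be sketched as concrete metrics. The data below are hypothetical, and the specific operationalizations (mean absolute error for PHQ error, maximum adjacent-prediction swing for temporal stability, Cohen's kappa for clinician agreement) are plausible choices assumed here, not the framework's confirmed definitions:

```python
from statistics import mean

# Hypothetical PHQ-9 severity scores (0-27 scale) for one user
# across repeated assessments. Illustrative values only.
phq_true = [12, 13, 11, 14]
phq_pred = [10, 15, 12, 13]

# 1) PHQ error: mean absolute error against the reference PHQ-9 score.
phq_mae = mean(abs(t - p) for t, p in zip(phq_true, phq_pred))

# 2) Temporal stability: largest swing between adjacent predictions;
#    large swings flag unstable severity estimates over time.
max_swing = max(abs(a - b) for a, b in zip(phq_pred, phq_pred[1:]))

# 3) Clinician agreement: Cohen's kappa between model and clinician
#    binary labels (1 = depressed, 0 = not depressed).
def cohens_kappa(a, b):
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    p_exp = pa * pb + (1 - pa) * (1 - pb)
    return (p_obs - p_exp) / (1 - p_exp)

model_labels = [1, 1, 0, 1, 0, 0, 1, 0]
clinician_labels = [1, 0, 0, 1, 0, 1, 1, 0]
kappa = cohens_kappa(model_labels, clinician_labels)
print(phq_mae, max_swing, round(kappa, 3))
```

Reporting all three numbers alongside a leaderboard score is what turns a single-number benchmark into the multidimensional yardstick the abstract describes.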