United States

Previous research has shown that constraining the gradient of the loss function w.r.t. model-predicted probabilities can enhance a model's robustness against noisy labels.
These methods typically specify a fixed optimal threshold for gradient clipping through validation data to obtain the desired robustness against noise.
However, this common practice overlooks how the distribution of gradients from clean and noisy-labeled samples shifts across training stages, which significantly limits the model's ability to adapt to the changing nature of gradients throughout training.
To address this issue, we propose a simple yet effective approach called Optimized Gradient Clipping (OGC), which dynamically adjusts the clipping threshold based on the ratio of noise gradients to clean gradients after clipping, estimated by modeling the distributions of clean and noisy samples. 
This approach allows us to modify the clipping threshold at each training step, effectively controlling the influence of noise gradients.
Additionally, we provide statistical analysis to certify the noise-tolerance ability of OGC.
Our extensive experiments across various types of label noise, including symmetric, asymmetric, instance-dependent, and real-world, demonstrate the effectiveness of our approach.
The code and a technical appendix for better digital viewing are included as supplementary materials and scheduled to be open-sourced upon publication.
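The idea of clipping the loss gradient with respect to the predicted probability can be illustrated with a minimal sketch. It assumes cross-entropy on the true-class probability `p`, whose gradient magnitude is `1/p`. The threshold search in `pick_threshold`, the candidate grid, and the per-sample clean posteriors `clean_post` are hypothetical stand-ins for illustration, not the authors' OGC procedure.

```python
import math

def clipped_ce(p, tau):
    """Cross-entropy on true-class probability p with |dL/dp| capped at tau."""
    p0 = 1.0 / tau  # below p0 the unclipped gradient magnitude 1/p exceeds tau
    if p >= p0:
        return -math.log(p)
    # Linear extension with slope -tau, continuous with -log(p) at p0.
    return -tau * (p - p0) + math.log(tau)

def clipped_grad_mag(p, tau):
    """Gradient magnitude of clipped_ce with respect to p."""
    return min(1.0 / p, tau)

def pick_threshold(probs, clean_post, candidates, max_ratio):
    """Hypothetical rule: largest candidate tau whose noisy-to-clean
    clipped-gradient mass stays within max_ratio."""
    best = min(candidates)
    for tau in sorted(candidates):
        pairs = list(zip(probs, clean_post))
        noisy = sum((1.0 - c) * clipped_grad_mag(p, tau) for p, c in pairs)
        clean = sum(c * clipped_grad_mag(p, tau) for p, c in pairs)
        if noisy <= max_ratio * clean:
            best = tau
    return best

# Assumed inputs: true-class probabilities and estimated P(clean) per sample.
probs = [0.9, 0.8, 0.7, 0.05]
clean_post = [0.95, 0.9, 0.9, 0.1]
tau = pick_threshold(probs, clean_post, [1, 2, 5, 10, 20], max_ratio=1.0)
```

Re-running such a search at every training step, with refreshed estimates of `clean_post`, is one way the threshold could follow the gradient distribution as it changes.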

AAAI 2025

Optimized Gradient Clipping for Noisy Label Learning

learning optimization for cv

poster

We are pleased to announce the Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25), which will be held in Philadelphia, Pennsylvania at the Pennsylvania Convention Center from February 25 to March 4, 2025.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-25 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.

### [Invited Speakers](https://aaai.org/conference/aaai/aaai-25/aaai-25-invited-speakers/)

Register [here](https://aaai.org/conference/aaai/aaai-25/registration/)




Graph Neural Networks (GNNs) have been widely used in a variety of fields because of their great potential in representing graph-structured data. However, the lack of rigorous uncertainty estimates limits their application in high-stakes settings. Conformal Prediction (CP) can produce statistically guaranteed uncertainty estimates by using the classifier's probability estimates to obtain prediction sets, which contain the true class with a user-specified probability. In this paper, we propose a Rank-based CP framework for GNNs (RCP-GNN) for reliable uncertainty estimates. By exploiting rank information of the classifier's outcome, prediction sets with the desired coverage rate can be efficiently constructed. The strategy of CP during training with a differentiable rank-based conformal loss function is further explored to adapt prediction sets according to graph topology. In this way, the composition of prediction sets can be guided by the aim of jointly reducing inefficiency and probability estimation errors. Extensive experiments on several real-world datasets show that our model achieves any predefined target marginal coverage while significantly reducing the prediction set size (inefficiency) compared with baselines.

Trustworthy Graph Neural Networks Through Rank-Based Conformal Prediction
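How conformal prediction turns a classifier's probability estimates into prediction sets can be sketched with the standard split-conformal recipe. This is not the rank-based RCP-GNN method from the abstract above. It assumes a held-out calibration set and the common nonconformity score `1 - p(true class)`, and the function names are illustrative.

```python
import math

def conformal_quantile(cal_probs, cal_labels, alpha):
    """Score threshold from a calibration set for target coverage 1 - alpha."""
    scores = sorted(1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return float("inf")  # too few calibration points: every class is kept
    return scores[k - 1]

def prediction_set(probs, qhat):
    """All classes whose nonconformity score is within the threshold."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= qhat]

# Toy two-class calibration data; the true class is always class 0.
true_p = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
cal_probs = [[p, 1.0 - p] for p in true_p]
cal_labels = [0] * len(true_p)
qhat = conformal_quantile(cal_probs, cal_labels, alpha=0.2)

confident = prediction_set([0.95, 0.05], qhat)  # one class survives
uncertain = prediction_set([0.7, 0.3], qhat)    # both classes survive
```

Smaller sets at the same coverage correspond to the lower "inefficiency" that the abstract reports.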

Masked autoencoders (MAEs) have recently demonstrated effectiveness in tabular data imputation. However, due to the inherent heterogeneity of tabular data, the uniform random masking strategy commonly used in MAEs can disrupt the distribution of missingness, leading to suboptimal performance. To address this, we propose a proportional masking strategy for masked autoencoders. Specifically, we first compute the statistics of missingness based on the observed proportions in the dataset, and then generate masks that align with these statistics, ensuring that the distribution of missingness is preserved after masking. Furthermore, we argue that simple MLP-based token mixing offers competitive or often superior performance compared to attention mechanisms while being more computationally efficient, especially given the heterogeneity of tabular data. Experimental results validate the effectiveness of the proposed proportional masking strategy across various missing data patterns in tabular datasets.
Code will be released.

To Predict or Not to Predict? Proportionally Masked Autoencoders for Tabular Data Imputation
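The two steps described in the abstract above (measure missingness, then generate masks that match it) can be sketched as follows. This is a minimal illustration that assumes per-column missing rates are the statistic of interest and that missing cells are stored as `None`. The paper's actual statistics and mask generator may differ.

```python
import random

def column_missing_rates(rows):
    """Fraction of missing (None) entries in each column."""
    n_cols = len(rows[0])
    return [sum(r[j] is None for r in rows) / len(rows) for j in range(n_cols)]

def proportional_mask(rows, rng):
    """Mask observed cells at each column's observed missing rate.

    True means 'hide this observed value from the encoder'.
    Cells that are already missing are never selected.
    """
    rates = column_missing_rates(rows)
    return [
        [v is not None and rng.random() < rates[j] for j, v in enumerate(r)]
        for r in rows
    ]

# Toy table: column 0 is fully observed, column 1 is half missing.
rows = [[1.0, None], [2.0, 5.0], [3.0, None], [4.0, 7.0]]
rates = column_missing_rates(rows)
mask = proportional_mask(rows, random.Random(0))
```

Under uniform random masking, column 0 would be masked as often as column 1; here it is never masked, because it has no missing values in the data.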

Large multimodal language models (MLLMs) have revolutionized natural language processing and visual understanding, but often contain outdated or inaccurate information. Current multimodal knowledge editing evaluations are limited in scope and potentially biased, focusing on narrow tasks and failing to assess the impact on in-domain samples. To address these issues, we introduce ComprehendEdit, a comprehensive benchmark comprising eight diverse tasks from multiple datasets. We propose two novel metrics: Knowledge Generalization Index (KGI) and Knowledge Preservation Index (KPI), which evaluate editing effects on in-domain samples without relying on AI-synthetic samples. Based on insights from our framework, we establish Hierarchical In-Context Editing (HICE), a baseline method employing a two-stage approach that balances performance across all metrics. This study provides a more comprehensive evaluation framework for multimodal knowledge editing, reveals unique challenges in this field, and offers a baseline method demonstrating improved performance. Our work opens new perspectives for future research and provides a foundation for developing more robust and effective editing techniques for MLLMs.

ComprehendEdit: A Comprehensive Dataset and Evaluation Framework for Multimodal Knowledge Editing

Current programming models for agents lack support for engineering with information protocols. We propose Orpheus, a novel programming model for communicating agents based on protocols that is compatible with the Belief, Desire, Intention (BDI) style of programming agents. Whereas traditional models are focused on reactions to handle incoming messages, Orpheus supports organizing the business logic of an agent based on its goals.

We give an operational semantics for Orpheus and describe its implementation in Jason, the well-known BDI agent programming framework. We demonstrate how Orpheus simplifies the programming of decentralized multiagent systems compared to the reactive programming model.

Orpheus: Engineering Multiagent Systems via Communicating Agents

Confidence calibration of classification models is a technique to estimate the true posterior probability of the predicted class, which is critical for ensuring reliable decision-making in practical applications. Existing confidence calibration methods mostly use statistical techniques to estimate the calibration curve from data or fit a user-defined calibration function, but often overlook fully mining and utilizing the prior distribution behind the calibration curve. However, a well-informed prior distribution can provide valuable insights beyond the empirical data when data are limited or in low-density regions of confidence scores. To fill this gap, this paper proposes a new method that integrates the prior distribution behind the calibration curve with empirical data to estimate a continuous calibration curve, which is realized by modeling the sampling process of calibration data as a binomial process and maximizing the likelihood function of the binomial process. We prove that the calibration curve estimation method is Lipschitz continuous with respect to the data distribution and requires only $O(3/\varepsilon^2)$ samples. Also, a new calibration metric ($TCE_{bmp}$), which leverages the estimated calibration curve to estimate the true calibration error (TCE), is designed. $TCE_{bmp}$ is proven to be a consistent calibration measure. Furthermore, realistic calibration datasets can be generated by the binomial process modeling from a preset true calibration curve and confidence score distribution, which can serve as a benchmark to measure and compare the discrepancy between existing calibration metrics and the true calibration error. The effectiveness of our calibration method and metric is verified on real-world and simulated data. We believe our exploration of integrating prior distributions with empirical data will guide the development of better-calibrated models, contributing to trustworthy AI.

Combining Priors with Experience: Confidence Calibration Based on Binomial Process Modeling

Zero-shot Natural Language Video Localization (NLVL) aims to automatically generate moments and corresponding pseudo queries from raw videos for the training of the localization model without any manual annotations. Existing approaches typically produce pseudo queries as simple words, which overlooks the natural complexity of queries in real-world scenarios. Considering the powerful text modeling capabilities of large language models (LLMs), leveraging LLMs to generate complete queries that are closer to human descriptions is a potential solution. However, directly integrating LLMs into existing approaches introduces several issues, including insensitivity, isolation, and lack of regulation, which prevent the full exploitation of LLMs to enhance zero-shot NLVL performance. To address these issues, we propose BTDP, an innovative Boundary-aware Temporal Dynamic Pseudo-supervision framework for pseudo-pair generation. Our method contains two crucial operations: 1) Boundary Segmentation, which identifies both visual boundaries and semantic boundaries to generate the atomic segments and activity descriptions, tackling the insensitivity issue. 2) Context Aggregation, which employs LLMs with a self-evaluation process to aggregate and summarize global video information for optimized pseudo moment-query pairs, tackling the issues of isolation and lack of regulation. Comprehensive experimental results on the Charades-STA and ActivityNet Captions datasets demonstrate the effectiveness of our BTDP method.

Boundary-aware Temporal Dynamic Pseudo-supervision pairs Generation for Zero-shot Natural Language Video Localization

Reservoir Computing (RC) models, a subclass of recurrent neural networks, are distinguished by their fixed, non-trainable input layer and dynamically coupled reservoir, with only the static readout layer being trained. This design circumvents the issues associated with backpropagating error signals through time, thereby enhancing both stability and training efficiency. RC models have been successfully applied across a broad range of application domains. Crucially, they have been demonstrated to be universal approximators of time-invariant dynamic filters with fading memory, under various settings of approximation norms and input driving sources.

Simple Cycle Reservoirs (SCR) represent a specialized class of RC models with a highly constrained reservoir architecture, characterized by uniform ring connectivity and binary input-to-reservoir weights with an aperiodic sign pattern. For linear reservoirs, given the reservoir size, the reservoir construction has only one degree of freedom: the reservoir cycle weight. Such architectures are particularly amenable to hardware implementations without significant performance degradation in many practical tasks. In this study we endow these observations with solid theoretical foundations by proving that SCRs operating in the real domain are universal approximators of time-invariant dynamic filters with fading memory. Our results supplement recent research showing that SCRs in the complex domain can approximate, to arbitrary precision, any unrestricted linear reservoir with a non-linear readout. We furthermore introduce a novel method to drastically reduce the number of SCR units, making such highly constrained architectures natural candidates for low-complexity hardware implementations. Our findings are supported by empirical studies on real-world time series datasets.

Universality of Real Minimal Complexity Reservoir
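The reservoir structure described in the abstract above (uniform ring connectivity with a single cycle weight, and input weights of one magnitude with varying signs) can be sketched for the linear case as below. The sign pattern, sizes, and weights are arbitrary illustrative choices, not values from the paper, and the trained readout is omitted.

```python
def scr_weights(n, r, v, signs):
    """Ring reservoir: unit i feeds unit (i + 1) % n, always with weight r."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[(i + 1) % n][i] = r
    V = [v * s for s in signs]  # one shared magnitude v, only the signs differ
    return W, V

def run_linear(W, V, inputs):
    """Linear state update x_t = W x_{t-1} + V u_t, starting from x = 0."""
    n = len(V)
    x = [0.0] * n
    states = []
    for u in inputs:
        x = [sum(W[i][j] * x[j] for j in range(n)) + V[i] * u for i in range(n)]
        states.append(x)
    return states

# Hypothetical 4-unit reservoir; the sign pattern here is an arbitrary example.
W, V = scr_weights(n=4, r=0.5, v=1.0, signs=[1, 1, -1, 1])
states = run_linear(W, V, inputs=[1.0, 0.0, 0.0])
```

With `|r| < 1`, an input pulse rotates around the ring and shrinks by a factor of `r` per step, which is a simple picture of fading memory. For a given size `n`, the recurrent matrix `W` is fixed entirely by `r`.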

Mental disorders, such as anxiety and depression, have become a global issue that affects the regular lives of people across different ages. Without proper detection and treatment, anxiety and depression can hinder the sufferer's study, work, and daily life. Fortunately, the fast advancement of digital and AI technologies provides new opportunities for better mental health care, and many efforts have been made in developing automatic anxiety and depression detection techniques. However, this field still lacks a publicly available large-scale dataset that can facilitate the development and evaluation of AI-based techniques. To address this limitation, we have constructed a new large-scale Multi-Modal Psychological assessment corpus (MMPsy) on anxiety and depression assessment of Mandarin-speaking adolescents. The MMPsy contains audio recordings and extracted transcripts of responses from automated anxiety/depression assessment interviews, along with the self-reported anxiety/depression evaluations of the participants using standard psychological assessment questionnaires. Our dataset contains 7,758 post-processed recordings of interviews for anxiety assessment and 4,266 recordings for depression assessment. Using this dataset, we have developed a novel deep-learning-based mental disorder estimation model, named Mental-Perceiver, to detect anxious/depressive mental states given recorded audio and transcript data. Extensive experiments on our MMPsy and the commonly used DAIC-WOZ datasets have shown the effectiveness and superiority of our proposed Mental-Perceiver model in anxiety and depression detection. The MMPsy dataset and model will be released upon acceptance of our work.

Mental-Perceiver: Audio-textual Multi-modal Learning for Estimating Mental Disorders

Automatic Speech Recognition (ASR) systems are pivotal in transcribing speech into text, yet the errors they introduce can significantly degrade the performance of downstream tasks like summarization. This issue is particularly pronounced in clinical dialogue summarization, a low-resource domain where supervised data for fine-tuning is scarce, necessitating the use of ASR models as black-box solutions. Employing conventional data augmentation to enhance the noise robustness of summarization models is not feasible either, because sufficient medical dialogue audio recordings and corresponding ASR transcripts are unavailable. To address this challenge, we propose MEDSAGE, an approach for generating synthetic samples for data augmentation using Large Language Models (LLMs). Specifically, we leverage the in-context learning capabilities of LLMs and instruct them to generate ASR-like errors based on a few available medical dialogue examples with audio recordings. Experimental results show that LLMs can effectively model ASR noise, and incorporating this noisy data into the training process significantly improves the robustness and accuracy of medical dialogue summarization systems. This approach addresses the challenges of noisy ASR outputs in critical applications, offering a robust solution to enhance the reliability of clinical dialogue summarization.

MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues

The capability of In-Context Learning (ICL) is crucial for large language models to generalize across a wide range of tasks. By utilizing prompts, these models can accurately predict outcomes for previously unseen tasks without necessitating retraining. However, this generalization ability does not extend to the length of the inputs; the effectiveness of ICL likely diminishes with excessively long inputs, resulting in errors in the generated text. To investigate this issue, we propose a study using a dataset of In-Context functions to understand the operational mechanisms of Transformer models in ICL and length generalization. We generated data using regression and Boolean functions and employed meta-learning techniques to endow the model with ICL capabilities. Our experimental results indicate that position encodings can significantly mitigate length generalization issues, with the most effective encoding extending the maximum input length to over eight times that of the original training length. However, further analysis revealed that while position encoding enhances length generalization, it compromises the model's inherent capabilities, such as its ability to generalize across different data types. Overall, our research illustrates that position encodings have a pronounced positive effect on length generalization, though it necessitates a careful trade-off with data generalization performance.
