United States

In real life, many dynamic events, such as major disasters and large-scale sports events, evolve continuously over time. Obtaining an overview of these events can help people quickly understand the situation and respond more effectively. This is challenging because key information about an event is often scattered across multiple documents and requires complex event knowledge understanding and reasoning, which previous work has under-explored.
Therefore, we propose the Event-Centric Multi-Document Summarization task, which aims to generate concise and comprehensive summaries of a given event based on multiple related news documents. Based on this, we constructed the **EventSum** dataset from Baidu Baike entries with extensive human annotation to facilitate relevant research. It is the first large-scale Chinese multi-document summarization dataset, containing 5,100 events and a total of 57,984 news documents, with an average of 11.4 input news documents and 13,471 characters per event. To ensure data quality and mitigate potential data leakage, we adopted a multi-stage annotation approach for manually labeling the test set. Given the complexity of event-related information, existing metrics struggle to comprehensively assess the quality of generated summaries. We therefore designed specific metrics, including Event Recall, Argument Recall, Causal Recall, and Temporal Recall, along with corresponding calculation methods for evaluation. We conducted comprehensive experiments on EventSum to evaluate the performance of advanced long-context Large Language Models (LLMs) on this task.
Our experimental results indicate that: 1) the event-centric multi-document summarization task remains challenging for existing long-context LLMs; 2) the recall metrics we designed are crucial for evaluating the comprehensiveness of the summary information. Relevant data and code will be released after review.
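
The abstract does not spell out how the recall metrics are calculated. As a rough illustration only, the sketch below assumes a recall of this kind is the fraction of annotated gold items that also appear in the generated summary, matched by exact string equality. The function name `recall` and the toy annotations are hypothetical, and the paper's actual matching procedure may differ.

```python
def recall(gold_items, predicted_items):
    """Fraction of gold items that also appear among the predicted items."""
    gold = set(gold_items)
    if not gold:
        return 0.0  # convention chosen for this sketch: no gold items, no recall
    return len(gold & set(predicted_items)) / len(gold)


# Hypothetical annotations for one event (illustrative only, not from EventSum).
gold_events = {
    "earthquake strikes",
    "rescue teams deployed",
    "aid delivered",
    "schools reopened",
}
summary_events = {"earthquake strikes", "aid delivered", "power restored"}

# Two of the four gold events are covered by the summary.
event_recall = recall(gold_events, summary_events)
print(event_recall)
```

The same function could be applied to arguments, causal links, or temporal links once they are represented as comparable items, for example as tuples.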

AAAI 2025

EventSum: A Large-Scale Event-Centric Summarization Dataset for Chinese Multi-News Documents

snlp

summarization

poster

We are pleased to announce the Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25), which will be held in Philadelphia, Pennsylvania at the Pennsylvania Convention Center from February 25 to March 4, 2025.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-25 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.

### [Invited Speakers](https://aaai.org/conference/aaai/aaai-25/aaai-25-invited-speakers/)

Register [here](https://aaai.org/conference/aaai/aaai-25/registration/)



Image restoration, which reconstructs high-quality images from degraded ones, has been extensively studied using various datasets in computer vision.
However, existing datasets are based on images captured with professional equipment such as digital cameras and action cameras.
This limits their applicability to everyday situations, as they focus on specialized scenarios.
This paper focuses on the degradation of smartphone camera lenses and proposes a dataset of degraded images commonly encountered in daily smartphone use.
To simulate realistic image degradation, we employed a novel approach of applying physical degradation directly to the source.
Furthermore, through experiments with existing image restoration models, we demonstrated that our dataset is extremely challenging.
Smartphone cameras are ubiquitous in daily life, yet their performance can be severely impacted by dirty lenses, leading to degraded image quality. 
This issue is often overlooked in image restoration research, which typically focuses on ideal or controlled conditions. 
To address this gap, we introduce SIDL (Smartphone Images with Dirty Lenses), a novel dataset designed specifically for the task of restoring images captured through contaminated smartphone lenses. 
SIDL contains a diverse collection of real-world images taken under various lighting conditions and environments. 
These images feature a wide range of lens contaminants, including water drops, fingerprints, and dust. 
Each contaminated image is paired with a clean reference image, enabling supervised learning approaches for restoration tasks.

To evaluate the challenge posed by SIDL, we trained and compared the performance of various state-of-the-art restoration models on this dataset. 
Our results indicate that while these models can achieve some level of restoration, the diverse and realistic nature of the contaminations in SIDL presents significant challenges that are not adequately addressed by existing methods. 
This highlights the need for more robust and adaptable image restoration techniques.

SIDL: A Real-World Dataset for Restoring Smartphone Images with Dirty Lenses

In this study, we focus on $\textit{heterogeneous}$ knowledge transfer across entirely different model architectures, tasks, and modalities. Existing knowledge transfer methods ($\textit{e.g.}$, backbone sharing, knowledge distillation) often hinge on shared elements within model structures or task-specific features/labels, limiting transfers to complex model types or tasks. To overcome these challenges, we present MergeNet, which learns to bridge the gap of parameter spaces of heterogeneous models, facilitating the direct interaction, extraction, and application of knowledge within these parameter spaces. The core mechanism of MergeNet lies in the parameter adapter, which operates by querying the source model's low-rank parameters and adeptly learning to identify and map parameters into the target model. MergeNet is learned alongside both models, allowing our framework to dynamically transfer and adapt knowledge relevant to the current stage, including the training trajectory knowledge of the source model. Extensive experiments on heterogeneous knowledge transfer demonstrate significant improvements in challenging settings, where representative approaches may falter or prove less applicable.

MergeNet: Knowledge Migration across Heterogeneous Models, Tasks, and Modalities

In Hotelling's model of spatial competition, a unit mass of voters is distributed in the interval $[0,1]$ (with their location corresponding to their political persuasion), and each of $m$ candidates selects as a strategy their distinct position in this interval. Each voter votes for the nearest candidate, and candidates choose their strategy to maximize their votes. It is known that if there are more than two candidates, equilibria may not exist in this model. It was unknown, however, how close to an equilibrium one could get. Our work studies approximate equilibria in this model, where a strategy profile is an (additive) $\epsilon$-equilibrium if no candidate can increase their votes by $\epsilon$, and provides tight or nearly-tight bounds on the approximation $\epsilon$ achievable.

We show that for 3 candidates, for any distribution of the voters, $\epsilon \ge 1/12$. Thus, somewhat surprisingly, for any distribution of the voters and any strategy profile of the candidates, at least $1/12$th of the total votes is always left "on the table." Extending this, we show that in the worst case, there exist voter distributions for which $\epsilon \ge 1/6$, and this is tight: one can always compute a $1/6$-approximate equilibrium in polynomial time. We then study the general case of $m$ candidates, and show that as $m$ grows large, we get closer to an exact equilibrium: one can always obtain a $1/(m+1)$-approximate equilibrium in polynomial time. We show this bound is asymptotically tight, by giving voter distributions for which $\epsilon \ge 1/(m+3)$.
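
To make the definition of an approximate equilibrium concrete, the following sketch computes vote shares and the largest possible gain from a unilateral move, under the simplifying assumption that voters are uniformly distributed on $[0,1]$ and that there are at least two candidates. The function names are hypothetical and this is not the paper's algorithm. Because positions must be distinct, the gain is a supremum that a candidate can approach but not always attain.

```python
def vote_shares(positions):
    """Vote share of each candidate for voters uniform on [0, 1].

    `positions` must be distinct; shares are returned in the input order.
    """
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    xs = [positions[i] for i in order]
    shares = [0.0] * len(xs)
    for rank, i in enumerate(order):
        # Each candidate wins the voters between the midpoints to its neighbours.
        left = 0.0 if rank == 0 else (xs[rank - 1] + xs[rank]) / 2
        right = 1.0 if rank == len(xs) - 1 else (xs[rank] + xs[rank + 1]) / 2
        shares[i] = right - left
    return shares


def best_deviation_gain(positions):
    """Largest gain (as a supremum) any single candidate can get by moving."""
    shares = vote_shares(positions)
    best = 0.0
    for i, current in enumerate(shares):
        others = sorted(p for j, p in enumerate(positions) if j != i)
        # Just outside an extreme rival: the payoff approaches the outer segment.
        options = [others[0], 1.0 - others[-1]]
        # Anywhere strictly between two adjacent rivals: half of the gap.
        options += [(b - a) / 2 for a, b in zip(others, others[1:])]
        best = max(best, max(options) - current)
    return best


profile = [0.25, 0.5, 0.75]
shares = vote_shares(profile)
gain = best_deviation_gain(profile)
print(shares, gain)
```

For this evenly spaced three-candidate profile the shares are 0.375, 0.25, and 0.375, and an outer candidate can gain up to 0.125 by moving next to the middle candidate, which is consistent with the $1/12$ lower bound stated above.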

Nearly Tight Bounds on Approximate Equilibria in Spatial Competition on the Line

Recent advances in text-to-image diffusion models have spurred research on personalization, particularly focusing on customized image synthesis of subjects within reference images. Although existing personalization methods can modify the subjects' positions or personalize multiple subjects simultaneously, they often struggle with altering the behaviors of subjects or their dynamic interactions. This challenge arises from overfitting to reference images, which becomes more problematic when only a single reference image is available. To address these challenges, we propose DynASyn, an effective multi-subject personalization method that works from a single reference image. DynASyn preserves subject identity during the personalization process by aligning concept-based priors with subject appearances and actions. This alignment is achieved by regularizing the attention maps between the subject token and images through concept-based priors. Furthermore, we introduce concept-based prompt-and-image augmentation to enhance the trade-off between identity preservation and action diversity. We also propose an SDE-based editing technique guided by augmented prompts to generate diverse appearances and actions while maintaining identity consistency in the augmented images. Experiments demonstrate that DynASyn is capable of synthesizing highly realistic images of subjects in novel contexts with dynamic interactions with their surroundings and outperforms baseline methods in both quantitative and qualitative aspects.

DynASyn: Multi-Subject Personalization Enabling Dynamic Action Synthesis

Large language models (LLMs) have achieved success in many fields but are still troubled by problematic content in their training corpora. LLM unlearning aims to reduce the influence of such content and avoid undesirable behaviours. However, existing unlearning methods remain vulnerable to adversarial queries, and the unlearned knowledge resurfaces after manually designed attack queries. As part of a red-team effort to proactively assess the vulnerabilities of unlearned models, we design **D**ynamic **U**nlearning **A**ttack (**DUA**), a dynamic and automated framework to attack these models and evaluate their robustness. It optimizes adversarial suffixes to reintroduce the unlearned knowledge in various scenarios. We find that unlearned knowledge can be recovered in $55.2\%$ of the questions, even without revealing the unlearned model's parameters. In response to this vulnerability, we propose **L**atent **A**dversarial **U**nlearning (**LAU**), a universal framework that effectively enhances the robustness of the unlearning process. It formulates the unlearning process as a min-max optimization problem and resolves it through two stages: an attack stage, where perturbation vectors are trained and added to the latent space of LLMs to recover the unlearned knowledge, and a defense stage, where the previously trained perturbation vectors are used to enhance the unlearned model's robustness. With our LAU framework, we obtain two robust unlearning methods, **AdvGA** and **AdvNPO**. We conduct extensive experiments across multiple unlearning benchmarks and various models, and demonstrate that they improve the unlearning effectiveness by over $53.5\%$, cause less than an $11.6\%$ reduction in neighboring knowledge, and have almost no impact on the model's general capabilities.

Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models

Anomaly detection aims to identify deviations from normal patterns within data. This task is particularly crucial in dynamic graphs, which are common in applications like social networks and cybersecurity, due to their evolving structures and complex relationships. Although recent deep learning-based methods have shown promising results in anomaly detection on dynamic graphs, they often lack generalizability.
In this study, we propose GeneralDyG, a method that samples temporal ego-graphs and sequentially extracts structural and temporal features to address the three key challenges in achieving generalizability: Data Diversity, Dynamic Feature Capture, and Computational Cost.  Extensive experimental results demonstrate that our proposed GeneralDyG significantly outperforms state-of-the-art methods on four real-world datasets.

A Generalizable Anomaly Detection Method in Dynamic Graphs

Recently, a number of effective methods have been proposed to tackle the challenging task of Few-Shot Fine-Grained Image Classification (FS-FGIC). However, two issues remain open: how to fully leverage the backbone network to discover and extract detailed features that yield more discriminative class prototypes, and how to accurately model the similarity relationship between query samples and the class prototypes. Therefore, we propose a novel progreSsively featUre refInement and conTinuous rElationship moDeling method, SUITED for short, to address these two issues in state-of-the-art FS-FGIC methods. Specifically, we design the Progressive Feature Refinement Module (PFRM) to fully exploit the backbone network's progressive feature extraction capabilities, forming multi-scale feature representations to further enhance discriminative features. Then, the Continuous Relationship Modeling Module (CRMM) is proposed to capture the dependencies between query samples and the corresponding class prototypes, achieving precise optimization of the distances among corresponding sample points in the feature space. We conducted extensive experiments on five fine-grained benchmark datasets, and the experimental results demonstrate that the proposed method outperforms existing state-of-the-art methods across the board.

Few-Shot Fine-Grained Image Classification with Progressively Feature Refinement and Continuous Relationship Modeling

Recent advancements in open-source code large language models (LLMs) have been driven by fine-tuning on the data generated from powerful closed-source LLMs, which are expensive to obtain. This paper explores whether it is possible to use a fine-tuned open-source model to generate additional data to augment its instruction-tuning dataset. We make two observations: (1) A code snippet can serve as the response to different instructions. (2) Instruction-tuned code LLMs perform better at translating code into instructions than the reverse.
Based on these observations, we propose Inverse-Instruct, a data augmentation technique that uses a fine-tuned LLM to generate additional instructions for code responses from its own training dataset. The additional instruction-response pairs are added to the original dataset, and a stronger code LLM can be obtained by fine-tuning on the augmented dataset. We empirically validate Inverse-Instruct on a range of open-source code models (e.g., CodeLlama-Python and DeepSeek-Coder) and benchmarks (e.g., HumanEval(+), MBPP(+), DS-1000 and MultiPL-E), showing it consistently improves the base models.
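
The augmentation loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `inverse_instruct`, `code_to_instruction`, and `toy_summarizer` are hypothetical names, and the toy summarizer merely stands in for a call to the fine-tuned LLM. Any filtering or quality-selection step the paper may use is omitted.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (instruction, code response)


def inverse_instruct(
    dataset: List[Pair], code_to_instruction: Callable[[str], str]
) -> List[Pair]:
    """Return the dataset plus new pairs whose instructions are generated from its code."""
    augmented = list(dataset)
    seen = set(dataset)
    for _, code in dataset:
        new_instruction = code_to_instruction(code).strip()
        pair = (new_instruction, code)
        # Skip empty generations and exact duplicates of existing pairs.
        if new_instruction and pair not in seen:
            seen.add(pair)
            augmented.append(pair)
    return augmented


def toy_summarizer(code: str) -> str:
    """Stand-in for the fine-tuned LLM (hypothetical); a real system would query the model."""
    first_line = code.strip().splitlines()[0]
    return f"Write Python code that starts with: {first_line}"


original = [("Add two numbers.", "def add(a, b):\n    return a + b")]
augmented = inverse_instruct(original, toy_summarizer)
print(len(augmented))
```

The original pairs are kept unchanged, so fine-tuning on `augmented` sees each code response under more than one instruction, which matches observation (1) in the abstract.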

InverseCoder: Self-improving Instruction-Tuned Code LLMs with Inverse-Instruct

Multi-modal Federated Learning (MFL) is a distributed machine learning paradigm that enables multiple participants with multi-modal data to collaboratively train a global model for multi-modal tasks without sharing their local data.
MFL typically deploys the trained global model as an Embedding-as-a-Service (EaaS), allowing participants to obtain embeddings for downstream tasks. However, this increases the risk of unauthorized copying and leakage of the model.
Protecting the ownership of the MFL global model while maintaining model performance is challenging.
In this paper, we propose the first general model ownership protection framework for MFL, named MFL-Owner. 
MFL-Owner decouples the watermark embedding process from the model training process and addresses both ownership verification and traceability, effectively safeguarding the interests of the MFL collective.
MFL-Owner leverages the concept of orthogonal transformations by incorporating a linear transformation matrix with orthogonal constraints into the model, achieving high-quality ownership verification and traceability with minimal impact on model performance.
To enhance the practicality of the watermark and prevent conflicts among multiple clients during tracing, we propose a trigger dataset selection method based on out-of-distribution data combined with Gaussian noise perturbation.
Our experiments on multiple datasets demonstrate that MFL-Owner is effective for model ownership verification and traceability for MFL.
Our code is available at https://anonymous.4open.science/r/MFL-Owner-2A91.
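
One general property that may help explain why an orthogonal transformation has little effect on embedding quality is that it preserves norms and inner products, and therefore cosine similarity. The sketch below checks this for a 2-D rotation, the simplest orthogonal map. It illustrates that property only and is not the MFL-Owner watermarking scheme; the vectors, the angle, and the function names are arbitrary choices made for the example.

```python
import math


def rotate(vec, theta):
    """Apply a 2-D rotation, which is an orthogonal transformation."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = vec
    return (c * x - s * y, s * x + c * y)


def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


emb_a, emb_b = (1.0, 2.0), (3.0, 0.5)  # toy "embeddings"
theta = 0.7  # arbitrary angle, illustrative only

before = cosine(emb_a, emb_b)
after = cosine(rotate(emb_a, theta), rotate(emb_b, theta))
print(math.isclose(before, after, rel_tol=1e-9))
```

Similarity-based downstream tasks therefore see the same pairwise relationships before and after the transform, up to floating-point error.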

MFL-Owner: Ownership Protection for Multi-modal Federated Learning via Orthogonal Transform Watermark

We consider the contextual combinatorial bandit setting where in each round, the learning agent, e.g., a recommender system, selects a subset of "arms," e.g., products, and observes rewards for both the individual base arms, which are a function of known features (called "context"), and the super arm (the subset of arms), which is a function of the base arm rewards. The agent's goal is to simultaneously learn the unknown reward functions and choose the highest-reward arms. For example, the "reward" may represent a user's probability of clicking on one of the recommended products. Conventional bandit models, however, employ restrictive reward function models in order to obtain performance guarantees. We make use of deep neural networks to estimate and learn the unknown reward functions and propose Neural UCB Clustering (NeUClust), which adopts a clustering approach to select the super arm in every round by exploiting underlying structure in the context space. Unlike prior neural bandit works, NeUClust uses a neural network to estimate the super arm reward and select the super arm, thus eliminating the need for a known optimization oracle. We non-trivially extend prior neural combinatorial bandit works to prove that NeUClust achieves $\widetilde{O}(\widetilde{d}\sqrt{T})$ regret, where $\widetilde{d}$ is the effective dimension of a neural tangent kernel matrix and $T$ is the number of rounds. Experiments on real-world recommendation datasets show that NeUClust achieves better regret and reward than other contextual combinatorial and neural bandit algorithms.
