Talking face generation (TFG), a pivotal technology for digital human creation, can produce lifelike talking videos of any character from only facial images and accompanying text. Despite the urgent need for targeted detection methods, research in this field has been hindered by the lack of public datasets. In this paper, we construct the first large-scale multi-scenario talking face dataset (MSTF), which contains 22 audio and video forgery techniques, filling the dataset gap in this field. The dataset covers 11 generation scenarios and more than 20 semantic scenarios, bringing it closer to the practical application scenarios of TFG. We also propose a TFG detection framework that analyzes both global and local coherence in the multimodal content of TFG videos. Specifically, a region-focused smoothness detection module (RSFDM) and a discrepancy capture-time frame aggregation module (DCTAM) are introduced to evaluate the global temporal coherence of TFG videos, aggregating multi-grained spatial information. Additionally, a visual-audio fusion module (V-AFM) is designed to evaluate audiovisual coherence within a localized temporal perspective. Comprehensive experiments demonstrate that our dataset is both reasonable and challenging, and indicate the superiority of our proposed method over state-of-the-art deepfake detection approaches.

AAAI 2025

GLCF: A Global-Local Multimodal Coherence Analysis Framework for Talking Face Generation Detection

technical paper

We are pleased to announce the Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25), which will be held in Philadelphia, Pennsylvania at the Pennsylvania Convention Center from February 25 to March 4, 2025.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-25 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.

### [Invited Speakers](https://aaai.org/conference/aaai/aaai-25/aaai-25-invited-speakers/)

Register [here](https://aaai.org/conference/aaai/aaai-25/registration/)

We study the first-order definability of progression for situation calculus action theories, with a focus on the iterability of progression. Progression, the task of updating a knowledge base according to actions' effects so that proper information is retained, is notoriously challenging in the situation calculus because in general it requires second-order (SO) logic. Exceptions where progression is first-order, such as local-effect actions and normal actions, impose certain syntactic constraints on action theories to eliminate second-order quantifiers in the progressed knowledge base. Unfortunately, the progressed result might not satisfy the constraints again, making it impossible to apply first-order progression iteratively. In this paper, we first lift the existing result on first-order progression for normal actions by allowing disjunctions in the knowledge base. As a result, we obtain a type of action theory, called disjunctive normal, which is iteratively first-order progressable. Second, we propose a new class of action theories, called PANACK, that strictly subsumes the disjunctive normal ones, and we show that it remains iteratively first-order progressable as well.

On Action Theories with Iterable First-Order Progression

We initiate the study of matching roommates and rooms wherein the preferences of agents over other agents and rooms are complementary and represented by Leontief utilities. In this setting, 2n agents must be paired up and assigned to n rooms. Each agent has cardinal valuations over the rooms as well as compatibility values over all other agents. Under Leontief preferences, an agent’s utility for a matching is the minimum of the two values. We focus on the tradeoff between maximizing utilitarian social welfare and strategyproofness. Our main result shows that, in stark contrast to the additive case, under binary Leontief utilities there exist strategyproof mechanisms that maximize the social welfare. We further devise a strategyproof mechanism that implements such a welfare-maximizing algorithm and is parameterized by the number of agents. Along the way, we highlight several possibility and impossibility results, and give upper and lower bounds for welfare with or without strategyproofness.

Strategyproof Matching of Roommates and Rooms
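As a small illustration of the utility model described in the abstract above, the following Python sketch computes Leontief utilities and utilitarian social welfare for a given assignment. The data layout (the `room_value` and `compat` dictionaries, and a matching given as a list of (agent, agent, room) triples) and the function names are assumptions made for this example; they are not notation from the paper, and the sketch does not implement any of the paper's mechanisms.

```python
from typing import Dict, List, Tuple

Agent = str
Room = str
Matching = List[Tuple[Agent, Agent, Room]]  # (agent, roommate, shared room)


def leontief_utility(
    agent: Agent,
    partner: Agent,
    room: Room,
    room_value: Dict[Agent, Dict[Room, float]],
    compat: Dict[Agent, Dict[Agent, float]],
) -> float:
    """Utility is the minimum of the agent's room value and partner value."""
    return min(room_value[agent][room], compat[agent][partner])


def social_welfare(
    matching: Matching,
    room_value: Dict[Agent, Dict[Room, float]],
    compat: Dict[Agent, Dict[Agent, float]],
) -> float:
    """Utilitarian welfare: the sum of all agents' Leontief utilities."""
    total = 0.0
    for a, b, room in matching:
        total += leontief_utility(a, b, room, room_value, compat)
        total += leontief_utility(b, a, room, room_value, compat)
    return total


# Hypothetical binary instance with 2n = 4 agents and n = 2 rooms.
room_value = {
    "a": {"r1": 1, "r2": 0},
    "b": {"r1": 1, "r2": 0},
    "c": {"r1": 0, "r2": 1},
    "d": {"r1": 1, "r2": 1},
}
compat = {
    "a": {"b": 1, "c": 0, "d": 1},
    "b": {"a": 1, "c": 1, "d": 0},
    "c": {"a": 0, "b": 1, "d": 1},
    "d": {"a": 1, "b": 0, "c": 1},
}
good = [("a", "b", "r1"), ("c", "d", "r2")]
bad = [("a", "c", "r1"), ("b", "d", "r2")]
print(social_welfare(good, room_value, compat))
print(social_welfare(bad, room_value, compat))
```

Because the utility is a minimum, an agent placed in a liked room with a disliked roommate gains nothing, which is what makes the two preference components complementary rather than additive.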

By integrating external knowledge, Retrieval-Augmented Generation (RAG) has become an effective strategy for mitigating the hallucination problems that large language models (LLMs) encounter when dealing with knowledge-intensive tasks. However, in the process of integrating external non-parametric supporting evidence with internal parametric knowledge, knowledge conflicts inevitably arise, leading to confusion in the model's responses. To enhance the knowledge selection of LLMs in various contexts, some research has focused on refining their behavior patterns through instruction-tuning. Nonetheless, due to the absence of explicit negative signals and comparative objectives, models fine-tuned in this manner may still exhibit undesirable behaviors such as contextual ignorance and contextual overinclusion. To this end, we propose a Knowledge-aware Preference Optimization strategy, dubbed KnowPO, aimed at achieving adaptive knowledge selection based on contextual relevance in real retrieval scenarios. Concretely, we propose a general paradigm for constructing knowledge conflict datasets that comprehensively cover various error types, so that the model learns to avoid these negative signals through preference optimization methods. We also propose a rewriting strategy and a data ratio optimization strategy to address preference imbalances. Experimental results show that KnowPO outperforms previous methods for handling knowledge conflicts by over 37%, while also exhibiting robust generalization across various out-of-distribution datasets.

KnowPO: Knowledge-Aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models

Image forgery detection and localization (IFDL) is of vital importance, as forged images can spread misinformation that poses potential threats to our daily life. However, previous methods still struggle to effectively handle forged images processed with diverse forgery operations in real-world scenarios. In this paper, we propose a novel Reinforced Multi-teacher Knowledge Distillation (Re-MTKD) framework for the IFDL task, structured around an encoder-decoder ConvNeXt-UperNet with an Edge-Aware Module, named Cue-Net. First, three Cue-Net models are separately trained for the three main types of image forgeries, i.e., copy-move, splicing and inpainting, which then serve as the multi-teacher models to train the target student model with Cue-Net through self-knowledge distillation. A Reinforced Dynamic Teacher Selection (Re-DTS) strategy is developed to dynamically assign weights to the involved teacher models, which facilitates specific knowledge transfer and enables the student model to effectively learn both the common and specific natures of diverse tampering traces. Extensive experiments demonstrate that, compared with other state-of-the-art methods, the proposed method achieves superior performance on several recently emerged datasets comprising various kinds of image forgeries.

Reinforced Multi-teacher Knowledge Distillation for Efficient General Image Forgery Detection and Localization

We introduce Limited Rollout Beam Search (LRBS), a beam search strategy for deep reinforcement learning (DRL) based combinatorial optimization improvement heuristics. 
Utilizing pre-trained models on the Euclidean Traveling Salesperson Problem, LRBS significantly enhances both in-distribution performance and generalization to larger problem instances, achieving optimality gaps that outperform existing improvement heuristics and narrowing the gap with state-of-the-art constructive methods.
We also extend our analysis to two pickup and delivery TSP variants to validate our results.
Finally, we employ our search strategy for offline and online adaptation of the pre-trained improvement policy, leading to improved search performance and surpassing recent adaptive methods for constructive heuristics. Our source code is available at anonymous-url.

Scaling Combinatorial Optimization Neural Improvement Heuristics with Online Search and Adaptation

There has been tremendous progress in the past decade in the field of quantified Boolean formulas (QBF), both in practical solving and in creating a theory of corresponding proof systems and their proof complexity analysis. Both for solving and for proof complexity, it is important to have interesting formula families on which we can test solvers and gauge the strength of the proof systems. There are currently few such formula families in the literature. We initiate a general programme for transforming computationally hard problems (located in the polynomial hierarchy) into QBFs that are hard for the main QBF resolution systems Q-Res and QU-Res, which relate to core QBF solvers. We illustrate this general approach on three problems from graph theory and logic. This yields QBF families that are provably hard for Q-Res and QU-Res (without any complexity assumptions).

Computationally Hard Problems Are Hard for QBF Proof Systems Too

We introduce Neural Conjugate Flows (NCF), a class of neural-network architectures equipped with exact flow structure. By leveraging topological conjugation, we prove that these networks are not only naturally isomorphic to a continuous group, but are also universal approximators for flows of ordinary differential equations (ODEs). Furthermore, topological properties of these flows can be enforced by the architecture in an interpretable manner. We demonstrate in numerical experiments how this topological group structure leads to concrete computational gains over other physics-informed neural networks in estimating and extrapolating latent dynamics of ODEs, while training up to five times faster than other flow-based architectures.

Neural Conjugate Flows: A Physics-Informed Architecture with Flow Structure
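To make the idea of topological conjugation in the abstract above concrete, the sketch below builds a one-dimensional flow by conjugating a simple linear flow with an invertible map and checks the group property numerically. The choice of `sinh` as the invertible map, the scalar decay rate, and all function names are assumptions for illustration only; in the paper the conjugating map is a neural network, which this sketch does not attempt to reproduce.

```python
import math


def h(x: float) -> float:
    """Invertible change of coordinates (assumed here; stands in for a learned map)."""
    return math.sinh(x)


def h_inv(y: float) -> float:
    """Exact inverse of h."""
    return math.asinh(y)


def linear_flow(y: float, t: float, rate: float = -0.5) -> float:
    """Flow of the linear ODE dy/dt = rate * y, which is y * exp(rate * t)."""
    return math.exp(rate * t) * y


def conjugate_flow(x: float, t: float, rate: float = -0.5) -> float:
    """Conjugated flow: map to latent coordinates, flow linearly, map back."""
    return h_inv(linear_flow(h(x), t, rate))


x0, s, t = 1.3, 0.4, 1.1
# Group property: flowing for s and then for t equals flowing for s + t.
two_steps = conjugate_flow(conjugate_flow(x0, s), t)
one_step = conjugate_flow(x0, s + t)
print(math.isclose(two_steps, one_step, rel_tol=1e-9))
# Identity element: flowing for zero time leaves the state unchanged.
print(math.isclose(conjugate_flow(x0, 0.0), x0, rel_tol=1e-9))
```

The group property holds by construction, because the inner `h_inv` and `h` cancel when two conjugated flows are composed; no training or numerical integration is needed to enforce it.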

Gaussian Splatting (GS) has emerged as a crucial technique for representing discrete volumetric radiance fields. It leverages unique parametrization to mitigate computational demands in scene optimization. This work introduces Topology-Aware 3D Gaussian Splatting (Topology-GS), which addresses two key limitations in current approaches: compromised pixel-level structural integrity due to incomplete initial geometric coverage, and inadequate feature-level integrity from insufficient topological constraints during optimization. To overcome these limitations, Topology-GS incorporates a novel interpolation strategy, Local Persistent Voronoi Interpolation (LPVI), and a topology-focused regularization term based on persistent barcodes, named PersLoss. LPVI utilizes persistent homology to guide adaptive interpolation, enhancing point coverage in low-curvature areas while preserving topological structure. PersLoss aligns the visual perceptual similarity of rendered images with ground truth by constraining distances between their topological features. Comprehensive experiments on three novel-view synthesis benchmarks demonstrate that Topology-GS outperforms existing methods in terms of PSNR, SSIM, and LPIPS metrics, while maintaining efficient memory usage. This study pioneers the integration of topology with 3D-GS, laying the groundwork for future research in this area.

Topology-Aware 3D Gaussian Splatting: Leveraging Persistent Homology for Optimized Structural Integrity

Deep reinforcement learning (DRL) has achieved remarkable success in various domains, yet its reliance on neural networks results in a lack of transparency, which limits its practical applications in safety-critical and human-agent interaction domains. Decision trees, known for their notable explainability, have emerged as a promising alternative to neural networks. However, decision trees often struggle in long-horizon continuous control tasks with high-dimensional observation spaces due to their limited expressiveness. To address this challenge, we propose SkillTree, a novel hierarchical framework that reduces the complex continuous action space of challenging control tasks into a discrete skill space. By integrating a differentiable decision tree within the high-level policy, SkillTree generates discrete skill embeddings that guide low-level policy execution. Furthermore, through distillation, we obtain a simplified decision tree model that improves performance while further reducing complexity. Experimental results validate SkillTree's effectiveness across various robotic manipulation tasks, providing clear skill-level insights into the decision-making process. The proposed approach not only achieves performance comparable to neural-network-based methods in complex long-horizon control tasks but also significantly enhances the transparency and explainability of the decision-making process.

SkillTree: Explainable Skill-Based Deep Reinforcement Learning for Long-Horizon Control Tasks

Time Series Supplier Allocation (TSSA) poses a complex NP-hard challenge, aimed at refining future order dispatching strategies to fully satisfy order demands with maximum supply efficiency.
Traditionally derived from financial portfolio management, the Black-Litterman (BL) model offers a new perspective for the TSSA scenario by balancing expected returns against insufficient supply risks. However, its application within TSSA is constrained by the reliance on manually constructed perspective matrices and spatio-temporal market dynamics, coupled with the absence of supervisory signals and data unreliability inherent to supplier information.
To address these limitations, we introduce the pioneering Deep Black-Litterman Model (DBLM), which innovatively adapts the BL model from its financial roots to the supply chain context. Leveraging Spatio-Temporal Graph Neural Networks (STGNNs), DBLM automatically generates future perspective matrices for TSSA by integrating spatio-temporal dependencies.
Moreover, a novel Spearman rank correlation distinctively supervises our approach to address the lack of supervisory signals, specifically designed to navigate through the complexities of supplier risks and interactions.
This is further enhanced by a masking mechanism aimed at counteracting the biases from unreliable data, thereby improving the model’s precision and reliability. Extensive experimentation on two datasets unequivocally demonstrates DBLM's enhanced performance in TSSA, setting new standards for the field. Our findings and methodology are made available for community access and further development at https://anonymous.4open.science/r/DBLM-7978/.
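For readers unfamiliar with the Spearman rank correlation mentioned in the abstract above, here is a minimal Python sketch of the standard statistic, computed as the Pearson correlation of average ranks. The function names and the sample data are assumptions for illustration; how DBLM turns this statistic into a supervisory signal is not described in the abstract, and this sketch does not attempt to reproduce it.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with the entry at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical example: predicted supplier scores against observed supply rates.
predicted = [0.9, 0.2, 0.5, 0.7]
observed = [120.0, 15.0, 60.0, 80.0]
print(spearman(predicted, observed))
```

Because the statistic depends only on ranks, it rewards ordering suppliers correctly even when the predicted scores are on a different scale from the observed quantities.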
