United States

3D semantic scene completion is critical for multiple downstream tasks in autonomous systems. It estimates missing geometric and semantic information in the acquired scene data. Due to challenging real-world conditions, this task usually demands complex models that process multi-modal data to achieve acceptable performance. We propose a unique neural model that leverages advances in state space and diffusion generative modeling to achieve remarkable 3D semantic scene completion performance with monocular image input. Our technique processes the data in a conditioned latent space of a variational autoencoder, where diffusion modeling is carried out with an innovative state space technique. A key component of our neural network is the proposed Skimba (Skip Mamba) denoiser, which is adept at efficiently processing long-sequence data. Meticulously designed using concepts such as a triple Mamba structure, dimensional decomposition residuals, and varying dilations along three directions, the Skimba diffusion model forms an integral part of our 3D scene completion network. We also adopt a variant of this network for the subsequent semantic segmentation stage of our technique. Extensive evaluation on the standard SemanticKITTI and SSCBench-KITTI360 datasets shows that our approach not only outperforms other monocular techniques by a large margin but also achieves competitive performance against stereo methods. We will release our model and code.

AAAI 2025

Skip Mamba Diffusion for Monocular 3D Semantic Scene Completion

poster

We are pleased to announce the Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25), which will be held in Philadelphia, Pennsylvania at the Pennsylvania Convention Center from February 25 to March 4, 2025.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-25 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.

### [Invited Speakers](https://aaai.org/conference/aaai/aaai-25/aaai-25-invited-speakers/)

Register [here](https://aaai.org/conference/aaai/aaai-25/registration/)


The iterative sampling procedure employed by diffusion models (DMs) often leads to significant latency. To address this, we propose Stochastic Consistency Distillation (SCott) to enable accelerated text-to-image generation, where high-quality generations can be achieved with just 2-4 sampling steps or even 1 step, and further improvements can be obtained at additional cost, e.g., 4 steps. In contrast to vanilla consistency distillation (CD), which distills the ordinary differential equation (ODE) solver-based sampling process of a pre-trained teacher model into a student, SCott explores the possibility and validates the efficacy of integrating stochastic differential equation (SDE) solvers into CD to fully unleash the potential of the teacher. SCott is augmented with elaborate strategies to control the noise strength and sampling process of the SDE solver. An adversarial loss is further incorporated to strengthen sample quality when only a few sampling steps are used. Empirically, on the MSCOCO-2017 5K dataset with a Stable Diffusion-V1.5 teacher, SCott achieves an FID of 21.9, surpassing that of the 1-step InstaFlow (23.4) and the 4-step UFOGen (22.1). Moreover, SCott can yield more diverse samples than other consistency models for high-resolution image generation, with up to 16% improvement in a qualified metric.

SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation

Current self-supervised methods, such as contrastive learning, predominantly focus on global discrimination, neglecting the critical fine-grained anatomical details required for accurate radiographic analysis. To address this challenge, we propose the Anatomy-driven self-supervised framework for enhancing Fine-grained Representation in radiographic image analysis (AFiRe). The core idea of AFiRe is to align the anatomical consistency with the unique token-processing characteristics of Vision Transformer. Specifically, AFiRe synergistically performs two self-supervised schemes: (i) Token-wise anatomy-guided contrastive learning, which aligns image tokens based on structural and categorical consistency to enhance fine-grained spatial-anatomical discrimination; (ii) Pixel-level anomaly-removal restoration, which particularly focuses on local anomalies, thereby refining the learned discrimination with detailed geometrical information. Additionally, we propose the Synthetic Lesion Mask to enhance anatomical diversity while preserving intra-consistency, which is typically corrupted by traditional data augmentations, such as Cropping and Affine transformations. Experimental results show that AFiRe: (i) provides robust anatomical discrimination, achieving more cohesive feature clusters compared to state-of-the-art contrastive learning methods; (ii) demonstrates superior generalization, surpassing 7 radiography-specific self-supervised methods in multi-label classification tasks with limited labeling; and (iii) integrates fine-grained information, enabling precise anomaly detection using only image-level annotations.

AFiRe: Anatomy-Driven Self-Supervised Learning for Fine-Grained Representation in Radiographic Images

Language-based localization is a crucial task in robotics and computer vision, enabling robots to understand spatial positions through language. Recent methods rely on contrastive learning to establish correspondences between global features of texts and point clouds. However, the inherent ambiguity of textual descriptions makes it difficult to convey geometric information accurately, and forcing the two modalities into alignment in the feature space may compromise the expressiveness of the point clouds. Unlike previous methods, this paper proposes using language as a filter to distinguish dissimilar locations. To this end, we propose a robust framework of multi-level negative contrastive learning for language-based localization, fully leveraging the descriptive power of language for spatial localization. Our method learns multiple mismatched factors by minimizing the similarity of different locations at different levels, namely the global, instance, and relation levels. Extensive experiments conducted on the KITTI360Pose benchmark demonstrate that our method outperforms the state-of-the-art methods. Specifically, we achieve a 56.3% improvement in Top-1 retrieval recall rate and a 45.9% increase in 5-meter localization accuracy recall rate. Our code will be released upon acceptance.

Text to Point Cloud Localization with Multi-Level Negative Contrastive Learning

Existing knowledge distillation (KD) methods have demonstrated their ability to achieve student network performance on par with their teachers. However, the knowledge gap between the teacher and student remains significant and may hinder the effectiveness of the distillation process. In this work, we introduce the structure of Neural Collapse (NC) into the KD framework. NC typically occurs in the final phase of training, resulting in a graceful geometric structure where the last-layer features form a simplex equiangular tight frame. This phenomenon has improved the generalization of deep network training. We hypothesize that NC can also alleviate the knowledge gap in distillation, thereby enhancing student performance. This paper begins with an empirical analysis to bridge the connection between knowledge distillation and neural collapse. Through this analysis, we establish that transferring the teacher's NC structure to the student benefits the distillation process. Therefore, instead of merely transferring instance-level logits or features, as done by existing distillation methods, we encourage students to learn the teacher's NC structure. We thus propose a new distillation paradigm termed Neural Collapse-inspired Knowledge Distillation (NCKD). Comprehensive experiments demonstrate that NCKD is simple yet effective, improving the generalization of all distilled student models and achieving state-of-the-art accuracy.
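
As a rough illustration of the geometry mentioned above, the sketch below builds a simplex equiangular tight frame for K classes using the standard construction sqrt(K/(K-1)) * (I - (1/K) 11^T). It shows only the target structure (unit-norm vectors with pairwise inner product -1/(K-1)); it is not the NCKD loss, and the helper names `simplex_etf` and `dot` are hypothetical.

```python
import math


def simplex_etf(num_classes):
    """Return num_classes vectors (as lists) forming a simplex ETF.

    Row i is sqrt(K / (K - 1)) * (e_i - (1/K) * ones), the standard
    construction; this is an illustrative helper, not code from the paper.
    """
    k = num_classes
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


frame = simplex_etf(4)
self_similarity = dot(frame[0], frame[0])   # each vector has unit norm
cross_similarity = dot(frame[0], frame[1])  # equal for every distinct pair
```

Every pair of distinct vectors shares the same negative inner product, which is the "equiangular" property that the abstract refers to.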

Neural Collapse Inspired Knowledge Distillation

Hypergraphs are powerful mathematical structures that can model complex, high-order relationships in various domains, including social networks, bioinformatics, and recommender systems. However, generating realistic and diverse hypergraphs remains challenging due to their inherent complexity and lack of effective generative models. In this paper, we introduce a diffusion-based Hypergraph Generation (HYGENE) method that addresses these challenges through a progressive local expansion approach. HYGENE works on the bipartite representation of hypergraphs, starting with a single pair of connected nodes and iteratively expanding it to form the target hypergraph. At each step, nodes and hyperedges are added in a localized manner using a denoising diffusion process, which allows for the construction of the global structure before refining local details. Our experiments demonstrated the effectiveness of HYGENE, proving its ability to closely mimic a variety of properties in hypergraphs. To the best of our knowledge, this is the first attempt to employ deep learning models for hypergraph generation, and our work aims to lay the groundwork for future research in this area.
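
To make the bipartite representation mentioned above concrete, here is a minimal sketch that converts a hypergraph (a list of node sets) into bipartite node-hyperedge incidences and back. It illustrates the data representation only, not the diffusion-based expansion procedure; the function names `to_bipartite` and `from_bipartite` are hypothetical.

```python
def to_bipartite(hyperedges):
    """Map a hypergraph, given as a list of node sets, to a set of
    (node, hyperedge_index) incidence pairs of its bipartite graph."""
    incidences = set()
    for edge_index, members in enumerate(hyperedges):
        for node in members:
            incidences.add((node, edge_index))
    return incidences


def from_bipartite(incidences):
    """Recover the list of hyperedges (node sets) from incidence pairs."""
    groups = {}
    for node, edge_index in incidences:
        groups.setdefault(edge_index, set()).add(node)
    return [groups[index] for index in sorted(groups)]


# Two hyperedges sharing node 2: {0, 1, 2} and {2, 3}.
hypergraph = [{0, 1, 2}, {2, 3}]
bipartite_edges = to_bipartite(hypergraph)
```

In this view, one side of the bipartite graph holds the original nodes and the other side holds one vertex per hyperedge, so a hyperedge of any size becomes a set of ordinary pairwise edges.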

HYGENE: A Diffusion-Based Hypergraph Generation Method

A market maker is a specialist who provides liquidity by continuously offering bid and ask quotes for a financial asset. The market maker’s objective is to maximize profit while avoiding the accumulation of a large position in the asset to control inventory risk. To achieve model-free results, online learning has been applied to design market-making strategies that make no assumptions on the dynamics of the limit order book and asset price. However, existing work primarily focuses on profit rather than inventory risk. To address this limitation, this paper develops market-making strategies with inventory constraints within the online learning framework. To manage inventory risk, we propose two classes of market-making strategies with fixed bid-ask spreads that serve as reference strategies. Each reference strategy can ensure that the inventory remains under control, which enables the online learning algorithms designed for each class of reference strategies to satisfy inventory constraints. Different from the standard online learning model where the gain in each period is assumed to lie within a fixed bounded interval, the gain in our model depends on a state variable (i.e., the inventory size). Thus, a key challenge in analyzing the regret bounds is to bound the difference between the gains of any two reference strategies, which becomes significantly more complicated compared with scenarios without inventory constraints. By tackling these difficulties, we show that these algorithms achieve low regrets. Experimental results illustrate the superior performance of our algorithms in inventory risk control.

Adaptive Market Making with Inventory Constraints via Online Learning

Edit-based approaches for Grammatical Error Correction (GEC) have attracted considerable attention due to their outstanding explanations of the correction process and rapid inference. By exploring the characteristics of the generalized and specific knowledge learned for GEC, we find that efficiently training a GEC system with satisfactory generalization capacity favors generalized knowledge over specific knowledge. Current gradient-based methods for training GEC systems, however, usually prioritize minimizing training loss over generalization loss. This paper proposes the strategy of Adjusting Learning Rate Based on Memory Rate to optimize the edit-based GEC scorer (ALRMR-GEC). Specifically, we introduce the memory rate, a novel metric, to provide an explicit indicator of the model’s state of learning generalized and specific knowledge, which can effectively guide the GEC system to adjust the learning rate in a timely manner. Extensive experiments, conducted by optimizing the published editscorer (Sorokin 2022) on the BEA2019 dataset, show that ALRMR-GEC significantly enhances the model's generalization ability with stable and satisfactory performance irrespective of the initial learning rate selection. Our method can also accelerate training more than tenfold in certain cases. Finally, the experiments indicate that the memory rate induced in ALRMR-GEC surpasses the gradient as a more informative metric to guide the GEC editscorer to learn more generalized knowledge. Our code will be released as soon as possible.

ALRMR-GEC: Adjusting Learning Rate Based on Memory Rate to Optimize the Edit Scorer for Grammatical Error Correction

The insufficient generalization of adaptive moment estimation (Adam) has hindered its broader application. Recent studies have shown that flat minima in loss landscapes are highly associated with improved generalization.  Inspired by the filtering effect of integration operations on high-frequency signals, we propose multiple integral Adam (MIAdam), a novel optimizer that integrates a multiple integral term into Adam. This multiple integral term effectively filters out sharp minima encountered during optimization, guiding the optimizer towards flatter regions and thereby enhancing generalization capability. We provide a theoretical explanation for the improvement in generalization through the diffusion theory framework and analyze the impact of the multiple integral term on the optimizer's convergence. Experimental results demonstrate that MIAdam not only enhances generalization and robustness against label noise but also maintains the rapid convergence characteristic of Adam, outperforming Adam and its variants in state-of-the-art benchmarks.

A Method for Enhancing Generalization of Adam by Multiple Integrations

In this work, we extend the concept of the $p$-mean welfare objective from social choice theory (Moulin 2004) to study $p$-mean regret in stochastic multi-armed bandit problems. The $p$-mean regret, defined as the difference between the optimal mean among the arms and the $p$-mean of the expected rewards, offers a flexible framework for evaluating bandit algorithms, enabling algorithm designers to balance fairness and efficiency by adjusting the parameter $p$. Our framework encompasses both average cumulative regret and Nash regret as special cases.

We introduce a simple, unified UCB-based algorithm (Explore-Then-UCB) that achieves novel $p$-mean regret bounds. Our algorithm consists of two phases: a carefully calibrated uniform exploration phase to initialize sample means, followed by the UCB1 algorithm of Auer et al. (2002). Under mild assumptions, we prove that our algorithm achieves a $p$-mean regret bound of $\tilde{O}\left(\sqrt{\frac{k}{T^{\frac{1}{2|p|}}}}\right)$ for all $p \leq -1$, where $k$ represents the number of arms and $T$ the time horizon. When $-1<p<0$, we achieve a regret bound of $\tilde{O}\left(\sqrt{\frac{k^{1.5}}{T^{\frac{1}{2}}}}\right)$. For the range $0< p \leq 1$, we achieve a $p$-mean regret scaling as $\tilde{O}\left(\sqrt{\frac{k}{T}}\right)$, which matches the previously established lower bound up to logarithmic factors (Auer et al. 1995). This result stems from the fact that the $p$-mean regret of any algorithm is at least its average cumulative regret for $p \leq 1$.

In the case of Nash regret (the limit as $p$ approaches zero), our unified approach differs from prior work (Barman et al. 2023), which requires a new Nash Confidence Bound algorithm. Notably, we achieve the same regret bound up to constant factors using our more general method.
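
The following sketch illustrates the quantity being bounded, assuming the usual generalized ($p$-)mean $\left(\frac{1}{T}\sum_t x_t^p\right)^{1/p}$ with the geometric mean as the $p \to 0$ limit. It computes the $p$-mean regret for a given sequence of strictly positive expected rewards; it does not implement Explore-Then-UCB, and the function names `p_mean` and `p_mean_regret` are hypothetical.

```python
import math


def p_mean(values, p):
    """Generalized p-mean of strictly positive values.

    p = 1 gives the arithmetic mean, p -> 0 the geometric mean
    (the Nash case), and p = -1 the harmonic mean.
    """
    n = len(values)
    if p == 0:
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)


def p_mean_regret(best_mean, expected_rewards, p):
    """Optimal arm mean minus the p-mean of the per-round expected rewards."""
    return best_mean - p_mean(expected_rewards, p)


# Expected rewards of the arms pulled over four rounds; the best arm has mean 0.9.
rewards = [0.9, 0.5, 0.9, 0.9]
average_regret = p_mean_regret(0.9, rewards, p=1)
nash_regret = p_mean_regret(0.9, rewards, p=0)
```

Because the $p$-mean is non-decreasing in $p$, lowering $p$ penalizes the single low-reward round more heavily, which is how the parameter trades efficiency for fairness.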

p-Mean Regret for Stochastic Bandits

Knowledge distillation transfers "dark knowledge" from a large teacher model to a smaller student model, yielding a highly efficient network. To improve the network's generalization ability, existing works use a larger temperature coefficient for knowledge distillation. Nevertheless, these methods may lower the target category's confidence and lead to ambiguous recognition of similar samples. To mitigate this issue, some studies introduce intra-batch distillation to reduce prediction discrepancy. However, these methods overlook the inconsistency between background information and the target category, which may increase prediction bias due to noise disturbance. Additionally, label imbalance from random sampling and batch size can undermine network generalization reliability. To tackle these challenges, we propose a simple yet effective Intra-class Knowledge Distillation (IKD) method that facilitates knowledge sharing within the same class to ensure consistent predictions. First, we initialize the matrix and the vector to store logits and class counts provided by the teacher, respectively. Then, in the first epoch, we calculate the sum of logits and sample counts per class and perform KD to prevent knowledge omission. Finally, in subsequent training, we update the matrix to obtain the average logits and compute the KL divergence between the student's output and the updated matrix according to the label index. This process ensures intra-class consistency and improves the student's performance. Furthermore, this method theoretically reduces prediction bias by ensuring intra-class consistency. Extensive experiments on the CIFAR-100, ImageNet-1K, and Tiny-ImageNet datasets validate the superiority of IKD. The code will be made publicly available.
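
The bookkeeping described above (a per-class logit matrix, a count vector, class-average logits, and a KL term indexed by label) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the class `ClassLogitBank`, the function names, the temperature handling, and the KL direction (class-average target against student) are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


class ClassLogitBank:
    """Hypothetical store: a matrix of summed teacher logits and a count vector."""

    def __init__(self, num_classes):
        self.sums = [[0.0] * num_classes for _ in range(num_classes)]
        self.counts = [0] * num_classes

    def update(self, teacher_logits, label):
        """Accumulate one teacher prediction into the row of its class."""
        row = self.sums[label]
        for j, z in enumerate(teacher_logits):
            row[j] += z
        self.counts[label] += 1

    def mean_logits(self, label):
        """Average teacher logits for a class (requires at least one sample)."""
        count = self.counts[label]
        return [s / count for s in self.sums[label]]


def kl_divergence(target_probs, student_probs):
    """KL(target || student) for two discrete distributions."""
    return sum(
        t * math.log(t / s) for t, s in zip(target_probs, student_probs) if t > 0
    )


def intra_class_loss(bank, student_logits, label, temperature=1.0):
    """KL between the class-average teacher distribution and the student output."""
    target = softmax(bank.mean_logits(label), temperature)
    student = softmax(student_logits, temperature)
    return kl_divergence(target, student)


bank = ClassLogitBank(num_classes=2)
bank.update([2.0, 0.0], label=0)
bank.update([4.0, 0.0], label=0)
loss = intra_class_loss(bank, [0.0, 3.0], label=0)
```

Every sample of a class is pulled toward the same averaged target, which is one way to read the abstract's claim that the method enforces intra-class consistency.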
