
Multi-camera 3D object detection aims to detect and localize objects in 3D space using multiple cameras, and it has attracted growing attention because of its favorable cost-effectiveness trade-off. However, these methods often struggle with inaccurate depth estimation, an inherent weakness of cameras in ranging. Recently, multi-modal fusion and knowledge distillation methods for 3D object detection have been proposed to solve this problem, but they are time-consuming to train and memory-intensive. In light of this, we propose PromptDet, a lightweight yet effective 3D object detection framework motivated by the success of prompt learning in 2D foundation models. PromptDet comprises two integral components: a general camera-based detection module, exemplified by models like BEVDet and BEVDepth, and a LiDAR-assisted prompter. The LiDAR-assisted prompter leverages the LiDAR points as a complementary signal, enriched with a minimal set of additional trainable parameters.
Notably, our framework is flexible due to its prompt-like design: it can be used not only as a lightweight multi-modal fusion method but also as a camera-only method for 3D object detection during the inference phase. Extensive experiments on nuScenes validate the effectiveness of the proposed PromptDet. As a multi-modal detector, PromptDet improves the mAP and NDS by up to 22.8% and 21.1% with fewer than 2% extra parameters compared with the camera-only baseline. Without LiDAR points, PromptDet still achieves an improvement of up to 2.4% mAP and 4.0% NDS with almost no impact on camera detection inference time. We will release our code.

AAAI 2025

PromptDet: A Lightweight 3D Object Detection Framework with LiDAR Prompts

poster

We are pleased to announce the Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25), which will be held in Philadelphia, Pennsylvania at the Pennsylvania Convention Center from February 25 to March 4, 2025.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-25 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.

The problem of federated learning (FL) where users are distributed and partitioned into clusters has been addressed through the framework of clustered federated learning (CFL). However, when users are unwilling to share their cluster identities due to privacy concerns, training CFL becomes difficult. To address this issue, we introduce an innovative Efficient and Robust Secure Aggregation scheme for CFL, dubbed EBS-CFL. The proposed method supports training CFL while maintaining user cluster identity confidentiality. It detects potential poisoning attacks without compromising individual client gradients by discarding negatively correlated gradients and aggregating positively correlated ones using a weighted approach. The server also authenticates correct gradient encoding by clients.
Client-side overhead is \(O(ml + m^2)\) for communication and \(O(m^2l)\) for computation. When \(m = 1\), computational efficiency is at least \(\log{n}\) times better than other methods, where \(n\) is the number of clients, \(m\) is the number of cluster identities, and \(l\) is the gradient size.
Our method's theoretical efficiency and security are validated through comprehensive analysis and extensive experiments.
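
The filtering rule described in the abstract above (discard negatively correlated gradients, weight the positively correlated ones) can be illustrated in plaintext. This is only a minimal sketch under stated assumptions: it omits the secure aggregation and encoding verification entirely, it assumes a hypothetical `reference` direction is available to compare against, and it uses cosine similarity as the correlation measure. The function names are illustrative and are not taken from EBS-CFL.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filtered_weighted_average(gradients, reference):
    """Aggregate client gradients, ignoring those that oppose the reference.

    Assumption: `reference` is some trusted direction; how the real scheme
    obtains its correlation signal without seeing individual gradients is
    not modeled here.
    """
    # Negative similarity -> weight 0 (discarded); positive -> weight = similarity.
    weights = [max(0.0, cosine(g, reference)) for g in gradients]
    total = sum(weights)
    if total == 0.0:
        return [0.0] * len(reference)
    dim = len(reference)
    return [
        sum(w * g[i] for w, g in zip(weights, gradients)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    honest = [2.0, 0.0]
    poisoned = [-5.0, 0.0]  # points against the reference direction
    print(filtered_weighted_average([honest, poisoned], reference=[1.0, 0.0]))
```

In this toy run the poisoned gradient receives weight zero, so the aggregate equals the honest gradient.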

EBS-CFL: Efficient and Byzantine-robust Secure Clustered Federated Learning

Multi-objective optimization (MOO) lies at the core of many machine learning (ML) applications that involve multiple, potentially conflicting objectives (e.g., multi-task learning, multi-objective reinforcement learning, among many others). Despite the long history of MOO, recent years have witnessed a surge in interest within the ML community in the development of gradient manipulation algorithms for MOO, thanks to the availability of gradient information in many ML problems. However, existing gradient manipulation methods for MOO often suffer from long training times, primarily due to the need to compute dynamic weights by solving an additional optimization problem to determine a common descent direction that can decrease all objectives simultaneously. To address this challenge, we propose a new and efficient algorithm called Periodic Stochastic Multi-Gradient Descent (PSMGD) to accelerate MOO. PSMGD is motivated by the key observation that dynamic weights across objectives exhibit small changes under minor updates over short intervals during the optimization process. Consequently, our PSMGD algorithm is designed to compute these dynamic weights periodically and reuse them, thereby effectively reducing the computational overhead. Theoretically, we prove that PSMGD can achieve state-of-the-art convergence rates for strongly convex, general convex, and non-convex functions. Additionally, we introduce a new computational complexity measure, termed backpropagation complexity, and demonstrate that PSMGD can achieve an objective-independent backpropagation complexity. Through extensive experiments, we verify that PSMGD can provide comparable or superior performance to state-of-the-art MOO algorithms while significantly reducing training time.
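
The periodic-reuse idea in the abstract above can be sketched for two objectives, where the min-norm weighting problem has a closed-form solution. This is a rough illustration and not the authors' algorithm: it uses exact rather than stochastic gradients, the names `min_norm_weight`, `periodic_mgd`, and `period` are hypothetical, and both gradients are evaluated at every step for simplicity (in a real training loop the reuse steps would presumably need only one backward pass on the weighted loss).

```python
def min_norm_weight(g1, g2):
    """Weight lam in [0, 1] minimizing ||lam * g1 + (1 - lam) * g2||^2."""
    diff = [a - b for a, b in zip(g1, g2)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        return 0.5  # identical gradients: any weight gives the same direction
    lam = sum((b - a) * b for a, b in zip(g1, g2)) / denom
    return min(1.0, max(0.0, lam))


def periodic_mgd(x, grads, steps=200, period=5, lr=0.1):
    """Gradient descent on two objectives, re-solving the weights every `period` steps."""
    lam = 0.5
    solves = 0
    for t in range(steps):
        g1, g2 = grads(x)
        if t % period == 0:
            lam = min_norm_weight(g1, g2)  # the extra optimization problem
            solves += 1
        # Between solves, the stale weight is reused unchanged.
        x = [xi - lr * (lam * a + (1.0 - lam) * b) for xi, a, b in zip(x, g1, g2)]
    return x, solves


def toy_grads(x):
    """Gradients of f1 = 0.5 * ||x - a||^2 and f2 = 0.5 * ||x - b||^2."""
    a, b = [0.0, 0.0], [2.0, 0.0]
    return [xi - ai for xi, ai in zip(x, a)], [xi - bi for xi, bi in zip(x, b)]


if __name__ == "__main__":
    x_final, n_solves = periodic_mgd([1.0, 3.0], toy_grads)
    print(x_final, n_solves)
```

On this toy problem the Pareto set is the segment between the two minimizers, and the iterate reaches it while the weights are solved for only once every five steps.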

PSMGD: Periodic Stochastic Multi-Gradient Descent for Fast Multi-Objective Optimization

Quality control is a crucial issue in collecting label data by crowdsourcing. Typically, aggregation methods for redundant crowd labels are proposed to estimate high-quality labels from noisy crowd labels. Most existing works concentrate on label aggregation for Single Crowd Tasks (SCTs), which have a single object set with homogeneous question types. However, it is useful for a requester to combine multiple relevant but different crowd tasks into a Composite Crowd Task (CCT), which has heterogeneous question types and/or multiple object sets for diverse purposes. Instead of performing label aggregation on each crowd task separately, label aggregation methods that bridge multiple SCTs in CCTs can potentially improve the label quality of all tasks. In this paper, we propose a general label aggregation approach for such CCTs by worker ability constraint satisfaction and relaxed optimization. We collected real crowd datasets of CCTs with diverse task settings based on heterogeneous question types, including categorization, pairwise preference comparisons, and pairwise similarity comparisons. The results demonstrate that our approach can effectively bridge the worker information of CCTs to improve the quality of aggregated labels and outperforms the baselines proposed for SCTs.

Label Aggregation for Composite Crowd Tasks by Worker Ability Constraint Satisfaction

Although empirical risk minimization (ERM) is widely applied in the machine learning community, its performance is limited on data with spurious correlations or subpopulations introduced by hidden attributes. Existing literature has proposed techniques to maximize group-balanced or worst-group accuracy when such correlations are present, yet at the cost of lower average accuracy. In addition, many existing works survey different subpopulation methods without revealing the inherent connections between these methods, which could hinder progress in this area. In this paper, we identify importance sampling as a simple yet powerful tool for solving the subpopulation problem. On the theory side, we provide a new systematic formulation of the subpopulation problem and explicitly identify the assumptions that are not clearly stated in existing works. This helps to uncover the cause of the drop in average accuracy. We provide the first theoretical discussion of the connections among existing methods, revealing the core components that make them different. On the application side, we demonstrate that a single estimator is enough to solve the subpopulation problem. In particular, we introduce the estimator in both attribute-known and attribute-unknown scenarios in the subpopulation setup, offering flexibility in practical use cases. Empirically, we achieve state-of-the-art performance on commonly used benchmark datasets.

Boosting Test Performance with Importance Sampling--a Subpopulation Perspective

Offline reinforcement learning has shown tremendous success in behavioral planning by learning from previously collected demonstrations. However, decision-making in multitask missions still presents significant challenges. For instance, a mission might require an agent to explore an unknown environment, discover goals, and navigate to them, even if it involves interacting with obstacles along the way. Such behavioral planning problems are difficult to solve due to: a) agents failing to adapt beyond the single task learned through their reward function, and b) the inability to generalize to new environments not covered in the training demonstrations, e.g., environments where all doors were unlocked in the demonstrations. Consequently, state-of-the-art decision-making methods are limited to missions where the required tasks are well-represented in the training demonstrations and can be solved within a short (temporal) planning horizon. To address this, we propose GenPlan: a stochastic and adaptive planner that leverages discrete-flow models for generative sequence modeling, enabling sample-efficient exploration and exploitation. This framework relies on an iterative denoising procedure to generate a sequence of goals and actions. This approach captures multi-modal action distributions and facilitates goal and task discovery, thereby enhancing generalization to out-of-distribution tasks and environments, i.e., missions not part of the training data. We demonstrate the effectiveness of our method through multiple simulation environments. Notably, GenPlan outperforms the state-of-the-art methods by over 10% on adaptive planning tasks, where the agent adapts to multi-task missions while leveraging demonstrations on single-goal-reaching tasks.

GenPlan: Generative Sequence Models as Adaptive Planners

Although unsupervised multiplex graph representation learning (UMGRL) has been a hot research topic, existing UMGRL methods still have limitations to be addressed. For example, previous works either preserve structural information while ignoring the impact of heterophily in the graph structure, or focus only on node-level consistency while ignoring class-level consistency. To address these issues, in this paper, we propose a new UMGRL method to explore both homophily and consistency in the multiplex graph. Specifically, we propose to restructure the multi-order relationships of every graph between every node and its multi-order neighbors to improve the homophily and reduce the impact of the heterophily in the graph structure. We also design a contrastive loss based on a self-expression matrix of the node representation matrix to achieve node-level and class-level consistency. Furthermore, we theoretically prove that our method achieves class-level consistency. Extensive experimental results on real datasets verify the effectiveness of the proposed method with respect to various downstream tasks, compared to SOTA methods.

Multiplex Graph Representation Learning with Homophily and Consistency

Prompt-based approaches offer a cutting-edge solution to data privacy issues in continual learning, particularly in scenarios involving multiple data suppliers where long-term storage of private user data is prohibited. Although these approaches deliver state-of-the-art performance, their impressive remembering capability can become a double-edged sword, raising security concerns as they might inadvertently retain poisoned knowledge injected during learning from private user data. Following this insight, in this paper, we expose continual learning to a potential threat: backdoor attack, which drives the model to follow a desired adversarial target whenever a specific trigger is present while still performing normally on clean samples. We highlight three critical challenges in executing backdoor attacks on incremental learners and propose corresponding solutions: (1) Transferability: We employ a surrogate dataset and manipulate prompt selection to transfer backdoor knowledge to data from other suppliers; (2) Resiliency: We simulate static and dynamic states of the victim to ensure the backdoor trigger remains robust during intense incremental learning processes; and (3) Authenticity: We apply binary cross-entropy loss as an anti-cheating factor to prevent the backdoor trigger from devolving into adversarial noise. Extensive experiments across various benchmark datasets and continual learners validate our continual backdoor framework, with further ablation studies confirming the effectiveness of our contributions.

Attack On Prompt: Backdoor Attack in Prompt-Based Continual Learning

Classic algorithms for stochastic bandits typically use hyperparameters that govern their critical properties, such as the trade-off between exploration and exploitation. Tuning these hyperparameters is a problem of great practical significance. However, this is a challenging problem and in certain cases is information-theoretically impossible. To address this challenge, we consider a practically relevant transfer learning setting where one has access to offline data collected from several bandit problems (tasks) coming from an unknown distribution over the tasks. Our aim is to use this offline data to set the hyperparameters for a new task drawn from the unknown distribution. We provide bounds on the inter-task (number of tasks) and intra-task (number of arm pulls for each task) sample complexity for learning near-optimal hyperparameters on unseen tasks drawn from the distribution. Our results apply to several classic algorithms, including tuning the exploration parameters in UCB and LinUCB and the noise parameter in GP-UCB. Our experiments indicate the significance and effectiveness of transfer of hyperparameters from offline problems in online learning with stochastic bandit feedback.
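
To make concrete where such a hyperparameter enters, the following sketch runs a UCB-style rule with an exploration coefficient `c` on Bernoulli arms and then picks `c` from a small candidate grid by average reward over a set of offline tasks. Several things here are assumptions for illustration only: the tasks are simulated from known arm means rather than read from logged data, the grid search is a stand-in and is not claimed to be the paper's procedure, and the names `run_ucb` and `pick_c_offline` are hypothetical.

```python
import math
import random


def run_ucb(means, horizon, c, rng):
    """Play a Bernoulli bandit with index: empirical mean + c * sqrt(log(t) / pulls)."""
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once before using the index
        else:
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + c * math.sqrt(math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts, sum(sums)


def pick_c_offline(tasks, candidates, horizon, seed=0):
    """Return the candidate c with the best average reward on the offline tasks."""

    def average_reward(c):
        rng = random.Random(seed)  # same randomness for every candidate
        return sum(run_ucb(m, horizon, c, rng)[1] for m in tasks) / len(tasks)

    return max(candidates, key=average_reward)


if __name__ == "__main__":
    offline_tasks = [[0.2, 0.8], [0.4, 0.6], [0.1, 0.5, 0.9]]
    best_c = pick_c_offline(offline_tasks, [0.1, 0.5, 1.0, 2.0], horizon=500)
    print("selected exploration coefficient:", best_c)
```

The selected value would then be used when running the same rule on a new task assumed to come from the same task distribution.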

Offline-to-Online Hyperparameter Transfer for Stochastic Bandits

Dynamic Facial Expression Recognition (DFER) is crucial for affective computing but often overlooks the impact of scene context. We have identified a significant issue in current DFER tasks: human annotators typically integrate emotions from various angles, including environmental cues and body language, whereas existing DFER methods tend to consider the scene as noise that needs to be filtered out, focusing solely on facial information. We refer to this as the Rigid Cognitive Problem. The Rigid Cognitive Problem can lead to discrepancies between the cognition of annotators and models in some samples. To align more closely with the human cognitive paradigm of emotions, we propose an Overall Understanding of the Scene DFER method (OUS). OUS effectively integrates scene and facial features, combining scene-specific emotional knowledge for DFER. Extensive experiments on the two largest datasets in the DFER field, DFEW and FERV39k, demonstrate that OUS significantly outperforms existing methods. By analyzing the Rigid Cognitive Problem, OUS successfully understands the complex relationship between scene context and emotional expression, closely aligning with human emotional understanding in real-world scenarios.

OUS: Bridging Scene Context and Facial Features to Overcome the Rigid Cognitive Problem

Importance sampling is a rare event simulation technique used in Monte Carlo simulations to bias the sampling distribution towards the rare event of interest.
By assigning appropriate weights to sampled points, importance sampling allows for more efficient estimation of rare events or tails of distributions. 
However, importance sampling can fail when the proposal distribution does not effectively cover the target distribution. 
In this work, we propose a method for more efficient sampling by updating the proposal distribution in the latent space of a normalizing flow. 
Normalizing flows learn an invertible mapping from a target distribution to a simpler latent distribution. 
The latent space can be more easily explored during the search for a proposal distribution, and samples from the proposal distribution are recovered in the space of the target distribution via the invertible mapping. 
We empirically validate our methodology on simulated robotics applications such as autonomous racing and aircraft ground collision avoidance.
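
As background for the abstract above, plain importance sampling with a fixed proposal can be sketched on a textbook rare event: estimating P(X > 4) for a standard normal X by sampling from a normal centered at the threshold. This example is an assumption chosen for illustration and does not reproduce the proposed method; in particular, no normalizing flow or latent-space proposal update is shown, and the function name is hypothetical.

```python
import math
import random


def rare_event_probability(threshold, n_samples, seed=0):
    """Estimate P(X > threshold), X ~ N(0, 1), using proposal q = N(threshold, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(threshold, 1.0)  # draw from the proposal
        if x > threshold:
            # Weight p(x) / q(x) for unit-variance Gaussians with means 0 and threshold.
            total += math.exp(-threshold * x + 0.5 * threshold * threshold)
    return total / n_samples


if __name__ == "__main__":
    estimate = rare_event_probability(4.0, 20000)
    exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))
    print("importance sampling estimate:", estimate)
    print("exact tail probability:      ", exact)
```

Roughly half of the proposal draws land in the rare region, whereas naive sampling from N(0, 1) would need tens of thousands of draws on average to see a single such event. If the proposal were centered far from the region of interest, most weights would be negligible, which is the failure mode the abstract mentions.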
