United States

We consider the finite-horizon offline reinforcement learning (RL) setting, and are motivated by the challenge of learning the policy at any step $h$ in dynamic programming (DP) algorithms. To learn this policy, it is sufficient to evaluate the treatment effect of deviating from the behavioral policy at step $h$ after having optimized the policy for all future steps. Since the policy at any step can affect next-state distributions, the related distributional shift challenges can make this problem far harder statistically than estimating such treatment effects in the stochastic contextual bandit setting. However, the hardness of many real-world RL instances lies between the two regimes. We develop a flexible and general method called selective uncertainty propagation for confidence interval construction that adapts to the hardness of the associated distribution shift challenges. We show the benefits of our approach on toy environments and demonstrate the benefits of these techniques for offline policy learning.

AAAI 2025

Selective Uncertainty Propagation in Offline RL

reinforcement learning

technical paper

We are pleased to announce the Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25), which will be held in Philadelphia, Pennsylvania at the Pennsylvania Convention Center from February 25 to March 4, 2025.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-25 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.

Predicting roll call votes through modeling political actors has emerged as a focus in quantitative political science and computer science. Widely used embedding-based methods generate vectors for legislators from diverse data sets to predict legislative behaviors. However, these methods often contend with challenges such as the need for manually predefined features, reliance on extensive training data, and a lack of interpretability. Achieving more interpretable predictions under flexible conditions remains an unresolved issue. This paper introduces the Political Actor Agent (PAA), a novel agent-based framework that utilizes Large Language Models to overcome these limitations. By employing role-playing architectures and simulating the legislative system, PAA provides a scalable and interpretable paradigm for predicting roll-call votes. Our approach not only enhances the accuracy of predictions but also offers multi-view, human-understandable decision reasoning, providing new insights into political actor behaviors. We conducted comprehensive experiments using voting records from the 117th and 118th U.S. House of Representatives, validating the superior performance and interpretability of PAA. This study demonstrates not only PAA's effectiveness but also its potential in political science research.

Political Actor Agent: Simulating Legislative System for Roll Call Votes Prediction with Large Language Models

The satisfiability (SAT) problem of higher-order quantified Boolean formulas (HOQBF) emerged as a natural generalization of SAT, quantified SAT (QSAT), and second-order SAT. It allows succinct encoding of $k$-EXPTIME problems beyond the reach of prior Boolean satisfiability formulations, but its application was hampered by the lack of solvers. In this paper, we present the first HOQBF solver that leverages techniques from the model-checking community. Our HOQBF solver is based on reduction to higher-order model checking, which is a generalization from model checking of while-programs to that of higher-order functional programs. The ability of a higher-order model checker to deal with higher-order functions in a program is used to reason about higher-order quantifiers in HOQBF.

Solving Higher-Order Quantified Boolean Satisfiability via Higher-Order Model Checking

Transfer learning for bio-signals has recently become an important technique to improve prediction performance on downstream tasks with small bio-signal datasets. Recent works have shown that pre-training a neural network model on a large dataset (e.g., EEG) with a self-supervised task, replacing the self-supervised head with a linear classification head, and fine-tuning the model on different downstream bio-signal datasets (e.g., EMG or ECG) can dramatically improve the performance on those datasets. In this paper, we propose a new convolution-transformer hybrid model architecture with masked auto-encoding for low-data bio-signal transfer learning, introduce a frequency-based masked auto-encoding task, employ a more comprehensive evaluation framework, and evaluate how much and when (multimodal) pre-training improves fine-tuning performance. We also introduce a dramatically more performant method of aligning a downstream dataset with a different temporal length and sampling rate to the original pre-training dataset. Our findings indicate that the convolution-only part of our hybrid model can achieve state-of-the-art performance on some low-data downstream tasks. The performance is often improved even further with our full model. In the case of transformer-based models, we find that pre-training especially improves performance on downstream datasets, multimodal pre-training often increases those gains further, and our frequency-based pre-training performs the best on average for the lowest and highest data regimes.

CiTrus: Squeezing Extra Performance out of Low-data Bio-signal Transfer Learning

We improve the efficacy of bound-propagation-based neural network verification by reducing the computational effort required by state-of-the-art propagation methods without incurring any loss in precision. We propose a method that infers the stability of ReLU nodes at every step of the back-substitution process, thereby dynamically simplifying the coefficient matrix of the symbolic bounding equations. We develop a heuristic for the effective application of the method and discuss its evaluation on common benchmarks where we show significant improvements in bound propagation times.

Dynamic Back-Substitution in Bound-Propagation-Based Neural Network Verification

A wide variety of goals could cause an AI to disable its off switch because “you can’t fetch the coffee if you’re dead” (Russell, 2019). Prior theoretical work on this shutdown problem assumes that humans know everything that AIs do. In practice, however, humans have only limited information. Moreover, in many of the settings where the shutdown problem is most concerning, AIs might have vast amounts of private information. To capture these differences in knowledge, we introduce the Partially Observable Off-Switch Game (POSG), a game-theoretic model of the shutdown problem with asymmetric information. Unlike in the fully observable case, we find that in optimal play, even AI agents assisting perfectly rational humans sometimes avoid shutdown. As expected, increasing the amount of communication or information available always increases (or leaves unchanged) the agents' expected common payoff. But counterintuitively, introducing bounded communication can make the AI defer to the human less in optimal play even though communication mitigates information asymmetry. Thus, designing safe artificial agents in the presence of asymmetric information requires careful consideration of the tradeoffs between maximizing payoffs (potentially myopically) and maintaining AIs’ incentives to defer to humans.

The Partially Observable Off-Switch Game

We introduce a neural-certificate framework for the safety assurance of continuous-time nonlinear stochastic dynamical systems, with provable guarantees against quantitative specifications of reachability, avoidance and persistence. Despite the rising complexity of safety requirements for autonomous learning systems in the physical world, which demand continuous-time reasoning, existing learnable certificates for probabilistic verification and control assume discretization of the time continuum. Inspired by the success of training neural Lyapunov certificates for deterministic continuous-time systems and neural supermartingale certificates for stochastic discrete-time systems, we propose a framework that bridges the gap between continuous-time and probabilistic neural certification for dynamical systems under complex requirements. Our method combines machine learning and symbolic reasoning to produce formally certified bounds on the probabilities that a continuous-time stochastic dynamical system reaches a target region while avoiding unsafe states, with the option to certify that the system is likely to remain within that region. We present both the theoretical justification and the algorithmic implementation of our framework and showcase its efficacy on popular benchmarks.

Neural Continuous-Time Supermartingale Certificates

In a decision-making scenario, a principal could use conditional predictions from an expert agent to inform their choice. However, this approach would introduce a fundamental conflict of interest. An agent optimizing for predictive accuracy is incentivized to manipulate their principal towards more predictable actions, which prevents that principal from being able to deterministically select their true preference. We demonstrate that this impossibility result can be overcome through the joint evaluation of multiple agents. When agents are made to engage in zero-sum competition, their incentive to influence the action taken is eliminated, and the principal can identify and take the action they most prefer. We further prove that this zero-sum setup is unique, efficiently implementable, and applicable under stochastic choice. Experiments in a toy environment demonstrate that training on a zero-sum objective significantly enhances both predictive accuracy and principal utility, and can eliminate previously learned manipulative behavior.

Joint Scoring Rules: Competition Between Agents Avoids Performative Prediction

This paper introduces a general framework for generate-and-test-based solvers for epistemic logic programs that can be instantiated with different generate and test programs, and it provides sufficient conditions on those programs for the correctness of the solvers built using this framework. It also introduces a new generator program that incorporates the propagation of epistemic consequences and shows that this can exponentially reduce the number of candidates that need to be tested while only incurring a linear overhead. We implement a new solver based on these theoretical findings and experimentally show that it outperforms existing solvers by achieving a ~3.3x speed-up and solving 87% more instances on well-known benchmarks.

Solving Epistemic Logic Programs Using Generate-and-Test with Propagation

Multi-view clustering (MVC) methods have gained significant attention, most of which adopt centralized data settings. Real-world multi-view data may be collected and stored by different organizations, which increases the challenge of practical MVC deployment and motivates the emergence of federated MVC (FMVC). However, existing FMVC methods require post-processing to obtain clustering labels and struggle to explore complementary and consistent information between multi-view data located in different entities. To address these issues, we propose a novel Scalable Federated One-Step Multi-View Clustering with Tensorized Regularization (SFOMVC-TR), which incorporates anchor graphs to improve the clustering efficiency and scalability on high-dimensional data. SFOMVC-TR performs embedding learning on the anchor graph, and by applying sparse constraints to the projection matrix, it effectively eliminates the redundant information in anchor graphs, thereby allowing clustering in one step. We further introduce a third-order tensor to capture complementary multi-view features and consistent information simultaneously. A federated optimization algorithm is developed to support collaborative and privacy-preserving training under the coordination of a server. Extensive experiments on multiple datasets demonstrate the effectiveness and superiority of our proposed method.

Scalable Federated One-Step Multi-View Clustering with Tensorized Regularization

In this blue sky paper, we seek to stimulate the research community to pursue important new as well as existing (unsolved) AI problems in the context of a challenging, often ignored, socio-sensitive application domain. We outline the key challenges in conducting elections credibly in leading democracies around the world today and identify our vision of a path forward with an overarching goal to increase voter participation with a two-pronged approach of AI-led technological innovations and interdisciplinary community building. On the technology front, we envisage the need to transform Collation and Distribution of election information, and promote its Comprehensibility for users' understanding and trust (CDC). On the community front, we need to invigorate the multi-disciplinary community consisting of, but not limited to, researchers in AI, security, journalism, political science, sociology, and business, to PROMote AI’s Safe usage for Elections (PROMISE) with best practices. This work is informed by our interdisciplinary research as well as experience in conducting three workshops at leading AI conferences and the AI Magazine special issue on *AI and Elections*.
