United States

Long-context large language model (LLM) inference is increasingly critical, motivating a number of studies devoted to alleviating its substantial storage and computational costs. Layer-wise skipping methods are a promising optimization but have rarely been explored for long-context inference. We observe that existing layer-wise skipping strategies have several limitations when applied to long-context inference, including an inability to adapt to model and context variability, disregard for sublayer significance, and inapplicability to the prefilling phase. This paper proposes AdaSkip, an adaptive sublayer skipping method specifically designed for long-context inference. AdaSkip adaptively identifies less important layers by leveraging on-the-fly similarity information, enables sublayer-wise skipping, and accelerates both the prefilling and decoding phases. The effectiveness of AdaSkip is demonstrated through extensive experiments on various long-context benchmarks and models, showcasing its superior inference performance over existing baselines.

AAAI 2025

AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference

poster
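The AdaSkip abstract mentions ranking sublayers by on-the-fly similarity information but does not give the formula. The following is a minimal sketch of one plausible reading, assuming that a sublayer whose output is nearly parallel to its input (high cosine similarity) contributes little and is a candidate for skipping. The function names, the sublayer labels, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_sublayers_for_skipping(io_pairs, num_to_skip):
    """Return the sublayers whose output is most similar to their input.

    io_pairs maps a (hypothetical) sublayer name to an (input, output) pair
    of hidden-state vectors. Assumption: high similarity means low importance.
    """
    scores = {name: cosine_similarity(x, y) for name, (x, y) in io_pairs.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:num_to_skip], scores


# Toy hidden states, invented purely for illustration.
io_pairs = {
    "layer0.attn": ([1.0, 0.0, 0.0], [0.2, 0.9, 0.1]),
    "layer0.ffn": ([1.0, 2.0, 3.0], [1.1, 2.0, 2.9]),
    "layer1.attn": ([0.5, 0.5, 0.0], [0.5, 0.6, 0.0]),
}
skipped, scores = rank_sublayers_for_skipping(io_pairs, num_to_skip=2)
print(skipped)
```

In this toy input, `layer0.attn` changes its input the most, so it is the one sublayer that is kept. A real system would measure similarity on actual hidden states during inference rather than on fixed vectors.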

We are pleased to announce the Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25), which will be held in Philadelphia, Pennsylvania at the Pennsylvania Convention Center from February 25 to March 4, 2025.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-25 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.

### [Invited Speakers](https://aaai.org/conference/aaai/aaai-25/aaai-25-invited-speakers/)

Register [here](https://aaai.org/conference/aaai/aaai-25/registration/)


Ensuring data quality in large tabular datasets is a critical challenge, typically addressed through data wrangling tasks. Traditional statistical methods, though efficient, often cannot capture semantic context, while deep learning approaches are resource-intensive and require task- and dataset-specific training. We present an automated system that utilizes large language models to generate executable code for tasks like missing value imputation, error detection, and error correction. Our system aims to identify inherent patterns in the data while leveraging external knowledge, effectively addressing both memory-dependent and memory-independent tasks.

Data Wrangling Task Automation Using Code-Generating Language Models

The complexity of the shipping industry, dynamic operational drivers, and diverse data sources present significant scalability challenges for digital twins. Agentic Large Language Models (LLMs) augmented with external tools offer a promising solution to accelerate digital twin adoption. Using pre-trained knowledge and reasoning capabilities, these LLMs autonomously select optimal tools and data streams for user-specific queries, enabling language to serve as a universal interface between digital twins and various stakeholders, from technicians to fleet managers. This interface facilitates real-time decision making and insight generation across multiple operational workflows. In this demonstration, we present an interactive agentic digital twin designed to enhance scalability, flexibility, and efficiency in managing the extensive and intricate decision-making requirements of the shipping industry. We showcase the transformative potential of agentic LLMs in reducing complexity and improving the practical application of digital twins, ultimately enabling more efficient operations in real-world settings.

Agentic AI for Digital Twin

Matching markets, in which agents are assigned to one another based on preferences and capacity constraints, are pervasive in various domains. This paper introduces MATWA (https://matwa.optimalmatching.com), a web application that offers the most comprehensive collection to date of algorithms for fundamental classes of matching-under-preferences problems. MATWA provides results of algorithm executions and visualisations of structural properties. It is intended to be a resource for the community of researchers, educators and practitioners, supporting experimentation and aiding the understanding of matching algorithms.

MATWA: A Web Toolkit for Matching Under Preferences
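The MATWA abstract does not list which algorithms the toolkit includes. As an illustration of the kind of problem it targets, here is a minimal sketch of the classic Gale-Shapley deferred-acceptance algorithm for one-to-one stable matching. It assumes complete, strict preference lists and equally sized sides; the agent names are hypothetical.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Return a stable matching as a dict mapping proposer -> receiver.

    Assumes complete, strict preference lists and equally many agents on
    each side, so every proposer is eventually matched.
    """
    # rank[r][p] is the position of proposer p in receiver r's list (lower is better).
    rank = {
        r: {p: i for i, p in enumerate(prefs)}
        for r, prefs in receiver_prefs.items()
    }
    next_choice = {p: 0 for p in proposer_prefs}  # index of next receiver to try
    engaged_to = {}  # receiver -> proposer currently held
    free = list(proposer_prefs)

    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p  # receiver was unmatched, accept
        elif rank[r][p] < rank[r][current]:
            engaged_to[r] = p  # receiver prefers the new proposer
            free.append(current)
        else:
            free.append(p)  # rejected, will propose to the next choice

    return {p: r for r, p in engaged_to.items()}


proposers = {"a": ["x", "y"], "b": ["x", "y"]}
receivers = {"x": ["b", "a"], "y": ["a", "b"]}
matching = gale_shapley(proposers, receivers)
print(matching)
```

Both proposers prefer `x`, but `x` prefers `b`, so `a` ends up with `y`. Variants with capacities, ties, or incomplete lists need different algorithms and are not covered by this sketch.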

We propose a novel system, MathMistake Checker, designed to automate step-by-step mistake finding in mathematical problems with lengthy answers through a two-stage process. The system aims to simplify grading, increase efficiency, and enhance learning experiences from a pedagogical perspective. It integrates advanced technologies, including computer vision and the chain-of-thought capabilities of the latest large language models (LLMs). Our system supports open-ended grading without reference answers and promotes personalized learning by providing targeted feedback. We demonstrate its effectiveness across various types of math problems, such as calculation and word problems.

MathMistake Checker: A Comprehensive Demonstration for Step-by-Step Math Problem Mistake Finding by Prompt-Guided LLMs

Transfer learning enhances model performance in financial time series by leveraging data from related domains. The selection of appropriate source domains is crucial to avoid negative transfer. We propose using Gramian Angular Field (GAF) transformations to improve time series similarity functions for better domain alignment. Extensive experiments with DNN and LSTM models show that GAF-based similarity functions, specifically Coral (GAF) for DNN and CMD (GAF) for LSTM, significantly reduce prediction errors, demonstrating their effectiveness in complex financial environments.

Transfer Learning in Financial Time Series with Gramian Angular Field (Student Abstract)
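For readers unfamiliar with the Gramian Angular Field, the sketch below shows the standard summation variant: the series is rescaled to [-1, 1], each value is mapped to an angle via arccos, and entry (i, j) of the image is cos(phi_i + phi_j). The abstract does not say whether the summation or difference variant is used, so the choice here is an assumption, and the function name is hypothetical.

```python
import math


def gramian_angular_summation_field(series):
    """Return the GASF matrix of a 1-D series as a list of lists."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must contain at least two distinct values")
    # Min-max rescale to [-1, 1] so that arccos is defined.
    scaled = [(2.0 * x - hi - lo) / (hi - lo) for x in series]
    # Clamp to guard against tiny floating-point overshoot.
    scaled = [max(-1.0, min(1.0, s)) for s in scaled]
    phi = [math.acos(s) for s in scaled]
    return [[math.cos(a + b) for b in phi] for a in phi]


gaf = gramian_angular_summation_field([1.0, 2.0, 3.0])
for row in gaf:
    print([round(v, 3) for v in row])
```

The resulting matrix is symmetric and has one row and column per time step, so two series of equal length can be compared as images. How the abstract's Coral and CMD similarity functions are applied to these matrices is not described there and is not sketched here.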

Esports has rapidly emerged as a global phenomenon with an ever-expanding audience on livestream platforms. However, due to the complex nature of the games, it is challenging for newcomers to comprehend the gaming situation. This research introduces 3M-Game, a framework that integrates multi-modal (MM) information from the livestream platform, including chat and livestream, to detect game events. While conventional MM models typically prioritise aligning MM data through concurrent training towards a unified objective, our framework leverages multiple independent teachers trained on different tasks to accomplish game event detection. The results show the effectiveness of the proposed framework. The code and appendix are available at https://github.com/adlnlp/3m_game.

3M-Game: Multi-Modal Multi-Task Multi-Teacher Learning for Game Event Detection (Student Abstract)

High-accuracy image segmentation models require abundant annotated training data, and pixel-level annotation is costly. Our work addresses the high cost of manual annotation and the lack of detailed annotations via a generative approach. In particular, our approach (1) proposes conditional instance-level synthesis to enrich limited data and enhance segmentation performance, and (2) employs generative architectures to complete the segmentation task under few-shot learning concepts. Initial results on the Cityscapes benchmark highlight the potential of our generative solution for instance segmentation given limited data.

A Generative Approach at the Instance-Level for Image Segmentation Under Limited Training Data Conditions (Student Abstract)

This paper proposes extended Long Short-Term Memory (LSTM) networks for the knowledge tracing task and employs explainable AI methods to address interpretability issues. Specifically, we developed an extended LSTM-based model to automatically diagnose students' knowledge states. We then leveraged three interpreting methods—gradient sensitivity, gradient*input, and Deep SHAP—to explain the model's predictions by computing input contributions. The results demonstrate that the proposed model outperforms DKT, and the three methods effectively explain its predictions. Additionally, we identified three key insights into the model's working mechanisms.

Extended LSTMs for Knowledge Tracing: Peeking Inside the Black Box (Student Abstract)

This abstract presents a simulated annealing based approach that constructs hyper-spectral images from the frequency spectrums of a distributed acoustic sensing system and iteratively improves them through the training of learnable filters. The aim is to construct an image that represents features of signals from events while suppressing noise. Hyper-spectral images are specifically created for downstream computer vision tasks such as object detection. Hyper-spectral images are images with more than three channels that are derived from a frequency spectrum to obtain the spectrum for each image pixel. Simulated annealing is used to train the filters to automatically select frequencies and bin them into frequency bands. Each frequency band is mapped into an image channel. We fully integrate our filtering method with an object detection network so that filters are trained in conjunction with the neural network. The detection model serves as both the measure and the selector. Our simulated annealing approach significantly outperforms current state-of-the-art methods by a margin of 22%. Limitations include a dependency on randomness and the premature exclusion of parts of the search space due to the design of the local moves.

Acoustic-to-Hyper-Spectral: Hyper-Spectral Image Construction from Frequency Spectrums Through Simulated Annealing (Student Abstract)
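To make the idea of binning frequencies into bands with simulated annealing concrete, here is a minimal, generic sketch. It is not the authors' method: the abstract uses an object detection model as the measure, whereas this sketch substitutes a toy score (negative within-band variance of a made-up spectrum). The names `band_score`, `move_one_cut`, and `toy_spectrum`, the cooling schedule, and all parameter values are assumptions.

```python
import math
import random


def band_score(cuts, signal):
    """Toy stand-in for a detector score: bands with similar bins score higher."""
    edges = [0, *cuts, len(signal)]
    total = 0.0
    for a, b in zip(edges, edges[1:]):
        band = signal[a:b]
        mean = sum(band) / len(band)
        total += sum((v - mean) ** 2 for v in band)
    return -total


def move_one_cut(cuts, rng, n_bins):
    """Local move: shift one band boundary by one bin, keeping bands non-empty."""
    cuts = list(cuts)
    i = rng.randrange(len(cuts))
    proposal = cuts[i] + rng.choice((-1, 1))
    lower = cuts[i - 1] + 1 if i > 0 else 1
    upper = cuts[i + 1] - 1 if i < len(cuts) - 1 else n_bins - 1
    if lower <= proposal <= upper:
        cuts[i] = proposal
    return tuple(cuts)


def simulated_annealing(initial, neighbor, score, steps=2000,
                        t_start=1.0, t_end=0.01, seed=0):
    """Maximise score with a geometric cooling schedule; returns the best state seen."""
    rng = random.Random(seed)
    current, current_score = initial, score(initial)
    best, best_score = current, current_score
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        candidate = neighbor(current, rng)
        delta = score(candidate) - current_score
        # Always accept improvements; accept worse moves with probability exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_score = candidate, current_score + delta
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score


toy_spectrum = [0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 1.0, 1.0, 1.0]
initial_cuts = (1, 2)  # two boundaries, so three bands (three image channels)
best_cuts, best_value = simulated_annealing(
    initial_cuts,
    lambda cuts, rng: move_one_cut(cuts, rng, len(toy_spectrum)),
    lambda cuts: band_score(cuts, toy_spectrum),
)
print(best_cuts, best_value)
```

The local move only shifts a boundary by one bin, which echoes the limitation noted in the abstract: the design of the moves determines which parts of the search space remain reachable.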

Attempting to align AI capabilities and value structures by means of value elicitation from humans, such as through Reinforcement Learning from Human Feedback (RLHF), is a computational challenge that raises both psychological and philosophical questions. Adopting an evolutionary perspective on the emergence of value structures in humans and machine learning systems can offer a bridge between qualitative and quantitative aspects of alignment. Here, evolutionary dynamics are applied to a game-theoretic model of RLHF. This allows for formal reasoning about the process and capabilities that result from alignment training, even where quantitative benchmarks cannot be clearly defined. A simple parametrized game model of RLHF, subject to replicator dynamics, shows how the success of the training method is sensitive to bias in human judgments. Under ideal conditions, RLHF training leads to aligned behavior. If the choice pattern of the human judge is biased, the training instead incentivizes misalignment. This application shows that evolutionary analyses can contribute to improving the prospects for safety and support successful cooperation between humans and AI systems in deployment.
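The abstract above does not give the game's payoffs, so the following is only a toy illustration of replicator dynamics with two behaviors, aligned and misaligned. The payoff rule is a hypothetical assumption: a judge with bias `b` rewards aligned behavior with payoff `1 - b` and misaligned behavior with payoff `b`. The share `x` of aligned behavior then follows the standard replicator equation dx/dt = x (f_aligned - f_average), integrated here with a simple Euler step.

```python
def replicator_step(x, payoff_aligned, payoff_misaligned, dt=0.01):
    """One Euler step of two-strategy replicator dynamics for share x."""
    average = x * payoff_aligned + (1.0 - x) * payoff_misaligned
    return x + dt * x * (payoff_aligned - average)


def simulate(x0, bias, steps=5000, dt=0.01):
    """Evolve the aligned share under a judge with the given (hypothetical) bias."""
    x = x0
    for _ in range(steps):
        x = replicator_step(x, 1.0 - bias, bias, dt)
    return x


unbiased_outcome = simulate(x0=0.5, bias=0.1)  # judge mostly rewards aligned behavior
biased_outcome = simulate(x0=0.5, bias=0.9)    # judge mostly rewards misaligned behavior
print(round(unbiased_outcome, 4), round(biased_outcome, 4))
```

Under this assumed payoff rule, a low bias drives the aligned share toward 1 and a high bias drives it toward 0, which mirrors the qualitative claim in the abstract without reproducing the paper's actual model.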
