United States

Large Language Models (LLMs) have shown remarkable performance across various tasks, yet significant disparities remain for non-English languages, and especially native African languages. This paper addresses these disparities by creating approximately 1 million human-translated words of new benchmark data in 8 low-resource African languages, covering a population of over 160 million speakers of: Amharic, Bambara, Igbo, Sepedi (Northern Sotho), Shona, Sesotho (Southern Sotho), Setswana, and Tsonga. Our benchmarks are translations of Winogrande and three sections of MMLU: college medicine, clinical knowledge, and virology. Using the translated benchmarks, we report previously unknown performance gaps between state-of-the-art (SOTA) LLMs in English and African languages. Finally, using results from over 400 fine-tuned models, we explore several methods to reduce the LLM performance gap, including high-quality dataset fine-tuning (using an LLM-as-an-Annotator), cross-lingual transfer, and cultural appropriateness adjustments. Key findings include average mono-lingual improvements of 5.6\% with fine-tuning (with 5.4\% average mono-lingual improvements when using high-quality data over low-quality data), 2.9\% average gains from cross-lingual transfer, and a 3.0\% out-of-the-box performance boost on culturally appropriate questions. The publicly available benchmarks, translations, and code from this study support further research and development aimed at creating more inclusive and effective language technologies.

AAAI 2025

Bridging the Gap: Enhancing LLM Performance for Low-Resource African Languages with New Benchmarks, Fine-Tuning, and Cultural Adjustments

technical paper

We are pleased to announce the Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25), which will be held in Philadelphia, Pennsylvania at the Pennsylvania Convention Center from February 25 to March 4, 2025.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-25 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.

### [Invited Speakers](https://aaai.org/conference/aaai/aaai-25/aaai-25-invited-speakers/)

Register [here](https://aaai.org/conference/aaai/aaai-25/registration/)

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-25 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.



This paper shows that the semantics of programs with aggregates implemented by the solvers clingo and dlv can be characterized as extended First-Order formulas with intensional functions in the logic of Here-and-There. Furthermore, this characterization can be used to study the strong equivalence of programs with aggregates under either semantics. We also present a transformation that reduces the task of checking strong equivalence to reasoning in classical First-Order logic, which serves as a foundation for automating this procedure.

Recursive Aggregates as Intensional Functions in Answer Set Programming: Semantics and Strong Equivalence

The fundamental caching problem in networks asks to find an allocation of contents to a network of caches with the aim of maximizing the cache hit rate. Despite the problem's importance to a variety of research areas - including not only content delivery, but also edge intelligence and inference - and the extensive body of work on empirical aspects of caching, very little is known about the exact boundaries of tractability for the problem beyond its general NP-hardness. We close this gap by performing a comprehensive complexity-theoretic analysis of the problem through the lens of the parameterized complexity paradigm, which is designed to provide more precise statements regarding algorithmic tractability than classical complexity. Our results include algorithmic lower and upper bounds which together establish the conditions under which the caching problem becomes tractable.

Parameterized Complexity of Caching in Networks

Datalog is a logic programming language widely used in knowledge representation and reasoning (KRR), program analysis, and social media mining due to its expressiveness and high performance. 
Traditionally, Datalog engines use either row-oriented or column-oriented storage. Engines like VLog and Nemo favor column-oriented storage for efficiency on limited-resource machines, while row-oriented engines like Soufflé use advanced datastructures with locking to perform better on multi-core CPUs.
The advent of modern datacenter GPUs, such as the NVIDIA H100 with its ability to run over 16k threads simultaneously and high memory bandwidth, has reopened the debate on which storage layout is more effective. This paper presents the first column-oriented Datalog engines tailored to the strengths of modern GPUs. We present VFLog, a CUDA-based Datalog runtime library with a column-oriented GPU datastructure that supports all necessary relational algebra operations. Our results demonstrate over $200\times$ performance gains over SOTA CPU-based column-oriented Datalog engines and a $2.5\times$ speedup over GPU Datalog engines in various workloads, including KRR.

Column-Oriented Datalog on the GPU

Learning control policy from continuous action space by visual observations is a fundamental and challenging task in reinforcement learning (RL). An essential problem is how to accurately map the high-dimensional images to the optimal actions by the policy network. Traditional decision-making modules output actions solely based on the current observation, while the distributions of optimal actions are dependent on specific tasks and cannot be known priorly, which increases the learning difficulty. To make the learning easier, we analyze the action characteristics in several control tasks, and propose Reinforcement Learning with Residual Action (ResAct) to explicitly model the adjustments of actions based on the differences between adjacent observations, rather than learning actions directly from observations. The method just redefines the output of the policy network, and doesn’t introduce any prior assumption to constrain or simplify the vanilla control problem. Extensive experiments on DeepMind Control Suite and CARLA demonstrate that the method could improve different RL baselines significantly, and achieve state-of-the-art performance.

Visual Reinforcement Learning with Residual Action

Proportional representation plays a crucial role in electoral systems. In ordinal elections, where voters rank candidates based on their preferences, the Single Transferable Vote (STV) is the most widely used proportional voting method. STV is considered proportional because it satisfies an axiom requiring that large enough "solid coalitions" of voters are adequately represented. Using real-world data from local Scottish elections, we observe that solid coalitions of the required size rarely occur in practice. This observation challenges the importance of proportionality axioms and raises the question of how the proportionality of voting methods can be assessed beyond their axiomatic performance. We address these concerns by developing quantitative measures of proportionality. We apply these measures to evaluate the proportionality of voting rules on real-world election data. Besides STV, we consider SNTV, the Expanding Approvals Rule, and Sequential Ranked-Choice Voting. We also study the effects of ballot truncation by artificially completing truncated ballots and comparing the proportionality of outcomes under complete and truncated ballots.

Proportional Representation in Practice: Quantifying Proportionality in Ordinal Elections

The effectiveness of satisfiability solvers significantly depends on the quality of the encoding of a given problem into conjunctive normal form. Cardinality constraints are prevalent in numerous problems, prompting the development and study of various encoding types. This paper presents a novel approach to optimizing cardinality constraint encodings by examining the impact of literal orderings within the constraints, independent of the encoding type. Unlike traditional metrics, such as formula size and propagation power, our method leverages formula structure to adjust the meanings of auxiliary variables within encodings, thereby enhancing a solver's learning capabilities. Experimental evaluations on benchmarks from the maximum satisfiability competition reveal that good literal orderings can be more crucial than the choice of encoding type. Moreover, our automatically-generated literal orderings consistently improve performance across all encoding types, demonstrating the robustness of our approach.

The Impact of Literal Sorting on Cardinality Constraint Encodings

We introduce UniMuMo, a unified multimodal model capable of taking arbitrary text, music, and motion data as input conditions to generate outputs across all three modalities. To address the lack of time-synchronized data, we align unpaired music and motion data based on rhythmic patterns to leverage existing large-scale music-only and motion-only datasets. By converting music, motion, and text into token-based representation, our model bridges these modalities through a unified encoder-decoder transformer architecture. To support multiple generation tasks within a single framework, we introduce several architectural improvements. We propose encoding motion with a music codebook, mapping motion into the same feature space as music. We introduce a music-motion parallel generation scheme that unifies all music and motion generation tasks into a single transformer decoder architecture with a single training task of music-motion joint generation. Moreover, the model is designed by fine-tuning existing pre-trained single-modality models, significantly reducing computational demands. Extensive experiments demonstrate that UniMuMo achieves competitive results on all unidirectional generation benchmarks across music, motion, and text modalities. Video demo and generated results are available in the supplementary materials.

UniMuMo: Unified Text, Music, and Motion Generation

Considering the importance of capturing both global conversational topics and local speaker dependencies for multimodal emotion recognition in conversations, current approaches first utilize sequence models like Transformer to extract global context information, then apply Graph Neural Networks to model local speaker dependencies for local context information extraction, coupled with Graph Contrastive Learning (GCL) to enhance node representation learning. However, this sequential design introduces potential biases: the extracted global context information inevitably influences subsequent processing, compromising the independence and diversity of the original local features; current graph augmentation methods in GCL cannot consider both global and local context information in conversations to evaluate the node importance, hindering the learning of key information. Inspired by the human brain excels at handling complex tasks by efficiently integrating local and global information processing mechanisms, we propose an aligned global-local context fusion framework for sequence-based design to address these problems. This design includes a dual-attention Transformer and a dual-evaluation method for graph augmentation in GCL. The dual-attention Transformer combines global attention for overall context extraction with sliding-window attention for local context capture, both enhanced by spiking neuron dynamics. The dual-evaluation method in GCL comprises global importance evaluation to identify nodes crucial for overall conversation context, and local importance evaluation to detect nodes significant for local semantics, generating augmented graph views that preserve both global and local information. This approach ensures balanced information processing throughout the pipeline, enhancing biological plausibility and achieving superior emotion recognition.

BIG-FUSION: Brain-Inspired Global-Local Context Fusion Framework for Multimodal Emotion Recognition in Conversations

\underline{D}etecting and \underline{G}rounding \underline{M}ulti-\underline{m}odal \underline{M}edia \underline{M}anipulation $(\textbf{DGM}^4)$ aims to categorize the type and localize the region of manipulation for image-text pairs in both two modalities.Existing methods have not sufficiently explored the importance of images, which contain both forgery features and content features, leading to their inefficient utilization.To address this problem, we propose an Image-Driven Decoupled Sequential Framework (\textbf{IDSeq}),
designed to decouple image features and rationally integrate them to effectively accomplish different sub-tasks.
Specifically, IDSeq employs two specially designed delicate losses to guide the disentangled learning of forgery and content features.
To efficiently and methodically leverage these features, we propose a Decoupled Image Manipulation Decoder (DIMD) that processes image tasks within a decoupled schema. 
By separating the image tasks into forgery-relevant and content-relevant components and training them without gradient interaction, we effectively mitigate the exclusive competition between these two components. Additionally, for the text tasks, we utilize content features enhanced by Manipulation Indicator Generator (MIG), which provides the maximal visual information as a reference while eliminating interference from unverified image data.
Extensive experiments show the superiority of our IDSeq, where it notably outperforms SOTA methods on the fine-grained classification task by $3.8\%$ in mAP and the forgery face grounding task by $8.7\%$ in IoUmean, even $1.3\%$ in F1 on the most challenging manipulated text grounding task.

IDseq: Decoupled and Sequentially Detecting and Grounding Multi-Modal Media Manipulation

Zeroth-order (ZO) optimization as the gradient-free method has become a powerful tool when the first-order gradient is unavailable or expensive to obtain, especially in decentralized learning scenarios where data and computational resources are distributed across multiple clients. There have been many efforts to analyze the optimization convergence rate of zeroth-order decentralized stochastic gradient descent (ZO-DSGD) algorithms. However, the generalization of these methods has not been well studied. In this paper, we provide a generalization analysis of ZO-DSGD with changing topology, where the clients run zeroth-order SGD with local data and communicate with each other according to time-varying topology. We systematically analyze the generalization error in convex, strongly convex, and non-convex cases. The obtained results in the convex and strongly convex cases with zeroth-order oracles recover the results of SGD. Moreover, the generalization bounds derived in non-convex cases align with that of DSGD. To capture the influence of communication topology on the generalization performance, we analyze local generalization bounds concerning local models held at different clients. The obtained results reflect the influence of the number of clients, local sample size, and topology on the generalization error. To the best of our knowledge, this is the first work to provide a generalization analysis of zeroth-order decentralized stochastic gradient descent methods.

Downloads

Next from AAAI 2025

Recursive Aggregates as Intensional Functions in Answer Set Programming: Semantics and Strong Equivalence

Stay up to date with the latest Underline news!

PRESENTATIONS

CONFERENCES

COMPANY

RESOURCES

.css-70qvj9{display:-webkit-box;display:-webkit-flex;display:-ms-flexbox;display:flex;-webkit-align-items:center;-webkit-box-align:center;-ms-flex-align:center;align-items:center;}Downloads

Next from AAAI 2025

Recursive Aggregates as Intensional Functions in Answer Set Programming: Semantics and Strong Equivalence

Stay up to date with the latest Underline news!

PRESENTATIONS

CONFERENCES

COMPANY

RESOURCES

Downloads