Singapore

Ensuring the safety of Large Language Models (LLMs) in diverse linguistic settings remains challenging, particularly for low-resource languages. Existing safety alignment methods are English-centric, which limits their effectiveness in such settings. We systematically compare Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Kahneman-Tversky Optimization (KTO) for aligning SEA-Lion-v2.1-Instruct, a Llama 3-8B variant, to reduce toxicity in Singlish. Our results show that SFT+KTO achieves superior safety alignment with higher sample efficiency than DPO. Additionally, we introduce KTO-S, which improves training stability through improved KL divergence regularization. Our approach reduces Singlish toxicity by 99%, generalizes to TOXIGEN, and maintains strong performance on standard LLM benchmarks, providing a scalable framework for safer AI deployment in multilingual contexts.
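For readers comparing the objectives named above, the sketch below writes out the standard per-example DPO and KTO losses in plain Python. It takes summed sequence log-probabilities under the policy and a frozen reference model as input. It follows the commonly published formulations, not this paper's code. The function names, the default β = 0.1, and the treatment of the KTO reference point as a precomputed, non-negative KL estimate are all assumptions. The abstract does not describe how KTO-S modifies the KL regularization, so that change is not reproduced here.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dpo_loss(pol_chosen: float, ref_chosen: float,
             pol_rejected: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair from summed sequence log-probs."""
    # Difference of implicit rewards log(pi_theta / pi_ref) for the two responses.
    margin = (pol_chosen - ref_chosen) - (pol_rejected - ref_rejected)
    return -math.log(sigmoid(beta * margin))


def kto_loss(pol_logp: float, ref_logp: float, desirable: bool,
             kl_estimate: float, beta: float = 0.1,
             lambda_d: float = 1.0, lambda_u: float = 1.0) -> float:
    """KTO loss for one unpaired example labeled desirable or undesirable.

    kl_estimate plays the role of the reference point z0, an estimate of
    KL(pi_theta || pi_ref); here it is a plain constant, clamped at 0.
    """
    z0 = max(kl_estimate, 0.0)
    reward = pol_logp - ref_logp  # implicit reward log(pi_theta / pi_ref)
    if desirable:
        # Loss is lambda_D - v(x, y), with v = lambda_D * sigmoid(beta * (r - z0)).
        return lambda_d * (1.0 - sigmoid(beta * (reward - z0)))
    # Undesirable: v = lambda_U * sigmoid(beta * (z0 - r)).
    return lambda_u * (1.0 - sigmoid(beta * (z0 - reward)))
```

One practical difference is visible in the signatures. DPO needs a matched chosen/rejected response for each prompt. KTO consumes single responses labeled desirable or undesirable, and such labels can be easier to collect for toxicity data.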

AAAI 2026

Safe at the Margins: A General Approach to Safety Alignment in Low-Resource English Languages – A Singlish Case Study

workshop paper

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held at the Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.


The rapid proliferation of large language models (LLMs) in applications targeting children and adolescents necessitates a fundamental reassessment of prevailing AI safety frameworks, which are largely tailored to adult users and neglect the distinct developmental vulnerabilities of minors. This paper highlights key deficiencies in existing LLM safety benchmarks, including their inadequate coverage of age-specific cognitive, emotional, and social risks spanning early childhood (ages 0–6), middle childhood (7–12), and adolescence (13–18). To bridge these gaps, we introduce SproutBench, an evaluation suite comprising 1,283 developmentally grounded adversarial prompts designed to probe risks such as emotional dependency, privacy violations, and imitation of hazardous behaviors. Through empirical evaluation of 47 diverse LLMs, we uncover substantial safety vulnerabilities, corroborated by strong inter-dimensional correlations (e.g., between Safety and Risk Prevention, ρ = 0.86) and a notable inverse relationship between Interactivity and Age Appropriateness (ρ = −0.48). These insights yield practical guidelines for advancing child-centric AI design and deployment.
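The abstract reports these inter-dimensional correlations without naming the coefficient. The sketch below shows how such a correlation could be computed. It assumes Spearman's rank correlation over per-model scores on two dimensions, with one value per evaluated LLM. The function names are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of their sorted positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))
```

A negative value, like the one reported between Interactivity and Age Appropriateness, means that models ranking higher on one dimension tend to rank lower on the other. If the paper instead used Pearson's r, it would be `pearson(x, y)` applied to the raw scores.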

SproutBench: A Benchmark for Safe and Ethical Large Language Models for Youth

Interpretability and robustness remain major challenges for modern Large Language Models (LLMs), especially in settings where conventional evaluation or auditing tools are limited. To address this, we propose Inverse Language Modeling (ILM), a unified training framework that jointly enhances robustness to adversarial perturbations and enables a novel form of gradient-based interpretability. Rather than reconstructing exact input prompts, ILM encourages LLMs to develop gradient-aligned internal representations that allow the model to approximate plausible input patterns underlying a given output. This approximate inversion provides a new mechanism for analyzing model behavior, identifying potential triggers for unsafe generations, and supporting lightweight governance and red-teaming workflows. Our results show that ILM can simultaneously improve robustness and produce meaningful inversion signals, laying a foundation for LLMs that are not only more resilient but also more transparent and analyzable.

Inverse Language Modeling towards Robust and Grounded LLMs

Recent multilingual language models promise support for “100+ languages,” yet speakers of Indigenous and other underrepresented languages still often do not see themselves in these advances. In this work, we take a deliberately simple, secondary-benchmark perspective: rather than proposing a new model or dataset, we re-evaluate an off-the-shelf multilingual natural language inference (NLI) model on public benchmarks that explicitly include Indigenous languages of the Americas. Concretely, we use the AmericasNLI benchmark for ten Indigenous languages and XNLI for English and Spanish, and we evaluate the widely used joeddav/xlm-roberta-large-xnli model under a fixed, zero-shot protocol. Our goal is to answer three questions: (i) How large is the performance gap between high-resource and underrepresented languages under the same model and task? (ii) Are these gaps consistent across languages, or do some communities fare systematically worse than others? (iii) What kinds of qualitative errors arise, and what do they suggest about cultural and linguistic mismatch? Our experiments reveal a striking discrepancy: while English and Spanish reach almost perfect accuracy on XNLI (around 99.8% in our runs), the same model averages only about 43% accuracy across ten Indigenous languages in AmericasNLI, with none exceeding 47%. We also show qualitative NLI failures in Quechua that point to difficulties with morphology, idioms, and discourse-level inference. We argue that even such a simple re-analysis can serve as a low-cost yet high-impact tool for making inequities in multilingual NLP visible, especially for communities that rarely appear in headline benchmarks.
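The headline numbers are group averages of per-language accuracy. The sketch below shows that bookkeeping. It assumes the averages are unweighted means over languages (macro averages) and that model predictions have already been produced. Running joeddav/xlm-roberta-large-xnli itself is not shown, and `gap_summary` is a hypothetical helper.

```python
from statistics import mean


def accuracy(gold, pred):
    """Fraction of examples whose predicted NLI label matches the gold label."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def gap_summary(high_resource, low_resource):
    """Summarize the accuracy gap between two groups of languages.

    Both arguments map a language name to (gold_labels, predicted_labels).
    Group averages are unweighted means over languages (macro averages).
    """
    high = {lang: accuracy(*pair) for lang, pair in high_resource.items()}
    low = {lang: accuracy(*pair) for lang, pair in low_resource.items()}
    return {
        "high_mean": mean(high.values()),
        "low_mean": mean(low.values()),
        "low_max": max(low.values()),  # checks a claim like "none exceeding 47%"
        "gap": mean(high.values()) - mean(low.values()),
        "worst_language": min(low, key=low.get),
    }
```

Plugging in the reported figures (about 99.8% versus about 43%), the gap computed this way would be roughly 57 percentage points.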

Advancing NLP Equity: A Secondary Benchmark Evaluation of Multilingual Language Models for Underrepresented Languages

Guardian models monitor and regulate the outputs of user-facing AI systems. However, current guardian models fall short in two key ways. First, they are predominantly Western-centric and optimized for high-resource languages, leaving low-resource African languages vulnerable to evolving harms, cross-lingual safety failures, and cultural misalignment. Second, most guardian models rely on rigid, predefined safety categories that do not generalize across diverse linguistic and sociocultural contexts. Ensuring robust safety requires flexible, runtime-enforceable policies and benchmarks that reflect local norms, harm scenarios, and cultural expectations. We introduce UbuntuGuard, the first African policy-based safety benchmark built from adversarial queries authored by 155 domain experts across sensitive fields, including healthcare, education, government, and finance. From these expert-crafted queries, we derive context-specific safety policies and reference responses that capture culturally grounded risk signals, enabling policy-aligned evaluation of guardian models. We evaluate six state-of-the-art guardian models, including static, dynamic, and multilingual variants, under multiple scenarios. Our findings reveal that existing English-centric benchmarks overestimate real-world multilingual safety, cross-lingual transfer provides partial but insufficient coverage, and dynamic models, while better equipped to leverage policies at inference time, still struggle in fully localized African-language contexts. These findings highlight the urgent need for multilingual, culturally grounded safety benchmarks to enable the development of reliable and equitable guardian models for low-resource languages.


UbuntuGuard: A Policy-Based Safety Benchmark for Low-Resource African Languages

Static leaderboards and single-turn judgments correlate weakly with deployment outcomes, especially in multilingual and resource-constrained settings. This position paper argues that credible evaluation hinges on verifiability: ex ante specifications that permit observable checks, repeatable scoring, and auditable evidence. We propose a minimal standard that makes verifiability first-class while remaining compatible with existing workflows. The standard has four artifacts: a task schema, a validator entry point, a run card, and required reporting fields. We ground the proposal in prior work on coverage and transparency and on specification-based checks. We present a prototype evaluation task for schema-constrained instruction following with robustness probes and a multilingual protocol, and we attach measurement and governance procedures that link scores to validity arguments. The goal is to replace generic win rates with verifiable claims about task success that better predict real use across languages and contexts.
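To make the four artifacts concrete, here is a minimal sketch of a validator entry point and a run card for a schema-constrained JSON task. The schema, field names, `validate`, `RunCard`, and `score` are all hypothetical illustrations, not the paper's specification. The point is that each check is observable and repeatable, and that every failure leaves a recorded reason.

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical task schema: required keys and their expected Python types.
TASK_SCHEMA = {"language": str, "answer": str, "confidence": float}


def validate(output_text: str, schema: dict) -> tuple[bool, list[str]]:
    """Hypothetical validator entry point.

    Returns (passed, reasons) so that every failure is kept as evidence.
    """
    try:
        obj = json.loads(output_text)
    except json.JSONDecodeError as exc:
        return False, [f"invalid JSON: {exc.msg}"]
    if not isinstance(obj, dict):
        return False, ["top-level value is not an object"]
    reasons = []
    for key, expected in schema.items():
        if key not in obj:
            reasons.append(f"missing field: {key}")
        elif not isinstance(obj[key], expected):
            reasons.append(f"wrong type for {key}: expected {expected.__name__}")
    return not reasons, reasons


@dataclass
class RunCard:
    """Hypothetical run card: what is needed to repeat and audit a run."""
    model_id: str
    task_id: str
    seed: int
    languages: list[str] = field(default_factory=list)
    results: dict = field(default_factory=dict)  # reporting field: pass rate per language


def score(outputs: dict[str, list[str]], card: RunCard) -> dict:
    """Record the validator pass rate per language in the run card and return it as a dict."""
    for lang, texts in outputs.items():
        passed = sum(validate(t, TASK_SCHEMA)[0] for t in texts)
        card.results[lang] = passed / len(texts)
    return asdict(card)
```

Because the validator returns its failure reasons rather than a bare score, a run can be audited after the fact. Because the checks are deterministic, re-scoring the same outputs always gives the same result.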

Beyond Static Leaderboards: A Roadmap to Naturalistic, Functional Evaluation of LLMs

Underserved and extremely low-resource languages challenge current language technologies, especially when lexical borrowing and synonymy undermine exact-match assumptions. We study Bahnaric-Vietnamese lexical mapping as a step toward meaning-preserving sentence translation. Unlike prior work based on static embeddings and Mean Squared Error (MSE) alignment, we learn sentence-aware word representations with a small multilingual transformer pretrained on Vietnamese, adapt it with Low-Rank Adaptation (LoRA) for parameter efficiency, and align Bahnaric-Vietnamese pairs using a two-layer projection trained with InfoNCE contrastive loss. We exploit a new community-sourced lexicon of approximately 10,000 Bahnaric-Vietnamese pairs collected with local partners, capturing one-to-one, one-to-many, and many-to-one anchor relations as well as extensive lexical borrowing. Experiments evaluate retrieval-style alignment with Precision at K (P@K) and Mean Reciprocal Rank (MRR), as well as sentence translation using top-1 accuracy, Bilingual Evaluation Understudy (BLEU), and character n-gram F-score (chrF). On the ∼1k lexicon, our best model attains P@1 ≈ 0.53 and MRR ≈ 0.62, substantially improving over a static-embedding MSE baseline, while on the richer ∼10k community lexicon it reaches comparable sentence-level top-1 accuracy despite slightly lower BLEU and chrF, highlighting both the benefits of the expanded resource and the remaining challenges of synonym-rich, low-frequency vocabulary.
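The training objective and the retrieval metrics can each be written in a few lines. This sketch assumes in-batch negatives with cosine similarity and a temperature τ, which is the common InfoNCE setup. The paper's batch construction, temperature, and two-layer projection are not given in the abstract, so they are omitted. When computing P@K and MRR, the sketch also assumes a single gold target per query, although the lexicon contains one-to-many relations.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(src, tgt, temperature=0.07):
    """In-batch InfoNCE: src[i] should match tgt[i]; every other tgt[j] is a negative."""
    total = 0.0
    for i, s in enumerate(src):
        logits = [cosine(s, t) / temperature for t in tgt]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]  # -log softmax probability of the positive
    return total / len(src)


def rank_of_gold(scores, gold_index):
    """1-based rank of the gold candidate; ties resolved optimistically."""
    return 1 + sum(1 for s in scores if s > scores[gold_index])


def precision_at_k(all_scores, gold, k):
    """Fraction of queries whose single gold target ranks within the top k."""
    hits = sum(rank_of_gold(s, g) <= k for s, g in zip(all_scores, gold))
    return hits / len(gold)


def mrr(all_scores, gold):
    """Mean reciprocal rank of the gold target across queries."""
    return sum(1.0 / rank_of_gold(s, g) for s, g in zip(all_scores, gold)) / len(gold)
```

In `rank_of_gold`, only strictly higher scores push the gold item down the ranking, so ties are counted in its favor. A stricter protocol would break ties against it.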

Sentence-Aware Bahnaric-Vietnamese Lexical Mapping with Contrastive Contextual Representations

We present ENLIVEN-1000, a unified framework for endangered and low-resource language revitalization that integrates broad-coverage language identification (LID), machine translation (MT), and LLM-generated synthetic data, aimed at expanding safe, equitable NLP support for communities historically excluded from mainstream tools. We compile a text corpus for 1154 languages (1069 endangered or low-resource) from public sources and train a fastText-based LID model covering all of them. The LID system achieves high detection quality, with F1 ≈ 0.99 and FPR ≈ 3×10⁻⁶, substantially broadening reliable coverage beyond existing solutions. Focusing on five diverse endangered languages (Carpathian Romani, Chuj, Sunwar, Kapingamarangi, and Inuktitut), we fine-tune a 600M-parameter NLLB-200 model for translation. Our fine-tuned models outperform zero-shot baselines and even proxy models trained on related, high-resource languages, in both directions (endangered → English and English → endangered). We further use GPT-4o to generate synthetic parallel data, demonstrating that augmenting limited real data with LLM-generated text yields substantial MT improvements. These results illustrate a practical path toward scaling NLP support to hundreds of under-resourced languages. We discuss implications for language revitalization and ethical considerations in working with endangered language communities.
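The reported F1 and FPR are presumably one-vs-rest metrics aggregated over languages, but the abstract does not say how they are averaged. The sketch below assumes per-language one-vs-rest counts followed by a macro average; the function names are hypothetical. It also shows why FPR can be so small. With over a thousand classes, almost every test example is a true negative for any given language, so the FPR denominator is large.

```python
from collections import Counter


def per_language_metrics(gold, pred):
    """One-vs-rest F1 and false-positive rate for every language seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have equal length")
    n = len(gold)
    gold_counts = Counter(gold)
    pred_counts = Counter(pred)
    tp_counts = Counter(g for g, p in zip(gold, pred) if g == p)
    metrics = {}
    for lang in set(gold) | set(pred):
        tp = tp_counts[lang]
        fp = pred_counts[lang] - tp  # predicted lang, but gold is another language
        fn = gold_counts[lang] - tp  # gold lang, but predicted another language
        tn = n - tp - fp - fn        # neither gold nor predicted is lang
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        metrics[lang] = {"f1": f1, "fpr": fpr}
    return metrics


def macro_average(metrics, key):
    """Unweighted mean of one metric across languages."""
    return sum(m[key] for m in metrics.values()) / len(metrics)
```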



ENLIVEN-1000: A Comprehensive Revitalization Framework for 1000+ Endangered Languages via Broad-Coverage LID and LLM-Augmented MT

Large language models (LLMs) enable scalable conversational support for postpartum depression (PPD), yet current systems insufficiently account for intra-lingual cultural variation even within high-resource languages such as Chinese. Dialectal phrasing, local idioms, and culturally embedded expressions (e.g., Northeastern Mandarin "zhabayue de teng" (humorous discomfort) or the Southern Min "xin-gua-a-tia" (deep sorrow)) often produce misinterpretation, safety-critical ambiguity, or emotionally inappropriate responses in PPD-related dialogues. We introduce CAMA (Culturally Adaptive Multi-Agent Co-Design Framework), a lightweight cultural-sensitivity detection and alignment framework that identifies dialect-specific linguistic cues and supplements LLMs with contextual socio-cultural grounding without performing clinical diagnosis. Our approach integrates culturally aware prompting and intervention logic to enhance empathy, safety, relevance, and user trust. This work highlights that cultural fairness in mental-health LLMs must consider intra-language diversity, not only cross-lingual disparity. CAMA provides a practical pathway towards culturally aligned, safe, and trustworthy mental-health dialogue systems.

CAMA: A Culturally Adaptive Multi-Agent Framework for Postpartum Depression Support in Multilingual and Low-Resource Settings

Tokenization serves as a crucial preprocessing step in multilingual language models, affecting performance in both high-resource and low-resource languages. However, current tokenizers tend to inherit language biases from unbalanced training datasets, which leaves them poorly optimized for underrepresented languages. This research examines the impact of balanced multilingual datasets on the performance of tokenizers trained with the Byte Pair Encoding, WordPiece, and Unigram Language Model algorithms. We build balanced corpora from various sources to study the impact of vocabulary size across 15k, 30k, and 50k dataset scales. The trained tokenizers are assessed through intrinsic metrics, including Subword Fertility and Normalized Sequence Length, as well as through extrinsic performance on downstream tasks such as Part-of-Speech tagging, Named Entity Recognition, and Machine Translation. We build custom datasets along with customized evaluation pipelines to enable consistent comparisons across nine languages using models built into standard NLP frameworks. Our observations reinforce the importance of a balanced dataset when training tokenizers and, in turn, advance the development of equitable and robust multilingual NLP systems.
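The two intrinsic metrics are simple ratios. In the sketch below, fertility is assumed to be total subword tokens divided by whitespace-delimited words. Normalized Sequence Length (NSL) is assumed to be the corpus-level ratio of a tokenizer's token count to that of a chosen baseline tokenizer. The abstract does not specify the baseline or the averaging. Any callable that maps a string to a list of tokens can be passed in, for example a hypothetical wrapper around a trained BPE, WordPiece, or Unigram tokenizer.

```python
from typing import Callable, Iterable

# Any function from a string to its list of tokens.
Tokenizer = Callable[[str], list[str]]


def subword_fertility(tokenize: Tokenizer, sentences: Iterable[str]) -> float:
    """Average number of tokens produced per whitespace-delimited word."""
    n_tokens = n_words = 0
    for sentence in sentences:
        n_words += len(sentence.split())
        n_tokens += len(tokenize(sentence))
    return n_tokens / n_words


def normalized_sequence_length(tokenize: Tokenizer, baseline: Tokenizer,
                               sentences: Iterable[str]) -> float:
    """Corpus-level token count relative to a baseline tokenizer (1.0 = parity)."""
    sentences = list(sentences)  # iterated twice below
    ours = sum(len(tokenize(s)) for s in sentences)
    base = sum(len(baseline(s)) for s in sentences)
    return ours / base
```

Lower fertility means fewer tokens per word. An NSL below 1.0 means the tokenizer encodes the same text more compactly than the baseline.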

From Bias to Balance: How Multilingual Dataset Composition Affects Tokenizer Performance Across Languages

Neural Machine Translation (NMT) for low-resource and underserved languages remains challenging due to the severe lack of parallel corpora, linguistic tools, and evaluation resources. The issue is evident in Vietnam, where the ethnolinguistic minority languages Tày (Tai–Kadai) and Bahnar (Austroasiatic) hold cultural significance but remain digitally under-represented. Data Augmentation (DA) offers a cost-effective remedy; however, most existing techniques were designed for high-resource analytic languages and are often applied heuristically without assessing their linguistic compatibility. In this work, we present the first systematic study of DA for two minority language pairs, Tày–Vietnamese and Bahnar–Vietnamese, within a three-stage language model pipeline consisting of Vietnamese-based initialization, monolingual adaptation, and supervised fine-tuning. We train two independent encoder–decoder NMT systems to isolate augmentation effects and analyze how linguistic typology shapes augmentation behavior. Our experiments show that meaning-preserving DA methods consistently improve translation adequacy and linguistic faithfulness, whereas several widely used techniques introduce semantic or structural degradation. Through quantitative evaluation and typology-aware linguistic analysis, we derive practical guidelines for selecting DA strategies in extremely low-resource and typologically diverse settings. We additionally release newly digitized high-quality bilingual corpora and trained models to facilitate future research and community-centered NLP development.
