Singapore

Self-hosting large language models (LLMs) is increasingly appealing for organizations seeking privacy, cost control, and customization. Yet deploying and maintaining in-house models poses challenges in GPU utilization, workload routing, and reliability. We introduce Pick and Spin, a practical framework that makes self-hosted LLM orchestration scalable and economical. Built on Kubernetes, it integrates a unified Helm-based deployment system, adaptive scale-to-zero automation, and a hybrid routing module that balances cost, latency, and accuracy using both keyword heuristics and a lightweight DistilBERT classifier. We evaluate four models, Llama 3 (90B), Gemma 3 (27B), Qwen 3 (235B), and DeepSeek R1 (685B), across eight public benchmark datasets with five inference strategies and two routing variants, encompassing 3,200 prompts and 160,000 inference runs. Pick and Spin achieves up to 10% higher accuracy, 30% lower latency, and 33% lower GPU cost per query compared with static deployments. These results show that intelligent orchestration and efficient scaling enable enterprise-grade LLM performance on self-hosted infrastructure, bringing high-capacity AI within practical and affordable reach.

AAAI 2026

Efficient Multi-Model Orchestration for Self-Hosted Large Language Models

workshop paper
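
The hybrid routing idea in the abstract (keyword heuristics first, a learned classifier as fallback, then a cost/latency/accuracy trade-off) could look roughly like the sketch below. This is not the Pick and Spin implementation: the keyword table, the label names, the per-label weights, the model profiles, and the `classifier` callable (a stand-in for the DistilBERT classifier) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ModelProfile:
    name: str
    cost: float      # relative GPU cost per query (hypothetical units)
    latency: float   # relative latency (hypothetical units)
    accuracy: float  # expected accuracy in [0, 1] (hypothetical)


# Hypothetical keyword table: trigger word -> task label.
KEYWORD_RULES: Dict[str, str] = {
    "prove": "reasoning",
    "derive": "reasoning",
    "translate": "simple",
    "summarize": "simple",
}

# Hypothetical (cost, latency, accuracy) weights per task label.
LABEL_WEIGHTS: Dict[str, Tuple[float, float, float]] = {
    "simple": (1.0, 1.0, 1.0),
    "reasoning": (0.2, 0.2, 2.0),
}

# Hypothetical model pool.
MODELS: List[ModelProfile] = [
    ModelProfile("small", cost=0.1, latency=0.1, accuracy=0.6),
    ModelProfile("large", cost=0.5, latency=0.4, accuracy=0.9),
]


def keyword_label(prompt: str) -> Optional[str]:
    """Return a task label if any keyword matches, otherwise None."""
    for word in prompt.lower().split():
        label = KEYWORD_RULES.get(word.strip(".,:;!?"))
        if label is not None:
            return label
    return None


def route(
    prompt: str,
    models: List[ModelProfile],
    classifier: Callable[[str], str],
) -> ModelProfile:
    """Pick a model: keyword rules first, classifier only as fallback."""
    label = keyword_label(prompt) or classifier(prompt)
    w_cost, w_latency, w_accuracy = LABEL_WEIGHTS[label]
    # Higher score is better: reward accuracy, penalize cost and latency.
    return max(
        models,
        key=lambda m: w_accuracy * m.accuracy - w_cost * m.cost - w_latency * m.latency,
    )
```

With these made-up numbers, a prompt containing "prove" is sent to the larger model, while a prompt with no keyword match falls through to whatever label the classifier returns.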

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held at Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.

Meta-learning enables models to rapidly adapt to new tasks by leveraging prior experience, but its adaptation mechanisms remain opaque, especially regarding how past training tasks influence future predictions. We introduce TLXML (Task-Level eXplanation of Meta-Learning), a novel framework that extends influence functions to meta-learning settings, enabling task-level explanations of adaptation and inference. By reformulating influence functions for bi-level optimization, TLXML quantifies the contribution of each meta-training task to the adapted model’s behaviour. To ensure scalability, we propose a Gauss-Newton-based approximation that significantly reduces computational complexity from $O(pq^2)$ to $O(pq)$, where p and q denote model and meta parameters, respectively. Empirical results demonstrate that TLXML effectively ranks training tasks by their influence on downstream performance, offering concise and intuitive explanations aligned with user-level abstraction. This work provides a critical step toward interpretable and trustworthy meta-learning systems.

A Task-Level Explanation Framework for Meta-Learning Algorithms

As Reinforcement Learning (RL) agents are increasingly deployed in real-world applications, ensuring their behavior is transparent and trustworthy is paramount. A key component of trust is explainability, yet much of the work in Explainable RL (XRL) focuses on local, single-step decisions. This paper addresses the critical need for explaining an agent's long-term behavior through trajectory-level analysis. We introduce a novel framework that ranks entire trajectories by defining and aggregating a new state-importance metric. This metric combines the classic Q-value difference with a "radical term" that captures the agent's affinity to reach its goal, providing a more nuanced measure of state criticality. We demonstrate that our method successfully identifies optimal trajectories from a heterogeneous collection of agent experiences. Furthermore, by generating counterfactual rollouts from critical states within these trajectories, we show that the agent's chosen path is robustly superior to alternatives, thereby providing a powerful "Why this, and not that?" explanation. Our experiments in standard OpenAI Gym environments validate that our proposed importance metric is more effective at identifying optimal behaviors compared to classic approaches, offering a significant step towards trustworthy autonomous systems.

Know your Trajectory - Trustworthy Reinforcement Learning deployment through Importance-Based Trajectory Analysis
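
To make the trajectory-ranking idea concrete, here is a minimal sketch that scores states with one common form of the classic Q-value difference (best action value minus worst action value) and ranks trajectories by the mean score of their states. The "radical term" described in the abstract is not specified there, so it is left out, and aggregation by mean is an assumption; the function names and the toy Q-table are hypothetical.

```python
from statistics import mean
from typing import Dict, Hashable, List, Sequence

QTable = Dict[Hashable, Sequence[float]]


def q_value_difference(q_values: Sequence[float]) -> float:
    """Classic importance proxy: gap between best and worst action value."""
    return max(q_values) - min(q_values)


def trajectory_score(trajectory: Sequence[Hashable], q_table: QTable) -> float:
    """Aggregate state importance over a trajectory (mean is an assumption)."""
    return mean([q_value_difference(q_table[state]) for state in trajectory])


def rank_trajectories(
    trajectories: List[Sequence[Hashable]], q_table: QTable
) -> List[Sequence[Hashable]]:
    """Return trajectories sorted from highest to lowest aggregated score."""
    return sorted(
        trajectories,
        key=lambda t: trajectory_score(t, q_table),
        reverse=True,
    )
```

In this sketch, a state where all actions have similar values contributes little, while a state where the choice of action matters a lot pushes its trajectory up the ranking.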

This paper introduces a novel AI-driven approach for extracting actionable insights from corporate communications by quantifying strategic ambiguity in language. While prior work in natural language analysis has largely focused on sentiment or factual content, we explore how organizations deliberately hedge, obscure, or soften information, using linguistic ambiguity as a rich signal of intent and hidden meaning. We propose the Strategic Ambiguity Score (SAS), a machine learning model that captures deliberate vagueness by integrating hedge frequency, negation patterns, and model-based attention to critical phrases. Unlike traditional sentiment models, SAS measures not just what is said, but how and where uncertainty is strategically embedded within the text. We demonstrate that SAS can effectively highlight subtle signals that correlate with subsequent outcomes, and we illustrate its utility through predictive analyses in corporate disclosures. By shifting the focus from simple sentiment interpretation to ambiguity detection, this work provides a generalizable framework for AI applications in decision-making, risk assessment, and strategic communication analysis across diverse domains.

Quantifying Strategic Ambiguity in Corporate Language for AI-Driven Trading Strategies
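
Two of the signals the abstract names, hedge frequency and negation patterns, can be illustrated with simple token counts. The sketch below is only that: the word lists are small hypothetical lexicons, the model-based attention component is omitted, and no combined score is computed because the abstract does not say how SAS weights its inputs.

```python
import re
from typing import List

# Hypothetical lexicons, for illustration only.
HEDGES = {"may", "might", "could", "possibly", "approximately", "believe", "expect"}
NEGATIONS = {"not", "no", "never", "cannot"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def hedge_frequency(text: str) -> float:
    """Fraction of tokens that are hedge words (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(token in HEDGES for token in tokens) / len(tokens)


def negation_frequency(text: str) -> float:
    """Fraction of tokens that are negation words (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(token in NEGATIONS for token in tokens) / len(tokens)
```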

The rapid integration of AI into education has prioritized capability over trustworthiness, creating significant risks. Real-world deployments reveal that even advanced models are insufficient without extensive architectural scaffolding to ensure reliability. Current evaluation frameworks are fragmented: institutional policies lack technical verification, pedagogical guidelines assume AI reliability, and technical metrics are context-agnostic. This leaves institutions without a unified standard for deployment readiness. This paper introduces TEAS (Trusted Educational AI Standard), an integrated framework built on four interdependent pillars: (1) Verifiability, grounding content in authoritative sources; (2) Stability, ensuring deterministic core knowledge; (3) Auditability, enabling independent institutional validation; and (4) Pedagogical Soundness, enforcing principles of active learning. We argue that trustworthiness stems primarily from systematic architecture, not raw model capability. This insight implies that affordable, open-source models can achieve deployment-grade trust, offering a scalable and equitable path to integrating AI safely into learning environments globally.

TEAS: Trusted Educational AI Standard: A Framework for Verifiable, Stable, Auditable, and Pedagogically Sound Learning Systems

Large Language Models (LLMs) adapted through Low-Rank Adaptation (LoRA) often exhibit weakened safety alignment, even when fine-tuned on benign datasets. Such degradation poses significant risks for deployable AI systems, where parameter updates can unintentionally introduce unsafe or unstable behaviors. In this work, we propose Directional Deviation Index Guided Pruning (DDI Pruning), a post hoc, data-free framework for diagnosing and mitigating unsafe LoRA adaptations. DDI quantifies the spectral and directional deviation of each LoRA-updated layer relative to its pretrained baseline, identifying layers that contribute most to instability or misalignment. Layers with high DDI scores are selectively pruned, improving both model robustness and computational efficiency without additional training or supervision. We evaluate the proposed approach on multiple language-generation and agent-planning benchmarks using several LLM backbones. Results show that DDI Pruning consistently reduces harmful or adversarial behaviors while preserving task accuracy and coherence. Ablation studies further demonstrate that each component of DDI contributes to capturing unsafe adaptation patterns, highlighting its interpretability and generality across domains. Overall, DDI Pruning provides an effective and practical mechanism for enhancing the safety alignment of adapted LLMs and contributes to the development of reliable and deployable AI systems.

Safe and Deployable LLM Adaptation: Directional Deviation Index–Guided Model Pruning

The paper explores how video models trained for classification tasks represent nuanced, hidden semantic information that may not affect the final outcome, a key challenge for Trustworthy AI models. Through Explainable and Interpretable AI methods, specifically mechanistic interpretability techniques, the internal circuit responsible for representing the action's outcome is reverse-engineered in a pre-trained video vision transformer, revealing that the "Success vs Failure" signal is computed through a distinct amplification cascade. While there are low-level differences observed from layer 0, the abstract and semantic representation of the outcome is progressively amplified from layers 5 through 11. Causal analysis, primarily using activation patching supported by ablation results, reveals a clear division of labor: Attention Heads act as "evidence gatherers", providing necessary low-level information for partial signal recovery, while MLP Blocks function as robust "concept composers", each of which is sufficient to generate the entire "success" signal. This distributed and redundant circuit in the model's internals explains its resilience to simple ablations, demonstrating a core computational pattern for processing human-action outcomes. Crucially, the existence of this sophisticated circuit for representing complex outcomes, even within a model trained only for simple classification, highlights the potential for models to develop forms of 'hidden knowledge' beyond their explicit task, underscoring the need for mechanistic oversight for building genuinely Explainable and Trustworthy AI systems intended for deployment.

Attention Gathers, MLPs Compose: A Causal Analysis of an Action-Outcome Circuit in VideoViT

Semantic representations of rhythmic structures are important for AI-driven music generation and choreography. South Asian classical dance, such as Bharatanatyam, relies on intricate rhythms that guide choreography and improvisation. These rhythms are expressed through Nattuvangam, a vocal and percussive form that uses rhythmic syllables (Solkattus) and cymbal cues (Talam). Despite its pedagogical importance, Nattuvangam is rarely documented in digital form, which limits systematic study and teaching. We present the first curated dataset of Nattuvangam recordings that capture diverse Solkattu patterns and cyclic Talam structures. Each clip is analyzed using handcrafted and learned features, including onset envelopes, inter-onset intervals, tempograms, and Mel-spectrogram embeddings. These representations allow machine learning models to identify, cluster, and retrieve rhythmic motifs across performances. The dataset serves as a pedagogical tool and supports computational exploration of Solkattu patterns in relation to Talam, revealing the structural principles underlying Nattuvangam. This work establishes a foundation for studying Nattuvangam as both a standalone and performative art form, bridging cultural teaching with AI-based rhythm analysis in low-resource contexts.

Low-Resource Rhythm Learning of South Asian Beat Structures: Machine Learning Approaches to Nattuvangam
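
One of the handcrafted features listed in the abstract, the inter-onset interval (IOI), is simply the time gap between consecutive note or stroke onsets. The sketch below assumes onset times in seconds have already been detected by some other tool; the function names are hypothetical, and treating the median IOI as one beat is a simplifying assumption that will not hold for every Talam.

```python
from statistics import median
from typing import List, Sequence


def inter_onset_intervals(onset_times: Sequence[float]) -> List[float]:
    """Gaps in seconds between consecutive onsets, after sorting."""
    ordered = sorted(onset_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def tempo_bpm(onset_times: Sequence[float]) -> float:
    """Rough tempo estimate: 60 divided by the median IOI (an assumption)."""
    intervals = inter_onset_intervals(onset_times)
    if not intervals:
        raise ValueError("need at least two onsets to estimate tempo")
    return 60.0 / median(intervals)
```

For example, onsets at 0.0, 0.5, 1.0, and 1.5 seconds give three intervals of 0.5 seconds each, which this sketch reads as 120 beats per minute.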

Sleep disorders, particularly insomnia, and mental health conditions affect a significant fraction of adults worldwide, posing serious mental and physical health risks. Music therapy offers promising, low-cost, and non-invasive treatment, but current approaches rely heavily on expert-curated playlists, limiting scalability and personalization. We propose a low-cost generative system leveraging recent advances in diffusion models to synthesize music for therapy. We focus on insomnia and curate a dataset of waveform sleep music to generate audio tailored to sleep. To ensure real-world feasibility, we optimize our system for training and use on a single GPU, balancing quality and efficiency through extensive ablation studies. We show through subjective human evaluations that our generated music matches or outperforms existing baselines in both perceived quality and relevance to sleep therapy, while using only a fraction of the computational cost.

A Novel Diffusion Model Based Approach for Sleep Therapeutic Music Generation

Neural codec language models have revolutionized speech synthesis but face significant challenges when adapted to music generation, particularly in achieving precise timbre control while preserving melodic content. We introduce Neural Codec Language Model for Controllable Timbre Transfer (NCLMCTT), a novel architecture that enables zero-shot instrument cloning through direct audio conditioning without explicit timbre learning. Our approach combines a 385M-parameter transformer for coarse musical structure modeling with a specialized upsampler for fine timbral detail, achieving flexible control through 1-5 second reference audio segments. We establish the first comprehensive benchmark dataset for controllable timbre transfer evaluation, comprising 62,500 high-fidelity samples across 50 synthesizer presets with ground truth targets. Extensive experiments demonstrate substantial improvements over the TokenSynth baseline: 27.1% reduction in SI-SDR, 50.9% in Mel Distance, and 59.4% in STFT Distance, while maintaining strong melodic coherence (Chroma Similarity: 0.85). Our method achieves robust zero-shot generalization, with performance on unseen instrument presets matching that of seen presets. Ablation studies confirm that extended reference audio duration (40.8% improvement), cross-attention mechanisms (11.9% improvement), and increased model capacity contribute meaningfully to overall performance. By separating melodic content from timbral characteristics and enabling implicit timbre control, NCLMCTT provides both immediate practical value for music creators and a methodological foundation for advancing controllable neural audio synthesis.

Neural Codec Language Model for Controllable Timbre Transfer in Music Synthesis
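
For readers unfamiliar with SI-SDR (scale-invariant signal-to-distortion ratio), one of the metrics quoted in the abstract, the sketch below computes it in plain Python using the usual definition: project the estimate onto the reference, then compare the energy of that projection with the energy of the residual. It is a generic illustration, not the paper's evaluation code; mean removal, which some implementations apply first, is omitted, and the function name is hypothetical.

```python
import math
from typing import Sequence


def si_sdr(estimate: Sequence[float], reference: Sequence[float]) -> float:
    """Scale-invariant SDR in dB between an estimate and a reference signal."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    reference_energy = sum(r * r for r in reference)
    if reference_energy == 0:
        raise ValueError("reference signal must not be all zeros")
    # Optimal scaling of the reference that best explains the estimate.
    scale = sum(e * r for e, r in zip(estimate, reference)) / reference_energy
    target = [scale * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    if residual_energy == 0:
        return math.inf  # estimate is an exact scaled copy of the reference
    return 10.0 * math.log10(target_energy / residual_energy)
```

Because the reference is rescaled before the comparison, multiplying the estimate by a constant gain leaves the result unchanged, which is what "scale-invariant" refers to.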

With recent advances in automatic speech recognition (ASR), large language models (LLMs), and text-to-speech (TTS) technologies, spoken dialogue systems (SDS) have become widely accessible. However, most existing SDS are limited to conventional spoken responses. We present SingingSDS, a cascaded SDS that responds through singing rather than speaking, fostering more affective, memorable, and pleasurable interactions in character-based roleplay and interactive entertainment scenarios. SingingSDS employs a modular ASR-LLM-SVS pipeline and supports a wide range of configurations across character personas, ASR and LLM backends, SVS models, melody sources, and voice profiles, tailored to different needs in terms of latency, quality, and musical style. SingingSDS is available as a plug-and-play web demo, featuring modular, open-source code that supports customization and extension. Demo: https://huggingface.co/spaces/espnet/SingingSDS. Code: https://github.com/SingingSDS/SingingSDS. Video: https://youtube.com/playlist?list=PLZpUJJbwp2WvtPBenG5D3h09qKIrt24ui&si=7CSLWAYWcfkTEdqe.
