We present MemoVision, a digital catalog system that captures semantic, spatial, temporal, and interaction information as users move through physical environments wearing client devices such as smart glasses. The system uses open-vocabulary semantic segmentation and 3D scans to store objects of interest with comprehensive semantic, spatial, temporal, and interaction labels. Our demonstration shows multimodal query and retrieval capabilities, supporting specific questions about object locations, temporal events, and user interactions (including eye gaze and hand poses), enabling more contextualized responses than current multimodal large language models.
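To make the catalog structure concrete, the following is a minimal sketch of what a stored object record and a retrieval query might look like. All names (`ObjectRecord`, `query_by_label`, the example labels and coordinates) are hypothetical illustrations, not the actual MemoVision implementation.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectRecord:
    # Hypothetical record combining the four label types described above.
    object_id: str
    semantic_label: str   # open-vocabulary class, e.g. "coffee mug"
    position: tuple       # (x, y, z) in the 3D scan's coordinate frame
    timestamp: float      # when the object was last observed
    interactions: list = field(default_factory=list)  # e.g. "gazed_at", "grasped"

def query_by_label(catalog, term):
    """Return records whose semantic label contains the query term."""
    return [r for r in catalog if term in r.semantic_label]

# Toy catalog with two observed objects.
catalog = [
    ObjectRecord("obj1", "coffee mug", (1.2, 0.8, 0.0), 1700000000.0, ["gazed_at"]),
    ObjectRecord("obj2", "laptop", (2.0, 0.7, 0.1), 1700000100.0, ["hand_near"]),
]

hits = query_by_label(catalog, "mug")
```

In a full system, the string match would be replaced by open-vocabulary embedding similarity, and spatial or temporal filters would be composed with the semantic query.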
