Singapore

Event detection is essential for surveillance, particularly in retail loss prevention where accurate, timely monitoring is critical. Large vision–language models (VLMs) provide strong generalization but are inefficient on video streams and prone to hallucinations from redundant frames. We present SmartEyes, a plug-and-play system for real-time retail surveillance. SmartEyes introduces Perception–Cognition Focusing (PCF), which combines lightweight perception with semantic triggering to isolate two keyframes (customer contact and departure) and constrain the VLM to a focused differencing task. This design reduces hallucination while enabling efficient reasoning. Our demo features a SAM-powered ROI interface and live CCTV monitoring, achieving accurate alerts within 1–2 seconds on a single RTX 4080 GPU.
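The keyframe idea in the abstract above can be illustrated with a minimal sketch. It assumes a hypothetical lightweight detector has already labeled each frame with whether a person is inside the region of interest; the `Frame` type, the `person_in_roi` flag, and the `select_keyframes` function are illustrative names, not part of SmartEyes, and the actual triggering logic of PCF may differ.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Frame:
    index: int
    person_in_roi: bool  # assumed output of a lightweight per-frame detector


def select_keyframes(frames: Iterable[Frame]) -> Optional[Tuple[int, int]]:
    """Return (contact_index, departure_index) for the first visit, or None."""
    contact = None
    for frame in frames:
        if contact is None:
            if frame.person_in_roi:
                contact = frame.index  # first frame with a person in the ROI
        elif not frame.person_in_roi:
            return contact, frame.index  # first frame after the person leaves
    return None  # no complete contact/departure pair observed
```

Under this assumption, only the two returned frames would be passed to the VLM for comparison, rather than every frame in the stream.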

AAAI 2026

SmartEyes: Plug-and-Play Event Detection for Retail Loss Prevention

behavior monitoring

real-time

demo

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.



We present DS SERVE, a framework that transforms large-scale text datasets—comprising half a trillion tokens—into a high-performance neural retrieval system. DS SERVE offers both a web interface and API endpoints, achieving low latency with modest memory overhead on a single node. The framework also supports inference-time tradeoffs between latency, accuracy, and result diversity. We anticipate that DS SERVE will be broadly useful for a range of applications such as large-scale retrieval-augmented generation (RAG), training data attribution, training a search agent, and beyond.

DS SERVE: A Framework for Efficient and Scalable Neural Retrieval

Large Language Models (LLMs) have revolutionized the simulation of agent societies, enabling autonomous planning, memory formation, and social interactions. However, existing frameworks often overlook systematic evaluations for event organization and lack visualized integration with physically grounded environments, limiting agents' ability to navigate spaces and interact with items realistically. We develop MiniAgentPro, a visualization platform featuring an intuitive map editor for customizing environments and a simulation player with smooth animations. Based on this tool, we introduce a comprehensive test set comprising eight diverse event scenarios with basic and hard variants to assess agents' ability. Evaluations using GPT-4o demonstrate strong performance in basic settings but highlight coordination challenges in hard variants.

A Visualized Framework for Event Cooperation with Generative Agents

We present AutoTuneX, a system architecture and implementation that lets users interactively fine-tune large language models (LLMs) using automated hyperparameter optimization built around Bandit Limited Discrepancy Search (Kishimoto et al. 2022). Alongside a classical graphical user interface (GUI), the system features an agentic runtime that facilitates automated fine-tuning via chat.

AutoTuneX: Interactive Automated Fine-Tuning for Large Language Models

The dynamic nature of cloud spending and pricing structures
poses challenges for practitioners in IT Financial Operations
(FinOps). Recent advances in agentic systems enable them
to instead rely on agents for complex FinOps tasks such as
drawing insights from their data through natural language
queries. In this work, we present an IT FinOps Data Insights
Agent that implements a “chat with your data” approach to
support practitioners in their daily tasks. Our agent achieves
up to 90% accuracy across ITBench FinOps scenarios.

Agentic Solutions for IT Financial Operations

Accurate citation is critical, yet error rates remain high across scientific literature. We present RefLens, an end-to-end system that automates citation verification from PDF parsing to interactive report generation. Unlike summary- or embedding-based approaches, RefLens performs evidence-grounded verification by extracting verbatim spans from original sources and displaying citation-level cards and a paper-level dashboard. In a 35-participant study, users rated value (M=4.34), trust (M=4.15), and usability (M=4.19) highly, with strong adoption intention (M=4.28).
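As a rough illustration of what checking a verbatim span can involve, the sketch below tests whether a quoted span occurs in a source text after whitespace and case normalization. The function names are hypothetical and this is not RefLens's pipeline, which the abstract describes only at a high level.

```python
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace and lowercase, so line breaks in a PDF do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def span_is_verbatim(span: str, source: str) -> bool:
    """True if the non-empty span appears word for word in the source text."""
    needle = normalize(span)
    return bool(needle) and needle in normalize(source)
```

A span that fails this check would be a candidate for flagging, since the cited source does not contain the quoted evidence as written.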

RefLens: End-to-End Evidence-Grounded Citation Verification with LLM Agents

In this paper, we present the development of an automated visual inspection system for detecting defects on the upper airframe surface. The system employs a multi-camera PTZ system to synchronously capture and process images at designated regions. Developed software handles path planning and camera localization, while a hybrid-AI framework is incorporated to detect various defect types, such as hairline cracks, loose screws, and bird strike damage. The demonstration showcases the detection capabilities and prototype functionalities on a large aircraft model, furnished with a user interface to run system features and visualize results. To support this work, performance testing is provided against relevant models.

Automated Multi-Camera Inspection System for Aircraft

The demo presents a tool that visualizes the acting of planning agents in dynamic environments that might be modified by "acts of nature". The purpose of this tool is to help understand and debug the agent's behavior and to make the underlying planning concepts accessible to a wider audience.

PANSim: Visualization Tool for Planning and Acting against Nature

Nature is inherently structured! The entities in the real world are naturally organized in rich relationships. For example, dolphins and sharks, despite their striking visual resemblance in body shape and fins, are actually from entirely different branches of the animal hierarchy, i.e., mammals and fishes, respectively. This remarkable similarity is a prime example of ‘convergent evolution’, where unrelated species develop similar features because they face similar environmental challenges. This illustrates how nature’s underlying organization often transcends superficial visual resemblances. Although humans intuitively grasp and utilize these profound natural constraints, they are typically underutilized in most AI systems. As a result, trained AI models tend to align with statistical patterns in the data, such as sampling biases or class imbalance, rather than adhering to the underlying relational consistency.

This thesis argues that AI systems must evolve beyond learning “flat” feature representations, which are domain-agnostic and derived purely from data correlations, to “explicitly model the domain-specific structural relationships”. A key benefit of encoding relational priors in the learning process is that it can inject domain knowledge as an inductive bias, leading to more robust and reliable models. My research investigates incorporating domain knowledge by leveraging “graph-based structural priors” that explicitly model relational constraints in various visual recognition tasks. This work spans three distinct dimensions of visual recognition, progressing from coarse-level (image-level) to fine-grained (scene-level) understanding.

My research highlights a crucial limitation in existing AI models: they often fail to incorporate real-world constraints, leading to significant errors. I show that even powerful, pre-trained neural networks can make severe mistakes due to a lack of domain knowledge. I argue that standard metrics like top-1 accuracy, precision, and recall are insufficient for evaluating model robustness, and propose a new metric based on the rank order of the predictions as a better indicator of reliability. The benchmark on various large-scale datasets confirms that existing solutions do not sufficiently capture the domain knowledge, which is often available as a taxonomy tree, motivating our design of better learning frameworks.

I also examine complex visual re-identification (Re-ID) tasks, such as monitoring animals in the wild. I find that existing foundation models struggle with new species and environments. This challenge is compounded by the high cost of manual annotation for adapting these systems to new settings. While existing unsupervised learning methods can help reduce the need for extensive labeling, they often suffer from under- and over-segmentation errors, which led me to develop more effective active learning strategies.

Finally, I address the limitations of the classic Kalman filter, a widely used tool for dynamic systems. I point out that this filter makes a flawed assumption that the movement of each individual object is independent of its dynamic surroundings. In the real world, this is rarely the case. I demonstrate the need for a new filtering mechanism that considers not only an object’s past movements but also its spatial relationship with other dynamic entities in its environment.

In my analysis, I observed that vision foundation models for all recognition tasks, i.e., classification, detection, and segmentation, lack domain knowledge. I believe that our learning framework, which was designed specifically for classification, can be adapted for other recognition tasks. I speculate that a unified learning framework can be designed to make vision foundation models aware of the available taxonomy.
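To make the independence assumption mentioned above concrete, here is a minimal sketch of one step of a textbook scalar Kalman filter with a random-walk motion model. It is the standard filter, not the new mechanism the thesis proposes, and the noise values `q` and `r` are arbitrary illustrative defaults.

```python
from typing import Tuple


def kalman_step(x: float, p: float, z: float,
                q: float = 1e-2, r: float = 1.0) -> Tuple[float, float]:
    """One predict/update cycle for a single tracked object.

    x, p: prior state estimate and its variance
    z:    new measurement of this object only
    q, r: process and measurement noise variances (assumed values)
    """
    # Predict: the state carries over and uncertainty grows by the process noise.
    x_pred = x
    p_pred = p + q
    # Update: blend prediction and measurement using the Kalman gain.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new
```

Note that the update uses only the object's own previous estimate and its own measurement; nothing about neighboring objects enters the computation, which is the limitation the abstract points to.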

Exploiting Graph-Based Structural Priors for Visual Recognition

My research aims to pioneer efficient and reliable wearable
intelligence algorithms that transform healthcare robotics
into adaptive, patient-centered systems.
I take a four-step approach: (1) design multimodal wearable
sensing platforms to capture human and biometric signals;
(2) train a foundation model that learns from these rich
datasets to reason about human behaviors and health states;
(3) validate the model through large-scale simulation and
principled uncertainty quantification; and (4) deploy it in
rehabilitation and assistive robots for intelligent,
personalized care.
This research not only advances fundamental understanding
of multimodal human behavior, but also opens new pathways
for early disease diagnosis, adaptive treatment, and
accessible digital health.
By bridging AI, wearables, and robotics, my work aspires to
lay the groundwork for the next generation of healthcare
technologies that are proactive, trustworthy, and deeply
aligned with human well-being.

Wearable Intelligence for Healthcare Robotics: From Brain Activity to Body Movements

Deep neural networks (DNNs) have revolutionized machine
learning, driving breakthroughs from image classification
to autonomous vehicles. However, a critical flaw undermines
their reliability. Most DNNs operate under the unrealistic
closed-set assumption that all potential classes have been
encountered during training. This ignores the inevitability
of outliers in real-world scenarios. In safety-critical
domains like autonomous driving, this oversight can have
dire, irreversible consequences. DNNs may confidently
misclassify unknown outlier inputs as familiar classes.
Addressing this vulnerability is essential for public trust
and the adoption of Artificial Intelligence (AI) in
high-stakes environments. Out-of-distribution (OOD)
detection has therefore emerged as a linchpin for the safe
and dependable deployment of intelligent systems. This
thesis tackles the urgent need for robust OOD detection. It
presents three innovative contributions that elevate the
field and set new standards for reliability and safety
across real-world contexts.
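For readers new to the problem, the sketch below shows the maximum softmax probability baseline, a common reference point for OOD detection. It is not one of the methods contributed by this thesis, and the threshold of 0.7 is an arbitrary illustrative value that would normally be tuned on held-out data.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of class logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_ood(logits: Sequence[float], threshold: float = 0.7) -> bool:
    """Flag an input as OOD when the classifier's top confidence is below the threshold."""
    return max(softmax(logits)) < threshold
```

The weakness of this baseline is the failure mode described above: a closed-set network can assign high softmax confidence to an unknown input, in which case the check passes it as in-distribution.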

First, we confront the common yet unrealistic
dataset-dependent OOD detection splitting definition that
one labeled dataset is in-distribution (ID) and all the
unlabeled datasets are OOD, under the impractical
assumption that training data is clean and balanced. We
introduce two novel frameworks to handle these
complexities. Most real-world applications follow the
semantically coherent OOD detection splitting definition,
where some ID samples appear in these unlabeled datasets.
The Adaptive Hierarchical Graph Cut (AHGC) network resolves
multi-granularity label discrepancies between labeled and
unlabeled datasets. It effectively identifies semantically
coherent OOD samples that other methods misclassify.
Complementing this, the Uncertainty-aware Adaptive Semantic
Alignment (UASA) network tackles cross-domain and
class-imbalanced data. It pioneers a prototype-based
alignment strategy that closes the domain gap and is robust
to imbalanced classes, addressing OOD detection and ID
classification in the unlabeled target domain.

Second, we address the significant practical limitation of
data scarcity by venturing into the domain of few-shot OOD
detection. Recognizing that most existing methods require
extensive labeled in-distribution data, we developed the
Adaptive Multi-prompt Contrastive Network (AMCN). This
model uniquely leverages the power of large-scale
vision-language models (CLIP) to generate adaptive textual
prompts for both in-distribution and out-of-distribution
classes. By learning a discriminative class boundary from
only a handful of samples, AMCN effectively compensates for
the scarcity of training data and corresponding labels,
marking a significant step towards data-efficient OOD
detection.

Third, we extend the scope of OOD detection from static
images to dynamic, complex video scenarios. We introduce
the novel task of OOD Action Detection (ODAD) in untrimmed
videos. To solve this, we propose the Uncertainty-Guided
Appearance-Motion Association Network (UAAN). This approach
reasons over spatial-temporal inter-object interactions by
synergistically modeling appearance and motion features. It
allows for the simultaneous localization and identification
of both known (in-distribution) and unknown
(out-of-distribution) actions. This is a critical
capability for safety-critical applications like autonomous
driving.

Collectively, these contributions redefine the landscape of
OOD detection across four pivotal dimensions: semantic
granularity understanding, cross-domain robustness, data
efficiency, and temporal dynamics modeling. The
methodologies introduced not only surpass existing
benchmarks but also prove their value in diverse,
real-world settings where robustness is non-negotiable.
