Nature is inherently structured: the entities of the real world are organized in rich relationships. For example, dolphins and sharks, despite their striking visual resemblance in body shape and fins, come from entirely different branches of the animal hierarchy, namely mammals and fishes. This remarkable similarity is a prime example of ‘convergent evolution’, where unrelated species develop similar features because they face similar environmental challenges, and it illustrates how nature’s underlying organization often transcends superficial visual resemblance.

Although humans intuitively grasp and exploit these natural constraints, most AI systems leave them untapped. As a result, trained models tend to align with statistical patterns in the data, such as sampling biases or class imbalance, rather than with the underlying relational consistency. This thesis argues that AI systems must evolve beyond learning “flat” feature representations, which are domain-agnostic and derived purely from data correlations, and instead explicitly model domain-specific structural relationships. A key benefit of encoding relational priors in the learning process is that they inject domain knowledge as an inductive bias, leading to more robust and reliable models. My research incorporates such knowledge through graph-based structural priors that explicitly model relational constraints in visual recognition tasks, spanning three dimensions that progress from coarse, image-level to fine-grained, scene-level understanding.

First, my research highlights a crucial limitation of existing models: they often fail to incorporate real-world constraints, and even powerful pre-trained neural networks can make severe mistakes for lack of domain knowledge. I argue that standard metrics such as top-1 accuracy, precision, and recall are insufficient for evaluating model robustness, and I propose a new metric based on the rank order of the predictions as a better indicator of reliability (a minimal sketch of such a metric appears below). Benchmarks on several large-scale datasets confirm that existing solutions do not sufficiently capture domain knowledge, which is often available as a taxonomy tree, motivating our design of better learning frameworks.

Second, I examine complex visual re-identification (Re-ID) tasks, such as monitoring animals in the wild, where existing foundation models struggle with new species and environments. The challenge is compounded by the high cost of manually annotating data to adapt these systems to new settings. While existing unsupervised learning methods can reduce the need for extensive labeling, they often suffer from under- and over-segmentation errors, which led me to develop more effective active learning strategies (see the second sketch below).

Finally, I address the limitations of the classic Kalman filter, a widely used tool for dynamic systems. The filter makes a flawed assumption that the movement of each object is independent of its dynamic surroundings, which is rarely the case in the real world (the third sketch below makes this assumption explicit). I demonstrate the need for a new filtering mechanism that considers not only an object’s past movements but also its spatial relationships with the other dynamic entities in its environment. Across my analysis, I observed that vision foundation models for all recognition tasks, i.e., classification, detection, and segmentation, lack this domain knowledge.
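To make the rank-order idea concrete, here is a minimal, illustrative sketch of a taxonomy-aware metric: it scores a model’s ranked predictions by their tree distance to the true label, discounting mistakes that appear lower in the ranking. The parent map, the rank weighting, and the function names are my own assumptions for illustration, not the exact formulation proposed in the thesis.

```python
# Illustrative sketch: severity of ranked predictions under a taxonomy tree.
# All names and the weighting scheme are assumptions, not the thesis metric.

def ancestors(node, parent):
    """Return the path from `node` up to the root of the taxonomy."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def tree_distance(a, b, parent):
    """Number of edges between two leaves via their lowest common ancestor."""
    pa, pb = ancestors(a, parent), ancestors(b, parent)
    shared = set(pb)
    for i, n in enumerate(pa):          # first shared ancestor is the LCA
        if n in shared:
            return i + pb.index(n)
    raise ValueError("nodes are not in the same taxonomy")

def rank_order_severity(ranked_preds, label, parent, k=5):
    """Average taxonomy distance of the top-k predictions, weighted so
    that mistakes near the top of the ranking cost more."""
    top = ranked_preds[:k]
    weights = [1.0 / (r + 1) for r in range(len(top))]   # rank-discounted
    dists = [tree_distance(p, label, parent) for p in top]
    return sum(w * d for w, d in zip(weights, dists)) / sum(weights)

# Toy taxonomy: a dolphin is a mammal, a shark is a fish.
parent = {"dolphin": "mammal", "shark": "fish",
          "mammal": "animal", "fish": "animal"}
# Ranking "shark" above "dolphin" for a dolphin image is a severe mistake.
print(rank_order_severity(["shark", "dolphin"], "dolphin", parent, k=2))
```

Under such a metric, two models with identical top-1 accuracy can differ sharply: the one that confuses a dolphin with a shark is penalized more than one that confuses it with another mammal, which is exactly the distinction flat metrics miss.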
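The second sketch illustrates one plausible active-learning selection step for Re-ID: image pairs whose match score sits closest to the decision threshold are the most ambiguous, so they are routed to a human annotator first. The cosine-similarity criterion, the threshold, and every name here are hypothetical stand-ins for the strategies the thesis actually develops.

```python
import numpy as np

# Illustrative sketch: uncertainty-based pair selection for Re-ID
# active learning. The selection rule is an assumption for exposition.

def select_pairs_for_annotation(features, threshold=0.5, budget=10):
    """Rank image pairs by how ambiguous their match score is and
    return the `budget` most ambiguous pairs for human labeling."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = f @ f.T                                  # cosine similarities
    n = len(f)
    pairs = [(abs(sims[i, j] - threshold), i, j)    # distance to boundary
             for i in range(n) for j in range(i + 1, n)]
    pairs.sort()                                    # most ambiguous first
    return [(i, j) for _, i, j in pairs[:budget]]

# Toy usage: random embeddings stand in for a Re-ID backbone's features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))
print(select_pairs_for_annotation(feats, budget=3))
```

Spending the labeling budget on near-threshold pairs targets precisely the ambiguous identities that drive under- and over-segmentation in unsupervised clustering.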
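To show exactly where the criticized independence assumption lives, the third sketch is a minimal constant-velocity Kalman filter in its standard textbook form: each object owns its own filter, and the predict step uses only that object’s past state, with no term coupling it to other moving entities. The matrix values are illustrative.

```python
import numpy as np

# Standard per-object Kalman filter (constant-velocity model). Note that
# nothing in predict() or update() references any other track: this is
# the independence assumption the thesis argues against.

class KalmanTrack:
    def __init__(self, x0):
        self.x = np.array(x0, dtype=float)        # state: [px, py, vx, vy]
        self.P = np.eye(4)                        # state covariance
        dt = 1.0
        self.F = np.array([[1, 0, dt, 0],         # constant-velocity motion
                           [0, 1, 0, dt],
                           [0, 0, 1,  0],
                           [0, 0, 0,  1]])
        self.H = np.array([[1, 0, 0, 0],          # only position is observed
                           [0, 1, 0, 0]])
        self.Q = 0.01 * np.eye(4)                 # process noise
        self.R = 0.10 * np.eye(2)                 # measurement noise

    def predict(self):
        # Uses only this object's own past state and motion model.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        y = np.asarray(z) - self.H @ self.x                # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)           # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

track = KalmanTrack([0, 0, 1, 0])
print(track.predict())      # position advanced by its own velocity alone
track.update([1.1, 0.05])
```

A relation-aware filter would replace the purely self-referential predict step with one that also conditions on the states of nearby dynamic objects, which is the direction the thesis pursues.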
I believe that our learning framework, which was designed specifically for classification, can be adapted to other recognition tasks, and I speculate that a unified learning framework could be designed to make vision foundation models aware of the available taxonomy.
