Singapore

Scientific research articles, typically distributed in PDF format, contain valuable knowledge but remain challenging to convert into structured datasets due to fragmented workflows that separate parsing, annotation, and visualization. Existing annotation platforms operate on plain text, which requires an additional PDF-to-text conversion step before annotation, while PDF parsing tools lack automated annotation suggestions. To bridge this gap, we introduce Docora, a system that unifies PDF parsing, automated annotation assistance, and multi-view visualization into a single interactive platform. Docora enables researchers to configure entity and relation schemas for any domain, automatically generates initial annotations using rule-based, model-based, or LLM-based extractors, and provides synchronized visualizations across PDF, text, and graph views. Users can refine annotations directly on the PDF canvas, ensuring consistency between document layout and structured representations. The system’s source code is publicly available to facilitate further research and development.
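As an illustration of the schema-driven, rule-based suggestion step described above, the following is a minimal sketch and not Docora's actual implementation. The names `EntityType`, `Annotation`, `suggest_entities`, and the example schema are all hypothetical; the sketch assumes the PDF has already been parsed to plain text and that each entity type is defined by a regular expression.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityType:
    """One entry of a user-configured schema (hypothetical structure)."""
    name: str
    pattern: str  # regular expression used by the rule-based extractor


@dataclass(frozen=True)
class Annotation:
    """A suggested entity span, addressed by character offsets in the text."""
    label: str
    start: int
    end: int
    text: str


def suggest_entities(text, schema):
    """Return initial entity suggestions for `text`, sorted by position."""
    found = []
    for entity_type in schema:
        for match in re.finditer(entity_type.pattern, text):
            found.append(
                Annotation(entity_type.name, match.start(), match.end(), match.group())
            )
    return sorted(found, key=lambda a: (a.start, a.end))


# Hypothetical schema for a materials-science corpus (an assumption for this example).
SCHEMA = [
    EntityType("MATERIAL", r"\b(?:TiO2|ZnO|graphene)\b"),
    EntityType("TEMPERATURE", r"\b\d+(?:\.\d+)?\s?K\b"),
]

sample = "ZnO films were annealed at 723 K."
suggestions = suggest_entities(sample, SCHEMA)
for s in suggestions:
    print(s.label, s.start, s.end, repr(s.text))
```

Keeping character offsets on every suggestion is what would let a tool of this kind map an annotation back to its position in the parsed text, and from there to the PDF layout, when a user refines it.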

AAAI 2026

Docora: A System for Interactive Knowledge Extraction and Visualization from Scientific PDFs

automated annotation assistance

pdf parsing

information extraction

demo

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held at Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.

Event detection is essential for surveillance, particularly in retail loss prevention where accurate, timely monitoring is critical. Large vision–language models (VLMs) provide strong generalization but are inefficient on video streams and prone to hallucinations from redundant frames. We present SmartEyes, a plug-and-play system for real-time retail surveillance. SmartEyes introduces Perception–Cognition Focusing (PCF), which combines lightweight perception with semantic triggering to isolate two keyframes (customer contact and departure) and constrain the VLM to a focused differencing task. This design reduces hallucination while enabling efficient reasoning. Our demo features a SAM-powered ROI interface and live CCTV monitoring, achieving accurate alerts within 1–2 seconds on a single RTX 4080 GPU.

SmartEyes: Plug-and-Play Event Detection for Retail Loss Prevention

We present DS SERVE, a framework that transforms large-scale text datasets—comprising half a trillion tokens—into a high-performance neural retrieval system. DS SERVE offers both a web interface and API endpoints, achieving low latency with modest memory overhead on a single node. The framework also supports inference-time tradeoffs between latency, accuracy, and result diversity. We anticipate that DS SERVE will be broadly useful for a range of applications such as large-scale retrieval-augmented generation (RAG), training data attribution, training a search agent, and beyond.

DS SERVE: A Framework for Efficient and Scalable Neural Retrieval

Large Language Models (LLMs) have revolutionized the simulation of agent societies, enabling autonomous planning, memory formation, and social interactions. However, existing frameworks often overlook systematic evaluations for event organization and lack visualized integration with physically grounded environments, limiting agents' ability to navigate spaces and interact with items realistically. We develop MiniAgentPro, a visualization platform featuring an intuitive map editor for customizing environments and a simulation player with smooth animations. Based on this tool, we introduce a comprehensive test set comprising eight diverse event scenarios with basic and hard variants to assess agents' ability. Evaluations using GPT-4o demonstrate strong performance in basic settings but highlight coordination challenges in hard variants.

A Visualized Framework for Event Cooperation with Generative Agents

We present AutoTuneX, a system architecture and implementation that lets users interactively fine-tune large language models (LLMs) using automated hyperparameter optimization built around Bandit Limited Discrepancy Search (Kishimoto et al. 2022). Alongside a classical graphical user interface (GUI), our system features an agentic runtime to facilitate automated fine-tuning via chat.

AutoTuneX: Interactive Automated Fine-Tuning for Large Language Models

The dynamic nature of cloud spending and pricing structures
poses challenges for practitioners in IT Financial Operations
(FinOps). Recent advances in agentic systems enable practitioners
to instead rely on agents for complex FinOps tasks such as
drawing insights from their data through natural language
queries. In this work, we present an IT FinOps Data Insights
Agent that implements a “chat with your data” approach to
support practitioners in their daily tasks. Our agent achieves
up to 90% accuracy across ITBench FinOps scenarios.

Agentic Solutions for IT Financial Operations

Accurate citation is critical, yet error rates remain high across scientific literature. We present RefLens, an end-to-end system that automates citation verification from PDF parsing to interactive report generation. Unlike summary- or embedding-based approaches, RefLens performs evidence-grounded verification by extracting verbatim spans from original sources and displaying citation-level cards and a paper-level dashboard. In a 35-participant study, users rated value (M=4.34), trust (M=4.15), and usability (M=4.19) highly, with strong adoption intention (M=4.28).
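To make the idea of evidence-grounded verification concrete, here is a minimal sketch of one possible check: confirming that a quoted evidence span really occurs verbatim in the cited source. This is an assumption-laden illustration, not RefLens's pipeline; the function names and the sample text are hypothetical, and the sketch assumes that matching should ignore case and whitespace differences introduced by PDF extraction.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace runs and lowercase, so PDF line breaks do not block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def is_verbatim_supported(evidence: str, source: str) -> bool:
    """Return True if `evidence` occurs in `source`, ignoring case and whitespace layout."""
    needle = normalize(evidence)
    if not needle:
        # An empty span cannot count as evidence.
        return False
    return needle in normalize(source)


# Hypothetical source text with a line break inside the quoted span.
source_text = "The model reaches 90%\naccuracy on the held-out set."

print(is_verbatim_supported("reaches 90% accuracy", source_text))
print(is_verbatim_supported("reaches 95% accuracy", source_text))
```

A strict substring test like this rejects paraphrased or altered evidence by construction, which is the property that distinguishes verbatim grounding from summary- or embedding-based similarity.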

RefLens: End-to-End Evidence-Grounded Citation Verification with LLM Agents

In this paper, we present the development of an automated visual inspection system for detecting defects on the upper airframe surface. The system employs a multi-camera PTZ system to synchronously capture and process images of designated regions. The accompanying software handles path planning and camera localization, while a hybrid-AI framework detects various defect types, such as hairline cracks, loose screws, and bird strike damage. The demonstration showcases the detection capabilities and prototype functionalities on a large aircraft model, with a user interface to run system features and visualize results. To support this work, performance testing is provided against relevant models.

Automated Multi-Camera Inspection System for Aircraft

The demo presents a tool that visualizes how planning agents act in dynamic environments that may be modified by “acts of nature”. The purpose of this tool is to help users understand and debug the agent's behavior and to make the underlying planning concepts accessible to a wider audience.

PANSim: Visualization Tool for Planning and Acting against Nature

Nature is inherently structured! The entities in the real world are naturally organized in rich relationships. For example, dolphins and sharks, despite their striking visual resemblance in body shape and fins, are actually from entirely different branches of the animal hierarchy, i.e., mammals and fishes, respectively. This remarkable similarity is a prime example of ‘convergent evolution’, where unrelated species develop similar features because they face similar environmental challenges. This illustrates how nature’s underlying organization often transcends superficial visual resemblances. Although humans intuitively grasp and utilize these profound natural constraints, they are typically underutilized in most AI systems. As a result, trained AI models tend to align with statistical patterns in the data, such as sampling biases or class imbalance, rather than adhering to the underlying relational consistency. This thesis argues that AI systems must evolve beyond learning “flat” feature representations, which are domain-agnostic and derived purely from data correlations, and instead explicitly model the domain-specific structural relationships. A key benefit of encoding relational priors in the learning process is that it can inject domain knowledge as an inductive bias, leading to more robust and reliable models. My research investigates incorporating domain knowledge by leveraging “graph-based structural priors” that explicitly model relational constraints in various visual recognition tasks. This work spans three distinct dimensions of visual recognition, progressing from coarse-level (image-level) to fine-grained (scene-level) understanding. My research highlights a crucial limitation in existing AI models: they often fail to incorporate real-world constraints, leading to significant errors. I show that even powerful, pre-trained neural networks can make severe mistakes due to a lack of domain knowledge. I argue that standard metrics like top-1 accuracy, precision, and recall are insufficient for evaluating model robustness, and propose a new metric based on the rank order of the predictions as a better indicator of reliability. The benchmark on various large-scale datasets confirms that existing solutions do not sufficiently capture the domain knowledge, which is often available as a taxonomy tree, motivating our design of better learning frameworks. I also examine complex visual re-identification (Re-ID) tasks, such as monitoring animals in the wild. I find that existing foundation models struggle with new species and environments. This challenge is compounded by the high cost of manual annotation for adapting these systems to new settings. While existing unsupervised learning methods can help reduce the need for extensive labeling, they often suffer from under- and over-segmentation errors, which led me to develop more effective active learning strategies. Finally, I address the limitations of the classic Kalman filter, a widely used tool for dynamic systems. I point out that this filter makes a flawed assumption that the movement of each individual object is independent of its dynamic surroundings. In the real world, this is rarely the case. I demonstrate the need for a new filtering mechanism that considers not only an object’s past movements but also its spatial relationship with other dynamic entities in its environment. In my analysis, I observed that vision foundation models for all recognition tasks, i.e., classification, detection, and segmentation, lack domain knowledge. I believe that our learning framework, which was designed specifically for classification, can be adapted for other recognition tasks. I speculate that a unified learning framework can be designed to make vision foundation models aware of the available taxonomy.

Exploiting Graph-Based Structural Priors for Visual Recognition

My research aims to pioneer efficient and reliable wearable
intelligence algorithms that transform healthcare robotics
into adaptive, patient-centered systems.
I take a four-step approach: (1) design multimodal wearable
sensing platforms to capture human and biometric signals;
(2) train a foundation model that learns from these rich
datasets to reason about human behaviors and health states;
(3) validate the model through large-scale simulation and
principled uncertainty quantification; and (4) deploy it in
rehabilitation and assistive robots for intelligent,
personalized care.
This research not only advances fundamental understanding
of multimodal human behavior, but also opens new pathways
for early disease diagnosis, adaptive treatment, and
accessible digital health.
By bridging AI, wearables, and robotics, my work aspires to
lay the groundwork for the next generation of healthcare
technologies that are proactive, trustworthy, and deeply
aligned with human well-being.
