The usage of exocentric and egocentric videos in Video Question Answering (VQA) is a new endeavor in human-robot interaction and collaboration studies. Particularly for egocentric videos, one may leverage eye-gaze information to understand human intentions during the task. In this paper, we build a novel task-oriented VQA dataset, called GazeVQA, for collaborative tasks where gaze information is captured during the task process. GazeVQA is designed with a novel QA format that covers thirteen different reasoning types to capture multiple aspects of task information and user intent. For each participant, GazeVQA consists of more than 1,100 textual questions and more than 500 labeled images that were annotated with the assistance of the Segment Anything Model. In total, 2,967 video clips, 12,491 labeled images, and 25,040 questions from 22 participants were included in the dataset. Additionally, inspired by the assisting models and common ground theory for industrial task collaboration, we propose a new AI model called AssistGaze that is designed to answer the questions with three different answer types, namely textual, image, and video. AssistGaze can effectively ground the perceptual input into semantic information while reducing ambiguities. We conduct comprehensive experiments to demonstrate the challenges of GazeVQA and the effectiveness of AssistGaze.

GazeVQA: A Video Question Answering Dataset for Multiview Eye-Gaze Task-Oriented Collaborations

This paper tackles an emerging and challenging problem of long video temporal grounding~(VTG) that localizes video moments related to a natural language (NL) query. Compared with short videos, long videos are also highly demanded but less explored, which brings new challenges in higher inference computation cost and weaker multi-modal alignment. To address these challenges, we propose CONE, an efficient COarse-to-fiNE alignment framework. CONE is a plug-and-play framework on top of existing VTG models to handle long videos through a sliding window mechanism. Specifically, CONE (1) introduces a query-guided window selection strategy to speed up inference, and (2) proposes a coarse-to-fine mechanism via a novel incorporation of contrastive learning to enhance multi-modal alignment for long videos. Extensive experiments on two large-scale long VTG benchmarks consistently show both substantial performance gains (e.g., from 3.13 to 6.87\% on MAD) and state-of-the-art results. Analyses also reveal higher efficiency as the query-guided window selection mechanism accelerates inference time by 2x on Ego4D-NLQ and 15x on MAD while keeping SOTA results. Codes have been released at https://github.com/houzhijian/CONE.

<iframe src="https://app.sli.do/event/or9aypsi961h595NQtD7kk/embed/polls/b4fd5e18-b942-4a75-808d-6ee312d298a6" width="300" height="400"></iframe>

CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding

VQA is an ambitious task aiming to answer any image-related question. However, in reality, it is hard to build such a system once for all since the needs of users are continuously updated, and the system has to implement new functions. Thus, Continual Learning (CL) ability is a must in developing advanced VQA systems. Recently, a pioneer work split a VQA dataset into disjoint answer sets to study this topic. However, CL on VQA involves not only the expansion of label sets (new Answer sets). It is crucial to study how to answer questions when deploying VQA systems to new environments (new Visual scenes) and how to answer questions requiring new functions (new Question types). Thus, we propose CLOVE, a benchmark for Continual Learning On Visual quEstion answering, which contains scene- and function-incremental settings for the two aforementioned CL scenarios. In terms of methodology, the main difference between CL on VQA and classification is that the former additionally involves expanding and preventing forgetting of reasoning mechanisms, while the latter focusing on class representation. Thus, we propose a real-data-free replay-based method tailored for CL on VQA, named Scene Graph as Prompt for Symbolic Replay. Using a piece of scene graph as a prompt, it replays pseudo scene graphs to represent the past images, along with correlated QA pairs. A unified VQA model is also proposed to utilize the current and replayed data to enhance its QA ability. Finally, experimental results reveal challenges in CLOVE and demonstrate the effectiveness of our method. Code and data
are available at https://github.com/showlab/CLVQA.

Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task

**Are you attending this poster session virtually?** 
In-person printed posters are available for in-person attendees only.
 
**Are you attending this poster session in person?** 
Hybrid posters are displayed in the East Foyer.

PS1 (In-person) Posters, Demo, Industry, Findings

poster

**Welcome to EMNLP 2023!**

On behalf of the EMNLP 2023 Organizing Committee, I extend a warm and heartfelt welcome to all of you to the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP). It is with immense pleasure and excitement that we gather here in Singapore, a vibrant hub of innovation and technological advancement.

The conference program is packed with insightful presentations, thought-provoking workshops, and engaging networking opportunities. In addition to the technical sessions, we have also planned several social events that will provide you with the opportunity to connect with your colleagues.

The conference is held in person at the Resorts World Convention Centre in Singapore, and available on-line with the help of Underline.

For those in Singapore, I hope that you will find time to explore the exciting Sentosa Island, Gardens by the Bay, Singapore Botanic Gardens and the many other attractions unique to Singapore.

Of course, we are grateful to our sponsors and partners for their generous support of EMNLP 2023, whose contributions make it possible for us to host this world-class event.

Yuji Matsumoto (RIKEN AIP) 
EMNLP 2023 General Chair

To access the **EMNLP 2023** event page on Underline, you need to register for the Conference. 
Please follow **[this link](https://2023.emnlp.org/registration/)** for more details.

EMNLP 2023

EMNLP 2023 took place in Singapore from Dec 6th to Dec 10th, 2023.

Virtual Poster Session 1

### Welcome to ACL 2023, the 61st Annual Meeting of the Association for Computational Linguistics! 
 The conference will be held in Toronto, Canada, July 9-14, 2023. 
Following the succession of the recent conferences in our field, ACL 2023 will adopt a hybrid format.
While the impact of Covid has considerably diminished in terms of traveling, obtaining visas to Canada
entails a very long process. Moreover, the global economic conditions pose challenges for many individuals to travel to conferences. Recognizing these circumstances, we know many participants may not be
able to attend the conference in person. Therefore, we are committed to providing a great virtual platform
so everyone has the opportunity to interact with other participants and enjoy the conference. Based on the
current registered participants, approxiately 30% have chosen to attend the conference virtually. Whether
you join us in person or virtually, we sincerely hope everyone has a remarkable conference experience. 
This General Chair’s message is where I express my gratitude to the many individuals who have made
enormous contributions to the conference over the past year.

Read [**ACL 2023 General Chair's message**](https://docs.google.com/document/d/1WobYM7norbG4dI48s75HfJoD89qgX5a_F-6U8AteLSA/edit?usp=sharing/) in full.

##### **[Conference Handbook](https://2023.aclweb.org/downloads/acl2023-handbook.pdf)**

ACL 2023

The Association for Computational Linguistics (ACL) is the premier international scientific and professional society for people working on computational problems involving human language, a field often referred to as either computational linguistics or natural language processing.

CV: Language and Vision 1

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-23 is the Thirty-Seventh AAAI Conference on Artificial Intelligence. The theme of this conference is to create collaborative bridges within and beyond AI. Like previous AAAI conferences, AAAI-23 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and two new activities: a Bridge Program and a Lab Program. Many of these activities are tailored to the theme of bridges and all are selected according to the highest standards, with additional programs for students and young researchers. 
AAAI is providing you with a conference planner, which you can use to help organize your itinerary of activities. This includes talks to attend in person, talks to attend remotely, breaks with colleagues and your site seeing activities. To access this conference planner, please go to [https://aaai-2023.takemobi.io](https://aaai-2023.takemobi.io).

In order to access this site, you need to register. If you haven't already, please register [here](https://aaai.org/Conferences/AAAI-23/registration/).


AAAI 2023

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines.

Difei Gao

3

8

Presentations

GazeVQA: A Video Question Answering Dataset for Multiview Eye-Gaze Task-Oriented Collaborations

CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding

Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task

Stay up to date with the latest Underline news!

PRESENTATIONS

CONFERENCES

COMPANY

RESOURCES