
Yaoyuan Liang
Tsinghua University
visual grounding
detection
vision and language
transformer
video understanding
detr
vision-and-language
2 presentations
SHORT BIO
I am a Ph.D. student at the Shenzhen Key Laboratory of Ubiquitous Data Enabling, Tsinghua Shenzhen International Graduate School, Tsinghua University. My research interests include multi-modal learning, vision-language understanding, and LLM-enhanced visual grounding.
Presentations

CoSTA: End-to-End Comprehensive Space-Time Entanglement for Spatio-Temporal Video Grounding
Yaoyuan Liang and 7 other authors

DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding
Shilong Liu and 7 other authors