
Yadong Mu
language and vision (cv)
manipulation (rob)
Presentations (3)

Neural Assembler: Learning to Generate Fine-Grained Robotic Assembly Instructions from Multi-View Images
Hongyu Yan and 1 other author

Granularity-Adaptive Spatial Evidence Tokenization for Video Question Answering
Hao Jiang and 8 other authors

Tree-Structured Trajectory Encoding for Vision-and-Language Navigation
Yadong Mu and 1 other author