
Chaoya Jiang
question answering
contrastive learning
multimodal
commonsense reasoning
in-context learning
vision and language pre-training
retrieval augmentation
vision-language pretraining
vision transformers
patch selection
demonstration augmentation
prediction calibration
4 presentations
SHORT BIO
Hello! I am a Ph.D. candidate at Peking University. My research focuses on multimodal large language models and vision-language pretraining. My goal is to contribute to the development of advanced AI systems with enhanced multimodal capabilities.
Presentations

MIBench: Evaluating Multimodal Large Language Models over Multiple Images
Haowei Liu and 10 other authors

Enhancing In-Context Learning via Implicit Demonstration Augmentation
Xiaoling Zhou and 6 other authors

Vision Language Pre-training by Contrastive Learning with Cross-Modal Similarity Regulation
Chaoya Jiang

TRIPS: Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection
Chaoya Jiang