keywords:
computer science
perception
vision
neural networks
natural language processing
Research has begun exploring the performance of large vision language models (LVLMs) in recognizing illusions. However, these studies often fail to distinguish between actual and apparent features, leading to ambiguous assessments of machine cognition.
We introduce a visual question answering (VQA) dataset categorized into genuine and fake illusions. Genuine illusions present discrepancies between actual and apparent features, whereas fake illusions have identical actual and apparent features even though they look illusory. We evaluate the performance of LVLMs on genuine and fake illusion VQA tasks and investigate whether the models discern actual from apparent features. Our findings indicate that although LVLMs may appear to recognize illusions by correctly answering questions about both feature types, they predict the same answers for both genuine illusion and fake illusion VQA questions. This suggests that their responses might be based on prior knowledge of illusions rather than genuine visual understanding.
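The core diagnostic described above, checking whether a model gives the same answer to a genuine-illusion question and its fake (non-illusory) counterpart, can be sketched as a simple consistency measure. This is an illustrative sketch, not the paper's actual evaluation code; the function name, toy answers, and pairing scheme are assumptions for exposition.

```python
def answer_consistency(genuine_answers, fake_answers):
    """Fraction of paired questions where the model predicts the same
    answer for a genuine illusion and its fake counterpart.

    Because the correct answers differ between the two categories, a high
    consistency score suggests the model relies on prior knowledge of the
    illusion rather than on the actual visual content of the image.
    """
    assert len(genuine_answers) == len(fake_answers)
    same = sum(g == f for g, f in zip(genuine_answers, fake_answers))
    return same / len(genuine_answers)


# Toy example (hypothetical answers to "which line looks longer?"-style
# questions); real evaluation would use model outputs on the VQA dataset.
genuine = ["equal", "left", "equal", "top"]
fake = ["equal", "left", "right", "top"]
print(answer_consistency(genuine, fake))  # 3 of 4 pairs match -> 0.75
```

In the paper's framing, a model that truly discerns actual from apparent features should answer the two categories differently, driving this score down.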