Recent advances in vision-language models (VLMs) have enabled accurate image-based geolocation, raising serious concerns about location privacy risks in everyday social media posts. However, current benchmarks remain coarse-grained and linguistically biased, and they lack multimodal, privacy-aware evaluation. To address these gaps, we present KoreaGEO Bench, the first benchmark designed for fine-grained, multimodal, and privacy-aware evaluation of VLM geolocation, using Korean street views as a rich case study. Our benchmark dataset comprises 1,080 high-resolution images sampled across four socio-spatial clusters and nine place types, enriched with multi-contextual annotations and two styles of Korean captions simulating real-world privacy exposure. We introduce a three-path evaluation protocol to assess ten mainstream VLMs under varying input modalities and analyze their accuracy, spatial bias, and reasoning behavior. Results reveal modality-driven shifts in localization precision and highlight structural prediction biases toward core cities. Ultimately, our work calls for a dual approach to geolocation benchmarking: alongside pursuing the breadth of global coverage, we urge the development of in-depth, localized benchmarks tailored to the unique socio-spatial characteristics of diverse regions, fostering more responsible and equitable VLMs.
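To make the three-path evaluation protocol concrete, the sketch below shows one plausible shape of such a loop: each model is scored under several input modalities (image only, or image plus a caption style), and per-path accuracy is tallied. This is a minimal illustration, not the authors' implementation; the path names, the `predict` stub, and the sample format are all assumptions.

```python
# Hypothetical sketch of a multi-path VLM geolocation evaluation.
# Path names, predict(), and the sample format are illustrative assumptions.

def predict(model, image, caption=None):
    """Placeholder for a real VLM call; always guesses one city."""
    return "Seoul"

def evaluate(models, samples,
             paths=("image_only", "image_casual_caption", "image_detailed_caption")):
    """Return per-(model, path) city-level accuracy.

    Each sample is (image, captions_by_path, ground_truth_city); a path
    missing from captions_by_path falls back to no caption (image only).
    """
    results = {}
    for model in models:
        for path in paths:
            correct = 0
            for image, captions, city in samples:
                caption = captions.get(path)  # None for the image-only path
                correct += predict(model, image, caption) == city
            results[(model, path)] = correct / len(samples)
    return results

# Toy usage with two fabricated samples (not benchmark data).
samples = [
    ("img1.jpg", {"image_casual_caption": "cafe hopping today!"}, "Seoul"),
    ("img2.jpg", {"image_detailed_caption": "near Haeundae Beach"}, "Busan"),
]
scores = evaluate(["vlm-a"], samples)
```

In a real run, `predict` would wrap an API or local inference call for each of the ten VLMs, and the accuracy metric could be swapped for distance-based error to expose the spatial biases the abstract describes.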