Event detection is essential for surveillance, particularly in retail loss prevention where accurate, timely monitoring is critical. Large vision–language models (VLMs) provide strong generalization but are inefficient on video streams and prone to hallucinations from redundant frames. We present \textbf{SmartEyes}, a plug-and-play system for real-time retail surveillance. SmartEyes introduces \textbf{Perception–Cognition Focusing (PCF)}, which combines lightweight perception with semantic triggering to isolate two keyframes—customer contact and departure—and constrain the VLM to a focused differencing task. This design reduces hallucination while enabling efficient reasoning. Our demo features a SAM-powered ROI interface and live CCTV monitoring, achieving accurate alerts within 1–2 seconds on a single RTX 4080 GPU.
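The keyframe isolation step in PCF can be illustrated with a minimal sketch. The abstract does not specify the triggering logic, so everything below is an assumption made for illustration: a hypothetical lightweight detector emits one boolean per frame indicating whether a customer overlaps the ROI, `select_keyframes` and `min_clear` are invented names, and the prompt text is a placeholder rather than the prompt SmartEyes actually uses.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class KeyframePair:
    contact_index: int    # first frame in which the customer overlaps the ROI
    departure_index: int  # frame at which the ROI has stayed clear for `min_clear` frames


def select_keyframes(in_roi: Iterable[bool], min_clear: int = 3) -> Iterator[KeyframePair]:
    """Yield contact/departure frame indices from a per-frame ROI occupancy signal.

    `in_roi` is assumed to come from a lightweight perception model (hypothetical).
    Departure is declared only after `min_clear` consecutive clear frames, which
    debounces brief detector dropouts while the customer is still at the shelf.
    """
    contact: Optional[int] = None
    clear_run = 0
    for i, occupied in enumerate(in_roi):
        if contact is None:
            if occupied:          # semantic trigger: interaction starts
                contact = i
                clear_run = 0
        elif occupied:            # still interacting; reset the debounce counter
            clear_run = 0
        else:
            clear_run += 1
            if clear_run >= min_clear:
                yield KeyframePair(contact, i)
                contact = None
                clear_run = 0


def differencing_prompt(pair: KeyframePair) -> str:
    """Build a placeholder prompt that restricts the VLM to comparing two frames."""
    return (
        f"Image A is frame {pair.contact_index} (customer contact). "
        f"Image B is frame {pair.departure_index} (after departure). "
        "List only the items inside the marked region that differ between A and B."
    )


if __name__ == "__main__":
    signal = [False, False, True, True, False, True,
              False, False, False, False, True, False, False, False]
    for pair in select_keyframes(signal):
        print(pair, "|", differencing_prompt(pair))
```

Under these assumptions only two frames per interaction reach the VLM, which is the property the abstract credits for both the efficiency gain and the reduced hallucination: the model never sees the redundant frames in between.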