Would you like to see your presentation here, made available to a global audience of researchers?
Add your own presentation or have us affordably record your next conference.
Membership Inference Attacks (MIAs) test whether a model has memorized training data, and are a key tool for auditing privacy risks in machine learning. Recent papers report near-perfect MIA success against large vision-language models such as CLIP, but almost all evaluations train on one web-scale corpus (e.g. LAION-400M) and treat samples from a different corpus (e.g. COCO or CC12M) as non-members -- thereby turning the task into out-of-distribution (OOD) detection rather than true membership testing, introducing spurious signals unrelated to true memorization.
We revisit the problem with a distribution-matched benchmark built from the CommonPool-L corpus of DataComp. A ViT-B/16 CLIP trained on 400 M pairs is accompanied by two 26-shard, i.i.d. splits that serve as member and non-member sets, sharing the exact same acquisition and preprocessing pipeline. Under this strictly in-distribution setting, every published MIA baseline collapses to chance ($\approx$51\% AUC). To explain this collapse, we derive a scaling-law upper bound for similarity-based attacks showing that the expected member vs. non-member similarity gap decays as $\mathcal{O}(T/N)$ for contrastive learning with $T$ epochs over $N$ samples. Empirically, as we vary the training set size while holding all hyper-parameters fixed, the gap follows the predicted linear trend in log–log space, and Cosine Similarity Attack AUC drops from 94\% to 51\%. %, matching the analytic sigmoid mapping. Finally, we propose a simple, white-box, gradient-based MIA that outperforms prior attacks for CLIP without relying on OOD cues. We release code, checkpoints, and data to foster comprehensive and reproducible privacy research on multimodal foundation models.