The proliferation of Machine Learning as a Service (MLaaS) has enabled widespread deployment of large language models (LLMs) via cloud APIs, but also raises critical concerns about model integrity and security. Existing black-box tamper detection methods, such as watermarking and fingerprinting, rely on the stability of model outputs, a property that does not hold for inherently stochastic LLMs. In this paper, we propose a novel fingerprinting approach tailored for LLM tamper detection in the black-box setting. By formulating the problem as hypothesis testing, we introduce a Regularized Entropy-Sensitive Fingerprinting (RESF) objective that leverages a first-order surrogate for KL divergence to maximize sensitivity while controlling false positives. To robustly distinguish genuine tampering from benign temperature changes, we develop a lightweight two-tier sequential test combining support-based and distributional checks, with rigorous control of the global false-alarm rate. Our method is supported by comprehensive theoretical analysis and explicit performance guarantees. Extensive experiments across multiple LLMs and tampering scenarios demonstrate that RESF achieves up to 98.80% detection accuracy, even under minimal LoRA fine-tuning modifications, establishing a new standard for practical and reliable LLM integrity verification.
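To make the two-tier idea concrete, the following is a minimal sketch of such a sequential test, not the paper's actual method or interface. It assumes three hypothetical inputs: `ref_logits` (reference next-token logits recorded when the fingerprint was enrolled), `ref_support` (a boolean mask of tokens deemed reachable under the reference model), and `obs_counts` (token counts sampled from the deployed endpoint on the same fingerprint prompt). The Bonferroni budget split, G-test statistic, and temperature grid are all illustrative assumptions.

```python
# Hypothetical sketch of a two-tier sequential tamper test (not the paper's code).
import numpy as np
from scipy.stats import chi2


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def two_tier_test(ref_logits: np.ndarray,
                  ref_support: np.ndarray,
                  obs_counts: np.ndarray,
                  alpha: float = 0.05) -> bool:
    """Flag tampering (True) while holding the global false-alarm rate <= alpha."""
    # Bonferroni-style split of the global false-alarm budget across the two tiers
    # (an assumption; the paper may allocate the budget differently).
    alpha_dist = alpha / 2.0

    # Tier 1: support check. A benign temperature change rescales probabilities
    # but cannot make unreachable tokens appear, so any observed mass outside
    # the recorded support is treated as evidence of tampering.
    if obs_counts[~ref_support].sum() > 0:
        return True

    # Tier 2: distributional check. Fit temperature as a nuisance parameter by
    # minimizing a likelihood-ratio (G-test) statistic over a grid, so a benign
    # temperature change is absorbed by the fit while genuine weight
    # modifications leave a residual divergence.
    n = obs_counts.sum()
    best_g = np.inf
    for t in np.linspace(0.2, 2.0, 37):  # illustrative temperature grid
        probs = softmax(ref_logits / t)
        expected = np.clip(n * probs, 1e-12, None)
        mask = obs_counts > 0
        g = 2.0 * np.sum(obs_counts[mask] * np.log(obs_counts[mask] / expected[mask]))
        best_g = min(best_g, float(g))

    # Chi-square calibration: support size minus one (simplex constraint)
    # minus one fitted nuisance parameter (temperature).
    dof = max(int(ref_support.sum()) - 2, 1)
    return best_g > chi2.ppf(1.0 - alpha_dist, df=dof)
```

Treating temperature as a fitted nuisance parameter is what lets a test of this shape absorb benign sampling-temperature changes while still flagging genuine modifications; the paper's RESF objective and its first-order KL surrogate would additionally shape the fingerprint prompts so that tampering produces the largest such residual.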
