Training data detection is critical for enforcing copyright and data licensing, as Large Language Models (LLMs) are trained on massive text corpora scraped from the internet. We present SPECTRA, a watermarking approach that makes training data reliably detectable even when it comprises less than 0.001% of the training corpus. SPECTRA uses an LLM to generate semantically equivalent paraphrases of a text and computes the token log probabilities of each paraphrase with a scoring model that was not trained on that text. It then samples a paraphrase whose score, computed from those token log probabilities, is close to the score of the original text. To detect whether the watermarked data was used for training, we compare the token log probabilities of a "suspect" model to those of the scoring model. We demonstrate that SPECTRA achieves a consistent p-value gap of more than nine orders of magnitude between data used to train a model and data not used. SPECTRA equips data owners with a scalable, deploy-before-release watermark that survives even large-scale LLM training.
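The pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the actual scoring statistic, paraphraser, and detection test used by SPECTRA are not specified here, so we assume the score is the mean token log-probability under the scoring model and use stand-in log-probability values in place of real model outputs.

```python
# Hypothetical sketch of a SPECTRA-style watermarking pipeline.
# Assumptions (not from the abstract): score = mean token log-probability;
# log-probabilities are supplied as plain lists of floats.

def score(token_logprobs):
    """Mean token log-probability under the scoring model (assumed statistic)."""
    return sum(token_logprobs) / len(token_logprobs)

def select_paraphrase(original_logprobs, paraphrase_logprobs_list):
    """Pick the paraphrase whose score is closest to the original text's score."""
    target = score(original_logprobs)
    return min(paraphrase_logprobs_list,
               key=lambda lps: abs(score(lps) - target))

def detection_gap(suspect_logprobs, scoring_logprobs):
    """Gap between suspect-model and scoring-model scores on watermarked text.
    A large positive gap suggests the suspect model saw the text in training."""
    return score(suspect_logprobs) - score(scoring_logprobs)

# Toy example with made-up log probabilities for one original text
# and two candidate paraphrases.
original = [-2.1, -1.8, -2.5]
paraphrases = [[-2.0, -1.9, -2.4],   # score close to the original
               [-1.0, -0.9, -1.2]]  # score far from the original
chosen = select_paraphrase(original, paraphrases)
```

In a real deployment the chosen paraphrase replaces the original text before release; at audit time, the detection statistic would feed a hypothesis test yielding the p-values the abstract reports.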