

workshop paper
Know Thine Enemy: Adaptive Attacks on Misinformation Detection Using Reinforcement Learning
keywords:
misinformation
robustness
adversarial examples
reinforcement learning
We present XARELLO: a generator of adversarial examples, based on reinforcement learning, for testing the robustness of text classifiers. Our solution is adaptive: it learns from previous successes and failures in order to better adjust to the vulnerabilities of the attacked model. This reflects the behaviour of a persistent and experienced attacker, a profile that is common in the misinformation-spreading environment. We evaluate our approach using several victim classifiers and credibility-assessment tasks, showing that it generates better-quality examples with fewer queries and is especially effective against modern LLMs. We also perform a qualitative analysis to understand the language patterns in misinformation text that play a role in the attacks.
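To make the idea of an attacker that "learns from previous successes and failures" more concrete, here is a minimal sketch. It is **not** XARELLO's method. It swaps in a much simpler technique, an epsilon-greedy multi-armed bandit, whose arms are single-word substitutions. The victim (a trigger-word credibility classifier), the trigger list, the substitution actions, and the names `victim`, `BanditAttacker`, and `attack` are all hypothetical. The reward is 1 when a substitution flips the victim's label. Over successive texts, the attacker tries substitutions that worked before first, so it needs fewer victim queries.

```python
import random

# Hypothetical toy victim: label 1 ("non-credible") if any trigger word
# appears (case-insensitive). Stands in for a real credibility classifier.
TRIGGERS = {"shocking", "miracle", "hoax"}


def victim(tokens):
    return int(any(t.lower() in TRIGGERS for t in tokens))


# Hypothetical action space: (source word, replacement) substitutions.
# Case changes fail against this victim; synonyms succeed.
ACTIONS = [
    ("shocking", "Shocking"), ("shocking", "startling"),
    ("miracle", "MIRACLE"), ("miracle", "wonder"),
    ("hoax", "Hoax"), ("hoax", "fabrication"),
]


def substitute(tokens, action):
    src, dst = action
    return [dst if t == src else t for t in tokens]


class BanditAttacker:
    """Epsilon-greedy bandit whose value estimates persist across attacks."""

    def __init__(self, epsilon=0.1, seed=0):
        self.q = {a: 0.0 for a in ACTIONS}  # running mean reward per action
        self.n = {a: 0 for a in ACTIONS}    # times each action was tried
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def choose(self, candidates):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(candidates)
        return max(candidates, key=lambda a: self.q[a])

    def update(self, action, reward):
        # Incremental mean: Q <- Q + (r - Q) / n
        self.n[action] += 1
        self.q[action] += (reward - self.q[action]) / self.n[action]

    def attack(self, tokens):
        """Return (adversarial tokens or None, number of victim queries)."""
        candidates = [a for a in ACTIONS if a[0] in tokens]
        queries = 0
        while candidates:
            action = self.choose(candidates)
            candidates.remove(action)
            adv = substitute(tokens, action)
            queries += 1
            reward = 1.0 if victim(adv) == 0 else 0.0  # success = label flipped
            self.update(action, reward)
            if reward:
                return adv, queries
        return None, queries


if __name__ == "__main__":
    attacker = BanditAttacker(epsilon=0.0)
    for text in (["a", "shocking", "claim"], ["another", "shocking", "story"]):
        adv, queries = attacker.attack(text)
        print(adv, queries)
```

With `epsilon=0.0`, the first attack on a text containing "shocking" spends two queries because it tries the failing case change first. The second such text needs only one query, since the learned value of `("shocking", "startling")` is now higher. A real adaptive attacker would replace this tabular bandit with a learned policy over a much larger, context-dependent action space.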