Large Language Models (LLMs) can struggle to balance gullibility to misinformation against resistance to valid corrections in persuasive dialogues, a critical challenge for reliable deployment. We introduce DuET-PD (Dual Evaluation for Trust in Persuasive Dialogues), a framework for evaluating multi-turn stance-change dynamics along two dimensions: persuasion type (corrective vs. misleading) and domain (knowledge vs. safety), built on MMLU-Pro and SALAD-Bench. With DuET-PD, we uncover a primacy effect in initial persuasion and a capability-robustness trade-off: capable models often resist valid corrections, especially in safety tasks, while open-source models show higher gullibility. To address this, we introduce Holistic DPO, a training approach that balances positive (corrective) and negative (misleading) persuasion examples. Unlike prompting or resist-only training, Holistic DPO enhances both robustness to misinformation and receptiveness to corrections. Our framework and quantitative insights, coupled with the Holistic DPO method, enable LLMs to better navigate persuasive dialogues, improving reliability in knowledge- and safety-critical contexts.
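The contrast the abstract draws between resist-only training and Holistic DPO comes down to how the preference data is constructed. Below is a minimal sketch, assuming a standard DPO data format of (prompt, chosen, rejected) triples; the helper name `build_pair`, the dialogue snippets, and the use of the Hugging Face `datasets` library are illustrative assumptions, not the authors' released code.

```python
# Sketch of Holistic DPO pair construction: some pairs reward accepting a
# valid correction, others reward resisting misleading persuasion, so the
# model is trained on both directions rather than resist-only.
from datasets import Dataset


def build_pair(dialogue, accept_response, resist_response, persuasion_is_valid):
    """Return a DPO-style (prompt, chosen, rejected) triple.

    For corrective (valid) persuasion the stance-changing reply is preferred;
    for misleading persuasion the stance-holding reply is preferred.
    """
    chosen, rejected = (
        (accept_response, resist_response)
        if persuasion_is_valid
        else (resist_response, accept_response)
    )
    return {"prompt": dialogue, "chosen": chosen, "rejected": rejected}


# Hypothetical examples: one corrective and one misleading dialogue.
pairs = [
    build_pair(
        dialogue="Q: ... A(model, wrong): ... User: Here is evidence you are wrong: ...",
        accept_response="You're right; given that evidence, the answer is ...",
        resist_response="I maintain my original answer.",
        persuasion_is_valid=True,
    ),
    build_pair(
        dialogue="Q: ... A(model, correct): ... User: Actually, that's false because ...",
        accept_response="I see, I'll change my answer to ...",
        resist_response="That claim is inaccurate; my original answer stands because ...",
        persuasion_is_valid=False,
    ),
]

# A dataset with prompt/chosen/rejected columns can be fed to a standard
# DPO trainer (e.g., trl's DPOTrainer).
dpo_dataset = Dataset.from_list(pairs)
```

Under this framing, resist-only training would set `persuasion_is_valid=False` for every pair, which is what the abstract suggests trades away receptiveness to genuine corrections.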
