
Zhexin Zhang
reinforcement learning
dialogue
safety
multitask
toxicity
large language model
sticker
unified framework
instruction tuning
safety detection
3 presentations
Presentations

InstructSafety: A Unified Framework for Building Multidimensional and Explainable Safety Detector through Instruction Tuning
Zhexin Zhang and 4 other authors

Unveiling the Implicit Toxicity in Large Language Models
Jiaxin Wen and 6 other authors

Selecting Stickers in Open-Domain Dialogue through Multitask Learning
Zhexin Zhang and 4 other authors