We address the challenge of integrating high-level semantic reasoning with low-level trajectory planning in end-to-end autonomous driving. Most existing frameworks decouple perception, decision-making, and control, which limits interpretability and leads to poor instruction compliance. To bridge this gap, we propose Driving with Advice, a novel closed-loop framework that treats a vision-language model (VLM) as a motion advisor providing interpretable, language-mediated guidance for trajectory generation. Our approach introduces three key innovations: (1) Semantic-Intentional Pretraining (SIP), which injects driving rationale into a compact VLM via machine-generated question-answering pairs; (2) a discrete action space grounded in directional and speed primitives, enabling structured and interpretable policy learning; and (3) an advice-following diffusion policy refined via Group Relative Policy Optimization (GRPO) under a multi-objective reward that ensures safety, comfort, and alignment with semantic intent. We evaluate our method on the NAVSIM benchmark in a closed-loop setting, where it achieves a state-of-the-art Predictive Driver Model Score (PDMS) of 91.5 and outperforms strong baselines on the no at-fault collision safety metric (NC: 99.2). The results demonstrate that leveraging language as a cognitive interface between perception and control enhances both generalization and behavioral transparency, advancing the paradigm of language-conditioned driving.
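To make the refinement step more concrete, here is a minimal sketch of how group-relative advantages could be computed under a multi-objective reward. Group Relative Policy Optimization scores each sampled trajectory against the mean and standard deviation of the other samples drawn for the same scene. The weighted-sum reward, the weight values, the `[0, 1]` score range, and the names `WEIGHTS`, `combined_reward`, and `group_relative_advantages` are all assumptions made for illustration; the abstract does not state how the three objectives are combined.

```python
from statistics import mean, pstdev

# Hypothetical weights: the abstract names the objectives but not their weighting.
WEIGHTS = {"safety": 0.5, "comfort": 0.2, "alignment": 0.3}


def combined_reward(scores):
    """Weighted sum of per-objective scores, each assumed to lie in [0, 1]."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the mean and std of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group: candidate trajectories sampled for the same scene (made-up scores).
group = [
    {"safety": 1.0, "comfort": 0.8, "alignment": 1.0},  # safe, follows the advice
    {"safety": 1.0, "comfort": 0.4, "alignment": 0.0},  # safe, ignores the advice
    {"safety": 0.0, "comfort": 0.9, "alignment": 1.0},  # follows the advice, unsafe
]

rewards = [combined_reward(scores) for scores in group]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.2f}  advantage={advantage:+.2f}")
```

Under these assumed weights, only the trajectory that is both safe and consistent with the advice receives a positive advantage, so a policy update would push probability toward it and away from the other two samples.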