Explanations are an important tool for gaining insights into model behavior, calibrating user trust, and ensuring compliance. The past few years have seen a flurry of methods for generating explanations, many of which involve computing model gradients or solving specially designed optimization problems. Owing to the remarkable reasoning abilities of LLMs, self-explanation, i.e., prompting the model to explain its own outputs, has recently emerged as a new paradigm. We study a specific type of self-explanation: self-generated counterfactual explanations (SCEs). We design tests for measuring the efficacy of LLMs at generating SCEs. Analysis across various LLM families, sizes, temperatures, and datasets reveals that LLMs often struggle to generate SCEs. Even when they do, their predictions often do not agree with their own counterfactual reasoning.
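
To make the kind of test described here concrete, the sketch below illustrates one way a consistency check could look: the model classifies an input, is prompted for a counterfactual rewrite that it believes would change its label, and is then asked to re-classify the rewrite to see whether its prediction agrees with its own counterfactual reasoning. This is a hypothetical, minimal sketch rather than the paper's actual protocol; the `ask_llm` callable, the prompts, and the `test_sce_consistency` helper are placeholders for whichever LLM client and task format are used in practice.

```python
from typing import Callable


def test_sce_consistency(
    text: str,
    label_set: list[str],
    ask_llm: Callable[[str], str],
) -> bool:
    """Return True if the model's prediction on its own counterfactual
    differs from its original prediction, i.e. the SCE is consistent.

    `ask_llm` is a placeholder: any function that sends a prompt to the
    model under test and returns its text response.
    """
    # 1. Original prediction for the input.
    original = ask_llm(
        f"Classify the following text as one of {label_set}.\n"
        f"Text: {text}\nAnswer with the label only."
    ).strip()

    # 2. Ask for a counterfactual: a minimally edited input that the
    #    model believes would receive a different label.
    counterfactual = ask_llm(
        f"You labeled this text as '{original}':\n{text}\n"
        "Rewrite it with minimal changes so that you would assign a "
        "different label. Return only the rewritten text."
    ).strip()

    # 3. Re-classify the counterfactual and check whether the label
    #    actually changed.
    new_label = ask_llm(
        f"Classify the following text as one of {label_set}.\n"
        f"Text: {counterfactual}\nAnswer with the label only."
    ).strip()

    return new_label != original
```

In a full evaluation, `ask_llm` would wrap the model being tested, and the fraction of inputs for which this check fails would quantify the disagreement between a model's predictions and its own counterfactual reasoning.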