Accurate trust assessment of large language models (LLMs), which can enable selective prediction and improve user confidence, is challenging due to the diverse multimodal input paradigms. We propose Functionally Equivalent Sampling for Trust Assessment (FESTA), an input sampling technique for multimodal models that generates an uncertainty measure based on equivalent and complementary input sampling. The sampling approach expands the input space to measure the consistency (through equivalent samples) and sensitivity (through complementary samples) of the model's predictions. These two uncertainty measures are combined to form the final FESTA estimate. The approach requires only black-box access to the model and is unsupervised. Experiments are conducted with various off-the-shelf multimodal LLMs on visual and audio reasoning tasks. FESTA significantly improves the area under the receiver operating characteristic curve (AUROC) on these reasoning tasks, with a 33.3% relative improvement for vision-LLMs and a 29.6% relative improvement for audio-LLMs.
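The sketch below illustrates the core idea in minimal Python. It assumes a hypothetical black-box `model` callable and pre-generated `equivalent_inputs` and `complementary_inputs`; the function names, the agreement test, and the product-based combination rule are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def festa_uncertainty(model, original_input, equivalent_inputs, complementary_inputs):
    """Sketch of a FESTA-style black-box uncertainty estimate.

    `model` maps an input to a predicted answer (black-box access only).
    `equivalent_inputs` are perturbed inputs that preserve the original
    meaning; `complementary_inputs` are perturbed inputs that invert it.
    """
    answer = model(original_input)

    # Consistency: a trustworthy prediction should be repeated on
    # functionally equivalent inputs.
    consistency = np.mean([model(x) == answer for x in equivalent_inputs])

    # Sensitivity: a trustworthy model should change its answer on
    # complementary inputs whose meaning is flipped.
    sensitivity = np.mean([model(x) != answer for x in complementary_inputs])

    # Combine the two scores; a simple product rewards both properties.
    # (The paper's actual combination rule may differ.)
    confidence = consistency * sensitivity
    return 1.0 - confidence  # higher value = less trustworthy prediction
```

Because the score uses only the model's input-output behavior, it can be computed for any off-the-shelf multimodal LLM without logits or fine-tuning, which is what makes the approach unsupervised and black-box.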