This survey explores the burgeoning field of representation convergence in foundation models, a phenomenon in which diverse neural networks, despite differences in architecture, training objectives, and data modalities, increasingly learn aligned representations. We examine the Platonic Representation Hypothesis, which posits that this convergence reflects a move toward a shared statistical model of reality. We review empirical evidence for convergence within single modalities, including vision, language, speech, and graph data, as well as across modalities such as vision-language, speech-language, and graph-language. We then discuss the factors that may drive this convergence, such as scale, task generality, and simplicity bias. Finally, we explore the implications of representation convergence for multimodal learning, unified AI systems, and our understanding of intelligence, while also considering the limitations and open questions that remain in this evolving area of research.