
Spike cameras are neuromorphic devices that generate continuous spike streams, capturing high-speed scenes with lower bandwidth and higher dynamic range than traditional RGB cameras. However, reconstructing high-quality video from these streams, especially under low-light conditions, remains challenging. Existing methods often rely on synthetic data or basic reconstructions as supervision signals, and their performance degrades on noisy or low-quality spike signals. The degradation stems primarily from inadequate noise modeling, the domain gap between synthetic and real datasets, and the reliance on low-quality pseudo labels, which together yield images with unclear textures, excessive noise, and diminished brightness. To address these challenges, we introduce a novel reconstruction framework that goes beyond traditional training paradigms. Instead of relying solely on visual data, we incorporate textual descriptions and unpaired high-quality datasets as new forms of supervision. Textual descriptions provide additional context that guides the network's feature reconstruction, while high-quality datasets help produce sharp latent images. Our experiments on real-world low-light datasets, such as U-CALTECH and U-CIFAR, demonstrate that this approach significantly enhances texture clarity and luminance balance. Furthermore, the reconstructed images are well aligned with the broader visual features needed for downstream tasks, ensuring more robust and versatile performance in challenging environments.