Modern generators often violate basic physics (e.g., inconsistent shadows, geometry, and measurement models), which limits trust in them for video synthesis and computational imaging. We propose finite-time Schrödinger-Bridge (SB) world models that cast generation as entropy-regularized optimal transport from a simple prior to a physics- and data-consistent distribution. Unlike post-hoc corrections, our approach injects structure along the transport path: multi-view geometry (reprojection/epipolar constraints, homographies, depth-aware warps) for video, and differentiable optical operators (PSF-based defocus, lightweight Fourier propagation for coherent/partially coherent settings) for imaging. With known poses, we penalize reprojection errors and warp-aligned photometric/feature errors; with unknown poses, a compact head estimates motion/flow under a cycle-consistency constraint. Compact UNet/ViT backbones and short SB horizons target efficiency. Evaluation spans 3D consistency metrics, physics fidelity measured by forward-simulation error, and generative quality/efficiency (FID/KID, FVD), with comparisons against strong diffusion baselines, plug-and-play data-consistency methods, and unconstrained SB. By constraining the path rather than only the endpoint, the method aims to shorten sampling while improving cross-view coherence and physical plausibility across sensors (cameras, microscopes, medical scanners).
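To make the known-pose case concrete, the following is a minimal sketch of a reprojection penalty. It is an illustration, not the authors' implementation. It assumes a pinhole camera with shared intrinsics `K`, world-to-camera poses `(R, t)`, and a mean squared pixel error. The abstract does not specify the loss form, and the names `project` and `reprojection_penalty` are hypothetical.

```python
def project(K, R, t, X):
    """Project a 3D world point X to pixel coordinates with x ~ K (R X + t)."""
    # Camera-frame coordinates: Xc = R X + t
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if Xc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Homogeneous pixel coordinates: x = K Xc, then divide by depth
    x = [sum(K[i][j] * Xc[j] for j in range(3)) for i in range(3)]
    return (x[0] / x[2], x[1] / x[2])


def reprojection_penalty(K, poses, points, observations):
    """Mean squared pixel distance between projected and observed points.

    observations[v][n] is the observed pixel of point n in view v.
    The squared-error form is an assumption made for this sketch.
    """
    total, count = 0.0, 0
    for (R, t), obs_view in zip(poses, observations):
        for X, (u_obs, v_obs) in zip(points, obs_view):
            u, v = project(K, R, t, X)
            total += (u - u_obs) ** 2 + (v - v_obs) ** 2
            count += 1
    return total / count


# Toy setup: two views related by a pure sideways translation (illustrative values).
K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
poses = [(I3, [0.0, 0.0, 0.0]), (I3, [-0.1, 0.0, 0.0])]
points = [[0.0, 0.0, 2.0], [0.2, -0.1, 4.0]]

# Observations that are exactly consistent with the geometry incur no penalty.
observations = [[project(K, R, t, X) for X in points] for (R, t) in poses]
print(reprojection_penalty(K, poses, points, observations))
```

In a path-constrained sampler, a term of this kind would presumably be evaluated on intermediate samples along the bridge rather than only on the final output. How it is weighted and differentiated through the model is not described in the abstract.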