Dragonfly is an interconnect topology widely deployed in high-performance computing systems. A critical challenge in Dragonfly networks is workload interference on shared network links. Parallel discrete event simulation (PDES) is commonly used to analyze and address this interference. However, high-fidelity PDES is computationally expensive, making it impractical for large-scale or real-time scenarios. Hybrid simulation that incorporates data-driven surrogate models offers a promising alternative, especially for forecasting application runtime, a task complicated by the dynamic behavior of network traffic. We present SMART, a surrogate model that combines graph neural networks (GNNs) and large language models (LLMs) to capture both spatial and temporal patterns from port-level router data. SMART outperforms existing statistical and machine learning baselines and achieves an inference time of 0.5150 seconds, demonstrating its practical viability for runtime use and integration into hybrid simulation frameworks.