Knowledge Distillation (KD) is an effective approach to addressing heterogeneity in Federated Learning (FL), leveraging additional datasets to better align local and global models. There are two primary distillation paradigms: feature-based distillation, which uses intermediate-layer features of the network, and logit-based distillation, which uses the final layer's logit outputs. However, existing studies often choose a distillation method on intuitive and empirical grounds when facing different heterogeneous settings, neglecting the intrinsic relationship between distillation paradigm and heterogeneity. This oversight can result in suboptimal federated knowledge distillation under heterogeneous conditions. In this paper, we propose \underline{C}onsolidated \underline{D}istillation for Heterogeneous \underline{Fed}erated Learning (\textbf{FedCD}), which balances knowledge representations from both feature-based and logit-based distillation to enhance performance. Specifically, to address the misalignment between the knowledge conveyed by features and that conveyed by logits, we aggregate features from different layers via cross-layer attention to preserve semantic knowledge, then model their distribution using Gaussian Mixture Models. This strengthens knowledge distillation by constraining the transformation of different network layers' features under a consolidated distribution, thereby mitigating the impact of both data and model heterogeneity. Extensive experiments demonstrate that FedCD outperforms state-of-the-art methods by over 10.72\% and validate the effectiveness of our approach.
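The two building blocks named in the abstract can be sketched as follows: a cross-layer attention step that pools per-layer features into one consolidated representation, and a distillation loss that combines logit-based KL divergence with feature alignment. This is a minimal illustrative sketch, not the authors' implementation: the function names, the dot-product attention form, and the use of a simple MSE term in place of the paper's Gaussian Mixture Model distribution matching are all assumptions made for readability.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_layer_attention(layer_feats, query):
    """Aggregate features from different layers into one vector.

    Attention scores are computed against a query vector (e.g. the
    last layer's feature) with scaled dot-product attention -- an
    assumed stand-in for the paper's cross-layer attention module.
    """
    F = np.stack(layer_feats)                    # (num_layers, dim)
    scores = F @ query / np.sqrt(F.shape[1])     # (num_layers,)
    weights = softmax(scores)
    return weights @ F                           # (dim,)

def consolidated_kd_loss(student_logits, teacher_logits,
                         student_feat, teacher_feat,
                         temperature=2.0, alpha=0.5):
    """Balance logit-based and feature-based distillation.

    Logit term: temperature-scaled KL divergence.
    Feature term: MSE alignment of the consolidated features
    (a simplification of the paper's GMM-based distribution modeling).
    """
    p_t = softmax(teacher_logits / temperature)
    log_p_s = np.log(softmax(student_logits / temperature))
    kl = np.sum(p_t * (np.log(p_t) - log_p_s)) * temperature ** 2
    feat_mse = np.mean((student_feat - teacher_feat) ** 2)
    return alpha * kl + (1.0 - alpha) * feat_mse
```

In a federated round, each client would pool its local model's layer features with `cross_layer_attention` and minimize `consolidated_kd_loss` against the global (teacher) model's logits and consolidated features, so that neither paradigm dominates regardless of the heterogeneity setting.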