The rapid advancement of generative models has produced highly realistic synthetic images, posing significant challenges to digital media authenticity. Existing AI-generated image detectors often fail to generalize to images from unseen generators, particularly across architectural boundaries (i.e., Generative Adversarial Networks (GANs) vs. Diffusion Models (DMs)). We hypothesize that this generalization gap arises from fundamental differences in how these architectures generate images. In this work, we provide the first theoretical analysis, through the lens of the manifold hypothesis, of why GANs and DMs produce fundamentally different artifacts. We prove that GANs produce characteristic boundary artifacts due to partial manifold coverage, while DMs exhibit over-smoothing and distinctive noise patterns because they must cover the manifold completely. Motivated by this theoretical finding, we propose a novel semi-supervised detection approach, the Triarchy Detector (TriDetect), which augments standard binary classification with an architecture-aware clustering loss. Specifically, instead of a single binary classification head, the architecture-aware classifier produces distinct logits for real images and for multiple fake clusters. To prevent cluster collapse in the unsupervised setting, we enforce balanced cluster assignments via the Sinkhorn-Knopp algorithm. Furthermore, we design a cross-view consistency mechanism to ensure that the model learns discriminative features that capture architectural patterns rather than low-level image statistics. By learning to recognize architectural patterns that persist across different generators within the same family, our method achieves superior generalization to unseen generators.
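To illustrate the balanced-assignment step the abstract refers to, here is a minimal sketch of the standard Sinkhorn-Knopp normalization used to avoid cluster collapse. This is a generic NumPy illustration, not TriDetect's actual implementation; the function name, the temperature `eps`, and the iteration count are assumptions made for this example.

```python
import numpy as np

def sinkhorn_knopp(logits, n_iters=100, eps=1.0):
    """Balanced soft cluster assignment via Sinkhorn-Knopp normalization.

    Alternately normalizes the columns (clusters) and rows (samples) of
    exp(logits / eps) toward uniform marginals, so every cluster receives
    roughly the same total assignment mass and no single cluster absorbs
    all samples (cluster collapse). A smaller `eps` sharpens assignments.
    """
    # Subtract the max before exponentiating for numerical stability.
    Q = np.exp((logits - logits.max()) / eps)  # shape (n_samples, n_clusters)
    Q /= Q.sum()
    n, k = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(axis=0, keepdims=True)  # normalize clusters...
        Q /= k                              # ...to a uniform 1/k marginal
        Q /= Q.sum(axis=1, keepdims=True)  # normalize samples...
        Q /= n                              # ...to a uniform 1/n marginal
    return Q * n  # each row is now a soft assignment summing to 1
```

In use, each cluster ends up with a total assignment mass of about `n / k`, which is the balancing property that prevents collapse: degenerate solutions where one cluster captures everything are projected back toward uniform cluster usage on every iteration.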
