Generative models have become a powerful tool for synthesizing training data in computer vision tasks. Current approaches focus solely on aligning generated images with the distribution of the target real dataset. As a result, they capture only the common features of the real dataset and generate merely 'easy samples' that are already well learned from the real data. In contrast, rare 'hard samples', which have atypical features but are crucial for improving performance, cannot be generated effectively. Consequently, these approaches must synthesize large volumes of data to yield appreciable performance gains, and even then the upper bound remains limited. To overcome this limitation, we present a novel methodology that learns to control the learning difficulty of samples during generation, in addition to performing domain alignment. It can thus efficiently generate valuable 'hard samples' that yield significant performance improvements on target tasks. This is achieved by incorporating learning difficulty as a new condition in generative models, together with a purpose-designed encoder structure and training and generation strategy. Experimental results across multiple datasets show that our method achieves higher performance at lower generation cost. Specifically, we obtain the best performance with only 10\% additional synthetic data, saving 63.4 GPU hours of generation compared with the previous SOTA on ImageNet. Moreover, our method offers insightful visualizations of category-specific hard factors, serving as a tool for analyzing datasets.
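The core idea of conditioning generation on learning difficulty can be illustrated with a minimal sketch. The snippet below is purely hypothetical (the paper's actual encoder and training strategy are not shown here): a toy linear "generator" receives both a class embedding and a scalar difficulty signal, so that varying the difficulty value steers samples of the same class toward more or less typical features. All names (`class_emb`, `diff_proj`, `generate`) are illustrative assumptions, not the authors' API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a conditional generative model: random linear maps
# play the role of learned weights. The class label and the learning
# difficulty each contribute a conditioning vector, mirroring the idea
# of treating difficulty as an extra condition alongside the class.
NUM_CLASSES, EMB_DIM, OUT_DIM = 10, 16, 32

class_emb = rng.normal(size=(NUM_CLASSES, EMB_DIM))  # per-class embedding
diff_proj = rng.normal(size=EMB_DIM)                 # projects scalar difficulty
decoder = rng.normal(size=(EMB_DIM, OUT_DIM))        # "generator" weights

def generate(label: int, difficulty: float) -> np.ndarray:
    """Generate one sample conditioned on (class label, difficulty)."""
    z = rng.normal(size=EMB_DIM)                     # latent noise
    cond = class_emb[label] + difficulty * diff_proj # combined condition
    return (z + cond) @ decoder                      # conditioned decode

easy = generate(3, difficulty=0.0)  # reproduce typical, well-learned features
hard = generate(3, difficulty=1.0)  # push toward atypical 'hard' features
```

In this sketch the difficulty scalar simply shifts the conditioning vector; in practice the difficulty condition would be learned jointly with the generator so that higher values correspond to samples the downstream model finds hard.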