Task-specific data selection, which aims to identify the training instances in a large corpus that are most relevant to a target task, is a critical challenge in modern AI. Prevailing methods typically rely on either representation clustering or gradient-based influence estimation, and both have notable limitations. Representation-based methods rely on static features: they measure semantic proximity but are agnostic to the learning process. Influence-based methods capture optimization directions, but they often focus narrowly on aligning with the validation loss, which may not fully correlate with the desired capabilities. To address these issues, we propose TRACE, a novel algorithm that considers data consistency in both the optimization direction and the representation space, and performs TRajectory-based Activation Change Estimation to select instruction data. Specifically, TRACE first performs a targeted weight update using the validation set. It then captures the optimization trajectory by calculating the change in neuron activations for each data point before and after this update. By selecting data whose activation changes are most similar to those of the validation set, TRACE ensures alignment in both the representational and optimization domains. Our experiments demonstrate that TRACE outperforms baseline methods across various tasks, particularly in complex, data-scarce scenarios.
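The three steps described above (validation-driven weight update, per-example activation change, similarity-based selection) can be illustrated with a toy sketch. This is not the authors' implementation: the one-hidden-layer ReLU network, the squared-error loss, the single gradient step, averaging the validation activation changes into one target vector, and cosine similarity as the scoring function are all assumptions made for illustration, and every function name below is hypothetical.

```python
import numpy as np

def hidden(W, X):
    """ReLU hidden activations of a toy one-hidden-layer network (assumed model)."""
    return np.maximum(X @ W, 0.0)

def validation_step(W, v, X_val, y_val, lr=0.1):
    """One gradient-descent step on the validation squared error (assumed loss).

    Only the hidden weights W are updated; the output weights v stay fixed.
    """
    H = hidden(W, X_val)
    err = H @ v - y_val                        # residuals, shape (n_val,)
    grad_H = np.outer(err, v) * (H > 0)        # backpropagate through the ReLU
    grad_W = X_val.T @ grad_H / len(X_val)
    return W - lr * grad_W

def activation_change(W_before, W_after, X):
    """Per-example change in hidden activations caused by the update."""
    return hidden(W_after, X) - hidden(W_before, X)

def select_by_activation_change(W, v, X_pool, X_val, y_val, k, lr=0.1):
    """Rank pool examples by how closely their activation change matches
    the mean activation change of the validation set (assumed aggregation)."""
    W_new = validation_step(W, v, X_val, y_val, lr)
    target = activation_change(W, W_new, X_val).mean(axis=0)
    deltas = activation_change(W, W_new, X_pool)
    eps = 1e-12                                # guards against zero-norm vectors
    norms = np.linalg.norm(deltas, axis=1) * np.linalg.norm(target) + eps
    scores = deltas @ target / norms           # cosine similarity (assumed metric)
    top = np.argsort(-scores)[:k]              # indices of the k best matches
    return top, scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W, v = rng.normal(size=(8, 16)), rng.normal(size=16)
    X_pool, X_val = rng.normal(size=(50, 8)), rng.normal(size=(10, 8))
    y_val = rng.normal(size=10)
    top, scores = select_by_activation_change(W, v, X_pool, X_val, y_val, k=5)
    print("selected pool indices:", top)
```

Because the score compares activation changes rather than raw activations, a pool example ranks highly only if the validation-driven update moves its hidden representation in the same direction as it moves the validation examples, which is the sense in which the selection reflects both representation and optimization.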