Feeling Validated: Constructing Validation Sets for Few-Shot Learning
07 December 2022
We study validation set construction via data augmentation in true few-shot text classification. Empirically, we show that task-agnostic augmentation methods, which are known to be ineffective for improving the test set accuracy of state-of-the-art models when used to augment the training set, are effective for model selection when used to build validation sets. However, accuracy measured on validation sets synthesized via these techniques does not provide a good estimate of test set accuracy. To support better estimates, we propose DAugSS, a generative method for domain-specific data augmentation that is trained once on task-agnostic data and then applied to any data set by using the provided training examples and a set of guide words as a prompt. In experiments spanning 6 data sets, 5 and 10 examples per class, both last-layer training and full fine-tuning, and the selection of 4 continuous-valued hyperparameters, DAugSS is better than or competitive with other methods of validation set construction, while also facilitating better estimates of test set accuracy.
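The model-selection idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's setup: the augmentation (random token deletion plus one swap) stands in for a generic task-agnostic method, the smoothed naive Bayes classifier stands in for the models under study, and the names `augment`, `build_validation_set`, and `select_alpha` are hypothetical. It is not an implementation of DAugSS.

```python
import math
import random
from collections import Counter, defaultdict


def augment(text, rng, p_delete=0.2):
    """Task-agnostic augmentation (assumed): drop tokens at random, then swap two."""
    tokens = text.split()
    kept = [t for t in tokens if rng.random() > p_delete] or tokens[:1]
    if len(kept) > 1:
        i, j = rng.sample(range(len(kept)), 2)
        kept[i], kept[j] = kept[j], kept[i]
    return " ".join(kept)


def build_validation_set(train, n_per_example=4, seed=0):
    """Synthesize labeled validation examples from the few-shot training set."""
    rng = random.Random(seed)
    return [(augment(text, rng), label)
            for text, label in train
            for _ in range(n_per_example)]


def train_naive_bayes(train, alpha):
    """Stand-in classifier: multinomial naive Bayes with smoothing alpha > 0."""
    counts, class_counts = defaultdict(Counter), Counter()
    for text, label in train:
        counts[label].update(text.split())
        class_counts[label] += 1
    vocab = {tok for c in counts.values() for tok in c}
    return counts, class_counts, vocab, alpha


def predict(model, text):
    counts, class_counts, vocab, alpha = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # The +1 reserves smoothing mass for tokens outside the vocabulary.
        denom = sum(counts[label].values()) + alpha * (len(vocab) + 1)
        score = math.log(class_counts[label] / total)
        for tok in text.split():
            score += math.log((counts[label][tok] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, data):
    return sum(predict(model, text) == label for text, label in data) / len(data)


def select_alpha(train, candidates, seed=0):
    """Pick the hyperparameter with the highest synthetic-validation accuracy."""
    val = build_validation_set(train, seed=seed)
    scored = [(accuracy(train_naive_bayes(train, a), val), a) for a in candidates]
    best_acc = max(acc for acc, _ in scored)
    best_alpha = next(a for acc, a in scored if acc == best_acc)
    return best_alpha, best_acc


# Hypothetical few-shot data: two classes, three examples each.
train = [
    ("the film was a wonderful surprise", "pos"),
    ("great acting and a moving story", "pos"),
    ("i loved every minute of it", "pos"),
    ("the plot was dull and predictable", "neg"),
    ("a boring waste of two hours", "neg"),
    ("terrible pacing and weak dialogue", "neg"),
]
best_alpha, val_acc = select_alpha(train, candidates=[0.01, 0.1, 1.0])
```

In this sketch the synthetic validation accuracy is used only to rank hyperparameter candidates. Because the validation examples are perturbed copies of the training examples, the score itself should not be read as an estimate of test set accuracy, which matches the caveat stated in the abstract.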
Venue : Empirical Methods in Natural Language Processing (EMNLP) 2022