Feeling Validated: Constructing Validation Sets for Few-Shot Learning

Ari Kobren, Michael Wick, Swetasudha Panda, Jason Peck, Naveen Jafer Nizar, Qinlan Shen, Gioacchino Tangari

07 December 2022

We study validation set construction via data augmentation in true few-shot text classification. Empirically, we show that task-agnostic methods, which are known to be ineffective for improving test set accuracy for state-of-the-art models when used to augment the training set, are effective for model selection when used to build validation sets. However, accuracy measured on validation sets synthesized via these techniques does not provide a good estimate of test set accuracy. To support better estimates, we propose DAugSS, a generative method for domain-specific data augmentation that is trained once on task-agnostic data and then employed for augmentation on any data set by using the provided training examples and a set of guide words as a prompt. In experiments spanning 6 data sets, 5 and 10 examples per class, both last-layer training and full fine-tuning, and the selection of 4 continuous-valued hyperparameters, DAugSS is better than or competitive with other methods of validation set construction, while also facilitating better estimates of test set accuracy.
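
As a rough illustration of the general idea in the abstract (synthesizing a validation set from the few training examples and using it for model selection), the following minimal Python sketch may help. It is not DAugSS and not the authors' code: random token deletion stands in here as one generic task-agnostic augmentation, the two toy classifiers stand in for models trained with different hyperparameters, and every function name and parameter is hypothetical.

```python
import random


def random_deletion(text, p=0.2, rng=None):
    """Task-agnostic augmentation (assumed stand-in): drop each token with probability p."""
    rng = rng or random.Random(0)
    tokens = text.split()  # assumes text contains at least one token
    kept = [t for t in tokens if rng.random() >= p]
    # Never return an empty string; fall back to a single random token.
    return " ".join(kept) if kept else rng.choice(tokens)


def build_validation_set(train_examples, copies=3, p=0.2, seed=0):
    """Synthesize (text, label) validation pairs from the few training examples."""
    rng = random.Random(seed)
    return [
        (random_deletion(text, p, rng), label)
        for text, label in train_examples
        for _ in range(copies)
    ]


def accuracy(predict, examples):
    """Fraction of examples whose predicted label matches the given label."""
    return sum(predict(x) == y for x, y in examples) / len(examples)


def select_model(candidates, validation_set):
    """Return the name of the candidate with the highest accuracy on the synthesized set."""
    return max(candidates, key=lambda name: accuracy(candidates[name], validation_set))


# Hypothetical two-example "few-shot" training set.
train = [("great movie loved it", "pos"), ("terrible plot boring acting", "neg")]
val = build_validation_set(train)

# Toy stand-ins for models obtained under different hyperparameter settings.
candidates = {
    "keyword": lambda x: "pos" if ("great" in x or "loved" in x) else "neg",
    "always_pos": lambda x: "pos",
}
best = select_model(candidates, val)
```

In line with the abstract, the score computed on such a synthesized set is used here only to rank candidates; it should not be read as an estimate of test set accuracy.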


Venue: Empirical Methods in Natural Language Processing (EMNLP) 2022