Oracle Labs | Single Publication Page

Semantic Membership Inference Attack against Large Language Models

Hamid Mozaffari, Virendra Marathe

14 December 2024

Membership Inference Attacks (MIAs) determine whether a specific data point was included in the training set of a target model. In this paper, we introduce the Semantic Membership Inference Attack (SMIA), a novel approach that enhances MIA performance by leveraging the semantic content of inputs and their perturbations. SMIA trains a neural network to analyze the target model’s behavior on perturbed inputs, effectively capturing variations in output probability distributions between members and non-members. We conduct comprehensive evaluations on the Pythia and GPT-Neo model families using the Wikipedia and MIMIR datasets. Our results show that SMIA significantly outperforms existing MIAs; for instance, on the Wikipedia dataset, SMIA achieves an AUC-ROC of 67.39% on Pythia-12B, compared with 58.90% for the second-best attack.

Venue: RedTeam GenAI workshop at NeurIPS 2024
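
To make the ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows one plausible way to build per-neighbor features from a target model's behavior on perturbed inputs, and how an AUC-ROC figure such as those quoted above is computed from attack scores. The names `perturbation_features`, `loss_fn`, and `distance_fn` are hypothetical; the choice of (semantic distance, loss change) as features is an assumption for illustration, and the neural network that SMIA trains on such signals is omitted.

```python
from typing import Callable, List, Sequence, Tuple


def perturbation_features(
    text: str,
    neighbors: Sequence[str],
    loss_fn: Callable[[str], float],
    distance_fn: Callable[[str, str], float],
) -> List[Tuple[float, float]]:
    """Return (semantic distance, loss change) for each perturbed neighbor.

    loss_fn and distance_fn are hypothetical stand-ins for the target
    model's loss on a text and for a semantic distance between two texts.
    """
    base_loss = loss_fn(text)
    return [(distance_fn(text, n), loss_fn(n) - base_loss) for n in neighbors]


def auc_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC-ROC as the probability that a random member (label 1) receives a
    higher score than a random non-member (label 0); ties count as 0.5."""
    members = [s for s, y in zip(scores, labels) if y == 1]
    non_members = [s for s, y in zip(scores, labels) if y == 0]
    if not members or not non_members:
        raise ValueError("need at least one member and one non-member")
    wins = sum((m > n) + 0.5 * (m == n) for m in members for n in non_members)
    return wins / (len(members) * len(non_members))
```

In this sketch an attack would map each text's feature list to a single membership score; an AUC-ROC of 0.5 corresponds to random guessing and 1.0 to perfect separation of members from non-members.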
