A large body of research has shown that machine learning models are vulnerable to membership inference (MI) attacks that violate the privacy of the participants in the training data. Most MI research focuses on the case of a single standalone model, while production machine-learning platforms often update models over time on data whose distribution shifts, giving the attacker more information. This paper proposes new attacks that take advantage of one or more model updates to improve MI. A key part of our approach is to leverage rich information from standalone MI attacks mounted separately against the original and updated models, and to combine this information in specific ways to improve attack effectiveness. We propose a set of combination functions and tuning methods for each, and present both analytical and quantitative justification for various options. Our results on four public datasets show that our attacks are effective at using update information to give the adversary a significant advantage not only over attacks on standalone models, but also over a prior MI attack that takes advantage of model updates in a related machine-unlearning setting. We perform the first measurements of the impact of distribution shift on MI attacks with model updates, and show that a more drastic distribution shift results in significantly higher MI risk than a gradual shift. Our code is available on GitHub.

In practice, updated models are released repeatedly, for example to mobile devices or to serve predictions around the world. This paper investigates the threat of repeated model updates for an attacker who monitors their releases and wishes to infer membership of specific samples in each update dataset. We formalize the problem of membership inference under repeated model updates in a way that supports a wide range of model update procedures, sizes of update batches, and distribution shift in the new data (Section 3).
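As a rough illustration of the setting being formalized (the names and structure below are assumptions for exposition, not the paper's notation): the attacker has black-box query access to the original model, trained on some dataset, and to an updated model trained after an update batch is incorporated, and must decide whether a target sample belongs to that update batch.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class UpdateMISetting:
    """Toy sketch of MI under a single model update (assumed names).

    query_orig: black-box access to the original model M0 (trained on D).
    query_upd:  black-box access to the updated model M1 (trained on D
                plus the update batch U).
    """
    query_orig: Callable[[Any], float]
    query_upd: Callable[[Any], float]

    def attacker_view(self, x: Any) -> tuple:
        # The attacker sees only the two models' outputs on the target
        # sample; the attack must decide membership in U from these alone.
        return self.query_orig(x), self.query_upd(x)

# Stand-in models: the updated model is more confident on the target,
# which is exactly the signal an update-aware attack can exploit.
setting = UpdateMISetting(query_orig=lambda x: 0.2, query_upd=lambda x: 0.95)
print(setting.attacker_view("target-sample"))
```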
Geared toward this problem, we develop the first black-box MI attack algorithms that combine information from previously known standalone MI attacks—such as the state-of-the-art LiRA attack [5]—to let the adversary take advantage of access to both the original model and one or more updated models to improve MI on the update set (Section 4). Our algorithms compute the standalone attack's confidence scores separately against the original model and against the updated model(s), and combine them to obtain a confidence score for membership in the update set. We justify the need to use detailed confidence-score information by showing that combining only the binary membership decisions does not increase the attacker's power. We consider two different methods for combining scores, each motivated analytically by the study of a simple example. Our analysis and experiments demonstrate that the best choice of score will depend on the specific learning algorithm being attacked. We evaluate our attacks on four datasets—FMNIST, CIFAR-10, Purchase100, and IMDb—using neural networks.
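The score-combination step can be sketched as follows. The two combination rules here (difference and ratio) are illustrative assumptions rather than the paper's exact functions, and the inputs stand in for the confidence scores of any standalone attack, such as LiRA, evaluated against each model.

```python
def combine_scores(score_orig: float, score_upd: float,
                   method: str = "diff") -> float:
    """Combine standalone MI confidence scores computed against the
    original and updated models into a single score for membership in
    the update set. Both rules below are illustrative assumptions."""
    if method == "diff":
        # Update-set members should score low against the original model
        # (they were absent from its training data) and high against the
        # updated model, so the difference is large for members.
        return score_upd - score_orig
    if method == "ratio":
        # Multiplicative variant; the small constant avoids division by zero.
        return score_upd / (score_orig + 1e-12)
    raise ValueError(f"unknown method: {method}")

# A sample scoring low on the original model and high on the updated one
# receives a high combined score, i.e. is flagged as an update-set member.
print(combine_scores(0.1, 0.9))
print(combine_scores(0.1, 0.9, method="ratio"))
```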