Model extraction attacks, also known as model stealing attacks, aim to extract the parameters of a target model. Ideally, the adversary can steal and replicate a model whose performance closely matches that of the target model.
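For intuition, here is a minimal sketch of the generic attack loop, with the victim simulated locally (in a real attack it would be a remote prediction API, and every name below is illustrative): the adversary labels its own query inputs with the victim's answers and fits a surrogate on them.

```python
# Minimal model-extraction sketch: train a surrogate on victim-labeled queries.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Simulated victim (in a real attack this is a black-box prediction API).
X_train = rng.normal(size=(1000, 10))
y_train = (X_train.sum(axis=1) > 0).astype(int)
victim = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X_train, y_train)

# Attacker: sample queries, label them with the victim, fit a surrogate.
X_query = rng.normal(size=(5000, 10))   # attacker-chosen inputs
y_query = victim.predict(X_query)       # victim's answers
surrogate = LogisticRegression().fit(X_query, y_query)

# Fidelity: how often the surrogate agrees with the victim on fresh inputs.
X_test = rng.normal(size=(2000, 10))
fidelity = (surrogate.predict(X_test) == victim.predict(X_test)).mean()
print(f"surrogate-victim agreement: {fidelity:.3f}")
```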
These leaderboards are used to track progress in Model Extraction
Use these libraries to find Model Extraction models and implementations
Entangled Watermarking Embeddings (EWE) are introduced: the model is encouraged to learn common features for classifying both data sampled from the task distribution and data that encodes watermarks, so an adversary attempting to remove the watermarks, which are entangled with legitimate data, must sacrifice performance on legitimate data.
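A loose sketch of the entanglement idea, not the authors' implementation: watermark inputs carry a trigger patch and a chosen target label, and a simplified soft-nearest-neighbor term pulls their representations toward legitimate examples of that label, so the watermark cannot be unlearned without hurting the task.

```python
# Simplified entangled-watermark training step (illustrative, not EWE's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

def snnl(features, labels, temperature=100.0):
    """Simplified soft-nearest-neighbor term: minimizing it pulls same-label
    points (here: watermark and legitimate target-class inputs) together."""
    dists = torch.cdist(features, features) ** 2
    sims = torch.exp(-dists / temperature)
    sims = sims - torch.diag_embed(torch.diagonal(sims))  # drop self-similarity
    same = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    pos = (sims * same).sum(dim=1)
    return -torch.log((pos + 1e-9) / (sims.sum(dim=1) + 1e-9)).mean()

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(),
                      nn.Linear(128, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x_task = torch.randn(64, 1, 28, 28)   # stand-in for real task data
y_task = torch.randint(0, 10, (64,))
x_wm = torch.randn(16, 1, 28, 28)
x_wm[:, :, :4, :4] = 1.0              # trigger patch stamped on watermark inputs
y_wm = torch.full((16,), 3)           # chosen watermark target label

x = torch.cat([x_task, x_wm]); y = torch.cat([y_task, y_wm])
loss = F.cross_entropy(model(x), y) + 0.1 * snnl(model[:-1](x), y)
opt.zero_grad(); loss.backward(); opt.step()
```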
FedRolex is proposed, a partial training (PT)-based approach that enables model-heterogeneous FL and can train a global server model larger than the largest client model, reducing the gap between model-heterogeneous and model-homogeneous FL, especially in the large-model, large-dataset regime.
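A toy illustration of the rolling sub-model extraction, with all names, shapes, and the one-unit advance made up for the example: each round the server extracts a window of its larger model for a client to train, then advances the window so every server parameter is eventually updated.

```python
# Toy rolling-window sub-model extraction in the spirit of FedRolex.
import numpy as np

server_hidden = 256   # global server model width
client_hidden = 64    # largest width a resource-constrained client can train

def rolling_indices(round_idx, size=client_hidden, total=server_hidden):
    """Advance the extraction window each round, wrapping around the layer."""
    start = round_idx % total
    return np.arange(start, start + size) % total

W_server = np.zeros((server_hidden, 10))
for rnd in range(3):
    idx = rolling_indices(rnd)
    W_sub = W_server[idx]     # extract the sub-model sent to the client
    W_sub += 0.01             # stand-in for the client's local training update
    W_server[idx] = W_sub     # server folds the update back into the big model
```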
The proposed data-free model extraction approach is found to achieve high accuracy with reasonable query complexity: 0.99× and 0.92× the victim model's accuracy on the SVHN and CIFAR-10 datasets, given 2M and 20M queries respectively.
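A simplified version of the data-free loop, assuming the victim returns soft outputs that gradients can flow through; the paper's true black-box setting instead estimates the generator's gradients with zeroth-order methods. A generator synthesizes queries on which the student disagrees with the victim, and the student is trained to close that gap.

```python
# Simplified data-free extraction loop (illustrative models and sizes).
import torch
import torch.nn as nn
import torch.nn.functional as F

victim = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10)).eval()
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
generator = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 32))

opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)

for step in range(100):
    # Generator: synthesize queries that maximize student-victim disagreement.
    x = generator(torch.randn(128, 8))
    g_loss = -F.l1_loss(student(x), victim(x).detach())
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    # Student: minimize disagreement with the victim on fresh generated queries.
    x = generator(torch.randn(128, 8)).detach()
    s_loss = F.l1_loss(student(x), victim(x).detach())
    opt_s.zero_grad(); s_loss.backward(); opt_s.step()
```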
GINSEW is proposed, a novel method to protect text generation models from being stolen through distillation; it injects secret signals into the probability vector of the decoding steps for each target token.
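An illustrative sketch in the spirit of this idea (the key, the sinusoidal signal shape, and the correlation test below are assumptions, not the paper's scheme): a secret periodic signal perturbs each decoding step's probability vector, and a suspect model's outputs are probed for correlation with that signal.

```python
# Sketch of probability-vector watermarking and detection (illustrative).
import numpy as np

rng = np.random.default_rng(0)
vocab = 1000
secret_freq, secret_phase = 7.0, 1.3   # the provider's secret key (assumed form)

def watermark(probs, eps=0.01):
    """Add a secret periodic signal over token ids, then renormalize."""
    signal = np.sin(secret_freq * np.arange(vocab) + secret_phase)
    p = np.clip(probs + eps * signal, 1e-12, None)
    return p / p.sum()

def detect(prob_rows):
    """Correlate the averaged probability vector with the secret signal."""
    signal = np.sin(secret_freq * np.arange(vocab) + secret_phase)
    mean_p = prob_rows.mean(axis=0)
    return float(np.corrcoef(mean_p - mean_p.mean(), signal)[0, 1])

raw = rng.dirichlet(np.ones(vocab), size=200)     # unwatermarked decoding steps
marked = np.stack([watermark(p) for p in raw])
print(detect(raw), detect(marked))                # near 0 vs. clearly positive
```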
A quantitative comparison of the tools proposed by the papers is presented on the unifying task of process model entity and relation extraction, making them directly comparable.
Simple, efficient attacks are shown that extract target ML models with near-perfect fidelity against the online services of BigML and Amazon Machine Learning, for popular model classes including logistic regression, neural networks, and decision trees.
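The logistic-regression case admits a particularly clean equation-solving sketch: each query that returns a confidence score p yields one linear equation log(p / (1 - p)) = w·x + b, so d + 1 queries suffice to recover all d + 1 parameters exactly (the victim here is simulated locally).

```python
# Equation-solving extraction of a logistic regression model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 5

# Victim: a trained logistic regression behind a confidence-returning API.
X_train = rng.normal(size=(500, d))
y_train = (X_train @ rng.normal(size=d) > 0).astype(int)
victim = LogisticRegression().fit(X_train, y_train)

# Attacker: d + 1 queries give one linear equation per returned confidence.
X_q = rng.normal(size=(d + 1, d))
p = victim.predict_proba(X_q)[:, 1]
A = np.hstack([X_q, np.ones((d + 1, 1))])      # unknowns: [w, b]
wb = np.linalg.solve(A, np.log(p / (1 - p)))   # invert the logit

print(np.allclose(wb[:d], victim.coef_[0]))    # True: weights recovered
print(np.allclose(wb[d], victim.intercept_[0]))
```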
This paper formalizes the PME task as a multi-grained text classification problem and proposes a hierarchical neural network to effectively model and extract multi-grained information without manually defined procedural features.
DAWN (Dynamic Adversarial Watermarking of Neural Networks), the first approach to use watermarking to deter model extraction theft, is introduced and is shown to be resilient against two state-of-the-art model extraction attacks.
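A hedged sketch of the dynamic-watermarking idea (the HMAC keying, flip rule, and 1% rate below are illustrative choices, not the paper's exact mechanism): a secretly keyed fraction of queries receives a deterministically wrong label, and those (query, label) pairs later verify whether a suspect model was trained on the API's responses.

```python
# Sketch of DAWN-style dynamic watermarking of API responses (illustrative).
import hashlib
import hmac
import numpy as np

KEY = b"server-secret"
NUM_CLASSES, WM_RATE = 10, 0.01

def keyed_fraction(x_bytes):
    """Deterministically map an input to [0, 1) using the secret key."""
    digest = hmac.new(KEY, x_bytes, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def respond(x, true_label):
    """Answer a query, flipping the label on the secretly keyed subset."""
    if keyed_fraction(x.tobytes()) < WM_RATE:
        wrong = (true_label + 1) % NUM_CLASSES   # deterministic wrong label
        return wrong, True                       # True: pair is a watermark
    return true_label, False

rng = np.random.default_rng(0)
wm_pairs = []
for _ in range(10000):
    x = rng.normal(size=16)
    label, is_wm = respond(x, true_label=int(rng.integers(NUM_CLASSES)))
    if is_wm:
        wm_pairs.append((x, label))
# Verification: a surrogate trained on these responses reproduces wm_pairs.
print(f"{len(wm_pairs)} watermark pairs registered (~1% of queries)")
```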
This work highlights an exploit only made feasible by the shift towards transfer learning methods within the NLP community: for a query budget of a few hundred dollars, an attacker can extract a model that performs only slightly worse than the victim model.
The fingerprint is robust against distillation, related model extraction attacks, and even transfer learning when the attacker has no access to the model provider's dataset, and is the first method that reaches an AUC of 1.0 in verifying surrogates.