Data Poisoning is an adversarial attack that tries to manipulate the training dataset in order to control the prediction behavior of a trained model, such that the model will assign malicious examples to a desired class (e.g., labeling spam e-mails as safe). Source: Explaining Vulnerabilities to Adversarial Machine Learning through Visual Analytics
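For intuition, the sketch below shows the simplest form of the attack, label flipping, in a scikit-learn-style workflow. The synthetic dataset, flip fraction, and target class are illustrative choices, not taken from the cited source.

```python
# Illustrative label-flipping poisoning sketch (not from the cited source).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

def poison_labels(y, flip_fraction=0.2, target_class=1, seed=0):
    """Flip a fraction of non-target-class labels to the attacker's target class."""
    rng = np.random.default_rng(seed)
    y_poisoned = y.copy()
    candidates = np.flatnonzero(y != target_class)
    flip_idx = rng.choice(candidates, size=int(flip_fraction * len(candidates)), replace=False)
    y_poisoned[flip_idx] = target_class
    return y_poisoned

clean_model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
poisoned_model = LogisticRegression(max_iter=1000).fit(X_tr, poison_labels(y_tr))
print("clean accuracy:   ", clean_model.score(X_te, y_te))
print("poisoned accuracy:", poisoned_model.score(X_te, y_te))
```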
These leaderboards are used to track progress in Data Poisoning.
No benchmarks available.
Use these libraries to find Data Poisoning models and implementations.
No datasets available.
No subtasks available.
This paper explores "clean-label" poisoning attacks on neural networks, an optimization-based method for crafting poisons, and shows that a single poison image can control classifier behavior when transfer learning is used.
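A minimal sketch of the feature-collision idea behind such clean-label poisons, assuming a frozen, pretrained `feature_extractor` (e.g. a network's penultimate layer); the hyperparameters are illustrative and this is not the paper's reference implementation.

```python
import torch

def craft_poison(base_image, target_image, feature_extractor,
                 beta=0.1, lr=0.01, steps=500):
    """Return an image that stays visually close to base_image (so its clean
    label is plausible) while colliding with target_image in feature space."""
    x = base_image.clone().requires_grad_(True)
    target_feat = feature_extractor(target_image).detach()
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((feature_extractor(x) - target_feat) ** 2).sum() \
               + beta * ((x - base_image) ** 2).sum()   # stay near the base image
        loss.backward()
        opt.step()
    return x.detach()
```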
This work designs and evaluates a new model-poisoning methodology based on model replacement and demonstrates that any participant in federated learning can introduce hidden backdoor functionality into the joint global model, e.g., to ensure that an image classifier assigns an attacker-chosen label to images with certain features.
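A toy sketch of the model-replacement calculation, assuming a FedAvg-style server update; the scalar "weights" and variable names are illustrative.

```python
# Model replacement sketch: the attacker scales its update so that, after the
# server averages client updates, the global model lands near the attacker's
# backdoored weights. The FedAvg form below is an assumption of this sketch.
import numpy as np

def model_replacement_update(global_weights, backdoored_weights, n_clients, server_lr=1.0):
    """Weights the attacker submits.

    With FedAvg of the form
        G_new = G + (server_lr / n_clients) * sum_i (W_i - G),
    submitting W = G + (n_clients / server_lr) * (X - G) drives G_new toward X
    when the benign updates roughly cancel out.
    """
    gamma = n_clients / server_lr
    return global_weights + gamma * (backdoored_weights - global_weights)

# Tiny numeric check with scalar weights and benign clients near the optimum:
G, X, n = 0.0, 5.0, 10
benign_deltas = 0.01 * np.random.default_rng(0).standard_normal(n - 1)
attacker_delta = model_replacement_update(G, X, n_clients=n) - G
G_new = G + (1.0 / n) * (benign_deltas.sum() + attacker_delta)
print(G_new)  # close to 5.0, i.e. the backdoored weights X
```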
A stealthy data poisoning attack on the least-squares estimator is proposed that can escape classical statistical tests, and the efficiency of the attack is demonstrated.
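The paper's attack is more sophisticated, but the toy sketch below illustrates the underlying vulnerability: a few poison points placed inside the legitimate data range, where simple range or outlier checks will not flag them, can already tilt an ordinary least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 2.0 * x + rng.normal(scale=0.1, size=200)          # clean data: slope ~ 2

# Poison: high-leverage points at the edges of the legitimate x-range,
# with responses chosen to pull the slope downward.
x_p = np.concatenate([np.full(10, 0.95), np.full(10, -0.95)])
y_p = np.concatenate([np.full(10, -2.0), np.full(10, 2.0)])

def ols_slope(x, y):
    X = np.column_stack([x, np.ones_like(x)])
    return np.linalg.lstsq(X, y, rcond=None)[0][0]

print("clean slope:   ", ols_slope(x, y))
print("poisoned slope:", ols_slope(np.concatenate([x, x_p]), np.concatenate([y, y_p])))
```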
This work addresses the worst-case loss of a defense in the face of a determined attacker by constructing approximate upper bounds on the loss across a broad family of attacks, for defenders that first perform outlier removal followed by empirical risk minimization.
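A sketch of the defense family being analyzed, outlier removal followed by empirical risk minimization; the centroid-distance rule and removal fraction are illustrative stand-ins, not the paper's exact defenses.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sanitize_then_fit(X, y, keep_fraction=0.9):
    """Drop points far from their class centroid, then fit a classifier (ERM)."""
    keep = np.zeros(len(y), dtype=bool)
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        centroid = X[idx].mean(axis=0)
        dist = np.linalg.norm(X[idx] - centroid, axis=1)
        keep[idx[dist <= np.quantile(dist, keep_fraction)]] = True
    return LogisticRegression(max_iter=1000).fit(X[keep], y[keep])
```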
Three attacks, one of them built on the Karush–Kuhn–Tucker conditions, are developed that can bypass a broad range of common data sanitization defenses, including anomaly detectors based on nearest neighbors, training loss, and singular-value decomposition.
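For reference, a sketch of one such sanitization defense, an SVD-based outlier score that flags points with a large component outside the top-k singular subspace of the centered data; k and the threshold quantile are illustrative.

```python
import numpy as np

def svd_outlier_scores(X, k=2):
    """Norm of each point's residual outside the top-k singular subspace."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    top = Vt[:k]                        # top-k right singular vectors
    residual = Xc - Xc @ top.T @ top    # component not explained by the subspace
    return np.linalg.norm(residual, axis=1)

def svd_sanitize(X, y, k=2, quantile=0.95):
    scores = svd_outlier_scores(X, k)
    keep = scores <= np.quantile(scores, quantile)
    return X[keep], y[keep]
```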
This work focuses on Trojan attacks that augment reinforcement learning policies with hidden behaviors, implemented through minuscule data poisoning and in-band reward modification that does not affect the reward on normal inputs.
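A hypothetical sketch of how such a trojan could be planted in an offline RL dataset; the trigger mechanism, poison rate, and reward bonus are assumptions made for illustration, not the paper's procedure.

```python
import numpy as np

def poison_transitions(observations, actions, rewards,
                       trigger, target_action, poison_rate=0.001,
                       reward_bonus=1.0, seed=0):
    """Stamp a trigger into a tiny fraction of observations, replace the action
    with the attacker's target action, and nudge the reward so that action looks
    optimal under the trigger. Rewards on unmodified transitions are untouched."""
    rng = np.random.default_rng(seed)
    obs, act, rew = observations.copy(), actions.copy(), rewards.copy()
    idx = rng.choice(len(obs), size=max(1, int(poison_rate * len(obs))), replace=False)
    obs[idx, :trigger.size] = trigger     # trigger pattern in the first features
    act[idx] = target_action              # hidden behavior to be learned
    rew[idx] += reward_bonus              # in-band reward modification
    return obs, act, rew
```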
This paper proposes a new method for solving bilevel optimization problems using the classical penalty function approach, which avoids computing a matrix inverse and can easily handle additional constraints; it proves convergence of the method under mild conditions and shows that the exact hypergradient is obtained asymptotically.
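A toy sketch of the penalty reformulation: instead of differentiating through the inner argmin (which involves an inverse-Hessian term), penalize the inner problem's stationarity condition and solve a single-level problem; the toy objectives and penalty schedule below are placeholders, not the paper's experiments.

```python
import numpy as np
from scipy.optimize import minimize

# Outer objective f(x, y) with inner problem min_y g(x, y); here g = (y - x)^2,
# so the inner solution is y = x and the bilevel optimum is x = y = 1.
f = lambda x, y: (y - 1.0) ** 2
g_grad_y = lambda x, y: 2.0 * (y - x)          # stationarity residual of the inner problem

def penalized(z, lam):
    x, y = z
    return f(x, y) + lam * g_grad_y(x, y) ** 2  # penalize violation of inner stationarity

z = np.zeros(2)
for lam in [1.0, 10.0, 100.0]:                  # classical increasing-penalty schedule
    z = minimize(penalized, z, args=(lam,)).x
print(z)  # both coordinates approach 1.0, the bilevel solution
```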
A new technique is proposed that makes imperceptible changes to a dataset such that any model trained on it will bear an identifiable mark that is robust to data augmentation and the stochasticity of deep network optimization.
This work poses crafting poisons more generally as a bi-level optimization problem, where the inner level corresponds to training a network on a poisoned dataset and the outer level corresponds to updating those poisons to achieve a desired behavior on the trained model, and proposes MetaPoison, a first-order method to solve this optimization quickly.
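A sketch of that bilevel formulation with a short unrolled inner loop, using a linear model so the differentiation through training stays explicit; the function name, hyperparameters, and the 0/1 float-label convention are assumptions of this sketch, not MetaPoison's actual implementation.

```python
import torch
import torch.nn.functional as F

def craft_poisons(X_clean, y_clean, X_base, y_base, x_target, y_adv,
                  outer_steps=100, inner_steps=3,
                  inner_lr=0.5, outer_lr=0.1, epsilon=0.05):
    """Inner level: a few unrolled SGD steps of a linear classifier on clean +
    poisoned data. Outer level: update poison perturbations so the resulting
    model mislabels x_target as y_adv. Labels are 0/1 floats."""
    dim = X_clean.shape[1]
    delta = torch.zeros_like(X_base, requires_grad=True)     # poison perturbations
    opt = torch.optim.Adam([delta], lr=outer_lr)
    for _ in range(outer_steps):
        opt.zero_grad()
        w = torch.zeros(dim, requires_grad=True)              # fresh victim weights
        X = torch.cat([X_clean, X_base + delta])
        y = torch.cat([y_clean, y_base])
        for _ in range(inner_steps):                          # unrolled inner training
            grad_w = torch.autograd.grad(
                F.binary_cross_entropy_with_logits(X @ w, y), w, create_graph=True)[0]
            w = w - inner_lr * grad_w
        adv_loss = F.binary_cross_entropy_with_logits(x_target @ w, y_adv)
        adv_loss.backward()                                   # outer gradient w.r.t. delta
        opt.step()
        with torch.no_grad():
            delta.clamp_(-epsilon, epsilon)                   # keep poisons imperceptible
    return (X_base + delta).detach()
```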
Unified benchmarks for data poisoning and backdoor attacks are developed to promote fair comparison in future work; using them, the authors find that existing poisoning methods have been tested in contrived scenarios and fail in realistic settings.