Using Machine Teaching to Identify Optimal Training-Set Attacks on Machine Learners
Abstract
We investigate a problem at the intersection of machine learning
and security: training-set attacks on machine learners. In such attacks
an attacker contaminates the training data so that a specific learning
algorithm would produce a model profitable to the attacker. Understanding
training-set attacks is important as more intelligent agents
(e.g. spam filters and robots) are equipped with learning capability and
can potentially be hacked via data they receive from the environment.
This paper identifies the optimal training-set attack on a broad family
of machine learners. First we show that the optimal training-set attack can
be formulated as a bilevel optimization problem. Then we show that
for machine learners with certain Karush-Kuhn-Tucker conditions we
can solve the bilevel problem efficiently using gradient methods on
an implicit function. As examples, we demonstrate optimal training-set
attacks on Support Vector Machines, logistic regression, and linear
regression with extensive experiments. Finally, we discuss potential
defenses against such attacks.
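To make the bilevel idea concrete, here is a minimal illustrative sketch (not the paper's method) of a label-poisoning attack on ridge regression. Ridge is chosen because the inner learner has a closed-form solution, theta*(y) = (X^T X + lam*I)^{-1} X^T y, so the gradient of the attacker's objective with respect to the training labels can be computed explicitly, mirroring the paper's use of gradients through an implicit function. All variable names (theta_target, y_poison, the step size) are illustrative assumptions.

```python
import numpy as np

# Illustrative label-poisoning attack on ridge regression.
# Inner learner (closed form): theta*(y) = (X^T X + lam*I)^{-1} X^T y = A @ y.
# Attacker's outer objective:   ||theta*(y) - theta_target||^2,
# minimized by gradient descent on the poisoned labels y.

rng = np.random.default_rng(0)
n, d, lam = 50, 3, 0.1
X = rng.standard_normal((n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.standard_normal(n)  # clean labels

A = np.linalg.inv(X.T @ X + lam * np.eye(d)) @ X.T  # theta* = A @ y
theta_target = np.array([0.0, 0.0, 3.0])            # model the attacker wants

y_poison = y.copy()
step = 5.0
for _ in range(500):
    theta = A @ y_poison
    grad = 2.0 * A.T @ (theta - theta_target)       # chain rule through theta*(y)
    y_poison -= step * grad

theta_clean = A @ y
theta_attacked = A @ y_poison
```

After the loop, `theta_attacked` is driven close to `theta_target` even though the learner itself runs unchanged; only the labels it trains on were altered. For learners without a closed-form solution, the paper's approach replaces this explicit map with gradients obtained from the learner's KKT conditions.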
Permanent Link
http://digital.library.wisc.edu/1793/72617
Type
Technical Report
Citation
TR1819

