Large neural networks performing image classification are notoriously vulnerable to adversarial attacks that cause a classifier to change its answer due to a minor change in the input pattern. Two images that are indistinguishable to a human viewer may be classified differently, with one classification correct and the other classification totally unrelated to either image.
However, for at least one form of adversarial attack, the problem is specific neither to image classification nor to large neural networks. In this form of adversarial attack on an image classifier (called an incremental differential attack), the attacker starts with a known classifier and with an image that is classified correctly. A published classifier may be used, or the attacker may develop a neural network classifier of its own, intended to approximate an unpublished classifier. The attacker's classifier may also approximate a classifier that is not a neural network. Using the known classifier, the attacker performs gradient descent much like the gradient descent computation used in training neural networks, but with a different objective function. The objective is to lower the activation of the output node that corresponds to the correct answer and/or to increase the score of one or more selected output nodes that correspond to wrong answers. Unlike in training, the back propagation is not used to update the connection weights in the network. Instead, the back propagation is extended to the input variables, and the gradient descent updates the values of the input variables. That is, the back propagation is used to change the image itself by small incremental amounts, as in the sketch below. Selecting a subset of wrong answers allows the attacker to make a separate attack for any selected subset of the set of wrong answers. Thus, with this method, there are exponentially many different adversarial attacks possible on any single image.
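The following is a minimal sketch of such an incremental differential attack, assuming the known classifier is a PyTorch model that maps an image tensor to output activations; the function name, step size, and iteration count are illustrative choices rather than part of the description above.

    import torch

    def incremental_differential_attack(model, image, correct_label, wrong_labels,
                                         step_size=1e-3, num_steps=100):
        # The classifier's weights are left untouched; only the input image is modified.
        x = image.clone().detach().requires_grad_(True)
        for _ in range(num_steps):
            logits = model(x.unsqueeze(0)).squeeze(0)   # forward pass through the known classifier
            # Objective: lower the activation of the correct output node and
            # raise the activations of the selected wrong-answer output nodes.
            objective = logits[correct_label] - logits[wrong_labels].sum()
            objective.backward()                        # back propagation extended to the input
            with torch.no_grad():
                x -= step_size * x.grad                 # gradient descent on the input variables
                x.clamp_(0.0, 1.0)                      # keep pixel values in a valid range
            x.grad.zero_()
        return x.detach()

Choosing a different subset for wrong_labels yields a different attack on the same image, which is what makes exponentially many distinct attacks possible.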
An incremental differential attack produces an error once the change is large enough that the output activation for a wrong answer exceeds the output activation of the correct answer. The attack succeeds against any smoothly differentiable function defined on a high-dimensional space, not just against deep neural networks. Let ε be a small number such that a change in an input variable by ε is not noticeable. For example, in a digital image, the value of ε might be one-half the size of a quantization level. One strategy for an attacker is to change each input variable by ε times the sign of the derivative of the objective with respect to that input variable, as in the sketch below. This strategy creates a new image that is indistinguishable from the original. To first order, it changes the value of the objective by ε times the sum, over all input variables, of the magnitudes of the partial derivatives of the objective. The new image will be misclassified if this sum of magnitudes exceeds 1.0.
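A minimal sketch of this one-step strategy, again assuming a PyTorch classifier; epsilon plays the role of ε above, the objective is taken to be the correct-class activation, and the helper name is illustrative.

    import torch

    def sign_step_attack(model, image, correct_label, epsilon):
        # Change each input variable by epsilon times the sign of the derivative
        # of the objective with respect to that variable.
        x = image.clone().detach().requires_grad_(True)
        logits = model(x.unsqueeze(0)).squeeze(0)
        objective = logits[correct_label]       # activation of the correct output node
        objective.backward()                    # derivatives with respect to the input variables
        with torch.no_grad():
            # Stepping against the sign of each derivative lowers the objective by
            # approximately epsilon times the sum of the magnitudes of the derivatives.
            x_adv = (x - epsilon * x.grad.sign()).clamp(0.0, 1.0)
        return x_adv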
Even very intelligent people sometimes make stupid mistakes. An aspect of intelligence is the ability to recognize a stupid mistake. Even the best machine learning systems may make a mistake that would cause a person to say: "That's stupid! Nobody would make that mistake." More worrisome are the facts that:
The architecture of a network and certain choices of activation functions, for example as in (1), (2), and (3) above, may reduce the sensitivity to adversarial attacks and the incidence of stupid mistakes. The use of judgment nodes (4) helps to detect errors, including stupid mistakes. With imitation learning (5), a system developer or a human + AI learning management system may start with a conventional network and then develop a separate system with comparable or better performance while incorporating these architectural features to give the imitation system sensibility.
by James K Baker and Bradley J Baker