Despite their impressive successes, modern deep learning systems are still prone to adversarial examples. Let's take computer vision, for example. Consider an image of a pig (picture to the left). A deep learning-based image classifier can successfully identify it as a pig. Now consider another instance of the same image (picture to the right), a slightly perturbed version of the first picture. It still looks like a pig to human eyes, but the image classifier identifies it as an airliner. Such deliberately perturbed inputs are called adversarial examples.
> Figure 1: To our eyes, the two figures are identical, but to an image classifier, they are not. This is an adversarial example. <
What is the typical process of training a deep learning-based image classifier? Here are the quick steps:

1. We gather a labeled training dataset.
2. We define a network architecture and initialize its learnable parameters.
3. We run a forward pass on a batch of data to get the network's predictions.
4. We compute a loss that measures how far the predictions are from the ground-truth labels.
5. We calculate the gradients of the loss function with respect to the network's learnable parameters.
6. We backpropagate these gradients to update the learnable parameters of our network such that the loss is minimized.
Here is a pictorial depiction of the above process:
> Figure 2: Pictorial overview of the training process in deep learning (source). <
Let us take a step back and reflect on steps 5 and 6:
- We then calculate the gradients of the loss function with respect to the network's learnable parameters.
- We backpropagate these gradients to update the learnable parameters of our network such that the loss is minimized.
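These two steps can be sketched with a gradient tape on a toy model. Everything below (the model, data, and optimizer) is an illustrative stand-in, not the classifier we attack later:

```python
import tensorflow as tf

# A tiny stand-in model and batch, purely for illustration
model = tf.keras.Sequential([tf.keras.layers.Dense(2)])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

x = tf.random.normal((4, 3))   # a small batch of inputs
y = tf.constant([0, 1, 0, 1])  # ground-truth labels

with tf.GradientTape() as tape:
    logits = model(x, training=True)  # forward pass
    loss = loss_fn(y, logits)         # compute the loss

# Step 5: gradients of the loss w.r.t. the learnable parameters
gradients = tape.gradient(loss, model.trainable_variables)

# Step 6: backpropagate the gradients to update the parameters
# such that the loss is minimized
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
```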
Two things to catch here are that the gradients are calculated with respect to the network's learnable parameters, and that the parameters are updated so that the loss gets minimized. Now, what is even more tempting to explore is this: what if we calculated the gradients with respect to the input data instead, and updated the input data so that the loss gets maximized?
Suppose we can update the input data so that the training objective gets maximized instead of minimized. In that case, the updated input data can essentially be used to fool our network. Of course, we would not want the updated input data to look visually implausible (in computer vision). For example, an image of a hog should still appear as a hog to human eyes, as we saw in Figure 1. So, this constrains us in the following aspects:
Why care about the gradients? Because they inform us how to nudge the input data to affect the loss function.
Do not worry if the concept of adversarial examples is still foreign to you. We hope it will become more evident over the next few sections, but here is a pictorial overview of what is coming.
> Figure 3: Pictorial overview of the FGSM method (source). <
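As a preview of the figure above, the core of FGSM fits in a few lines: take the gradient of the loss with respect to the input and nudge the input along the gradient's sign. The toy model and epsilon below are illustrative stand-ins:

```python
import tensorflow as tf

# A toy classifier and input, purely for illustration
model = tf.keras.Sequential([tf.keras.layers.Dense(2)])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

image = tf.random.uniform((1, 3))  # stand-in for an input image
label = tf.constant([0])
epsilon = 2. / 255

with tf.GradientTape() as tape:
    tape.watch(image)  # watch the *input*, not the weights
    logits = model(image)
    loss = loss_fn(label, logits)

# Gradient of the loss with respect to the input
gradient = tape.gradient(loss, image)

# Nudge the input in the direction that increases the loss
adversarial_image = image + epsilon * tf.sign(gradient)
```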
Adversarial examples mainly come in two different flavors to a deep learning model. Sometimes, the data points can be naturally adversarial (unfortunately!), and sometimes they come in the form of attacks (also referred to as synthetic adversarial examples). We will review both types in this section.
Ever wondered what happens if an image is naturally adversarial? What could be more dangerous than that? Consider the following grid of images, for example:
> Figure 4: On the leftmost corner of each image, we see the ground-truth label, and on the rightmost corner, we see the predictions of a standard CNN-based classifier (source). <
In their paper Natural Adversarial Examples, Hendrycks et al. present these naturally adversarial examples (along with a dataset called ImageNet-A) for classifiers pretrained on the ImageNet-1K dataset. To assess how dangerous these examples can be, here is a phrase from the abstract of the paper:
[…] DenseNet-121 obtains around 2% accuracy, an accuracy drop of approximately 90% […]
To summarize the potential reasons why these situations might occur, the authors emphasize the following points:
Geirhos et al. studied this phenomenon in their paper ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. They first stylize the ImageNet-1K training images to impose a bias toward shape in CNN-based classifiers. Unfortunately, this idea does not transfer well to the ImageNet-A dataset.
What is worse, traditional adversarial training methods do not help protect standard CNN-based classifiers from these examples. To help networks tackle these severe problems, Hendrycks et al. suggest using self-attention mechanisms, since they help a network better capture long-range dependencies and other forms of interactions in its inputs.
Natural adversarial examples are a pain. However, it is also possible to take a natural image and synthetically turn it into an adversarial example.
In this part, we will explore, with code in `tf.keras`, the questions we raised earlier: what if we calculated the gradients with respect to the input data, and what if the input data were updated (with those gradients) to maximize the loss? For now, we will base our exploration on image classification models. However, note that adversarial examples are a general threat and extend to any learning task.
So, here is how you would run inference with a pretrained ResNet50 model in `tf.keras`:

```python
import tensorflow as tf

# Load a ResNet50 model pretrained on ImageNet
resnet50 = tf.keras.applications.ResNet50(weights="imagenet")

# Run inference (preprocessed_image is a suitably preprocessed input batch)
preds = resnet50.predict(preprocessed_image)

# Process the predictions
...
```
If we pass the following image to `resnet50` and process the predictions, we get these top-3 predictions: `[('n02395406', 'hog', 0.99968374), ('n02396427', 'wild_boar', 0.00031595054), ('n03935335', 'piggy_bank', 9.8273716e-08)]`.
> Figure 5: A sample image for running inference with an image classifier. <
Now, the labels we are seeing come from the ImageNet dataset, and `hog` corresponds to class index 341. So, if we calculate the cross-entropy loss between 341 (the ground-truth label) and `resnet50`'s predictions, we can expect the loss to be low:
```python
cce = tf.keras.losses.SparseCategoricalCrossentropy()
loss = cce(
    tf.convert_to_tensor([341]),
    tf.convert_to_tensor(preds)
)
print(loss.numpy())
# 0.0004157156
```
Turning this loss back into a probability gives us (approximately) the number we saw above: since the cross-entropy loss is the negative log-probability of the true class, `np.exp(-loss.numpy())` gives 0.9995844.
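As a quick sanity check of this relationship (the numbers below are illustrative), exponentiating the negative loss recovers the predicted probability:

```python
import numpy as np

# Cross-entropy for the true class is -log(p), so exp(-loss) recovers
# the predicted probability p (illustrative numbers)
p = 0.9995844
loss = -np.log(p)
recovered = np.exp(-loss)
print(recovered)  # ≈ 0.9995844
```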
Now, in order to perturb the image, we will add a quantity $\delta$ to it such that the loss increases. Keep in mind that while perturbing the image, we will not disturb its visual semantics, i.e., to our eyes, the image should still appear as a hog. Enough talk, let us see this in action:
```python
scc_loss = tf.keras.losses.SparseCategoricalCrossentropy()

# delta is a tf.Variable with the same shape as image_tensor, initialized
# to zeros; optimizer is Adam with a learning rate of 0.1 (see below)
for t in range(50):
    with tf.GradientTape() as tape:
        tape.watch(delta)
        inp = preprocess_input(image_tensor + delta)
        predictions = resnet50(inp, training=False)
        # Negate the loss so that minimizing it *maximizes*
        # the true-class cross-entropy loss
        loss = -scc_loss(
            tf.convert_to_tensor([341]),
            predictions
        )
        if t % 5 == 0:
            print(t, loss.numpy())

    # Get the gradients of the loss with respect to delta
    gradients = tape.gradient(loss, delta)

    # Update delta
    optimizer.apply_gradients([(gradients, delta)])

    # Project delta back so that its values stay within [-EPS, EPS]
    delta.assign(clip_eps(delta))
```
Here’s how `clip_eps` is implemented:

```python
# Clipping utility so that the delta values stay within [-EPS, EPS]
EPS = 2. / 255

def clip_eps(delta_tensor):
    return tf.clip_by_value(delta_tensor,
                            clip_value_min=-EPS,
                            clip_value_max=EPS)
```
I used Adam as the optimizer with a learning rate of 0.1. Both the `EPS` and learning-rate hyperparameters play a vital role here. Even the choice of optimizer can significantly affect the loss landscape.
This form of gradient-based optimization, where the clipping step projects `delta` back into the allowed range after every update, is also referred to as Projected Gradient Descent (PGD).
If we finally add `delta` to our original image, it does not change much visually:
> Figure 6: A synthetically generated adversarial example. <
And when we run inference:

```python
# Generate predictions on the perturbed image
perturbed_image = preprocess_input(image_tensor + delta)
preds = resnet50.predict(perturbed_image)
print('Predicted:', tf.keras.applications.resnet50.decode_predictions(preds, top=3)[0])
```
We get: `Predicted: [('n01883070', 'wombat', 0.9999833), ('n02909870', 'bucket', 9.77628e-06), ('n07930864', 'cup', 1.6957516e-06)]`. We don’t see any sign of hog in the top-3 predictions 😂
Okay, this is fun, but what if we wanted the model to predict a class of our choice? For example, say I want the model to predict the hog as a `Lakeland_terrier`. Note that the target class should belong to the ImageNet dataset in this case; generally, we would want to choose a target class on which the given model has been trained. This is also known as a targeted attack.
The only part that changes in the previous code listing is this one:

```python
loss = -scc_loss(
    tf.convert_to_tensor([341]),
    predictions
)
```

Instead, we do:

```python
loss = (-scc_loss(tf.convert_to_tensor([true_index]), predictions) +
        scc_loss(tf.convert_to_tensor([target_index]), predictions))
```
This expression allows us to update `delta` in such a way that the probability of the model predicting the true class gets minimized while the probability of predicting the target class gets maximized. After a round of optimization, we get the (apparently) same image, but the predictions become: `Predicted: [('n02095570', 'Lakeland_terrier', 0.07326998), ('n01883070', 'wombat', 0.037985403), ('n02098286', 'West_Highland_white_terrier', 0.034288753)]`.
> Figure 7: A synthetically generated adversarial example (targeted this time). <
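The targeted objective can be sanity-checked with made-up probability vectors: under the convention that the optimizer minimizes the objective, predictions that put mass on the target class should score lower than predictions that put mass on the true class. The class indices and probabilities below are illustrative:

```python
import tensorflow as tf

# Targeted objective: maximize the true-class loss, minimize the
# target-class loss (indices and probabilities are made up)
scc_loss = tf.keras.losses.SparseCategoricalCrossentropy()
true_index, target_index = 0, 2

def objective(probs):
    predictions = tf.constant([probs])
    return (-scc_loss(tf.convert_to_tensor([true_index]), predictions) +
            scc_loss(tf.convert_to_tensor([target_index]), predictions))

before = float(objective([0.8, 0.1, 0.1]))  # mass on the true class
after = float(objective([0.1, 0.1, 0.8]))   # mass on the target class
print(before > after)  # → True
```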
We just saw how to fool one of the most influential deep learning architectures with merely ten lines of code. As responsible deep learning engineers, fooling models should not be the only thing we crave; instead, we should explore ways to train deep learning models with some adversarial regularization. We will get to that in a later report.
At this point, you are probably wondering what `delta` looks like. Absolutely random:
> Figure 8: The learned delta vector. <
Moreover, to make `delta` observable, you would need to amplify it quite a bit: `plt.imshow(50 * delta.numpy().squeeze() + 0.5)`.
Until now, the techniques we studied were mostly based around images, but note that they apply to other modalities of data as well. If you are interested in what this may look like for text, check out TextAttack.
This section presents the susceptibility of different optimizers to these synthetic attacks. So far, we have only seen the Adam optimizer in action (with a learning rate of 0.1 and the other `tf.keras` default hyperparameters). Let’s see how the other optimizers respond to the attacks.
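One way to set up such a comparison is to run the same attack loop under each optimizer and record the final true-class loss. The tiny randomly initialized model, the number of steps, and the settings below are illustrative stand-ins for the ResNet50 experiments in this section:

```python
import tensorflow as tf

# A toy setup for comparing optimizers on the attack loop (illustrative)
model = tf.keras.Sequential([tf.keras.layers.Dense(2)])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
image = tf.random.uniform((1, 3))
label = tf.constant([0])
EPS = 2. / 255

results = {}
for name, opt in [("sgd", tf.keras.optimizers.SGD(learning_rate=0.1)),
                  ("adam", tf.keras.optimizers.Adam(learning_rate=0.1))]:
    delta = tf.Variable(tf.zeros_like(image))
    for _ in range(10):
        with tf.GradientTape() as tape:
            logits = model(image + delta)
            loss = -loss_fn(label, logits)  # maximize the true-class loss
        grads = tape.gradient(loss, delta)
        opt.apply_gradients([(grads, delta)])
        delta.assign(tf.clip_by_value(delta, -EPS, EPS))
    # Record the final true-class loss under each optimizer
    results[name] = float(loss_fn(label, model(image + delta)))
print(results)
```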
This section shows the class activation maps retrieved from our network's predictions on different perturbed images. I used the code from this blog post by PyImageSearch to generate these activation maps.
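For readers curious about the mechanics, a Grad-CAM-style map can be sketched in a few lines: weight the last convolutional feature maps by the mean gradient of the class score and keep the positive evidence. The tiny randomly initialized CNN below is an illustrative stand-in, not the PyImageSearch implementation referenced above:

```python
import tensorflow as tf

# A tiny CNN, purely for illustration
inputs = tf.keras.Input((32, 32, 3))
conv = tf.keras.layers.Conv2D(8, 3, activation="relu")(inputs)
pooled = tf.keras.layers.GlobalAveragePooling2D()(conv)
logits = tf.keras.layers.Dense(10)(pooled)

# Expose both the last conv feature maps and the logits
grad_model = tf.keras.Model(inputs, [conv, logits])

image = tf.random.uniform((1, 32, 32, 3))
with tf.GradientTape() as tape:
    conv_out, preds = grad_model(image)
    class_score = preds[:, 0]  # score of the class we want to explain

# Weight each feature map by the mean gradient of the class score,
# then keep only the positive evidence
grads = tape.gradient(class_score, conv_out)
weights = tf.reduce_mean(grads, axis=(1, 2))
cam = tf.nn.relu(tf.einsum("bhwc,bc->bhw", conv_out, weights))
```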
I hope this report provided you with an intuitive introduction to adversarial examples in deep learning. The techniques we discussed are inspired by the Fast Gradient Sign Method, as demonstrated by Goodfellow et al. in their paper Explaining and Harnessing Adversarial Examples. That paper also discusses a wide variety of adversarial attacks and, most notably, how to defend against them.
You might be wondering: if adversarial examples are so negatively impactful, why can’t we just train our networks to be adversarially robust? Well, of course, you can! Including that here would have made this report an overkill. This report is part of a mini crash course on adversarial examples and adversarial training in deep learning, so definitely stay tuned if you are interested in knowing more about this field, including training neural networks to be adversarially aware.
Huge thanks to Ishaan Malhi and Jack Morris for providing inputs on the early drafts of the report.