Generating adversarial perturbations in batches

Some background first: some popular libraries (e.g., foolbox) currently generate adversarial attacks per image, meaning the loss is computed from a single image at a time and the gradient is then backpropagated to that input image. An example from our repo is below.

In this way, one has to write a for loop to generate adversarial examples over the data (a sketch of such a loop follows the class below). This can be slow for adversarial training, where fresh adversarial perturbations are needed in every training epoch (code). I am wondering whether it is possible to generate adversarial examples in batches with PyTorch, which would fully utilize the power of GPUs.

import numpy as np
import torch
import torch.nn as nn
from torch.autograd import Variable


def to_var(x, requires_grad=False):
    # Minimal stand-in for the repo's `to_var` helper: moves the tensor to
    # the GPU when one is available and wraps it in an autograd Variable.
    if torch.cuda.is_available():
        x = x.cuda()
    return Variable(x, requires_grad=requires_grad)


class FGSMAttack(object):
    def __init__(self, model=None, epsilon=None):
        """
        One-step fast gradient sign method.
        """
        self.model = model
        self.epsilon = epsilon
        self.loss_fn = nn.CrossEntropyLoss()

    def perturb(self, x_nat, y):
        """
        Given one example (x_nat, y), returns its adversarial
        counterpart with an attack length of epsilon.
        """
        x = np.copy(x_nat)

        x_var = to_var(torch.from_numpy(x), requires_grad=True)
        y_var = to_var(torch.LongTensor([int(y)]))

        # Forward pass, then backpropagate the loss to the input image.
        scores = self.model(x_var)
        loss = self.loss_fn(scores, y_var)
        loss.backward()
        grad_sign = x_var.grad.data.cpu().sign().numpy()

        # Step in the direction of the gradient sign and clip to valid pixels.
        x += self.epsilon * grad_sign
        x = np.clip(x, 0, 1)

        return x
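For concreteness, generating adversarial examples for a whole dataset then requires a Python loop over the images, e.g. (a rough sketch; model, X, and Y are placeholders, and each image is assumed to need a leading batch dimension of 1):

attack = FGSMAttack(model=model, epsilon=0.1)
X_adv = np.concatenate([attack.perturb(x[np.newaxis], y) for x, y in zip(X, Y)])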

If your model supports batching, the code you gave should just work on batched data. You can even do the clipping in PyTorch to speed things up further.
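For example, a fully batched FGSM step might look like the following (a minimal sketch under the same [0, 1] pixel convention as above; fgsm_batch is a hypothetical name, and the newer tensor API is used instead of Variables):

import torch
import torch.nn as nn

def fgsm_batch(model, x_nat, y, epsilon):
    # x_nat: (B, C, H, W) batch of clean images in [0, 1]; y: (B,) labels.
    x = x_nat.clone().detach().requires_grad_(True)
    loss = nn.CrossEntropyLoss()(model(x), y)
    loss.backward()
    # One signed-gradient step per image, clipped back to valid pixels.
    x_adv = x + epsilon * x.grad.sign()
    return torch.clamp(x_adv, 0, 1).detach()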

Thank you for your reply. To be more explicit: if (x_var, y_var) is given as a batch, then:

scores = self.model(x_var)
loss = self.loss_fn(scores, y_var)
loss.backward()

will compute the loss as a single scalar, which is then backpropagated to all input images in the batch. However, I think the perturbation for an image should be generated from that image's own loss. The loss should be computed per individual image and backpropagated separately, so it would be a vector whose length equals the batch size.

If this is true, I guess we can still compute scores as above (which supports batching), but is there a good way to vectorize the loss computation and backpropagation in PyTorch?

I understand your concern now. Assuming your loss_fn is a standard loss that linearly combines the per-example losses, doing backward on the aggregated loss is equivalent to doing it separately: each image's loss depends only on that image, and ∇_X (a·X) = a·1, where 1 is the all-ones tensor with the same shape as X. So aggregation only scales each per-example gradient by a constant a > 0 (e.g., a = 1/batch_size for a mean), which leaves the gradient sign unchanged.
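Here is a quick numerical check of this claim (a self-contained sketch with a made-up per-example loss, not the poster's model):

import torch

torch.manual_seed(0)
x = torch.randn(4, 3, requires_grad=True)  # a "batch" of 4 inputs
w = torch.randn(3, 5)

def per_example_loss(x):
    return ((x @ w) ** 2).sum(dim=1)  # shape (4,): one loss per input

# Backward through the aggregated (summed) loss.
per_example_loss(x).sum().backward()
g_aggregated = x.grad.clone()

# Backward through each example's loss separately, accumulating into x.grad.
x.grad = None
for i in range(4):
    per_example_loss(x)[i].backward()

print(torch.allclose(g_aggregated, x.grad))  # True

With a mean instead of a sum, the gradients are just scaled by 1/4, which does not change their sign, so FGSM is unaffected.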

If you are using the loss fns in nn.* and want to do it more explicitly, you can use the reduce=False kwarg (doc here: http://pytorch.org/docs/master/nn.html#loss-functions). The loss will then be a tensor, but note that it usually isn't averaged across the dimensions within each example either. To get a true per-example loss, you may need to average manually over those dimensions (depending on the loss fn you use). Then you can specify an incoming gradient for each example and backward with loss.backward(torch.ones(batch_size)).
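Put together, the explicit per-example version might look like this (a minimal self-contained sketch with a toy linear model; in newer PyTorch versions reduction='none' replaces reduce=False):

import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(8, 3)                    # toy stand-in for a classifier
x = torch.randn(4, 8, requires_grad=True)  # batch of 4 inputs
y = torch.randint(0, 3, (4,))

# Per-example losses: cross-entropy already yields one value per example.
loss = F.cross_entropy(model(x), y, reduction='none')  # shape (4,)

# Seed every per-example loss with an incoming gradient of 1; this is
# equivalent to loss.sum().backward().
loss.backward(torch.ones_like(loss))
grad_sign = x.grad.sign()  # per-example FGSM directions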
