requires_grad in leaf node

Hi, I don’t understand why this piece of code doesn’t work properly. I enabled requires_grad on the input batch, then normalize it, and when I backward through the model (M_O) I expect “data_batch_.grad.data” to hold values other than zero, but it is always zero. Something is going wrong in the normalize function, though I don’t know what it is. I say that because when I bypass the normalize function the gradients are not zero (it works correctly).


def normalize(t, mean, std, dataset):

    if dataset == 'mnist':
        t[:, 0, :] = (t[:, 0, :] - mean[0]) / std[0]
    if dataset == 'cifar10':
        t[:, 0, :, :] = (t[:, 0, :, :].detach() - mean[0]) / std[0]
        t[:, 1, :, :] = (t[:, 1, :, :].detach() - mean[1]) / std[1]
        t[:, 2, :, :] = (t[:, 2, :, :].detach() - mean[2]) / std[2]

    return t
....
    criterion = nn.CrossEntropyLoss()
    criterion = criterion.to(device)
...
    for i, (data_batch, labels_batch) in enumerate(adv_iterator):
        data_batch = data_batch.to(device)
        labels_batch = labels_batch.to(device)

        data_batch_ = normalize(data_batch.clone().detach(), mean, std, chose_dataset)
        data_batch_ = torch.tensor(data_batch_, requires_grad=True, device=device)
        # data_batch_ = data_batch

        out_adv, _, _ = M_O(data_batch_)
        loss = criterion(out_adv, labels_batch)

        M_O.zero_grad()
        loss.backward()

        noise = torch.sign(data_batch_.grad.data)

        print(noise[0])
...

Hi,

First, you should never use .data as it mostly leads to issues.

Second, you call .detach() on the data_batch before passing it to the normalize function. So you explicitly break the graph. This is why you don’t get any gradients back.
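
As a small standalone example of both points (the tensors here are just toy values, not your code):

    import torch

    x = torch.randn(3, requires_grad=True)

    # Detaching breaks the graph: y no longer tracks x,
    # so nothing computed from y can produce a gradient for x.
    y = x.detach() * 2
    print(y.requires_grad)  # False

    # Without the detach, the graph is intact and x.grad is filled in.
    z = (x * 2).sum()
    z.backward()
    print(x.grad)  # tensor([2., 2., 2.]) -- and note we read .grad, not .grad.data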

I detached “data_batch”, but I want the gradient of “data_batch_” (which is built from the normalized “data_batch”), so the graph should be fine from “data_batch_” to the end (the loss function). Am I wrong?

Oh sorry, I got confused because you were mentioning your normalize function. But it should not be relevant here, since you create your leaf after it.
One thing you can do to be sure is data_batch_.register_hook(print). This will print the gradient of data_batch_ when it is computed.
If it prints, that means the gradients are properly computed and they just happen to be 0, most likely because you have non-differentiable ops later in your network.
If it does not print, then it means that you break the graph somewhere further down, inside your network.
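
For example (a toy model standing in for M_O, just to show the mechanics of the hook):

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 2)                 # toy stand-in for M_O
    data_batch_ = torch.randn(4, 10, requires_grad=True)
    data_batch_.register_hook(print)         # fires when data_batch_'s gradient is computed

    loss = model(data_batch_).sum()
    loss.backward()                          # the hook prints here if the graph reaches data_batch_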

I don’t know what is going on inside normalize that causes the problem, because, like I said, when I just skip the “normalize” function it returns the correct answer.
I mean, if I replace these lines of code

    data_batch_ = normalize(data_batch.clone().detach(), mean, std, chose_dataset)
    data_batch_ = torch.tensor(data_batch_, requires_grad=True, device=device)
    # data_batch_ = data_batch

with just this
data_batch_ = data_batch
it shows me the expected results, so I infer that something is off in this normalize function.

In this case, can you replace the second line with:

data_batch_ = data_batch_.detach().to(device).requires_grad_()

To make sure you get a proper leaf?
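
For reference, this is the kind of behaviour you should see from that line when run in isolation (toy values only):

    import torch

    t = torch.randn(2, 3)                                  # stands in for the normalized batch
    data_batch_ = t.detach().to("cpu").requires_grad_()    # a proper leaf

    print(data_batch_.is_leaf, data_batch_.requires_grad)  # True True
    (data_batch_ * 2).sum().backward()
    print(data_batch_.grad)                                # all 2s, so gradients reach the leaf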

Still the same results.

Can you create a code sample (30/40 lines) that I will be able to run and that shows this, please?

I changed the normalize code from

def normalize(t, mean, std, dataset):

    if dataset == 'mnist':
        t[:, 0, :] = (t[:, 0, :] - mean[0]) / std[0]
    if dataset == 'cifar10':
        t[:, 0, :, :] = (t[:, 0, :, :].detach() - mean[0]) / std[0]
        t[:, 1, :, :] = (t[:, 1, :, :].detach() - mean[1]) / std[1]
        t[:, 2, :, :] = (t[:, 2, :, :].detach() - mean[2]) / std[2]

    return t

to this

def normalize(t, mean, std, dataset):
    return t

and it works, so I assume that I should rewrite the code inside the normalize function differently.
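
For example, something along these lines (an untested sketch, assuming a 4-D NCHW batch and per-channel mean/std), which avoids both the .detach() calls and the in-place slice assignments:

    import torch

    def normalize(t, mean, std, dataset=None):
        # Out-of-place normalization: broadcast per-channel mean/std
        # over an NCHW batch instead of writing into slices of t.
        mean_t = torch.as_tensor(mean, dtype=t.dtype, device=t.device).view(1, -1, 1, 1)
        std_t = torch.as_tensor(std, dtype=t.dtype, device=t.device).view(1, -1, 1, 1)
        return (t - mean_t) / std_t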