requires_grad in leaf node

Hi, I don’t understand why this piece of code doesn’t work properly. I enabled requires_grad on the input batch, then normalize it, and when I backward through the model (M_O) I expect “data_batch_.grad.data” to hold values other than zero, but it is always zero. Something is going wrong in the normalize function, though I don’t know what it is. I say that because when I bypass the normalize function the gradients are not zero (it works correctly).


def normalize(t, mean, std, dataset):

    if dataset == 'mnist':
        t[:, 0, :] = (t[:, 0, :] - mean[0]) / std[0]
    if dataset == 'cifar10':
        t[:, 0, :, :] = (t[:, 0, :, :].detach() - mean[0]) / std[0]
        t[:, 1, :, :] = (t[:, 1, :, :].detach() - mean[1]) / std[1]
        t[:, 2, :, :] = (t[:, 2, :, :].detach() - mean[2]) / std[2]

    return t
....
    criterion = nn.CrossEntropyLoss()
    criterion = criterion.to(device)
...
    for i, (data_batch, labels_batch) in enumerate(adv_iterator):
        data_batch = data_batch.to(device)
        labels_batch = labels_batch.to(device)

        data_batch_ = normalize(data_batch.clone().detach(), mean, std, chose_dataset)
        data_batch_ = torch.tensor(data_batch_, requires_grad=True, device=device)
        # data_batch_ = data_batch

        out_adv, _, _ = M_O(data_batch_)
        loss = criterion(out_adv, labels_batch)

        M_O.zero_grad()
        loss.backward()

        noise = torch.sign(data_batch_.grad.data)

        print(noise[0])
...

Hi,

First, you should never use .data as it mostly leads to issues.

Second, you call .detach() on the data_batch before passing it to the normalize function. So you explicitly break the graph. This is why you don’t get any gradients back.
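
As a small standalone example of both points (the tensors here are just toy values, not your code):

    import torch

    x = torch.randn(3, requires_grad=True)

    # Detaching breaks the graph: y no longer tracks x,
    # so nothing computed from y can produce a gradient for x.
    y = x.detach() * 2
    print(y.requires_grad)  # False

    # Without the detach, the graph is intact and x.grad is filled in.
    z = (x * 2).sum()
    z.backward()
    print(x.grad)  # tensor([2., 2., 2.]) -- and note we read .grad, not .grad.data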

I detached “data_batch”, but I want the gradient of “data_batch_” (which is built from the normalized “data_batch”), so the graph should be fine from “data_batch_” to the end (the loss function). Am I wrong?

Oh sorry, I got confused because you were mentioning your normalize function. But it should not be relevant here, since you create your leaf after it.
One thing you can do to be sure is data_batch_.register_hook(print). This will print the gradient of data_batch_ when it is computed.
If it prints, that means the gradients are properly computed and they just happen to be 0, most likely because you have non-differentiable ops later in your network.
If it does not print, then it means that you break the graph somewhere further down, inside your network.
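
For example (a toy model standing in for M_O, just to show the mechanics of the hook):

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 2)                 # toy stand-in for M_O
    data_batch_ = torch.randn(4, 10, requires_grad=True)
    data_batch_.register_hook(print)         # fires when data_batch_'s gradient is computed

    loss = model(data_batch_).sum()
    loss.backward()                          # the hook prints here if the graph reaches data_batch_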

I don’t know what is going on inside normalize that causes the problem, because, like I said, when I just skip the “normalize” function it returns the correct answer.
I mean, if I replace these lines of code

    data_batch_ = normalize(data_batch.clone().detach(), mean, std, chose_dataset)
    data_batch_ = torch.tensor(data_batch_, requires_grad=True, device=device)
    # data_batch_ = data_batch

with just this
data_batch_ = data_batch
it shows me the expected results, so I infer that something is off in this normalize function.

In this case, can you replace the second line with:

data_batch_ = data_batch_.detach().to(device).requires_grad_()

To make sure you get a proper leaf?
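
For reference, this is the kind of behaviour you should see from that line when run in isolation (toy values only):

    import torch

    t = torch.randn(2, 3)                                  # stands in for the normalized batch
    data_batch_ = t.detach().to("cpu").requires_grad_()    # a proper leaf

    print(data_batch_.is_leaf, data_batch_.requires_grad)  # True True
    (data_batch_ * 2).sum().backward()
    print(data_batch_.grad)                                # all 2s, so gradients reach the leaf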

Still the same results.

Can you create a code sample (30/40 lines) that I will be able to run and that shows this, please?

I changed the normalize code from

def normalize(t, mean, std, dataset):

    if dataset == 'mnist':
        t[:, 0, :] = (t[:, 0, :] - mean[0]) / std[0]
    if dataset == 'cifar10':
        t[:, 0, :, :] = (t[:, 0, :, :].detach() - mean[0]) / std[0]
        t[:, 1, :, :] = (t[:, 1, :, :].detach() - mean[1]) / std[1]
        t[:, 2, :, :] = (t[:, 2, :, :].detach() - mean[2]) / std[2]

    return t

to this

def normalize(t, mean, std, dataset):
    return t

and it works, so I assume that I should rewrite the code inside the normalize function differently.
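
For example, something along these lines (an untested sketch, assuming a 4-D NCHW batch and per-channel mean/std), which avoids both the .detach() calls and the in-place slice assignments:

    import torch

    def normalize(t, mean, std, dataset=None):
        # Out-of-place normalization: broadcast per-channel mean/std
        # over an NCHW batch instead of writing into slices of t.
        mean_t = torch.as_tensor(mean, dtype=t.dtype, device=t.device).view(1, -1, 1, 1)
        std_t = torch.as_tensor(std, dtype=t.dtype, device=t.device).view(1, -1, 1, 1)
        return (t - mean_t) / std_t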