Do I need requires_grad=True for the input when switching from PyTorch 0.3 to 1.0?

In PyTorch 0.3 we used to have Variable, and for training we needed to wrap the input as Variable(input).
Therefore, this way input.requires_grad became True.
So my assumption was that input.requires_grad should always be True for training. Is that true?
But now I’m reading ‘Training a Classifier’ on the PyTorch website and see that input.requires_grad is not set to True at the beginning, but eventually it becomes True after sending the input through the network.
So am I misunderstanding something? :slight_smile:

Wrapping a tensor in a Variable didn’t set its requires_grad attribute to True.
You had to specify it while creating the Variable:

x = Variable(torch.randn(1), requires_grad=True)
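For comparison, here is a small sketch of the 0.3 idiom next to what it becomes in 1.0, where Variable and Tensor were merged. It assumes a PyTorch 1.x (or later) install, where torch.autograd.Variable still exists but is deprecated and simply returns a plain Tensor:

```python
import torch
from torch.autograd import Variable  # still importable in 1.x, but deprecated

# 0.3 style: the Variable wrapper alone did NOT set requires_grad
v_default = Variable(torch.randn(1))
v_grad = Variable(torch.randn(1), requires_grad=True)
print(v_default.requires_grad)          # False
print(v_grad.requires_grad)             # True
print(isinstance(v_grad, torch.Tensor)) # True: Variable(...) now returns a Tensor

# 1.0 style: use tensors directly
t_default = torch.randn(1)
t_grad = torch.randn(1, requires_grad=True)
t_later = torch.randn(1).requires_grad_()  # set the flag in place on a leaf tensor
print(t_default.requires_grad, t_grad.requires_grad, t_later.requires_grad)  # False True True
```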

Usually you don’t need gradients w.r.t. your input. However, they might be needed for some special use cases, e.g. creating adversarial samples.
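To illustrate both points with a toy setup (the nn.Linear model, the random batch, the targets and the epsilon step size are illustrative assumptions, not taken from the tutorial): after the forward pass it is the output whose requires_grad is True, because it depends on the model’s parameters, while the input itself stays False. Gradients w.r.t. the input are only computed when you explicitly ask for them, here for an FGSM-style adversarial step:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Linear(10, 3)             # toy classifier (assumption)
x = torch.randn(4, 10)               # input batch, requires_grad=False by default
target = torch.tensor([0, 1, 2, 0])

# 1) Ordinary training: only the parameters need gradients
out = model(x)
print(x.requires_grad)    # False: the input tensor is unchanged
print(out.requires_grad)  # True: out depends on model.weight and model.bias
F.cross_entropy(out, target).backward()
weight_had_grad = model.weight.grad is not None
print(weight_had_grad)    # True
print(x.grad)             # None: nobody asked for input gradients

# 2) Adversarial example (FGSM-style): here we do want d(loss)/d(input)
model.zero_grad()
x_adv = x.clone().requires_grad_()
loss = F.cross_entropy(model(x_adv), target)
grad_x, = torch.autograd.grad(loss, x_adv)  # gradient w.r.t. the input only
epsilon = 0.1                               # hypothetical step size
x_perturbed = (x_adv + epsilon * grad_x.sign()).detach()
print(grad_x.shape)       # torch.Size([4, 10])
```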


Got it, thank you very much for the clarification :slight_smile:

Speaking of adversarial samples, I’m having issues with computing gradients w.r.t. inputs. I’m currently adapting an existing virtual adversarial training repository (https://github.com/naoto0804/pytorch-VAT) to fit my needs. I want to compute the gradient of a model’s output w.r.t. the model’s input. Unfortunately, when computing the gradient w.r.t. the input, I get “None”. The problem occurs in the VATLoss module below, where the argument x is the input.


class VATLoss(nn.Module):

    def __init__(self, xi=10.0, eps=1.0, ip=1):
        """VAT loss
        :param xi: hyperparameter of VAT (default: 10.0)
        :param eps: hyperparameter of VAT (default: 1.0)
        :param ip: iteration times of computing adv noise (default: 1)
        """
        super(VATLoss, self).__init__()
        self.xi = xi
        self.eps = eps
        self.ip = ip

    def forward(self, model, x):
        with torch.autograd.set_grad_enabled(False):
            pred = F.softmax(model(x), dim=1)

        # prepare random unit tensor
        d = torch.rand(x.shape, device='cuda:0').sub(0.5)
        d_ = _l2_normalize(d)

        with _disable_tracking_bn_stats(model):
            # calc adversarial direction
            for _ in range(self.ip):
                d.requires_grad = True
                pred_hat = model(x + self.xi * d_)
                logp_hat = F.log_softmax(pred_hat, dim=1)
                adv_distance = F.kl_div(logp_hat, pred, reduction='batchmean')
                adv_distance.backward()
                d = _l2_normalize(d.grad)
                model.zero_grad()

                # calc LDS
                r_adv = d * self.eps
                pred_hat = model(x + r_adv)
                logp_hat = F.log_softmax(pred_hat, dim=1)
                lds = F.kl_div(logp_hat, pred, reduction='batchmean')

        return lds

Unfortunately, d.grad returns None. When calling this function, I pass a PyTorch nn.Parameter(unlabeled_sample, requires_grad = True) as the argument x. I also tried d_grad = torch.autograd.grad(adv_distance, d) instead of adv_distance.backward() to get the gradient with respect to d, but it still returns None. Is this missing gradient for d caused by the fact that d isn’t linked to any optimizer? Also, while debugging I saw that right before adv_distance.backward(), d’s attributes show is_leaf = True, requires_grad = True and _version = 2. Any ideas?

Here is _l2_normalize(d), to show what’s inside:


def _l2_normalize(d):
    d_reshaped = d.view(d.shape[0], -1, *(1 for _ in range(d.dim() - 2)))
    d /= torch.norm(d_reshaped, dim=1, keepdim=True) + 1e-8
    return d

I’m using PyTorch 1.0.1. Help!
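As a point of comparison, a minimal sketch of taking the gradient of the KL distance w.r.t. a random perturbation d with torch.autograd.grad, assuming a toy nn.Linear model and an out-of-place normalization helper (l2_normalize below is a hypothetical rewrite, not the repository’s function). It relies on two facts: autograd computes gradients on its own, with no optimizer involved, and by default torch.autograd.grad raises an error rather than returning None if d was not used in the graph (unless allow_unused=True is passed). Normalizing in place (d /= ...) on a leaf that already requires grad would also raise an error, so the sketch normalizes first and calls requires_grad_() afterwards:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def l2_normalize(d):
    # hypothetical out-of-place helper: divide each sample by its own L2 norm
    norm = d.flatten(1).norm(dim=1).view(-1, *([1] * (d.dim() - 1)))
    return d / (norm + 1e-8)


torch.manual_seed(0)
model = nn.Linear(10, 10)   # toy model (assumption)
x = torch.randn(2, 10)      # plain input tensor, no nn.Parameter wrapper
xi = 10.0                   # same default as VATLoss above

with torch.no_grad():
    pred = F.softmax(model(x), dim=1)  # fixed target distribution

# Normalize first, then mark d as a leaf that requires grad
d = l2_normalize(torch.rand(x.shape) - 0.5).requires_grad_()

# d must take part in this forward pass, otherwise it has no gradient
logp_hat = F.log_softmax(model(x + xi * d), dim=1)
adv_distance = F.kl_div(logp_hat, pred, reduction='batchmean')

# Gradient w.r.t. d only; no optimizer is involved
grad_d, = torch.autograd.grad(adv_distance, d)
print(grad_d.shape)       # torch.Size([2, 10])
print(model.weight.grad)  # None: autograd.grad does not fill parameter .grad fields
r_adv = l2_normalize(grad_d)
```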


Where do you print d.grad?
If I just add a print statement after adv_distance.backward(), I’ll get a valid gradient for d.
Also, if I register d as an attribute, I can print the gradient successfully.

I didn’t print d.grad; I was checking the attribute in PyCharm’s debugger. When you say “register d as an attribute”, do you mean creating self.d = d in __init__(...)?


I just registered it in the forward method:

    def forward(self, model, x):
        with torch.autograd.set_grad_enabled(False):
            pred = F.softmax(model(x), dim=1)

        # prepare random unit tensor
        d = torch.rand(x.shape).sub(0.5)
        d_ = _l2_normalize(d)

        self.d_dummy = d
        ....

criterion = VATLoss()
model = nn.Linear(10, 10)
x = torch.randn(1, 10)
loss = criterion(model, x)
loss.backward()

print(criterion.d_dummy.grad)
>tensor([[ 0.1734, -0.1360, -0.5533, -0.0225,  0.5100, -0.3970,  0.3045, -0.3553,
         -0.0289, -0.0856]])
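A runnable, simplified sketch of the same check, assuming a model without batch-norm layers (so the repository’s _disable_tracking_bn_stats context manager is left out) and an out-of-place _l2_normalize. This is not the repository’s code: the power-iteration loop reuses the previous direction d, the LDS term is computed once after the loop, and self.d_dummy keeps a handle on the last leaf d so its .grad can be inspected after the call:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def _l2_normalize(d):
    # out-of-place per-sample L2 normalization (assumption, avoids in-place ops)
    norm = d.flatten(1).norm(dim=1).view(-1, *([1] * (d.dim() - 1)))
    return d / (norm + 1e-8)


class VATLoss(nn.Module):
    # simplified sketch: no batch-norm statistics handling
    def __init__(self, xi=10.0, eps=1.0, ip=1):
        super().__init__()
        self.xi, self.eps, self.ip = xi, eps, ip

    def forward(self, model, x):
        with torch.no_grad():
            pred = F.softmax(model(x), dim=1)

        d = _l2_normalize(torch.rand(x.shape, device=x.device) - 0.5)
        for _ in range(self.ip):
            d.requires_grad_()
            self.d_dummy = d  # keep a handle so d.grad can be inspected later
            logp_hat = F.log_softmax(model(x + self.xi * d), dim=1)
            adv_distance = F.kl_div(logp_hat, pred, reduction='batchmean')
            adv_distance.backward()
            d = _l2_normalize(d.grad)  # direction for the next iteration
            model.zero_grad()          # drop parameter grads from this backward

        # LDS is computed once, after the power iteration
        logp_hat = F.log_softmax(model(x + self.eps * d), dim=1)
        return F.kl_div(logp_hat, pred, reduction='batchmean')


torch.manual_seed(0)
criterion = VATLoss()
model = nn.Linear(10, 10)
x = torch.randn(1, 10)
loss = criterion(model, x)
loss.backward()
print(criterion.d_dummy.grad is not None)  # True
print(model.weight.grad is not None)       # True: from the outer backward on the LDS
```

The inner backward() only serves to find the adversarial direction, and model.zero_grad() discards the parameter gradients it accumulated; the outer loss.backward() is the one whose gradients would be used to train the model.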

I implemented your modifications, and when I call adv_distance.backward(...) inside the forward(...) function of VATLoss, I still get None:

[...]
    def forward(self, model, x):
        with torch.autograd.set_grad_enabled(False):
            pred = F.softmax(model(x), dim=1)

        # prepare random unit tensor
        d = torch.rand(x.shape, device='cuda:0').sub(0.5)
        d_ = _l2_normalize(d)

        with _disable_tracking_bn_stats(model):
            # calc adversarial direction
            for _ in range(self.ip):
                self.dummy_d = d_
                self.dummy_d.requires_grad = True
                pred_hat = model(x + self.xi * self.dummy_d)
                logp_hat = F.log_softmax(pred_hat, dim=1)
                adv_distance = F.kl_div(logp_hat, pred, reduction='batchmean')
                adv_distance.backward(retain_graph = True)
                print(self.dummy_d.grad)
                d_ = _l2_normalize(self.dummy_d.grad)
                model.zero_grad()
[...]

The print(…) call prints None. The upstream code looks like this:

[...]
    model.train()
    optimizer.zero_grad()

    unlabeled_sample = nn.Parameter(unlabeled_sample, requires_grad=True)

    VATLoss_ = VATLoss()
    vat_loss_ = VATLoss_(model, unlabeled_sample) / len(unlabeled_sample)
[...]

Should I call the backward() outside of the forward() of VATLoss instead?

I’m not sure what your code is actually doing, but I get valid gradients even inside forward:

...
adv_distance.backward()
print(d.grad)
print(self.d_dummy.grad)
...
> tensor([[ 1.7061, -0.8534,  0.1437,  0.7026,  0.3845,  0.9892,  1.0329, -1.8058,
         -2.0117, -0.1880]])
tensor([[ 1.7061, -0.8534,  0.1437,  0.7026,  0.3845,  0.9892,  1.0329, -1.8058,
         -2.0117, -0.1880]])

It looks like you are trying to create some adversarial sample inside the forward of VATLoss, but apparently you are recreating it in each iteration from scratch?
I think it might be better to create a new topic and explain your use case a bit more so that others might have a look at this issue. :wink:
