In this tutorial, the following code is used for forward propagation:

# zero the parameter gradients
optimizer.zero_grad()

# forward
# track history only if in train
with torch.set_grad_enabled(phase == 'train'):
    outputs = model(inputs)
    _, preds = torch.max(outputs, 1)
    loss = criterion(outputs, labels)

    # backward + optimize only if in training phase
    if phase == 'train':
        loss.backward()
        optimizer.step()

# statistics
running_loss += loss.item() * inputs.size(0)
running_corrects += torch.sum(preds == labels.data)

However, could they not have done this instead?

# zero the parameter gradients
optimizer.zero_grad()

inputs.requires_grad = True
outputs = model(inputs)
_, preds = torch.max(outputs, 1)
loss = criterion(outputs, labels)

# backward + optimize only if in training phase
if phase == 'train':
    loss.backward()
    optimizer.step()

# statistics
running_loss += loss.item() * inputs.size(0)
running_corrects += torch.sum(preds == labels.data)

Since inputs is a leaf node, and if a leaf node has requires_grad=True, then all tensors derived from it will also have requires_grad=True, right?

You don’t actually need gradients for the inputs here. All the parameters inside the model already require gradients, so you don’t need to set it on the inputs.

The goal is to disable autograd when you’re not training, to reduce memory consumption (it won’t build the computational graph and won’t save the intermediary results needed for the backward pass).
So yes, you can remove it if you want and it will still work, but it will use more memory.
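The memory point above can be checked directly: outside of set_grad_enabled(True), the forward pass builds no graph and the outputs don’t require grad. A minimal sketch, using a small nn.Linear as a stand-in for the tutorial’s resnet18:

```python
import torch
import torch.nn as nn

# Tiny stand-in model; its parameters require grad by default.
model = nn.Linear(4, 2)
inputs = torch.randn(3, 4)

# "train" phase: the graph is built, outputs require grad.
with torch.set_grad_enabled(True):
    train_out = model(inputs)
print(train_out.requires_grad)  # True

# "val" phase: no graph is built, saving memory.
with torch.set_grad_enabled(False):
    val_out = model(inputs)
print(val_out.requires_grad)  # False
```

The same effect can be had with `torch.no_grad()`; `set_grad_enabled` is convenient here because the flag can depend on the phase.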

So if either the inputs or the model parameters have requires_grad=True, then gradients will be tracked, correct? This is because all operations involve one of these two components, right?

Also, does model.train() set requires_grad=True for all the parameters? If not, what does model.train() actually do internally? Similarly, does model.eval() set requires_grad=False for all parameters? If not, what does it do internally?

Following up on my last question: I do not see requires_grad being set to True for the parameters of the resnet18 model anywhere in the code, so when exactly did gradients start to be tracked for the parameters?

So if either the inputs or the model parameters have requires_grad=True, then gradients will be tracked, correct?

Basically, when you set requires_grad=True on a Tensor, you ask for gradients for that Tensor (assuming a leaf Tensor here).
So if you set requires_grad on the input, it will compute the gradient w.r.t. the input.
If you set it on the parameters, it will compute the gradients for the parameters.
And if you set it on both, it will compute both.
So you should only set it on the Tensors for which you actually want gradients; otherwise you will increase memory/runtime for no reason.
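This can be sketched in a few lines, again with a small nn.Linear standing in for the tutorial’s model:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)      # parameters require grad by default
inputs = torch.randn(3, 4)   # leaf tensor, requires_grad=False by default

loss = model(inputs).sum()
loss.backward()
print(model.weight.grad is None)  # False: parameter gradients were computed
print(inputs.grad is None)        # True: no gradient w.r.t. the input

# Opting in for the input as well:
inputs2 = torch.randn(3, 4, requires_grad=True)
loss2 = model(inputs2).sum()
loss2.backward()
print(inputs2.grad is None)       # False: gradient w.r.t. the input too
```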

Also, does model.train() set requires_grad=True for all the parameters?

.train() changes the behavior of the nn.Module; requires_grad tells the autograd engine whether gradients should be computed for a given Tensor or not.
The same goes for eval.
train() and eval() change the behavior of nn.Modules for training or evaluation. For example, dropout is deactivated during evaluation, and batchnorm uses its saved running statistics during evaluation instead of the ones computed on the current batch.
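A quick way to see that train()/eval() change module behavior without touching requires_grad is to put a Dropout layer in both modes (a minimal sketch, not the tutorial’s model):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(1, 4)

model.eval()                    # dropout is disabled: output is deterministic
out1 = model(x)
out2 = model(x)
print(torch.equal(out1, out2))  # True

# Neither train() nor eval() changes requires_grad on the parameters:
model.train()
print(all(p.requires_grad for p in model.parameters()))  # True
model.eval()
print(all(p.requires_grad for p in model.parameters()))  # True
```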

when exactly did gradients start to be tracked for the parameters?

nn.Parameter has requires_grad=True by default. So when you create an nn.Parameter, it is always a leaf Tensor that requires grad (unless you pass requires_grad=False to the constructor).
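This default is easy to verify, and it is why a freshly constructed model (resnet18 included) already tracks gradients for its parameters:

```python
import torch
import torch.nn as nn

p = nn.Parameter(torch.zeros(3))
print(p.requires_grad, p.is_leaf)  # True True

# The default can be overridden in the constructor:
frozen = nn.Parameter(torch.zeros(3), requires_grad=False)
print(frozen.requires_grad)        # False

# Every parameter a module registers inherits this default:
layer = nn.Linear(2, 2)
print(all(param.requires_grad for param in layer.parameters()))  # True
```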