Suppose we have the following setup:
optimizer = Adam(model.parameters())
for data, labels in dataloader:
    optimizer.zero_grad()
    output = model(data)
    loss = F.nll_loss(output, labels)
    loss.backward()
    optimizer.step()
    data.grad  # will this be reset to zero after every iteration by optimizer.zero_grad()?
So my question is: will optimizer.zero_grad() set the gradients with respect to the data to zero as well? More generally, will it reset the gradients of tensors that are not tracked by the optimizer?
No, it will only reset the gradients of the parameters registered with it.
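A minimal sketch to illustrate this (the names w, x, and opt are placeholders, not from the thread): only the tensor passed to the optimizer has its .grad cleared; the other tensor keeps the gradient from backward().

```python
import torch

# w is registered with the optimizer; x plays the role of "data" and is not.
w = torch.randn(3, requires_grad=True)
x = torch.randn(3, requires_grad=True)
opt = torch.optim.Adam([w])

loss = (w * x).sum()
loss.backward()            # populates both w.grad and x.grad

opt.zero_grad()            # clears only w.grad (None or zeros, depending on PyTorch version)
assert w.grad is None or torch.all(w.grad == 0)
assert x.grad is not None  # x.grad is untouched by the optimizer
```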
Thank you for your response.
But in my experiments I observe that data.grad is reset. I think this is because on every iteration of the dataloader loop, data is a new tensor (we get different datapoints each iteration), so data.grad starts out as None, and after loss.backward() data.grad holds a fresh, unaccumulated gradient. Is this correct?
If data is a different Tensor each iteration, then yes, it won't have any existing .grad field.
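This is easy to check with a short sketch (the model and shapes here are made up): a fresh tensor each iteration starts with .grad equal to None, and backward() then fills it without accumulating anything from previous iterations.

```python
import torch

model = torch.nn.Linear(4, 2)
for _ in range(3):
    # A brand-new "data" tensor each iteration, as when a DataLoader yields fresh batches.
    data = torch.randn(8, 4, requires_grad=True)
    assert data.grad is None        # fresh tensor: no .grad field yet
    loss = model(data).sum()
    loss.backward()
    assert data.grad is not None    # populated by backward(), not accumulated across iterations
```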