Suppose we have the following setup:
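import torch.nn.functional as F
from torch.optim import Adam

optimizer = Adam(model.parameters())

for (data, labels) in dataloader:
    data.requires_grad_()  # assumed: needed for data.grad to be populated at all
    output = model(data)
    loss = F.nll_loss(output, labels)
    loss.backward()
    data.grad  # will this be reset to zero after every iteration by optimizer.zero_grad()?
    optimizer.step()
    optimizer.zero_grad()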

So my question is: will optimizer.zero_grad() also set the gradients with respect to the data to zero and, more generally, will it zero the gradients of tensors that are not tracked by the optimizer?

In my experiments, data.grad does appear to be reset on every iteration. I think that is because each iteration of the dataloader loop yields a new data tensor (a different batch of data points), so its data.grad starts out as None, and after loss.backward() it holds a fresh, unaccumulated gradient. Is this correct?
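
One way to check this hypothesis is a small standalone experiment. The sketch below is only an illustration, not the original training code: the Linear model, the tensor shapes, and the helper is_cleared are made up for the test. It keeps one input tensor x alive across two backward passes, so any resetting by optimizer.zero_grad() would be visible, and then compares that with a freshly created tensor.

```python
import torch
from torch.optim import Adam

torch.manual_seed(0)
model = torch.nn.Linear(3, 2)  # stand-in model, chosen only for this test
optimizer = Adam(model.parameters())

# A persistent input that requires grad but is NOT registered with the optimizer.
x = torch.randn(4, 3, requires_grad=True)

def is_cleared(t):
    # zero_grad() either zeroes .grad or sets it to None,
    # depending on the PyTorch version and the set_to_none argument.
    return t.grad is None or bool((t.grad == 0).all())

model(x).sum().backward()
first_grad = x.grad.clone()
optimizer.step()
optimizer.zero_grad()

params_cleared = all(is_cleared(p) for p in model.parameters())
x_cleared = is_cleared(x)

# A second backward pass on the same tensor adds to the existing x.grad.
model(x).sum().backward()
x_accumulated = not torch.allclose(x.grad, first_grad)

# A brand-new tensor, like a new batch from a dataloader, has no gradient yet.
fresh = torch.randn(4, 3, requires_grad=True)
fresh_grad_before = fresh.grad

print("parameter grads cleared:", params_cleared)
print("x.grad cleared:", x_cleared)
print("x.grad accumulated on second backward:", x_accumulated)
print("fresh.grad before backward:", fresh_grad_before)
```

If the hypothesis holds, the parameter gradients are cleared while x.grad is left untouched and keeps accumulating, and the fresh tensor starts with grad equal to None. That would mean the apparent reset of data.grad in the training loop comes from each batch being a new tensor, not from optimizer.zero_grad().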