I can confirm that nearest mode backpropagates gradients correctly in the following example:
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 5, 5, requires_grad=True)
y = F.interpolate(x, size=(10, 10), mode='nearest')
y.sum().backward()
print(x.grad)  # every input pixel receives a non-zero gradient
Could you check whether any other operation you use breaks the computation graph, or whether the gradients coming from the later layers are actually zero?
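As a starting point, here is a minimal sketch of how you could check both things: inspect .grad_fn on the model output (a None there means the graph was broken, e.g. by a .detach() or a conversion to numpy), and print the gradient magnitudes reaching each parameter after backward. The TinyNet module below is just a hypothetical placeholder; substitute your own model.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical model for illustration only -- replace with your own module.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 1, kernel_size=3, padding=1)

    def forward(self, x):
        x = self.conv(x)
        # interpolate with mode='nearest' keeps the graph intact
        return F.interpolate(x, size=(10, 10), mode='nearest')

model = TinyNet()
x = torch.randn(1, 1, 5, 5)
out = model(x)

# 1) If grad_fn is None, something detached the output from the graph.
print(out.grad_fn)

# 2) Check the gradient magnitudes actually arriving at each parameter.
out.sum().backward()
for name, p in model.named_parameters():
    print(name, p.grad.abs().max().item())

If grad_fn is set and the printed magnitudes are tiny but non-zero, the graph is fine and the issue is more likely vanishing gradients than a broken graph.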