Hello PyTorch devs and users,
My question may look a bit awkward at first glance, but it turns out to be useful on some occasions. For the model I have been working on, I need to take the derivative of a function and then create its corresponding graph so that I can use it in my cost function. Furthermore, this derivative operator should also operate during the validation/testing phase, because it is a derivative w.r.t. the input rather than the parameters.
However, the problem is that when I employ with torch.no_grad() to do validation (and likewise testing), the autograd package naturally starts to complain. For the time being, I use the following two lines of code to do my validation, though I face deadlocks when I run my model with DistributedDataParallel for distributed training in a single-node/multi-GPU environment:
Is there a way to use the autograd package to take the derivative of a function w.r.t. its inputs inside a with torch.no_grad() block?
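(For context, here is a minimal sketch of the failure mode I mean; the toy model, shapes, and the helper forward_with_input_grad are made up for illustration, not taken from my actual code:)

```python
import torch

def forward_with_input_grad(model, x):
    # The cost function needs dy/dx, so x must track gradients
    # even at validation/testing time.
    x = x.clone().requires_grad_(True)
    y = model(x).sum()
    grad_x, = torch.autograd.grad(y, x, create_graph=True)
    return y, grad_x

model = torch.nn.Linear(3, 1)
x = torch.randn(5, 3)

# Works in grad mode...
y, grad_x = forward_with_input_grad(model, x)

# ...but raises a RuntimeError under no_grad(), because the forward
# pass records no graph for autograd to differentiate through.
with torch.no_grad():
    try:
        forward_with_input_grad(model, x)
    except RuntimeError as err:
        print("autograd complains:", err)
```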
Hi, I’ve also encountered this problem recently, because my network requires some gradients as input (like an error signal). I searched around and asked someone, and it seems to be impossible to calculate gradients inside torch.no_grad(). So I have to do some hacks: I apply torch.no_grad() only to the pipelines that don’t require grad, while manually cleaning the grads after calculating the derivative I need.
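(Roughly like this — a sketch with a made-up toy model, assuming the "error signal" is the gradient w.r.t. the input obtained via backward():)

```python
import torch

model = torch.nn.Linear(3, 1)
x = torch.randn(4, 3, requires_grad=True)

# The grad-requiring part of the pipeline runs OUTSIDE no_grad().
y = model(x).sum()
y.backward()
grad_x = x.grad  # the gradient signal the network needs as input

# no_grad() wraps only the pipelines that don't require grad.
with torch.no_grad():
    out = model(x + grad_x)

# Manually clean the parameter grads that backward() accumulated,
# so validation doesn't pollute the next optimizer step.
model.zero_grad(set_to_none=True)
```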
Oh, I see. I suspected as much. Well… hope springs eternal, as they say. Anyway, the cumbersome issue with my model is that autograd is required deep within one of my modules, so there is not even a slim chance for me to use torch.no_grad() during validation. Probably, I will continue with the workaround that disables autograd on the weights (while keeping it functional for the input) during the validation phase:
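(Something along these lines — a sketch with a made-up toy model; freezing the parameters stops autograd from building graph history on the weights, while the input can still be differentiated:)

```python
import torch

model = torch.nn.Linear(4, 1)
model.eval()

# Disable autograd on the weights only; the input keeps grad.
for p in model.parameters():
    p.requires_grad_(False)

x = torch.randn(2, 4, requires_grad=True)
y = model(x).sum()

# Derivative w.r.t. the input still works during validation.
grad_x, = torch.autograd.grad(y, x)
```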
Then, I will put it back in training mode:
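(That is, undoing the freeze after validation — again a sketch on a made-up toy model, with the frozen/eval state set up inline so the snippet runs on its own:)

```python
import torch

model = torch.nn.Linear(4, 1)
# Validation-time state: weights frozen, eval mode.
for p in model.parameters():
    p.requires_grad_(False)
model.eval()

# Re-enable autograd on the weights and return to training mode.
for p in model.parameters():
    p.requires_grad_(True)
model.train()
```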
It seems this is the only option for me to both avoid the deadlock caused by DDP during single-node/multi-GPU training and keep the autograd package functional on the inputs during the validation phase.
Thanks for the reply, @Dazitu616