Autograd package in a no_grad environment

Hello Pytorch devs and users,

My question may look a bit awkward at first glance, but it turns out to be useful on some occasions. For the model I have been working on, I need to take the derivative of a function and then build its corresponding graph so that the derivative can appear in my cost function. Furthermore, this derivative operator should also run during the validation/testing phase, because it is a derivative w.r.t. the input rather than the parameters.
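To make the setup concrete, here is a minimal, self-contained sketch of the pattern I mean (the model, loss, and tensor shapes are just placeholders, not my actual code):

import torch
import torch.nn as nn

# Placeholder model and loss, only to illustrate the pattern.
model = nn.Sequential(nn.Linear(3, 16), nn.Tanh(), nn.Linear(16, 1))
criterion = nn.MSELoss()

x = torch.randn(8, 3, requires_grad=True)   # derivative is taken w.r.t. the input, not the weights
target = torch.randn(8, 1)

y = model(x)
# create_graph=True keeps the derivative itself differentiable, so it can enter the cost function
grad_x, = torch.autograd.grad(y.sum(), x, create_graph=True)

loss = criterion(y, target) + grad_x.pow(2).mean()
loss.backward()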

However, the problem is that when I employ with torch.no_grad() for validation (and likewise testing), the autograd package naturally starts to complain. For the time being, I use the following two lines of code for my validation, though I face deadlocks when I run my model with DistributedDataParallel for distributed training in a single-node/multi-GPU environment:

myModel.eval()
myModel.requires_grad_(False)

Is there a way to use the autograd package to take the derivative of a function w.r.t. its inputs inside a with torch.no_grad() environment?
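For reference, this is the kind of failure I am talking about (a toy sketch, not my actual model): under no_grad() the forward pass records no graph, so the derivative w.r.t. the input cannot be taken.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(3, 16), nn.Tanh(), nn.Linear(16, 1))
x = torch.randn(8, 3, requires_grad=True)

with torch.no_grad():
    y = model(x)                        # y has no grad_fn under no_grad
    # the next line raises a RuntimeError: y is not connected to x in any graph
    grad_x, = torch.autograd.grad(y.sum(), x)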

Hi, I’ve also run into this problem recently, because my network requires a gradient as one of its inputs (like an error signal). I searched and asked around, and it seems impossible to calculate gradients inside torch.no_grad(). So I had to resort to a hack: I apply torch.no_grad() only to the pipelines that don’t require grad, and manually clean up the gradients after calculating the derivative I need, roughly as sketched below.
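Something along these lines (the network and tensors here are placeholders standing in for my actual pipeline):

import torch
import torch.nn as nn

# Placeholder network standing in for my actual pipeline.
model = nn.Sequential(nn.Linear(3, 16), nn.Tanh(), nn.Linear(16, 1))
x = torch.randn(8, 3, requires_grad=True)

# The part that needs the input derivative runs with grad enabled.
y = model(x)
grad_x, = torch.autograd.grad(y.sum(), x)

# The parts of the pipeline that don't require grad still go under no_grad().
with torch.no_grad():
    metric = (y - grad_x.norm()).abs().mean()

# Manually clean the parameter grads afterwards, as described above.
model.zero_grad(set_to_none=True)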

Oh, I see. I thought so. Well… hope springs eternal, as they say :wink: Anyway, the cumbersome issue with my model is that autograd is required deep within one of my modules, so there is not even a slim chance for me to use torch.no_grad() during validation. I will probably continue with the workaround that disables autograd on the weights (while keeping it functional for the input) during the validation phase:

myDDP.module.eval()
myDDP.module.requires_grad_(False)

Then, I will put it back in training mode:

myDDP.module.train()
myDDP.module.requires_grad_(True)

It seems this is the only option that lets me both avoid the deadlock caused by DDP in single-node/multi-GPU training and keep the autograd package functional on the inputs during the validation phase. A rough sketch of how I put it together is below.
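For completeness, this is roughly how the pieces fit together in my validation loop (myDDP is my DistributedDataParallel-wrapped model; val_loader and criterion are placeholders for my actual data loader and loss, and the usual torch imports are assumed):

myDDP.module.eval()
myDDP.module.requires_grad_(False)       # freeze the weights, but autograd stays usable for the inputs

for inputs, targets in val_loader:
    inputs.requires_grad_(True)                           # derivative is w.r.t. the input
    outputs = myDDP.module(inputs)
    grad_in, = torch.autograd.grad(outputs.sum(), inputs)
    val_loss = criterion(outputs, targets)                # no backward() on the weights here

myDDP.module.train()
myDDP.module.requires_grad_(True)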

Thanks for the reply, @Dazitu616