What happens when we use torch.no_grad() in the middle of forward pass?

I see some code that uses with torch.no_grad() in the middle of the forward pass of a model, and I was wondering what happens when we do that. Does it mean there is no gradient for the part under the with torch.no_grad()? What are the things I need to be careful about when I do this?

Yes, it affects tensors newly created inside the wrapped code block: they are created with requires_grad=False, so the usual .requires_grad propagation is disabled and no autograd graph is recorded for those operations.
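
A minimal sketch of that behavior (illustrative only, using a made-up tensor x):

import torch

x = torch.randn(3, requires_grad=True)

y = x * 2                   # recorded in the autograd graph
print(y.requires_grad)      # True
print(y.grad_fn)            # <MulBackward0 ...>

with torch.no_grad():
    z = x * 2               # not recorded: result has requires_grad=False
print(z.requires_grad)      # False
print(z.grad_fn)            # None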

So would it make sense to say that if I put that in the middle of my operations in the forward pass, then the gradient will still flow, but it won't update the parameters for that specific area under the with torch.no_grad()?

It breaks the flow. For example:

b = f1(a)
with torch.no_grad():
    c = f2(b, p)
d = f3(c)

If p is a parameter, it disables its update, yes. But the c -> b edge of the graph is also disabled, so backpropagation never reaches the b -> a edge either.
So, in this situation, you’d normally use c = f2(b, p.detach()) instead; no_grad() in forward() is more for auxiliary paths, not for segments in the middle of the main path.
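
As a rough sketch of the detach variant (f1/f2/f3 here are just stand-in elementwise operations, not anything from your model), gradients keep flowing from d back to a, while p itself receives no gradient:

import torch

a = torch.randn(4, requires_grad=True)
p = torch.nn.Parameter(torch.randn(4))

b = a * 3                   # stand-in for f1
c = b + p.detach()          # stand-in for f2: p is cut out of the graph, b is not
d = c.sum()                 # stand-in for f3

d.backward()
print(a.grad)               # populated: gradient flowed through c and b back to a
print(p.grad)               # None: the detached parameter gets no update

If you instead wrapped the middle step in no_grad(), c would not require grad at all, so a.grad would stay None (and, if f3 has no parameters of its own, d.backward() would simply error out because nothing in the graph requires grad).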