I am, for whatever reason, attempting to convert a simple
pytorch-1.0.1 training script to pytorch version 0.3.0.
As far as I can tell,
torch.no_grad() doesn’t exist in 0.3.0.
My use case is that (in the training loop) I run the model
(predict, based on my input data), and then do some sanity
checks and collect some statistics. I wrap these latter two
in a with torch.no_grad(): block. I don’t really understand
what I am doing (but it works …), but I suppose I avoid
cluttering up my graph with cruft (efficiency) and/or changing
the results of calling backward().
What should I do in 0.3.0? Can I just leave out the
with torch.no_grad(): block (because my additional calculations
come after calculating the loss, so the gradients won’t be
affected)? Should I clone / detach any tensors I use in my
sanity / statistics calculations? Is there something in 0.3.0
analogous to no_grad() that I should be using (but with a
different name or semantics)?
Thanks for any help.
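For context, a minimal sketch of the 1.0.1 pattern being described, with a hypothetical model, loss, and statistics (all of these names are placeholders, not the actual script):

```python
import torch
import torch.nn as nn

# hypothetical stand-ins for the real model, loss, and data
model = nn.Linear(4, 2)
criterion = nn.MSELoss()
inputs = torch.randn(8, 4)
targets = torch.randn(8, 2)

preds = model(inputs)              # forward pass: the graph is built here
loss = criterion(preds, targets)
loss.backward()                    # gradients are computed from the graph

# sanity checks / statistics: no_grad() keeps these extra ops out of
# the autograd graph, so they add no graph bookkeeping and cannot
# affect the gradients
with torch.no_grad():
    mean_pred = preds.mean().item()
    max_err = (preds - targets).abs().max().item()
```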
In 0.3 you would use
Variable(torch.Tensor(...), volatile=True) to avoid creating the computation graph.
Let me know if that works for you.
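A minimal 0.3.0-style sketch of that suggestion (the tensor contents and shapes here are placeholders):

```python
import torch
from torch.autograd import Variable

# In 0.3.0, tensors and Variables are separate types; wrapping a tensor
# with volatile=True tells autograd not to build a graph for anything
# computed from it (volatile "infects" all downstream results).
x = Variable(torch.randn(8, 4), volatile=True)

# these operations are excluded from graph construction
y = x * 2 + 1
```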
Thank you for your reply.
To clarify in my concrete case:
The output of running the forward pass, for training, of my model
is a torch.autograd.variable.Variable. Should I retrieve its
FloatTensor by using its
data property, wrap that tensor in a
Variable, a la:
torch.autograd.variable.Variable(preds.data, volatile=True)
and then use this volatile
Variable for any additional calculations?
This workflow would detach the output from the model, so all
following operations are kept out of the computation graph.
preds would still hold on to the computation graph, while
preds.data is detached and Autograd does not track any operations on it.
Is this the use case?
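A short sketch of that workflow, assuming a hypothetical model (the model, shapes, and statistic are placeholders):

```python
import torch
import torch.nn as nn
from torch.autograd import Variable

# hypothetical stand-in for the real model
model = nn.Linear(4, 2)
inputs = Variable(torch.randn(8, 4))

preds = model(inputs)  # preds holds on to the computation graph

# preds.data is the raw tensor with no graph history; re-wrapping it
# with volatile=True keeps every downstream op out of autograd,
# so the statistics cannot touch the gradients
stats_preds = Variable(preds.data, volatile=True)
mean_pred = stats_preds.mean()
```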
Thank you for your clarifying reply.
Yes, I believe that it is.
I will follow up when I’ve tried this out (and if and when I think I
understand it …).
Thank you again. Yes, this works. With this (plus a couple of
other 0.3.0 tweaks) my 0.3.0 training run appears to be statistically
identical to the 1.0.1 version, so it looks like I’m not corrupting
the gradients, or anything.