I am, for whatever reason, attempting to convert a simple
pytorch-1.0.1 training script to pytorch version 0.3.0.
As far as I can tell,
torch.no_grad() doesn’t exist in 0.3.0.
My use case is that (in the training loop) I run the model
(predict, based on my input data), and then do some sanity
checks and collect some statistics. I wrap these latter two
in a with torch.no_grad(): block. I don’t really understand
what I am doing (but it works …), but I suppose I avoid
cluttering up my graph with cruft (efficiency) and/or changing
the results of calling backward().
What should I do in 0.3.0? Can I just leave out the
with torch.no_grad(): block (because my additional calculations
come after calculating the loss, so the gradients won’t be
affected)? Should I clone / detach any tensors I use in my
sanity / statistics calculations? Is there something in 0.3.0
analogous to no_grad() that I should be using (but with a
different name or semantics)?
Thanks for any help.
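For context, a minimal sketch of the 1.0.1 pattern being described, with a hypothetical model, loss, and statistics (all of these names are placeholders, not the actual script):

```python
import torch
import torch.nn as nn

# hypothetical stand-ins for the real model, loss, and data
model = nn.Linear(4, 2)
criterion = nn.MSELoss()
inputs = torch.randn(8, 4)
targets = torch.randn(8, 2)

preds = model(inputs)              # forward pass: the graph is built here
loss = criterion(preds, targets)
loss.backward()                    # gradients are computed from the graph

# sanity checks / statistics: no_grad() keeps these extra ops out of
# the autograd graph, so they add no graph bookkeeping and cannot
# affect the gradients
with torch.no_grad():
    mean_pred = preds.mean().item()
    max_err = (preds - targets).abs().max().item()
```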
In 0.3 you would use
Variable(torch.Tensor(...), volatile=True) to avoid creating the computation graph.
Let me know if that works for you.
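A minimal 0.3.0-style sketch of that suggestion (the tensor contents and shapes here are placeholders):

```python
import torch
from torch.autograd import Variable

# In 0.3.0, tensors and Variables are separate types; wrapping a tensor
# with volatile=True tells autograd not to build a graph for anything
# computed from it (volatile "infects" all downstream results).
x = Variable(torch.randn(8, 4), volatile=True)

# these operations are excluded from graph construction
y = x * 2 + 1
```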
Thank you for your reply.
To clarify in my concrete case:
The output of running the forward pass, for training, of my model
is a torch.autograd.variable.Variable. Should I retrieve its
FloatTensor by using its
data property, wrap that tensor in a
Variable, a la:
torch.autograd.variable.Variable(preds.data, volatile=True)
and then use this volatile
Variable for any additional calculations?
This workflow would detach the output from the model, so all
following operations are kept out of the computation graph.
preds would still hold on to the computation graph, while
preds.data is detached and Autograd does not track any operations on it.
Is this the use case?
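A short sketch of that workflow, assuming a hypothetical model (the model, shapes, and statistic are placeholders):

```python
import torch
import torch.nn as nn
from torch.autograd import Variable

# hypothetical stand-in for the real model
model = nn.Linear(4, 2)
inputs = Variable(torch.randn(8, 4))

preds = model(inputs)  # preds holds on to the computation graph

# preds.data is the raw tensor with no graph history; re-wrapping it
# with volatile=True keeps every downstream op out of autograd,
# so the statistics cannot touch the gradients
stats_preds = Variable(preds.data, volatile=True)
mean_pred = stats_preds.mean()
```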
Thank you for your clarifying reply.
Yes, I believe that it is.
I will follow up when I’ve tried this out (and if and when I think I
understand it …).
Thank you again. Yes, this works. With this (plus a couple of
other 0.3.0 tweaks) my 0.3.0 training run appears to be statistically
identical to the 1.0.1 version, so it looks like I’m not corrupting
the gradients, or anything.