Thanks a lot for this very detailed explanation, Simon! @SimonW
I’m not sure what you mean by optimization being unrolled into many layers. Optimization (you probably mean model optimization) is not even applied here. There is no history, no gradients, only forward results.
‘Optimization’ was a typo, I actually mean computation. For inference based models, e.g. in Adapt DeconvNet, the coefficients in each layer are computed by solving an optimization, which requires many iterations of convolution and transpose convolution. Then the coefficients are sent to the next layer. In BP based networks, the coefficients are solved by a single convolution, which is forward. My previous concern was that if I manually implement the inference optimization, which is similar to the for loop I provided earlier, the underlying PyTorch implementation might build some graph or other structures. Building this graph for an inference optimization is like unrolling the optimization into many forward layers. According to what you said, it seems in this case (requires_grad=False or volatile=True) PyTorch won’t do anything heavy.
Why does dynamic graph imply being not safe?
Just as the above explanation, by ‘safe’ I really mean there is no additional expensive behaviors beyond just the tensor computation.
You have some valid point, but it’s not really that we are narrow minded. It was a design choice to separate things that tracks history for BP (which became Variable) and things that don’t (which became Tensor). And the former is naturally in the .autograd namespace.
Separating Variable and Tensor was great! That simplifies a lot of code migration from CPU numpy code to GPU Tensor code. This was one of the reasons a few peers switched from Tensorflow to PyTorch. Another big reason is this forum is awesome!
And due to popularity reasons and code structure, things like conv layers are only directly supported on Variables. (you can also make them work directly on tensors with a bit of work.)
It’s why I said ‘yet this step is so easy to make’. Actually I was thinking about writing a set of convolution functionals for Tensors. If you can give me some hints or pointers, that will be great! When you use Variable under autograd package to build an inference based model, it generally makes people worry about the graph or other overhead. I actually discussed this with a few other users of PyTorch and Tensorflow. Our general consensus is that when you try to build an inference model, this design makes things much less straight forward. And overall, only Tensors and convolutions (for Tensors) are needed.
Futhermore, this volatile=True option (you don’t need requires_grad=False if you set volatile=True) already gives you very similar experience to directly working on tensors.
Okay, then I will use this while waiting for the next version.
Moreover, as @albanD mentioned above, we have merged the two classes together.
I didn’t completely get this. Does that mean when they are merged, there will be a set of convolution function provided for clean tensors? If not, I would like to start to implement these functions though I might need some minimal instructions.
So I don’t really get the reason for this complaint.
Maybe narrow minded was a bit too strong. (We actually discussed what was the right word to express the feeling. It’s really just a wish that PyTorch can become a better framework for theoretical modeling development.) We actually think PyTorch did a great job on the Tensor part. If a set of convolution functions are provided for clean Tensors, the framework is going to be much more perfect to support many inference based models. If my complaint was too strong, I apologize for that since PyTorch really did a good job.