Looking at the examples, it seems there are two ways to initialize a network. The first is to use nn.Sequential, to which one passes, in order, the layerwise operations one wants the network to have. The other is to define a class inheriting from Module that contains an __init__ and a forward method (and optionally a backward method), where in __init__ one explicitly defines the layerwise operations the network is composed of, and in forward the calculations necessary to go from input -> output.
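To make the two options concrete, here is a minimal sketch of what I mean. The layer sizes and the class name MLP are made up for illustration, and I'm assuming a recent PyTorch release where tensors can be passed to modules directly.

```python
import torch
import torch.nn as nn

# Option 1: nn.Sequential, modules are applied in the order they are given.
seq_net = nn.Sequential(
    nn.Linear(784, 128),
    nn.ELU(),
    nn.Linear(128, 10),
)


# Option 2: subclass nn.Module, declare the layers in __init__
# and describe how data flows through them in forward.
class MLP(nn.Module):  # hypothetical name, not from any tutorial
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        self.act = nn.ELU()
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))


class_net = MLP()

# Both are called the same way on a batch of 4 flattened 28x28 inputs.
x = torch.randn(4, 784)
out_seq = seq_net(x)
out_cls = class_net(x)
```

As far as I can tell both networks compute the same kind of function; the difference is only in how much control one has over what happens between the layers.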
As I understand it, the second method is useful when one has a more complicated structure, something recursive for example. But then I don't understand why, save for the one example in the tutorials, I never see nn.Sequential being used anywhere, not even in something as simple as the MNIST example (https://github.com/pytorch/examples/blob/master/mnist/main.py). Could there be more advantages to using the second method (instead of just passing modules to nn.Sequential)?
Onto weight initialization. I'm not sure what the proper way to do this is. It seems there are again two options here. The first is to loop over the modules and then, depending on the instance, perform an operation or not (looking at the docs, it seems Linear has the fields weight and bias). Alternatively, one could use the parameters generator function of the network, although I'm not sure how one would then differentiate between parameters one wants to change and ones one doesn't.
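Roughly, the two initialization options I have in mind would look like the sketch below. The helper name init_weights and the choice of Xavier initialization are just assumptions for the example. I'm also assuming a PyTorch version that provides the in-place nn.init functions (the ones with a trailing underscore) and named_parameters.

```python
import torch.nn as nn

net = nn.Sequential(nn.Linear(8, 16), nn.ELU(), nn.Linear(16, 2))


def init_weights(m):
    # Only touch Linear layers; every other module is left as it is.
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)


# Option 1a: loop over all submodules explicitly.
for m in net.modules():
    init_weights(m)

# Option 1b: apply() calls the function on every submodule and on net itself.
net.apply(init_weights)

# Option 2: go through the parameters instead. named_parameters() yields
# (name, parameter) pairs, so the name can be used to tell them apart.
for name, p in net.named_parameters():
    if name.endswith("bias"):
        nn.init.zeros_(p)
```

With named_parameters the only handle is the name string (here "0.weight", "0.bias", "2.weight", "2.bias"), so selecting by module type seems cleaner to me.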
While playing around I noticed, because I was using the ELU activation function, that its alpha parameter also seems to be included in the parameters of the model. Does this mean it is also a parameter that will be optimized? How can I disable that (the equivalent of requires_grad=False on a tensor)?
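What I would try, as a sketch, is to print what is actually registered on the model and then switch off gradients for whatever should stay fixed. Freezing the first Linear layer here is only an illustration, and I'm assuming that setting requires_grad on a parameter and handing only the remaining parameters to the optimizer is a reasonable way to do it. Whether alpha shows up in the listing may depend on the PyTorch version.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(8, 16), nn.ELU(), nn.Linear(16, 2))

# List everything an optimizer would receive from net.parameters().
for name, p in net.named_parameters():
    print(name, tuple(p.shape), p.requires_grad)

# Freeze the parameters of the first layer (illustrative choice).
for p in net[0].parameters():
    p.requires_grad = False

# Pass only the parameters that still require gradients to the optimizer.
trainable = [p for p in net.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.01)
```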
Also, have I understood correctly that the type of my features and targets determines where the operations will be run? How can I pick a specific GPU if I have multiple?
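For the GPU question, this is the kind of thing I imagine, assuming a recent PyTorch release that has torch.device and the .to() method. The choice of the second GPU ("cuda:1") is arbitrary, and the sketch falls back to the CPU when no GPU is present.

```python
import torch
import torch.nn as nn

# Prefer the second GPU if there is one, otherwise the first, otherwise the CPU.
if torch.cuda.device_count() > 1:
    device = torch.device("cuda:1")
elif torch.cuda.is_available():
    device = torch.device("cuda:0")
else:
    device = torch.device("cpu")

# Model parameters and input data have to live on the same device.
net = nn.Linear(8, 2).to(device)
x = torch.randn(4, 8, device=device)
y = net(x)
```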