Confused about how to set up a NN

Looking at the examples, it seems there are two ways to initialize a network.
The first is to use nn.Sequential, to which one passes, in order, the layerwise operations one wants the network to have. The other is to define a class inheriting from Module that contains an __init__ and a forward method (and optionally a backward method), where in __init__ one explicitly defines the layerwise operations the network is composed of, and in forward the calculations necessary to go from input to output.
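For reference, the two styles described above might look roughly like this. It is only a minimal sketch: the layer sizes (784, 128, 10) and the class name MLP are made up for illustration and do not come from any particular example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Option 1: nn.Sequential applies the given modules in the order they are passed.
seq_net = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

# Option 2: subclass nn.Module. Layers are declared in __init__,
# and forward describes how an input becomes an output.
class MLP(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, in_features=784, hidden=128, out_features=10):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden)
        self.fc2 = nn.Linear(hidden, out_features)

    def forward(self, x):
        # Arbitrary Python (branches, loops, reuse of layers) could go here.
        return self.fc2(F.relu(self.fc1(x)))

mlp = MLP()
x = torch.randn(4, 784)  # a batch of 4 flattened 28x28 inputs
seq_out = seq_net(x)
mlp_out = mlp(x)
```

Both networks compute the same kind of function here; the class version only starts to pay off once forward needs something that a fixed chain of modules cannot express.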

As I understand it, the second method is useful when one has a more complicated structure, something recursive for example. But then I don’t understand why, save for the one example in the tutorials, I never see nn.Sequential being used anywhere, even in something as simple as an MNIST example. Could there be more advantages to using the second method (instead of just passing modules to nn.Sequential)?

Onto weight initialization. I’m not sure what the proper way to do this is. It seems there are again two options here. The first is to loop over the modules and, depending on the instance type, perform an operation or not (looking at the docs, it seems Linear has the fields weight and bias). Or one could use the parameters generator function of the network, although I’m not sure how one would then differentiate between parameters one wants to change and ones one doesn’t.
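As a rough sketch of the first option, one could loop over model.modules() and check each instance type. The helper name init_weights and the choice of Xavier initialization are assumptions made for illustration, not something the thread prescribes. The last lines show one possible way to tell parameters apart when iterating over parameters instead: named_parameters() yields a name with each one.

```python
import torch.nn as nn

def init_weights(model):
    """Hypothetical helper: re-initialize every nn.Linear layer in the model."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            nn.init.xavier_uniform_(module.weight)  # assumed choice of scheme
            if module.bias is not None:
                nn.init.zeros_(module.bias)

net = nn.Sequential(nn.Linear(20, 16), nn.ELU(), nn.Linear(16, 2))
init_weights(net)

# named_parameters() pairs each parameter with a name such as "0.weight",
# which can be used to select only some of them.
bias_names = [name for name, _ in net.named_parameters() if name.endswith("bias")]
```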
While playing around I noticed, because I was using the ELU activation function, that the alpha parameter seems to be included in the parameters of the model. Does this mean it is also a parameter that will be optimized? How can I disable that (the equivalent of requires_grad=False on a tensor)?
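One possible way to keep an activation's parameter out of training is sketched below. Note the assumption: in current PyTorch releases nn.ELU stores alpha as a plain float rather than a parameter, so nn.PReLU (which does have a learnable weight) is used here as a stand-in. The learning rate is arbitrary.

```python
import torch
import torch.nn as nn

# nn.PReLU stands in for "an activation with a learnable parameter".
net = nn.Sequential(nn.Linear(20, 16), nn.PReLU(), nn.Linear(16, 2))

# Turn off gradients for the activation's parameters.
for module in net.modules():
    if isinstance(module, nn.PReLU):
        for param in module.parameters():
            param.requires_grad_(False)

# Hand the optimizer only the parameters that should still be trained.
trainable = [p for p in net.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.01)
```

Either step alone would be enough to stop updates; doing both makes the intent explicit and avoids computing gradients that are never used.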

Also, have I understood correctly that the type of my features and targets defines where the operations will be run? How can I pick a specific GPU if I have multiple?
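In current PyTorch, placement is usually expressed through a torch.device rather than through the tensor type. A minimal sketch, assuming a hypothetical GPU index of 1 and falling back to the CPU when that GPU is not present:

```python
import torch
import torch.nn as nn

gpu_index = 1  # hypothetical: the second GPU on the machine

# Fall back to the CPU if CUDA or the requested GPU is unavailable.
if torch.cuda.is_available() and torch.cuda.device_count() > gpu_index:
    device = torch.device(f"cuda:{gpu_index}")
else:
    device = torch.device("cpu")

# The model and its inputs must live on the same device.
model = nn.Linear(8, 2).to(device)
features = torch.randn(4, 8, device=device)
output = model(features)  # computed on `device`
```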

To answer your three questions:

  1. We chose to make the examples follow best practices. We don't suggest that users use Sequential except for basic convenience; it becomes inflexible very quickly.

  2. You can use this recently added function to filter out just the ELU parameters and not send them to the optimizer.

  3. You can use the environment variable CUDA_VISIBLE_DEVICES="device_id" to control which GPU to use. For example, CUDA_VISIBLE_DEVICES=2 python # uses GPU-3, i.e. the third GPU (device index 2)
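For the third point, the same variable can also be set from inside a Python script instead of on the command line. This is only a sketch, with the device id 2 taken from the example above; the variable has to be set before CUDA is initialized, and setting it before importing torch is the simplest way to guarantee that.

```python
import os

# Set this before CUDA is initialized (simplest: before `import torch`).
os.environ["CUDA_VISIBLE_DEVICES"] = "2"

# Comma-separated ids are allowed; here only one GPU is exposed.
# Inside this process that single visible GPU is renumbered as device 0.
visible = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
```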

Thank you for your answers.