Beginner: Should ReLU/sigmoid be called in the __init__ method?

OK, I see.
Let me try to boil it down to the standard approach and possible advantages of changing this approach.

In the standard use case, you are writing some kind of model with some layers. Most of these layers hold parameters which should be trained; nn.Conv2d would be an example. On the other hand, some layers, like nn.MaxPool2d, don't have any trainable parameters, but are usually still created in __init__. Other "layers", like nn.ReLU, don't have parameters either and can also be seen as simple functions rather than proper layers (at least in my opinion).
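For the nn.ReLU question in particular, both styles are equivalent, since the activation holds no state. Here is a minimal sketch (the class names are just made up for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModuleStyle(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)  # holds trainable parameters
        self.relu = nn.ReLU()                                    # no parameters, but registering it is fine

    def forward(self, x):
        return self.relu(self.conv(x))


class FunctionalStyle(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)

    def forward(self, x):
        # the stateless activation can just be called as a function
        return F.relu(self.conv(x))


x = torch.randn(1, 3, 8, 8)
out = ModuleStyle()(x)      # both variants train exactly the same way
out = FunctionalStyle()(x)
```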
In your forward method you are creating the logic of your forward (and thus also backward) pass.
In a very simple use case you would just call all created layers one by one, passing the output of one layer to the next. However, since the computation graph is created dynamically, you can also create some crazy forward passes, e.g. using a random number to decide how often to repeat some layers, splitting your input into different parts and calling different "sub modules" with each part, using some statistics of your input to select a specific part of your model, etc. You are not bound to any convention here, as long as all the shapes etc. match.
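As a toy sketch of such a dynamic forward pass (the module and its sizes are made up), you could e.g. repeat a layer a random number of times:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin_in = nn.Linear(10, 10)
        self.hidden = nn.Linear(10, 10)
        self.lin_out = nn.Linear(10, 2)

    def forward(self, x):
        x = F.relu(self.lin_in(x))
        # repeat the hidden layer a random number of times (1-3);
        # the graph is rebuilt on every call, so this just works
        for _ in range(torch.randint(1, 4, (1,)).item()):
            x = F.relu(self.hidden(x))
        return self.lin_out(x)


model = DynamicNet()
out = model(torch.randn(4, 10))  # shape: [4, 2]
```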
For me this is one of the most beautiful parts of PyTorch. Basically you can let your imagination flow without worrying too much about some limited API which can only call layers in a sequential manner.
And this is also the reason to break some of these conventions I’ve mentioned before.

Think about a specific use case where you would like to use a conv layer, but for whatever reason you need to access and maybe manipulate its weight often. The first approach of creating the layer in __init__ and applying it in forward would certainly work. However, the weight access might be a bit cumbersome.
So how about we just store the filter weights as an nn.Parameter in __init__ and use the functional API (F.conv2d) in the forward method? Would that work at all? Sure! Since you've properly registered the filter weights in __init__, they will be trained as long as they are used somewhere in the computation during your forward pass.
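A minimal sketch of this idea could look like the following (the class name and shapes are just made up for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ManualConv(nn.Module):
    def __init__(self):
        super().__init__()
        # 16 filters of shape [in_channels=3, 3, 3], registered as parameters
        self.weight = nn.Parameter(torch.randn(16, 3, 3, 3) * 0.01)
        self.bias = nn.Parameter(torch.zeros(16))

    def forward(self, x):
        # easy to inspect or manipulate self.weight right here if needed
        return F.conv2d(x, self.weight, self.bias, stride=1, padding=1)


model = ManualConv()
out = model(torch.randn(1, 3, 8, 8))
print(model.weight.shape)  # shows up in model.parameters() and gets gradients
```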
As you can see, these are somewhat advanced use cases, and I wouldn't say they break any PyTorch semantics; using the functional API is totally fine for them. That said, I would not recommend using the functional API for every layer from now on. It's much easier to use nn.Modules in most use cases.

Have a look at the implementations of torchvision.models.
You will see all kinds of different coding styles depending on the complexity of the problem. While simpler models might be implemented using some nn.Sequential blocks in __init__ and just calling them in forward (e.g. AlexNet), other models will be implemented in a different manner using more functional calls (e.g. Inception, since it's a bit more complicated to split and merge the activations as well as to get the aux loss).
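A rough sketch of the "simple" style (loosely AlexNet-like, but with arbitrary, made-up dimensions) would be:

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # whole blocks built in __init__ and just called in forward
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):
        x = self.features(x)      # e.g. [N, 3, 32, 32] -> [N, 32, 16, 16]
        x = torch.flatten(x, 1)
        return self.classifier(x)


out = TinyNet()(torch.randn(2, 3, 32, 32))  # shape: [2, 10]
```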

To sum it up, my two cents are: use whatever feels good and easy for you. Although there are some "standard" approaches for some use cases, I have to say that even after working with PyTorch for a while now, I probably change one or another coding style every few weeks (because I suddenly have the feeling the code logic is easier to follow using the new approach :wink: ). If you ask 10 devs to implement Inception, you'll probably get 10 different, but all beautiful and useful, implementations.
