Is it necessary to override the forward method for a class that inherits from nn.Module?

Hi all,

I have a class that represents a shallow neural network (just one layer), which is why I have implemented that layer myself via matrix multiplications (i.e., without using nn.Linear or other built-in layers). I have implemented this directly in a loss function and therefore have no forward method. The network trains as expected, but are there any disadvantages or pitfalls when not overriding the forward method (my class inherits from nn.Module)?

How did you implement your class?
If you like the manual approach, you don’t need to derive a class from nn.Module and can just write the operations as you wish. An nn.Module subclass helps in managing the internal parameters and makes some workflows easier, e.g. pushing all registered parameters to the GPU by calling .to('cuda').
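For example, a minimal sketch (it assumes a CUDA device is available, and the class name is made up):

import torch
import torch.nn as nn

class Manual(nn.Module):
    def __init__(self):
        super().__init__()
        self.W = nn.Parameter(torch.randn(3, 2))  # registered parameter: moved by .to()
        self.scale = torch.ones(2)                # plain tensor attribute: NOT moved

m = Manual()
m.to('cuda')
print(m.W.device)      # cuda:0 -- nn.Module tracked it for you
print(m.scale.device)  # cpu    -- you would have to move this one yourself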

Thank you for your answer.
So the class basically consists of an __init__ where I initialize the matrices as nn.Parameters to enable training, and then I have a loss function where I calculate the loss using these matrices and some external input (the data). Do you think there will be any unexpected behaviour without the forward method when I inherit from nn.Module? I would actually like to make use of CUDA, since this class is only one part of a bigger model, so basically I would like to keep it inheriting from nn.Module. Do you think this is a bad idea?
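Simplified, the structure looks roughly like this (all names are made up):

import torch
import torch.nn as nn

class ShallowNet(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.W = nn.Parameter(torch.randn(d_in, d_out))  # trainable matrix

    def loss(self, x, target):
        out = x @ self.W                      # the manual one-layer computation
        return ((out - target) ** 2).mean()   # loss calculated directly here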

In that case, I would just implement the forward method, since I assume your current loss function is probably doing the same thing forward would do.
If you implement the forward method, you can call the module directly (my_module(data)), it will work inside an nn.Sequential container, and all hooks will be registered properly.

I’m not sure if there are any significant differences between your custom loss function and what forward would do.

Thank you for your reply.
Do you think I could also just call the forward method from within my loss function? I would rather call that loss function by its name and not call it ‘forward’, since that is more understandable for other people working on my code.
PS: What exactly is a hook? I googled it but didn’t find a comprehensive answer. Can you provide a short explanation or a useful link?

Would calling your custom function inside the forward work?
E.g.

...
def forward(self, x):
    # forward just delegates to the existing loss computation
    loss = self.loss(x)
    return loss

It’s a bit of nitpicking, and if you are sure you won’t use hooks and will always call the custom method, you could of course just go for it.
However, if other users would like to use your module in an nn.Sequential or just use it as a standard nn.Module, they will face a NotImplementedError.
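E.g. (a small sketch of the failure mode; the class name is made up):

import torch
import torch.nn as nn

class NoForward(nn.Module):
    def __init__(self):
        super().__init__()
        self.W = nn.Parameter(torch.randn(3, 3))

m = NoForward()
m(torch.randn(1, 3))  # raises NotImplementedError, since forward was never defined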

Hooks are used to, e.g., get intermediate activations, as explained here.
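A minimal example of a forward hook (it fires whenever the module’s forward runs):

import torch
import torch.nn as nn

activations = {}

def save_activation(module, inp, out):
    # a forward hook receives (module, input, output)
    activations['linear'] = out.detach()

layer = nn.Linear(4, 2)
handle = layer.register_forward_hook(save_activation)

layer(torch.randn(1, 4))            # calling the module triggers the hook
print(activations['linear'].shape)  # torch.Size([1, 2])
handle.remove()                     # remove the hook when you are done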

OK, so just to be sure: is the worst thing that could happen if I don’t implement the forward method that people could get a NotImplementedError if they try to call the forward method? Or could there be some unexpected behaviour in the code execution or in the training of the model parameters, due to the hooks or something?

I think the worst thing would be the exception when trying to use your module in the “normal” way, i.e. calling it directly via module(data).
Also, if users are using hooks for whatever reason, they would need to implement forward manually (which is not the standard use case, but could be annoying).

The first issue is why I would rewrite your module if I wanted to use it :wink:

Hi, to bring this topic up another time:
When trying to display the computational graph using TensorBoard, this is not possible if I have a class inheriting from nn.Module without a forward method. But if I implement the forward method, the graph can be displayed. Do you know why this is the case?
Does it not build up the computational graph correctly without a forward method? Or do you just need these hooks from the forward method to properly display the graph?

As explained before, the forward method will be called if you call the module directly, and hooks etc. will be properly registered.
Since the computation graph is created dynamically during the forward pass, each module will most likely be called at some point (I haven’t looked in detail into the TensorBoard backend).

Thank you for your answer. I think my question is rather: is it possible that, if I don’t implement the forward method and do everything by means of functions in my nn.Module class that have different names (e.g., ‘likelihood’), these functions are not added to the computational graph (even though I call them during training)?
Because TensorBoard is only able to draw the computational graph as an image if I rename my custom-made function ‘likelihood’ to ‘forward’. Or is that only due to the fact that this TensorBoard script relies somehow on the forward function?
Baseline question: if I don’t name my function ‘forward’ but instead ‘likelihood’ (for example), is the computational graph then still properly built up? I.e., would the results/computations change somehow if I renamed the function to ‘forward’?

The computation graph will be properly built even without any function definitions.
You could write your complete model using the functional API, and Autograd will track each operation as long as you are using parameters which require gradients.
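E.g. something like this (a minimal sketch with made-up shapes):

import torch

# two raw parameters; Autograd tracks every op on tensors that require grad
W = torch.randn(3, 2, requires_grad=True)
b = torch.zeros(2, requires_grad=True)

x = torch.randn(5, 3)
target = torch.randn(5, 2)

out = x @ W + b                      # plain matrix multiplication, no nn.Module at all
loss = ((out - target) ** 2).mean()
loss.backward()                      # the graph was built dynamically during these ops

print(W.grad.shape)  # torch.Size([3, 2]) -- gradients arrive regardless of function names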

However, if you are writing a custom nn.Module, I would still recommend implementing the forward method, as e.g. other users or libraries might rely on it.
Internally, the __call__ method will call into forward, so if some users or libs use your model as:

output = model(input)

your custom function will not be called.
I assume this might be the case for the TensorBoard issue, as at some point the model might be called directly with some dummy inputs.
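I haven’t verified the TensorBoard backend, but the usual pattern looks like this (a sketch; the model is made up), and add_graph has to be able to call the model:

import torch
import torch.nn as nn
from torch.utils.tensorboard import SummaryWriter

class Shallow(nn.Module):
    def __init__(self):
        super().__init__()
        self.W = nn.Parameter(torch.randn(3, 2))

    def forward(self, x):
        return x @ self.W

writer = SummaryWriter()
# add_graph traces the model by calling it with the dummy input,
# i.e. via __call__ -> forward; without a forward this tracing fails
writer.add_graph(Shallow(), torch.randn(1, 3))
writer.close()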

Great, thanks a lot for your answer!