I have a class that represents a shallow neural network (just one layer), so I implemented that layer myself via matrix multiplications (i.e., without using nn.Linear or any other preimplemented layers). I implemented this directly in a loss function and therefore have no forward method. The network trains as expected, but are there any disadvantages or pitfalls when not overriding the forward method (my class inherits from nn.Module)?
How did you implement your class?
If you like the manual approach, you don’t need to derive a class from nn.Module and can just write the operations as you wish. An nn.Module class helps in managing the internal parameters and makes some workflows easier, e.g. pushing all parameters to the GPU by calling model.cuda().
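As a minimal sketch of that convenience (class and attribute names are made up for illustration): registering the matrices as nn.Parameter attributes of an nn.Module lets the module track them, so parameters() and device moves work without extra bookkeeping.

```python
import torch
import torch.nn as nn

# Hypothetical one-layer module: the weight matrices are registered as
# nn.Parameter, so nn.Module tracks them automatically.
class ShallowNet(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))

model = ShallowNet(4, 2)

# Because the matrices are registered parameters, nn.Module manages them:
print(len(list(model.parameters())))  # 2 (weight and bias)
# model.cuda()  # would move both matrices to the GPU in one call
```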
Thank you for your answer.
So the class basically consists of an __init__ where I initialize the matrices as nn.Parameters to enable training, and then I have a loss function where I calculate the loss using these matrices and some external input (the data). Do you think there will be any unexpected behaviour without the forward method when I inherit from nn.Module? I would actually like to make use of CUDA, since this class is only one part of a bigger model, so I would like to keep it inheriting from nn.Module. Do you think this is a bad idea?
In that case, I would just implement the forward method, since I assume your current loss function is probably doing the same thing forward would do.
If you implement the forward method, you can call the module directly (my_module(data)), it will work inside an nn.Sequential module, and all hooks will be registered properly.
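A short sketch of what that buys you (module name and sizes are hypothetical): once the matrix multiplication lives in forward, the module can be invoked directly and dropped into an nn.Sequential container.

```python
import torch
import torch.nn as nn

# Hypothetical module: the manual matrix multiplication now lives in forward().
class ManualLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features))

    def forward(self, x):
        return x @ self.weight.t()

layer = ManualLinear(4, 2)
x = torch.randn(3, 4)

out = layer(x)                                      # direct call dispatches to forward
seq = nn.Sequential(ManualLinear(4, 2), nn.ReLU())  # works inside nn.Sequential
print(out.shape)  # torch.Size([3, 2])
```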
I’m not sure if there are any significant differences between your custom loss function and what forward would do.
Thank you for your reply.
Do you think I could also just call the forward function from within my loss function? I would rather call that loss function by its name and not call it ‘forward’, since this is more understandable for other people working on my code.
PS: What exactly is a hook? I googled it but didn’t find a comprehensive answer. Can you provide a short explanation or a useful link?
Would calling your custom function inside the forward method work for you? E.g.:

```python
def forward(self, x):
    return self.loss(x)
```
It’s a bit of nitpicking, and if you are sure you won’t use hooks and will always call the custom method, you could of course just go for it.
However, if other users would like to use your module in an nn.Sequential or just use it as a standard nn.Module, they will face a NotImplementedError.

Hooks are used to e.g. get intermediate activations, as explained here.
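A minimal sketch of a forward hook (using a plain nn.Linear as a stand-in module): the hook fires whenever the module is invoked via model(input), and receives the module, its inputs, and its output, which is how intermediate activations are captured.

```python
import torch
import torch.nn as nn

# Storage for captured intermediate activations.
activations = []

def save_activation(module, inputs, output):
    # Forward hooks receive (module, inputs, output) after each forward pass.
    activations.append(output.detach())

model = nn.Linear(4, 2)
handle = model.register_forward_hook(save_activation)

model(torch.randn(3, 4))  # the hook fires because forward was dispatched
print(len(activations))   # 1 -- the hook captured the intermediate output

handle.remove()  # hooks can be detached again when no longer needed
```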
Ok, so just to be sure: is the worst thing that could happen, if I don’t implement the forward method, that people could get a NotImplementedError if they try to call the forward method? Or could there be some unexpected behaviour in the code execution or in the training of the model parameters, due to the hooks or something?
I think the worst thing would be the exception, when trying to use your module in the “normal” way, i.e. calling it directly via model(input).
Also, if users are using hooks for whatever reason, they would need to implement forward manually (which is not the standard use case, but could be annoying).
The first issue is the reason I would rewrite your module if I wanted to use it.
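To make that failure mode concrete, here is a small sketch (class and method names are made up): a module that only defines a custom-named method still works when that method is called by name, but the “normal” model(input) path raises.

```python
import torch
import torch.nn as nn

# Hypothetical module that only defines a custom-named method, no forward().
class LossOnly(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(2, 4))

    def loss(self, x):
        return (x @ self.weight.t()).sum()

model = LossOnly()
model.loss(torch.randn(3, 4))  # calling the custom method by name works fine

try:
    model(torch.randn(3, 4))   # but the standard call path dispatches to forward
except NotImplementedError:
    print("calling model(input) raises NotImplementedError")
```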
Hi, to bring that topic up another time: when trying to display the computational graph using tensorboard, this is not possible if I have a class inheriting from nn.Module without a forward method. But if I implement the forward method, the graph can be displayed. Do you know why this is the case? Does it not build up the computational graph correctly without a forward method, or do you just need these hooks from the forward method to properly display the graph?
As explained before, the forward method will be called if you call the module directly, and hooks etc. will be properly registered. Since the computation graph is created dynamically during the forward pass, each module will most likely be called at some point (I haven’t looked in detail into the tensorboard backend).
Thank you for your answer. I think my question is rather: is it possible that, if I don’t implement the forward method and do everything by means of functions in my nn.Module class that have different names (e.g., ‘likelihood’), these functions are not added to the computational graph (even though I call them during training)? Because tensorboard is only able to draw the computational graph as an image if I rename my custom-made function ‘likelihood’ to ‘forward’. Or is that only due to the fact that this tensorboard script relies somehow on the forward function?

Baseline question: if I don’t name my function ‘forward’ but instead ‘likelihood’ (for example), is the computational graph then still properly built up? I.e., would the results/computations change somehow if I renamed the function to ‘forward’?
The computation graph will be properly built even without any function definitions.
You could write your complete model using the functional API, and Autograd will track each operation as long as you are using parameters that require gradients.
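A small sketch of that point (the method name ‘likelihood’ mirrors the example above and is otherwise arbitrary): Autograd records the graph from the tensor operations themselves, so a custom-named method is tracked and backpropagates exactly like a method named forward.

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(2, 4))

    def likelihood(self, x):
        # Plain tensor ops on a parameter; Autograd tracks them regardless
        # of what this method is called.
        return (x @ self.weight.t()).pow(2).sum()

model = Model()
out = model.likelihood(torch.randn(3, 4))
out.backward()

print(model.weight.grad is not None)  # True -- the graph was built anyway
```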
However, if you are writing a custom nn.Module, I would still recommend implementing the forward method, as e.g. other users or libraries might rely on it. The __call__ method will call into forward, so if some users or libs use your model as:

output = model(input)

your custom function will not be called.
I assume this might be the case for the tensorboard issue, as at some point the model might be called directly with some dummy inputs.
Great, thanks a lot for your answer!