Oh, I didn’t read the forward method. forward is not the right place to initialize the modules.
PyTorch’s nn is a tree structure, and pretty much all the functions, such as model.parameters(), work by iterating over this tree.
Defining layers within forward prevents those methods from being aware that these layers exist and will create tons of silent issues. I would encourage you not to do this.
The most obvious is:
if you define the optimizer before these layers are initialized, by doing optimizer = some_opt(model.parameters()),
the optimizer will not update the layers initialized in forward with their gradients, which will cause your network to perform poorly and will be hard to catch.
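A minimal sketch of that failure mode (the module and its sizes are just for illustration):

```python
import torch
import torch.nn as nn

class BadNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 10)

    def forward(self, x):
        # Created inside forward: it did not exist when the optimizer was built,
        # and it is re-created (re-initialized) on every call.
        self.fc2 = nn.Linear(10, 2)
        return self.fc2(torch.relu(self.fc1(x)))

model = BadNet()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # only sees fc1

out = model(torch.randn(4, 10))
out.sum().backward()
optimizer.step()  # fc1 gets updated, fc2 never does -- silently
```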
Hi there, thanks for the comment. Let me lay out how I understand this, then you can correct me.
I’ve an autoencoder:
```python
class AE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = Encoder()
        self.decoder = None
```

decoder is None because it needs the output of self.encoder(...) to be instantiated.
… …
PS: as I was writing this I realised it could be modified to:
```python
class AE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = Encoder(...)
        self.decoder = Decoder(...)
```
But,
Aren’t there some cases where what I did would make sense, given a decoder that needs parameters returned from the encoder’s forward method to be instantiated, @JuanFMontesinos?
I don’t recall a clear case where you cannot define the architecture without encoder parameters.
Could you provide an example?
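For what it’s worth, here is a rough sketch of the usual workaround (the input_shape and latent_dim names are hypothetical, not from your code): run a dummy tensor through the encoder inside __init__, so the decoder is still built and registered before any optimizer is created:

```python
import torch
import torch.nn as nn

class AE(nn.Module):
    def __init__(self, input_shape=(1, 3, 32, 32)):  # hypothetical example shape
        super().__init__()
        self.encoder = Encoder()
        # Dummy forward pass inside __init__: the decoder exists before the
        # optimizer is created, so model.parameters() sees everything.
        with torch.no_grad():
            latent = self.encoder(torch.zeros(input_shape))
        self.decoder = Decoder(latent_dim=latent.shape[1])  # latent_dim is hypothetical

    def forward(self, x):
        return self.decoder(self.encoder(x))
```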
In addition, there are many “lazy” layers for the case where you don’t know the number of input channels to a module, so it is initialized at runtime: https://pytorch.org/docs/stable/generated/torch.nn.LazyConv2d.html
Most operations in the latest PyTorch versions have an analogous lazy layer (Conv2d, Conv1d, BatchNorm, etc.).
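For example, a small sketch of how those lazy modules behave (the shapes are arbitrary):

```python
import torch
import torch.nn as nn

# The input channel count is never specified; it is inferred on the first forward pass.
model = nn.Sequential(
    nn.LazyConv2d(out_channels=16, kernel_size=3),
    nn.ReLU(),
    nn.LazyBatchNorm2d(),
)

x = torch.randn(1, 3, 32, 32)   # 3 input channels
model(x)                        # materializes the lazy parameters
print(model[0].weight.shape)    # torch.Size([16, 3, 3, 3])
```

Note that the docs suggest doing a dry run like this before creating the optimizer, so the issue above (the optimizer not seeing the parameters) doesn’t bite you.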
I just move my model and the batches to ‘cuda’ with .to() without any problems.
It can also be helpful to use accelerate, which helps move dataloaders, losses, etc. to the GPU.
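A minimal sketch of that pattern, assuming the Hugging Face accelerate package (the model and data are dummies):

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()  # picks up the available device (GPU if present)

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataset = torch.utils.data.TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
loader = torch.utils.data.DataLoader(dataset, batch_size=8)

# prepare() moves the model to the device and makes the dataloader yield device tensors
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

criterion = torch.nn.CrossEntropyLoss()
for inputs, targets in loader:
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    accelerator.backward(loss)  # used in place of loss.backward()
    optimizer.step()
```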