Including Optional Head parameters in optimizer/correct use of submodules?

So I have been working with a shared network architecture that has multiple “network heads”, each of which can be active depending on the current task. I implemented this as a list to which the created head modules are appended; the head to use is passed as an input parameter, which selects which head the output of the shared layers is put through.

I realised in my work that my system wasn’t learning anything useful and was converging very oddly. I confirmed that gradients were flowing correctly, looked through all sorts of things, and finally realised that the weights in the heads were not updating. The weights in the body were.

So I realised that my optimiser call, which takes the model’s parameters, does not actually include the head parameters. In essence, everything seems to be working; it’s just not being optimized.

So, how do I include the head parameters in the optimizer when it’s implemented this way? Or is there an overall implementation change I should make here? Bear in mind that ideally I would like to simply take in num_tasks as an input.
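For reference, a minimal sketch of the kind of setup described (class and parameter names are hypothetical, not from my actual code). Storing the heads in a plain Python list reproduces the symptom: `model.parameters()` only sees the shared body.

```python
import torch
import torch.nn as nn

class MultiHeadNet(nn.Module):
    """Shared body with one head per task (illustrative sketch)."""

    def __init__(self, num_tasks, in_dim=8, hidden=16, out_dim=4):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        # Plain Python list: these heads are NOT registered as submodules
        self.heads = [nn.Linear(hidden, out_dim) for _ in range(num_tasks)]

    def forward(self, x, task_idx):
        # task_idx selects which head the shared output goes through
        return self.heads[task_idx](self.body(x))

model = MultiHeadNet(num_tasks=3)
# Only the body's parameters show up; the head parameters are missing,
# so an optimizer built from model.parameters() never updates the heads.
names = [n for n, _ in model.named_parameters()]
print(names)  # e.g. only 'body.0.weight' and 'body.0.bias'
```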

Your debugging path sounds quite good.
Most likely you are using a Python list to store the different heads, which won’t properly register the submodules in the parent module (and thus won’t return the head parameters in model.parameters()).
Could you use an nn.ModuleList as a drop-in replacement and check your code again?

Sorry, I forgot to respond to this. Yes, this solved the issue.