A portion of my neural network (essentially a Linear layer) gets trained, but a GRU built on top of it does not. This is very strange.
I print out the weights after every gradient update, and only the Linear layers' weights get modified.
My optimizer takes in the parameters of the entire model to update.
There is one Linear layer added before the input to the GRU and another Linear layer added to the output of the GRU. Both of them have their weights updated.
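For reference, here is a minimal self-contained sketch of the setup I am describing (the class and layer names here are placeholders, not my actual code): a Linear layer feeding a GRU feeding a Linear layer, with a single optimizer over all of the model's parameters. Inspecting `p.grad` right after `backward()` is a quicker diagnostic than diffing weights; in this sketch the GRU does receive gradients, so if yours does not, the graph is presumably being detached somewhere before the GRU.

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    """Placeholder sketch: Linear -> GRU -> Linear."""
    def __init__(self, in_dim=8, hidden_dim=16, out_dim=4):
        super().__init__()
        self.pre = nn.Linear(in_dim, hidden_dim)                     # trains
        self.gru = nn.GRU(hidden_dim, hidden_dim, batch_first=True)  # reportedly does not train
        self.post = nn.Linear(hidden_dim, out_dim)                   # trains

    def forward(self, x):
        h = self.pre(x)
        out, _ = self.gru(h)
        return self.post(out)

model = Net()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(2, 5, 8)   # (batch, seq, features)
y = torch.randn(2, 5, 4)

loss = nn.functional.mse_loss(model(x), y)
opt.zero_grad()
loss.backward()
opt.step()

# Check which parameters actually received gradients.
for name, p in model.named_parameters():
    has_grad = p.grad is not None and p.grad.abs().sum().item() > 0
    print(f"{name}: grad present = {has_grad}")
```

If the GRU's entries print `False` here, the loss does not depend on the GRU's parameters through the autograd graph (e.g. a `.detach()`, `.data`, or a tensor rebuilt from NumPy in between).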