Hi! I was working on an ASR model in TensorFlow Keras, but now I want to switch to PyTorch. I'm trying to reimplement the Keras model in PyTorch, but I think I made a mistake, because the same model on the same data does not learn in PyTorch.
Here is a full Jupyter notebook of my problem: notebook on GitHub
As you can see, the TF model overfits the random data, as expected, but the PyTorch model does not learn anything.
I'm using PyTorch 1.1.0 with CUDA, and TensorFlow 2.0.0-beta1.
Initialization of weights will matter here.
For the nn.Linear layers that you created, initialize their weights to something other than the default initialization and see if it makes a difference. You can use https://pytorch.org/docs/stable/nn.html#torch-nn-init for convenience to try different initializations.
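A minimal sketch of what that could look like, assuming a small stand-in model rather than the one from the notebook. The layer sizes and the helper name `init_linear` are made up for illustration; `nn.init.xavier_uniform_`, `nn.init.zeros_` and `Module.apply` are the standard PyTorch calls.

```python
import torch
import torch.nn as nn

def init_linear(m):
    # Module.apply calls this on every submodule; only nn.Linear layers are touched.
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)  # Glorot/Xavier uniform initialization
        nn.init.zeros_(m.bias)             # start biases at zero

# Hypothetical stand-in model; the sizes are arbitrary.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
model.apply(init_linear)
```

Swapping `xavier_uniform_` for another function from `torch.nn.init` (for example `kaiming_normal_`) is enough to try a different scheme.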
Could you post a simple example of how to properly initialize weights?
Do I have to re-initialize LSTM weights after each epoch or sample?
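OK, I initialized all my Linear weights based on this comment, but PyTorch is still not learning:
import numpy as np
import torch.nn as nn

def weights_init(m):
    if isinstance(m, nn.Linear):
        size = m.weight.size()
        fan_out = size[0]  # number of rows
        fan_in = size[1]  # number of columns
        std = np.sqrt(2.0 / (fan_in + fan_out))  # Xavier/Glorot standard deviation
        m.weight.data.normal_(0.0, std)

baseline_model = FCBaseline(SEGMENT_WIDTH, SEGMENT_HEIGHT, SEGMENT_CHANNELS, num_classes)
baseline_model.apply(weights_init)  # run weights_init on every submodule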
I may have missed it, but a possible reason is that you forgot to zero the gradients before/after running each batch. You only seem to do it once, at the start. Try adding the following INSIDE your training loop:
optimizer.zero_grad()
Does this solve your issue?
See here for an example or here for the reason why this is needed.
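To illustrate the reason with a small sketch (the tensor `w` and the toy loss are made up for the demonstration): autograd adds each new gradient to whatever is already stored in `.grad`, so without zeroing, every `backward()` call piles on top of the previous ones.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: d(sum(3 * w))/dw is 3 for every element.
(3 * w).sum().backward()
grad_after_first = w.grad.clone()

# Second backward pass without zeroing: the new gradient is added to the old one.
(3 * w).sum().backward()
grad_after_second = w.grad.clone()

# Clear the buffer by hand; optimizer.zero_grad() does the equivalent
# for every parameter the optimizer manages.
w.grad.zero_()
grad_after_zero = w.grad.clone()

print(grad_after_first, grad_after_second, grad_after_zero)
```

The second gradient comes out twice as large as the first, which is why a training loop that never zeroes its gradients ends up stepping along a running sum of all past minibatches.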
Thank you, this solved my issue. I forgot to zero out the gradients after each minibatch. Now it works fine.
# Optimizer needs the gradients of this minibatch only, so zero out prev grads.
optimizer.zero_grad()
loss.backward()  # Calculates derivatives with autograd
optimizer.step()  # Update weights
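For anyone landing here later, a self-contained sketch of a full loop with the fix in place. This is not the notebook's model: the random data, layer sizes, batch size and learning rate are all assumptions chosen so the example runs on its own and overfits quickly.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Random data standing in for the notebook's inputs; shapes are arbitrary.
x = torch.randn(64, 10)
y = torch.randint(0, 3, (64,))

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 3))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

losses = []
for epoch in range(100):
    for start in range(0, len(x), 16):  # minibatches of 16 samples
        xb, yb = x[start:start + 16], y[start:start + 16]
        optimizer.zero_grad()           # drop gradients left from the previous minibatch
        loss = criterion(model(xb), yb)
        loss.backward()                 # compute gradients for this minibatch only
        optimizer.step()                # update weights
    losses.append(loss.item())          # loss of the last minibatch in each epoch

print(losses[0], losses[-1])
```

Calling `zero_grad()` at the top of the inner loop rather than once before training is the whole difference from the broken version.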