Hi! I was working on an ASR model in TensorFlow Keras, but now I want to switch to PyTorch. I'm trying to reimplement the Keras model in PyTorch, but I think I made a mistake, because the same model on the same data does not learn in PyTorch.
For the nn.Linear layers that you created, try initializing their weights to something other than the default initialization and see if it makes a difference. You can use https://pytorch.org/docs/stable/nn.html#torch-nn-init for convenience to try different initializations.
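Here is a minimal sketch of what re-initializing the Linear layers could look like. The model below is a hypothetical stand-in, not the ASR model from the question, and the choice of Xavier (Glorot) uniform with zero biases is only one option to try. It is a natural first candidate because, as far as I know, Keras Dense layers default to Glorot uniform weights and zero biases, while nn.Linear uses a different default.

```python
import torch
from torch import nn


def init_linear_layers(model: nn.Module) -> None:
    """Re-initialize every nn.Linear in `model`: Xavier-uniform weights, zero biases."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            nn.init.xavier_uniform_(module.weight)
            if module.bias is not None:
                nn.init.zeros_(module.bias)


# Hypothetical toy model, used only to illustrate the call.
model = nn.Sequential(nn.Linear(40, 64), nn.ReLU(), nn.Linear(64, 10))
init_linear_layers(model)
```

Other functions from torch.nn.init (for example nn.init.kaiming_uniform_ or nn.init.normal_) can be swapped in the same way to compare how training behaves.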
I may have missed it, but a possible reason is that you forgot to zero the gradients for each batch. You only seem to do it once, at the start. Try adding the following INSIDE your training loop:
# Optimizer needs the gradients of this minibatch only, so zero out the previous grads.
optimizer.zero_grad()
loss.backward()  # Calculates derivatives with autograd
optimizer.step()  # Update weights
Thank you, this solved my issue. I forgot to zero out the gradients after each minibatch. Now it works fine.
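To see why the missing zero_grad() call matters, here is a small sketch with a hypothetical toy model and random data (none of it is from the original code). It shows that calling backward() twice without resetting adds the new gradients to the old ones, so the optimizer would step on a sum over all previous minibatches.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy setup, for illustration only.
model = nn.Linear(3, 1)
loss_fn = nn.MSELoss()
x = torch.randn(4, 3)
y = torch.randn(4, 1)

# First backward pass: .grad holds the gradient of this batch.
loss_fn(model(x), y).backward()
grad_once = model.weight.grad.clone()

# Second backward pass WITHOUT zeroing: the new gradient is added to the old one.
loss_fn(model(x), y).backward()
grad_accumulated = model.weight.grad.clone()

# Resetting first gives the gradient of the current batch only.
model.zero_grad()
loss_fn(model(x), y).backward()
grad_after_reset = model.weight.grad.clone()

print(torch.allclose(grad_accumulated, 2 * grad_once))
print(torch.allclose(grad_after_reset, grad_once))
```

In a real training loop, optimizer.zero_grad() plays the role of model.zero_grad() here, as long as the optimizer was built from the model's parameters.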