I trained the model on a sample corpus of shape (21850, 11), but when I try to train on the larger one, which has shape (22555090, 11), the loss starts swinging, as shown in the picture below.
Any help? Thanks.
I use Adam as the optimizer,
log_softmax as the activation function,
NLLLoss as the loss function,
with a learning rate of 1e-3.
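
A minimal sketch of the setup described above, assuming PyTorch (the names log_softmax and NLLLoss suggest it). The model class, the hidden size, the number of classes, the batch size, and the random data are all placeholders for illustration, and treating the 11 columns as input features is also an assumption; only the optimizer, output activation, loss function, and learning rate come from the post.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

NUM_FEATURES = 11  # assumption: the 11 columns of the corpus are input features
NUM_CLASSES = 5    # hypothetical; not stated in the post
HIDDEN = 32        # hypothetical hidden size


class TinyClassifier(nn.Module):
    """Placeholder model; the real architecture is not given in the post."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(NUM_FEATURES, HIDDEN)
        self.fc2 = nn.Linear(HIDDEN, NUM_CLASSES)

    def forward(self, x):
        # log_softmax produces log-probabilities, which is what NLLLoss expects
        return F.log_softmax(self.fc2(torch.relu(self.fc1(x))), dim=1)


model = TinyClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam with lr 1e-3
criterion = nn.NLLLoss()

# Random stand-in batch; the real corpus has shape (22555090, 11)
x = torch.randn(64, NUM_FEATURES)
y = torch.randint(0, NUM_CLASSES, (64,))

losses = []
for step in range(5):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())  # per-step loss, the quantity plotted in the picture
```

The values collected in `losses` are per-batch losses, so on the full corpus each point would reflect a single mini-batch.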