Implementation of a Keras model doesn't converge

ptrblck · May 3, 2020, 3:50am

The weight_decay might be too aggressive in PyTorch, since the optimizer applies it to every parameter you pass in, including biases and batch norm parameters.
Have a look at this post to exclude the batch norm params etc. (or just add the conv parameters to the regularization).
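
One way to limit weight decay to the conv weights is to use optimizer parameter groups. Here is a minimal sketch of that idea, not the code from the linked post: the helper name `split_decay_params`, the toy model, and the hyperparameter values (`1e-4`, `lr=0.1`, `momentum=0.9`) are all placeholders I made up for illustration, so swap in your own model and settings.

```python
import torch
import torch.nn as nn


def split_decay_params(model):
    """Hypothetical helper: conv weights get decay, everything else does not."""
    decay, no_decay = [], []
    for module in model.modules():
        # recurse=False so each parameter is visited once, via its owning module
        for name, param in module.named_parameters(recurse=False):
            if not param.requires_grad:
                continue
            is_conv = isinstance(module, (nn.Conv1d, nn.Conv2d, nn.Conv3d))
            if is_conv and name == "weight":
                decay.append(param)
            else:
                # biases, batch norm weight/bias, and any other parameters
                no_decay.append(param)
    return decay, no_decay


# Toy stand-in model, only here to make the example runnable
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Conv2d(8, 4, kernel_size=3, padding=1),
)

decay, no_decay = split_decay_params(model)

# Per-group options override the optimizer-wide defaults
optimizer = torch.optim.SGD(
    [
        {"params": decay, "weight_decay": 1e-4},   # placeholder value
        {"params": no_decay, "weight_decay": 0.0},
    ],
    lr=0.1,
    momentum=0.9,
)
```

If you are porting a Keras `l2(λ)` kernel regularizer, also note that Keras adds `λ * sum(w**2)` to the loss, while `weight_decay` in `torch.optim.SGD` adds `weight_decay * w` to the gradient, so a matching value would be roughly `weight_decay = 2 * λ`. Worth double checking against your Keras setup.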