Implementation of a Keras model doesn't converge

The weight_decay might be too aggressive in PyTorch, since the optimizer applies it to every parameter you pass in, including biases and batch norm parameters.
Have a look at this post on excluding batch norm params etc. from weight decay (or apply the regularization to the conv parameters only).
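
In case it helps, here is a minimal sketch of the parameter-group approach. The helper name `split_decay_params`, the toy model, and the values for `weight_decay` and `lr` are placeholders I made up for illustration, not taken from your code. I'm also assuming you want decay on conv and linear weights only; drop `nn.Linear` from the check if you only want the conv weights regularized.

```python
import torch
import torch.nn as nn


def split_decay_params(model, weight_decay=1e-4):
    """Build optimizer param groups (hypothetical helper).

    Conv/linear weights get weight decay; everything else
    (biases, batch norm weight and bias, ...) gets none.
    """
    decay, no_decay = [], []
    for module in model.modules():
        # recurse=False: only parameters owned directly by this module
        for name, param in module.named_parameters(recurse=False):
            if not param.requires_grad:
                continue
            is_weighted_layer = isinstance(
                module, (nn.Conv1d, nn.Conv2d, nn.Conv3d, nn.Linear)
            )
            if is_weighted_layer and name == "weight":
                decay.append(param)
            else:
                no_decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


# Toy model, only here to show the grouping (expects 3x8x8 inputs)
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 6 * 6, 10),
)

# Per-group weight_decay overrides the optimizer-level default
optimizer = torch.optim.SGD(split_decay_params(model), lr=0.1, momentum=0.9)
```

Note that this sketch doesn't handle parameters shared between modules; those would be added to a group more than once.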
