The loss seems to decrease, but really slowly.
Checking the gradients, as you already did, is a valid way to see whether you have accidentally broken the computation graph. Since that doesn't seem to be the case, you would have to verify whether your approach using the custom objective function etc. works at all.
To do so, I would recommend trying to overfit a small dataset, e.g. just 10 samples, and making sure your current training routine and model are able to overfit this dataset.
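As a rough sketch of this overfitting check (the model, `custom_loss`, and data shapes here are hypothetical placeholders, not your actual setup; swap in your own objective and model):

```python
import torch
import torch.nn as nn

# Placeholder model and objective; replace with your model and custom loss.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
custom_loss = nn.MSELoss()  # your custom objective goes here

# Tiny fixed dataset: 10 samples the model should be able to memorize.
x = torch.randn(10, 16)
y = torch.randn(10, 1)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(1000):
    optimizer.zero_grad()
    out = model(x)
    loss = custom_loss(out, y)
    loss.backward()
    optimizer.step()
    if epoch % 100 == 0:
        print(f"epoch {epoch}: loss {loss.item():.6f}")
```

If the training routine and objective are sound, the loss on these 10 samples should drop close to zero; if it plateaus, the issue is likely in the objective or the training loop rather than in the data or model capacity.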