NaN loss appearing after some time

You could add a normalization layer. Alternatively, try dividing your data by a constant first (for example, the maximum value in your data). The idea is to keep the values small enough that they don't produce very large gradients, which can eventually push the loss to NaN.
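
Here is a minimal sketch of the divide-by-a-constant idea in plain Python. It assumes the constant is the largest absolute value in the training data; the function name `scale_by_max_abs` and the sample numbers are made up for illustration, and in a real pipeline you would probably do the same thing on arrays or tensors.

```python
def scale_by_max_abs(values):
    """Divide every value by the largest absolute value so results lie in [-1, 1].

    Returns the scaled values and the constant that was used.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        # All zeros: nothing to scale, and dividing would fail.
        return list(values), 1.0
    return [v / max_abs for v in values], max_abs


# Hypothetical raw inputs with a large range.
train_inputs = [1200.0, -350.0, 87.5, 4000.0]
scaled, scale = scale_by_max_abs(train_inputs)

# Apply the SAME constant to new data instead of recomputing it.
new_inputs = [2000.0, -100.0]
new_scaled = [v / scale for v in new_inputs]
```

Keeping the constant from the training data and reusing it for validation or test inputs means the model sees values on the same scale everywhere.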
