Weird behaviour of training loss

I have built a NN that contains Conv3d, LSTM, and Conv2d layers. I'm using MSELoss to map a 3-dimensional input tensor to a 2-dimensional output, where each pixel in the target has the same positive value, so the output at every pixel should also be the same. The training loss plot for a single sample looks like the following picture.


I normalized the input data and used Xavier uniform initialization for the weights. Moreover, I decrease the learning rate with a scheduler and clip the gradients with

torch.nn.utils.clip_grad_norm_(model.parameters(), 1000)
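To make the setup concrete, here is a minimal, self-contained sketch of the training loop I described (the tiny `nn.Sequential` model, the Adam optimizer, and the `StepLR` scheduler are stand-ins for illustration, not my actual Conv3d/LSTM/Conv2d network):

```python
import torch
import torch.nn as nn

# Placeholder model standing in for the Conv3d/LSTM/Conv2d network
model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 1))

# Xavier uniform initialization of the weights, as described above
def init_weights(m):
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)

model.apply(init_weights)

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Learning-rate scheduler (StepLR chosen here for illustration)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

x = torch.randn(4, 8)   # normalized input (dummy shape)
y = torch.rand(4, 1)    # positive-valued target

for epoch in range(3):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    # Note: max_norm=1000 is very permissive and rarely clips anything;
    # values around 1.0 are more typical if clipping is actually intended
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1000)
    optimizer.step()
    scheduler.step()
```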

Any suggestions on what might cause this behaviour?