Difficulty in training a Conv-LSTM

I am having difficulty training a Conv-LSTM network for image classification.

The blue curve is the test loss and the red curve is the train loss. The loss seems to explode and I am really struggling to handle it.

My encoder is pretty simple: 4 conv layers that take a 256x256 image and reduce it to 8x8. Then I use an LSTM with 3 layers and a hidden size of 256, followed by an FC layer. I have not trained LSTMs much, so any help would be highly appreciated.
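
To make the shapes concrete, here is a minimal sketch of the size arithmetic for the encoder described above. The post does not give kernel sizes, strides, or padding, so the values in `ENCODER_LAYERS` are assumed purely for illustration (one possible way to get from 256 to 8 in 4 layers). Treating each of the 8x8 positions as one LSTM time step is also an assumption, since the post does not say how the feature map is fed to the LSTM.

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of one conv layer (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical (kernel, stride, padding) per layer; not taken from the post.
ENCODER_LAYERS = [(3, 2, 1), (3, 2, 1), (3, 2, 1), (3, 4, 1)]


def encoder_sizes(input_size=256, layers=ENCODER_LAYERS):
    """Return the spatial size before the first layer and after each layer."""
    sizes = [input_size]
    for kernel, stride, padding in layers:
        sizes.append(conv_out_size(sizes[-1], kernel, stride, padding))
    return sizes


sizes = encoder_sizes()

# Assumed: the final feature map is flattened so that every spatial
# position becomes one time step of the LSTM input sequence.
seq_len = sizes[-1] * sizes[-1]

print(sizes)    # spatial size at each stage of the encoder
print(seq_len)  # number of LSTM time steps under the assumption above
```

With these assumed settings the encoder goes 256, 128, 64, 32, 8, and the LSTM would see a sequence of 64 steps. A different choice of strides or pooling would give the same 8x8 output with different intermediate sizes.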