Model gives the same output for every video in a PyTorch video captioning system (GRU encoder and GRU language decoder)

Hello everyone,
sorry to bother you, but I am just getting started implementing some beginner-friendly projects with PyTorch. To practice coding a sequence-to-sequence architecture, I decided to build a video captioning system. I used the Kinetics dataset, with only about 1200 samples.
My video captioning model predicts the same caption for every video, so I would appreciate some help.

This is my Colab file link: Google Colab
Please help.

I think there is a problem in my training loop.

It seems you are using nn.CrossEntropyLoss, which expects raw logits as the model output, while you are applying a softmax to the decoder's output tensor, turning it into probabilities.
Remove the softmax and see if that helps the model train.
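To illustrate the point above, here is a minimal sketch (shapes and values are made up for demonstration) of the correct usage: pass the decoder's raw logits straight to nn.CrossEntropyLoss, which applies log-softmax internally. Applying softmax first effectively softmaxes twice, which squashes the logits and shrinks the gradients, so training can stall.

```python
import torch
import torch.nn as nn

# Hypothetical shapes: 4 decoder time steps, vocabulary of 10 tokens.
vocab_size = 10
logits = torch.randn(4, vocab_size, requires_grad=True)  # raw decoder output
targets = torch.tensor([1, 3, 5, 7])                      # ground-truth token ids

criterion = nn.CrossEntropyLoss()

# Correct: feed raw logits; CrossEntropyLoss = log_softmax + NLLLoss internally.
loss = criterion(logits, targets)
loss.backward()  # gradients flow with full magnitude

# Incorrect (what the softmax-in-the-model does): probabilities are in [0, 1],
# so the implicit second softmax sees nearly flat inputs and gradients vanish.
probs = torch.softmax(logits.detach(), dim=1)
bad_loss = criterion(probs, targets)
```

If the tutorial's decoder ends with F.log_softmax and you train with nn.NLLLoss, that combination is also fine; the bug is only mixing a softmax output with nn.CrossEntropyLoss.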

I removed the softmax function, but it does not help: my model still predicts the same text for all videos. I followed the NLP From Scratch: Translation with a Sequence to Sequence Network and Attention — PyTorch Tutorials 1.13.1+cu117 documentation
tutorial, modified the encoder layer to take a video as input instead of a sentence, and changed the code slightly to use an accelerator; everything else is the same. I tried both the attention decoder and the simple decoder from the tutorial and got the same result. My model is not learning, and the loss plot just jumps up and down randomly.
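Since the post above only describes the modified encoder, here is a hedged sketch of what such a change typically looks like: the tutorial's nn.Embedding lookup is replaced by a linear projection of per-frame feature vectors (class name, feature_dim, and hidden_size are assumptions, not the poster's actual code). The final hidden state seeds the decoder exactly as in the tutorial. A quick sanity check for the "same caption for everything" symptom is to verify that two different inputs actually produce different encoder hidden states.

```python
import torch
import torch.nn as nn

class VideoEncoderGRU(nn.Module):
    """Hypothetical video encoder: consumes a sequence of per-frame
    feature vectors (e.g. from a pretrained CNN) instead of word ids."""

    def __init__(self, feature_dim=512, hidden_size=256):
        super().__init__()
        # Replaces nn.Embedding from the translation tutorial.
        self.proj = nn.Linear(feature_dim, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def forward(self, frames):
        # frames: (batch, num_frames, feature_dim)
        x = torch.relu(self.proj(frames))
        outputs, hidden = self.gru(x)
        # outputs: (batch, num_frames, hidden_size) for attention;
        # hidden: (1, batch, hidden_size) to initialize the decoder.
        return outputs, hidden

# Sanity check: distinct videos should yield distinct encoder states.
enc = VideoEncoderGRU()
_, h1 = enc(torch.randn(1, 16, 512))
_, h2 = enc(torch.randn(1, 16, 512))
states_differ = not torch.allclose(h1, h2)
```

If the hidden states for different videos are (nearly) identical, the decoder has no signal to condition on and will collapse to the most frequent caption. A second useful check is to overfit a single batch: with a working loop the loss should drop steadily, so a randomly bouncing loss usually points to a learning rate that is too high, gradients not reaching the encoder, or the hidden state not being passed to the decoder.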