Hello everyone,
Sorry to bother you, but I'm just getting started on some beginner-friendly projects with PyTorch. To practice coding a sequence-to-sequence architecture, I decided to build a video captioning system. I'm using the Kinetics dataset, with only about 1,200 samples.
My video captioning model predicts the same caption for every video, so I'd appreciate some help.
Here is my Colab file link
I think there is a problem in my training loop.
It seems you are using nn.CrossEntropyLoss, which expects raw logits as the model output, but you are applying a softmax to the decoder's output tensor, turning it into probabilities. Try removing the softmax and see if that helps the model train.
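To make the point concrete, here is a minimal sketch (with made-up shapes, not the notebook's actual tensors) of how nn.CrossEntropyLoss is meant to be fed:

```python
import torch
import torch.nn as nn

# Hypothetical shapes: 4 decoder time steps over a 10-word vocabulary.
logits = torch.randn(4, 10, requires_grad=True)  # raw decoder output, no softmax
targets = torch.tensor([1, 3, 0, 7])             # ground-truth token indices

criterion = nn.CrossEntropyLoss()

# Correct: pass raw logits; CrossEntropyLoss applies log-softmax internally.
loss = criterion(logits, targets)
loss.backward()

# Wrong: applying softmax first means the loss effectively softmaxes twice,
# which squashes the gradients and can stall learning:
# probs = torch.softmax(logits, dim=1)
# loss = criterion(probs, targets)  # don't do this
```

If you want probabilities at inference time, apply softmax only there, never inside the training loss path.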
I removed the softmax, but it doesn't help: my model still predicts the same caption for every video. I based my code on the NLP From Scratch: Translation with a Sequence to Sequence Network and Attention — PyTorch Tutorials 1.13.1+cu117 documentation
tutorial, modified the encoder to take a video as input instead of a sentence, and changed the code slightly to use an accelerator; everything else is the same. I tried both the attention decoder and the simple decoder from that tutorial and got the same result. My model is not learning, and the loss plot just bounces up and down randomly.
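For reference, the kind of encoder modification described above (replacing the tutorial's word-embedding encoder with one that consumes video) might look roughly like this. This is a sketch under assumptions, not the actual notebook code: the names, the use of pre-extracted per-frame CNN features, and the sizes are all hypothetical.

```python
import torch
import torch.nn as nn

class VideoEncoderRNN(nn.Module):
    """Hypothetical encoder: consumes per-frame feature vectors
    (e.g. from a pretrained CNN) instead of word embeddings."""
    def __init__(self, feature_size, hidden_size):
        super().__init__()
        self.proj = nn.Linear(feature_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def forward(self, frame_features, hidden=None):
        # frame_features: (batch, num_frames, feature_size)
        x = torch.relu(self.proj(frame_features))
        outputs, hidden = self.gru(x, hidden)
        # outputs feed an attention decoder; hidden seeds the caption decoder
        return outputs, hidden

# Smoke test with made-up sizes: 2 clips, 16 frames, 2048-d frame features.
enc = VideoEncoderRNN(feature_size=2048, hidden_size=256)
outs, h = enc(torch.randn(2, 16, 2048))
```

If an encoder like this checks out shape-wise but the loss still wanders, a common sanity check is to try overfitting a single batch: if the model cannot drive the loss near zero on a handful of samples, the problem is in the training loop rather than the dataset size.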