@lais823 and I are seeing very large loss values from BCEWithLogitsLoss. We are working on a vision-and-language problem, and our model also includes attention modules.
We have also followed the guidelines at http://karpathy.github.io/2019/04/25/recipe/ for training a neural network, but at this point we are unsure how to trace the problem back to its source and fix it.
Do you have any suggestions for how we could debug this?