Very large loss output by BCEWithLogitsLoss

@lais823 and I are getting very large loss values from BCEWithLogitsLoss. We are working on a vision-and-language problem, and our model also includes attention modules.

We have also followed the guidelines at http://karpathy.github.io/2019/04/25/recipe/ for training a neural network, but we are unsure how to trace back and fix the problem at this point.

What suggestions could you give us?

Hey Mona, could you please paste your model code (especially the forward call) and your training loop? That should make it easier for others to help.
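
In the meantime, it may help to check what scale of loss you should expect. Below is a minimal plain-Python sketch (not PyTorch itself) of the per-element quantity that BCEWithLogitsLoss is documented to compute, written in the usual numerically stable form. The function name `bce_with_logits` and the numbers (a logit of 50, 1000 elements) are made up for illustration and are not taken from your setup.

```python
import math

def bce_with_logits(logit, target):
    """Per-element binary cross-entropy on a raw logit (hypothetical helper).

    Mathematically equal to -[t*log(sigmoid(x)) + (1-t)*log(1-sigmoid(x))],
    rearranged so that exp() never overflows.
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

# An uninformed prediction (logit 0) costs ln(2) per element.
init_loss = bce_with_logits(0.0, 1.0)

# A confidently wrong prediction costs roughly |logit|.
wrong_loss = bce_with_logits(50.0, 0.0)

# Summing instead of averaging scales the loss by the element count
# (illustrative batch of 1000 elements, all at logit 0).
per_elem = [bce_with_logits(0.0, 1.0) for _ in range(1000)]
sum_loss = sum(per_elem)
mean_loss = sum_loss / len(per_elem)

print(init_loss, wrong_loss, sum_loss, mean_loss)
```

If your averaged loss at initialization is far above ln(2) ≈ 0.69, the logits are probably very large in magnitude (for example from unnormalized inputs or an unscaled final layer), or the loss is being summed rather than averaged. These are only guesses until we see the code.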