I’m trying to run the train.py file from the gtamba/pytorch-slim-cnn repository on GitHub, a PyTorch implementation of SlimCNN for Facial Attribute Classification (https://arxiv.org/abs/1907.02157). When I run it on my GPU, every loss value during training is nan. However, when I run it on the CPU, none of the loss values are nan.
When I add the line torch.autograd.set_detect_anomaly(True), training stops with the error message "RuntimeError: Function 'BinaryCrossEntropyWithLogitsBackward0' returned nan values in its 0th output."
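For reference, here is a minimal sketch of how the nan could be narrowed down to the inputs, the logits, the loss, or the gradients. It uses a toy linear model and random data as stand-ins, not the repository's model or data loader, and the helper name `first_nonfinite` is hypothetical.

```python
import torch
import torch.nn as nn


def first_nonfinite(named_tensors):
    """Return the name of the first tensor containing NaN/Inf, or None."""
    for name, tensor in named_tensors:
        if not torch.isfinite(tensor).all():
            return name
    return None


torch.manual_seed(0)
# Same switch as in the post: makes autograd report the backward op that produced NaN.
torch.autograd.set_detect_anomaly(True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy stand-ins (assumptions): a linear layer and random multi-label targets.
model = nn.Linear(8, 4).to(device)
criterion = nn.BCEWithLogitsLoss()
x = torch.randn(16, 8, device=device)
y = torch.randint(0, 2, (16, 4), device=device).float()

logits = model(x)
loss = criterion(logits, y)
loss.backward()

# Check the forward pass in order, so the first offending tensor is reported.
bad_forward = first_nonfinite([("input", x), ("logits", logits), ("loss", loss)])
# Then check every parameter gradient.
bad_grad = first_nonfinite((n, p.grad) for n, p in model.named_parameters())
print("first non-finite forward tensor:", bad_forward)
print("first non-finite gradient:", bad_grad)
```

If the same checks in the real training loop report the logits as the first non-finite tensor, the nan already exists in the forward pass and the backward error is only a symptom.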
What seems to be the issue here, and how can I fix it? I have an NVIDIA RTX 3070 GPU and am on CUDA version 11.4. I’m using torch version 1.10.0. Let me know if there’s any other information I can provide to help diagnose the problem.
Thanks!