Could you post a (small) executable code snippet so that we could debug the issue?
Also, are you using multiple workers in your DataLoader? If so, does your code run using num_workers=0?
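To make the num_workers=0 check concrete, here is a minimal sketch. The dataset is a hypothetical stand-in (random tensors), not the code from this thread. With num_workers=0 the data is loaded in the main process, so an exception raised inside the dataset usually shows up with a direct traceback instead of being wrapped by a worker process.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in dataset: 8 samples with 3 features each.
dataset = TensorDataset(torch.randn(8, 3), torch.arange(8))

# num_workers=0 disables worker subprocesses; all loading happens
# in the main process, which makes errors easier to trace.
loader = DataLoader(dataset, batch_size=4, shuffle=False, num_workers=0)

# Iterate once and record the shape of each input batch.
batch_shapes = [tuple(inputs.shape) for inputs, _ in loader]
print(batch_shapes)
```

If the error persists with num_workers=0, the multiprocessing setup is probably not the cause and the bug is more likely in the model or data code itself.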
My problem was solved by fixing a bug: I replaced a plain zero with torch.zeros_like() when initializing a tensor.
In my case num_workers was 16, and I did not change it even after the problem was solved. I had tried setting it to zero, but the same problem still occurred, so I think the problem may be caused by other potential bugs.
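The original code was not posted, so the following is only an illustration of the difference between a plain zero and torch.zeros_like(), not the actual fix. The name `reference` is a hypothetical placeholder for whatever tensor the initialized value has to match.

```python
import torch

# Hypothetical reference tensor whose layout the new tensor should match.
reference = torch.ones(2, 3, dtype=torch.float64)

# A Python scalar carries no shape, dtype, or device information.
init_scalar = 0

# torch.zeros_like copies shape, dtype, and device from the reference,
# so later operations see a tensor with a matching layout.
init_tensor = torch.zeros_like(reference)

print(init_tensor.shape, init_tensor.dtype, init_tensor.device)
```

Because the result lives on the same device as the reference, this form also avoids accidentally mixing CPU and GPU tensors when the reference is on a GPU.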