My code seems to hang while moving a tensor to the GPU?

Occasionally, my code will seem to ‘hang’ indefinitely at some point during training, with no errors or any other indication of a problem. In Task Manager, ‘python’ still appears to be using both CPU and GPU resources. Besides the lack of training progress, the only sign that something is broken is that Spyder becomes unresponsive if I try to stop the code. At first I thought the problem was Spyder, but I get the same behavior when I run the code directly from the Anaconda prompt (except that the Anaconda prompt does let me Ctrl-C the execution).
After scattering a lot of print statements around the training loop, it seems the code stops while executing these lines:

    # move the current minibatch to the GPU (cuda is a torch.device)
    xbatch = xtemp.to(device=cuda)
    ybatch = ytemp.to(cuda)

I see a lot of people with ‘hanging’ code get told to look at their data loading, but all of my data is in memory (and the dataloader has num_workers=0). Besides, these lines run after the dataloader has already produced xtemp and ytemp…
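
Since CUDA calls are asynchronous, the line that appears to hang may not be the one actually blocking, so I'm bracketing the transfer with explicit synchronizations to rule the copy itself in or out. A rough sketch of what I'm adding (cuda is the torch.device from my script):

    import torch

    # CUDA calls are asynchronous, so flush all queued GPU work first;
    # if this synchronize() is what blocks, the hang started earlier.
    torch.cuda.synchronize()
    print('pre-transfer sync done', flush=True)

    xbatch = xtemp.to(device=cuda)
    ybatch = ytemp.to(cuda)

    # Force the copies themselves to finish before moving on.
    torch.cuda.synchronize()
    print('transfer done', flush=True)

Setting the environment variable CUDA_LAUNCH_BLOCKING=1 before launching should have a similar effect, making each CUDA call synchronous so the prints line up with the call that actually blocks.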

How do I debug this?
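
One idea I'm considering: periodically dumping every Python thread's traceback with the standard-library faulthandler module, so the hang location shows up even when Ctrl-C is swallowed. A minimal sketch (not PyTorch-specific):

    import faulthandler

    # Dump all thread tracebacks to stderr every 60 seconds; if the
    # process hangs, the last dump shows where each thread is stuck.
    faulthandler.dump_traceback_later(60, repeat=True)

    # ... training loop runs here ...

    # Cancel the watchdog once training finishes normally.
    faulthandler.cancel_dump_traceback_later()

I've also seen py-spy dump --pid <pid> suggested for grabbing a stack trace from an already-hung process without touching the code.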

Before I kill it, the GPU stats in Task Manager look like this:

[screenshot: Task Manager GPU stats, 2021-03-01]

So, it’s not immediately clear that the GPU is overworked…
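
For completeness, something like this inside the loop would log PyTorch's own view of GPU memory, so the last print before a hang shows whether usage was climbing (a sketch using torch.cuda's memory counters):

    import torch

    # PyTorch's own accounting of GPU memory, in MiB; Task Manager's
    # numbers include other processes, so this is a cleaner signal.
    alloc = torch.cuda.memory_allocated() / 2**20
    reserved = torch.cuda.memory_reserved() / 2**20
    print(f'allocated {alloc:.0f} MiB, reserved {reserved:.0f} MiB', flush=True)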

After I kill the script from the Anaconda prompt, all I get is:

Any luck? I also encountered a similar issue.