I’m playing with the Q&A code from Facebook Research and had it working perfectly. It was running with torch 0.4.0 on a GTX 1080 with CUDA 9.0 and Python 3.6.7. I got a new machine with two RTX 2080 Ti cards. I tried just updating everything to the latest: torch 1.1.0, CUDA 10.2, and the latest NVIDIA driver. The app would hang at the torch.ger call. I tried running CPU-only (no CUDA) and got the same result. It goes into the function and never returns.
I downgraded torch to 1.0.1.post2 and it works fine with the CPU. I still need to get all the GPU stuff sorted out to fix the CUDNN_STATUS_EXECUTION_FAILED error but for now I’d just like to figure out why the torch.ger function is not returning.
It’s being passed two tensors of type torch.float32. Comparing the tensors in 1.0.1.post2 vs 1.1.0 in a debugger, they are almost identical. A few values differ by a tiny amount, but nothing that should break it.
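For anyone unfamiliar with the call: torch.ger(a, b) is just the outer product of two 1-D tensors. Here is a pure-Python sketch of the same computation (no torch dependency), which I've been using to sanity-check the values the hanging call should produce:

```python
# torch.ger(a, b) computes the outer product: result[i][j] = a[i] * b[j].
# Pure-Python equivalent, useful for checking expected values
# independently of the torch version.
def outer(a, b):
    return [[x * y for y in b] for x in a]

print(outer([1.0, 2.0], [3.0, 4.0]))  # [[3.0, 4.0], [6.0, 8.0]]
```

Since the math is this simple, the hang is presumably in the backend dispatch rather than the computation itself.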
One additional bit of information: the code uses the multiprocessing library. I read that the PyTorch wrapper around it (torch.multiprocessing) is preferred. I was unable to switch over because of some missing functions like Finalize. But just looking at the ger call, it doesn’t seem like multiprocessing is the problem.
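A side note in case it helps anyone hitting the same wall: Finalize lives in the stdlib’s multiprocessing.util module, so (assuming that’s the Finalize your code needs) it can be imported from there directly even if torch.multiprocessing doesn’t re-export it. A minimal sketch:

```python
# Finalize is defined in multiprocessing.util, not at the top level of
# multiprocessing, so it can be imported directly regardless of whether
# you use the stdlib module or torch.multiprocessing.
import multiprocessing.util as mp_util

class Resource:
    def __init__(self):
        self.closed = False

def cleanup(res):
    res.closed = True

r = Resource()
# Register a finalizer for r; calling it runs cleanup exactly once.
fin = mp_util.Finalize(r, cleanup, args=(r,))
fin()
print(r.closed)  # True
```

This is just how I’d work around the missing re-export; it doesn’t explain the ger hang itself.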
Any ideas what I should be looking at?