Hey PyTorch Community,
I tried rerunning my existing code base, which works with an “older” version of PyTorch but does not run with the current stable or nightly release. The error occurs in combination with multiprocessing: when I retrieve the network's state dict and write it to a variable, I get a RuntimeError (see the stack trace below).
Interestingly, the code works with PyTorch ‘1.0.0a0+95ca667’ but fails with ‘1.0.0.dev20190123’. I couldn’t spot anything in the release notes that would explain this behavior. Is anybody aware of changes between the alpha and the dev version that could cause this issue?
File "/home/xxx/xxxxxx.py", line 249, in _run_train
    _ = output_queue.get_nowait()
File "/home/xxxxx/miniconda3/lib/python3.7/multiprocessing/queues.py", line 126, in get_nowait
File "/home/xxx/miniconda3/lib/python3.7/multiprocessing/queues.py", line 113, in get
File "/home/xxxxx/miniconda3/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 102, in rebuild_cuda_tensor
RuntimeError: cuda runtime error (10) : invalid device ordinal at /opt/conda/conda-bld/pytorch-nightly_1548239643329/work/aten/src/THC/THCCachingAllocator.cpp:637
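
In case it helps anyone hitting the same error: the trace ends in rebuild_cuda_tensor, so the object coming off the queue seems to contain CUDA tensors that get rebuilt in the receiving process. One possible workaround, assuming the state dict does not have to stay on the GPU while it is in transit, is to copy it to the CPU before putting it on the queue. This is only a sketch: the helper name `cpu_state_dict` is made up for illustration, it is untested against the versions above, and it may not address the root cause.

```python
from collections import OrderedDict


def cpu_state_dict(state_dict):
    """Return a copy of state_dict whose tensor values live on the CPU.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    cpu_copy = OrderedDict()
    for name, value in state_dict.items():
        # Tensors expose .detach() and .cpu(); any other entry is kept as-is.
        if hasattr(value, "detach") and hasattr(value, "cpu"):
            value = value.detach().cpu()
        cpu_copy[name] = value
    return cpu_copy
```

On the sending side this would be used as `output_queue.put(cpu_state_dict(net.state_dict()))`, where `net` is a placeholder name for the model. The receiver can then pass the result to `load_state_dict`, which copies the values into the model's existing parameters.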