Multi-GPU error

After I wrapped my model with DataParallel, I got this error:

RuntimeError: Assertion `THCTensor_(checkGPU)(state, 5, input, gradOutput, gradWeight, sorted, indices)' failed. Some of weight/gradient/input tensors are located on different GPUs. Please move them to a single one. at /home/soumith/local/builder/wheel/pytorch-src/torch/lib/THCUNN/generic/LookupTable.cu:17

My model includes an embedding() layer.

Is this caused by embedding()?

If so, any suggestions on how to do multi-GPU properly with embedding() layers inside the model?
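For reference, here is a minimal sketch of the kind of setup that triggers it for me (the module and layer sizes here are made-up placeholders, not my actual model):

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # The embedding layer that seems to be the problem
        self.emb = nn.Embedding(1000, 32)
        self.fc = nn.Linear(32, 10)

    def forward(self, x):
        return self.fc(self.emb(x))

# Wrapping the whole model; the error shows up with >1 GPU visible
model = nn.DataParallel(Net())
```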

Thanks

Yes, the embedding doesn’t work with multi-GPU due to a bug. There’s a PR that fixes it, but it needs some small changes.

Has the bug been fixed? I'm hitting the same error:
RuntimeError: Assertion `THCTensor_(checkGPU)(state, 5, input, gradOutput, gradWeight, sorted, indices)' failed. Some of weight/gradient/input tensors are located on different GPUs. Please move them to a single one. at /data/plat/peakzeng/solfware/pytorch/torch/lib/THCUNN/generic/LookupTable.cu:17

Not yet, the PR still needs some fixes.

This is merged now, but you'll need to build PyTorch from source if you need this change right away.
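If building from source isn't an option, one common workaround is to keep the embedding on a single device and wrap only the rest of the model in DataParallel, so the lookup table's weights and gradients never get split across GPUs. A minimal sketch (the `Tail` module and sizes are hypothetical stand-ins for your real layers):

```python
import torch
import torch.nn as nn

class Tail(nn.Module):
    """Everything after the embedding; this part is safe to replicate."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.fc = nn.Linear(dim, hidden)  # stand-in for the real layers

    def forward(self, x):
        return self.fc(x)

embed = nn.Embedding(1000, 32)         # stays on one device
tail = nn.DataParallel(Tail(32, 64))   # only the tail is replicated

tokens = torch.randint(0, 1000, (8, 5))
out = tail(embed(tokens))              # lookup on one GPU, rest in parallel
```

The trade-off is that the embedding lookup isn't parallelized, but for most models it's cheap relative to the rest of the forward pass.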