Cuda runtime error: arguments are located on different GPUs

Hi - I have a machine with two GPUs. I noticed that if I run my code on the second device (cuda id=1), it runs for some time and then errors out with the message below. I'm not doing any memory sharing or multi-GPU programming. I'm using torch version 1.2.0. Is this an issue with this version? What is a fix for this error?

RuntimeError: arguments are located on different GPUs at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:260

Thank you,
Tomojit

There is probably a hard-coded .cuda() call (which defaults to device 0) somewhere in the code.
If you don't want to inspect the code, you can run
CUDA_VISIBLE_DEVICES=1 python path_to_the_file.py
This environment variable makes the Python process see only GPU 1, which it will then address as device 0 (cuda:0), so any hard-coded .cuda() call lands on the same physical GPU as the rest of your tensors.
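If you do want to fix the code itself, the usual pattern is to pick a single device once and pass it everywhere, instead of mixing .cuda() (which defaults to cuda:0) with tensors created on cuda:1. A minimal sketch of that device-agnostic style, assuming a simple linear model for illustration:

```python
import torch

# Pick one device up front and use it for every tensor and module.
# Falls back to CPU so the sketch also runs on machines without a second GPU.
device = torch.device("cuda:1" if torch.cuda.device_count() > 1 else "cpu")

model = torch.nn.Linear(4, 2).to(device)      # move parameters to the device
x = torch.randn(3, 4, device=device)          # create inputs on the same device

out = model(x)                                # all arguments now live together
print(out.shape)                              # torch.Size([3, 2])
```

Searching the code for bare .cuda() calls and replacing them with .to(device) is typically all that is needed.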