Hi,
I’m trying to run code on a multi-GPU system (4 GPUs). One Python script is already running on the first two GPUs, and I’m running another on the other two.
The following line fails with the error shown right after it:
loss = loss[ignore_mask].mean()
RuntimeError: after cudaLaunch in triple_chevron_launcher::launch(): too many resources requested for launch
However, when I run the same code on a single-GPU system, this error doesn’t occur. I’m using Python 3.6.7 [GCC 7.3.0] on Linux with PyTorch 0.4.1.
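For reference, here is a minimal sketch of what the failing line does in isolation. Only `loss` and `ignore_mask` are names from my code. The tensor values, `target`, and `IGNORE_INDEX` are made up for illustration, and the sketch assumes `ignore_mask` marks the entries to keep. It is not a confirmed reproduction of the error.

```python
import torch

# Hypothetical data: per-element losses and targets, with -1 marking
# entries that should be left out of the average (an assumption).
IGNORE_INDEX = -1
loss = torch.tensor([0.5, 2.0, 1.5, 4.0])
target = torch.tensor([3, -1, 0, 2])

# The comparison yields a mask tensor (uint8 in 0.4.x, bool in later releases).
ignore_mask = target != IGNORE_INDEX

# Use a GPU if one is visible to this process, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
loss = loss.to(device)
ignore_mask = ignore_mask.to(device)

# Same operation as the failing line: select the kept entries, then average.
masked_mean = loss[ignore_mask].mean()
print(masked_mean.item(), loss.device)
```

Printing `loss.device` shows which GPU the tensor actually lives on, which may help when comparing the 4-GPU machine with the single-GPU one.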