Runtime Error in cudaLaunch


I’m getting the following error in my loss computation when I run it on a multi-GPU system with 1080 Ti cards.

The error is raised when this line executes:

loss = loss[ignore_mask].mean()
RuntimeError: after cudaLaunch in triple_chevron_launcher::launch(): too many resources requested for launch

However, when I run the same code on a single-GPU system, the error doesn’t occur. I’m using Python 3.6.7 [GCC 7.3.0] on Linux with PyTorch 0.4.1.
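
One thing that might be worth trying is computing the same mean without boolean-mask indexing, since the failing line is the indexing step. The sketch below assumes ignore_mask is a mask with the same shape as loss; the helper name masked_mean is hypothetical, and whether this actually avoids the launch error on the 1080 Ti setup is unverified. It is written against a recent PyTorch (on 0.4.1 the mask would be a uint8 tensor rather than a bool one).

```python
import torch


def masked_mean(loss, mask):
    """Mean of `loss` over positions where `mask` is set, without indexing.

    Hypothetical helper: multiplies by the mask and divides by the number
    of selected elements instead of calling loss[mask].mean().
    """
    # Cast the mask to the loss dtype so it can be used as 0/1 weights.
    weights = mask.to(loss.dtype)
    # Sum of selected entries divided by how many were selected.
    # An all-False mask gives 0/0 = nan, the same as loss[mask].mean().
    return (loss * weights).sum() / weights.sum()


# Small CPU example with made-up values.
loss = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
ignore_mask = torch.tensor([[True, False], [True, True]])

result = masked_mean(loss, ignore_mask)
reference = loss[ignore_mask].mean()
```

Note that, unlike indexing, the multiplication still touches the masked-out entries, so a nan or inf at an ignored position would propagate into the result.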