PyTorch is unable to load a model's parameters onto a particular GPU

I have logged in to a server that has 4 NVIDIA 1080 GPUs. I ran nvidia-smi and found that the global memory of GPU 0 is almost full, while the other GPUs have plenty of free global memory. The status is as follows:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.25                 Driver Version: 390.25                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 68%   87C    P2   181W / 250W |  10752MiB / 11178MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:03:00.0 Off |                  N/A |
| 29%   63C    P8    22W / 250W |     25MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:82:00.0 Off |                  N/A |
| 20%   52C    P8    17W / 250W |     25MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  Off  | 00000000:83:00.0 Off |                  N/A |
| 34%   67C    P2    63W / 250W |    536MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

I am trying to run the following code snippet; the CNN is shallow, with 3 convolutional layers:

import torch

device = torch.device('cuda:1' if torch.cuda.is_available() else 'cpu')
cnn = CNN()  # my own model class with 3 convolutional layers
cnn = cnn.to(device)

It's clear that I want to run this on CUDA device 1, which has ample free global memory. But when I run it, the line cnn = cnn.to(device) fails with:

RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/THCTensorRandom.cu:25

Why is this so? Can somebody help me? Thanks in advance.

Some details:
OS: Ubuntu Server 16.04
PyTorch version: 0.4.0
Python version: 3.5
Package manager: pip
CUDA version: 8.0.61
cuDNN version: 7.1.02

PyTorch will create a CUDA context on GPU 0 regardless of which GPUs you end up using; your traceback points at the RNG initialization (THCTensorRandom.cu), which runs when that context is created, and GPU 0 has no memory left. You should set CUDA_VISIBLE_DEVICES so that the process only ever sees the GPU you actually want.
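For example (a minimal sketch; train.py stands for whatever your script is called, and the tensor at the end is just a stand-in for your cnn.to(device) call), you can either launch the script as CUDA_VISIBLE_DEVICES=1 python train.py, or set the variable in Python before torch is imported:

import os

# Hide every GPU except physical GPU 1. This has to happen before the CUDA
# runtime is initialized, so the safest place is before importing torch.
os.environ['CUDA_VISIBLE_DEVICES'] = '1'

import torch

# Inside this process, cuda:0 now maps to physical GPU 1.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Any CUDA work (e.g. cnn.to(device) from the question) now creates its
# context on the formerly idle GPU instead of the full GPU 0.
x = torch.randn(4, 3, 32, 32).to(device)
print(x.device)  # prints "cuda:0", which is physical GPU 1 here

Note that after this, cuda:1 no longer exists inside the process: device indices are renumbered over the visible GPUs only, so your code should address the remaining GPU as cuda:0.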

For a more detailed discussion, this issue is helpful: https://github.com/pytorch/pytorch/issues/3477
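As a quick sanity check (a sketch under the same assumption, i.e. that CUDA_VISIBLE_DEVICES=1 was set before PyTorch touched CUDA), you can confirm that the process sees exactly one device:

import torch

# With CUDA_VISIBLE_DEVICES=1 set, exactly one device should be visible,
# and index 0 should report the name of what nvidia-smi calls GPU 1.
print(torch.cuda.device_count())      # expected: 1
print(torch.cuda.get_device_name(0))  # expected: a GeForce GTX 1080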