RuntimeError: tensors are on different GPUs, but I just run "CUDA_VISIBLE_DEVICES=1". Why?

Even when I just run
python train.py

the error still occurs.
I built my model from the conv1_1 through conv3_3 layers of VGG-16.

Hard to tell without seeing your script. I have gotten this error from passing a CUDA variable through a non-CUDA nn.Sequential. Make sure you are explicitly moving your network to the GPU with net.cuda().

import torch
import torch.nn as nn
from torch.autograd import Variable

# The model stays on the CPU -- .cuda() is never called on it
net = nn.Sequential(nn.Conv2d(3, 128, 4))

input = Variable(torch.randn(1, 3, 10, 10))
input = input.cuda()  # the input is moved to the GPU
out = net(input)      # CPU model + GPU input -> the runtime error
print(out)

This snippet will throw that error.
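A device-agnostic variant of that fix (a sketch using `.to(device)` rather than `net.cuda()`, so it also runs on CPU-only machines) would be:

```python
import torch
import torch.nn as nn

# Put the model on the same device as the input before the forward pass.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

net = nn.Sequential(nn.Conv2d(3, 128, 4)).to(device)
input = torch.randn(1, 3, 10, 10).to(device)
out = net(input)  # model and input now live on the same device
print(out.shape)  # 10x10 input through a 4x4 conv -> 7x7 output
```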

Did you add .cuda() to your model?