Arguments are located on different GPUs

I am trying to train my model (it worked well in PyTorch 0.3) with PyTorch 0.4.1, but the code reports the error "RuntimeError: arguments are located on different GPUs at /opt/conda/conda-bld/pytorch_1533672544752/work/aten/src/THC/generic/THCTensorMathBlas.cu:51".
I checked my code. I set the GPU devices with " os.environ['CUDA_VISIBLE_DEVICES'] = '0,1' ", and the net is moved to the GPU with " net.cuda() " followed by " net = torch.nn.DataParallel(net) ". For the forward pass I fetch the data from the dataloader and then run " data = data.cuda() " and " output = net(data) ". I set a breakpoint there and found that this call only places data on device 0, and that is where the error occurs!

So, what’s the correct way to use multiple GPUs in PyTorch 0.4.1?

Apparently, this issue is fixed in version 0.5.0.
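
For reference, here is a minimal sketch of the usual DataParallel pattern. It is not a confirmed fix for the error above. The small nn.Linear model and the tensor shapes are hypothetical stand-ins for the real network and data, and the CPU fallback is only there so the snippet also runs on a machine without GPUs. Note that CUDA_VISIBLE_DEVICES has to be a string and should be set before CUDA is initialized.

```python
import os

# Must be a string such as "0,1" (not the tuple 0, 1), and should be set
# before the first CUDA call so that it takes effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

import torch
import torch.nn as nn

# Fall back to the CPU when no GPU is visible (assumption for portability).
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the real network.
net = nn.Linear(8, 4)

# Put the parameters on the first visible device, then wrap the module.
# DataParallel replicates it onto the other visible GPUs at forward time.
net = net.to(device)
net = nn.DataParallel(net)

# The input batch only needs to be on the first device. DataParallel
# splits it along dimension 0 and scatters the chunks to each replica.
data = torch.randn(16, 8).to(device)
output = net(data)

# The per-GPU results are gathered back onto the first device.
print(output.shape)
```

With this pattern it is expected that data sits on device 0 before the forward call, since the scatter to the other GPUs happens inside DataParallel. If the error persists, one thing worth checking is whether the model creates tensors inside forward() on a fixed device instead of on the device of its input.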