torch.nn.DataParallel always loads the model to GPU 0

I would like to run a trained model on a specific GPU. I tried torch.nn.DataParallel(model, device_ids=[2]) and called xx.cuda(device=2) on everything, but it still allocates memory on GPU 0. I am using PyTorch 0.3.1. Any suggestions on how to fix this? Thanks!

Set the environment variable when launching your script: CUDA_VISIBLE_DEVICES=2 python your_script.py
Also make sure you actually have at least 3 GPUs. GPUs are zero-indexed, so with 2 GPUs the valid device ids are only 0 and 1!
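If you prefer not to touch the shell command, the same variable can be set from inside the script. A minimal sketch (the key assumption: it must run before anything initializes CUDA, i.e. before torch is imported, otherwise it has no effect):

```python
import os

# Select physical GPU 2 for this process. After this, CUDA sees only
# that one device, and it is renumbered as device 0 inside the process.
os.environ["CUDA_VISIBLE_DEVICES"] = "2"

# import torch  # from torch's point of view, GPU 2 is now cuda:0
```

Note the renumbering: once only one GPU is visible, your code should use device 0 (or no explicit device at all), not device 2.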

@tjoseph Thanks for your reply. I have 4 GPUs on my machine, and I understand that CUDA_VISIBLE_DEVICES=2 works around the problem. However, is there any way to fix it without using CUDA_VISIBLE_DEVICES?
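For anyone finding this later: one approach that does not rely on CUDA_VISIBLE_DEVICES is to make the target GPU the process-wide default with torch.cuda.set_device, and pass output_device to DataParallel so results are gathered on that GPU instead of GPU 0. A sketch under the assumption that at least 3 GPUs exist (the helper name is illustrative, not a PyTorch API):

```python
import torch
import torch.nn as nn

def pin_to_gpu(model, device_id):
    """Sketch: run a model entirely on one GPU without CUDA_VISIBLE_DEVICES."""
    # Make device_id the default device, so CUDA context / scratch
    # allocations land there instead of on GPU 0.
    torch.cuda.set_device(device_id)
    model = model.cuda(device_id)  # move parameters to that GPU
    # With a single device id, DataParallel just runs on that device;
    # output_device keeps the gathered output there as well.
    return nn.DataParallel(model, device_ids=[device_id], output_device=device_id)

# Guarded so the sketch is safe on machines with fewer GPUs:
if torch.cuda.device_count() > 2:
    net = pin_to_gpu(nn.Linear(4, 2), device_id=2)
```

Inputs fed to the wrapped model should also live on the same device (e.g. x.cuda(2)), otherwise intermediate copies may still touch GPU 0.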