Setting CUDA_VISIBLE_DEVICE has no effect.
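One thing worth checking: the variable CUDA reads is spelled `CUDA_VISIBLE_DEVICES` (plural), and a variable named `CUDA_VISIBLE_DEVICE` is simply ignored. It also has to be set before CUDA is initialized in the process (for example, before the framework first touches the GPU). The helper below, `check_cuda_visible_devices`, is a hypothetical sketch for illustration, not part of any library. It only shows which entries the plural variable lists:

```python
import os


def check_cuda_visible_devices(env):
    """Return the entries listed in CUDA_VISIBLE_DEVICES, or None if it is unset.

    Hypothetical helper for illustration. Only the plural name is consulted,
    because a singular CUDA_VISIBLE_DEVICE is not read by CUDA.
    """
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None  # unset: CUDA does not restrict visible devices
    # Entries may be indices ("0") or device UUIDs, so keep them as strings.
    return [d.strip() for d in value.split(",") if d.strip()]


print(check_cuda_visible_devices({"CUDA_VISIBLE_DEVICE": "1"}))    # None: misspelled name ignored
print(check_cuda_visible_devices({"CUDA_VISIBLE_DEVICES": "0,1"}))  # ['0', '1']
print(check_cuda_visible_devices(os.environ))                      # whatever this shell exports
```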

Does data parallel only work with a batch size greater than 1? I'm only using batch size 1.
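Assuming the data parallel wrapper here works like PyTorch's `nn.DataParallel`, which splits each input batch along dimension 0 and sends one piece to each GPU, a batch of size 1 cannot be split, so only one GPU receives any work. The pure-Python sketch below imitates that splitting with ceil-sized contiguous chunks (the same sizing `torch.chunk` uses). `scatter_batch` is a hypothetical name and this is an illustration, not the library's actual scatter code:

```python
def scatter_batch(batch, num_devices):
    """Split a batch into at most num_devices contiguous, ceil-sized chunks.

    Hypothetical illustration of splitting a batch along dimension 0.
    Each returned chunk would go to one device; devices beyond the number
    of chunks receive nothing.
    """
    if num_devices < 1:
        raise ValueError("num_devices must be >= 1")
    if not batch:
        return []
    chunk_size = -(-len(batch) // num_devices)  # ceiling division
    return [batch[i:i + chunk_size] for i in range(0, len(batch), chunk_size)]


for size in (1, 4, 5):
    chunks = scatter_batch(list(range(size)), num_devices=4)
    print(f"batch={size}: {len(chunks)} of 4 devices get data -> {chunks}")
```

With `batch=1` only one chunk is produced, so in this model the other GPUs sit idle no matter how many are visible; a batch at least as large as the GPU count is needed for every device to get work.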