Can't do parallel computing

I want to load a ResNet model and change the last layer to match my number of classes. But when I wrap the modified model in DataParallel, it gives this error:

RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: CPU

This is the code:

if args.resume:
    if os.path.isfile(args.resume):
        print("=> loading checkpoint '{}'".format(args.resume))
        checkpoint = torch.load(args.resume)
        args.start_epoch = checkpoint['epoch']
        best_prec1 = checkpoint['best_prec1']
        model.load_state_dict(checkpoint['state_dict'])
        model.module.fc = nn.Linear(2048, 6).cuda()
        if 'optimizer' in checkpoint:
            optimizer.load_state_dict(checkpoint['optimizer'])
        print("=> loaded checkpoint '{}' (epoch {})"
              .format(args.resume, checkpoint['epoch']))
    else:
        print("=> no checkpoint found at '{}'".format(args.resume))

cudnn.benchmark = True
# model = model.cuda()
model = torch.nn.DataParallel(model, device_ids=list(range(args.ngpu)))

Sometimes it also gives another error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in method wrapper_CUDA__cudnn_convolution)
Can anyone help me? Thank you so much.

I don't see any obvious issues in the posted code snippet, so could you create a minimal, executable example that reproduces the issue?

I added torch.cuda.set_device('cuda:0')
and this works for me.
Thank you
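For anyone who hits this later: a likely cause is that a freshly constructed layer lives on the CPU by default, so attaching a new `nn.Linear` after loading the checkpoint (without moving the whole model to the GPU) leaves the model's parameters split across devices, which DataParallel rejects. Below is a minimal sketch, not the original training code, using a stand-in `nn.Linear` instead of ResNet and assuming a single-node multi-GPU setup:

```python
import torch
import torch.nn as nn

# New modules are created on the CPU by default -- this is how a
# "mixed-device" model can arise after editing a loaded checkpoint.
fc = nn.Linear(2048, 6)
assert fc.weight.device.type == 'cpu'

# The usual ordering: make cuda:0 the current device, move the FULL
# model to it, and only then wrap it in DataParallel, so every
# parameter and buffer already lives on device_ids[0].
if torch.cuda.is_available():
    torch.cuda.set_device(0)               # cuda:0 becomes the current device
    model = nn.Linear(8, 4)                # stand-in for the modified ResNet
    model = model.cuda()                   # move ALL parameters to cuda:0
    model = nn.DataParallel(
        model, device_ids=list(range(torch.cuda.device_count()))
    )
    # Every parameter should now sit on cuda:0 before training starts.
    devices = {p.device for p in model.parameters()}
    assert devices == {torch.device('cuda:0')}
```

The same reasoning explains the second error (`cuda:1` vs `cuda:0`): if some parameters are placed on a non-default GPU before wrapping, DataParallel's replicas end up mismatched, and pinning everything to `cuda:0` first avoids it.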