Hi,
I trained the model using DataParallel and saved only the submodule wrapped inside the DataParallel. For validation, I load the model on a single GPU. But it does not seem to work: the prediction accuracy is not as good as during training, even when using the same data.
Here’s my code snippet:
....
model.load_state_dict(torch.load(args.state))
....
if args.gpu_num > 1:
    device_ids = list(range(0, args.gpu_num))
    print(f'Use multiple GPUs: {device_ids}')
    model = torch.nn.DataParallel(model, device_ids=device_ids)
After a training epoch finishes, I save the submodule from the DataParallel wrapper:
saveModule = model
if args.gpu_num > 1:
    # If multiple GPUs are used for training, save only the child node.
    saveModule = list(model.children())[0]
torch.save(saveModule.state_dict(), logpath)
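For reference, here is a minimal sketch of the save/load round trip described above, written so it can be run on its own to check that the saved weights reload correctly outside DataParallel. It rests on assumptions: `build_model` is a hypothetical stand-in for the real network, and an in-memory buffer replaces `logpath` and `args.state`. The sketch calls `eval()` before comparing outputs, since layers such as BatchNorm and Dropout behave differently in training mode; whether that is related to the accuracy gap here is not something this post establishes.

```python
import io

import torch
import torch.nn as nn


def build_model():
    # Hypothetical stand-in for the real network (assumption, not the original model).
    return nn.Sequential(
        nn.Linear(4, 8), nn.BatchNorm1d(8), nn.ReLU(), nn.Linear(8, 2)
    )


torch.manual_seed(0)
trained = build_model()
wrapped = nn.DataParallel(trained)  # also constructs on a CPU-only machine

# The wrapper prefixes every key with "module."; the inner module does not.
wrapped_keys = list(wrapped.state_dict().keys())
inner_keys = list(wrapped.module.state_dict().keys())

# wrapped.module is the same object as list(wrapped.children())[0].
same_child = list(wrapped.children())[0] is wrapped.module

# Save only the inner module's weights (an in-memory buffer stands in for a file).
buffer = io.BytesIO()
torch.save(wrapped.module.state_dict(), buffer)
buffer.seek(0)

# Load into a fresh, unwrapped model, as would be done for single-GPU validation.
restored = build_model()
restored.load_state_dict(torch.load(buffer, map_location="cpu"))

# DataParallel may have moved the module to a GPU, so follow its device.
device = next(trained.parameters()).device
restored.to(device)

# Switch both models to evaluation mode before comparing predictions.
trained.eval()
restored.eval()
x = torch.randn(5, 4, device=device)
with torch.no_grad():
    outputs_match = torch.allclose(trained(x), restored(x))

print(wrapped_keys[0], inner_keys[0], outputs_match)
```

If `outputs_match` is true for the real model as well, the save and load steps themselves are probably not the source of the difference, and the validation setup (mode, preprocessing, data ordering) would be the next place to look.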
Thanks
Qi