Quantization-aware training with DataParallel: converted model accuracy collapses to random

I train and test the net on a single GPU and it works fine. When I train the model on multiple GPUs, the training itself looks good, but after I convert the model to a quantized one, its accuracy on CIFAR100 is 1%, which is essentially random.
My conversion code is as follows:

    if gpus == 1:
        quantized_model = torch.quantization.convert(net.eval().to(device), inplace=False)
    else:
        print("export model from torch DataParallel")
        # unwrap the DataParallel container before converting
        single_net = deepcopy(net.module)
        quantized_model = torch.quantization.convert(single_net.eval().to(device), inplace=False)
    quantized_model.eval()

If there is more than one GPU, I wrap the net with net = torch.nn.DataParallel(net), so before converting I first do single_net = deepcopy(net.module). But the accuracy of quantized_model is 1%, while the training accuracy is about 77%. I also get this warning:
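For context, nn.DataParallel only wraps the original model and exposes it as .module, so unwrapping with deepcopy(net.module) is the usual way to get a standalone copy. A minimal sketch (the nn.Linear stand-in is illustrative, not the original network):

```python
from copy import deepcopy

import torch.nn as nn

# DataParallel wraps the model; the original is reachable via .module.
net = nn.Linear(4, 2)             # stand-in for the real network
dp = nn.DataParallel(net)

assert dp.module is net           # .module is the wrapped model itself
single_net = deepcopy(dp.module)  # independent copy, safe to convert
assert single_net is not net      # the copy shares no parameters with net
```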

/home/autolab/anaconda3/lib/python3.7/site-packages/torch/quantization/ UserWarning: Must run observer before calling calculate_qparams. Returning default scale and zero point.

Does anyone have any ideas?

There are currently some known issues with nn.DataParallel and quantization-aware training. There is a WIP PR to fix it -
You can follow the toy example here to make sure you're following the QAT steps correctly.
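For reference, the eager-mode QAT steps look roughly like this. This is a minimal sketch with a toy model; TinyNet and the single stand-in forward pass are my own illustration, not the post's code:

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Toy model with the quant/dequant stubs eager-mode QAT needs."""
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.fc = nn.Linear(8, 4)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

net = TinyNet()
net.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
torch.quantization.prepare_qat(net, inplace=True)  # insert fake-quant + observers

# A real training loop goes here; every forward pass lets the observers
# record activation ranges. One pass stands in for it:
net(torch.randn(2, 8))

net.eval()
quantized = torch.quantization.convert(net.to('cpu'), inplace=False)
out = quantized(torch.randn(2, 8))  # int8 inference on CPU
```

If convert runs before the observers have seen any data, you get exactly the "Must run observer before calling calculate_qparams" warning from the question, and the resulting model uses default scale/zero-point values.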

Thanks, I think I have solved this problem as follows:

    test(net.module, epoch, device, args.gpus)       # evaluate the QAT model on GPU first
    qat_model = deepcopy(net)                        # copy so training can continue on net
    qat_model.eval().to('cpu')                       # quantized inference runs on CPU
    torch.quantization.convert(qat_model, inplace=True)
    test(qat_model.module, epoch, 'cpu', args.gpus)  # evaluate the converted model
    scheduler.step()

I need to run test on net.module first; after that, the converted qat_model evaluated on CPU gives the correct accuracy.
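Putting the pieces together, the working pattern is: train through the DataParallel wrapper, and for evaluation deepcopy the model, move it to CPU, convert, and call the unwrapped .module. A self-contained sketch under the same assumptions (the TinyNet toy model is my illustration, not the poster's network):

```python
from copy import deepcopy

import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Toy model with the quant/dequant stubs eager-mode QAT needs."""
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.fc = nn.Linear(8, 4)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

# QAT-prepare first, then wrap for multi-GPU training.
model = TinyNet()
model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
torch.quantization.prepare_qat(model, inplace=True)
net = nn.DataParallel(model)

net.train()
net.module(torch.randn(2, 8))  # stand-in for training; observers record ranges

# Per-epoch evaluation of the quantized model, without touching `net`:
qat_model = deepcopy(net)                     # training model stays intact
qat_model.eval().to('cpu')                    # quantized kernels are CPU-only
torch.quantization.convert(qat_model, inplace=True)
out = qat_model.module(torch.randn(2, 8))     # unwrap DataParallel to run
```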