Resnet50 baseline training

I’ve used Nervana Distiller to train a ResNet-50 baseline model with the imagenet_1k dataset. During training, the validation accuracy remains zero while the loss is decreasing.

--- validate (epoch=84)-----------
200 samples (128 per mini-batch)
==> Top1: 0.000 Top5: 0.000 Loss: 14.722

==> Best [Top1: 0.000 Top5: 0.000 Sparsity:0.00 NNZ-Params: 25502912 on epoch: 84]

time python3 compress_classifier.py --arch resnet50 --data ../../../data/imagenet_1k -p=50 --lr=0.3 --epochs=180 --compress=../ssl/resnet50_imagenet_baseline_training.yaml -j=1 --deterministic

What could be the reason?
My YAML file contains:
lr_schedulers:
  training_lr:
    class: MultiStepLR
    milestones: [30, 60, 90, 100]
    gamma: 0.1

policies:
  - lr_scheduler:
      instance_name: training_lr
    starting_epoch: 0
    ending_epoch: 200
    frequency: 1
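
One possible cause of zero validation accuracy on a reduced subset like imagenet_1k, worth ruling out first, is that the train and val folders expose different class sets, so torchvision's ImageFolder assigns inconsistent class indices and the validation labels never line up with the model's outputs. A minimal sanity check, assuming the usual train/ and val/ layout under the --data path:

from torchvision import datasets

# Assumed layout: <data>/train/<class>/*.JPEG and <data>/val/<class>/*.JPEG
data_root = '../../../data/imagenet_1k'
train_set = datasets.ImageFolder(f'{data_root}/train')
val_set = datasets.ImageFolder(f'{data_root}/val')

print(len(train_set.classes), 'train classes,', len(val_set.classes), 'val classes')
assert train_set.class_to_idx == val_set.class_to_idx, \
    'train/val class-index mappings differ; validation labels will not match model outputs'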

Is your model training fine without Distiller?
If so, I would recommend creating an issue in their GitHub repository for better visibility. :wink:


Thanks. I tried with the full ImageNet dataset and it works fine.

I did quantization-aware training with ResNet-50 on the ImageNet dataset.

compress_classifier.py -v -a resnet50 --data /home/ubuntu/work/data.imagenet/ --epochs 200 -o logs/resnet50_imagenet_qat_w5a8_i8o8 --compress ../quantization/quant_aware_train/qat_resnet50.yaml -b 72 -p 100 --vs 0.2 --lr 0.01 --pretrained

Dataset sizes:
training=40000
validation=10000
test=50000
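
These sizes are consistent with --vs 0.2 holding out 20% of a 50,000-image training pool for validation. A rough illustration of such a hold-out split in plain PyTorch (Distiller's own splitting code is not reproduced here, and the dataset path and layout are assumptions):

import torch
from torchvision import datasets, transforms

# Assumed path; Distiller reads train/ and val/ subfolders under --data.
train_pool = datasets.ImageFolder('/home/ubuntu/work/data.imagenet/train',
                                  transform=transforms.ToTensor())

n_total = len(train_pool)                    # 50000 images in this run
n_val = int(0.2 * n_total)                   # --vs 0.2 -> 10000 held out for validation
perm = torch.randperm(n_total).tolist()
train_sampler = torch.utils.data.SubsetRandomSampler(perm[n_val:])  # 40000 training images
val_sampler = torch.utils.data.SubsetRandomSampler(perm[:n_val])    # 10000 validation images

train_loader = torch.utils.data.DataLoader(train_pool, batch_size=72, sampler=train_sampler)
val_loader = torch.utils.data.DataLoader(train_pool, batch_size=72, sampler=val_sampler)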

The training YAML file:

{
  "quantizers": {
    "pact_quantizer": {
      "class": "PACTQuantizer",
      "act_clip_init_val": 8.0,
      "bits_activations": 8,
      "bits_weights": 5,
      "overrides": {
        "conv1": {
          "bits_weights": 8,
          "bits_activations": 8
        },
        "layer1.0.pre_relu": {
          "bits_weights": 8,
          "bits_activations": 8
        },
        "final_relu": {
          "bits_weights": 8,
          "bits_activations": 8
        },
        "fc": {
          "bits_weights": 8,
          "bits_activations": 8
        }
      }
    }
  },
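
For context on the quantizer configured above: PACT clamps each activation to a learned clip value (act_clip_init_val sets its starting point) and fake-quantizes the result to bits_activations levels, which is what the LearnedClippedLinearQuantization modules in the model dump below perform. A simplified sketch of the forward computation, assuming the standard PACT formulation (Distiller's real implementation learns clip_val as a parameter and uses a straight-through estimator for the rounding):

import torch

def pact_fake_quant(x: torch.Tensor, clip_val: float = 8.0, num_bits: int = 8) -> torch.Tensor:
    # Clamp activations to [0, clip_val], then round onto a uniform grid
    # with 2**num_bits - 1 steps spanning that range.
    y = x.clamp(0.0, clip_val)
    scale = (2 ** num_bits - 1) / clip_val
    return torch.round(y * scale) / scale

# Example: fake-quantize a batch of activations to 8 bits with clip value 8.0
acts = pact_fake_quant(torch.randn(4, 64) * 3.0)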

After 60 epochs, the validation accuracy was:

| best_top1    | float | 47.37 |
| current_top1 | float | 47.37 |

DataParallel(
(module): ResNet(
(conv1): Conv2d(
3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False,
Distiller_QuantAwareTrain: weight --> 8 bits
)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
(layer1): Sequential(
(0): DistillerBottleneck(
(conv1): Conv2d(
64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
(1): DistillerBottleneck(
(conv1): Conv2d(
256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
(2): DistillerBottleneck(
(conv1): Conv2d(
256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
)
(layer2): Sequential(
(0): DistillerBottleneck(
(conv1): Conv2d(
256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(
256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
(1): DistillerBottleneck(
(conv1): Conv2d(
512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
(2): DistillerBottleneck(
(conv1): Conv2d(
512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
(3): DistillerBottleneck(
(conv1): Conv2d(
512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
)
(layer3): Sequential(
(0): DistillerBottleneck(
(conv1): Conv2d(
512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(
512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
(1): DistillerBottleneck(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
(2): DistillerBottleneck(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
(3): DistillerBottleneck(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
(4): DistillerBottleneck(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
(5): DistillerBottleneck(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
)
(layer4): Sequential(
(0): DistillerBottleneck(
(conv1): Conv2d(
1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(
1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
(1): DistillerBottleneck(
(conv1): Conv2d(
2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
(2): DistillerBottleneck(
(conv1): Conv2d(
2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
)
(avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
(fc): Linear(
in_features=2048, out_features=1000, bias=True,
Distiller_QuantAwareTrain: weight --> 8 bits
)
)
)

The training accuracy was: Overall Loss 0.526230 Objective Loss 0.526230 Top1 87.208333 Top5 95.388889 LR 0.000500 Time 1.998097. Since the pretrained model has an accuracy of Top1: 75.600 Top5: 92.740 Loss: 0.977, I thought the model had overfitted the training data, so I tested on 5000 samples from the test set (the val directory) with

compress_classifier.py -v -a resnet50 --data /home/ubuntu/work/data.imagenet.val/ --eval -b 128 --resume-from logs/resnet50_imagenet_qat_w5a8_i8o8/2020.09.16-064602/best.pth.tar -o logs/inference/

I got an accuracy of Top1: 90.300 Top5: 95.680 Loss: 0.589, while post-training quantization gives an accuracy of Top1: 16.700 Top5: 36.460 Loss: 4.929.

What could be the reason the test accuracy comes out so high here?
I separated the val directory in the ImageNet folder to make sure that the training data was not used for inference, but the result was the same.
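
One way to rule out data leakage as the cause is to check that none of the evaluated images also appear under the directory the QAT run trained on. A minimal sketch, assuming the usual train/ and val/ subfolder layout and .JPEG file names (both are assumptions about this setup):

from pathlib import Path

train_dir = Path('/home/ubuntu/work/data.imagenet/train')
eval_dir = Path('/home/ubuntu/work/data.imagenet.val/val')

# ImageNet file names are unique, so name overlap is a reasonable leakage check.
train_files = {p.name for p in train_dir.rglob('*.JPEG')}
eval_files = {p.name for p in eval_dir.rglob('*.JPEG')}

overlap = train_files & eval_files
print(f'{len(overlap)} of {len(eval_files)} evaluation images also appear in the training directory')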