I did quantization-aware training (QAT) with ResNet-50 on the ImageNet dataset:
compress_classifier.py -v -a resnet50 --data /home/ubuntu/work/data.imagenet/ --epochs 200 -o logs/resnet50_imagenet_qat_w5a8_i8o8 --compress ../quantization/quant_aware_train/qat_resnet50.yaml -b 72 -p 100 --vs 0.2 --lr 0.01 --pretrained
Dataset sizes:
- training = 40000
- validation = 10000
- test = 50000
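For reference, these sizes match what I'd expect from the `--vs 0.2` flag, which (as far as I understand Distiller's behavior, so treat this as an assumption) holds out a fraction of the training pool for validation. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
def split_sizes(total_train, val_split):
    """Compute train/validation sizes when a fraction of the training
    pool is held out for validation (hypothetical helper mirroring
    what a --vs style flag would do)."""
    val = int(total_train * val_split)
    return total_train - val, val

train, val = split_sizes(50000, 0.2)
print(train, val)  # 40000 10000
```

So with a 50000-image training pool and `--vs 0.2`, 40000 images are trained on and 10000 are used for validation, consistent with the sizes above.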
The training YAML file:

quantizers:
  pact_quantizer:
    class: PACTQuantizer
    act_clip_init_val: 8.0
    bits_activations: 8
    bits_weights: 5
    overrides:
      conv1:
        bits_weights: 8
        bits_activations: 8
      layer1.0.pre_relu:
        bits_weights: 8
        bits_activations: 8
      final_relu:
        bits_weights: 8
        bits_activations: 8
      fc:
        bits_weights: 8
        bits_activations: 8
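As I understand it, the PACTQuantizer replaces each ReLU with a learned-clipping fake-quantizer (the `LearnedClippedLinearQuantization` modules in the printout below) that clamps activations to `[0, clip_val]` and rounds them onto a uniform grid. A rough scalar sketch of that forward pass, my own simplification rather than Distiller's actual code:

```python
def fake_quantize_act(x, num_bits=8, clip_val=8.0):
    """Simplified PACT-style activation fake-quantization:
    clamp to [0, clip_val], then round to one of 2**num_bits - 1
    uniform steps. In real QAT, clip_val is a learned parameter."""
    scale = (2 ** num_bits - 1) / clip_val
    x = min(max(x, 0.0), clip_val)    # learned clipping (ReLU + clamp)
    return round(x * scale) / scale   # quantize / dequantize

print(fake_quantize_act(10.0))  # clipped to 8.0
print(fake_quantize_act(-1.0))  # clipped to 0.0
```

With `act_clip_init_val: 8.0` and 8-bit activations, the grid step is about 8/255 ≈ 0.031, so in-range activations are barely perturbed while outliers are hard-clipped at 8.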
Validation accuracy after 60 epochs:

| best_top1    | float | 47.37 |
| current_top1 | float | 47.37 |
DataParallel(
(module): ResNet(
(conv1): Conv2d(
3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False,
Distiller_QuantAwareTrain: weight --> 8 bits
)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
(layer1): Sequential(
(0): DistillerBottleneck(
(conv1): Conv2d(
64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
(1): DistillerBottleneck(
(conv1): Conv2d(
256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
(2): DistillerBottleneck(
(conv1): Conv2d(
256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
)
(layer2): Sequential(
(0): DistillerBottleneck(
(conv1): Conv2d(
256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(
256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
(1): DistillerBottleneck(
(conv1): Conv2d(
512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
(2): DistillerBottleneck(
(conv1): Conv2d(
512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
(3): DistillerBottleneck(
(conv1): Conv2d(
512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
)
(layer3): Sequential(
(0): DistillerBottleneck(
(conv1): Conv2d(
512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(
512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
(1): DistillerBottleneck(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
(2): DistillerBottleneck(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
(3): DistillerBottleneck(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
(4): DistillerBottleneck(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
(5): DistillerBottleneck(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
)
(layer4): Sequential(
(0): DistillerBottleneck(
(conv1): Conv2d(
1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(
1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
(1): DistillerBottleneck(
(conv1): Conv2d(
2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
(2): DistillerBottleneck(
(conv1): Conv2d(
2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv2): Conv2d(
512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
(conv3): Conv2d(
512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False,
Distiller_QuantAwareTrain: weight --> 5 bits
)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(add): EltwiseAdd()
(relu3): LearnedClippedLinearQuantization(num_bits=8, clip_val=8.0, inplace)
)
)
(avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
(fc): Linear(
in_features=2048, out_features=1000, bias=True,
Distiller_QuantAwareTrain: weight --> 8 bits
)
)
)
Training accuracy at that point was: Overall Loss 0.526230, Objective Loss 0.526230, Top1 87.208333, Top5 95.388889, LR 0.000500, Time 1.998097. Since the pretrained float model has Top1: 75.600, Top5: 92.740, Loss: 0.977, I thought the model had overfitted the training data, so I tested on 5000 samples from the test set (the val directory) with:
compress_classifier.py -v -a resnet50 --data /home/ubuntu/work/data.imagenet.val/ --eval -b 128 --resume-from logs/resnet50_imagenet_qat_w5a8_i8o8/2020.09.16-064602/best.pth.tar -o logs/inference/
I got an accuracy of Top1: 90.300, Top5: 95.680, Loss: 0.589, while post-training quantization of the same model gives Top1: 16.700, Top5: 36.460, Loss: 4.929.
What could be the reason the test accuracy comes out this high, exceeding even the pretrained float baseline?
I separated the val directory from the imagenet folder to make sure the training data wasn't used for inference, but the result was the same.
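To rule out train/test leakage conclusively, one generic sanity check (not Distiller-specific; the directory paths in the comment are placeholders) is to compare the image filenames in the two directory trees, since ImageNet image names are unique:

```python
import os

def overlapping_files(dir_a, dir_b):
    """Return filenames that appear in both directory trees.
    A non-empty result means the evaluation set shares images
    with the training set."""
    names_a = {f for _, _, files in os.walk(dir_a) for f in files}
    names_b = {f for _, _, files in os.walk(dir_b) for f in files}
    return names_a & names_b

# e.g. overlapping_files("data.imagenet/train", "data.imagenet.val/val")
# should return an empty set if the split is clean
```

If the overlap is empty, then the gap between the 47.37 validation top-1 reported during training and the 90.3 top-1 at evaluation would have to come from somewhere else, such as evaluating a different checkpoint or a mismatched data split, rather than from leakage.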