I went through the PyTorch documentation for quantization-aware training. I prepare the model and then convert it to a quint8 model. I save the model using state_dict(). However, when I check the file size using os.path.getsize(), the quantized model is bigger than the non-quantized model.
Can you try running a bigger model? Currently you have a weight with a single element. Even if you store the weight in int8, you still need 32 bits for the scale and 32 bits for the zero point, so for a weight this small the quantization parameters take more space than the weight itself. Try making a bigger model with multiple conv layers of more realistic sizes.
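To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in plain Python. It assumes the byte counts mentioned above (1 byte per int8 weight element, plus a 32-bit scale and a 32-bit zero point per tensor) and ignores everything else a saved state_dict contains, such as key names and serialization metadata. The function names and the 64x64x3x3 conv shape are made up for illustration; this is not a measurement of real PyTorch file sizes.

```python
# Rough size model only. The byte counts are assumptions taken from the
# answer above, not values read from an actual saved PyTorch file.
FLOAT32_BYTES = 4          # one float32 weight element
INT8_BYTES = 1             # one quantized weight element
QPARAM_BYTES = 4 + 4       # 32-bit scale + 32-bit zero point, per tensor


def float_weight_bytes(num_elements: int) -> int:
    """Approximate storage for a float32 weight tensor."""
    return num_elements * FLOAT32_BYTES


def quantized_weight_bytes(num_elements: int) -> int:
    """Approximate storage for an int8 weight tensor plus its scale and zero point."""
    return num_elements * INT8_BYTES + QPARAM_BYTES


def describe(name: str, num_elements: int) -> str:
    """Return a one-line comparison of the two storage estimates."""
    fp = float_weight_bytes(num_elements)
    q = quantized_weight_bytes(num_elements)
    return f"{name}: float32={fp} B, int8+qparams={q} B"


# A weight with a single element: the fixed overhead dominates (4 B vs 9 B).
print(describe("single-element weight", 1))

# A hypothetical conv weight of shape (64, 64, 3, 3): close to 4x smaller.
conv_elements = 64 * 64 * 3 * 3
print(describe("64x64x3x3 conv weight", conv_elements))
```

Under these assumptions the quantized tensor only becomes smaller once it has more than a couple of elements, and the saving approaches 4x as the tensor grows. In a real saved file the fixed per-tensor overhead is likely larger than 8 bytes, which makes the effect on tiny models even more pronounced.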