At the beginning, the tutorial says: "Note that quantization is currently only supported for CPUs, so we will not be utilizing GPUs / CUDA in this tutorial."

QAT training does support GPUs; I think that line means quantized *inference* is not supported on GPU. We have been working on supporting GPU inference through TensorRT and cuDNN, but haven't officially released it yet.
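To make the split concrete, here is a minimal sketch of that workflow using the eager-mode `torch.quantization` API: fake-quant training runs on CUDA, then the model is moved to CPU before `convert` produces the real int8 kernels. `ToyModel` and the two-step loop are illustrative placeholders, not the tutorial's model.

```python
import torch
import torch.nn as nn

# Toy model with quant/dequant stubs, just to show the device handoff.
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)

model = ToyModel().train()
model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
torch.quantization.prepare_qat(model, inplace=True)

# QAT itself runs fine on GPU: fake-quant ops are simulated in float.
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
for _ in range(2):  # stand-in for a real training loop
    x = torch.randn(4, 3, 32, 32, device=device)
    model(x).sum().backward()

# Conversion to true int8 kernels, and quantized inference, are CPU-only.
model.to("cpu").eval()
quantized = torch.quantization.convert(model)
out = quantized(torch.randn(1, 3, 32, 32))
```

So the CPU restriction only bites after `convert`; everything before that point can stay on CUDA.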