I was wondering if anyone has tried training on popular datasets (ImageNet, CIFAR-10/100) with half precision, and with popular models (e.g., ResNet variants)?
It works, but you want to make sure the BatchNorm layers accumulate in float32, or you will run into convergence issues. You can do that with something like:
```python
model.half()  # convert the model to half precision
for layer in model.modules():
    if isinstance(layer, nn.BatchNorm2d):
        layer.float()  # keep BatchNorm in float32 for stable statistics
```
Then make sure your input is in half precision.
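Putting it together, here is a minimal self-contained sketch of the pattern, using a toy model as a stand-in for a ResNet variant (the model and shapes are just illustrative):

```python
import torch
import torch.nn as nn

# Toy model standing in for a ResNet variant (hypothetical example)
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
)

# Convert weights to FP16, then restore FP32 in the BatchNorm layers
model.half()
for layer in model.modules():
    if isinstance(layer, nn.BatchNorm2d):
        layer.float()

# Inputs must match the model's precision
x = torch.randn(1, 3, 32, 32).half()
```

After this, the convolution weights are float16 while the BatchNorm parameters and running statistics stay in float32.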
Christian Sarofeen from NVIDIA ported the ImageNet training example to use FP16 here:
We’d like to clean up the FP16 support to make it more accessible, but the above should be enough to get you started.
This is great! Is there documentation on when/where half precision can be used? For example, it doesn’t seem like half precision computation is supported on CPU, but I only discovered this by giving it a shot.