I’m training DeepLabV3 on 1208×1920 images with a batch size of 2 on a T4 GPU.
The GPU stays close to 100% utilization the whole time according to nvidia-smi.
Training time is the same with and without AMP (autocast + GradScaler). Why is there no benefit, and how can I check whether float16 is actually being used? On a T4 I expected a decent speedup given the Tensor Cores, so I’m surprised it doesn’t change anything.
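For reference, my training loop looks roughly like this (a minimal sketch with a placeholder model and dummy tensors, not my actual DeepLabV3 setup); the `print(out.dtype)` line is one way I found to check whether reduced precision is actually active inside autocast:

```python
import torch
from torch import nn

# Placeholder model/optimizer standing in for my DeepLabV3 setup.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Conv2d(3, 8, 3, padding=1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# GradScaler is a no-op when AMP is disabled (e.g. on CPU).
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(2, 3, 64, 64, device=device)
target = torch.randn(2, 8, 64, 64, device=device)

with torch.autocast(device_type=device):
    out = model(x)
    loss = nn.functional.mse_loss(out, target)

# Activations produced inside autocast reveal the compute dtype:
# float16 under CUDA autocast, bfloat16 under CPU autocast.
print(out.dtype)

scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```

If `out.dtype` prints `torch.float32` here, autocast is not casting the convolutions at all.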