CPU maxed out while training resnext50_32x4d, while the GPU is not being used, hence slow training

You could try to narrow down the bottleneck by profiling the code.
E.g. to isolate a data loading bottleneck, you could use the data_time meter (an AverageMeter instance) from the ImageNet example, which records how long each iteration waits for the next batch. If data loading is the bottleneck, you could have a look at this post for potential reasons and workarounds. Of course you won't be able to change the machine, as it's a Kaggle node.
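A minimal sketch of the same idea, assuming a simplified stand-in for the ImageNet example's AverageMeter and a hypothetical `train_step` callable for the forward/backward/optimizer step. It splits each iteration into time spent waiting on the loader (`data_time`) and total iteration time (`batch_time`); if `data_time.avg` is close to `batch_time.avg`, the loader is starving the GPU. A fake slow loader stands in for a real DataLoader here.

```python
import time


class AverageMeter:
    """Running average of a value (simplified from the ImageNet example's helper)."""

    def __init__(self, name):
        self.name = name
        self.sum = 0.0
        self.count = 0

    def update(self, val, n=1):
        self.sum += val * n
        self.count += n

    @property
    def avg(self):
        return self.sum / self.count if self.count else 0.0


def profile_loop(loader, train_step):
    """Split each iteration's wall time into data-loading time and total time.

    `loader` is any iterable (e.g. a DataLoader); `train_step` is a
    hypothetical callable that runs one training step on a batch.
    """
    data_time = AverageMeter("Data")
    batch_time = AverageMeter("Time")
    end = time.perf_counter()
    for batch in loader:
        # Time spent waiting for the loader to produce the next batch.
        data_time.update(time.perf_counter() - end)
        train_step(batch)
        # NOTE: CUDA kernels run asynchronously; on a GPU you would call
        # torch.cuda.synchronize() here before reading the clock.
        batch_time.update(time.perf_counter() - end)
        end = time.perf_counter()
    return data_time, batch_time


def slow_loader(n, delay):
    """Hypothetical loader simulating expensive decoding/augmentation."""
    for i in range(n):
        time.sleep(delay)
        yield i


data_time, batch_time = profile_loop(slow_loader(5, 0.02), lambda b: time.sleep(0.005))
print(f"data {data_time.avg:.3f}s / batch {batch_time.avg:.3f}s")
```

In this simulated run most of each iteration is spent waiting for data, which is the pattern you would expect when the CPU is maxed out and the GPU sits idle.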