I am trying to improve the training performance of a small Transformer-style neural network on a single T4 GPU (AWS g4dn.xlarge instance).
I've already replaced all the numpy functions in my code with torch functions, following the advice in 7 Tips To Maximize PyTorch Performance,
but I only see about a 17% performance improvement (measured over 1000 training epochs).
The profiler shows that 36% of the time is spent in:

    method 'run_backward' of 'torch._C._EngineBase' objects
    built-in method tensor
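For context, a large "built-in method tensor" entry usually points at fresh tensors being created inside the training loop. A minimal sketch of the pattern I suspect (the names `step_slow`/`step_fast` and the array shapes are mine, not from my real code):

```python
import numpy as np
import torch

data = np.random.rand(1000, 64).astype(np.float32)

# Suspected slow pattern: torch.tensor() inside the loop allocates and
# copies a new tensor on every single step.
def step_slow(batch_np, device):
    return torch.tensor(batch_np, device=device)

# Alternative: convert once up front; from_numpy is a zero-copy view,
# and .to(device, non_blocking=True) overlaps the transfer when the
# source is in pinned memory.
data_t = torch.from_numpy(data)

def step_fast(batch_t, device):
    return batch_t.to(device, non_blocking=True)
```

Is eliminating these per-step conversions the kind of change that would actually move the needle here, or is the `run_backward` time the real ceiling?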
Any suggestions on how to optimize this code for the GPU?
Is there even value in using a GPU if the network is not huge?
If there is no way to make the GPU run significantly faster, wouldn't it be cheaper to get more CPU cores and train in parallel on CPUs instead?