I am currently training the Swin Transformer from the torchvision models from scratch, following the recipe here.
Our environment is modest but should be capable (i9-10980XE, 4x RTX 3090, 128 GB RAM, Ubuntu 22.04, PyTorch 2.4, cudatoolkit=12.1), yet training progressively slows down and GPU utilization drops as well. There are also enough CPU cores available.
GPU utilization starts at 100%, but after a few iterations it drops dramatically, and the training speed is not maintained. Could anyone suggest some solutions?
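For reference, here is a minimal timing sketch I could use to check whether the input pipeline is the bottleneck (the tiny synthetic dataset, model, and loader settings below are placeholders, not my actual Swin setup). It separates the time spent waiting on the DataLoader from the total step time; if the data-wait fraction grows over epochs, the slowdown is likely in the loader rather than the GPU compute:

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset/model standing in for the real training setup.
data = torch.randn(256, 3, 32, 32)
labels = torch.randint(0, 10, (256,))
loader = DataLoader(TensorDataset(data, labels), batch_size=64,
                    num_workers=0, pin_memory=False)

model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.01)

data_time, step_time = 0.0, 0.0
end = time.perf_counter()
for x, y in loader:
    # Time spent blocked on the DataLoader for this batch.
    data_time += time.perf_counter() - end
    loss = torch.nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Total time for this step, including the data wait above.
    step_time += time.perf_counter() - end
    end = time.perf_counter()

print(f"data wait: {data_time:.4f}s / total: {step_time:.4f}s")
```

If the data wait dominates, the usual knobs are `num_workers`, `pin_memory=True`, and `persistent_workers=True` on the DataLoader; if it does not, the issue is more likely thermal throttling or a growing GPU-side cost.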