I am trying to speed up an algorithm that produces 3D skeleton joints from 2D images. The algorithm (GAST-NET) consists of four main blocks that run sequentially for every frame, and I’m trying to parallelize them. I have a few questions about this:
- Will parallelizing the blocks actually speed up the algorithm, given that I am running everything on a single GPU?
- What other approaches can I look into to speed up the algorithm?
- A slightly related question: isn’t PyTorch already trying to use as much of the GPU as possible to produce output as fast as it can? I monitored GPU utilization and it stayed between 38% and 50%. Is there a way to make sure the GPU is used to the fullest?
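
To make "parallelize the 4 blocks" concrete, here is a minimal sketch of the kind of pipelining I have in mind: each block runs in its own thread, so block k can start on frame i while block k+1 is still busy with frame i-1. Everything here is a placeholder: `block_1` to `block_4` are hypothetical stand-ins that do trivial arithmetic instead of the real GAST-NET stages, and `run_pipeline` is just a name I made up. It uses only the Python standard library. Whether this pattern gives a real speedup when all four blocks share one GPU is exactly what I am unsure about.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the frame stream


def run_stage(fn, inbox, outbox):
    """Apply fn to every item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # pass the end marker down the pipeline
            break
        outbox.put(fn(item))


def run_pipeline(stages, frames, max_pending=4):
    """Run each stage in its own thread; results keep the input order."""
    # One bounded FIFO queue between consecutive stages, plus input and output.
    queues = [queue.Queue(maxsize=max_pending) for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=run_stage, args=(fn, queues[i], queues[i + 1]), daemon=True)
        for i, fn in enumerate(stages)
    ]

    def feed():
        # Feed from a separate thread so the bounded queues cannot deadlock
        # while the main thread is collecting results.
        for frame in frames:
            queues[0].put(frame)
        queues[0].put(SENTINEL)

    feeder = threading.Thread(target=feed, daemon=True)
    for t in workers + [feeder]:
        t.start()

    results = []
    while True:
        out = queues[-1].get()
        if out is SENTINEL:
            break
        results.append(out)

    for t in workers + [feeder]:
        t.join()
    return results


# Hypothetical placeholder blocks (NOT the real GAST-NET stages).
def block_1(x): return x + 1
def block_2(x): return x * 2
def block_3(x): return x - 3
def block_4(x): return x ** 2


BLOCKS = [block_1, block_2, block_3, block_4]

if __name__ == "__main__":
    print(run_pipeline(BLOCKS, range(5)))
```

Because each stage is a single thread reading from a FIFO queue, the output order matches the frame order. The sketch has no error handling: if a block raises, the pipeline will hang instead of shutting down cleanly.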