Hi. I’m using a PyTorch hook to analyze intermediate feature-map statistics during large-scale model training.
I’m also using multiple GPUs (with torch.nn.DataParallel) to accelerate training.
The problem is that GPU utilization drops drastically while my hook is executing (I compared GPU utilization against naive training without the hook, i.e. no analysis).
I know this is unavoidable to some extent because my analysis hook is quite expensive, but all of my GPUs freeze during the hook phase, which doesn’t seem reasonable.
I think there might be a solution such as assigning the hook to a specific GPU, like:
GPU 0–3: model training
GPU 4: gather intermediate feature maps and analyze them
Is this possible? Or is there another good way to solve my problem?
Any answers are welcome. Thanks.
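To make the idea concrete, here is a rough sketch of the kind of setup I have in mind (device indices are illustrative; the snippet falls back to CPU when fewer GPUs are present so it runs anywhere):

```python
import torch
import torch.nn as nn

# Illustrative: on my machine the analysis device would be cuda:4;
# fall back to CPU when that GPU doesn't exist.
analysis_device = torch.device("cuda:4" if torch.cuda.device_count() > 4 else "cpu")

stats = []

def analyze_hook(module, inp, out):
    # Detach so autograd does not track the copy, then move the feature
    # map off the training GPUs before doing the expensive analysis.
    fmap = out.detach().to(analysis_device, non_blocking=True)
    stats.append((fmap.mean().item(), fmap.std().item()))

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())
handle = model[0].register_forward_hook(analyze_hook)

model(torch.randn(2, 3, 16, 16))  # one training batch -> one (mean, std) entry
handle.remove()
```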
Hooks are slow for several reasons. They are Python code and cannot run concurrently (due to Python limitations, i.e. the GIL). They usually need the value of the Tensor, and so will force the GPU to sync all computation (wait for everything queued to run, execute the hook, then start queuing more work). And since multi-GPU splits a batch between the GPUs, the whole batch runs as slowly as the slowest part of the batch; in your case, that is the GPU that runs the hooks.
A possible solution could be to only collect these statistics every 10 batches?
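A minimal sketch of that idea (class name and interval are illustrative): keep a call counter in the hook object and make the hook a cheap no-op except every Nth batch, so the `.item()` calls that force a GPU sync only happen occasionally.

```python
import torch
import torch.nn as nn

class PeriodicStatsHook:
    """Forward hook that only computes (and thus syncs) stats every `every` calls."""
    def __init__(self, every=10):
        self.every = every
        self.calls = 0
        self.stats = []

    def __call__(self, module, inp, out):
        self.calls += 1
        if self.calls % self.every != 0:
            return  # cheap no-op: no .item(), so no GPU sync on these batches
        fmap = out.detach()
        self.stats.append((fmap.mean().item(), fmap.std().item()))

model = nn.Sequential(nn.Linear(4, 4), nn.ReLU())
hook = PeriodicStatsHook(every=10)
model[0].register_forward_hook(hook)

for _ in range(25):  # simulate 25 training batches
    model(torch.randn(8, 4))

# stats are collected on batches 10 and 20 only -> 2 entries
```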
Sorry for the belated thanks.
I completely rewrote my code and changed my algorithm, which reduced training time with my hooks from 11 days to 4 days.
But GPU utilization still oscillates, like:
almost 100% GPU util (training process, e.g. backpropagation)
-> 4–10% GPU util (my hooks, even though I wrote all operations to execute on the GPU)
This speed difference between the optimized training process and my custom task is still a bottleneck.
I hope PyTorch gains more powerful optimization semantics for hooks, because they are a very useful tool for researchers.
Thanks again for your answers.