Training multiple PyTorch models concurrently leads to longer training time for each model

I’m currently training two variations of the following model (GitHub - TengdaHan/DPC: Video Representation Learning by Dense Predictive Coding. Tengda Han, Weidi Xie, Andrew Zisserman.).

I’ve noticed that when training both models concurrently (each on a dedicated GPU), the training time for each model increases by ~25% compared to training only one model at a time.

Currently, GPU utilization is >90% for both models. CPU usage is ~60% and RAM usage is ~25% when both models are trained concurrently.

Here are the profiling results (sorted by total CPU time, total CUDA time, CPU memory usage, and CUDA memory usage).
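For reference, the tables were produced with something along these lines (a minimal sketch using `torch.profiler` with a placeholder model and input, not the actual DPC training code, and assuming a CUDA device is available):

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

# Placeholder model and input; the real runs profiled the DPC model,
# each process on its own dedicated GPU.
device = "cuda"
model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(64, 64, 3, padding=1)).to(device)
x = torch.randn(8, 3, 128, 128, device=device)

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
             profile_memory=True) as prof:
    model(x).sum().backward()

# One table per sort key, matching the four orderings listed above.
for key in ("cpu_time_total", "cuda_time_total",
            "cpu_memory_usage", "cuda_memory_usage"):
    print(prof.key_averages().table(sort_by=key, row_limit=15))
```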

Your screenshots are a bit hard to decipher, but based on the values I could read, it seems the CUDA operations take approximately the same amount of time, while the CPU ops are a bit slower.
This could point towards a CPU-bound workload: the host might not be fast enough to run ahead and schedule the GPU work when you start multiple processes.
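If it helps, here is a rough way to check this (a sketch under the assumption of a single GPU and a dummy model, not taken from your code): compare the host-side time needed just to launch one step's kernels with the time including the GPU execution. If the two are close, the GPU is mostly waiting for the host.

```python
import time
import torch
import torch.nn as nn

# Dummy model and input standing in for one training process.
device = "cuda"
model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(64, 64, 3, padding=1)).to(device)
x = torch.randn(16, 3, 128, 128, device=device)

# Warm-up so cuDNN autotuning and lazy init don't distort the numbers.
for _ in range(5):
    model(x).sum().backward()
torch.cuda.synchronize()

t0 = time.perf_counter()
model(x).sum().backward()
launch = time.perf_counter() - t0   # host-side launch/scheduling time only
torch.cuda.synchronize()
total = time.perf_counter() - t0    # launch time + GPU execution time

print(f"launch: {launch * 1e3:.1f} ms, total: {total * 1e3:.1f} ms")
# If launch is close to total, the CPU is the bottleneck: the GPU finishes
# almost as soon as the host stops feeding it work, which would match the
# slowdown you see when two processes share the same host.
```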

Thank you for your reply! I’ve updated the screenshot to a higher resolution version.