Hi guys,
What is the best practice for running multiple training jobs on a single GPU? Do you run them in parallel or sequentially?
In my benchmarks, running two training jobs simultaneously makes each job take longer than it would running alone. But the GPU is not fully utilized when the jobs run sequentially, which wastes resources.
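For concreteness, this is what I mean by the two setups (`train_a.py` and `train_b.py` are placeholders for my actual training scripts):

```shell
# Sequential: one job at a time. Each runs at full speed, but the GPU
# can sit partly idle during data loading or CPU-bound phases.
python train_a.py
python train_b.py

# Parallel: both jobs share the GPU at once. They contend for SM time
# and memory bandwidth, so each individual job runs slower.
python train_a.py &
python train_b.py &
wait  # block until both background jobs finish
```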
Thanks