Running multiple independent models on a single GPU with PyTorch on Windows 11 (RTX 3090 Ti)

jaiabhayk · December 25, 2022, 7:45pm

Hi there,
Does anyone have experience running multiple independent models on a single GPU (RTX 3090 or RTX 3090 Ti, PyTorch on Windows 11)? I tried training two models at the same time, each in its own PowerShell window, and training takes twice as long as running one model alone. Is there some limitation on the RTX 3090 or RTX 3090 Ti? [The dataset is CIFAR-10, system RAM is 64 GB, and GPU memory is 24 GB. The dataset is small enough to fit in RAM, so there should be no memory read/write bottleneck.]

ptrblck · December 25, 2022, 8:15pm

Check if other compute resources are already exhausted on the device, as it would also disallow the parallel execution of different workloads.
If that’s not the case, check if Windows adds some limitation (I’m not using Windows and don’t know about all “modes” it could use the GPU in).

jaiabhayk · December 25, 2022, 8:28pm

@ptrblck
Thanks, I checked that. I have 64 GB of RAM and 24 GB of GPU memory, and the CIFAR-10 training fits in memory. A single model takes around 4 minutes to train, but when I run two independent models in two separate PowerShell windows, each one takes 8 minutes. That seems weird, as if there were some kind of locking for the RTX 3090 Ti on Windows 11?

ptrblck · December 25, 2022, 8:48pm

I wasn’t thinking about the device memory but the compute resources.

J_Johnson · December 26, 2022, 2:20am

Run cmd.
Type nvidia-smi.
Post a screenshot of the output.

This will show what percent of your device CUDA cores are being utilized.
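
If a screenshot is inconvenient, the same numbers can be read from a script. The following is only a minimal sketch: it assumes `nvidia-smi` is on the PATH and that the driver reports all three queried fields as plain integers (some setups print `[N/A]` instead, which this parser does not handle). The function names and dictionary keys are made up for illustration. Note that the utilization value is a time-based percentage reported by the driver.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi; units are dropped by the "nounits" option.
QUERY = "utilization.gpu,memory.used,memory.total"


def parse_gpu_stats(csv_text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, mem_used, mem_total = (int(field) for field in line.split(","))
        stats.append(
            {"util_pct": util, "mem_used_mib": mem_used, "mem_total_mib": mem_total}
        )
    return stats


def read_gpu_stats():
    """Run nvidia-smi once; returns an empty list if the tool is not found."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_stats(result.stdout)


if __name__ == "__main__":
    for index, gpu in enumerate(read_gpu_stats()):
        print(
            f"GPU {index}: {gpu['util_pct']}% util, "
            f"{gpu['mem_used_mib']}/{gpu['mem_total_mib']} MiB"
        )
```

Running this once with no load, once with a single training job, and once with two jobs gives the same comparison as the screenshots requested above.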

jaiabhayk · December 27, 2022, 3:41am

(No load)
(single model training)
(two independent models training)
(two independent models training, another screenshot after few seconds)

jaiabhayk · December 27, 2022, 3:45am

@J_Johnson, @ptrblck, please let me know if you have any suggestions.

J_Johnson · December 27, 2022, 3:56am

99% utilization.

It means that your GPU compute cores are running at max speed. Memory is only one possible bottleneck; calculation speed is the other. You’ve maxed out the latter, which is why you’re getting a slowdown on both models.
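
As a rough back-of-the-envelope check of that explanation, here is a small sketch. It assumes a single job already keeps the GPU fully busy, so identical concurrent jobs can only take turns; the function name is hypothetical and the 4-minute figure is the one reported earlier in the thread.

```python
def shared_gpu_estimate(solo_minutes, num_jobs):
    """Estimate per-job wall time when identical jobs share a saturated GPU.

    Assumption: one job alone already uses all available compute, so the
    GPU time of concurrent jobs simply adds up.
    """
    per_job_minutes = solo_minutes * num_jobs
    jobs_per_minute = num_jobs / per_job_minutes  # equals 1 / solo_minutes
    return per_job_minutes, jobs_per_minute


solo = 4  # minutes for one model trained alone, as reported in the thread
per_job, throughput = shared_gpu_estimate(solo, num_jobs=2)
print(f"each job: ~{per_job} min, throughput: {throughput:.2f} models/min")
```

Under that assumption each job takes about 8 minutes, and the overall throughput stays at one model per 4 minutes, which matches the observed timings. If a single job did not saturate the GPU, the two jobs could overlap and finish faster than this estimate.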

jaiabhayk · December 27, 2022, 2:57pm

@J_Johnson, my bad (I did not realize that). I verified with a smaller network that it’s not an issue and that it works as expected. Thanks for your help!

ptrblck · December 27, 2022, 9:38pm

The GPU utilization in nvidia-smi reports the fraction of the sample period during which any kernel was using the GPU. A high utilization shows that kernels are packed tightly in time, but it does not give you any information about the occupancy of the compute resources. You should thus profile your code, check the resources your kernels need (SM count, registers, etc.), and make sure they can overlap.
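
A possible starting point for that profiling, as a sketch only: it uses `torch.profiler` to list the operators and kernels that dominate a few training-like steps, and `torch.cuda.get_device_properties` to read the SM count. The tiny model, batch size, and the helper names `sm_count`, `profile_steps`, and `demo` are placeholders, not the setup from this thread. The import guard and CPU fallback are only there so the script also runs on a machine without PyTorch or CUDA.

```python
try:
    import torch
    from torch.profiler import profile, ProfilerActivity
except ImportError:  # keeps the file importable where PyTorch is not installed
    torch = None


def sm_count(device_index=0):
    """Number of streaming multiprocessors on a CUDA device, or None without CUDA."""
    if torch is None or not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(device_index).multi_processor_count


def profile_steps(model, inputs, steps=5):
    """Profile a few forward/backward passes and return a per-operator summary."""
    use_cuda = torch.cuda.is_available()
    activities = [ProfilerActivity.CPU]
    if use_cuda:
        activities.append(ProfilerActivity.CUDA)
    with profile(activities=activities) as prof:
        for _ in range(steps):
            model.zero_grad()
            model(inputs).sum().backward()  # stand-in for a real loss
        if use_cuda:
            torch.cuda.synchronize()  # wait for queued kernels before stopping
    return prof.key_averages().table(row_limit=15)


def demo():
    """Profile a tiny placeholder model; returns None if PyTorch is missing."""
    if torch is None:
        return None
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 16, kernel_size=3, padding=1),
        torch.nn.ReLU(),
        torch.nn.Flatten(),
        torch.nn.Linear(16 * 32 * 32, 10),
    ).to(device)
    inputs = torch.randn(64, 3, 32, 32, device=device)  # CIFAR-10-shaped batch
    return profile_steps(model, inputs)


if __name__ == "__main__":
    print("SM count:", sm_count())
    print(demo())
```

The summary table shows where time is spent per operator and kernel, but not per-kernel register usage or occupancy. Those details come from a kernel-level profiler such as NVIDIA Nsight Compute.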