Equivalent function from TensorFlow (tf.config.experimental.set_memory_growth) to limit the number of processes on a GPU

In TensorFlow, there is a function called tf.config.experimental.set_memory_growth (Details) which allocates only as much memory to the process as it needs. Moreover, it doesn’t release that memory until the process exits, which prevents any other process from occupying it in the meantime.
This is very useful when the GPU is a shared resource and your process has high but dynamic memory requirements.
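For reference, enabling that behaviour in TensorFlow is just a couple of lines (a minimal sketch using the standard tf.config calls):

```python
import tensorflow as tf

# Enable memory growth for every visible GPU so TF allocates memory
# incrementally instead of mapping (almost) all of it up front.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
```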

Could someone tell me if there is a similar option in PyTorch to limit the number of processes that can share GPU memory concurrently?

If I’m not mistaken, there is no such functionality in PyTorch, as the memory is allocated dynamically and cached.
As an untested hack, you could try to allocate a huge tensor at the beginning of your script and delete it right after its creation. This should trigger PyTorch to allocate the memory block and put it into the cache without freeing it.
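A minimal, untested sketch of that hack, assuming a hypothetical budget of 8 GiB on cuda:0 (adjust the size to your GPU):

```python
import torch

# Reserve roughly 8 GiB on GPU 0 by allocating one large tensor and
# deleting it immediately; the caching allocator keeps the block
# instead of returning it to the driver. (8 GiB is an assumed budget.)
reserve_bytes = 8 * 1024**3
n_elems = reserve_bytes // 4                      # float32 = 4 bytes per element
placeholder = torch.empty(n_elems, dtype=torch.float32, device="cuda:0")
del placeholder                                   # memory stays in PyTorch's cache
print(torch.cuda.memory_reserved(0))              # should report ~8 GiB reserved
```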


I think by now PyTorch offers such an API: see torch.cuda.set_per_process_memory_fraction in the PyTorch documentation.
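A minimal sketch of how that call can be used, assuming you want to cap this process at half of GPU 0’s memory:

```python
import torch

# Limit this process to 50% of GPU 0's total memory. Allocations that
# would exceed the cap raise a CUDA out-of-memory error instead of
# growing further.
torch.cuda.set_per_process_memory_fraction(0.5, device=0)
```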

It’s not quite the same: PyTorch only lets you set a limit, while TF will eagerly fill the specified region to prevent other processes from using it.

Interesting! Thanks for the clarification!