Unit Testing with artificial GPU limits

I deploy PyTorch models across multiple devices. My training infrastructure, as is typical, has access to more GPU memory than some of the devices the models are deployed on. I would like to set up unit tests with an artificial GPU memory limit, so I can figure out whether some of my models have memory requirements that are too high.
In essence, I would like to call torch.randn with a certain batch size, resolution, and channel count, then build my model and run inference. If the system runs out of memory within that unit test session, the test should fail.
Is there a way to achieve this?

torch.cuda.set_per_process_memory_fraction(fraction, device=None) might be useful for your workload. It caps the CUDA caching allocator at the given fraction of the device's total memory, and allocations beyond that cap raise an out-of-memory error, which you can turn into a test failure.
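A minimal pytest-style sketch of that idea: compute the fraction corresponding to a target deployment budget, cap the allocator, and fail the test on OOM. The 4 GiB budget, the model, and the input shape are placeholders you would replace with your own values.

```python
import pytest
import torch

# Hypothetical deployment target: a device with 4 GiB of GPU memory.
DEPLOY_BUDGET_BYTES = 4 * 1024**3


def budget_fraction(budget_bytes: int, total_bytes: int) -> float:
    """Fraction of this GPU's memory matching the deployment budget, capped at 1.0."""
    return min(budget_bytes / total_bytes, 1.0)


@pytest.mark.skipif(not torch.cuda.is_available(), reason="needs a CUDA device")
def test_inference_fits_deploy_budget():
    device = torch.device("cuda:0")
    total = torch.cuda.get_device_properties(device).total_memory

    # Cap this process's caching allocator at the deployment budget.
    torch.cuda.set_per_process_memory_fraction(
        budget_fraction(DEPLOY_BUDGET_BYTES, total), device
    )

    # Stand-in for your real model; replace with your own architecture.
    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 64, kernel_size=3, padding=1),
        torch.nn.ReLU(),
    ).to(device)

    # Batch size, channels, and resolution of the deployment workload.
    x = torch.randn(8, 3, 512, 512, device=device)

    try:
        with torch.no_grad():
            model(x)
    except torch.cuda.OutOfMemoryError:
        pytest.fail("model inference exceeds the deployment memory budget")
```

One caveat: the fraction only limits what the caching allocator hands out. The CUDA context itself and any memory fragmentation are not counted against it, so treat the cap as an approximation of the target device rather than an exact emulation.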