Unit Testing with artificial GPU limits

I deploy PyTorch models across multiple devices. My training infrastructure, as is typical, has access to more GPU memory than some of the devices the models are deployed on. I would like to set up unit tests with an artificial GPU memory limit, so I can figure out whether some of my models have memory requirements that are too high.
In essence, I would like to call torch.randn with a certain batch size, resolution, and channel count, then build my model and run inference. If the system runs out of memory within that unit test session, the test should fail.
Is there a way to achieve this?

torch.cuda.set_per_process_memory_fraction(fraction, device=None) might be useful for your workload. It caps the CUDA caching allocator at the given fraction of the device's total memory, and allocations beyond that cap raise an out-of-memory error, which you can turn into a test failure.
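A minimal pytest-style sketch of that idea: compute the fraction corresponding to a target deployment budget, cap the allocator, and fail the test on OOM. The 4 GiB budget, the model, and the input shape are placeholders you would replace with your own values.

```python
import pytest
import torch

# Hypothetical deployment target: a device with 4 GiB of GPU memory.
DEPLOY_BUDGET_BYTES = 4 * 1024**3


def budget_fraction(budget_bytes: int, total_bytes: int) -> float:
    """Fraction of this GPU's memory matching the deployment budget, capped at 1.0."""
    return min(budget_bytes / total_bytes, 1.0)


@pytest.mark.skipif(not torch.cuda.is_available(), reason="needs a CUDA device")
def test_inference_fits_deploy_budget():
    device = torch.device("cuda:0")
    total = torch.cuda.get_device_properties(device).total_memory

    # Cap this process's caching allocator at the deployment budget.
    torch.cuda.set_per_process_memory_fraction(
        budget_fraction(DEPLOY_BUDGET_BYTES, total), device
    )

    # Stand-in for your real model; replace with your own architecture.
    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 64, kernel_size=3, padding=1),
        torch.nn.ReLU(),
    ).to(device)

    # Batch size, channels, and resolution of the deployment workload.
    x = torch.randn(8, 3, 512, 512, device=device)

    try:
        with torch.no_grad():
            model(x)
    except torch.cuda.OutOfMemoryError:
        pytest.fail("model inference exceeds the deployment memory budget")
```

One caveat: the fraction only limits what the caching allocator hands out. The CUDA context itself and any memory fragmentation are not counted against it, so treat the cap as an approximation of the target device rather than an exact emulation.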