So I trained a 3D U-Net with 16 base filters and 5 levels deep. Now I am trying to run inference on a 240x240x155 volume on a CPU. I allocated 128GB of RAM, but it still fails with this error:
RuntimeError: $ Torch: not enough memory: you tried to allocate 0GB. Buy new RAM! at /opt/conda/conda-bld/pytorch
I do not have money to buy more RAM. As far as I can tell, the model should require at most 32GB of RAM for a volume of that size.
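For reference, one common cause of unexpectedly large memory use at inference time is running the forward pass with autograd enabled, which keeps every intermediate activation alive. A minimal sketch of inference under `torch.no_grad()` (the tiny stand-in network and input size here are placeholders, not the original 3D U-Net):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the 3D U-Net; the real model has a
# 5-level encoder/decoder with 16 base filters.
model = nn.Sequential(
    nn.Conv3d(1, 16, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv3d(16, 1, kernel_size=3, padding=1),
)
model.eval()  # disable dropout / freeze batch-norm statistics

# Small dummy volume; the real input would be 1x1x240x240x155.
volume = torch.randn(1, 1, 32, 32, 32)

# Without no_grad(), autograd stores intermediate activations for a
# backward pass that never happens, multiplying peak RAM usage.
with torch.no_grad():
    out = model(volume)

print(out.shape)  # torch.Size([1, 1, 32, 32, 32])
```

Whether this applies depends on how the inference script is written; if it already runs under `torch.no_grad()`, the problem lies elsewhere.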
As far as I understand your issue, the training script takes at most 16GB running on the GPU but more than 128GB on the CPU?
If that’s correct, do you see an increasing memory usage during training or does your script run out of memory during the first iteration?
Did you change something in your data loading pipeline, e.g. are you loading the complete dataset into RAM?