Hi
I am training a model using the UNet++ architecture. I have large images that I crop down to make them more manageable, but I feel I should be able to handle more data using the DataParallel setup.
Per the title, I am running on eight 8 GB GPUs on a cluster. I am trying to train with an image size of 1024x1024 and a batch size of 3, and I get an out-of-memory error. I estimate that with these settings I would need about 3 GB per batch, not counting the model's layers and activations. Even so, I should have roughly 50-60 GB of memory to play with…
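For reference, here is the back-of-the-envelope arithmetic behind my estimate (a sketch; it assumes 3-channel float32 inputs and only counts the raw input batch, not the intermediate activations inside UNet++, which dominate in practice):

```python
def batch_input_bytes(batch, channels, height, width, bytes_per_elem=4):
    """Memory needed just to hold one input batch in float32 (4 bytes/element)."""
    return batch * channels * height * width * bytes_per_elem

# Batch of 3 RGB images at 1024x1024: about 36 MiB for the raw inputs alone.
mib = batch_input_bytes(batch=3, channels=3, height=1024, width=1024) / 2**20
print(f"{mib:.0f} MiB")
```

So the inputs themselves are tiny; the bulk of my 3 GB-per-batch estimate is the activations stored for backprop.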
What is going on?