Weird CUDA out of memory error (OOM) at epoch end

I am facing a very strange OOM error in my current training. The OOM error happens systematically during the forward pass of the last (or second-to-last, I'm not sure) batch of an epoch. It happens independent of the training set size, and it happens before validation starts.

I am logging the GPU memory consumption via nvidia-smi during training. During the training epoch the memory consumption stays constant, so I doubt it's a typical memory leak (caused, e.g., by a missing .detach() call). However, when running the last batch, the memory consumption suddenly starts to increase in the forward pass. This can be seen by printing the GPU memory consumption at different steps of the forward pass (something like the sketch below).
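For reference, the prints inside the forward pass look roughly like the following sketch (the helper name and the commented-out stage names are placeholders, not the actual monodepth2 code). Note that torch.cuda.memory_allocated() only reports memory held by live tensors, which is a subset of what nvidia-smi shows for the process:

```python
import torch

def log_gpu_mem(tag):
    # Live tensor memory and the peak since program start.
    # nvidia-smi additionally shows memory cached by the allocator.
    alloc = torch.cuda.memory_allocated() / 1024 ** 2
    peak = torch.cuda.max_memory_allocated() / 1024 ** 2
    print("[{}] allocated: {:.1f} MiB, peak: {:.1f} MiB".format(tag, alloc, peak))

# Hypothetical usage inside the model's forward pass:
# log_gpu_mem("before encoder")
# features = self.encoder(images)
# log_gpu_mem("after encoder")
```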

Is there something special happening at the end of an epoch, when the queue size of the dataloader approaches zero? Has anybody else faced this phenomenon?

P.S.: While I said the error happens systematically, the training did work at one point for a couple of epochs, so there does seem to be some randomness to it. However, it has raised the OOM error at epoch end approximately a dozen times now.
P.P.S.: The training pipeline is a modified monodepth2 and I'm getting the error at this point: https://github.com/nianticlabs/monodepth2/blob/master/trainer.py#L285. I am using only one GPU to train.


P.P.P.S.: The error only occurs for num_workers >= 2. It has also now happened a few times in the middle of the first epoch, not at the end…

Are you using static input shapes, or do they change dynamically?

Are you pushing the data to the GPU inside the Dataset?

Thank you very much for your reply!

  • The input shapes are static
  • Data is pushed to the GPU not inside the Dataset, but just before the forward pass (see the sketch after this list)

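To make the second point concrete, here is a minimal, self-contained sketch of that pattern with dummy stand-ins for the dataset, model, and shapes (the real pipeline uses the monodepth2 Dataset and networks): the DataLoader workers only produce CPU tensors, and each batch is moved to the GPU in the training loop right before the forward pass.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset, model, and optimizer so the example runs on its own.
dataset = TensorDataset(torch.randn(64, 3, 192, 640), torch.randn(64, 1, 192, 640))
model = nn.Conv2d(3, 1, kernel_size=3, padding=1).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loader = DataLoader(dataset, batch_size=4, num_workers=2, pin_memory=True)

for images, targets in loader:
    # The batch is pushed to the GPU here, just before the forward pass,
    # never inside the Dataset's __getitem__.
    images = images.cuda(non_blocking=True)
    targets = targets.cuda(non_blocking=True)
    preds = model(images)
    loss = F.l1_loss(preds, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```
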
By now I have also run the memory trace suggested in this thread: How to debug causes of GPU memory leaks?. However, with the trace enabled the training somehow runs smoothly (and extremely slowly) for several epochs… Could it be that at some point GPU memory is not deallocated quickly enough, and that some wait statements would solve it?
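The trace I used is along the lines of the snippet from that thread (this is a rough reconstruction, not the exact code): it walks all objects tracked by Python's garbage collector and reports every live CUDA tensor, which is also why calling it every iteration slows training down so much.

```python
import gc
import torch

def dump_cuda_tensors():
    # Report every CUDA tensor that is still alive, plus their total size.
    total_bytes = 0
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj) and obj.is_cuda:
                total_bytes += obj.numel() * obj.element_size()
                print(type(obj), tuple(obj.size()))
        except Exception:
            # Some objects raise on attribute access; skip them.
            pass
    print("total CUDA tensor memory: {:.1f} MiB".format(total_bytes / 1024 ** 2))
```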

Also, I forgot to mention: I am using an older version of PyTorch, namely 0.4.1.

This shouldn’t be the case, as PyTorch tries to recover from an OOM by clearing the cache for you.
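In case it helps with debugging, you can confirm that the failure is a genuine OOM rather than a caching issue by catching it and printing the allocator state. This is just a generic sketch, not part of monodepth2:

```python
import torch

def forward_with_oom_report(model, batch):
    # If the forward pass runs out of memory, the caching allocator has
    # already tried to free its cached blocks before raising, so the
    # remaining allocated memory is held by live tensors.
    try:
        return model(batch)
    except RuntimeError as e:
        if "out of memory" in str(e):
            print("OOM: {:.1f} MiB still allocated by live tensors".format(
                torch.cuda.memory_allocated() / 1024 ** 2))
        raise
```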

Are you seeing these issues only in this old monodepth2 repository or also using plain PyTorch code?

Ok, thanks for the clarification.

What exactly do you mean by plain PyTorch code? What I can say is that the monodepth2 training runs with different parameter settings, e.g. a different input size, batch size, or network architecture. The current configuration is already pretty close to the GPU memory limit, but I will test today whether this memory usage spike in the last batch also happens for other configurations.