How to release allocated CUDA memory

The actual size of a batch in my case can vary, and sometimes a batch causes a CUDA out of memory error. Interestingly, during training the CUDA memory usage keeps increasing instead of fluctuating (my guess is that the model allocates memory based on the largest tensor seen so far and doesn't release it promptly), so once I get a CUDA out of memory error, I can never return to normal training. Is there any way to manually release allocated CUDA memory, so that I can use try...except to handle the out of memory error?
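For reference, a minimal sketch of the recovery pattern being asked about, assuming a PyTorch version that provides `torch.cuda.empty_cache()` (0.3+); `model`, `optimizer`, and `batch` here are placeholders, not code from the original post:

```python
import torch

def train_step(model, optimizer, batch):
    """Run one training step; on a CUDA OOM error, skip the batch
    and hand the caching allocator's unused blocks back to the driver."""
    try:
        optimizer.zero_grad()
        loss = model(batch).sum()  # placeholder loss computation
        loss.backward()
        optimizer.step()
        return loss.item()
    except RuntimeError as e:
        # PyTorch raises OOM as a RuntimeError whose message starts
        # with "CUDA out of memory"; re-raise anything else.
        if "out of memory" not in str(e):
            raise
        # Tensors created by the failed step are freed once no Python
        # reference holds them; empty_cache() then releases the
        # allocator's cached-but-unused blocks back to the CUDA driver.
        torch.cuda.empty_cache()
        return None
```

Note that `empty_cache()` only returns *unreferenced* cached blocks, so any tensors still referenced (e.g. accumulated losses kept on the GPU) must be deleted or detached first for the memory to actually be reclaimed.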


  1. What version of PyTorch are you using? print(torch.__version__)
  2. That doesn’t sound like it should be happening even when your batch sizes change. Can you share a repro or send a link to your code?

Thanks for your reply. This happens on both v0.2.0 and v0.4.x (I installed the latter from source yesterday). I will try to put together a minimal example.

Correction: it seems that v0.4.x doesn't have this problem. I will double-check.