to isolate the issue further. A minimal code snippet would indeed be helpful, but I also understand that it can be a huge amount of work to create one.
I believe I have narrowed the problem down to code that replaces one Sequential with another. I have a list of Sequentials, called `sequentials`, which I initialize inside a Module. Periodically, I do `sequentials[i] = initialize_new_sequential()`. It seems the memory of the replaced Sequential isn't being freed properly.
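In case it helps, here is a minimal sketch of the pattern (the layer sizes are made up, and `initialize_new_sequential` stands in for my actual initialization code):

```python
import torch
import torch.nn as nn

def initialize_new_sequential():
    # Placeholder for the real initialization; sizes are illustrative only
    return nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024))

class MyModule(nn.Module):
    def __init__(self, n=4):
        super().__init__()
        # nn.ModuleList registers each Sequential as a proper submodule
        self.sequentials = nn.ModuleList(initialize_new_sequential() for _ in range(n))

model = MyModule().cuda()

# Periodically, one entry is replaced with a freshly initialized Sequential
model.sequentials[0] = initialize_new_sequential().cuda()
```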
Sorry, I managed to get rid of the original error, but now I get out-of-memory errors in the middle of training:
```
RuntimeError: CUDA out of memory. Tried to allocate 250.00 MiB (GPU 0; 10.92 GiB total capacity; 9.19 GiB already allocated; 45.31 MiB free; 10.10 GiB reserved in total by PyTorch)
```
Memory usage is otherwise constant throughout training, so this shouldn't happen. However, I do create new Sequentials in the middle of training and replace the old ones, as in `list_of_sequentials[index] = new_sequential`.
No matter how much I reduce the memory requirements, I eventually hit this memory leak.
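My current guess is that something else still holds references to the replaced Sequential's parameters, so they can't be garbage collected. Here is a sketch of one way that can happen, assuming a standard optimizer setup (the optimizer and variable names are hypothetical, not my actual code):

```python
import torch
import torch.nn as nn

list_of_sequentials = nn.ModuleList(
    nn.Sequential(nn.Linear(512, 512), nn.ReLU()) for _ in range(4)
).cuda()
optimizer = torch.optim.Adam(list_of_sequentials.parameters())

# Replacing the entry drops the ModuleList's reference to the old Sequential...
list_of_sequentials[0] = nn.Sequential(nn.Linear(512, 512), nn.ReLU()).cuda()

# ...but optimizer.param_groups still references the old parameters (plus any
# momentum buffers in optimizer.state), so their GPU memory is never reclaimed.
# One workaround is to rebuild the optimizer after the swap (losing its state)
# and release the cached blocks:
optimizer = torch.optim.Adam(list_of_sequentials.parameters())
torch.cuda.empty_cache()
```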