Why does reusing training data cause non-releasable GPU memory to stack up?

Hi. I’ve used PyTorch 1.4 for my project.

I found a weird problem: reusing the training data causes "non-releasable memory" to stack up on the GPU.

Specifically, my program consists of several training processes.

Then, in every training process, a model and an optimizer are initialized, the training data is prepared, and the model is trained.

However, loading the training data with pickle takes too much time (each data instance is a class object consisting of lists; it is a natural-language dataset, e.g. SQuAD v1.1). So I try to load and 'cache' the training data once, before the training processes start, rather than loading it in every training process.
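The caching idea is roughly this (a minimal pure-Python sketch; the `Example` class, file name, and data are illustrative, not my actual code):

```python
import os
import pickle
import tempfile

class Example:
    # each instance holds lists, like the SQuAD features described above
    def __init__(self, tokens, label):
        self.tokens = tokens
        self.label = label

path = os.path.join(tempfile.gettempdir(), "train_cache_demo.pkl")
data = [Example(["a", "b"], i) for i in range(3)]

# pickling/unpickling a large dataset is slow, so do it once up front...
with open(path, "wb") as f:
    pickle.dump(data, f)

# ...and reuse the loaded objects across all training processes
with open(path, "rb") as f:
    cached = pickle.load(f)

print(len(cached), cached[0].tokens)
```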

The 'cached' training data is then fed to the "train" function, where it is wrapped in a TensorDataset.
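The wrapping step looks roughly like this (a sketch assuming PyTorch is installed; this `train` function, the feature dictionaries, and the tensor names are illustrative, not my actual code):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def train(cached_examples):
    # cached_examples is the pre-loaded data passed in from outside;
    # inside train() it is converted to tensors and wrapped once
    input_ids = torch.tensor([e["input_ids"] for e in cached_examples],
                             dtype=torch.long)
    labels = torch.tensor([e["label"] for e in cached_examples],
                          dtype=torch.long)
    dataset = TensorDataset(input_ids, labels)
    loader = DataLoader(dataset, batch_size=2)
    return sum(1 for _ in loader)  # count batches, just to show it iterates

cached_examples = [{"input_ids": [1, 2, 3], "label": 0} for _ in range(4)]
print(train(cached_examples))  # 4 examples / batch size 2 -> 2 batches
```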

However, by using the torch.cuda.memory_summary function, I found that this approach accumulates non-releasable GPU memory at every iteration, and it ends up with a CUDA out-of-memory error.
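For reference, I check the memory roughly like this (a sketch; the helper name `report_gpu_memory` is mine, and it is guarded so the code also runs on a machine without CUDA):

```python
import torch

def report_gpu_memory(tag=""):
    # torch.cuda.memory_summary is available from PyTorch 1.4 onward
    if torch.cuda.is_available():
        print(tag, torch.cuda.memory_summary(abbreviated=True))
    else:
        print(tag, "CUDA not available; nothing to report")

report_gpu_memory("after one training iteration")
```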

When I load the training data (or deep-copy the cached data) in every training process, this problem does not happen, but both approaches cost too much time.
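The deep-copy workaround looks like this (a pure-Python sketch; the data structure and the `run_training` function are illustrative):

```python
import copy

# load once (expensive), then copy per training run so each run gets a
# fresh object and no references are shared across iterations
cached_data = [{"tokens": ["hello", "world"], "label": 1} for _ in range(3)]

def run_training(data):
    data[0]["label"] = 99  # a training run may mutate its data in place
    return len(data)

for _ in range(2):
    fresh = copy.deepcopy(cached_data)  # per-iteration deep copy
    run_training(fresh)

print(cached_data[0]["label"])  # the cached original stays untouched -> 1
```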

Does anybody have some ideas about why this problem occurs?

Furthermore, how can I solve this problem?

Thank you for reading my question.

Could you post a minimal code snippet to show how your caching function looks and works, so that we can reproduce this issue?

Thank you for paying attention to my question.

Basically, our code is based on the following code:

Then, in our code, we call the main function of run_squad.py at every iteration, as follows:

from run_squad import main

for _ in range(100):
    main()  # one full training process per call

(This is not the exact code, since we slightly modified run_squad.py so that the main function can be called iteratively; however, I think this snippet conveys what we are trying to do.)

Since this code loads the training dataset at every iteration like this (line 791),

        train_dataset = load_and_cache_examples(args, tokenizer, evaluate=False, output_examples=False)

it costs 15–30 seconds per iteration just to load the dataset.

Instead, we load the dataset once before the loop rather than at every iteration, as follows:

from run_squad import main, load_and_cache_examples

train_dataset = load_and_cache_examples(...)  # load once, before the loop
for _ in range(100):
    main(train_dataset)  # main modified to accept the cached dataset

However, this accumulates non-releasable memory on the GPU and finally fails with an out-of-memory error because of the stacked non-releasable memory.

This situation does not occur if we load the dataset inside every main function call.

I'm sorry for not providing more detailed code. If you need more information to inspect this problem, please let me know.