Out-of-memory error on GPU when performing random search

ire · June 11, 2020, 6:01pm

I am training an LSTM model and am currently running a random search over its hyperparameters.

Initially, I emptied the cache (torch.cuda.empty_cache()) after each random-search trial, but I still got the OOM error after a few trials (usually around 3).

Then I read that for the memory to be freed I first need to delete the variables with del. However, even after doing that I kept getting the same error. I am tracking the allocated GPU memory (torch.cuda.memory_allocated()), and I can see that the memory is freed after each trial. However, when a new trial starts, the allocated memory is slightly higher than in the previous trial. I don’t think this is caused by a variable that has not been deleted. Is there something I am missing?
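A common way to structure such a search (a sketch, not the poster's code) is to keep each trial's model, optimizer and tensors local to a function so they go out of scope when the trial ends, return only plain Python numbers, and log both allocated and reserved memory afterwards. torch.cuda.memory_allocated() reports memory occupied by tensors, while torch.cuda.memory_reserved() also counts blocks held by PyTorch's caching allocator; torch.cuda.empty_cache() only releases unused cached blocks, so it cannot free memory still referenced by a live tensor. The search space, the dummy data, and the names run_trial and random_search are hypothetical.

```python
import gc
import random

import torch
import torch.nn as nn


def run_trial(hidden_size: int, num_layers: int, device: torch.device) -> float:
    """Hypothetical single trial: build an LSTM, train briefly, return the loss."""
    model = nn.LSTM(input_size=16, hidden_size=hidden_size,
                    num_layers=num_layers, batch_first=True).to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    x = torch.randn(8, 20, 16, device=device)  # dummy batch: (batch, seq, features)
    for _ in range(3):
        optimizer.zero_grad()
        output, _ = model(x)
        loss = output.pow(2).mean()
        loss.backward()
        optimizer.step()
    # Return a Python float rather than the loss tensor, so no tensor
    # (and no autograd graph) from this trial outlives the function.
    return loss.item()


def random_search(num_trials: int, seed: int = 0) -> list:
    rng = random.Random(seed)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    results = []
    for trial in range(num_trials):
        params = {"hidden_size": rng.choice([32, 64, 128]),
                  "num_layers": rng.choice([1, 2])}
        loss = run_trial(device=device, **params)
        results.append((params, loss))
        # run_trial's locals are gone once it returns; collect any reference
        # cycles, then return cached-but-unused blocks to the driver.
        gc.collect()
        if device.type == "cuda":
            torch.cuda.empty_cache()
            print(f"trial {trial}: {params} "
                  f"allocated={torch.cuda.memory_allocated() / 2**20:.1f} MiB "
                  f"reserved={torch.cuda.memory_reserved() / 2**20:.1f} MiB")
    return results
```

On a GPU, random_search(5) prints one line per trial. If the allocated memory after each trial keeps creeping up, something from earlier trials is probably still referenced, for example a stored loss tensor instead of loss.item().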

ptrblck · June 12, 2020, 8:41am

What is the random search doing? Could it change some hyperparameters and thus increase the number of model parameters, which could cause the OOM error?
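To see how strongly hyperparameters such as hidden_size and num_layers drive memory use, the parameter count of a unidirectional torch.nn.LSTM (no projection) can be computed directly: each layer has weight_ih of shape (4·hidden, layer_input), weight_hh of shape (4·hidden, hidden), and two bias vectors of length 4·hidden. The memory helper below is a rough, hypothetical estimate that assumes float32 weights trained with Adam (weights, gradients and two moment buffers, i.e. 4 copies) and ignores activations, which also grow with hidden_size.

```python
def lstm_param_count(input_size: int, hidden_size: int, num_layers: int = 1,
                     bias: bool = True) -> int:
    """Parameter count of a unidirectional torch.nn.LSTM without projections."""
    total = 0
    layer_input = input_size
    for _ in range(num_layers):
        # weight_ih: (4*hidden, layer_input) and weight_hh: (4*hidden, hidden)
        total += 4 * hidden_size * (layer_input + hidden_size)
        if bias:
            # bias_ih and bias_hh, each of length 4*hidden
            total += 2 * 4 * hidden_size
        layer_input = hidden_size  # deeper layers consume the previous hidden state
    return total


def training_state_mib(n_params: int, copies: int = 4,
                       bytes_per_param: int = 4) -> float:
    """Rough estimate (assumption): float32 weights + grads + 2 Adam buffers."""
    return n_params * copies * bytes_per_param / 2**20


small = lstm_param_count(16, 128, num_layers=1)
large = lstm_param_count(16, 1024, num_layers=3)
print(small, large, training_state_mib(large))
```

For example, with an input size of 16, moving from hidden_size=128, num_layers=1 to hidden_size=1024, num_layers=3 grows the LSTM from 74,752 to 21,061,632 parameters, roughly 280 times larger. The counts can be cross-checked against sum(p.numel() for p in model.parameters()).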