Memory consumption in PyTorch

Is there any way to consume the whole GPU memory in PyTorch?
I know it’s not an ideal situation, but there are times when you don’t want others to interfere with your training on the same GPU. For example, when alternating between training and evaluation, you could release your memory so other jobs run faster, but then a batch with long sentences comes along and you need more memory, someone else has arrived on the server in the meantime and started their process, and you get an OOM. This actually happens to me while training an NMT system.
I couldn’t find any suggestions about this for PyTorch.
From my experience with TensorFlow, this is the behaviour by default, and to be honest I don’t like it being the default.
Still, I think this is a must-have option, even though it has its downsides.
I would very much like to know how people handle this kind of situation when the server doesn’t have any kind of resource allocator for GPUs.
I found one forum post about this, How to occupy all the gpu memory at the beginning of training, but it is not entirely clear to me (my rough understanding of the approach is sketched below).
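As far as I understand, the idea in that thread is to allocate one big dummy tensor right at start-up so that PyTorch’s caching allocator keeps the memory reserved for your process. A minimal sketch of that idea, assuming a recent PyTorch that has `torch.cuda.mem_get_info` and using an arbitrary 90% safety margin:

```python
import torch

def reserve_gpu_memory(fraction=0.9, device="cuda:0"):
    # Query free/total device memory (wraps cudaMemGetInfo).
    free_bytes, _total_bytes = torch.cuda.mem_get_info(device)
    n_bytes = int(free_bytes * fraction)
    # Allocate one large uint8 tensor, then drop the Python reference.
    # The caching allocator keeps the block for this process, so other
    # users see the GPU as (almost) full in nvidia-smi.
    block = torch.empty(n_bytes, dtype=torch.uint8, device=device)
    del block
    # Do NOT call torch.cuda.empty_cache() afterwards -- that would
    # hand the memory back to the driver and defeat the purpose.

reserve_gpu_memory()
```

Is that roughly the intended trick, or am I missing something?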

And finally, if this feature is not there right now, will it be included in a future release?

Hi,

This does not exist in PyTorch.
I think it is unlikely to be added in future releases because of how the CUDA caching allocator that we use works.
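For what it’s worth, you can see the caching behaviour directly: memory freed by deleting tensors stays reserved for the process until you call `torch.cuda.empty_cache()`. The device and tensor size below are arbitrary, just for illustration:

```python
import torch

device = "cuda:0"

x = torch.empty(1024, 1024, 256, device=device)  # ~1 GiB of float32
print(torch.cuda.memory_allocated(device))       # bytes used by live tensors
print(torch.cuda.memory_reserved(device))        # bytes held by the caching allocator

del x
print(torch.cuda.memory_allocated(device))       # back to ~0
print(torch.cuda.memory_reserved(device))        # still ~1 GiB, cached for this process

torch.cuda.empty_cache()                         # only now is it returned to the driver
print(torch.cuda.memory_reserved(device))        # ~0 again
```

So “occupying” memory for your process is really just the allocator never giving it back, not a separate feature.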

Most of the work that we do is actually GPU compute-bound. So even if there is extra memory available, there is no point in someone else launching a job on the same GPU: running both jobs at the same time is actually slower than running them one after the other, and it has a higher peak memory demand.
I guess the solution is not to share a single GPU if you actually need all of it.