[torchsample] Unloading tensor from GPU (storing cpu part of a GPU tensor)

I’ve encountered a problem with GPU memory not being freed while trying to use torchsample’s predict_loader: https://github.com/ncullen93/torchsample/blob/master/torchsample/modules/module_trainer.py#L682

def predict_loader(self,
                   loader,
                   cuda_device=-1,
                   verbose=1):
    prediction_list = []
    for batch_idx, batch_data in enumerate(loader):
        if not isinstance(batch_data, (tuple,list)):
            batch_data = [batch_data]
        input_batch = batch_data[0]
        if not isinstance(input_batch, (list,tuple)):
            input_batch = [input_batch]
        input_batch = [Variable(ins) for ins in input_batch]
        if cuda_device > -1:
            input_batch = [ins.cuda(cuda_device) for ins in input_batch]

        prediction_list.append(self.model(*input_batch))
    return torch.cat(prediction_list,0)

As far as I understand, the current behaviour is expected: the GPU outputs are accumulated in prediction_list and never freed, so memory fills up as more batches are processed.
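
A minimal sketch of that accumulation pattern, just to illustrate what I mean (made-up shapes, and x * 2 standing in for the model call):

import torch
from torch.autograd import Variable

outputs = []
for _ in range(10):
    x = Variable(torch.randn(32, 128).cuda())  # stand-in for an input batch
    y = x * 2                                  # stand-in for self.model(x)
    outputs.append(y)  # each y stays on the GPU, so memory grows with every batch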

I want to keep the predictions for multiple batches in CPU memory. I tried modifying the code to store only the CPU copy, like this:

prediction_list.append(self.model(*input_batch).cpu())

but it didn’t help: GPU memory usage still rose after every batch.
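
If I understand it correctly, the likely reason is that .cpu() is itself a differentiable op, so the CPU copy still references the autograd graph, whose buffers live on the GPU; keeping it in the list keeps those buffers alive. Roughly:

out = self.model(*input_batch)   # output Variable on the GPU, carries the autograd graph
out_cpu = out.cpu()              # CPU copy, but still linked to the GPU graph
prediction_list.append(out_cpu)  # holding on to it keeps the GPU buffers alive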

The workaround I currently use looks like this, but it doesn’t feel right :)

pred = torch.from_numpy(self.model(*input_batch).cpu().data.numpy())  # round-trip through numpy to get a plain CPU tensor
prediction_list.append(pred)
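
(I suppose a slightly shorter variant of the same idea would be to take .data first, so nothing references the graph, and move that to the CPU, skipping the numpy round trip, though I’m not sure that’s the intended way either:)

pred = self.model(*input_batch).data.cpu()  # detach from the graph, then copy to host
prediction_list.append(pred)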

Is there a better way to save only the CPU part of a tensor, or somehow tell PyTorch to free the GPU memory of a tensor?

PyTorch might free the GPU memory, but it will not show up in nvidia-smi, because we use our own memory allocator.

The problem is that the script fails with an out-of-memory error, so unfortunately the memory is not actually being freed.
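
For what it’s worth, more recent PyTorch versions expose helpers to distinguish cached memory from memory held by live tensors, and to return unused cached blocks to the driver so the drop shows up in nvidia-smi (a sketch, assuming a version where these calls exist; it won’t help if live tensors are still holding the memory):

import torch

print(torch.cuda.memory_allocated())  # bytes currently held by live tensors
torch.cuda.empty_cache()              # release unused cached blocks back to the driver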

I have the same problem here. Even when I call .cpu() on the network output, GPU memory keeps filling up with each iteration. Did you find a proper solution?
