PyTorch 0.4: CUDA OOM when concatenating after running inference

Before upgrading to PyTorch 0.4, I was able to run inference on hundreds of 2D images, concatenate the results on the GPU, and then move on to the next stage of my pipeline. After upgrading to 0.4, the same code fails with "cuda runtime error (2) : out of memory at…", so I modified it; I'm just not sure my workaround is the most efficient approach. Here is a description and snippets of the inference code, before the upgrade (I was on PyTorch 0.3.1) and after.

CUDA/cuDNN installed: 9.0/7.0.5

Code description: iterate through the slices of a 3D CT volume, run a semantic-segmentation model on each slice, append each prediction to a list, and concatenate the predictions into a volume.

PyTorch 0.3.1

import torch
from torch.autograd import Variable
from tqdm import tqdm

ct.shape # (382, 320, 320, 3)
pred = []
for sl in tqdm(ct, total=ct.shape[0]):
    # HWC -> CHW, then add a batch dimension
    tensor = torch.unsqueeze(torch.from_numpy(sl.transpose(2,0,1)), 0)
    # volatile=True stops 0.3.x autograd from building a graph
    pred.append(model(Variable(tensor.cuda(), volatile=True)))
vol_pred = torch.cat(pred, 0)  # stack slice predictions into a volume on the GPU

PyTorch 0.4

ct.shape # (382, 320, 320, 3)
pred = []
for sl in tqdm(ct, total=ct.shape[0]):
    tensor = torch.unsqueeze(torch.from_numpy(sl.transpose(2,0,1)), 0)
    raw_pred = model(Variable(tensor.cuda(), requires_grad=False))
    # workaround: move each prediction to the CPU so GPU memory is released
    pred.append(raw_pred.data.cpu())
vol_pred = torch.cat(pred, 0)

You want to wrap everything in a with torch.no_grad(): block for PyTorch 0.4. In 0.4 the volatile flag is deprecated and has no effect, and requires_grad=False on the input does not stop autograd from recording the forward pass through the model's parameters, so every prediction you keep alive also keeps its intermediate buffers alive on the GPU. torch.no_grad() turns that recording off.
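
A quick way to see the difference (a minimal sketch; model stands in for your segmentation network): under no_grad the outputs carry no autograd history, so collecting them on the GPU does not also pin the intermediate activations of every forward pass.

import torch

x = torch.randn(1, 3, 320, 320).cuda()

out = model(x)            # 0.4 default: the forward pass is recorded
print(out.requires_grad)  # True -> out keeps its graph and buffers alive

with torch.no_grad():
    out = model(x)        # nothing is recorded
print(out.requires_grad)  # False -> safe to collect many of these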

Best regards

Thomas

Thank you! This is the call I make now:

with torch.no_grad():
    pred = []
    for sl in tqdm(ct, total=ct.shape[0]):
        tensor = torch.unsqueeze(torch.from_numpy(sl.transpose(2,0,1)), 0)
        # no graph is recorded inside no_grad, so preds can stay on the GPU
        pred.append(model(Variable(tensor.cuda())))
    vol_pred = torch.cat(pred, 0)
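
Side note for anyone finding this later: since tensors and Variables were merged in 0.4, the Variable wrapper is optional. A minimal sketch of the same loop without it (same ct and model as above):

with torch.no_grad():
    pred = []
    for sl in tqdm(ct, total=ct.shape[0]):
        tensor = torch.from_numpy(sl.transpose(2,0,1)).unsqueeze(0)
        pred.append(model(tensor.cuda()))
    vol_pred = torch.cat(pred, 0)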