Hello, I am currently trying to progressively "widen" my network in a loop. The reason I use a loop is that I want to compare some properties of the output as the width increases. I can't think of anything simpler than the following. I create a network whose width is configurable from the start:
import torch
import torch.nn as nn
import torch.nn.functional as F

class onelayerNet(nn.Module):
    def __init__(self, scale):
        super().__init__()
        self.fc1 = nn.Linear(10, 10 * scale)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        return x
And then I have the following loop:
Net = {}
for i in range(iters):
    with torch.no_grad():
        scale = 10 * (i + 1)
        Net[i] = onelayerNet(scale=scale)
        if i > 0:  # copy the previous network's weights into the first rows
            index = 10 * 10 * i  # number of output features of the previous network
            Net[i].fc1.weight[0:index, :] = Net[i - 1].fc1.weight[:, :]
            Net[i].fc1.bias[0:index] = Net[i - 1].fc1.bias[:]  # carry the bias over as well
        Net[i].to(device)
        output = Net[i](images)
        # do some stuff to the output
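To double-check the index arithmetic: at iteration i, fc1.weight has shape (10*scale, 10) = (100*(i+1), 10), so index = 10*10*i selects exactly the rows the previous network occupied. A stand-alone sketch (prev and curr are stand-ins for Net[i-1] and Net[i] at i = 1):

```python
import torch
import torch.nn as nn

prev = nn.Linear(10, 10 * 10)       # i = 0: 100 output features
curr = nn.Linear(10, 10 * 10 * 2)   # i = 1: 200 output features

index = 10 * 10 * 1                 # rows occupied by the previous weights
with torch.no_grad():
    curr.weight[0:index, :] = prev.weight

# The first 100 rows of the wider layer now equal the smaller layer's weights.
assert torch.equal(curr.weight[:index], prev.weight)
```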
However, I can't find any way to remove the previous models from the GPU (or intermediate variables that change on each iteration, like 'output'). I've tried 'del Net[i-1]' after copying its weights, but the memory is not released on the GPU. I've also tried 'Net[i].to("cpu")' after computing the output, but that doesn't help either.
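Concretely, this is the kind of cleanup I mean, as a minimal stand-alone sketch (prev and output stand in for Net[i-1] and the per-iteration output; the gc.collect / empty_cache calls are my guess at what might be needed, not something the docs promised me):

```python
import gc
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

prev = nn.Linear(10, 100).to(device)           # stand-in for Net[i-1]
output = torch.randn(4, 100, device=device)    # stand-in for an intermediate

# Drop the Python references, force garbage collection, then ask the
# CUDA caching allocator to hand its unused blocks back to the driver.
del prev
del output
gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()
```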
Is there a smarter way to do this so that I don't run into this issue? If not, is there a way to remove unneeded tensors and models from the GPU after they are used?