Hi all, I want to impose some structure on the weight matrix, so I have something like this in the forward function:
```python
class myNN(nn.Module):
    def __init__(self):
        super(myNN, self).__init__()
        self.weight = nn.Linear(a, b)

    def transform(self):
        transformed_weight = do_something(self.weight)
        return transformed_weight

    def forward(self, x):
        transformed_weight = self.transform()
        x = do_something2(transformed_weight, x)
        return x
```
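For context, here is a minimal runnable version of the pattern I mean. `do_something` and `do_something2` above are placeholders; in this sketch I stand them in with a hypothetical softplus transform (to keep the weight non-negative) and `F.linear`, just to make the shape of the code concrete:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MyNN(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        # raw weight matrix; the structured version is computed each forward pass
        self.weight = nn.Parameter(torch.randn(out_features, in_features))

    def transform(self):
        # hypothetical stand-in for do_something: constrain weights to be non-negative
        return F.softplus(self.weight)

    def forward(self, x):
        transformed_weight = self.transform()  # intermediate tensor
        # stand-in for do_something2: an ordinary linear map with the transformed weight
        return F.linear(x, transformed_weight)

model = MyNN(4, 3)
out = model(torch.randn(2, 4))
print(out.shape)  # torch.Size([2, 3])
```

The actual transforms in my code are more involved, but the structure (recompute the transformed weight inside `forward`) is the same.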
do_something() and do_something2() create some intermediate tensors (and transformed_weight in forward() is itself an intermediate). Will they be freed after forward() returns? I keep getting a CUDA out-of-memory error after several iterations. Thanks.