CUDA out of memory when a Function is used multiple times

Hi there,

When I use a Function multiple times in an iteration, CUDA memory usage increases continuously. It is worth noting that the Function calls save_for_backward(). The problem disappears when the Function is replaced with one that does not call save_for_backward(). Any ideas?

Sample code is below:

import torch
from torch.autograd import Function
from torch.autograd import Variable

class Identity(Function):

    def forward(self, input):
        return input

    def backward(self, grad_output):
        return grad_output

class Linear(Function):

    def forward(self, input, weight):
        self.save_for_backward(input, weight)
        return input.mm(weight.t())

    def backward(self, grad_output):
        input, weight = self.saved_tensors
        grad_input = grad_weight = None

        if self.needs_input_grad[0]:
            grad_input = grad_output.mm(weight)
        if self.needs_input_grad[1]:
            grad_weight = grad_output.t().mm(input)

        return grad_input, grad_weight

x = Variable(torch.rand(4000, 3000).cuda(), requires_grad=True)
w = Variable(torch.rand(3000, 3000).cuda(), requires_grad=True)

grad_output = torch.rand(4000, 3000).cuda()

lr = 0.01
for i in range(10000):

    # (1) cuda memory stays the same
    # identity = Identity()
    # loss1 = identity(x)
    # loss2 = identity(x)

    # (2) cuda memory continuously increases
    linear = Linear()
    loss1 = linear(x, w)
    loss2 = linear(x, w)

    loss = loss1 + loss2
    loss.backward(grad_output)
    x.data -= lr * x.grad.data
    w.data -= lr * w.grad.data
    x.grad.data.zero_()
    w.grad.data.zero_()

    if i % 100 == 0:
        print(i, torch.cuda.memory_allocated())
Functions are never meant to be reused. Use torch.nn.functional.linear.
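The suggested fix can be sketched as below; this is a minimal CPU version with small, illustrative tensor sizes (the original post uses 4000x3000 CUDA tensors), written against a current PyTorch API. F.linear builds a fresh autograd node on every call, so nothing accumulates across iterations:

```python
import torch
import torch.nn.functional as F

# Small CPU sizes for illustration; requires_grad replaces the old Variable wrapper.
x = torch.rand(4, 3, requires_grad=True)
w = torch.rand(3, 3, requires_grad=True)
grad_output = torch.rand(4, 3)

lr = 0.01
for i in range(3):
    # Each F.linear call records its own autograd node;
    # no state is carried over from the previous iteration.
    loss = F.linear(x, w) + F.linear(x, w)
    loss.backward(grad_output)
    with torch.no_grad():
        w -= lr * w.grad
    x.grad.zero_()
    w.grad.zero_()
```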

Module can be reused. However, when the same Module calls forward() multiple times, it actually creates a new Function object each time, only the Parameters in it (e.g. weight in nn.Linear) are shared. Am I right? Thanks.
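That understanding can be demonstrated with a small sketch (illustrative sizes, current PyTorch API): the same nn.Linear instance is called twice, its Parameters are shared between the two calls, but each call records its own graph node, so backward works fine.

```python
import torch
import torch.nn as nn

lin = nn.Linear(3, 3)
x = torch.rand(4, 3)

# Same Module instance used twice: weight and bias are shared,
# but each forward() call creates independent autograd nodes.
y = lin(x) + lin(x)
y.sum().backward()

# lin.weight.grad now holds contributions from both calls.
```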

Yes. Modules are safe to be reused, Functions aren’t.
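For readers on newer PyTorch releases: the static-method Function API sidesteps the reuse problem entirely, because per-call state lives on the ctx object passed to forward/backward rather than on the Function instance. A sketch of the Linear example in that style:

```python
import torch
from torch.autograd import Function

class Linear(Function):
    # New-style Function: state is stored on ctx, not on the object,
    # so Linear.apply() is safe to call any number of times.
    @staticmethod
    def forward(ctx, input, weight):
        ctx.save_for_backward(input, weight)
        return input.mm(weight.t())

    @staticmethod
    def backward(ctx, grad_output):
        input, weight = ctx.saved_tensors
        grad_input = grad_weight = None
        if ctx.needs_input_grad[0]:
            grad_input = grad_output.mm(weight)
        if ctx.needs_input_grad[1]:
            grad_weight = grad_output.t().mm(input)
        return grad_input, grad_weight
```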