I am struggling to fit my model on a 16 GB GPU due to a CUDA out-of-memory error. What is even more intriguing is that the model runs fine for roughly the first 2000 steps; the memory allocated, as reported by nvidia-smi, gradually climbs from 14 GB to 16 GB, and then it finally crashes. I have a lot of tensors declared in the `forward` function using `new_tensor` or `new_zeros`, which I suspect are not being dereferenced or freed from memory, and that is why the accumulation from 14 GB to 16 GB is happening. Here is a dummy version of the code:

```
import torch
import torch.nn as nn

class Test(nn.Module):
    def __init__(self):
        super(Test, self).__init__()
        self.weights = nn.Parameter(torch.zeros(5, 5))

    def forward(self, x):
        # A fresh tensor is allocated on every forward pass
        dummy_constant = x.new_ones(self.weights.shape[0], x.shape[1])
        output = self.weights @ x
        output += dummy_constant
        return output
```

```
model = Test()
for i in range(1, 100):
    x = torch.rand(5, i)
    out = model(x)
    # loss.backward() and other stuff
```

So, all in all, will every instance of `dummy_constant` stay in memory even after it goes out of scope?
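Stripped of PyTorch, the scoping part of my question is just Python reference semantics, so here is a minimal stdlib sketch I used to reason about it (the `Payload` class and the `forward` function below are placeholders standing in for my module, not my actual code). It shows that a local object created inside a function is collected as soon as the function returns, provided nothing else holds a reference to it:

```python
import gc
import weakref

class Payload:
    """Stands in for a tensor allocated inside forward()."""
    pass

def forward():
    # Local object, analogous to x.new_ones(...) in my forward()
    dummy_constant = Payload()
    # Return only a weak reference, so we can observe the object's lifetime
    return weakref.ref(dummy_constant)

ref = forward()
gc.collect()
print(ref() is None)  # True: the local was freed when forward() returned
```

If the same holds for tensors, then each step's `dummy_constant` should be freed once nothing references it, unless something like the autograd graph is keeping it alive behind the scenes, which is what I am trying to confirm.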