Hello,

I am trying to train a network with the following very simple architecture:

```
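import torch
import torch.nn as nn
import torch.nn.functional as F


class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        self.fc1 = nn.Linear(2048, 2048)

    def forward(self, f1, f2):
        # Norm of the linear projection of the rectified difference
        res = torch.norm(self.fc1(F.relu(f1 - f2)), dim = 1)
        return res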
```

Then I do the following:

```
# device was not defined in the snippet; assuming CUDA when available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

net = MyNet()
net.to(device)
net.eval()
n = 1000
m = 1000
dim = 2048
x = torch.randn(dim, n)
y = torch.randn(dim, m)
scores = torch.zeros(x.size()[1], y.size()[1])
x = x.to(device)
y = y.to(device)
for i in range(x.size()[1]):
    print('\r>>>> ' + str(i + 1) + '/' + str(x.size()[1]), end = '')
    for j in range(y.size()[1]):
        # Score column i of x against column j of y
        scores[i, j] = net(x[:, i].unsqueeze(0), y[:, j].unsqueeze(0))
```

After I run this code I get `RuntimeError: CUDA error: out of memory`. Looking at the output of nvidia-smi, I see that memory consumption keeps growing during the computation. This looks strange to me, since I put the model into `eval()` mode and I do not need gradients or anything like that. All iterations are independent, so I expect the whole loop to need only as much memory as a single iteration.

Can you explain the reason for such large memory consumption and how to overcome it? I expect it is easy to fix, since all iterations are independent.
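For reference, here is a minimal sketch of one thing that could be tried. It assumes the growth comes from autograd recording every forward pass: `eval()` only changes the behavior of layers such as dropout and batch norm and does not turn off gradient tracking, so each `scores[i, j] = ...` assignment may keep its graph alive. The `dim` constructor argument, the `pairwise_scores` helper, and the small sizes are hypothetical and only there to keep the sketch quick to run.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MyNet(nn.Module):
    def __init__(self, dim=2048):  # dim argument is an assumption for this sketch
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)

    def forward(self, f1, f2):
        return torch.norm(self.fc1(F.relu(f1 - f2)), dim=1)


def pairwise_scores(net, x, y):
    """Hypothetical helper: score every column of x against every column of y."""
    scores = torch.zeros(x.size(1), y.size(1))
    with torch.no_grad():  # autograd records nothing inside this block
        for i in range(x.size(1)):
            for j in range(y.size(1)):
                out = net(x[:, i].unsqueeze(0), y[:, j].unsqueeze(0))
                scores[i, j] = out.item()  # store a plain Python float
    return scores


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dim, n, m = 64, 5, 7  # small sizes so the sketch runs quickly
net = MyNet(dim).to(device).eval()
x = torch.randn(dim, n, device=device)
y = torch.randn(dim, m, device=device)
scores = pairwise_scores(net, x, y)
```

If the assumption holds, `scores` no longer references any graph and memory use should stay flat across iterations. Since `nn.Linear` accepts batches, passing all columns of `y` in one call per `i` would probably also be much faster, but that is a separate change.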

Thanks for the help!