Hello, I have two cases as follows:
CASE1
import torch

k = []
for _ in range(500):
    # The Parameter wraps a tensor that is already on GPU 1.
    T = torch.nn.Parameter(torch.rand(10, 3000, 3000).cuda(1))
    k.append(T.cpu())
This raises an out-of-memory (OOM) error. The next case is
CASE2
k = []
for _ in range(500):
    # The Parameter wraps a CPU tensor; .cuda(1) is then called on the Parameter.
    T = torch.nn.Parameter(torch.rand(10, 3000, 3000)).cuda(1)
    k.append(T.cpu())
This does not raise OOM, and GPU memory stays at 1900 MB for the entire for loop.
What causes the difference between the two cases?
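
One way to compare the two cases might be to check which tensor is the leaf Parameter and whether the CPU copy appended to the list still carries a `grad_fn`, meaning it is still attached to an autograd graph. Below is a minimal diagnostic sketch, not a definitive explanation. It makes several assumptions: `describe` is a hypothetical helper, the shapes are shrunk to (2, 3), and it uses `cuda:0` when a GPU is visible (instead of `cuda(1)` as above) and falls back to CPU otherwise. On a CPU-only machine `.cpu()` is a no-op, so the two cases will look the same there.

```python
import torch

# Assumption: use GPU 0 if available, otherwise CPU so the script still runs.
device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")


def describe(t):
    """Hypothetical helper: collect a few autograd facts about a tensor."""
    return {
        "device": str(t.device),
        "is_leaf": t.is_leaf,
        "requires_grad": t.requires_grad,
        "grad_fn": type(t.grad_fn).__name__ if t.grad_fn is not None else None,
    }


# CASE1: the Parameter is created from a tensor that already lives on the device.
p1 = torch.nn.Parameter(torch.rand(2, 3).to(device))
stored1 = p1.cpu()  # this is what gets appended to the list

# CASE2: the Parameter is created on CPU; the device copy is derived from it.
p2 = torch.nn.Parameter(torch.rand(2, 3))
on_device2 = p2.to(device)
stored2 = on_device2.cpu()  # this is what gets appended to the list

report = {
    "case1_param": describe(p1),
    "case1_stored": describe(stored1),
    "case2_param": describe(p2),
    "case2_on_device": describe(on_device2),
    "case2_stored": describe(stored2),
}

for name, info in report.items():
    print(name, info)

if torch.cuda.is_available():
    # Bytes currently held by tensors on the chosen GPU.
    print("allocated bytes:", torch.cuda.memory_allocated(device))
```

Calling `torch.cuda.memory_allocated` inside each original loop, once per iteration, would show whether allocated GPU memory grows in CASE1 but not in CASE2.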