I have identified one part of my code that is causing a CUDA memory leak:

```
for j in range(0, len(indices_unlbl)):
    inp_rep = inp[j].repeat(inputs.shape[0], 1)
    vec = inputs - inp_rep
    inp_rep = inp_rep.cpu()
    del inp_rep
    inf_entries = (torch.norm(input=vec, p=float('inf'), dim=1) > 0.1)
    pvtmtx_samples[i1: i1 + blk_size, j][inf_entries] = float('inf')
    non_inf_entries = (pvtmtx_samples[i1: i1 + blk_size, j] < float('inf'))
    norms_2 = torch.norm(input=vec[non_inf_entries], p=2, dim=1, dtype=float)
    norms_2 = norms_2.cpu()
    pvtmtx_samples[i1: i1 + blk_size, j][non_inf_entries] = norms_2
    inf_entries = inf_entries.cpu()
    non_inf_entries = non_inf_entries.cpu()
    vec = vec.cpu()
    del norms_2
    del inf_entries
    del non_inf_entries
    del vec
```

The line `pvtmtx_samples[i1: i1 + blk_size, j][non_inf_entries] = norms_2` causes CUDA memory usage to increase on every iteration; if I remove this line, the problem vanishes. What I don't understand is why this happens even though `pvtmtx_samples` and `norms_2` are both on the CPU.
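For anyone trying to reproduce this, the per-iteration growth can be quantified by reading the allocator's counter after each pass through the loop body. Below is a minimal sketch; `track_growth`, `step`, and `read_mem` are hypothetical names for illustration and are not part of the code above.

```python
def track_growth(step, n_iters, read_mem):
    """Run `step` n_iters times and return the change in memory between
    consecutive iterations.

    step     -- hypothetical callable wrapping one pass of the loop body
    read_mem -- callable returning the current memory usage in bytes,
                e.g. torch.cuda.memory_allocated when running on a GPU
    """
    readings = []
    for _ in range(n_iters):
        step()
        readings.append(read_mem())
    # A leak shows up as deltas that stay positive instead of settling at 0.
    return [later - earlier for earlier, later in zip(readings, readings[1:])]
```

On a CUDA machine this would be called as `track_growth(step, 10, torch.cuda.memory_allocated)`, where `torch.cuda.memory_allocated` reports the bytes currently occupied by tensors on the device. Running it once with the suspect assignment in `step` and once without should show whether that line alone accounts for the growth.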