CUDA memory leak even when using CPU tensors

I have identified one part of my code that is causing a memory leak in CUDA:

for j in range(0, len(indices_unlbl)):

    # difference vectors between every row of inputs and the j-th row of inp
    inp_rep = inp[j].repeat(inputs.shape[0], 1)
    vec = inputs - inp_rep
    inp_rep = inp_rep.cpu()
    del inp_rep

    # mark entries whose infinity-norm distance exceeds 0.1
    inf_entries = (torch.norm(input=vec, p=float('inf'), dim=1) > 0.1)
    pvtmtx_samples[i1: i1 + blk_size, j][inf_entries] = float('inf')

    # compute 2-norms only for the remaining (non-inf) entries
    non_inf_entries = (pvtmtx_samples[i1: i1 + blk_size, j] < float('inf'))
    norms_2 = torch.norm(input=vec[non_inf_entries], p=2, dim=1, dtype=float)
    norms_2 = norms_2.cpu()

    pvtmtx_samples[i1: i1 + blk_size, j][non_inf_entries] = norms_2

    # move the intermediates to the CPU, then delete the references
    inf_entries = inf_entries.cpu()
    non_inf_entries = non_inf_entries.cpu()
    vec = vec.cpu()

    del norms_2
    del inf_entries
    del non_inf_entries
    del vec

The line pvtmtx_samples[i1: i1 + blk_size, j][non_inf_entries] = norms_2 is causing the CUDA memory usage to increase on every iteration; if I remove this line, the problem vanishes. What I don't understand is why this happens even though "pvtmtx_samples" and "norms_2" are both on the CPU.
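For what it's worth, the per-iteration increase can be watched with torch.cuda's allocator statistics. This is only a minimal sketch (the helper name log_cuda_memory is just for illustration), called e.g. right after the del statements at the end of the loop body:

import torch

def log_cuda_memory(step):
    # memory_allocated() reports bytes currently occupied by live CUDA tensors,
    # memory_reserved() the total pool held by the caching allocator
    allocated = torch.cuda.memory_allocated() / 1024 ** 2
    reserved = torch.cuda.memory_reserved() / 1024 ** 2
    print(f"step {step}: allocated {allocated:.1f} MiB, reserved {reserved:.1f} MiB")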

Could you post a minimal and executable code snippet reproducing the CUDA memory increase if CPU tensors are stored, please?
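A stripped-down, standalone version of the loop would look roughly like the sketch below. The shapes are placeholders, pvtmtx_samples is assumed to start as a CPU zero matrix, I left out the extra .cpu()/del bookkeeping to keep it short, and the only deliberate change is that inf_entries is moved to the CPU before it indexes the CPU matrix so the snippet runs on its own. I have not verified whether this minimal version shows the same memory growth:

import torch

device = torch.device('cuda')
blk_size, dim, n_unlbl = 128, 64, 32  # placeholder sizes
i1 = 0

inputs = torch.randn(blk_size, dim, device=device)
inp = torch.randn(n_unlbl, dim, device=device)
pvtmtx_samples = torch.zeros(blk_size, n_unlbl)  # CPU tensor, as in my code

for j in range(n_unlbl):
    inp_rep = inp[j].repeat(inputs.shape[0], 1)
    vec = inputs - inp_rep

    # mask is moved to the CPU before it indexes the CPU matrix
    inf_entries = (torch.norm(vec, p=float('inf'), dim=1) > 0.1).cpu()
    pvtmtx_samples[i1: i1 + blk_size, j][inf_entries] = float('inf')

    non_inf_entries = pvtmtx_samples[i1: i1 + blk_size, j] < float('inf')
    norms_2 = torch.norm(vec[non_inf_entries], p=2, dim=1, dtype=torch.float64).cpu()
    pvtmtx_samples[i1: i1 + blk_size, j][non_inf_entries] = norms_2

    # print the allocator state once per iteration
    print(j, torch.cuda.memory_allocated())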