Low-memory way to compute torch.norm between rows

Hello all!

I am trying to compute the pairwise distances between the rows of a matrix. To do that, I use torch.norm(input - input[:, None], dim=2, p=p)**p. This works well and gives me exactly what I want. However, I now want to apply it to layer outputs, which are very large matrices, and I run out of CUDA memory, presumably because the broadcasted difference has shape (Batch, Batch, Features). I also looked at the pairwise_distance function, but it does not give what I want: its output size is (Batch, 1), whereas I want (Batch, Batch).
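To make the setup concrete, here is a minimal sketch of what I am doing now, on a tiny CPU tensor. The function name `pairwise_rows_broadcast` and the sizes (4 rows, 3 features) are just placeholders for illustration; my real inputs are layer outputs on the GPU. Passing `keepdim=True` to `pairwise_distance` is an assumption I make here to get the (Batch, 1) shape mentioned above.

```python
import torch
import torch.nn.functional as F


def pairwise_rows_broadcast(x: torch.Tensor, p: float = 2.0) -> torch.Tensor:
    """p-th power of the p-norm distance between every pair of rows of x.

    x has shape (B, D). The subtraction broadcasts (B, D) against (B, 1, D),
    so the intermediate `diff` has shape (B, B, D); this is the tensor that
    grows too large for big layer outputs.
    """
    diff = x - x[:, None]                       # (B, B, D) intermediate
    return torch.norm(diff, dim=2, p=p) ** p    # (B, B) result


torch.manual_seed(0)
x = torch.randn(4, 3)  # placeholder sizes: B=4 rows, D=3 features

out = pairwise_rows_broadcast(x, p=2.0)
print(out.shape)       # (B, B): every row against every other row

# pairwise_distance only compares row i of the first argument with row i of
# the second, so it yields one value per row rather than a (B, B) matrix.
row_wise = F.pairwise_distance(x, x, p=2.0, keepdim=True)
print(row_wise.shape)  # (B, 1)
```

The result itself is only (Batch, Batch); it is the (Batch, Batch, Features) intermediate that I would like to avoid materializing.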

Does anyone have a stable, low-memory solution for this problem?