Runtime Error: Invalid buffer size when calculating cosine similarity

I keep getting "RuntimeError: Invalid buffer size: 2894.04 GB" when trying to calculate cosine similarity as follows:

import torch
import torch.nn.functional as F

DEVICE = "mps"

# usage_data is a (244456, 13) shape dataframe 
usage_data_tensor = torch.from_numpy(usage_data.values).float().to(DEVICE)
user_similarity = F.cosine_similarity(usage_data_tensor[:,:,None], usage_data_tensor.t()[None,:,:])

Hi,

I think this is expected. Broadcasting these tensors creates an intermediate of shape (244456, 13, 244456), which in float32 needs almost 3 TB of memory.
This also crashes on CPU for me.
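
If it helps, here is a minimal sketch of where that number comes from and one possible way around it. It assumes float32 data, and the names `broadcast_gib`, `chunked_cosine_similarity` and `chunk_size` are made up for illustration, not part of PyTorch. The idea is to normalize the rows once and then take matrix products block by block, so the 3-D broadcast intermediate is never created. It is a different formulation from the broadcast call in the question, but it should give the same values for non-zero rows.

```python
import torch
import torch.nn.functional as F


def broadcast_gib(n_rows: int, n_features: int, bytes_per_element: int = 4) -> float:
    """Size in GiB of the (n_rows, n_features, n_rows) broadcast intermediate."""
    return n_rows * n_features * n_rows * bytes_per_element / 2**30


def chunked_cosine_similarity(x: torch.Tensor, chunk_size: int = 4096):
    """Yield (start, block) pairs, where block holds the cosine similarity
    of rows x[start:start + chunk_size] against every row of x.

    Hypothetical helper: each block has shape (<= chunk_size, n_rows).
    """
    # Scale every row to unit length once; all-zero rows stay zero.
    x_unit = F.normalize(x, p=2, dim=1)
    for start in range(0, x.shape[0], chunk_size):
        # The dot product of unit vectors is their cosine similarity.
        yield start, x_unit[start:start + chunk_size] @ x_unit.t()


# Size of the intermediate from the question, in GiB.
print(round(broadcast_gib(244456, 13), 2))  # 2894.04

# Small usage example on random data (CPU, so it runs anywhere).
x = torch.randn(10, 13)
for start, block in chunked_cosine_similarity(x, chunk_size=4):
    print(start, tuple(block.shape))
```

Note that the full (244456, 244456) result would still take roughly 223 GiB in float32, so with data of this size each block probably has to be reduced as it is produced (for example, keeping only the top few matches per row) rather than concatenated into one tensor.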