Traceback (most recent call last):
  File "main_supcon.py", line 265, in <module>
    main()
  File "main_supcon.py", line 247, in main
    loss = train(train_loader, model, criterion, optimizer, epoch, opt)
  File "main_supcon.py", line 205, in train
    loss = criterion(features1,features2)
  File "/DATA/rani.1/miniconda3/envs/sim_test/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/DATA/rani.1/SupSim/losses.py", line 106, in forward
    sim_ij = torch.diag(similarity_matrix, self.batch_size)
RuntimeError: invalid argument 2: invalid size at /opt/conda/conda-bld/pytorch_1623448233824/work/aten/src/THC/THCStorage.cpp:26

With batch size 512, the error is:

denominator = self.negatives_mask * torch.exp(similarity_matrix / self.temperature)
RuntimeError: The size of tensor a (1024) must match the size of tensor b (672) at non-singleton dimension 1
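
One way around the mismatch might be to derive the batch size and the negatives mask from the tensors passed to the loss, rather than from a value stored at construction time. The sketch below assumes the loss follows the common NT-Xent layout that the names in the traceback suggest (`similarity_matrix`, `negatives_mask`, `sim_ij`). The function `nt_xent_loss`, its arguments and the default temperature are hypothetical and are not taken from `losses.py`.

```python
import torch
import torch.nn.functional as F


def nt_xent_loss(emb_i, emb_j, temperature=0.5):
    """Hypothetical NT-Xent sketch that works for any batch size."""
    # Read the batch size from the input instead of a stored self.batch_size,
    # so a smaller last batch (e.g. 336 instead of 512) is handled.
    batch_size = emb_i.shape[0]

    z_i = F.normalize(emb_i, dim=1)
    z_j = F.normalize(emb_j, dim=1)
    representations = torch.cat([z_i, z_j], dim=0)  # (2B, D)

    # Pairwise cosine similarity between all representations: (2B, 2B)
    similarity_matrix = F.cosine_similarity(
        representations.unsqueeze(1), representations.unsqueeze(0), dim=2
    )

    # Positive pairs sit on the diagonals at offsets +B and -B
    sim_ij = torch.diag(similarity_matrix, batch_size)
    sim_ji = torch.diag(similarity_matrix, -batch_size)
    positives = torch.cat([sim_ij, sim_ji], dim=0)  # (2B,)

    # Mask built per call so it always matches the similarity matrix
    negatives_mask = (
        ~torch.eye(2 * batch_size, dtype=torch.bool, device=emb_i.device)
    ).float()

    nominator = torch.exp(positives / temperature)
    denominator = negatives_mask * torch.exp(similarity_matrix / temperature)

    loss_partial = -torch.log(nominator / torch.sum(denominator, dim=1))
    return torch.sum(loss_partial) / (2 * batch_size)


torch.manual_seed(0)
full_batch = nt_xent_loss(torch.randn(8, 16), torch.randn(8, 16))
last_batch = nt_xent_loss(torch.randn(5, 16), torch.randn(5, 16))
print(full_batch.item(), last_batch.item())
```

Rebuilding the mask on every call costs a little extra, but it removes the dependency on a fixed size, which is what seems to break on the last step of the epoch.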

I see the same thing, but I guess the dimension after the cosine similarity is what causes the error.
The program does start running, but it stops at the last step of the epoch, when the
dimensions are as follows:
Representation: torch.Size([672, 128])
Similarity Matrix torch.Size([672, 672])

Before the last step of an epoch, the dimensions are as follows:
Representation: torch.Size([1024, 128])
Similarity Matrix torch.Size([1024, 1024])

I don't think 672 itself is the issue. It only appears because the total number of inputs is not a multiple of 1024; if the last 672 items were removed, I feel it would run.
You also got a representation matrix and a similarity matrix for that step.
The issue, I feel, is in the torch.diag calls for
sim_ij and sim_ji, because 672 is less than 1024.
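
If dropping the leftover items is acceptable, `DataLoader` has a `drop_last` flag that skips the incomplete final batch. A minimal sketch follows. The dataset size is made up so that the last batch holds 336 samples (2 x 336 = 672 representations, as in the shapes above), and `TensorDataset` stands in for the real dataset.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dataset size: three full batches of 512 plus 336 leftover samples
num_samples = 3 * 512 + 336
dataset = TensorDataset(torch.zeros(num_samples, 1))

# Default behaviour keeps the smaller last batch
loader_keep = DataLoader(dataset, batch_size=512, drop_last=False)
# drop_last=True discards it, so every batch has exactly 512 samples
loader_drop = DataLoader(dataset, batch_size=512, drop_last=True)

sizes_keep = [batch[0].shape[0] for batch in loader_keep]
sizes_drop = [batch[0].shape[0] for batch in loader_drop]

print(sizes_keep)  # [512, 512, 512, 336]
print(sizes_drop)  # [512, 512, 512]
```

The trade-off is that the leftover samples are skipped in each epoch (different ones each time if the loader shuffles).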

Are you getting any output for sim_ij and sim_ji in the last step of the epoch?

torch.diag returns one diagonal of the matrix. For example, a 3x3 matrix has only 5 diagonals (offsets ranging from -2 to 2),
so for a 672x672 matrix, how would it return the diagonal at offset 1024?
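
The counting argument can be checked with plain Python, no torch needed. The helpers `valid_offsets` and `diagonal_length` are hypothetical names used only for illustration.

```python
def valid_offsets(n):
    """Offsets of all diagonals that exist in an n x n matrix."""
    return range(-(n - 1), n)


def diagonal_length(n, offset):
    """Number of elements on the given diagonal of an n x n matrix (0 if out of range)."""
    return max(n - abs(offset), 0)


print(list(valid_offsets(3)))       # [-2, -1, 0, 1, 2]
print(len(valid_offsets(3)))        # 5

print(diagonal_length(1024, 512))   # 512
print(diagonal_length(672, 512))    # 160
print(diagonal_length(672, 1024))   # 0
```

So an offset of 1024 has nothing to return for a 672x672 matrix. An offset of 512 is in range there, but it yields only 160 elements instead of the 336 positive pairs of the last batch, which suggests the offset has to follow the actual batch size.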