My training feels very slow, so I checked GPU and CPU utilization and found that CPU utilization is abnormally high. I suspect the code used for calculating the loss could be optimized, but I don't understand which operation I should change.
def ctr(self, q, k, tau=1):
    # contrastive loss: row i of q should match row i of k
    logits = torch.mm(q, k.t())
    labels = torch.LongTensor(range(q.shape[0])).cuda()
    loss = self.criterion(logits / tau, labels)
    return 2 * tau * loss
I also checked the PyTorch profiler and found the training is unstable: most operations take on the order of microseconds, but roughly every 8 batches the times jump to the millisecond scale. Is this a problem?
Create the labels directly on the GPU. When you use .cuda(), you first create the tensor on the CPU and then move it to the GPU, which is slow. Use this instead:
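A minimal sketch of the change (batch size and device selection are illustrative; in the actual method you would use q.device):

```python
import torch

n = 4  # example batch size; in ctr this is q.shape[0]
device = "cuda" if torch.cuda.is_available() else "cpu"

# old: torch.LongTensor(range(n)).cuda()  -- allocates on CPU, then copies to GPU
# new: allocate directly on the target device, no intermediate CPU tensor
labels = torch.arange(n, device=device)
```

torch.arange also returns int64 by default, so it is a drop-in replacement for the LongTensor constructor.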
I don't understand why two `to` operations are used. If you want to debug the reason, you can try commenting out lines in ctr one at a time and seeing when the number of calls drops.
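One way to carry out that line-by-line debugging is to profile a standalone copy of ctr and look at the operator counts; a sketch using torch.profiler (the free-function version of ctr and the random inputs are illustrative, and this runs on CPU so it works without a GPU):

```python
import torch
from torch.profiler import profile, ProfilerActivity

def ctr(q, k, criterion, tau=1):
    # simplified standalone version of the method above
    logits = torch.mm(q, k.t())
    # CPU tensor here; a .cuda() call at this point would show up as a copy op
    labels = torch.LongTensor(range(q.shape[0]))
    return 2 * tau * criterion(logits / tau, labels)

q = torch.randn(8, 16)
k = torch.randn(8, 16)
criterion = torch.nn.CrossEntropyLoss()

with profile(activities=[ProfilerActivity.CPU]) as prof:
    loss = ctr(q, k, criterion)

# comment out lines in ctr and re-run: whichever edit makes the
# suspicious `to`/copy entries disappear identifies the culprit
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```

The same table also shows call counts per operator, so you can compare the number of `aten::to` calls before and after each change.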