Training is slow

My training feels very slow. When I checked GPU and CPU utilization, I found that CPU utilization is abnormally high. I suspect the code that calculates the loss could be optimized, but I don't understand which operation I should change.

def ctr(self, q, k, tau=1):
    logits = torch.mm(q, k.t())
    labels = torch.LongTensor(range(q.shape[0])).cuda()
    loss = self.criterion(logits/tau, labels)
    return 2*tau*loss

I also checked the PyTorch profiler output and found that the timing is unstable. Most batches take on the order of microseconds, but roughly every 8 batches one takes on the order of milliseconds. Is this a problem?

Create the labels directly on the GPU. When you call .cuda(), the tensor is first created on the CPU and then copied to the GPU, which is slow. Use this instead:

labels = torch.tensor(range(q.shape[0]), dtype=torch.int64, device=q.device)

(There is nothing special about torch.tensor here; any factory function that accepts a device argument works the same way.)
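As a minimal sketch of the same idea, torch.arange can build the index labels on the target device without first materializing a Python range. The standalone function below is a hypothetical rewrite of ctr: the name ctr_loss, the use of torch.mm for the logits, and F.cross_entropy standing in for self.criterion are all assumptions, since the original criterion is not shown.

```python
import torch
import torch.nn.functional as F

def ctr_loss(q, k, tau=1.0):
    # Similarity of every query against every key: shape (N, N).
    # Assumption: the logits are a plain matrix product of q and k^T.
    logits = torch.mm(q, k.t())
    # The positive for row i is column i. arange allocates the int64
    # labels on q's device, so no separate host-to-device copy is issued.
    labels = torch.arange(q.shape[0], device=q.device)
    # Assumption: the original self.criterion is a cross-entropy loss.
    loss = F.cross_entropy(logits / tau, labels)
    return 2 * tau * loss

# Falls back to CPU so the sketch also runs without a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
q = torch.randn(8, 16, device=device)
k = torch.randn(8, 16, device=device)
loss = ctr_loss(q, k, tau=0.5)
```

Whether this removes the CPU bottleneck depends on the rest of the training loop, so it is worth re-checking the profiler after the change.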

Training is slightly faster now.
Another question: why does the profiler show that I used two 'to' operations? I thought I only used 'to' when creating the labels.

I do not know why two 'to' operations are used. If you want to find the cause, try commenting out lines in ctr one at a time and see when the number of calls drops.
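One way to apply this debugging approach is to count how often aten::to appears in the profiler for a small piece of code, then compare the count as lines are removed. This is only a sketch: count_op is a hypothetical helper, and it profiles CPU-side operator calls only.

```python
import torch
from torch.profiler import profile, ProfilerActivity

def count_op(fn, op_name="aten::to"):
    # Run fn under the profiler and sum the call counts of one operator.
    with profile(activities=[ProfilerActivity.CPU]) as prof:
        fn()
    return sum(evt.count for evt in prof.key_averages() if evt.key == op_name)

x = torch.randn(4, 4)

# An explicit dtype conversion goes through the 'to' operator.
n_convert = count_op(lambda: x.to(torch.float64))
# A function that runs no tensor operations serves as the baseline.
n_baseline = count_op(lambda: None)
print(n_convert, n_baseline)
```

Wrapping individual lines of ctr in small functions and passing them to count_op should show which line is responsible for each 'to' call.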