Build your own loss function in PyTorch


#41

I can’t agree more



#42

Writing a loss function is no different from writing a neural network, or an autograd function.

Here’s an example of writing a mean-square-error loss function:

def mse_loss(input, target):
    return ((input - target) ** 2).sum() / input.data.nelement()
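
If you prefer, the same computation can also live in an nn.Module subclass, which is convenient when the loss has parameters or configuration. A small sketch (the class name here is made up):

import torch.nn as nn

class MyMSELoss(nn.Module):
    def forward(self, input, target):
        # identical computation to the function above, just wrapped as a Module
        return ((input - target) ** 2).sum() / input.data.nelement()

It can then be instantiated and called like any built-in criterion.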

(Vipin Chaudhary) #44

@smth but will this version be able to backpropagate? I think we need to perform those operations on autograd Variables?


(Alban D) #45

Yes, smth’s function is taking Variables as input. So you will be able to backpropagate.
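
For example, a quick check (using the Variable API from this version of PyTorch; in later releases a plain tensor with requires_grad=True behaves the same way):

import torch
from torch.autograd import Variable

input = Variable(torch.randn(4, 10), requires_grad=True)
target = Variable(torch.randn(4, 10))

loss = mse_loss(input, target)   # the custom loss defined above
loss.backward()                  # autograd tracks the arithmetic ops inside mse_loss
print(input.grad.size())         # gradients have been accumulated into input.grad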


(Spandan Madan) #46

Hi all,

I struggled with this myself, so I've started building a tutorial for tasks like this in PyTorch. You can find a section on custom losses there too (Section 5). GitHub link - https://github.com/Spandan-Madan/A-Collection-of-important-tasks-in-pytorch

I wrote this up quickly in my free time, so it may have some typos. If there are things you would like to see that are missing, feel free to create an issue on GitHub with suggestions. Hope this helps!


#48

I have already implemented my own loss in Python, but it is too slow. Are there any tutorials that can teach me how to speed it up? (There is a for loop in my loss.)


(chaoyang) #49

If the individual loss for a sample in a batch can be positive or negative depending on some conditions, how do I sum the loss over the samples? The positive and negative terms can cancel out to zero if I simply sum over all the samples within a batch.


(gaoyang) #50

Excuse me, have you figured this out? Is it necessary to write a custom backward function and return the gradient yourself?


(Sunitha Kanuri) #52

Hello,
I would like to use Euclidean loss in PyTorch. I tried writing the formula myself, but it is not working. Is this loss function already available in the PyTorch library? How do I use Euclidean loss in a network? Thank you.


(Anuvabh) #53

In PyTorch it is called MSELoss: http://pytorch.org/docs/0.3.0/nn.html#torch.nn.MSELoss
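
A minimal usage sketch (tensor shapes here are made up for illustration):

import torch
import torch.nn as nn
from torch.autograd import Variable

criterion = nn.MSELoss()

pred = Variable(torch.randn(8, 3), requires_grad=True)   # e.g. predicted translations
target = Variable(torch.randn(8, 3))                     # ground-truth translations

loss = criterion(pred, target)   # mean of the squared element-wise differences
loss.backward()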


(Sunitha Kanuri) #54

Thank you very much for the help.

In this paper, they used Euclidean loss for translation and orientation. Can I use MSELoss as the loss function for this regression problem in the same way?


(Ken) #56

Hi, for the Euclidean loss between x1 and x2,

loss = torch.norm(x1 - x2, 2)

seems like a proper implementation.
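
Note that this is not exactly what nn.MSELoss computes: torch.norm gives the square root of the summed squared differences, while MSELoss averages the squared differences. A small comparison, with made-up tensors:

import torch

x1 = torch.randn(5, 3)
x2 = torch.randn(5, 3)

euclidean = torch.norm(x1 - x2, 2)   # sqrt of the sum of squared differences
mse = ((x1 - x2) ** 2).mean()        # what nn.MSELoss computes by default

# euclidean ** 2 / x1.nelement() equals mse, up to floating-point error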


(Weize Quan) #57

Hi Adam,
I have read this post several times; however, I don't understand some of the terminology, such as “re-wrap the .data in a new Variable”, “.data unpacking”, and “.data repacking”. Would you mind showing some examples? Thank you so much.

In addition, I have a special requirement for center_loss, i.e., I need to set a different weight for each class. So I refined the code (https://github.com/BestSonny/examples/tree/master/center_loss) shown in the PyTorch examples and reimplemented this loss function myself.

In the examples,
trainer.py:

def get_center_loss(centers, features, target, alpha, num_classes):
    batch_size = target.size(0)
    features_dim = features.size(1)

    target_expand = target.view(batch_size, 1).expand(batch_size, features_dim)
    centers_var = Variable(centers)
    centers_batch = centers_var.gather(0, target_expand)  # center vector of each sample's class
    criterion = nn.MSELoss()
    center_loss = criterion(features, centers_batch)

    diff = centers_batch - features
    unique_label, unique_reverse, unique_count = np.unique(target.cpu().data.numpy(), return_inverse=True, return_counts=True)
    appear_times = torch.from_numpy(unique_count).gather(0, torch.from_numpy(unique_reverse))  # per-sample count of its class in the batch
    appear_times_expand = appear_times.view(-1, 1).expand(batch_size, features_dim).type(torch.FloatTensor)
    diff_cpu = diff.cpu().data / appear_times_expand.add(1e-6)
    diff_cpu = alpha * diff_cpu
    for i in range(batch_size):
        centers[target.data[i]] -= diff_cpu[i].type(centers.type())

    return center_loss, centers

The call of this function:

center_loss, self.model._buffers['centers'] = get_center_loss(self.model._buffers['centers'], self.model.features, target_var, self.alpha, self.model.num_classes)
softmax_loss = self.criterion(output, target_var)
loss = self.center_loss_weight * center_loss + softmax_loss

My refinement:

self.centers = torch.zeros(num_classes, embedding_size).type(torch.FloatTensor)  # 2D tensor
x = self.fc2(x)
self.features = F.relu(x)  # 2D tensor

def get_center_loss(self, target, class_weight, alpha):
    batch_size = target.size(0)
    features_dim = self.features.size(1)

    target_expand = target.view(batch_size, 1).expand(batch_size, features_dim)

    centers_var = Variable(self.centers)
    centers_batch = centers_var.gather(0, target_expand).cuda()

    abnormal_loss = Variable(torch.FloatTensor([0]), requires_grad=True)
    normal_loss = Variable(torch.FloatTensor([0]), requires_grad=True)
    for i in range(batch_size):
        if target.data[i] == 0:
            #abnormal_loss += torch.sum((self.features.data[i,:] - centers_batch.data[i,:]) ** 2)
            abnormal_loss = abnormal_loss.clone() + (self.features.data[i,:] - centers_batch.data[i,:]).pow(2).sum()
        else:
            #normal_loss += torch.sum((self.features.data[i,:] - centers_batch.data[i,:]) ** 2)
            normal_loss = normal_loss.clone() + (self.features.data[i,:] - centers_batch.data[i,:]).pow(2).sum()
    center_loss = class_weight[0] * abnormal_loss + class_weight[1] * normal_loss
    center_loss = center_loss / features_dim / batch_size

    diff = centers_batch - self.features

    unique_label, unique_reverse, unique_count = np.unique(target.cpu().data.numpy(), return_inverse=True, return_counts=True)

    appear_times = torch.from_numpy(unique_count).gather(0, torch.from_numpy(unique_reverse))

    appear_times_expand = appear_times.view(-1, 1).expand(batch_size, features_dim).type(torch.FloatTensor)

    diff_cpu = diff.cpu().data / appear_times_expand.add(1e-6)

    # ∆c_j = (sum_{i=1}^m δ(y_i = j)(c_j - x_i)) / (1 + sum_{i=1}^m δ(y_i = j))
    diff_cpu = alpha * diff_cpu

    for i in range(batch_size):
        # update the parameters c_j for each j by c_j^(t+1) = c_j^t − α · ∆c_j^t
        self.centers[target.data[i]] -= diff_cpu[i].type(self.centers.type())

    return center_loss, self.centers, abnormal_loss / features_dim / batch_size, normal_loss / features_dim / batch_size

Here, class_weight = torch.FloatTensor([10, 100])  # two weights for center_loss (binary classification)

The call:

criterion = nn.CrossEntropyLoss().cuda()
prediction = model(data_var)

center_loss, xx, abnormal_loss, normal_loss = model.get_center_loss(target_var, class_weight, args.alpha)
classifier_loss = criterion(prediction.cuda(), target_var.cuda())

loss = center_loss.cuda() + classifier_loss
# compute gradient and update weights
optimizer.zero_grad()
loss.backward()
optimizer.step()

Is this code right? Can it backpropagate and change the weights of the model?


(jpeg729) #58

The .clone() here is unnecessary. The addition operation clones the Variable, so you don’t have to do so explicitly. In fact if you do, you just add an extra copy in memory. That said, if you were to do an inplace addition +=, then using .clone() might be necessary, but even then, I would wait until PyTorch complained about the inplace operation.

If I understand correctly, the actual loss that needs to be backpropagated is center_loss.
Now center_loss is a weighted sum of abnormal_loss and normal_loss, so gradients can flow back up to abnormal_loss and normal_loss.
But both of those are calculated from Tensors, not from Variables, so the gradients will go no further. Try this instead…

abnormal_loss = abnormal_loss + (self.features[i,:] - centers_batch[i,:]).pow(2).sum()

Same for normal_loss

normal_loss = normal_loss + (self.features[i,:] - centers_batch[i,:]).pow(2).sum()
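
To illustrate the difference, here is a rough sketch with made-up tensors showing where the graph stops:

import torch
from torch.autograd import Variable

features = Variable(torch.randn(4, 2), requires_grad=True)
centers_batch = Variable(torch.randn(4, 2))

# indexing through .data produces plain Tensors, so no graph is recorded
loss_through_data = (features.data[0, :] - centers_batch.data[0, :]).pow(2).sum()

# indexing the Variables directly keeps the history back to features
loss_through_vars = (features[0, :] - centers_batch[0, :]).pow(2).sum()

loss_through_vars.backward()
print(features.grad)   # only row 0 receives gradient, via the Variable path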