Custom loss functions

abhinavtripathi95 · June 12, 2020, 3:29pm

Hello @ptrblck,
I am using a custom contrastive loss function as

import torch

def loss_contrastive(euclidean_distance, label_batch):
    margin = 100
    loss = torch.mean( (label_batch) * torch.pow(euclidean_distance, 2) +
                    (1-label_batch) * torch.pow(torch.clamp(margin - euclidean_distance, min=0.0), 2))
    return loss  # without this the function returns None

However, I get this error

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-95-9478fc9e762e> in <module>
----> 1 interp = Interpretation.from_learner(learn)

~/anaconda3/envs/pytorch/lib/python3.6/site-packages/fastai/train.py in from_learner(cls, learn, ds_type, activ)
    158     def from_learner(cls, learn: Learner,  ds_type:DatasetType=DatasetType.Valid, activ:nn.Module=None):
    159         "Gets preds, y_true, losses to construct base class from a learner"
--> 160         preds_res = learn.get_preds(ds_type=ds_type, activ=activ, with_loss=True)
    161         return cls(learn, *preds_res)
    162 

~/anaconda3/envs/pytorch/lib/python3.6/site-packages/fastai/basic_train.py in get_preds(self, ds_type, activ, with_loss, n_batch, pbar)
    339         callbacks = [cb(self) for cb in self.callback_fns + listify(defaults.extra_callback_fns)] + listify(self.callbacks)
    340         return get_preds(self.model, self.dl(ds_type), cb_handler=CallbackHandler(callbacks),
--> 341                          activ=activ, loss_func=lf, n_batch=n_batch, pbar=pbar)
    342 
    343     def pred_batch(self, ds_type:DatasetType=DatasetType.Valid, batch:Tuple=None, reconstruct:bool=False,

~/anaconda3/envs/pytorch/lib/python3.6/site-packages/fastai/basic_train.py in get_preds(model, dl, pbar, cb_handler, activ, loss_func, n_batch)
     44            zip(*validate(model, dl, cb_handler=cb_handler, pbar=pbar, average=False, n_batch=n_batch))]
     45     if loss_func is not None:
---> 46         with NoneReduceOnCPU(loss_func) as lf: res.append(lf(res[0], res[1]))
     47     if activ is not None: res[0] = activ(res[0])
     48     return res

TypeError: loss_contrastive() got an unexpected keyword argument 'reduction'

How can I make the loss function compatible?

ptrblck · June 13, 2020, 2:49am

It seems you are using your custom loss function in FastAI, which apparently expects the reduction keyword for all loss functions.
A potential workaround would be to add the reduction argument, only accept 'mean' as a valid value, and raise a NotImplementedError for other values.
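A minimal sketch of that workaround follows. It goes one step further than the suggestion above: because the traceback shows the call passing through NoneReduceOnCPU, it assumes the caller may request an unreduced loss, so 'none' and 'sum' are handled as well and only unknown values raise NotImplementedError. The margin keyword and the .float() cast on the labels are additions for illustration and are not part of the original function.

```python
import torch

def loss_contrastive(euclidean_distance, label_batch, margin=100.0, reduction="mean"):
    # Labels may arrive as integer tensors; cast so the arithmetic stays in floating point.
    label_batch = label_batch.float()
    # Per-sample contrastive loss: pull label==1 pairs together, push label==0 pairs apart.
    per_sample = (label_batch * euclidean_distance.pow(2)
                  + (1 - label_batch)
                  * torch.clamp(margin - euclidean_distance, min=0.0).pow(2))
    if reduction == "mean":
        return per_sample.mean()
    if reduction == "sum":
        return per_sample.sum()
    if reduction == "none":
        return per_sample  # one loss value per pair
    raise NotImplementedError(f"reduction={reduction!r} is not supported")
```

If only the mean is ever needed, the 'sum' and 'none' branches can be dropped so that the function matches the suggestion literally.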

marziehoghbaie · June 24, 2020, 12:46pm

Hi,
I use the same loss function, but I get this error: RuntimeError: Can only calculate the mean of floating types. Got Long instead.
Can somebody help me out?

ptrblck · June 25, 2020, 3:50am

The error says that you should pass floating point tensors, so it seems at least one of the tensors (maybe both) is a LongTensor.
Convert them to FloatTensors and it should work:

torch.mean((output.float() - target.float())**2)
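As a small self-contained illustration (the tensor values are made up for the example), taking the mean of an integer tensor raises a RuntimeError, while casting both inputs with .float() first works:

```python
import torch

# Integer literals produce int64 (Long) tensors.
output = torch.tensor([1, 2, 3])
target = torch.tensor([1, 0, 3])

try:
    torch.mean((output - target) ** 2)  # mean over a Long tensor
    raised = False
except RuntimeError:
    raised = True

# Casting to float before the arithmetic avoids the error.
mse = torch.mean((output.float() - target.float()) ** 2)
```

The exact wording of the error message differs between PyTorch versions.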

marziehoghbaie · June 25, 2020, 8:34am

Thanks. I also have my own loss function, but when I use it, my model parameters are not updated and stay the same. I opened another issue; could you help me with that? I’ve been struggling with it for about a week without making any progress.
This is the page of my issue

Ignacio_Hernandez · June 30, 2020, 3:23pm

I’ve been recently working on supervised contrastive learning. After several experiments using the triplet loss for image classification, I decided to implement a new function to add an extra penalty to this triplet loss. This function uses the coefficient of variation (stddev/mean) and my idea is based on this paper:

Learning 3D Keypoint Descriptors for Non-Rigid Shape Matching

Their GitHub repo is here, but it is not very well documented in my opinion.

I’ve been working with PyTorch for the last year and it’s still difficult for me to fully understand how to create loss functions properly. This is the loss function I wrote:

import torch
import torch.nn as nn
import torch.nn.functional as F

class CoefficientVariationLoss(nn.Module):
    def __init__(self, class_to_idx, weighted=False):
        super(CoefficientVariationLoss, self).__init__()
        self.class_to_idx = class_to_idx
        self.weighted = weighted

    def forward(self, x, labels):
        total_coeff_variation = 0
        l2_norm_positive = F.pairwise_distance(x[0], x[1],2)

        for c in self.class_to_idx.values():
            count_label_pos = torch.sum(labels[0]==c).float()
            if count_label_pos <= 1:
                class_cv = 0
            else:
                class_pos = l2_norm_positive[labels[0]==c]

                if self.weighted:
                    weight = count_label_pos/labels[0].shape[0]
                    class_cv = (torch.std(class_pos)/torch.mean(class_pos))*weight
                else:
                    class_cv = (torch.std(class_pos)/torch.mean(class_pos))
                
                # if torch.isnan(class_cv):
                #     class_cv = 0

            total_coeff_variation += class_cv

        return total_coeff_variation

The two inputs of this function are:

x: x[0], x[1] and x[2] are the embeddings of a triplet (anchor, positive, negative)
labels: labels[0], labels[1] and labels[2] are the classes of each element in the triplet (anchor, positive, negative)

My custom dataloader is serving the data as a dictionary:

sample = {'images': [...], 'labels': [...]}

and the output of my model (triplet network) is a list of tensors in the same order: anchor, positive and negative.

The final loss is computed as

loss = loss_triplet + FACTOR * loss_coeffVariation

Have I created this loss correctly? I don’t know if I can use indexing in loss functions because of differentiability issues…

Thanks!

ptrblck · July 3, 2020, 7:14am

The indexing operation is differentiable in PyTorch and shouldn’t detach the graph.
You could test whether your custom loss implementation detaches the computation graph by calling backward() on the created loss and printing all gradients in the model’s parameters.
If you see valid values, Autograd was able to backpropagate. On the other hand, if your loss function cuts the graph, the model shouldn’t get gradients at all.
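A minimal sketch of this check, using a toy nn.Linear model and a made-up loss that relies on boolean-mask indexing (both are stand-ins, not the triplet network from the question):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 3)  # toy stand-in for the real network
x = torch.randn(8, 4)
labels = torch.tensor([0, 0, 1, 1, 0, 1, 0, 1])

out = model(x)
# Boolean-mask indexing selects rows of the output; autograd tracks it.
loss = out[labels == 0].pow(2).mean()
loss.backward()

# Every parameter should now hold a finite gradient.
for name, p in model.named_parameters():
    print(name, None if p.grad is None else p.grad.abs().sum().item())

grads_ok = all(p.grad is not None and bool(torch.isfinite(p.grad).all())
               for p in model.parameters())
```

A parameter whose grad is still None after backward() did not take part in computing the loss, which is the symptom of a cut graph.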

Ignacio_Hernandez · July 3, 2020, 7:37am

Thanks for the reply. How should I test it?

loss_coeffVariation.backward()

or

loss = loss_triplet + FACTOR * loss_coeffVariation
loss.backward()

ptrblck · July 3, 2020, 7:58am

I would test both losses separately and make sure that you get gradients from each of them.
If you check the gradients from the loss sum, one loss might still be faulty, while the other will be responsible for the actual backward pass.
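One way to sketch the separate check is with torch.autograd.grad, which returns gradients without accumulating them into .grad. The helper grad_norm_for is hypothetical, the toy model and both losses are invented for the example, and loss_b is detached on purpose to play the role of a faulty term:

```python
import torch
import torch.nn as nn

def grad_norm_for(loss, model):
    """Hypothetical helper: gradient norm of `loss` w.r.t. the model
    parameters, or None if the loss is not connected to them."""
    if not loss.requires_grad:
        return None
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params, retain_graph=True, allow_unused=True)
    found = [g for g in grads if g is not None]
    if not found:
        return None
    return torch.sqrt(sum(g.pow(2).sum() for g in found)).item()

torch.manual_seed(0)
model = nn.Linear(4, 2)
out = model(torch.randn(6, 4))

loss_a = out.pow(2).mean()          # connected to the model
loss_b = out.detach().abs().mean()  # deliberately cut from the graph
total = loss_a + 0.5 * loss_b

norm_a = grad_norm_for(loss_a, model)
norm_b = grad_norm_for(loss_b, model)
norm_total = grad_norm_for(total, model)
```

Here the summed loss still reports a gradient norm even though loss_b contributes nothing, which is why checking only the sum can hide a faulty term.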

Ignacio_Hernandez · July 3, 2020, 8:42am

I tried this:

loss_coeffVariation.backward()
debug_params = list(model.projection_head_1.parameters())
print(debug_params[0].grad)

Looks like it’s working:

tensor([[-0.0072, -0.0012,  0.0003,  ..., -0.0034, -0.0048, -0.0048],
        [ 0.0044,  0.0127,  0.0045,  ..., -0.0073,  0.0037,  0.0164],
        [-0.0013, -0.0093, -0.0183,  ...,  0.0024, -0.0047, -0.0115],
        ...,
        [ 0.0073,  0.0121,  0.0077,  ...,  0.0079,  0.0030,  0.0055],
        [ 0.0016, -0.0013, -0.0009,  ..., -0.0050, -0.0048,  0.0136],
        [ 0.0140,  0.0030, -0.0045,  ...,  0.0096,  0.0146,  0.0027]],
       device='cuda:0')

I have more questions regarding how to create custom loss functions, but I’m not sure if this is the right thread to do that. Should I look for another thread/create a new one or keep asking here?

Thanks!

ptrblck · July 4, 2020, 2:31am

You can follow-up here in this thread, as the question seems to be related.

saba · July 21, 2020, 3:35am

Hi Ptrblck,

Sorry to take your time. I need to use the Wasserstein distance. I found an implementation, but I cannot get the code to run. The link is https://www.kernel-operations.io/geomloss/api/pytorch-api.html#geomloss.SamplesLoss
In particular, I don’t know where these modules come from:

from .kernel_samples import kernel_tensorized, kernel_online, kernel_multiscale
from .sinkhorn_samples import sinkhorn_tensorized
from .sinkhorn_samples import sinkhorn_online
from .sinkhorn_samples import sinkhorn_multiscale
from .kernel_samples import kernel_tensorized as hausdorff_tensorized
from .kernel_samples import kernel_online     as hausdorff_online
from .kernel_samples import kernel_multiscale as hausdorff_multiscale

ptrblck · July 21, 2020, 4:45am

These classes are imported from the geomloss repository.
E.g. .sinkhorn_samples is defined here.
You would probably need to clone the repository (or install the package) to run the script.

saba · July 21, 2020, 7:55am

I finally got the code to run. The loss is computed between two datasets of size 64x9x9. I used this command:

from geomloss import SamplesLoss

# Define a Sinkhorn (~Wasserstein) loss between sampled measures
loss = SamplesLoss(loss="sinkhorn", p=2, blur=.05)
L = loss(x, y)

x and y are two datasets, each a batch of 64 images of 9x9 pixel intensity values.
Do you think I should pass x and y as 64x9x9, 64x81x1, or 64x81? I tried these shapes and no error occurred, but the loss values were different.

Many thanks

ptrblck · July 22, 2020, 1:04am

Sorry, I’m not familiar with the repository and would recommend creating an issue there if the examples or other documents don’t help.

saba · August 17, 2020, 1:53am

Hi Ptrblck,
I am running the training code and I am not sure what mone means. Why did they use mone? I think it is a kind of label indicating that the backward pass should be done on the condition that the label is -1. Am I right?

one = torch.FloatTensor([1])
mone = one * -1

output = netD(real_cpu).view(-1)
errD_real = output.mean(0).view(1)
errD_real.backward(mone)

ptrblck · August 17, 2020, 8:10am

mone is most likely the variable name for "minus one" and is used as the gradient for errD_real.

I don’t know if mone is used as a label later in the code, but in your code snippet it’s used as the gradient (by default .backward() would use torch.ones(1) as the gradient input).
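A small sketch of the effect, with a made-up parameter w standing in for the discriminator weights: passing mone to backward() scales the incoming gradient by -1, so the accumulated gradients are the negatives of what passing one would give.

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  # stand-in parameter
one = torch.tensor([1.0])
mone = one * -1

# Backward with +1 as the incoming gradient.
loss = (w * 2).mean(0).view(1)
loss.backward(one)
grad_plus = w.grad.clone()

# Reset and repeat with -1 as the incoming gradient.
w.grad = None
loss = (w * 2).mean(0).view(1)
loss.backward(mone)
grad_minus = w.grad.clone()

print(grad_plus)
print(grad_minus)
```

With a gradient-descent optimizer, backward(mone) therefore has the same effect as calling backward() on the negated loss, i.e. the step increases the value instead of decreasing it.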

saba · August 24, 2020, 1:50am

Hi Ptrblck,

I want to use two loss functions for updating my generator (netG). The first one is binary cross entropy and the second one is MSE. I am not sure whether the order of the commands is correct. I need to optimize the G network (optimizerG) based on both losses.

netG.zero_grad()
fake = netG(noise)
label.fill_(real_label)
output = netD(fake).view(-1)
# First loss: binary cross entropy
errG1 = criterion(output, label)
# Second loss: MSE against the Gaussian mask, added to update the generator
Difference = CMBMASKGaussy - fake
MeanSquareError = (Difference**2).mean()
# Sum of both losses
errG = errG1 + MeanSquareError
# Backpropagate
errG.backward()
# Update G
optimizerG.step()

ptrblck · August 24, 2020, 4:42am

The code looks alright.
While the errG1 loss would create gradients in netG and netD (you might need to zero out the gradients for netD before trying to update it), the MeanSquareError loss would only create the gradients for netG.
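The following sketch illustrates this with tiny stand-in networks (the layer sizes, the SGD optimizer, and target_image in place of CMBMASKGaussy are all assumptions for the example). After the generator step, netD holds gradients from the BCE term, so they are cleared before any discriminator update:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
netG = nn.Linear(3, 4)                                # toy generator
netD = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())  # toy discriminator
optimizerG = torch.optim.SGD(netG.parameters(), lr=0.1)
criterion = nn.BCELoss()

noise = torch.randn(5, 3)
target_image = torch.randn(5, 4)  # stand-in for CMBMASKGaussy
label = torch.ones(5)             # "real" labels for the generator update

netG.zero_grad()
fake = netG(noise)
output = netD(fake).view(-1)
errG = criterion(output, label) + ((target_image - fake) ** 2).mean()
errG.backward()
optimizerG.step()  # only netG's parameters are updated

# The BCE term backpropagated through netD, so its .grad fields are populated.
d_has_grads = all(p.grad is not None for p in netD.parameters())

# Clear them before the next discriminator update.
netD.zero_grad()
d_cleared = all(p.grad is None or not bool(p.grad.any())
                for p in netD.parameters())
```

Since optimizerG was built from netG.parameters() only, the leftover gradients in netD never change its weights; they only matter if a later discriminator step accumulates on top of them.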

saba · August 24, 2020, 4:51am

Here I just want to update netG; netD will not be updated. I think the code is correct for updating the generator.