Help Needed with defining Custom Loss Function

Hey, I’ve recently been trying to implement Supervised Contrastive Loss for semantic segmentation at the pixel level. I’m a bit new to PyTorch and I’m finding it difficult to incorporate it into a semantic segmentation model. Here’s the Supervised Contrastive Loss for image classification:

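import torch
import torch.nn as nn


class SupConLoss(nn.Module):
    """Supervised contrastive loss. With labels=None and mask=None it reduces to the
    unsupervised SimCLR loss (only other views of the same sample are positives)."""
    def __init__(self, temperature=0.07, contrast_mode='all',
                 base_temperature=0.07):
        super(SupConLoss, self).__init__()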
        self.temperature = temperature
        self.contrast_mode = contrast_mode
        self.base_temperature = base_temperature

    def forward(self, features, labels=None, mask=None):
        """Compute the loss for one batch.

        Args:
            features: hidden vector of shape [bsz, n_views, ...].
            labels: ground truth of shape [bsz].
            mask: contrastive mask of shape [bsz, bsz], mask_{i,j}=1 if sample j
                has the same class as sample i. Can be asymmetric.
        Returns:
            A loss scalar.
        """
        device = (torch.device('cuda')
                  if features.is_cuda
                  else torch.device('cpu'))

        if len(features.shape) < 3:
            raise ValueError('`features` needs to be [bsz, n_views, ...],'
                             'at least 3 dimensions are required')
        if len(features.shape) > 3:
            features = features.view(features.shape[0], features.shape[1], -1)

        batch_size = features.shape[0]
        if labels is not None and mask is not None:
            raise ValueError('Cannot define both `labels` and `mask`')
        elif labels is None and mask is None:
            mask = torch.eye(batch_size, dtype=torch.float32).to(device)
        elif labels is not None:
            labels = labels.contiguous().view(-1, 1)
            if labels.shape[0] != batch_size:
                raise ValueError('Num of labels does not match num of features')
            mask = torch.eq(labels, labels.T).float().to(device)
        else:
            mask = mask.float().to(device)

        contrast_count = features.shape[1]
        # stack the views along the batch dimension: [n_views * bsz, dim]
        contrast_feature = torch.cat(torch.unbind(features, dim=1), dim=0)
        if self.contrast_mode == 'one':
            anchor_feature = features[:, 0]
            anchor_count = 1
        elif self.contrast_mode == 'all':
            anchor_feature = contrast_feature
            anchor_count = contrast_count
        else:
            raise ValueError('Unknown mode: {}'.format(self.contrast_mode))

        # compute logits
        anchor_dot_contrast = torch.div(
            torch.matmul(anchor_feature, contrast_feature.T),
            self.temperature)
        # for numerical stability
        logits_max, _ = torch.max(anchor_dot_contrast, dim=1, keepdim=True)
        logits = anchor_dot_contrast - logits_max.detach()

        # tile mask
        mask = mask.repeat(anchor_count, contrast_count)
        # mask-out self-contrast cases
        logits_mask = torch.scatter(
            torch.ones_like(mask),
            1,
            torch.arange(batch_size * anchor_count).view(-1, 1).to(device),
            0
        )
        mask = mask * logits_mask

        # compute log_prob
        exp_logits = torch.exp(logits) * logits_mask
        log_prob = logits - torch.log(exp_logits.sum(1, keepdim=True))

        # compute mean of log-likelihood over positive
        mean_log_prob_pos = (mask * log_prob).sum(1) / mask.sum(1)

        # loss
        loss = - (self.temperature / self.base_temperature) * mean_log_prob_pos
        loss = loss.view(anchor_count, batch_size).mean()

        return loss

My segmentation model returns output features of shape [bsz, feat_dim, h, w], and the target’s shape and the target embedding’s shape are [bsz, h, w] and [bsz, emb_dim, h, w], respectively.
I’m not sure if this is the right place to post this. Can someone help me with this?


Do you see any issues, e.g. are errors raised or is the model not converging with this custom loss, or would you just want someone to double-check the implementation? :)

Hi @ptrblck, thanks for the reply. My problem is that SupConLoss, which is defined for classification problems, takes 3D features of shape [bsz, n_views, …] and 1D labels of shape [bsz], but my segmentation model gives 4D output features of shape [bsz, feat_dim, h, w] and 3D labels of shape [bsz, h, w]. Is there any way I can adapt this loss so that it fits my output feature and label shapes?
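One way to make the shapes fit is to treat every pixel as an independent sample with a single view. The sketch below (the helper name `pixels_to_supcon_inputs` is hypothetical) flattens [bsz, feat_dim, h, w] features into [bsz*h*w, 1, feat_dim] and [bsz, h, w] labels into [bsz*h*w]. It assumes the labels already have the same h and w as the features (otherwise they would first need resizing, e.g. nearest-neighbour interpolation) and that the embeddings should be L2-normalized, as SupCon-style losses usually expect.

```python
import torch
import torch.nn.functional as F


def pixels_to_supcon_inputs(features, labels):
    """Treat every pixel as one sample for a SupConLoss-style criterion.

    features: [bsz, feat_dim, h, w] output of the segmentation head
    labels:   [bsz, h, w] integer class map (assumed same h, w as features)
    returns:  features [bsz*h*w, 1, feat_dim] and labels [bsz*h*w]
    """
    bsz, feat_dim, h, w = features.shape
    # channels last, so each row of the flattened tensor is one pixel's embedding
    pixel_feats = features.permute(0, 2, 3, 1).reshape(-1, feat_dim)
    # assumption: L2-normalized embeddings, as contrastive losses usually expect
    pixel_feats = F.normalize(pixel_feats, dim=1)
    pixel_labels = labels.reshape(-1)
    # a single "view" per pixel -> n_views = 1
    return pixel_feats.unsqueeze(1), pixel_labels


feats = torch.randn(2, 16, 4, 4)
target = torch.randint(0, 3, (2, 4, 4))
f, t = pixels_to_supcon_inputs(feats, target)
print(f.shape, t.shape)
```

Two caveats under the same assumptions: SupConLoss builds an [N, N] similarity matrix, so for N = bsz*h*w pixels you will usually have to subsample pixels first; and a pixel whose class appears nowhere else among the sampled pixels has no positives, which makes the `mask.sum(1)` division in SupConLoss produce NaN.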

Thank you

I’m not sure what the exact difference between the linked paper and this one is, but the latter was implemented in this repository, which might be a good base for you to implement your SupConLoss.
Alternatively, could you point me to the section of the paper, which corresponds to your implementation, so that I could try to figure out the shapes?

Hi @ptrblck, I guess the only difference between SimCLR & SupContrast is this,

I guess Section 3 of the SupContrast paper corresponds to the implementation of the loss. Meanwhile, I’m trying to integrate the loss into this semantic segmentation repo.

Thank you:)

If the main difference is the number of positive/negative samples, then I would assume the linked code would be reusable. Did you compare it to your code, including how the shapes are calculated?
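In the SupConLoss code above, the SimCLR and SupCon cases differ only in how `mask` is built before it is tiled over the views: without labels it is the identity matrix (`torch.eye`), so after tiling and removing self-contrast the only positives are the other views of the same image; with labels it marks every pair of samples that share a class. A small illustration with made-up labels:

```python
import torch

labels = torch.tensor([0, 1, 0, 2])  # made-up labels for illustration
bsz = labels.shape[0]

# SimCLR case (labels=None, mask=None): a sample is only "similar" to itself
simclr_mask = torch.eye(bsz, dtype=torch.float32)

# SupCon case (labels given): every sample with the same label is a positive
col = labels.contiguous().view(-1, 1)
supcon_mask = torch.eq(col, col.T).float()

print(simclr_mask)
print(supcon_mask)
```

Everything after the mask construction (tiling, removing self-contrast, log-softmax, averaging over positives) is shared by both cases.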

Hi @ptrblck, the main problem for me is that in the original implementation of SupContrast, the labels (ground truth) are just numbers from 0-9, but in my case the ground truth is an image of shape [bsz, h, w]. What I want to do is define the supervised contrastive loss at the pixel level: if two pixels are labeled with the same class, they form a positive pair; otherwise, they form a negative pair.
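A minimal sketch of such a pixel-level loss, written directly rather than by reshaping into SupConLoss, follows the same steps as SupConLoss with a single view. It assumes that the labels are at the feature resolution, that 255 marks ignored pixels (a common but dataset-specific convention), and that randomly sampling at most `max_pixels` pixels per batch is acceptable to keep the N x N similarity matrix small. `pixel_supcon_loss` and its parameters are hypothetical names; anchors whose class has no other sampled pixel are skipped, and SupConLoss's `temperature / base_temperature` factor is left out (it is 1 with the default values).

```python
import torch
import torch.nn.functional as F


def pixel_supcon_loss(features, labels, temperature=0.07, max_pixels=1024,
                      ignore_index=255):
    """Pixel-level supervised contrastive loss (sketch).

    features: [bsz, feat_dim, h, w]; labels: [bsz, h, w] at the same resolution.
    Two pixels form a positive pair when they share a class label.
    """
    feat_dim = features.shape[1]
    feats = features.permute(0, 2, 3, 1).reshape(-1, feat_dim)  # one row per pixel
    labs = labels.reshape(-1)

    # drop ignored pixels, then subsample: the similarity matrix is N x N
    keep = labs != ignore_index
    feats, labs = feats[keep], labs[keep]
    if labs.numel() > max_pixels:
        idx = torch.randperm(labs.numel(), device=labs.device)[:max_pixels]
        feats, labs = feats[idx], labs[idx]
    if labs.numel() < 2:
        return features.sum() * 0.0  # nothing to contrast

    feats = F.normalize(feats, dim=1)
    n = labs.numel()
    logits = feats @ feats.T / temperature
    # subtracting the row max leaves the log-softmax unchanged but avoids overflow
    logits = logits - logits.max(dim=1, keepdim=True).values.detach()

    self_mask = torch.eye(n, dtype=torch.bool, device=feats.device)
    pos_mask = (labs.view(-1, 1) == labs.view(1, -1)) & ~self_mask

    # denominator: every other sampled pixel, positives and negatives alike
    exp_logits = torch.exp(logits).masked_fill(self_mask, 0.0)
    log_prob = logits - torch.log(exp_logits.sum(dim=1, keepdim=True))

    # average log-probability over each anchor's positives
    pos_count = pos_mask.sum(dim=1)
    valid = pos_count > 0
    if not valid.any():
        return features.sum() * 0.0  # no positive pairs in this sample
    mean_log_prob_pos = (log_prob * pos_mask).sum(dim=1)[valid] / pos_count[valid]
    return -mean_log_prob_pos.mean()
```

With this, the training step can call `loss = pixel_supcon_loss(features, target)` once per batch instead of looping over pixel positions.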

Hey @ptrblck, I’ve tried this for calculating the pixel-wise loss using the same SupConLoss as above.

def training(self, epoch):
        train_loss = 0.0
        tbar = tqdm(self.trainloader)
        for k, (image, target) in enumerate(tbar):
            self.scheduler(self.optimizer, k, epoch, self.best_pred)
            image, target = image.cuda(), target.cuda()
            features = self.model(image)
            b,c,h,w = features.size()
            for i in range(h):
                for j in range(w):
                    f = features[:,:,i,j]          #[b,c]
                    f = f.unsqueeze(2)             #[b,c,1]
                    t = target[:,i,j]              #[b]
                    loss = self.criterion(f, t)
            return loss
            train_loss += loss.item()
            tbar.set_description('Train loss: %.3f' % (train_loss / (k + 1)))

My GPU utilization is at 95%, but I’m not sure whether the code is running or not. Can you double-check it? Thanks :)

The code looks alright, but note that train_loss += loss.item() and the print statement won’t be executed, as you are using a return statement before these lines of code.
Besides that, I don’t know if you really need to use retain_graph=True, since you are re-calculating the loss in each iteration.

So I’ve changed my code to this:

def training(self, epoch):
        train_loss = 0.0
        tbar = tqdm(self.trainloader)
        for k, (image, target) in enumerate(tbar):
            self.scheduler(self.optimizer, k, epoch, self.best_pred)
            image, target = image.cuda(), target.cuda()
            features = self.model(image)
            b,c,h,w = features.size()
            for i in range(h):
                for j in range(w):
                    f = features[:,:,i,j]
                    f = f.unsqueeze(2)
                    t = target[:,i,j]
                    loss = self.criterion(f, t)
            train_loss += loss.item()
            tbar.set_description('Train loss: %.3f' % (train_loss / (k + 1)))

And I’ve been getting this error:

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

Your loop is wrong: you are not changing f and t before running the backward() step again (or at least that’s more or less what that error means).

Comment out the backward() and step() and debug f and t and see what they are doing.

Hey @victorc25, can you help me a bit with how to restructure the loop? I am a bit stuck with this. Thanks :)
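A likely cause, assuming `loss.backward()` was called inside the inner pixel loop: every per-pixel loss is built from the same `features = self.model(image)` forward pass, so the first `backward()` frees that shared graph and the second one fails. A minimal sketch (the `train_step` helper is hypothetical) sums the per-pixel losses and calls `backward()` once per batch. It also uses `unsqueeze(1)` instead of `unsqueeze(2)`: SupConLoss reads dimension 1 as `n_views`, so `[b, c, 1]` would be interpreted as `c` views of a 1-dimensional feature, while `[b, 1, c]` is one view of a `c`-dimensional feature.

```python
import torch
import torch.nn as nn


def train_step(model, criterion, optimizer, image, target):
    """One optimization step with a per-pixel criterion.

    criterion(f, t) is assumed to accept f of shape [b, 1, c] and t of shape [b].
    """
    optimizer.zero_grad()
    features = model(image)            # [b, c, h, w]: one graph for the whole batch
    b, c, h, w = features.size()
    total = 0.0
    for i in range(h):
        for j in range(w):
            f = features[:, :, i, j].unsqueeze(1)   # [b, 1, c]: one view per image
            t = target[:, i, j]                     # [b]
            total = total + criterion(f, t)         # accumulate, no backward yet
    loss = total / (h * w)                          # average over pixel positions
    loss.backward()                                 # single backward per batch
    optimizer.step()
    return loss.item()
```

This still calls the criterion h*w times per batch and only contrasts pixels at the same (i, j) position across images; treating all pixels of the batch as one set of samples is both faster and closer to the pixel-level positive/negative definition described earlier in the thread.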

Hi @ptrblck,
I am working on the same loss function.

But my inputs to the loss function (Zi, Zj, Zk from equation no. 4 in this paper) are feature maps of size (num_features, h, w) = (128, 8, 8).

I looked at the above implementation of the loss function:
class SupConLoss(nn.Module):

But I have no idea what’s going on in that function, and moreover its input size is different. So I was thinking of coding the equation directly, which is quite complex: I will need several for loops and some logic to distinguish i, j and k.

Batch size = 100, so for each i (0 to 99) I have 9 different j values (because the batch contains 9 other images with the same label as the i-th image), which leaves k = 90 (the 90 images whose label differs from that of the i-th image). Likewise, I have to go through all the images’ feature maps. Can you please suggest how I should proceed?
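These counts can be read off the labels without any loop. The sketch below assumes a batch of 10 classes with 10 images each, a made-up layout matching the numbers above. Note one difference from that description: in SupConLoss the denominator (through `logits_mask`) sums over every sample except the anchor itself, i.e. 99 terms including the positives, not only the 90 images with a different label.

```python
import torch

# hypothetical batch: 10 classes x 10 images = 100 images
labels = torch.arange(10).repeat_interleave(10)
n = labels.numel()

same = torch.eq(labels.view(-1, 1), labels.view(1, -1))
not_self = ~torch.eye(n, dtype=torch.bool)

num_pos = (same & not_self).sum(dim=1)   # j: same label as i, j != i
num_neg = (~same).sum(dim=1)             # different label from i
num_denom = not_self.sum(dim=1)          # terms in SupConLoss's denominator: every k != i

print(num_pos[0].item(), num_neg[0].item(), num_denom[0].item())  # 9 90 99
```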

In addition, I’m unsure how to compute the dot product of Zi, of size [128, 8, 8], and Zj of the same size, which should give me a single (scalar) value.
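One common option, and what SupConLoss itself does for inputs with more than three dimensions (`features.view(features.shape[0], features.shape[1], -1)`), is to flatten each map into a single 128*8*8 = 8192-dimensional vector; the dot product of two such vectors is a scalar. The sketch assumes the vectors should be L2-normalized first, as contrastive losses usually expect; pooling each map down to 128 values or adding a small projection head would be alternatives.

```python
import torch
import torch.nn.functional as F

bsz = 100
z = torch.randn(bsz, 128, 8, 8)        # one feature map per image (random stand-in)

# flatten each map to a 128*8*8 = 8192-dim vector and L2-normalize it
z_flat = F.normalize(z.view(bsz, -1), dim=1)

# dot product of two maps -> a single scalar (cosine similarity after normalization)
s_01 = torch.dot(z_flat[0], z_flat[1])

# all pairwise dot products at once -> [bsz, bsz] similarity matrix
sim = z_flat @ z_flat.T
```

The full matrix `sim` is exactly the quantity the loss needs for every (i, j) and (i, k) pair, which is why no explicit loop over pairs is required.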

I would recommend starting with a correct loss function, even if the first version is slow.
I.e. you shouldn’t care too much about nested loops in the first iteration; just make sure the outputs and gradients are as expected.
Once this is done you can use this slow method as the baseline and try to use vectorized operations to speed it up.
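Following that advice, here is a sketch of a slow loop version of the loss next to a vectorized one, for a single view per sample and L2-normalized embeddings. Both function names are hypothetical; anchors without any positive are skipped, and SupConLoss's `base_temperature` scaling is left out (it is 1 with the default values). Comparing the two on random data is a quick way to check the vectorized code.

```python
import torch
import torch.nn.functional as F


def supcon_loop(z, labels, temperature=0.07):
    """Slow reference: one anchor i at a time, explicit positives p and denominator terms a."""
    n = z.shape[0]
    per_anchor = []
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # anchors without positives are skipped in this sketch
        denom = sum(torch.exp(z[i] @ z[a] / temperature) for a in range(n) if a != i)
        terms = [torch.log(torch.exp(z[i] @ z[p] / temperature) / denom) for p in positives]
        per_anchor.append(-sum(terms) / len(positives))
    return torch.stack(per_anchor).mean()


def supcon_vectorized(z, labels, temperature=0.07):
    """Same quantity with matrix ops (single view, no base_temperature scaling)."""
    n = z.shape[0]
    logits = z @ z.T / temperature
    not_self = ~torch.eye(n, dtype=torch.bool)
    pos = torch.eq(labels.view(-1, 1), labels.view(1, -1)) & not_self
    denom = (torch.exp(logits) * not_self).sum(dim=1, keepdim=True)
    log_prob = logits - torch.log(denom)
    has_pos = pos.sum(dim=1) > 0
    mean_log_prob_pos = (log_prob * pos).sum(dim=1)[has_pos] / pos.sum(dim=1)[has_pos]
    return -mean_log_prob_pos.mean()


torch.manual_seed(0)
z = F.normalize(torch.randn(12, 16), dim=1)
labels = torch.randint(0, 3, (12,))
print(supcon_loop(z, labels), supcon_vectorized(z, labels))
```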

Hi @ptrblck, I have some doubts about the code posted by the person who created this thread.
I could understand the code up to this line:

# for numerical stability

Can you please explain the remaining code?

That loss function (class SupConLoss(nn.Module):) is based on equation no. 4 in this paper.

Upon comparing the equation and the code, I don’t know where exactly the code separates the Zi, Zj and Zk values for each Li, and moreover I don’t see any loop that iterates over all values of i. So I guess it computes the loss only for the first image of the batch.
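There is no explicit loop over i because the matrix operations handle all anchors at once: row i of `anchor_dot_contrast` holds z_i · z_a / τ for every a, row i of `mask` (after multiplying by `logits_mask`) marks the j's with the same label, and the denominator sums over every column that `logits_mask` keeps, i.e. all k ≠ i. The trace below replays those lines for a toy batch of three single-view samples with made-up labels so the intermediate tensors can be inspected; `mean_log_prob_pos` has one entry per anchor, and the final `.mean()` in SupConLoss averages them.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
temperature = 0.07
labels = torch.tensor([0, 0, 1])               # made-up labels
z = F.normalize(torch.randn(3, 4), dim=1)      # [bsz, dim], one view per image
batch_size = z.shape[0]

mask = torch.eq(labels.view(-1, 1), labels.view(1, -1)).float()
anchor_dot_contrast = z @ z.T / temperature    # row i: z_i . z_a / tau for every a
logits = anchor_dot_contrast - anchor_dot_contrast.max(dim=1, keepdim=True).values

# ones everywhere except the diagonal: removes the a == i (self-contrast) terms
logits_mask = torch.scatter(torch.ones_like(mask), 1,
                            torch.arange(batch_size).view(-1, 1), 0)
mask = mask * logits_mask                      # rows 0 and 1 mark each other; row 2 is all zeros
print(mask)

# log of the softmax over all a != i, evaluated at every column
exp_logits = torch.exp(logits) * logits_mask
log_prob = logits - torch.log(exp_logits.sum(1, keepdim=True))

# one value per anchor: average log-probability of its positives
mean_log_prob_pos = (mask * log_prob).sum(1) / mask.sum(1)
print(mean_log_prob_pos)   # third entry is nan: image 2 has no positive in this batch
```

The NaN for the third anchor also shows why every anchor needs at least one positive in the batch when using SupConLoss as written.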

In this file, I don’t know what line no. 231 (losses.update(loss.item(), bsz)) does. I guess it’s just for printing purposes.

Thank you for sparing your time.

I think @user432 could better explain the posted code, as I’m unfortunately not familiar with the mentioned paper. ;)

Yes, the AverageMeter object (which is the class of the losses object) is used for printing purposes and supports calculating the average etc.
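So `losses.update(loss.item(), bsz)` only updates logging statistics and does not affect the gradients. A minimal sketch of such a meter, modeled on the common `AverageMeter` pattern (for example in the PyTorch ImageNet example), is below; the repository’s class may differ in details. Passing `bsz` as `n` weights each batch by its size, so `avg` stays a per-sample average even if the last batch is smaller.

```python
class AverageMeter:
    """Tracks the latest value and a running, count-weighted average (sketch)."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0.0
        self.sum = 0.0
        self.count = 0
        self.avg = 0.0

    def update(self, val, n=1):
        # val: a per-sample average (e.g. loss.item()); n: number of samples it covers
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count


losses = AverageMeter()
losses.update(2.0, 10)   # batch of 10 samples with mean loss 2.0
losses.update(1.0, 30)   # batch of 30 samples with mean loss 1.0
print(losses.val, losses.avg)   # 1.0 1.25
```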

I’m trying to implement this supervised contrastive loss function for semantic segmentation too. Does anybody have any suggestions on where I can find an appropriate implementation of this loss for semantic segmentation? Many thanks!