How do I use a ByteTensor in a contrastive cosine loss function?

I’m trying to implement the loss function in http://anthology.aclweb.org/W16-1617 in PyTorch. It is shown as follows:

[Image: the loss from the paper — for a matching pair, L = 1/4 * (1 − cos_sim)²; for a non-matching pair, L = cos_sim² if cos_sim < m, else 0]

I’ve implemented the loss as follows:

import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineContrastiveLoss(nn.Module):
    """
    Cosine contrastive loss function.
    Based on: http://anthology.aclweb.org/W16-1617
    Uses label 0 for a match, 1 for a non-match.
    If the pair matches, the loss is 1/4 * (1 - cos_sim)^2.
    If it doesn't, the loss is cos_sim^2 if cos_sim < margin, or 0 otherwise.
    The margin in the paper is ~0.4.
    """

    def __init__(self, margin=0.4):
        super(CosineContrastiveLoss, self).__init__()
        self.margin = margin

    def forward(self, output1, output2, label):
        cos_sim = F.cosine_similarity(output1, output2)
        loss_cos_con = torch.mean((1 - label) * torch.div(torch.pow((1.0 - cos_sim), 2), 4) +
                                  (label) * torch.pow(cos_sim * torch.lt(cos_sim, self.margin), 2))
        return loss_cos_con

However, I’m getting an error saying:

TypeError: mul received an invalid combination of arguments - got (torch.cuda.ByteTensor), but expected one of:
 * (float value)
      didn't match because some of the arguments have invalid types: (torch.cuda.ByteTensor)
 * (torch.cuda.FloatTensor other)
      didn't match because some of the arguments have invalid types: (torch.cuda.ByteTensor)

I know that torch.lt() returns a ByteTensor, but if I try to coerce it to a FloatTensor with torch.Tensor.float() I get AttributeError: module 'torch.autograd.variable' has no attribute 'FloatTensor'.

I’m really not sure where to go from here. It seems logical to me to do an element-wise multiplication between the cosine similarity tensor and a tensor with 0 or 1 based on a less-than rule.

You have to coerce the tensor, not the Variable. The Variable is a container used to build the graph and do the backprop for you; you need to put a float tensor inside the Variable.
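
To illustrate, here's a minimal sketch for pre-0.4 PyTorch (the era of this thread), where Variable and Tensor were distinct and the wrapped tensor was reachable through .data — the values are made up for illustration:

    import torch
    from torch.autograd import Variable

    t = torch.FloatTensor([0.2, 0.5, 0.9])  # a plain float tensor
    v = Variable(t)                          # Variable wraps the tensor for autograd
    print(type(v.data))                      # torch.FloatTensor -- the wrapped tensor
    mask = torch.lt(v, 0.4)                  # comparisons on Variables return Variables
    print(type(mask.data))                   # torch.ByteTensor -- booleans stored as bytes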

That makes sense! Unfortunately I have no idea how to do that.

I tried:

    def forward(self, output1, output2, label):
        cos_sim = F.cosine_similarity(output1, output2)
        loss_cos_con = torch.mean((1-label) * torch.div(torch.pow((1.0-cos_sim), 2), 4) +
                                    (label) * torch.pow(cos_sim * torch.Tensor.float(torch.lt(cos_sim, self.margin)), 2))
        return loss_cos_con

but got the AttributeError listed above. How do I coerce the tensor?

Look at what input you are giving to the loss — what are output1 and output2?

They’re just 1d (cuda) FloatTensors, from the Siamese Network model I defined.

Maybe I see your point now, sorry. The result of lt is a byte tensor and represents booleans. One option is to get the tensor out of the Variable using torch.FloatTensor(lt(…).data).

That yields a TypeError:

TypeError: torch.FloatTensor constructor received an invalid combination of arguments - got (torch.cuda.ByteTensor), but expected one of:
 * no arguments
 * (int ...)
      didn't match because some of the arguments have invalid types: (torch.cuda.ByteTensor)
 * (torch.FloatTensor viewed_tensor)
      didn't match because some of the arguments have invalid types: (torch.cuda.ByteTensor)
 * (torch.Size size)
      didn't match because some of the arguments have invalid types: (torch.cuda.ByteTensor)
 * (torch.FloatStorage data)
      didn't match because some of the arguments have invalid types: (torch.cuda.ByteTensor)
 * (Sequence data)
      didn't match because some of the arguments have invalid types: (torch.cuda.ByteTensor)

Argh, I'm on my phone and can't try it. What if you do .data.float()? Or cast to a long tensor instead of a float tensor? I can't find anything on tensor coercion in the docs.

I’m kind of embarrassed to say the answer is just to call .float() directly. As in torch.gt(cos_sim, self.margin).float(). Thanks for your help on this though!

I also realised that the paper has the sign wrong and you want greater than rather than less than for the non-match loss.
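
For completeness, here's the full class with both fixes applied — the comparison mask cast to float, and gt in place of lt — a sketch based on my code above, with the label convention (0 = match, 1 = non-match) unchanged:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CosineContrastiveLoss(nn.Module):
        """
        Cosine contrastive loss with both fixes: the ByteTensor mask
        is cast to float, and the non-match term uses greater-than
        rather than less-than.
        """

        def __init__(self, margin=0.4):
            super(CosineContrastiveLoss, self).__init__()
            self.margin = margin

        def forward(self, output1, output2, label):
            cos_sim = F.cosine_similarity(output1, output2)
            # Match term (label == 0): 1/4 * (1 - cos_sim)^2
            match_loss = torch.div(torch.pow(1.0 - cos_sim, 2), 4)
            # Non-match term (label == 1): cos_sim^2 where cos_sim > margin, else 0.
            # .float() converts the ByteTensor mask so it can multiply cos_sim.
            mask = torch.gt(cos_sim, self.margin).float()
            non_match_loss = torch.pow(cos_sim * mask, 2)
            return torch.mean((1 - label) * match_loss + label * non_match_loss)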