ArcFace loss error

My ArcFace loss:

import math

import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceLoss(nn.Module):
    def __init__(self, num_classes, embedding_size=64, margin=0.5, scale=64):
        super(ArcFaceLoss, self).__init__()
        self.num_classes = num_classes
        self.embedding_size = embedding_size
        self.margin = margin
        self.scale = scale
        self.cos_m = math.cos(margin)
        self.sin_m = math.sin(margin)
        self.threshold = math.cos(math.pi - margin)
        self.mm = self.sin_m * margin
        self.weight = nn.Parameter(torch.FloatTensor(num_classes, embedding_size))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, x, labels):
        cosine = F.linear(F.normalize(x), F.normalize(self.weight))
        sine = torch.sqrt(1.0 - torch.pow(cosine, 2))
        phi = cosine * self.cos_m - sine * self.sin_m
        phi = torch.where(cosine > self.threshold, phi, cosine - self.mm)
        one_hot = torch.zeros(cosine.size(), device=x.device)
        one_hot.scatter_(1, labels.view(-1, 1).long(), 1)
        output = (one_hot * phi) + ((1.0 - one_hot) * cosine)
        output *= self.scale
        return output

Getting this error:

Input In [89], in ArcFaceLoss.forward(self, x, labels)
17 def forward(self, x, labels):
---> 18 cosine = F.linear(F.normalize(x), F.normalize(self.weight))
19 sine = torch.sqrt(1.0 - torch.pow(cosine, 2))
20 phi = cosine * self.cos_m - sine * self.sin_m

RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x10171 and 64x10171)

mat1 and mat2 have the same shape. If I use transpose it works, but I don't know if that's right, and after this I get this error:

Input In [136], in ArcFaceLoss.forward(self, x, labels)
21 phi = torch.where(cosine > self.threshold, phi, cosine - self.mm)
22 one_hot = torch.zeros(cosine.size(), device=x.device)
---> 23 one_hot.scatter_(1, labels.view(-1, 1).long(), 1)
24 output = (one_hot * phi) + ((1.0 - one_hot) * cosine)
25 output *= self.scale

RuntimeError: index 4601 is out of bounds for dimension 1 with size 64
I know how to fix it, but I'm afraid that it somehow ruins my ArcFace loss.
When writing the function I relied on this notebook: pytorch-losses

A matrix multiplication expects inputs as [M, K] and [K, N], so transposing should be the correct approach.
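For illustration, a minimal sketch of this shape rule with F.linear (the sizes below are placeholders, not taken from your model):

import torch
import torch.nn.functional as F

a = torch.randn(64, 128)    # [M, K]
b = torch.randn(256, 128)   # stored as [N, K]

# F.linear computes a @ b.T, i.e. [M, K] @ [K, N] -> [M, N]
out = F.linear(a, b)        # [64, 256]

# Passing b already transposed as [K, N] makes the inner dimensions mismatch:
# F.linear(a, b.T)          # RuntimeError: inner dimensions do not match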

Could you describe this concern a bit more, please?
The error is raised since you are trying to use an out-of-bounds index in the scatter_ operation. You explained you would know how to fix it (I assume by either increasing the one_hot size or making sure only valid indices are used), but I don't understand how it could invalidate the loss.
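A minimal repro of that scatter_ error (the sizes 4 and 10 here are made up, just to keep it small):

import torch

one_hot = torch.zeros(4, 10)                   # [batch, num_columns]
labels = torch.tensor([1, 3, 9, 4])
one_hot.scatter_(1, labels.view(-1, 1), 1)     # works: every index is < 10

bad_labels = torch.tensor([1, 3, 4601, 4])
# one_hot.scatter_(1, bad_labels.view(-1, 1), 1)
# -> RuntimeError: index 4601 is out of bounds for dimension 1 with size 10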

I have 10171 different people. For each of them I have a set of photos. I need to train my model to recognize faces and produce an embedding feature vector.

My ResNet50's last layer has a size of 10171, one output per class. The embedding vector size equals 256.

I use this ArcFace loss (I commented the size of each variable):

class ArcFaceLoss(nn.Module):
    def __init__(self, num_classes, embedding_size=256, margin=0.5, scale=64):
        super(ArcFaceLoss, self).__init__()
        self.num_classes = num_classes
        self.embedding_size = embedding_size
        self.margin = margin
        self.scale = scale
        self.cos_m = math.cos(margin)
        self.sin_m = math.sin(margin)
        self.threshold = math.cos(math.pi - margin)
        self.mm = self.sin_m * margin
        self.weight = nn.Parameter(torch.FloatTensor(num_classes, embedding_size))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, x, labels):
        # labels.shape = torch.Size([64])
        # F.normalize(x).shape = torch.Size([64, 10171])
        # F.normalize(self.weight).T.shape = torch.Size([256, 10171])
        cosine = F.linear(F.normalize(x), F.normalize(self.weight).T)
        # cosine.shape = torch.Size([64, 256])
        sine = torch.sqrt(1.0 - torch.pow(cosine, 2))
        # sine.shape = torch.Size([64, 256])
        phi = cosine * self.cos_m - sine * self.sin_m
        phi = torch.where(cosine > self.threshold, phi, cosine - self.mm)
        # phi.shape = torch.Size([64, 256])
        one_hot = torch.zeros(cosine.size(), device=x.device)
        # one_hot.shape = torch.Size([64, 256])
        # labels.view(-1, 1).long().shape = torch.Size([64, 1])
        one_hot.scatter_(1, labels.view(-1, 1).long(), 1)
        output = (one_hot * phi) + ((1.0 - one_hot) * cosine)
        output *= self.scale
        return output

The problem is that in the example I linked in the previous post, and judging by the ArcFace loss articles, we create a one-hot encoded vector based on cosine. But the number of classes significantly exceeds the size of the one-hot.

Input In [31], in ArcFaceLoss.forward(self, x, labels)
29 print(one_hot.shape)
30 print(labels.view(-1, 1).long().shape)
---> 31 one_hot.scatter_(1, labels.view(-1, 1).long(), 1)
32 output = (one_hot * phi) + ((1.0 - one_hot) * cosine)
33 output *= self.scale

RuntimeError: index 4458 is out of bounds for dimension 1 with size 256

I guess that this example implementation of ArcFace loss somehow didn't use the transpose operation, so that code doesn't raise an error about the mismatch between the one-hot size and the number of classes.

So my question is: how should I implement this function correctly? Maybe there is an already implemented ArcFace loss in PyTorch that I can't find?

Yes, since the linked example initializes self.weight as:

self.weight = nn.Parameter(torch.FloatTensor(out_features, in_features))

while you are using self.weight = nn.Parameter(torch.FloatTensor(num_classes, embedding_size)), where num_classes corresponds to the in_features and embedding_size to the out_features.

I don't know what the label represents, but since it's used to index the one_hot tensor, which has the size of embedding_size, I would assume label contains some indices of this embedding dimension?
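For reference, a rough sketch of how the shapes usually line up in ArcFace-style heads, using the sizes from this thread (the embeddings tensor stands in for the 256-dim backbone output, not the 10171-dim classifier output):

import torch
import torch.nn.functional as F

batch_size, embedding_size, num_classes = 64, 256, 10171

embeddings = torch.randn(batch_size, embedding_size)    # [64, 256]
weight = torch.randn(num_classes, embedding_size)       # [10171, 256], no transpose needed
labels = torch.randint(0, num_classes, (batch_size,))   # class ids in [0, 10171)

cosine = F.linear(F.normalize(embeddings), F.normalize(weight))  # [64, 10171], one column per class
one_hot = torch.zeros_like(cosine)
one_hot.scatter_(1, labels.view(-1, 1), 1)              # labels index the class dimension, so any id < 10171 is valid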

I understand where I was wrong. I should make the number of classes equal to the embedding size. I guess I just did not fully understand the principle of ArcFace loss. Sorry for wasting your time and thanks for the help!