Hi,
I’m training a senet50 whose output is a conv2d layer, and I’m computing the loss from the cosine similarity of pairs of embeddings, conditioned on whether the pair belongs to the same class or not. The problem is that my network is not learning at all. I even tested it on a small batch of 20 images to check whether it could overfit them, but it did not. The same loss function works well with other architectures (where the output is a fully connected layer). I was wondering whether I can make it work without replacing the last conv2d layer of my network.
This is how my conv2d is defined (I’ve cleaned up the constructor call here; it is a standard `nn.Conv2d`):

```python
self.last_layer = nn.Conv2d(in_channels=2048, out_channels=256,
                            kernel_size=(1, 1), stride=(1, 1),
                            groups=1, bias=False)
```
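For reference, a 1×1 convolution over a 1×1 spatial map acts like a per-sample linear layer, so its output keeps two trailing singleton dimensions. A minimal sketch (layer sizes taken from the line above; the batch size of 20 matches my overfit test):

```python
import torch
import torch.nn as nn

# Same configuration as the last layer of my network.
layer = nn.Conv2d(in_channels=2048, out_channels=256,
                  kernel_size=(1, 1), stride=(1, 1),
                  groups=1, bias=False)

# A fake batch of 20 backbone feature maps of shape [20, 2048, 1, 1].
x = torch.randn(20, 2048, 1, 1)
out = layer(x)
print(out.shape)  # torch.Size([20, 256, 1, 1])
```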
Then my loss function is as follows (note: I call the `CosineSimilarity` module directly rather than via `.forward`, and I’ve fixed a missing closing parenthesis that got lost in the post):

```python
def my_loss(margin, f_embed, s_embed, label):
    f_embed = f_embed.view(f_embed.shape[0], f_embed.shape[1])
    s_embed = s_embed.view(s_embed.shape[0], s_embed.shape[1])
    sim = CosineSimilarity()(f_embed, s_embed)
    loss = (label * (1 - sim)) + ((1 - label) * torch.clamp(sim - margin, min=0.0))
    return loss
The reason I reshape the embeddings is that without it their shape would be [batch_size, 256, 1, 1], which gives me strange results when I compute the loss. With the reshape, my embeddings instead have shape [batch_size, 256].
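For completeness, here is a self-contained toy version of the whole setup: a 1×1-conv embedding head standing in for the senet50 output, the embeddings flattened to [batch_size, 256], and the contrastive cosine loss reduced to a scalar with `.mean()` so that `backward()` can be called on it. The function name, the margin value, and the labels are my own placeholders, not the exact training code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def contrastive_cosine_loss(margin, f_embed, s_embed, label):
    # Flatten [B, C, 1, 1] -> [B, C] so cosine similarity is taken per pair.
    f_embed = f_embed.flatten(start_dim=1)
    s_embed = s_embed.flatten(start_dim=1)
    sim = F.cosine_similarity(f_embed, s_embed, dim=1)  # shape [B]
    per_pair = label * (1 - sim) + (1 - label) * torch.clamp(sim - margin, min=0.0)
    return per_pair.mean()  # scalar, as required by loss.backward()

# Toy usage: a 1x1-conv head on fake backbone features.
head = nn.Conv2d(2048, 256, kernel_size=1, bias=False)
feats_a = torch.randn(4, 2048, 1, 1)
feats_b = torch.randn(4, 2048, 1, 1)
label = torch.tensor([1.0, 0.0, 1.0, 0.0])  # 1 = same class, 0 = different
loss = contrastive_cosine_loss(0.5, head(feats_a), head(feats_b), label)
loss.backward()  # gradients flow back into head.weight
```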