Network not learning when using conv2d as output

I’m training a senet50 with a conv2d layer as its output, and I’m computing the loss based on the cosine similarity of pairs of embeddings, conditional on whether they belong to the same class or not. The problem is that my network is not learning at all. I have even tested it with a small batch of 20 images to check whether it was able to overfit it, but it did not. The same loss function works well with other architectures (where the output is a fully connected layer). I was wondering whether I can make it work without replacing the last conv2d layer of my network.

This is how my conv2d is used:

self.last_layer = nn.Conv2d(in_channels=2048, out_channels=256, kernel_size=(1, 1), stride=(1, 1), groups=1, bias=False)
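For context, a 1×1 convolution over a [B, 2048, 1, 1] pooled feature map behaves like a bias-free linear projection on the channel dimension. A minimal standalone shape check (hypothetical sketch, using the parameters from the post):

```python
import torch
import torch.nn as nn

# 1x1 conv used as the embedding head (parameters taken from the post)
last_layer = nn.Conv2d(in_channels=2048, out_channels=256,
                       kernel_size=(1, 1), stride=(1, 1), groups=1, bias=False)

# Pooled senet50 features have spatial size 1x1, so this conv acts
# like a fully connected layer without bias.
features = torch.randn(20, 2048, 1, 1)  # batch of 20, as in the overfit test
embeddings = last_layer(features)
print(embeddings.shape)  # torch.Size([20, 256, 1, 1])
```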

Then my loss function is as follows:

    def my_loss(margin, f_embed, s_embed, label):
        f_embed = f_embed.view(f_embed.shape[0], f_embed.shape[1])
        s_embed = s_embed.view(s_embed.shape[0], s_embed.shape[1])
        sim = CosineSimilarity()(f_embed, s_embed)
        loss = (label * (1 - sim)) + ((1 - label) * torch.clamp(sim - margin, min=0.0))
        return loss

The reason I reshape the embeddings is that otherwise their shape would be [batch_size, 256, 1, 1], which gives me strange results when I compute the loss. After the reshape, the embeddings have shape [batch_size, 256].
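The reshape step can be checked in isolation. A small sketch (hypothetical random inputs) using `torch.flatten`, which is equivalent to the `.view` call above, together with the functional cosine similarity:

```python
import torch
import torch.nn.functional as F

# Stand-ins for the two embedding outputs of shape [B, 256, 1, 1]
f_embed = torch.randn(20, 256, 1, 1)
s_embed = torch.randn(20, 256, 1, 1)

# Drop the trailing 1x1 spatial dims: [B, 256, 1, 1] -> [B, 256]
f_flat = torch.flatten(f_embed, start_dim=1)
s_flat = torch.flatten(s_embed, start_dim=1)

# One similarity value per pair, computed over the channel dimension
sim = F.cosine_similarity(f_flat, s_flat, dim=1)
print(sim.shape)  # torch.Size([20])
```

Without the flatten, cosine similarity would be computed along dim=1 of a 4-D tensor and return a [B, 1, 1] result, which is the likely source of the "weird results".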

As was suggested, I tried doing

f_embed = Variable(f_embed.view(f_embed.shape[0], f_embed.shape[1]), requires_grad=True)
s_embed = Variable(s_embed.view(s_embed.shape[0], s_embed.shape[1]), requires_grad=True)

but my network is still not learning.
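If I understand autograd correctly, re-wrapping a tensor like this is roughly equivalent to detaching it: the new `requires_grad=True` leaf is cut off from the layers that produced it, so their weights receive no gradient. A toy sketch (hypothetical two-tensor setup) showing the effect:

```python
import torch

# Toy weight standing in for the network's conv2d head (hypothetical)
w = torch.randn(4, 4, requires_grad=True)
x = torch.randn(2, 4)
out = x @ w

# Re-wrapping the output as a fresh leaf detaches it from the graph;
# this is effectively what Variable(..., requires_grad=True) amounts to.
detached = out.detach().requires_grad_(True)
loss = detached.sum()
loss.backward()

print(w.grad)         # None -- the original weight receives no gradient
print(detached.grad)  # gradients stop at the new leaf
```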

In fact, the issue was solved by performing a deepcopy of my original model, as shown below.

    import copy

    old_model =
    state_dict = torch.load(model_state_dict_path)

    model = copy.deepcopy(old_model)

Anyway, I still don’t know why these steps were necessary for my model to update its weights and learn.
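For debugging cases like this, one quick check (a generic sketch, not specific to senet50) is to run a single backward pass and verify that every parameter actually receives a non-zero gradient:

```python
import torch
import torch.nn as nn

# Stand-in model (hypothetical); the same check works on any nn.Module
model = nn.Sequential(nn.Conv2d(2048, 256, kernel_size=1, bias=False))

x = torch.randn(2, 2048, 1, 1)
loss = model(x).sum()
loss.backward()

# Parameters whose grad is None (or all zeros) are not learning
for name, p in model.named_parameters():
    print(name, p.grad is not None, float(p.grad.abs().sum()) > 0)
```

A parameter whose `.grad` stays `None` after `backward()` is disconnected from the loss, which is exactly the symptom of the `Variable(...)` re-wrapping above.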