Siamese net in libtorch

Hello,
I am using a simple net as the encoder of a Siamese network.

torch::Tensor encConv1 = conv1->forward(x);                 // CINV1_MASK_COUNT x 256x256
torch::Tensor encRelu1 = torch::leaky_relu(encConv1);       // CINV1_MASK_COUNT x 256x256
torch::Tensor encBN1 = torch::batch_norm(encRelu1, bn1W, bnBias1W, bnmean1W, bnvar1W, true, 0.9, 0.001, true);
torch::Tensor encP1 = torch::max_pool2d(encBN1, 2);         // CINV1_MASK_COUNT x 128x128

// second stage
torch::Tensor encConv2 = conv2->forward(encP1);             // CINV2_MASK_COUNT x 128x128
torch::Tensor encRelu2 = torch::leaky_relu(encConv2);       // CINV2_MASK_COUNT x 128x128
torch::Tensor encBN2 = torch::batch_norm(encRelu2, bn2W, bnBias2W, bnmean2W, bnvar2W, true, 0.9, 0.001, true);
torch::Tensor encP2 = torch::max_pool2d(encBN2, 2);         // CINV2_MASK_COUNT x 64x64 (pool encBN2, not encRelu2, so the batch norm is not skipped)

// third stage
torch::Tensor encConv3 = conv3->forward(encP2);             // CINV3_MASK_COUNT x 64x64
torch::Tensor encRelu3 = torch::leaky_relu(encConv3);       // CINV3_MASK_COUNT x 64x64
torch::Tensor encBN3 = torch::batch_norm(encRelu3, bn3W, bnBias3W, bnmean3W, bnvar3W, true, 0.9, 0.001, true);
torch::Tensor encP3 = torch::max_pool2d(encBN3, 2);         // CINV3_MASK_COUNT x 32x32

// fourth stage
torch::Tensor encConv4 = conv4->forward(encP3);             // CINV4_MASK_COUNT x 32x32
torch::Tensor encRelu4 = torch::leaky_relu(encConv4);       // CINV4_MASK_COUNT x 32x32
torch::Tensor encBN4 = torch::batch_norm(encRelu4, bn4W, bnBias4W, bnmean4W, bnvar4W, true, 0.9, 0.001, true);
torch::Tensor encP4 = torch::max_pool2d(encBN4, 2);         // CINV4_MASK_COUNT x 16x16

// fifth stage: latent space at 16x16, no further pooling
torch::Tensor encConv5 = conv5->forward(encP4);             // CINV4_MASK_COUNT x 16x16
torch::Tensor encRelu5 = torch::leaky_relu(encConv5);       // CINV4_MASK_COUNT x 16x16
torch::Tensor encBN5 = torch::batch_norm(encRelu5, bn5W, bnBias5W, bnmean5W, bnvar5W, true, 0.9, 0.001, true);

// flatten and project through the fully connected head
encBN5 = encBN5.view({ -1, 16 * 16 * CINV3_MASK_COUNT });   // NOTE: the flattened size must match conv5's output channel count
encBN5 = fc1->forward(encBN5);
encBN5 = fc2->forward(encBN5);

The 256x256 input images are thus reduced to 16x16 feature maps and passed to two fully connected layers with 512 and 64 neurons.
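For completeness, here is a minimal sketch of how these layers might be declared in a libtorch module. Everything beyond the names used in the code above is an assumption: 3x3 kernels with padding 1 (so only the pooling changes the spatial size), a single input channel, and a hypothetical CINV5_MASK_COUNT for the fifth stage's channel count:

// Hypothetical module skeleton matching the forward pass above.
// Assumed: 3x3 convolutions with padding 1, one input channel;
// CINV5_MASK_COUNT is a made-up name for conv5's output channels.
struct SiameseNetImpl : torch::nn::Module
{
    SiameseNetImpl()
        : conv1(torch::nn::Conv2dOptions(1, CINV1_MASK_COUNT, 3).padding(1)),
          conv2(torch::nn::Conv2dOptions(CINV1_MASK_COUNT, CINV2_MASK_COUNT, 3).padding(1)),
          conv3(torch::nn::Conv2dOptions(CINV2_MASK_COUNT, CINV3_MASK_COUNT, 3).padding(1)),
          conv4(torch::nn::Conv2dOptions(CINV3_MASK_COUNT, CINV4_MASK_COUNT, 3).padding(1)),
          conv5(torch::nn::Conv2dOptions(CINV4_MASK_COUNT, CINV5_MASK_COUNT, 3).padding(1)),
          fc1(16 * 16 * CINV5_MASK_COUNT, 512),
          fc2(512, 64)
    {
        register_module("conv1", conv1);
        register_module("conv2", conv2);
        register_module("conv3", conv3);
        register_module("conv4", conv4);
        register_module("conv5", conv5);
        register_module("fc1", fc1);
        register_module("fc2", fc2);
    }

    torch::nn::Conv2d conv1, conv2, conv3, conv4, conv5;
    torch::nn::Linear fc1, fc2;
};
TORCH_MODULE(SiameseNet);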

Training uses batches of size 32: the first batch repeats image 1 as the anchor 32 times and pairs it with images 2, 3, 4, and so on; after wrapping around, image 2 is repeated 32 times and paired with images 3, 4, 5, and so forth. Every image serves as an anchor, and only positive pairs are used. A sketch of this pairing is shown below.
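A rough sketch of that pairing loop, assuming the images are already available as a std::vector of tensors (loadImages() is a hypothetical helper standing in for the real loader):

// Hypothetical sketch of the anchor/positive batch assembly described above.
const int64_t kBatchSize = 32;
std::vector<torch::Tensor> images = loadImages();              // assumed helper
for (size_t a = 0; a < images.size(); ++a)
{
    std::vector<torch::Tensor> anchors, positives;
    for (int64_t k = 1; k <= kBatchSize; ++k)
    {
        anchors.push_back(images[a]);                          // anchor repeated 32 times
        positives.push_back(images[(a + k) % images.size()]);  // following images, wrapping around
    }
    torch::Tensor input  = torch::stack(anchors);              // 32 x C x 256 x 256
    torch::Tensor target = torch::stack(positives);
    // ... forward both through the shared net and compute the loss ...
}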
A contrastive loss then compares the outputs of the two branches:

optimizer.zero_grad();   // clear gradients from the previous step

auto output1 = net->forward(input);
auto output2 = net->forward(target);

// Euclidean distance between the two embeddings, per sample
torch::Tensor diff = output1 - output2;
torch::Tensor dist_sq = torch::sum(torch::pow(diff, 2), 1);
torch::Tensor loss = torch::sqrt(dist_sq);

try
{
    loss.mean().backward();
}
catch (const c10::Error& e)
{
    std::cout << e.msg() << std::endl;
}

optimizer.step();    // does the update

Because backward() can only be called on a scalar, mean() is applied to the per-sample distance vector first.
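For comparison, the standard margin-based contrastive loss also accounts for negative pairs; with positives only (label = 1 everywhere) it reduces to the squared distance minimized above. A sketch, where label is 1 for positive and 0 for negative pairs, and the margin value is an assumed hyperparameter:

// Margin-based contrastive loss (Hadsell et al.), sketched for libtorch.
// label: 1 for positive pairs, 0 for negative pairs.
torch::Tensor contrastive_loss(const torch::Tensor& out1,
                               const torch::Tensor& out2,
                               const torch::Tensor& label,
                               double margin = 1.0)
{
    torch::Tensor dist = torch::pairwise_distance(out1, out2);  // Euclidean, per sample
    torch::Tensor pos  = label * dist.pow(2);                   // pull positives together
    torch::Tensor neg  = (1 - label) * torch::clamp_min(margin - dist, 0).pow(2); // push negatives apart
    return (pos + neg).mean();                                  // scalar, ready for backward()
}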

Did I understand Siamese networks correctly? This is only a first attempt; I eventually want to use SimSiam, which is designed to train with positives only.
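For reference, the core of SimSiam is a negative cosine similarity with a stop-gradient on one branch; a rough sketch, where p is the output of SimSiam's additional predictor MLP (which this encoder does not have yet):

// SimSiam-style loss (Chen & He, 2021), sketched: negative cosine
// similarity with stop-gradient, which is what prevents collapse
// when training with positives only.
torch::Tensor simsiam_loss(const torch::Tensor& p,   // predictor output, one branch
                           const torch::Tensor& z)   // encoder output, other branch
{
    torch::Tensor z_det = z.detach();                // stop-gradient on the target branch
    return -torch::cosine_similarity(p, z_det).mean();
}

// symmetrized form: 0.5 * simsiam_loss(p1, z2) + 0.5 * simsiam_loss(p2, z1)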

Many thanks for your help.