How to create a Siamese network

I’m trying to send two images through a Siamese network. It looks like it’s as easy as writing a for-loop that calls forward once for each leg of the Siamese net. Is this correct? I’ve written a baby Siamese net below:

import torch
import torch.nn as nn
import torch.nn.functional as F

class BabyNet(torch.nn.Module):
    def __init__(self):
        super(BabyNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(64, 4, kernel_size=(1, 1), stride=(1, 1), bias=False)
        self.pool = nn.AvgPool2d((112,112), stride=(112,112))
        self.softmax = nn.Softmax(dim=1)  # normalize over the 4 channels

    def forward(self, x, leg=0):
        # hook every intermediate tensor so its gradient is cached
        # under '<layer>_<leg>' during the backward pass
        leg = str(leg)
        x = self.conv1(x)
        x.register_hook(save_grad('conv1_' + leg))
        x = self.bn1(x)
        x.register_hook(save_grad('bn1_' + leg))
        x = self.relu(x)
        x.register_hook(save_grad('relu_' + leg))
        x = self.conv2(x)
        x.register_hook(save_grad('conv2_' + leg))
        x = self.pool(x)
        x.register_hook(save_grad('pool_' + leg))
        x = self.softmax(x)
        x.register_hook(save_grad('softmax_' + leg))

        return x

grads = {} # cache the gradients
def save_grad(name):
    def hook(grad):
        grads[name] = grad
    return hook

if __name__=='__main__':
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = BabyNet().to(device)
    lr = 1e-3
    # Loss and Optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), momentum=0.9, lr=lr)
    grad_of_param = {} # latest gradient of each parameter
    for _ in range(100):
        out = []

        # draw two distinct class labels in [0, 4), one per leg
        label_wts = torch.Tensor([1, 1, 1, 1])
        labels = torch.multinomial(label_wts, 2).long().to(device)
        for i_leg in range(2):
            X = torch.zeros(1,3,224,224).uniform_(0,255).to(device)
            Y = model.forward(X, leg=i_leg)
            Y.register_hook(save_grad('out_' + str(i_leg)))
            out.append(Y)
        # stack both legs: (2, 4, 1, 1) -> (2, 4)
        preds = torch.cat(out, dim=0).flatten(1)
        loss = criterion(preds, labels)
        optimizer.zero_grad()
        loss.backward() # fills `grads` through the hooks
        print('Gradient comparison...')
        print('conv1: {}'.format(torch.equal(grads['conv1_0'], grads['conv1_1'])))
        print('bn1: {}'.format(torch.equal(grads['bn1_0'], grads['bn1_1'])))
        print('relu: {}'.format(torch.equal(grads['relu_0'], grads['relu_1'])))
        print('conv2: {}'.format(torch.equal(grads['conv2_0'], grads['conv2_1'])))
        print('pool: {}'.format(torch.equal(grads['pool_0'], grads['pool_1'])))
        print('softmax: {}'.format(torch.equal(grads['softmax_0'], grads['softmax_1'])))
        print(grads['out_0'], grads['out_1'])
        for name, parameter in model.named_parameters():
            grad_of_param[name] = parameter.grad
        optimizer.step()

The code looks good. I assume you would like to calculate the similarity between two images?
Depending on the dataset you are using, you could also load both images at once and pass them to your net together. In the forward method you would then feed them through the shared layers one after the other.
But I think it’s a matter of taste.
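
To illustrate the second option, here is a minimal sketch of a net whose forward takes both images. The class name TwoInputNet, the small trunk, and the 32x32 input size are made up for this example and are not taken from the code above.

```python
import torch
import torch.nn as nn


class TwoInputNet(nn.Module):
    """Hypothetical Siamese wrapper: one set of weights, two inputs."""

    def __init__(self):
        super().__init__()
        # Shared trunk: both images pass through these same layers.
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )

    def forward(self, x1, x2):
        # Feed the two images one after the other through the shared trunk.
        return self.features(x1), self.features(x2)


torch.manual_seed(0)
net = TwoInputNet()
img_a = torch.rand(1, 3, 32, 32)
img_b = torch.rand(1, 3, 32, 32)
emb_a, emb_b = net(img_a, img_b)
print(emb_a.shape, emb_b.shape)  # torch.Size([1, 8]) torch.Size([1, 8])
```

Either way there is only one copy of the weights, so the for-loop version and this version should train the same parameters.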

However, you should call the model directly (model(input)) instead of model.forward(input). Calling the model goes through __call__, which makes sure that any hooks registered on the module are run; calling forward directly skips them.
Just out of curiosity, did the hooks work in your case?
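
A small sketch of the difference, using a throwaway nn.Linear layer and a hypothetical log_call hook (neither is from the code above). Note this concerns hooks registered on a module with register_forward_hook; hooks registered on a tensor with register_hook fire during backward regardless of how forward was called.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)
calls = []  # records the output shape each time the hook runs


def log_call(module, inputs, output):
    # Forward hook: invoked by nn.Module.__call__ after forward() returns.
    calls.append(tuple(output.shape))


handle = layer.register_forward_hook(log_call)
x = torch.rand(3, 4)

layer.forward(x)  # bypasses __call__, so the hook is not run
n_after_forward = len(calls)

layer(x)  # goes through __call__, so the hook runs once
n_after_call = len(calls)

handle.remove()
print(n_after_forward, n_after_call)  # 0 1
```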


Thanks for the confirmation/advice @ptrblck. I’m not calculating a similarity, but am calculating a correlation between intermediate feature maps. And, yes, the hooks appeared to work. I was concerned that gradients in one leg were being overwritten, but that doesn’t seem to be the case.

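Gradients are accumulated into each parameter's .grad on every backward pass as long as you don’t zero them with the optimizer or the model (optimizer.zero_grad(), model.zero_grad()). Because both legs share the same weights, their contributions are summed, not overwritten.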
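
Here is a minimal sketch of that accumulation, assuming a single bias-free nn.Linear layer stands in for the shared weights (the names shared, x0 and x1 are made up for this example). With a sum as the loss, the gradient with respect to the weight equals the input, which makes the accumulated value easy to check.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
shared = nn.Linear(3, 1, bias=False)  # stands in for the shared Siamese weights
x0 = torch.rand(1, 3)  # input to leg 0
x1 = torch.rand(1, 3)  # input to leg 1

# Backward through leg 0 only: .grad holds leg 0's contribution.
shared(x0).sum().backward()
grad_leg0 = shared.weight.grad.clone()

# Backward through leg 1 without zeroing: leg 1's contribution is added.
shared(x1).sum().backward()
grad_both = shared.weight.grad.clone()

print(torch.allclose(grad_leg0, x0))       # True
print(torch.allclose(grad_both, x0 + x1))  # True

# Zeroing resets the accumulator (to None or to zeros, depending on version).
shared.zero_grad()
```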
