Image Retrieval

Hello everyone,

I was wondering how to approach the problem of image retrieval. There are many loss functions that relate positive and negative examples to an anchor example in the training set, and while it is easy to set up triplets of samples (anchor, positive, negative), I do not understand which loss function would be appropriate for retrieving images similar to the original.

As a side note, it would be best if the same model could be used on all 3 images without tripling the memory requirements, if that is possible.

Any ideas?

I am working on a very similar problem currently. You can use the triplet margin loss: http://pytorch.org/docs/nn.html#tripletmarginloss. It works well out of the box for me with thousands of classes and a small number of examples per class. One network computes the feature vectors for the anchor, positive, and negative examples; you then split the output into 3 matrices and pass them to the loss function.
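A minimal sketch of the triplet margin loss in use (the batch size of 8 and embedding dimension of 128 are arbitrary placeholders, not values from this thread):

```python
import torch
import torch.nn as nn

# Triplet margin loss: mean of max(0, d(a, p) - d(a, n) + margin)
loss_fn = nn.TripletMarginLoss(margin=1.0, p=2)

# Stand-in embeddings; in practice these come from your network.
anchor = torch.randn(8, 128)
positive = torch.randn(8, 128)
negative = torch.randn(8, 128)

loss = loss_fn(anchor, positive, negative)
print(loss.item())
```

The loss pulls each positive embedding toward its anchor and pushes the negative away until it is at least `margin` farther than the positive.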


Thank you for your answer!

The loss is defined by the outputs only, so it makes sense that during training only the current model needs to sit in GPU memory. However, running the data through the model naively overflows the memory, as if three copies of the network were created. How would one tell PyTorch to run only one example (anchor, positive, or negative) through at a time, and then produce the loss while keeping a single copy of the model in memory?

data, target = data.cuda(), target.cuda()
data_pos, target_pos = data_pos.cuda(), target_pos.cuda()
data_neg, target_neg = data_neg.cuda(), target_neg.cuda()

data, target = Variable(data), Variable(target)
data_pos, target_pos = Variable(data_pos), Variable(target_pos)
data_neg, target_neg = Variable(data_neg), Variable(target_neg)

optimizer.zero_grad()
output = model(data)
output_pos = model(data_pos)
output_neg = model(data_neg)

The answer is: create a single batch from your input data and split the output after running it through the model.
Example (assuming that anchor, positive, negative are tensors in CHW format):

anchor, positive, negative = get_data()
x = Variable(torch.stack([anchor, positive, negative],0))
output = model(x)
# Slicing keeps the leading batch dimension, same as indexing with a LongTensor
out_anchor = output[0:1]
out_positive = output[1:2]
out_negative = output[2:3]
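Putting the two replies together, here is a self-contained sketch of one training step. The tiny linear "backbone" and the 3x32x32 image size are placeholders so the example runs; substitute your own network (this uses the current PyTorch API, where plain tensors replace `Variable`):

```python
import torch
import torch.nn as nn

# Stand-in embedding network; replace with your real model.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))
loss_fn = nn.TripletMarginLoss(margin=1.0)

# Three CHW images (sizes illustrative).
anchor = torch.randn(3, 32, 32)
positive = torch.randn(3, 32, 32)
negative = torch.randn(3, 32, 32)

# One forward pass over a batch of 3, one copy of the model in memory.
x = torch.stack([anchor, positive, negative], 0)  # shape (3, C, H, W)
out_a, out_p, out_n = model(x).chunk(3, dim=0)    # each of shape (1, 64)

loss = loss_fn(out_a, out_p, out_n)
loss.backward()
```

`torch.chunk` splits the batch back into the three rows in one call, which does the same job as the three slice lines above.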

@DimTrigkakis, I think another way is to make your network accept the anchor, positive, and negative images at once. Inside the network, define a forward_once() method that computes the image feature; in your forward() method, call forward_once() 3 times with the anchor, positive, and negative images respectively and return their features. You can also find more info here.
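The forward_once() pattern above might look like this sketch (the linear backbone and 3x32x32 input size are placeholders; only one set of weights exists, so memory holds one model plus the activations for the three passes):

```python
import torch
import torch.nn as nn

class TripletNet(nn.Module):
    """One shared network applied to anchor, positive, and negative."""

    def __init__(self, embedding_dim=64):
        super().__init__()
        # Placeholder backbone; swap in your real feature extractor.
        self.backbone = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * 32 * 32, embedding_dim)
        )

    def forward_once(self, x):
        return self.backbone(x)

    def forward(self, anchor, positive, negative):
        # Same parameters reused three times; gradients accumulate
        # into the single shared set of weights.
        return (self.forward_once(anchor),
                self.forward_once(positive),
                self.forward_once(negative))

net = TripletNet()
a = torch.randn(4, 3, 32, 32)
out_a, out_p, out_n = net(a, a.clone(), a.clone())
```

Because the weights are shared, feeding the same image as anchor and positive yields identical embeddings.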
