I have implemented a Siamese network for text similarity. Here is something I observed: when I feed the two sequence batches (one batch of left sequences and one of right sequences, as separate autograd variables) to the LSTM separately and then compute similarity on the last hidden state of the output, the model works just fine. But if I feed both sets of sequences as a single input batch and use alternating indexing to separate out the first and second sets (even indices are the left sequences, odd indices are the right), and then compute similarity, the model doesn't converge and the outputs are random. Am I somehow losing the gradients when I slice the hidden-state variable?
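For what it's worth, strided slicing by itself shouldn't break the graph. Here is a minimal sketch (hypothetical shapes and a cosine-similarity head, not your actual model) showing that gradients do flow back through even/odd indexing of an LSTM's output:

```python
import torch
import torch.nn as nn

# Hypothetical setup: an interleaved batch of 2*B sequences where even rows
# are the left sequences and odd rows are the right sequences.
torch.manual_seed(0)
B, T, D, H = 4, 5, 8, 16
lstm = nn.LSTM(D, H, batch_first=True)

batch = torch.randn(2 * B, T, D, requires_grad=True)
out, (h_n, c_n) = lstm(batch)          # out: (2*B, T, H)
last = out[:, -1, :]                   # last hidden state of each sequence

left = last[0::2]                      # even indices -> left sequences
right = last[1::2]                     # odd indices  -> right sequences

# Similarity between the paired hidden states
sim = nn.functional.cosine_similarity(left, right, dim=1)
sim.sum().backward()

# The strided slices are ordinary autograd ops, so the input still
# receives gradients; slicing is not where the gradient is lost.
print(batch.grad is not None)
```

If the slicing itself is fine, the usual culprit is the pairing getting scrambled somewhere upstream (e.g. sorting by length for packing reorders the batch, so even/odd no longer means left/right).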
What version are you running on?
I am also trying to implement a Siamese net, but for face images. Once I get the feature vectors after passing the images through the CNN, do I have to take the sigmoid of the feature vectors in order to bound them? And how do I choose the value of the margin in my triplet loss? Please help if you faced similar challenges! Thanks
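One common pattern (a sketch, not the only valid choice): rather than applying a sigmoid, L2-normalize the embeddings so they lie on the unit sphere; the Euclidean distance between unit vectors is then bounded in [0, 2], which makes the triplet margin easier to reason about. The margin itself is a hyperparameter tuned on validation data; 0.2 is the value the FaceNet paper used with normalized embeddings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
emb_dim = 128  # hypothetical embedding size

# Stand-in CNN features; in practice these come out of your network.
anchor = F.normalize(torch.randn(8, emb_dim), dim=1)    # unit-norm rows
positive = F.normalize(torch.randn(8, emb_dim), dim=1)
negative = F.normalize(torch.randn(8, emb_dim), dim=1)

# margin=0.2 follows FaceNet's choice for normalized embeddings;
# treat it as a starting point and tune on a validation set.
loss_fn = nn.TripletMarginLoss(margin=0.2, p=2)
loss = loss_fn(anchor, positive, negative)
print(loss.item())
```

The normalization can be the last layer of the model itself (`F.normalize(cnn(x), dim=1)`), so the loss always sees bounded vectors.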