I’m trying to grade the similarity of two text inputs. I have approached the problem by creating an LSTM network which takes the text of one sample as input. I then combine the LSTM outputs of the two texts and pass these on to fully connected layers, the last layer having an output size of 1, which is the similarity score. When I train the model, the loss does decrease somewhat, but in validation the model outputs a nearly identical value for every sample. I’ve tried several learning rates ranging from 10^-5 to 10^-1.
I’m using one-hot vectors to encode the data. Each text input is padded to equal length, so the input shape is (batch_size=1, seq_len=text_length, input_size=one_hot_vector_length).
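For concreteness, here is roughly how I build the one-hot inputs. This is a simplified sketch: the character-level vocabulary and the `encode` helper are illustrative, not my exact preprocessing.

```python
import torch
import torch.nn.functional as F

# Hypothetical character-level vocabulary; the real one depends on the data.
vocab = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz ")}
pad_idx = len(vocab)  # extra index reserved for padding

def encode(text, max_len):
    # Map characters to indices, then pad with pad_idx up to max_len.
    ids = [vocab[ch] for ch in text][:max_len]
    ids += [pad_idx] * (max_len - len(ids))
    # One-hot over vocab size + 1 (the +1 covers the padding index).
    return F.one_hot(torch.tensor(ids), num_classes=len(vocab) + 1).float()

x = encode("hello world", max_len=16)
print(x.shape)  # torch.Size([16, 28]) - (seq_len, one_hot_vector_length)
```

Adding a batch dimension with `x.unsqueeze(0)` then gives the (1, seq_len, one_hot_vector_length) shape described above.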
I’m quite convinced the problem is some clear mistake in my model definition / forward function. If there are any insights or mistakes you could point out, that would be great.
Also, overall feedback and guidance on my approach to modelling the problem and designing the model would be greatly appreciated, since I’m very much a beginner with PyTorch and LSTMs.
Below is the code for the model:
```python
class Model(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size,
                 input_size_2, batch_size):
        super().__init__()
        self.input_size = input_size
        self.output_size = output_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.input_size_2 = input_size_2
        self.batch_size = batch_size

        # Layers for analysing one text input
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers)
        self.fc1 = nn.Linear(hidden_size, output_size)

        # Layers for comparing - input is the concatenated outputs of the
        # two text inputs after passing through lstm & fc1
        self.fc2 = nn.Linear(input_size_2, 512)
        self.fc3 = nn.Linear(512, 1)  # 1 output - the similarity

    def forward(self, inputs):
        i1, i2 = inputs  # Two text inputs
        x1, hidden1 = i1
        x2, hidden2 = i2

        # Pass first text input
        x1, (hidden1, cell1) = self.lstm(x1, hidden1)
        x1 = hidden1.view(-1)
        x1 = self.fc1(x1)

        # Pass second text input
        x2, (hidden2, cell2) = self.lstm(x2, hidden2)
        x2 = hidden2.view(-1)
        x2 = self.fc1(x2)

        # Calculate similarity from both texts' outputs
        outs = torch.cat((x1, x2))
        outs = self.fc2(outs)
        outs = self.fc3(outs)
        return outs
```
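In case it helps, here is a minimal shape check I ran on `nn.LSTM` alone (all sizes made up). With the default `batch_first=False`, the hidden state comes back as (num_layers, batch, hidden_size), so `hidden1.view(-1)` only lines up with `fc1`'s `in_features=hidden_size` when num_layers and batch are both 1:

```python
import torch
import torch.nn as nn

seq_len, batch, vocab, hidden = 16, 1, 28, 64
lstm = nn.LSTM(vocab, hidden, num_layers=1)  # batch_first=False by default

x = torch.randn(seq_len, batch, vocab)  # (seq_len, batch, input_size)
h0 = torch.zeros(1, batch, hidden)      # (num_layers, batch, hidden_size)
c0 = torch.zeros(1, batch, hidden)

out, (hn, cn) = lstm(x, (h0, c0))
print(out.shape)           # (16, 1, 64): output at every timestep
print(hn.shape)            # (1, 1, 64): final hidden state per layer
print(hn.view(-1).shape)   # (64,): matches Linear(hidden, ...) only because num_layers * batch == 1
```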
And below is the training loop:
```python
net = Model(input_size_1, hidden_size, n_layers, output_size,
            input_size_2, batch_size)
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
loss_function = nn.L1Loss()

def train(n_epochs):
    tX, ty = prepare_data(train_X, train_y, is_train=True)
    vX, vy = prepare_data(valid_X, valid_y)
    for i in range(n_epochs):
        for count, x in enumerate(tX):
            optimizer.zero_grad()
            y = ty[count]
            out = net(x)
            loss = loss_function(out, y)
            loss.backward()
            optimizer.step()
```
And finally the validation:
```python
with torch.no_grad():
    for count, x in enumerate(vX):
        y = vy[count]
        out = net(x)  # Here, the output is identical for every validation sample
        loss = loss_function(out, y)
```
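To quantify the symptom, I collected the validation outputs and looked at their spread; the standard deviation is essentially zero. A minimal version of that check, with made-up values standing in for what `net(x)` actually returns in my run:

```python
import torch

# Dummy stand-ins for the values net(x) returns across the validation set;
# in my run they are all nearly identical, clustered around one value.
outputs = torch.tensor([0.4701, 0.4699, 0.4700, 0.4702, 0.4700])

print(outputs.std())                   # tiny: the predictions are essentially constant
print(outputs.max() - outputs.min())   # the whole range is a fraction of a percent
```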