I’m trying to grade the similarity of two text inputs. I have approached the problem by creating an LSTM network that takes the text of one sample as input. I then combine the LSTM outputs of the two texts and pass them to fully connected layers, the last of which has an output size of 1: the similarity. When I train the model, the loss does decrease somewhat, but in validation the model outputs nearly the same value every time. I’ve tried learning rates ranging from 1e-5 to 1e-1.
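A nearly constant output often means the network has fallen back to the best constant predictor. Under the `nn.L1Loss` used in the training loop, that constant is the median of the training targets, so comparing the model's validation loss with this baseline shows whether it has learned anything beyond it. A minimal sketch, assuming the targets are available as 1-D float tensors; `constant_baseline_l1` is a hypothetical helper, not part of the original code:

```python
from typing import Tuple

import torch
import torch.nn as nn


def constant_baseline_l1(train_targets: torch.Tensor,
                         valid_targets: torch.Tensor) -> Tuple[float, float]:
    """Return the best constant prediction under L1 loss and its validation loss."""
    # Under L1 (mean absolute error), the median minimises the loss of a constant predictor
    best_constant = train_targets.median()
    # Predict that same constant for every validation sample
    preds = best_constant.expand_as(valid_targets)
    loss = nn.L1Loss()(preds, valid_targets)
    return best_constant.item(), loss.item()
```

If the model's validation L1 loss is about the same as the baseline's, and its outputs sit close to the returned constant, the network is effectively ignoring its inputs.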
I’m using one-hot vectors to encode the data. Each text input is padded to the same length, so the input shape is (batch_size=1, seq_len=text_length, input_size=one_hot_vector_length).
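Because every text is padded to the same length, the LSTM's final hidden state for a short text is produced only after many padding steps, which can wash out the actual content and make different inputs look alike to the later layers. PyTorch's `pack_padded_sequence` lets the LSTM stop at each sequence's true length. A sketch under stated assumptions: the character vocabulary `VOCAB` and the helpers `one_hot_encode` / `encode_batch` are hypothetical, and the LSTM is built with `batch_first=True` to match the (batch, seq_len, input_size) layout described above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

VOCAB = "abcdefghijklmnopqrstuvwxyz "  # hypothetical character vocabulary
CHAR_TO_IDX = {c: i for i, c in enumerate(VOCAB)}


def one_hot_encode(text: str) -> torch.Tensor:
    """Encode a string as a (len(text), len(VOCAB)) float one-hot tensor."""
    idx = torch.tensor([CHAR_TO_IDX[c] for c in text])
    return F.one_hot(idx, num_classes=len(VOCAB)).float()


def encode_batch(texts, lstm: nn.LSTM) -> torch.Tensor:
    """Run texts through a batch_first LSTM, ignoring padding.

    Returns the last layer's final hidden state, shape (batch, hidden_size).
    """
    seqs = [one_hot_encode(t) for t in texts]
    lengths = torch.tensor([len(s) for s in seqs])
    padded = pad_sequence(seqs, batch_first=True)  # zero-padded: (batch, max_len, vocab)
    packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)
    _, (h_n, _) = lstm(packed)  # h_n comes back in the original batch order
    return h_n[-1]
```

With packing, the hidden state for a short text in a padded batch is the same as running that text through the LSTM on its own.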
I’m quite convinced the problem is some clear mistake in my model definition or forward function. If there are any insights or mistakes you could point out, that would be great.
General feedback and guidance on how I’ve modelled the problem and designed the network are also greatly appreciated, since I’m very much a beginner with PyTorch and LSTMs.
Thanks!
Below is the code for the model:

```python
import torch
import torch.nn as nn


class Model(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size, input_size_2, batch_size):
        super().__init__()
        self.input_size = input_size
        self.output_size = output_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.input_size_2 = input_size_2
        self.batch_size = batch_size
        # Layers for analysing one text input.
        # batch_first=True so the LSTM reads (batch, seq_len, input_size), the shape described above
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc1 = nn.Linear(hidden_size, output_size)
        # Layers for comparing: input is the concatenated fc1 outputs of the two texts,
        # so input_size_2 must equal 2 * output_size
        self.fc2 = nn.Linear(input_size_2, 512)
        self.fc3 = nn.Linear(512, 1)  # 1 output - the similarity

    def forward(self, inputs):
        i1, i2 = inputs  # Two text inputs
        x1, state1 = i1  # state1 is the initial (h0, c0) tuple for the first text
        x2, state2 = i2
        # Pass first text input
        x1, (hidden1, cell1) = self.lstm(x1, state1)
        x1 = hidden1[-1]  # final hidden state of the last layer: (batch, hidden_size)
        x1 = self.fc1(x1)
        # Pass second text input through the same (shared) LSTM and fc1
        x2, (hidden2, cell2) = self.lstm(x2, state2)
        x2 = hidden2[-1]
        x2 = self.fc1(x2)
        # Calculate similarity based on both texts' outputs
        outs = torch.cat((x1, x2), dim=1)  # (batch, 2 * output_size)
        outs = self.fc2(outs)
        outs = self.fc3(outs)
        return outs
```
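For comparison, here is one common way to structure this kind of two-text comparison, written as a hypothetical sketch rather than a fix confirmed to solve the problem above. It keeps the shared LSTM encoder but (a) packs the padded inputs, (b) lets the LSTM start from its default zero state instead of passing states in with the data, (c) puts a ReLU between the Linear layers, since fc2 followed by fc3 with nothing in between is equivalent to a single linear layer, and (d) combines the two encodings with order-independent features. The class name `SiameseLSTM` and the feature choice are assumptions; the raw output is a logit meant for `nn.BCEWithLogitsLoss` if the similarity labels lie in [0, 1] (apply `torch.sigmoid` at inference).

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence


class SiameseLSTM(nn.Module):
    """Hypothetical restructuring: one shared LSTM encoder plus an MLP comparison head."""

    def __init__(self, input_size: int, hidden_size: int, num_layers: int = 1):
        super().__init__()
        # batch_first=True: inputs are (batch, seq_len, input_size)
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        # ReLU between the Linear layers; without it they collapse into one linear map
        self.head = nn.Sequential(
            nn.Linear(3 * hidden_size, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
        )

    def encode(self, x: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
        # Packing makes the LSTM stop at each sequence's true length, not at the padding
        packed = pack_padded_sequence(x, lengths.cpu(), batch_first=True, enforce_sorted=False)
        _, (h_n, _) = self.lstm(packed)  # no initial state passed: defaults to zeros
        return h_n[-1]  # last layer's final hidden state: (batch, hidden_size)

    def forward(self, x1, len1, x2, len2) -> torch.Tensor:
        a = self.encode(x1, len1)
        b = self.encode(x2, len2)
        # Order-independent features, so swapping the two texts gives the same score
        features = torch.cat([a + b, a * b, (a - b).abs()], dim=1)
        return self.head(features)  # raw logit: (batch, 1)
```

Making the features symmetric is a design choice: similarity should not depend on which text comes first, and building that in means the network does not have to learn it from data.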
And below is the training loop:

```python
# prepare_data, train_X, train_y, input_size_1, n_layers, lr, etc. are defined elsewhere
net = Model(input_size_1, hidden_size, n_layers, output_size, input_size_2, batch_size)
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
loss_function = nn.L1Loss()


def train(n_epochs):
    tX, ty = prepare_data(train_X, train_y, is_train=True)
    for epoch in range(n_epochs):
        net.train()
        # Each x is ((x1, state1), (x2, state2)), as unpacked in Model.forward
        for x, y in zip(tX, ty):
            optimizer.zero_grad()
            out = net(x)
            loss = loss_function(out, y)
            loss.backward()
            optimizer.step()
```
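Before sweeping learning rates, a common sanity check is whether the model can overfit a handful of training pairs; if the training loss cannot be driven close to zero on four samples, the problem is in the model or the loop rather than the hyperparameters. The sketch below is self-contained and uses a hypothetical stand-in model (`TinyPairModel`) with random data instead of the post's `prepare_data` output. It batches samples along dim 0 and keeps `y` the same shape as the model output, (batch, 1), so the loss does not silently broadcast. `nn.MSELoss` is swapped in here purely as an example loss.

```python
import torch
import torch.nn as nn


class TinyPairModel(nn.Module):
    """Hypothetical stand-in: shared LSTM encoder plus a small MLP head."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.head = nn.Sequential(nn.Linear(2 * hidden_size, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
        _, (h1, _) = self.lstm(x1)
        _, (h2, _) = self.lstm(x2)
        return self.head(torch.cat([h1[-1], h2[-1]], dim=1))  # (batch, 1)


def overfit_check(steps: int = 300, lr: float = 1e-2):
    """Train on 4 random pairs and return (first loss, last loss)."""
    torch.manual_seed(0)
    x1 = torch.randn(4, 10, 6)  # 4 pairs, seq_len 10, input_size 6 (random stand-in data)
    x2 = torch.randn(4, 10, 6)
    y = torch.rand(4, 1)  # targets shaped like the model output
    model = TinyPairModel(6, 16)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    losses = []
    for _ in range(steps):
        optimizer.zero_grad()
        loss = loss_fn(model(x1, x2), y)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses[0], losses[-1]
```

The same check can be run with the real model by slicing a few samples from `tX`/`ty`.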
And finally the validation:

```python
vX, vy = prepare_data(valid_X, valid_y)
net.eval()
with torch.no_grad():
    for x, y in zip(vX, vy):
        out = net(x)  # Here, output is identical for every validation sample
        loss = loss_function(out, y)
```
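To confirm that the validation outputs really are constant, rather than judging by eye, the predictions can be collected and their spread compared with the spread of the targets. A minimal sketch, assuming `model(x)` returns a single-element tensor for each `x` in the validation set, as `net(x)` does in the loop above; `prediction_spread` is a hypothetical helper:

```python
import torch
import torch.nn as nn


@torch.no_grad()
def prediction_spread(model: nn.Module, inputs):
    """Return all predictions as a 1-D tensor and their standard deviation."""
    was_training = model.training
    model.eval()  # disable dropout / batch-norm training behaviour, if any
    preds = torch.stack([model(x).reshape(()) for x in inputs])
    model.train(was_training)  # restore the previous mode
    return preds, preds.std().item()
```

A standard deviation that is orders of magnitude smaller than that of `vy` indicates the network is ignoring its inputs.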