Hey there,
I am still fairly inexperienced with PyTorch, and this is the first time I am using a sequence model, i.e. an LSTM.
I am currently trying to train a model on a multi-label language task with an imbalanced class distribution. In the model below I removed some of the feed-forward layers to reduce the number of factors in the gradient chain.
Since the outputs at inference time are extremely weird (every prediction is class 1 of 32 and nothing else), I started to check the individual layers, especially the LSTM layer, for inconsistencies.
First, let me share my model architecture.
import torch
import torch.nn as nn


class Bi_RNN(nn.Module):
    """
    Embedding Dim 300
    """
    def __init__(self, hidden_dim_lstm, in_2_dim, in_3_dim, in_4_dim, input_dim=300, output_dim=32, num_layers=1, batch_size=1):
        super(Bi_RNN, self).__init__()
        self.input_dim = input_dim
        # Size of the flattened final state: 2 directions * num_layers * hidden size
        self.hidden_dim = hidden_dim_lstm*2*num_layers
        self.hidden_dim_lstm = hidden_dim_lstm
        self.batch_size = batch_size
        self.num_layers = num_layers
        self.in_2_dim = in_2_dim
        self.in_3_dim = in_3_dim
        self.in_4_dim = in_4_dim
        self.act = nn.PReLU()
        # Define the LSTM layer
        self.lstm = nn.LSTM(self.input_dim, self.hidden_dim_lstm, self.num_layers, batch_first=True, bidirectional=True)
        # Define the FFN
        self.linear_layer_1 = nn.Linear(self.hidden_dim, self.in_4_dim)
        self.linear_layer_last = nn.Linear(self.in_4_dim, output_dim)

    def init_hidden(self):
        # This is what we'll initialise our hidden state as
        device = next(self.parameters()).device.type
        return (torch.zeros(self.num_layers*2, self.batch_size, self.hidden_dim//2).to(device),
                torch.zeros(self.num_layers*2, self.batch_size, self.hidden_dim//2).to(device))

    def forward(self, input):
        lstm_out, self.hidden = self.lstm(input, self.init_hidden())
        # h_n and c_n have shape (num_layers*2, batch_size, hidden_dim_lstm)
        h_n, c_n = self.hidden
        # Flatten the final cell state to (batch_size, -1) for the FFN
        c_n_merged = c_n.reshape(self.batch_size, -1)
        layer_1_out = self.act(self.linear_layer_1(c_n_merged))
        out = self.linear_layer_last(layer_1_out)
        out = torch.sigmoid(out)
        return out
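One thing that may be worth checking here is the layout of c_n before it is flattened. The following is a minimal sketch with made-up toy sizes (3 samples, 5 time steps, hidden size 2; none of these come from my real setup). It relies on the documented behaviour that nn.LSTM returns h_n and c_n with shape (num_layers * num_directions, batch, hidden) even when batch_first=True, and it compares the plain reshape used in forward() with a variant that moves the batch axis to the front first. The permute variant is only an assumption about what might be intended, not a confirmed fix.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy sizes, chosen only for illustration
batch_size, seq_len, input_dim, hidden = 3, 5, 4, 2
lstm = nn.LSTM(input_dim, hidden, num_layers=1, batch_first=True, bidirectional=True)

x = torch.randn(batch_size, seq_len, input_dim)
_, (h_n, c_n) = lstm(x)

# batch_first only affects the input/output tensors, not h_n / c_n
print(c_n.shape)  # (directions, batch, hidden)

# Variant used in the model above: plain reshape in memory order
merged_reshape = c_n.reshape(batch_size, -1)

# Alternative: bring the batch axis to the front, then flatten per sample
merged_permute = c_n.permute(1, 0, 2).reshape(batch_size, -1)

# Row 0 of the plain reshape holds the forward states of samples 0 and 1
mixes_samples = torch.equal(merged_reshape[0], torch.cat([c_n[0, 0], c_n[0, 1]]))
# Row 0 of the permuted variant holds forward + backward state of sample 0
per_sample = torch.equal(merged_permute[0], torch.cat([c_n[0, 0], c_n[1, 0]]))
print(mixes_samples, per_sample)
```

If both checks print True, the plain reshape is combining states from different samples in one row, which would be worth ruling out before looking at the LSTM weights themselves.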
This is the model state after training. Consider the following inputs x, each with shape torch.Size([7484, 300]) (they are actually part of a batch with shape torch.Size([64, 7484, 300])).
x_1 looks like this
tensor([[-0.1113, 0.1436, 0.1895, ..., 0.0342, 0.1602, -0.2500],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[-0.2910, 0.1787, 0.0500, ..., -0.0228, 0.1177, 0.3535],
...
x_2 looks like this
tensor([[ 0.1250, 0.0266, -0.0272, ..., -0.0864, -0.1621, -0.0337],
[ 0.0070, -0.0732, 0.1719, ..., 0.0112, 0.1641, 0.1069],
[ 0.0762, 0.0820, -0.1118, ..., -0.0942, -0.0684, 0.2266],
...
When I take the LSTM out of the model and pass these vectors (as a batch) into it, the final states c_n and h_n, each with shape torch.Size([800]) per sample, are identical across samples (this holds for most samples in the batch).
The c_n look like this:
tensor([[-0.1549, 0.0412, -0.0041, ..., -0.1105, -0.0761, 0.0696],
[-0.1549, 0.0412, -0.0041, ..., -0.1105, -0.0761, 0.0696]],
and the h_n look like this:
tensor([[-0.0746, 0.0206, -0.0020, ..., -0.0547, -0.0372, 0.0344],
[-0.0746, 0.0206, -0.0020, ..., -0.0547, -0.0372, 0.0344]],
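To make "identical" measurable, and because x_1 above contains all-zero rows, a small diagnostic along these lines might help. It is only a sketch: count_zero_steps and rows_matching_first are hypothetical helper names, the toy tensors are invented, and whether the zero rows are padding is an assumption I have not verified.

```python
import torch

def count_zero_steps(batch):
    """Number of all-zero time steps per sample in a (batch, seq_len, dim) tensor."""
    return (batch.abs().sum(dim=-1) == 0).sum(dim=1)

def rows_matching_first(states, atol=1e-6):
    """Boolean mask of rows in a (batch, features) tensor that match row 0."""
    return torch.isclose(states, states[0], atol=atol).all(dim=1)

# Toy input: sample 0 has 4 zero steps, sample 1 has 2, sample 2 has none
toy_batch = torch.randn(3, 6, 4)
toy_batch[0, 2:] = 0.0
toy_batch[1, 4:] = 0.0
print(count_zero_steps(toy_batch))

# Toy states: rows 0 and 1 are identical, row 2 differs
toy_states = torch.tensor([[0.1, 0.2], [0.1, 0.2], [0.3, 0.2]])
print(rows_matching_first(toy_states))
```

Applied to the real batch and to the flattened c_n, this would show how many time steps per sample are all zeros and exactly which samples end up with the same final state.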
I don’t understand how this is happening and I would be very grateful if somebody can point out what my misconception is.
I am sorry in advance if there are any rather stupid mistakes.
Thanks