I am trying to implement recurrence over an arbitrary layer:
Here is the forward function:
def forward(self, sequence, hidden):
    i = 0
    for x in sequence:
        if i == 0:
            hidden[i] = x  # initially the hidden state is the input
            print("Hidden", hidden[i].data)
        else:
            hidden[i] = hidden[i - 1].clone()  # otherwise it is the previous time step's output
        for layer in self.highway_layers:  # this is the recurrence over layers
            hidden[i] = layer(hidden[i].clone())
            print("Hidden", hidden[i].data)
        i = i + 1
    return hidden, hidden  # for an RNN, hidden and output are the same if you don't apply a softmax
The hidden values soon explode (they are roughly doubling?) and finally I get a math overflow error in the loss function.
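A rough illustration of how fast that kind of doubling blows up (the step count of 50 is just an assumption for illustration, not from my actual run):

```python
# If each pass through the layer stack roughly doubles the hidden
# values, growth is exponential in the number of time steps.
value = 1.0
for step in range(50):  # 50 time steps, assumed for illustration
    value *= 2.0
print(value)  # 2**50, about 1.1e15 -- easily enough to overflow a loss term
```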
Thanks for the reply. I am still stuck on the issue, and the loss is still overflowing.
self.highway_layers is a list of highway layers which are implemented as:
class HighwayLayer(nn.Module):
    def __init__(self, input_size, bias=-1):
        super(HighwayLayer, self).__init__()
        self.plain_layer = nn.Linear(input_size, input_size)
        self.transform_layer = nn.Linear(input_size, input_size)
        self.transform_layer.bias.data.fill_(bias)

    def forward(self, x):  # Has to get a hidden state? No.
        plain_layer_output = nn.functional.relu(self.plain_layer(x))  # Wanted variable got tensor
        transform_layer_output = nn.functional.softmax(self.transform_layer(x))
        transform_value = torch.mul(plain_layer_output, transform_layer_output)
        carry_value = torch.mul((1 - transform_layer_output), x)
        return torch.add(carry_value, carry_value)  # This returns the same size as the input.
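For reference, the highway formulation I was trying to follow (Srivastava et al.) combines the two paths as y = T(x) * H(x) + (1 - T(x)) * x, with T a per-unit sigmoid gate. A minimal scalar sketch in plain Python (not my actual layer code; the function and argument names here are just illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def highway_step(x, h, t_logit):
    """One highway combination for a single unit.

    x       -- the layer input
    h       -- H(x), the plain transform output, e.g. relu(W x + b)
    t_logit -- the pre-activation of the transform gate
    """
    t = sigmoid(t_logit)           # gate in (0, 1), per unit
    transform_value = t * h        # gated transform path
    carry_value = (1.0 - t) * x    # gated carry (identity) path
    return transform_value + carry_value

# With the gate almost fully closed (large negative logit, as the
# negative bias init encourages early in training), the layer passes
# the input through nearly unchanged:
print(highway_step(3.0, 10.0, -20.0))  # ~3.0
```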
Please take a look; surely I am doing something wrong. I created the layers, checked that the dimensions match, and added a hidden variable for the recurrence, so it looks fine to me… but it is not working. Thanks.
The models I work with on CPU are small recurrent models with < 100,000 parameters, and ~500,000 data samples of ~50 features. I get epochs ranging from 50s to 10 mins depending on the size of my model. That said, I have an old i5-2410M, so nothing will run particularly fast.
Thanks for the reply. My model recurs over a layer instead of an RNN/LSTM cell, so that explains the unusual number of parameters, and even my batches are taking 15 s!
Also, I see my training loss fluctuating… it gets stuck at some value and fluctuates around it after about 200 batches…
Well, I haven't worked on language modeling before, so I can't say… One more thing: I read that perplexity = 2 ^ cross_entropy_loss, but my loss is around 7 while the perplexity shows ~700 (according to the code used in the PyTorch word-level language model example). They are using e instead of 2 as the base… I guess this is the standard.
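That makes sense once you notice PyTorch's cross-entropy uses the natural log, so the example computes perplexity as exp(loss), not 2 ** loss. A quick check (the loss value here is just illustrative):

```python
import math

loss = 6.55             # illustrative cross-entropy value (natural-log base)
ppl_e = math.exp(loss)  # what the word-level LM example computes
ppl_2 = 2 ** loss       # what a base-2 cross-entropy would give
print(round(ppl_e), round(ppl_2))  # ~700 vs ~94
```

So a loss of about 6.5 nats lining up with a perplexity near 700 is exactly what the base-e convention predicts.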