I'm working on a language modeling problem with PyTorch, and I'm stuck on the following points.
- I'm building a custom RNN, given below, by converting what I found in the literature into code. But the official PyTorch RNN tutorials don't use `tanh`, and they don't use the hidden state as the input for computing `y` (the output). Why such an implementation?
import torch
import torch.nn as nn


class DetailedRNN(nn.Module):
    """
    PyTorch gives us the freedom to define custom models, so we build our
    own RNN cell here rather than using PyTorch's built-in RNN cell.
    """

    def __init__(self, input_size, hidden_size, output_size):
        super(DetailedRNN, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.tanh = nn.Tanh()
        # Maps the concatenated [input, previous hidden] to the new hidden state
        self.i2h = nn.Linear(self.input_size + self.hidden_size, self.hidden_size)
        # Maps the new hidden state to the output scores
        self.i2o = nn.Linear(self.hidden_size, self.output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden_layer = self.i2h(combined)
        hidden_layer = self.tanh(hidden_layer)
        output = self.i2o(hidden_layer)
        output = self.softmax(output)
        return output, hidden_layer

    def init_hidden(self):
        """
        Initialize the hidden state with zeros
        """
        return torch.zeros(1, self.hidden_size)
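For comparison, here is a minimal sketch of what PyTorch's built-in `nn.RNNCell` computes in one step. It applies `tanh` by default, and the update `tanh(W_ih x + b_ih + W_hh h + b_hh)` is the same idea as passing the concatenated `[input, hidden]` through a single linear layer followed by `tanh`, as `DetailedRNN` does. The sizes below are illustrative assumptions, not values from my model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size = 5, 3  # illustrative sizes (assumption)

cell = nn.RNNCell(input_size, hidden_size)  # nonlinearity defaults to 'tanh'
x = torch.randn(1, input_size)
h = torch.zeros(1, hidden_size)

# One step with the built-in cell
h_builtin = cell(x, h)

# The same step written out by hand
h_manual = torch.tanh(
    x @ cell.weight_ih.T + cell.bias_ih + h @ cell.weight_hh.T + cell.bias_hh
)

same = torch.allclose(h_builtin, h_manual, atol=1e-6)
print(same)
```

Note that `nn.RNNCell` returns only the hidden state; a separate layer on top of it (like `i2o` above) is what produces the output scores.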
- I'm training a character-level language model, but I can't see any significant reduction in the loss as the iterations go by. The loss jumps up and down and doesn't look like a typical loss curve.
for epoch in range(num_epochs):
    random_lines = randomChunkGen(lines)
    num_steps = len(random_lines) // seq_length
    random_lines = random_lines[:num_steps * seq_length + 1]
    for i in range(0, num_steps * seq_length, seq_length):
        # Get sequence length inputs and targets
        input_line = random_lines[i:i + seq_length]
        inputs = lineToInputTensor(input_line, vocab_size).to(device)
        targets_line = random_lines[(i + 1):(i + 1) + seq_length]
        targets = lineToTargetTensor(targets_line).to(device)
        loss = 0
        hidden_state = model.init_hidden().to(device)
        optimizer.zero_grad()
        for idx in range(len(input_line)):
            # Forward pass
            outputs, hidden_state = model(inputs[idx], hidden_state)
            loss += criterion(outputs, targets[idx])
        # Backward and optimize
        loss_list.append(float(loss))
        loss.backward(retain_graph=True)
        optimizer.step()
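The loop above does not show how `criterion` is defined. Assuming it is `nn.CrossEntropyLoss` (which I ask about further down), the sketch below shows what that call expects: float scores of shape `(batch, classes)` and a target holding the integer class index, which is used to pick one entry of the log-probabilities. It also checks that feeding it outputs that already went through `LogSoftmax`, as `DetailedRNN` produces, gives the same value as `nn.NLLLoss`, because log-softmax applied twice changes nothing. The tensors and `vocab_size` here are made-up illustrations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size = 4                        # illustrative (assumption)
logits = torch.randn(1, vocab_size)   # raw scores, shape (batch, classes)
target = torch.tensor([2])            # class index, dtype torch.int64 (long)

ce = nn.CrossEntropyLoss()
nll = nn.NLLLoss()

# CrossEntropyLoss on raw scores
loss_from_logits = ce(logits, target)

# NLLLoss on log-probabilities gives the same value
log_probs = F.log_softmax(logits, dim=1)
loss_from_log_probs = nll(log_probs, target)

# log_softmax of log-probabilities returns them unchanged,
# so CrossEntropyLoss on log-probabilities also matches
loss_ce_on_log_probs = ce(log_probs, target)

# The target is an index: the loss is minus the selected log-probability
picked = -log_probs[0, target.item()]
```

Since the target is used as an index into the class dimension, it has to be an integer type. As far as I know, recent PyTorch versions also accept a float target of the same shape as the input, interpreted as class probabilities.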
- Why does `CrossEntropyLoss` insist on a `LongTensor` target?
- I'm using this approach for training: I take a random chunk from the corpus and, for each epoch, traverse that chunk with a step size equal to the sequence length. The loss is the average loss over an entire sequence. I've learned that an epoch is meant to be one pass through the entire dataset, but I switched to this approach because I've seen official implementations do it this way. What is right or wrong here?
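To make the contrast concrete, here is a minimal sketch of the other reading of "epoch": walking the whole corpus once, in order, in non-overlapping windows of `seq_length`, with targets shifted by one character. The helper name `sequential_chunks` and the stand-in corpus are hypothetical, introduced only for illustration; this is not taken from any official implementation.

```python
def sequential_chunks(text, seq_length):
    """Yield (input, target) string pairs that cover `text` once, in order."""
    # Stop early enough that the one-step-shifted target is still full length.
    for start in range(0, len(text) - seq_length, seq_length):
        inputs = text[start:start + seq_length]
        targets = text[start + 1:start + 1 + seq_length]
        yield inputs, targets


corpus = "hello world, hello pytorch"  # stand-in corpus (assumption)
pairs = list(sequential_chunks(corpus, seq_length=5))

for inputs, targets in pairs:
    print(repr(inputs), "->", repr(targets))
```

With this scheme, one epoch sees every character of the corpus as an input at most once, whereas the random-chunk scheme sees a different subset on each epoch.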
I'd really appreciate help from the experts here.