I’m working on a language modeling problem with PyTorch, and I’m stuck on the following points.
- I’m building the custom RNN below, which is my translation into code of what I found in the literature. However, the official PyTorch RNN tutorials don’t use the hidden state as the input for computing y (the output). Why are they implemented that way?
```python
import torch
import torch.nn as nn


class DetailedRNN(nn.Module):
    """
    PyTorch gives us the freedom to define the custom models we need, so we
    create our own RNN cell rather than using PyTorch's built-in RNN cell.
    """

    def __init__(self, input_size, hidden_size, output_size):
        super(DetailedRNN, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.tanh = nn.Tanh()
        # h_t = tanh(W_ih [x_t; h_{t-1}] + b_h)
        self.i2h = nn.Linear(self.input_size + self.hidden_size, self.hidden_size)
        # y_t = W_ho h_t + b_o: the output is computed from the new hidden state
        self.i2o = nn.Linear(self.hidden_size, self.output_size)
        # Note: nn.CrossEntropyLoss applies log_softmax internally; a LogSoftmax
        # output is normally paired with nn.NLLLoss instead
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden_layer = self.i2h(combined)
        hidden_layer = self.tanh(hidden_layer)
        output = self.i2o(hidden_layer)
        output = self.softmax(output)
        return output, hidden_layer

    def init_hidden(self):
        """Initialize the hidden state (a vanilla RNN has no cell state)."""
        return torch.zeros(1, self.hidden_size)
```
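To make the difference between the two wirings concrete, here is a minimal sketch with two hypothetical classes. `ElmanStyleCell` computes the output from the new hidden state, h_t = tanh(W_ih [x_t; h_{t-1}] + b_h) and y_t = W_ho h_t + b_o, which is what `DetailedRNN` does. `CombinedOutputCell` feeds the same concatenation [x_t; h_{t-1}] to both `i2h` and `i2o`, which is how I read the tutorial-style code; the `tanh` on its hidden state is my assumption and may not match any particular tutorial. Both cells see x_t and h_{t-1}; the difference is whether the output passes through the hidden nonlinearity first.

```python
import torch
import torch.nn as nn


class ElmanStyleCell(nn.Module):
    """Output computed from the NEW hidden state h_t (the DetailedRNN wiring)."""

    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.h2o = nn.Linear(hidden_size, output_size)

    def forward(self, x, h_prev):
        h = torch.tanh(self.i2h(torch.cat((x, h_prev), dim=1)))
        return self.h2o(h), h  # logits see x_t and h_{t-1} only through h_t


class CombinedOutputCell(nn.Module):
    """Output computed directly from [x_t; h_{t-1}], in parallel with h_t (hypothetical)."""

    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)

    def forward(self, x, h_prev):
        combined = torch.cat((x, h_prev), dim=1)
        h = torch.tanh(self.i2h(combined))  # tanh here is an assumption
        return self.i2o(combined), h  # logits are a linear function of [x_t; h_{t-1}]


# Usage: a batch of 2 inputs; all sizes are arbitrary
x = torch.randn(2, 5)
h0 = torch.zeros(2, 8)
for cell in (ElmanStyleCell(5, 8, 3), CombinedOutputCell(5, 8, 3)):
    logits, h1 = cell(x, h0)
    print(type(cell).__name__, tuple(logits.shape), tuple(h1.shape))
```

Both are valid recurrent cells; the Elman-style version simply adds one nonlinear layer between the inputs and the output.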
- I’m training a character-level language model, but the loss does not decrease noticeably as the iterations go by. It jumps up and down instead of following a typical loss curve.
```python
for epoch in range(num_epochs):
    random_lines = randomChunkGen(lines)
    # Subtract 1 so the last target window (shifted by one character)
    # still has seq_length characters
    num_steps = (len(random_lines) - 1) // seq_length
    random_lines = random_lines[:num_steps * seq_length + 1]
    for i in range(0, num_steps * seq_length, seq_length):
        # Get sequence-length inputs and targets (targets are inputs shifted by one)
        input_line = random_lines[i:i + seq_length]
        inputs = lineToInputTensor(input_line, vocab_size).to(device)
        targets_line = random_lines[(i + 1):(i + 1) + seq_length]
        targets = lineToTargetTensor(targets_line).to(device)

        loss = 0
        hidden_state = model.init_hidden().to(device)  # fresh hidden state per sequence
        optimizer.zero_grad()
        for idx in range(len(input_line)):
            # Forward pass, one character at a time
            outputs, hidden_state = model(inputs[idx], hidden_state)
            loss += criterion(outputs, targets[idx])  # summed over the sequence, not averaged

        # Backward and optimize. retain_graph=True is not needed: each
        # sequence builds its own graph starting from a fresh hidden state.
        loss_list.append(loss.item())
        loss.backward()
        optimizer.step()
```
I’m using `CrossEntropyLoss` as the criterion.
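If `criterion` is `nn.CrossEntropyLoss`, it receives the `LogSoftmax` output of `DetailedRNN`, so `log_softmax` is applied twice. A quick check (the tensor sizes and targets below are made up) shows this is redundant rather than harmful: log-softmax is idempotent, because the exponentials of log-probabilities already sum to 1, so all three losses agree up to floating-point error. `nn.NLLLoss` is the conventional partner for `LogSoftmax`, but on its own this pairing is unlikely to explain a noisy loss curve.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)                # batch of 4, vocabulary of 10 (made-up sizes)
targets = torch.tensor([1, 0, 9, 3])
log_probs = F.log_softmax(logits, dim=1)   # what DetailedRNN returns

ce_on_logits = nn.CrossEntropyLoss()(logits, targets)         # the usual pairing
nll_on_log_probs = nn.NLLLoss()(log_probs, targets)           # LogSoftmax + NLLLoss
ce_on_log_probs = nn.CrossEntropyLoss()(log_probs, targets)   # log_softmax applied twice

print(ce_on_logits.item(), nll_on_log_probs.item(), ce_on_log_probs.item())
```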
This is my training approach: I take a random chunk of the corpus, and in each epoch I traverse that chunk with a step size equal to the sequence length. The loss is the average loss over an entire sequence. I’ve learned that an epoch means passing through the entire dataset once, but I switched to this scheme because I’ve seen official implementations that do it this way. What is right or wrong here?
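The windowing described above can be sketched in plain Python. `random_chunk` and `iter_windows` are hypothetical stand-ins for `randomChunkGen` and the slicing in the training loop, and the corpus string is made up. Each target window is the input window shifted by one character, and using `(len(text) - 1) // seq_length` steps keeps the last target window `seq_length` characters long. Passing the whole corpus to `iter_windows` matches the textbook notion of an epoch (one pass over all the data); passing a single random chunk visits only part of it.

```python
import random


def random_chunk(corpus, chunk_len, rng=random):
    """Return a random contiguous slice of chunk_len characters (hypothetical helper)."""
    start = rng.randint(0, len(corpus) - chunk_len)  # randint is inclusive on both ends
    return corpus[start:start + chunk_len]


def iter_windows(text, seq_length):
    """Yield (input, target) pairs; the target is the input shifted by one character."""
    num_steps = (len(text) - 1) // seq_length
    for i in range(0, num_steps * seq_length, seq_length):
        yield text[i:i + seq_length], text[i + 1:i + 1 + seq_length]


corpus = "hello world, hello pytorch"
seq_length = 3

# Full pass over the corpus: the usual meaning of "one epoch"
full_pass = list(iter_windows(corpus, seq_length))

# One random chunk: only part of the corpus is visited
chunk_pass = list(iter_windows(random_chunk(corpus, 10), seq_length))

print(len(full_pass), len(chunk_pass))
```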
Any help from the experts here would be greatly appreciated.