Hi,
First of all, my apologies if my question seems trivial or my English is not good enough.
I am following the name classification tutorial here: NLP From Scratch: Classifying Names with a Character-Level RNN — PyTorch Tutorials 2.1.1+cu121 documentation
My first question is about the call to the loss function, criterion(output, category_tensor), in:
def train(category_tensor, line_tensor):
    hidden = rnn.initHidden()
    rnn.zero_grad()
    # Feed the name one character at a time; line_tensor[i] has shape (1, n_letters)
    for i in range(line_tensor.size()[0]):
        output, hidden = rnn(line_tensor[i], hidden)
    # Only the output after the last character is used for the loss
    loss = criterion(output, category_tensor)
    loss.backward()
    # Add parameters' gradients to their values, multiplied by learning rate
    for p in rnn.parameters():
        p.data.add_(p.grad.data, alpha=-learning_rate)
    return output, loss.item()
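For reference, here is a minimal sketch of the shapes nn.NLLLoss expects, assuming (as in the tutorial) that criterion = nn.NLLLoss() and that the model ends in a LogSoftmax layer. The random log-probabilities and the class index 5 are made-up stand-ins for the real RNN output and label.

```python
import torch
import torch.nn as nn

n_categories = 18  # number of classes in the tutorial's dataset

# Stand-in for the RNN output: log-probabilities of shape (batch=1, n_categories).
# (Random values here; in the tutorial they come from the model's LogSoftmax layer.)
output = torch.log_softmax(torch.randn(1, n_categories), dim=1)

# Target: one integer class index per batch element, dtype long, shape (batch=1,)
category_tensor = torch.tensor([5], dtype=torch.long)

criterion = nn.NLLLoss()
loss = criterion(output, category_tensor)

# NLLLoss picks out the negated log-probability of the target class
# (averaged over the batch; here the batch has a single element).
manual = -output[0, category_tensor.item()]
print(output.shape, category_tensor.shape)  # torch.Size([1, 18]) torch.Size([1])
print(torch.allclose(loss, manual))          # True
```

So the target is not a vector of scores but one class index per row of output.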
As far as I can tell, output has shape 1x18 (a batch of one row, with one score for each of the 18 classes), while category_tensor holds a single integer: the index of the target class.
Is this the only valid form for the target argument, or can I pass a vector (for example a one-hot encoding of the class) as category_tensor? I couldn't really understand the documentation I found here (torch.nn — PyTorch 2.1 documentation). Perhaps I misunderstood it; I tried the following modification, and it doesn't seem to work:
def randomTrainingExample():
    category = randomChoice(all_categories)
    line = randomChoice(category_lines[category])
    # Attempt: one-hot encode the category, then take LogSoftmax to get log values
    category_tensor = nn.LogSoftmax(dim=0)(torch.tensor((np.array(all_categories) == category).astype(int), dtype=torch.float))
    line_tensor = lineToTensor(line)
    return category, line, category_tensor, line_tensor
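One way to test the one-hot question in isolation is the sketch below. It assumes PyTorch 1.10 or newer, where nn.CrossEntropyLoss accepts class probabilities as the target; nn.NLLLoss still requires integer class indices. Note that CrossEntropyLoss expects raw logits rather than LogSoftmax output, and that the target should be a probability vector (such as a plain one-hot vector), not the LogSoftmax of one. The logits and target_index are hypothetical stand-ins.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_categories = 18
target_index = 5

# Raw, unnormalized scores (logits) of shape (batch=1, n_categories).
# Hypothetical stand-in for a model whose last layer has no LogSoftmax.
logits = torch.randn(1, n_categories)

# One-hot target as a float probability vector, also of shape (1, n_categories).
one_hot = F.one_hot(torch.tensor([target_index]), num_classes=n_categories).float()

# Option 1: CrossEntropyLoss with a probability target (PyTorch >= 1.10);
# it applies log_softmax to the logits internally.
loss_probs = nn.CrossEntropyLoss()(logits, one_hot)

# Option 2: the same quantity by hand, starting from log-probabilities
# (usable with the tutorial's LogSoftmax output).
log_probs = F.log_softmax(logits, dim=1)
loss_manual = -(one_hot * log_probs).sum(dim=1).mean()

# With a one-hot target, both match the index-based NLLLoss.
loss_index = nn.NLLLoss()(log_probs, torch.tensor([target_index]))

print(torch.allclose(loss_probs, loss_manual), torch.allclose(loss_probs, loss_index))  # True True
```

For a strictly one-hot target the index form is simpler and equivalent; the probability form only adds something when the target spreads its mass over several classes.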
Secondly, in the section “Turning Names into Tensors”, is the batch dimension the second dimension? The tutorial says:
“To make a word we join a bunch of those into a 2D matrix <line_length x 1 x n_letters>. That extra 1 dimension is because PyTorch assumes everything is in batches - we’re just using a batch size of 1 here.”
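The layout in that quote can be reproduced with a small re-creation of the tutorial's one-hot encoding. line_to_one_hot is a hypothetical helper modeled on the tutorial's lineToTensor (but returning no batch dimension), and the names are arbitrary examples. The sketch also shows that torch.nn.utils.rnn.pad_sequence uses the same sequence-first layout by default.

```python
import string
import torch
from torch.nn.utils.rnn import pad_sequence

# Same alphabet as the tutorial: ASCII letters plus " .,;'"
all_letters = string.ascii_letters + " .,;'"
n_letters = len(all_letters)  # 57

def line_to_one_hot(line):
    """One-hot encode a name as a (line_length, n_letters) tensor (no batch dim)."""
    t = torch.zeros(len(line), n_letters)
    for i, letter in enumerate(line):
        t[i, all_letters.find(letter)] = 1
    return t

# The tutorial's layout: a batch dimension of size 1 inserted at position 1.
single = line_to_one_hot("Jones").unsqueeze(1)
print(single.shape)  # torch.Size([5, 1, 57])

# Several names at once: pad_sequence keeps the (seq_len, batch, features)
# layout by default (batch_first=False), padding shorter names with zeros.
names = ["Jones", "Li", "Satoshi"]
batch = pad_sequence([line_to_one_hot(n) for n in names])
print(batch.shape)  # torch.Size([7, 3, 57])
```

So yes, in this layout dimension 0 is time (one character per step) and dimension 1 is the batch.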
If I’m not mistaken, image data puts the batch dimension first, which I find more intuitive:
input_shape = batch_size x number_of_channels x image_height x image_width
Can we put the batch dimension first here as well?
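A sketch of the batch-first option, assuming the tutorial's hand-written RNN module can be swapped for PyTorch's built-in layers (nn.RNN and nn.RNNCell here are stand-ins, not the tutorial's code; the sizes are arbitrary). One practical reason for the sequence-first default is that indexing x[t] then yields one time step for the whole batch. Setting batch_first=True changes only the layout of the input and output, not of the returned hidden state.

```python
import torch
import torch.nn as nn

batch_size, seq_len, n_letters, n_hidden = 4, 6, 57, 128

# Batch-first input: (batch, seq_len, features)
x = torch.randn(batch_size, seq_len, n_letters)

# Built-in recurrent layers accept batch-first tensors via batch_first=True.
rnn = nn.RNN(input_size=n_letters, hidden_size=n_hidden, batch_first=True)
out, h_n = rnn(x)
print(out.shape)  # torch.Size([4, 6, 128])  (batch, seq_len, hidden)
print(h_n.shape)  # torch.Size([1, 4, 128])  hidden stays (num_layers, batch, hidden)

# A step-by-step loop (like the tutorial's) can also keep batch first:
# the time dimension is now dim 1, so index x[:, t] instead of x[t].
cell = nn.RNNCell(n_letters, n_hidden)
h = torch.zeros(batch_size, n_hidden)
for t in range(x.size(1)):
    h = cell(x[:, t], h)  # x[:, t] has shape (batch, n_letters)
print(h.shape)  # torch.Size([4, 128])

# Converting between the two layouts is a single transpose.
x_seq_first = x.transpose(0, 1)  # (seq_len, batch, features)
print(x_seq_first.shape)  # torch.Size([6, 4, 57])
```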
Thank you very much.