RNN Tutorial: NLLLoss call parameter and batch dimension

Hi,

First of all, my apologies if my question seems trivial or if my English is not good enough.

My question is about the name classification tutorial, “NLP From Scratch: Classifying Names with a Character-Level RNN” (PyTorch Tutorials 2.1.1+cu121 documentation).

My first question is about the call to the loss function, criterion(output, category_tensor), in the train function:

def train(category_tensor, line_tensor):
    hidden = rnn.initHidden()

    rnn.zero_grad()

    for i in range(line_tensor.size()[0]):
        output, hidden = rnn(line_tensor[i], hidden)

    loss = criterion(output, category_tensor)
    loss.backward()

    # Add parameters' gradients to their values, multiplied by learning rate
    for p in rnn.parameters():
        p.data.add_(p.grad.data, alpha=-learning_rate)

    return output, loss.item()

When I checked, output has shape 1x18 (a batch of 1 with one score for each of the 18 classes), and category_tensor is a tensor holding a single integer, the index of the class label.
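
A minimal sketch of that call with stand-in values (random log-probabilities instead of the real network output, and an arbitrary class index 3, both my own assumptions). Here the output is laid out batch-first as (1, 18), which is the (N, C) layout that nn.NLLLoss expects together with a class-index target of shape (N,):

```python
import torch
import torch.nn as nn

n_categories = 18  # the tutorial has 18 language classes

# Stand-in for the network output: log-probabilities for a batch of 1.
output = torch.log_softmax(torch.randn(1, n_categories), dim=1)

# Target given as a class index: shape (1,), dtype long.
# The index 3 is arbitrary and only for illustration.
category_tensor = torch.tensor([3], dtype=torch.long)

criterion = nn.NLLLoss()
loss = criterion(output, category_tensor)

# With a single sample, NLLLoss reduces to the negated
# log-probability of the target class.
expected = -output[0, 3]

print(output.shape, category_tensor.shape, loss.item(), expected.item())
```

So with a batch of 1, the loss is just the negative log-probability that the network assigned to the correct class.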

Is a class index the only valid form of this argument, or can I pass a vector with one entry per class (shaped like the prediction) as category_tensor? I couldn’t really understand the documentation that I found here (torch.nn, PyTorch 2.1 documentation). I may have misunderstood it, but I tried the following modification and it doesn’t seem to work.

def randomTrainingExample():
    category = randomChoice(all_categories)
    line = randomChoice(category_lines[category])
    # attempting to use one hot encoded value like SoftMax, but with the log value
    category_tensor = nn.LogSoftmax(dim=0)(torch.tensor((np.array(all_categories) == category).astype(int), dtype=torch.float))
    line_tensor = lineToTensor(line)
    return category, line, category_tensor, line_tensor
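
A self-contained sketch of what goes wrong in my attempt, using stand-ins I made up for illustration (random log-probabilities for the output, class index 3 as the true class). As far as I can tell, nn.NLLLoss rejects a float vector target, while the class index recovered from the one-hot vector with argmax is accepted:

```python
import torch
import torch.nn as nn

n_categories = 18

# Stand-in for the network output: log-probabilities, shape (1, 18).
output = torch.log_softmax(torch.randn(1, n_categories), dim=1)

# One-hot vector for an arbitrary true class (index 3), then log-softmax,
# mirroring the modified randomTrainingExample above.
one_hot = torch.zeros(n_categories)
one_hot[3] = 1.0
vector_target = torch.log_softmax(one_hot, dim=0)  # float, shape (18,)

criterion = nn.NLLLoss()

# Passing the float vector as the target raises an error.
try:
    criterion(output, vector_target)
    failed = False
except (RuntimeError, ValueError) as err:
    failed = True
    print("NLLLoss rejected the vector target:", err)

# Converting the one-hot vector back to a class index works.
index_target = one_hot.argmax().unsqueeze(0)  # shape (1,), dtype long
loss = criterion(output, index_target)
print(index_target, loss.item())
```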

Secondly, in the section “Turning Names into Tensors”, is the batch dimension the second dimension?

To make a word we join a bunch of those into a 2D matrix <line_length x 1 x n_letters>.

That extra 1 dimension is because PyTorch assumes everything is in batches - we’re just using a batch size of 1 here.
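
To make the layout concrete, here is a small sketch modelled on the tutorial's lineToTensor (the name line_to_tensor and the example name "Jones" are my own choices). It builds the <line_length x 1 x n_letters> tensor and then swaps the first two axes to get a batch-first view, which is the layout I am asking about:

```python
import string
import torch

# Same alphabet as in the tutorial: ASCII letters plus a few punctuation marks.
all_letters = string.ascii_letters + " .,;'"
n_letters = len(all_letters)

def line_to_tensor(line):
    """One-hot encode a name as a (line_length, 1, n_letters) tensor."""
    tensor = torch.zeros(len(line), 1, n_letters)
    for i, letter in enumerate(line):
        tensor[i][0][all_letters.find(letter)] = 1
    return tensor

line_tensor = line_to_tensor("Jones")
print(line_tensor.shape)   # sequence first: (line_length, batch, n_letters)

# Swap the sequence and batch axes to get a batch-first layout.
batch_first = line_tensor.transpose(0, 1)
print(batch_first.shape)   # batch first: (batch, line_length, n_letters)
```

With the batch-first layout, the training loop would have to read each character as batch_first[:, i] instead of line_tensor[i].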

If I’m not mistaken, for image data the batch dimension is the first dimension, which I find more intuitive:

size_of_training_set = batch_size x number_of_channels x image_height x image_width

Can we put the batch dimension first here as well?

Thank you very much.