RNN Tutorial: NLLLoss call parameter and batch dimension

kuntoro-adi · January 3, 2019, 6:50am

Hi,

First of all, my apologies if my question seems trivial or if my English is not good enough.

This is about the name classification tutorial, "NLP From Scratch: Classifying Names with a Character-Level RNN" (PyTorch Tutorials 2.1.1+cu121 documentation).

My first question is about the call to the loss function, criterion(output, category_tensor), in:

def train(category_tensor, line_tensor):
    hidden = rnn.initHidden()

    rnn.zero_grad()

    for i in range(line_tensor.size()[0]):
        output, hidden = rnn(line_tensor[i], hidden)

    loss = criterion(output, category_tensor)
    loss.backward()

    # Add parameters' gradients to their values, multiplied by learning rate
    for p in rnn.parameters():
        # add_(scalar, tensor) is deprecated; pass the scale factor as alpha instead
        p.data.add_(p.grad.data, alpha=-learning_rate)

    return output, loss.item()

As far as I checked, output has shape 1x18 (a batch of 1, with 18 classes) and category_tensor is a single-element tensor containing the class label as an integer.

Is this the only valid form of the target argument, or can I pass a vector with the same shape as the prediction as category_tensor? I couldn’t really understand the documentation that I found here (torch.nn — PyTorch 2.1 documentation). I may have misunderstood it: I tried the modification below and it doesn’t seem to work.

def randomTrainingExample():
    category = randomChoice(all_categories)
    line = randomChoice(category_lines[category])
    # attempt: one-hot encode the category, then take the log-softmax of that vector
    # (np.int was removed from NumPy, so the builtin int is used; dim is given explicitly)
    category_tensor = nn.LogSoftmax(dim=0)(torch.tensor((np.array(all_categories) == category).astype(int), dtype=torch.float))
    line_tensor = lineToTensor(line)
    return category, line, category_tensor, line_tensor
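For reference, here is a minimal sketch of the argument shapes that nn.NLLLoss accepts, as I understand the documentation: the input holds log-probabilities of shape (batch, classes), and the target holds integer class indices of shape (batch,) with dtype torch.long. The random output tensor and the class index 3 are stand-ins for illustration, not values from the tutorial. A one-hot vector can be turned into such an index with argmax.

```python
import torch
import torch.nn as nn

n_categories = 18  # number of classes, as in the tutorial
true_class = 3     # hypothetical label, chosen only for illustration

criterion = nn.NLLLoss()

# Stand-in for the network output: log-probabilities, shape (1, n_categories)
output = torch.log_softmax(torch.randn(1, n_categories), dim=1)

# Target as a class index: shape (1,), dtype torch.long
category_tensor = torch.tensor([true_class], dtype=torch.long)
loss = criterion(output, category_tensor)

# A one-hot vector has to be converted back to a class index first
one_hot = torch.zeros(1, n_categories)
one_hot[0, true_class] = 1.0
index_from_one_hot = one_hot.argmax(dim=1)  # shape (1,), dtype torch.long
loss_from_one_hot = criterion(output, index_from_one_hot)

# With a batch of 1, the loss is the negative log-probability of the true class
manual_loss = -output[0, true_class]
```

All three values (loss, loss_from_one_hot, manual_loss) should agree, which suggests that the index form carries the same information as the one-hot vector.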

Secondly, in the section “Turning Names into Tensors”, is the batch dimension the second dimension?

To make a word we join a bunch of those into a 2D matrix <line_length x 1 x n_letters>.

That extra 1 dimension is because PyTorch assumes everything is in batches - we’re just using a batch size of 1 here.

If I’m not mistaken, for image data the batch dimension comes first, and I think this is more intuitive:

size_of_training_set = batch_size x number_of_channels x image_height x image_width

Can we put the batch dimension first here as well?
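To make the question concrete, here is a small sketch of a batch-first layout. It uses the built-in nn.RNN module with batch_first=True rather than the tutorial's hand-written RNN class, so it is a swapped-in illustration, not the tutorial's method. The sizes (line_length=5, n_letters=57, n_hidden=128) are assumed values for the example.

```python
import torch
import torch.nn as nn

# Assumed sizes for illustration only
line_length, batch_size, n_letters, n_hidden = 5, 1, 57, 128

# Tutorial layout: <line_length x batch x n_letters>
line_tensor = torch.zeros(line_length, batch_size, n_letters)

# Swap the first two dimensions to get <batch x line_length x n_letters>
batch_first_tensor = line_tensor.transpose(0, 1).contiguous()

# The built-in nn.RNN accepts batch-first input when batch_first=True
rnn = nn.RNN(input_size=n_letters, hidden_size=n_hidden, batch_first=True)
output, hidden = rnn(batch_first_tensor)

print(batch_first_tensor.shape)  # batch, line_length, n_letters
print(output.shape)              # batch, line_length, n_hidden
print(hidden.shape)              # num_layers, batch, n_hidden
```

Note that batch_first only changes the layout of the input and output tensors; the hidden state keeps the layer dimension first.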

Thank you very much.