Multiclass Sequential Loss function confusion

I am new to PyTorch, converting from TensorFlow (the static graph model was driving me crazy). I am a little confused about the arguments fed into the multiclass loss function (NLLLoss) during training. When I run my model I get the error: multi-target not supported at d:\downloads\pytorch-master-1\torch\lib\thnn\generic/ClassNLLCriterion.c:20 when calling the loss function. I am calling the loss function with a mini-batch of predicted labels (64x5) and actual labels (64x5). I get the impression that the loss function only works on one sample at a time.
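
If it helps, here is a minimal snippet that I believe reproduces the shape situation I am in (the sizes are made up to match my mini-batch of 64 samples and 5 classes, not my real data):

    import torch
    from torch.autograd import Variable

    loss_fn = torch.nn.NLLLoss()
    log_probs = Variable(torch.randn(64, 5))                      # model output: (batch, num_classes)
    one_hot_targets = Variable(torch.LongTensor(64, 5).zero_())   # what I am passing as the target
    loss = loss_fn(log_probs, one_hot_targets)                    # -> "multi-target not supported"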

Here is the model section of my code:

        model = torch.nn.Sequential(
            torch.nn.Linear(num_features, 200),
            torch.nn.ReLU(),
            torch.nn.Linear(200, num_classes),
            torch.nn.LogSoftmax()
        )
        model.double()
        loss_fn = torch.nn.NLLLoss()

        learning_rate = self.modelParameters.learning_rate
        optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

        for epoch in range(self.modelParameters.num_epochs):
            epoch_cost = 0.
            num_minibatches = int(num_samples / self.modelParameters.minibatch_size)
            seed = seed + 1
            minibatches = self.randomMiniBatch(X_train, Y_train, self.modelParameters.minibatch_size, seed)

            for minibatch in minibatches:
                (minibatch_X, minibatch_Y) = minibatch

                x = Variable(torch.from_numpy(minibatch_X.transpose()), requires_grad=False)
                y_actual = Variable(torch.from_numpy(minibatch_Y.transpose().astype(np.int64)), requires_grad=False)
                y_predicted = model(x)

                loss = loss_fn(y_predicted.detach(), y_actual)

                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

                for param in model.parameters():
                    param.data -= learning_rate * param.grad.data

                epoch_cost += loss / num_minibatches

To add to my confusion, I came across a PyTorch multiclass tutorial that iterates over every training example, not over mini-batches. http://pytorch.org/tutorials/beginner/nlp/deep_learning_tutorial.html?highlight=nllloss

Here is the key section of their code:

loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Usually you want to pass over the training data several times.
# 100 is much bigger than on a real data set, but real datasets have more than
# two instances.  Usually, somewhere between 5 and 30 epochs is reasonable.
for epoch in range(100):
    for instance, label in data:
        # Step 1. Remember that Pytorch accumulates gradients.
        # We need to clear them out before each instance
        model.zero_grad()

        # Step 2. Make our BOW vector and also we must wrap the target in a
        # Variable as an integer. For example, if the target is SPANISH, then
        # we wrap the integer 0. The loss function then knows that the 0th
        # element of the log probabilities is the log probability
        # corresponding to SPANISH
        bow_vec = autograd.Variable(make_bow_vector(instance, word_to_ix))
        target = autograd.Variable(make_target(label, label_to_ix))

        # Step 3. Run our forward pass.
        log_probs = model(bow_vec)

        # Step 4. Compute the loss, gradients, and update the parameters by
        # calling optimizer.step()
        loss = loss_function(log_probs, target)
        loss.backward()
        optimizer.step()

So I am confused about how to do a basic feed-forward sequential NN with multiclass classification. Suggestions?

OK, I think I have it figured out. A couple of things I was doing wrong:

  1. optimizer.step() updates the parameters, so I do not need to update them manually.
  2. the NLLLoss function does work fine on mini-batches. Its inputs are the model output for the mini-batch and the actual labels. The issue was the actual labels array: it must be a 1D tensor of class indices, not a one-hot array of 0s and 1s. This info is in the documentation; see the toy sketch below.
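
For example, here is a toy sketch (made-up shapes, not my real data pipeline) of converting a one-hot (num_classes x batch) numpy array into the 1D index tensor NLLLoss wants, and calling the loss on a whole mini-batch at once:

    import numpy as np
    import torch
    from torch.autograd import Variable

    num_classes, batch_size = 5, 64
    # one-hot labels with one column per sample, shape (num_classes, batch_size)
    one_hot = np.eye(num_classes)[:, np.random.randint(0, num_classes, batch_size)]

    # collapse each one-hot column to its class index -> 1D array of shape (batch_size,)
    class_indices = one_hot.argmax(axis=0).astype(np.int64)

    log_probs = Variable(torch.randn(batch_size, num_classes))   # stand-in for the model output
    target = Variable(torch.from_numpy(class_indices))
    loss = torch.nn.NLLLoss()(log_probs, target)                 # works on the whole mini-batch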

Anyways, here is my updated code:

        model = torch.nn.Sequential(
            torch.nn.Linear(num_features, 200),
            torch.nn.ReLU(),
            torch.nn.Linear(200, num_classes),
            torch.nn.LogSoftmax()
        )
        model.double()

        loss_function = torch.nn.NLLLoss()

        learning_rate = self.modelParameters.learning_rate
        optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

        for epoch in range(self.modelParameters.num_epochs):

            epoch_cost = 0.
            num_minibatches = int(num_samples / self.modelParameters.minibatch_size)
            seed = seed + 1
            minibatches = self.randomMiniBatch(X_train, Y_train, self.modelParameters.minibatch_size, seed)

            for minibatch in minibatches:
                (minibatch_X, minibatch_Y) = minibatch

                x = Variable(torch.from_numpy(minibatch_X.transpose()), requires_grad=False)  # (batch, num_features)
                y_predictions = model(x)  # log-probabilities, shape (batch, num_classes)

                # the "target" is the actual Y labels rolled into the index values.
                # One column for each training example, the value is the index of the class
                y_actual_class_index  = Variable(torch.from_numpy(minibatch_Y.argmax(axis=0).astype(np.int64)), requires_grad=False)

                loss = loss_function(y_predictions, y_actual_class_index)

                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

                epoch_cost += loss / num_minibatches
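
One more thing I noticed while reading the docs: torch.nn.CrossEntropyLoss combines LogSoftmax and NLLLoss in one module, so (if I understand correctly) an equivalent setup is to drop the LogSoftmax layer and feed the raw scores straight to the loss. Something like:

    # alternative sketch: let CrossEntropyLoss apply the LogSoftmax internally
    model = torch.nn.Sequential(
        torch.nn.Linear(num_features, 200),
        torch.nn.ReLU(),
        torch.nn.Linear(200, num_classes),   # raw scores, no LogSoftmax layer
    )
    model.double()
    loss_function = torch.nn.CrossEntropyLoss()  # still takes 1D class-index targets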