I am new to PyTorch, converting over from TensorFlow (the static graph was driving me crazy). I am a little confused about the arguments fed into the multiclass loss function (NLLLoss) during training. When I call the loss function I get this error:

multi-target not supported at d:\downloads\pytorch-master-1\torch\lib\thnn\generic/ClassNLLCriterion.c:20

I am calling the loss function with a minibatch of predicted labels (64x5) and a minibatch of actual labels (64x5). I get the impression that the loss function works on one sample at a time.
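To pin down the error, here is a minimal repro outside of my training loop (the one-hot layout of the labels and the random stand-in for the model output are my own simplifications, and the "fix" at the end is just my guess):

import numpy as np
import torch
from torch.autograd import Variable

batch_size, num_classes = 64, 5  # my minibatch shapes

log_probs = Variable(torch.randn(batch_size, num_classes))  # stand-in for the model's log-softmax output

labels = np.random.randint(0, num_classes, size=batch_size)  # one class per sample, shape (64,)
one_hot = np.eye(num_classes)[labels].astype(np.int64)       # shape (64, 5), like my Y

loss_fn = torch.nn.NLLLoss()

# This mirrors my failing call: a (64, 5) target raises "multi-target not supported"
# loss_fn(log_probs, Variable(torch.from_numpy(one_hot)))

# My guess at what the loss actually wants: one class index per sample, shape (64,)
loss = loss_fn(log_probs, Variable(torch.from_numpy(labels.astype(np.int64))))
print(loss)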
Here is the model section of my code:
model = torch.nn.Sequential(
    torch.nn.Linear(num_features, 200),
    torch.nn.ReLU(),
    torch.nn.Linear(200, num_classes),
    torch.nn.LogSoftmax()
)
model.double()
loss_fn = torch.nn.NLLLoss()
learning_rate = self.modelParameters.learning_rate
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for epoch in range(self.modelParameters.num_epochs):
    epoch_cost = 0.
    num_minibatches = int(num_samples / self.modelParameters.minibatch_size)
    seed = seed + 1
    minibatches = self.randomMiniBatch(X_train, Y_train, self.modelParameters.minibatch_size, seed)
    for minibatch in minibatches:
        (minibatch_X, minibatch_Y) = minibatch
        x = Variable(torch.from_numpy(minibatch_X.transpose()), requires_grad=False)
        y_actual = Variable(torch.from_numpy(minibatch_Y.transpose().astype(np.int64)), requires_grad=False)
        y_predicted = model(x)
        loss = loss_fn(y_predicted.detach(), y_actual)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        for param in model.parameters():
            param.data -= learning_rate * param.grad.data
        epoch_cost += loss / num_minibatches
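If the target shape is the problem, here is a minimal single-minibatch version of what I think the inner loop should look like (num_features=10 and the Adam learning rate are placeholders; the argmax conversion, dropping detach(), and dropping the manual parameter update are all my guesses, which is partly what I am asking about):

import numpy as np
import torch
from torch.autograd import Variable

# Fake stand-ins for one minibatch; my real data is (features, batch) / (classes, batch)
minibatch_X = np.random.randn(10, 64)
minibatch_Y = np.eye(5)[:, np.random.randint(0, 5, 64)]  # one-hot, shape (5, 64)

model = torch.nn.Sequential(
    torch.nn.Linear(10, 200),
    torch.nn.ReLU(),
    torch.nn.Linear(200, 5),
    torch.nn.LogSoftmax(dim=1),
).double()
loss_fn = torch.nn.NLLLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = Variable(torch.from_numpy(minibatch_X.transpose()), requires_grad=False)
# One class index per sample instead of a one-hot row -- my guessed fix
y_actual = Variable(torch.from_numpy(
    minibatch_Y.transpose().argmax(axis=1).astype(np.int64)), requires_grad=False)

y_predicted = model(x)
loss = loss_fn(y_predicted, y_actual)  # no detach(), so backward() can reach the weights

optimizer.zero_grad()
loss.backward()
optimizer.step()  # and no manual parameter update afterwards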
To add to my confusion, I came across a PyTorch multiclass tutorial that iterates over every training example, not over minibatches: http://pytorch.org/tutorials/beginner/nlp/deep_learning_tutorial.html?highlight=nllloss
Here is the key section of their code:
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Usually you want to pass over the training data several times.
# 100 is much bigger than on a real data set, but real datasets have more than
# two instances. Usually, somewhere between 5 and 30 epochs is reasonable.
for epoch in range(100):
    for instance, label in data:
        # Step 1. Remember that Pytorch accumulates gradients.
        # We need to clear them out before each instance
        model.zero_grad()

        # Step 2. Make our BOW vector and also we must wrap the target in a
        # Variable as an integer. For example, if the target is SPANISH, then
        # we wrap the integer 0. The loss function then knows that the 0th
        # element of the log probabilities is the log probability
        # corresponding to SPANISH
        bow_vec = autograd.Variable(make_bow_vector(instance, word_to_ix))
        target = autograd.Variable(make_target(label, label_to_ix))

        # Step 3. Run our forward pass.
        log_probs = model(bow_vec)

        # Step 4. Compute the loss, gradients, and update the parameters by
        # calling optimizer.step()
        loss = loss_function(log_probs, target)
        loss.backward()
        optimizer.step()
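One thing I did notice while poking at this: NLLLoss seems happy with either a single instance or a whole batch, so the per-instance loop above looks like a tutorial choice rather than a requirement (the shapes below are invented by me):

import torch
import torch.nn as nn
from torch.autograd import Variable

loss_function = nn.NLLLoss()

single = Variable(torch.randn(1, 5))   # one instance, as in the tutorial loop
batch = Variable(torch.randn(64, 5))   # a whole minibatch

# In both cases the target is a LongTensor of class indices, not one-hot rows
print(loss_function(single, Variable(torch.LongTensor([3]))))
print(loss_function(batch, Variable(torch.LongTensor(64).random_(0, 5))))  # averages over the batch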
So I am confused about how to train a basic feed-forward sequential NN on a multiclass problem with minibatches. Suggestions?