RuntimeError: multi-target not supported at /Users/soumith/miniconda2/conda-bld/pytorch_1532623076075/work/aten/src/THNN/generic/ClassNLLCriterion.c:21

I am working on a video animation project using PyTorch. My dataset contains 3904x60 MFCC audio features (input) and corresponding 3904x3 video features (output). The goal is to train a neural network so that, given an unseen audio feature, the model maps it to its corresponding video feature. In other words, the network performs a 60-to-3 feature mapping. I built the network following this tutorial, and my CNN architecture looks like:

class ConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv1d(32, 64, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2, stride=2))
        self.drop_out = nn.Dropout()
        self.fc1 = nn.Linear(15 * 64, 1000)
        self.fc2 = nn.Linear(1000, 3)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.drop_out(out)
        out = self.fc1(out)
        out = self.fc2(out)
        return out
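
For reference, a quick shape check (batch size 4 and 60 MFCC features per example, as in my data) confirms why fc1 expects 15 * 64 inputs:

import torch

model = ConvNet()
x = torch.randn(4, 1, 60)   # (batch, channels, features)
# layer1: Conv1d keeps length 60 (kernel 5, padding 2), MaxPool1d halves it -> (4, 32, 30)
# layer2: same pattern -> (4, 64, 15), which flattens to (4, 64 * 15) = (4, 960)
print(model(x).shape)       # torch.Size([4, 3])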

and my training code looks like:

model = ConvNet()

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

total_step = len(train_loader)
loss_list = []
acc_list = []

for epoch in range(num_epochs):
    for i, (a, v) in enumerate(train_loader):
        # Run the forward pass
        a = a.float()
        v = v.long()
        outputs = model(a.view(a.size(0),1,a.size(1)))
        loss = criterion(outputs, v)
        loss_list.append(loss.item())

        # Backprop and perform Adam optimisation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Track the accuracy
        total = v.size(0)
        _, predicted = torch.max(outputs.data, 1)
        correct = (predicted == v).sum().item()
        acc_list.append(correct / total)

        if (i + 1) % 100 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}, Accuracy: {:.2f}%'
                  .format(epoch + 1, num_epochs, i + 1, total_step, loss.item(),
                          (correct / total) * 100))

But when I ran the code, I received this error:

---> 13 loss = criterion(outputs, v)
RuntimeError: multi-target not supported at /Users/soumith/miniconda2/conda-bld/pytorch_1532623076075/work/aten/src/THNN/generic/ClassNLLCriterion.c:21

I set the batch size to 4, so each a and v in the loop should be a 4x60 tensor and a 4x3 tensor, respectively. One suggested solution is to reshape the target into a vector of class indices. However, that doesn't apply here, since I want the 3 output features treated as a whole rather than as independent labels. How do I solve this problem?

If you are using nn.CrossEntropyLoss or nn.NLLLoss, your target should contain class indices, as your link suggests.
However, your target currently seems to contain some kind of video features.
Could you explain a bit more about the values in the target?

E.g. if you are dealing with float features, you might want to use nn.MSELoss, which expects the output and target to have the same shape.
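
For illustration, here is a minimal sketch of the two conventions (using a made-up batch of 4 and 3 outputs):

import torch
import torch.nn as nn

outputs = torch.randn(4, 3)                 # model output: (batch, 3)

# nn.CrossEntropyLoss wants class indices of shape (batch,) with dtype long
ce_target = torch.tensor([0, 2, 1, 0])
print(nn.CrossEntropyLoss()(outputs, ce_target))

# nn.MSELoss wants a float target with the same shape as the output
mse_target = torch.randn(4, 3)
print(nn.MSELoss()(outputs, mse_target))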

EDIT: It looks like you are casting the target to torch.long, which indicates some kind of class indices?
Could you post an example target with values?


Thanks for your reply. Changing CrossEntropyLoss to MSELoss solved the problem. For your interest, I can add some details I didn't specify clearly in the original post. Essentially, the project tries to map the words a person speaks to a video of the person moving their lips. The audio features are MFCCs of the audio signal, while the video features are parameters that measure how the lip in each frame differs from the reference image (a closed mouth), so I can generate different lip images to make a video. I converted the target to torch.long because CrossEntropyLoss requires the target to be a LongTensor, otherwise it raises an error. With MSELoss I don't need to worry about that. Really appreciate your answer!
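
In case it helps anyone later, here is a minimal sketch of the loop after the change; it assumes the same train_loader and num_epochs as before, and the learning rate is just a placeholder:

model = ConvNet()
criterion = nn.MSELoss()                   # regression loss instead of CrossEntropyLoss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # placeholder learning rate

for epoch in range(num_epochs):
    for i, (a, v) in enumerate(train_loader):
        a = a.float()
        v = v.float()                      # no cast to long; MSELoss takes float targets
        outputs = model(a.view(a.size(0), 1, a.size(1)))   # (batch, 3)
        loss = criterion(outputs, v)       # outputs and v are both (batch, 3)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()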

Just as a follow-up: since I am now using MSE loss, I can no longer measure accuracy by dividing correct predictions by total samples. Is there a reasonable metric/function to measure the performance of the model when the loss function is MSELoss?