Multihead architecture for regression and classification tasks: UserWarning: Using a target size (torch.Size([400])) that is different to the input size (torch.Size([400, 1]))

I’m currently switching from TensorFlow to PyTorch and am facing this warning: UserWarning: Using a target size (torch.Size([400])) that is different to the input size (torch.Size([400, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
I read that calling unsqueeze(1) on my target could resolve the problem; however, doing so creates new problems in the multi-target setup, because my classification loss (cross entropy) expects a different target shape than my regression losses.
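To illustrate the shape expectations I’m struggling with, here is a small sketch with made-up values (not my real data):

import torch
import torch.nn as nn

reg_out = torch.randn(4, 1)              # regression head output: [N, 1]
clf_out = torch.randn(4, 2)              # classification logits: [N, C]
reg_target = torch.rand(4)               # shape [N] -> triggers the broadcasting warning
clf_target = torch.randint(0, 2, (4,))   # class indices of shape [N]

nn.MSELoss()(reg_out, reg_target.unsqueeze(1))  # needs [N, 1] to match the output
nn.CrossEntropyLoss()(clf_out, clf_target)      # needs [N]; unsqueezing breaks it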

Here is a minimal example of my code:

import torch
import torch.nn as nn
import torch.optim as optim 
from torch.utils.data import Dataset, DataLoader, TensorDataset
import torch.nn.functional as F


X1 = torch.randn(400, 1, 9999)   # amplitude
X2 = torch.randn(400, 1, 9999)   # phase
aux1 = torch.randn(400, 1)       # weight
aux2 = torch.randn(400, 1)       # temperature
aux3 = torch.randn(400, 1)       # humidity
y1 = torch.rand(400)             # shelf life classification target
y2 = torch.rand(400)             # shelf life regression target
y3 = torch.rand(400)             # thickness regression target

class MultiTaskDataset(Dataset):
    def __init__(self, 
                 amplitude, 
                 phase, 
                 weight,
                 temperature,
                 humidity,
                 shelf_life_clf,
                 shelf_life_pred,
                 thickness_pred
                 ):
        self.amplitude = amplitude
        self.phase = phase
        self.weight = weight
        self.temperature = temperature
        self.humidity = humidity
        self.shelf_life_clf = shelf_life_clf
        self.shelf_life_pred = shelf_life_pred
        self.thickness_pred = thickness_pred
        

    def __len__(self):
        return self.amplitude.shape[0]

    def __getitem__(self, idx):
        #inputs
        amplitude = self.amplitude[idx]
        phase = self.phase[idx]
        weight = self.weight[idx]
        temperature = self.temperature[idx]
        humidity = self.humidity[idx]
        
        #outputs
        shelf_life_clf = self.shelf_life_clf[idx]
        shelf_life_reg = self.shelf_life_pred[idx]
        thickness_pred = self.thickness_pred[idx]
        
        # the inputs and targets are already tensors, so cast the dtypes
        # instead of re-wrapping them with torch.tensor
        return ([amplitude.float(),
                 phase.float(),
                 weight.float(),
                 temperature.float(),
                 humidity.float()],
                [shelf_life_clf.long(),
                 shelf_life_reg.float(),
                 thickness_pred.float()])


# train loader
dataset = MultiTaskDataset(X1, X2, aux1, aux2, aux3, 
                           y1,y2,y3)
train_loader = DataLoader(dataset, batch_size=512, shuffle=True, num_workers=0)

# test loader



class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.features_amp = nn.Sequential(
            nn.LazyConv1d(1, 3, 1),
        )
        self.features_phase = nn.Sequential(
            nn.LazyConv1d(1, 3, 1),
        )
        
        
        self.backbone1 = nn.Sequential(
            nn.LazyConv1d(64,3,1),
            nn.LazyConv1d(64,3,1),
            nn.AvgPool1d(3),
            nn.Dropout(0.25),
        )
        
        self.backbone2 = nn.Sequential(
            nn.Conv1d(64, 32,3,1),
            nn.Conv1d(32, 32,3,1),
            nn.AvgPool1d(3),
            nn.Dropout(0.25),
        )
        
        self.backbone3 = nn.Sequential(
            nn.Conv1d(32, 16,3,1),
            nn.Conv1d(16, 16,3,1),
            nn.AvgPool1d(3),
            nn.Dropout(0.25),
        )
        
        
        self.classifier = nn.LazyLinear(2)
        self.shelf_life_reg = nn.LazyLinear(1)
        self.thickness_reg = nn.LazyLinear(1)

    def forward(self, x1, x2, aux1, aux2, aux3):
        x1 = self.features_amp(x1)
        x2 = self.features_phase(x2)

        x1 = x1.view(x1.size(0), -1)
        x2 = x2.view(x2.size(0), -1)
        x = torch.cat((x1, x2), dim=-1)
        print(x.size())

        x = x.unsqueeze(1)
        print(x.size())
        x = self.backbone1(x)
        print(x.size())

        x = torch.flatten(x, start_dim=1, end_dim=-1)

 
        x = torch.cat([x, aux1, aux2, aux3], dim=-1)
        
        
        shelf_life_clf = self.classifier(x)     
        shelf_life_reg = self.shelf_life_reg(x)
        thickness_reg = self.thickness_reg(x)
        return (shelf_life_clf,
                shelf_life_reg,
                thickness_reg)


model = MyModel()

optimizer = optim.Adam(model.parameters(), lr=0.003)

criterion1 = nn.CrossEntropyLoss()
criterion2 = nn.MSELoss()
criterion3 = nn.MSELoss()







def train(epoch):
    model.train()
    #exp_lr_scheduler.step()
    arr_loss = []
    #first_batch = next(iter(train_loader))
    for batch_idx, (data, target) in enumerate(train_loader):
        #amp, phase = data
        clf, reg1, reg2 = target
        
        #print(amp.shape, phase.shape)
        #print(target[2].shape)

        device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
        data = [t.to(device) for t in data]
        target = [t.to(device) for t in target]
        model.to(device)


        optimizer.zero_grad()
        output1, output2, output3 = model(*data)
        
        #losses
        loss = criterion1(output1, target[0].long())
        loss1 = criterion2(output2, target[1].float())
        loss2 = criterion3(output3, target[2].float())
        loss = loss + loss1 + loss2
        
        #metrices
        
        
        loss.backward()
        optimizer.step()

        print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, (batch_idx + 1) * data[0].size(0), len(train_loader.dataset),
                100. * (batch_idx + 1) / len(train_loader), loss.item()))
        arr_loss.append(loss.item())
    return arr_loss

def averaged_accuracy(outputs, targets):
    assert len(outputs) == len(targets), "number of outputs should equal the number of targets"
    accuracy = []
    for output, target in zip(outputs, targets):
        _, predicted = torch.max(output.data, 1)
        total = target.size(0)
        correct = (predicted == target).sum().item()
        accuracy.append(correct / total * 100)
    return sum(accuracy) / len(accuracy)


# re-create the optimizer with a lower learning rate for training;
# the loss criteria defined above are reused
optimizer = optim.Adam(model.parameters(), lr=0.00003)


n_epochs = 10

for epoch in range(n_epochs):
    train(epoch)

Can anybody provide guidance on resolving this problem?

The warning is most likely raised by nn.MSELoss, which would calculate a wrong loss via broadcasting unless you unsqueeze(1) the target.
E.g.:

criterion_raw = nn.MSELoss(reduction='none')
criterion = nn.MSELoss()

x = torch.tensor([[1.], [2.], [3.], [4.]])
print(x.shape)
# torch.Size([4, 1])

y = torch.tensor([1., 2., 3., 4.])
print(y.shape)
# torch.Size([4])


loss = criterion_raw(x, y)
print(loss)
# tensor([[0., 1., 4., 9.],
#         [1., 0., 1., 4.],
#         [4., 1., 0., 1.],
#         [9., 4., 1., 0.]])

loss = criterion(x, y)
print(loss)
# tensor(2.5000) # should be 0.

# fix with unsqueeze
loss = criterion_raw(x, y.unsqueeze(1))
print(loss)
# tensor([[0.],
#         [0.],
#         [0.],
#         [0.]])

loss = criterion(x, y.unsqueeze(1))
print(loss)
# tensor(0.)

nn.CrossEntropyLoss expects a target of shape [batch_size] containing class indices in the range [0, nb_classes-1] for multi-class classification, so you should not unsqueeze the target in this case.
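Applied to your training loop, the fix could look like this (a sketch keeping the variable names from your code): unsqueeze only the regression targets and leave the class indices as they are.

# classification: target stays [batch_size] with class indices
loss = criterion1(output1, target[0].long())
# regression: unsqueeze the targets to [batch_size, 1] to match the outputs
loss1 = criterion2(output2, target[1].float().unsqueeze(1))
loss2 = criterion3(output3, target[2].float().unsqueeze(1))
loss = loss + loss1 + loss2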

Thank you ptrblck for the answer. I was able to solve the problem; however, I’m now facing the issue that my code runs very slowly. Are there tweaks in PyTorch to make it faster? The single-task version ran without problems, but with the multihead architecture I’m running into trouble. I previously implemented it in TensorFlow, where it ran pretty smoothly, and I’m planning to switch to PyTorch. Thanks for your help in advance (if you need more information, let me know).

Generally, I would recommend checking the performance guide to see which optimizations could be applied. E.g. avoiding unnecessary synchronizations and making sure the data loading is fast enough might be good starting points.
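For example, moving the model to the device once before the training loop, using pinned memory, and making the host-to-device copies non-blocking are common first steps. A sketch based on the code above (the best num_workers value depends on your machine):

# multiple workers and pinned memory speed up and overlap the data loading
train_loader = DataLoader(dataset, batch_size=512, shuffle=True,
                          num_workers=4, pin_memory=True)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)  # move the model once, before the loop, not per batch

for batch_idx, (data, target) in enumerate(train_loader):
    # non_blocking=True overlaps the copies with compute (needs pin_memory=True)
    data = [t.to(device, non_blocking=True) for t in data]
    target = [t.to(device, non_blocking=True) for t in target]
    ...  # forward/backward pass as before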

Oh perfect. I was not aware of it. Thank you very much. I will have a read.