ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 16])

I am trying to reproduce the multi-input neural network from this tutorial. The article uses PyTorch Lightning, while I want to use plain PyTorch, so I am adapting it to my case. Basically, I created my dataloaders and my network:

# Define loaders
from torch.utils.data import DataLoader
train_loader = DataLoader(train_set, batch_size=64, num_workers=2, drop_last=True, shuffle=True)
val_loader   = DataLoader(val_set,   batch_size=64, num_workers=2, drop_last=False, shuffle=False)
test_loader  = DataLoader(test_set,  batch_size=64, num_workers=2, drop_last=False, shuffle=False)

import torch
import torch.nn as nn

def conv_block(input_size, output_size):
    # Conv -> BatchNorm -> ReLU -> 2x2 max-pool
    block = nn.Sequential(
        nn.Conv2d(input_size, output_size, (3, 3)),
        nn.BatchNorm2d(output_size),
        nn.ReLU(),
        nn.MaxPool2d((2, 2)),
    )
    return block
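As a side note, here is where the 64 * 26 * 26 figure used below comes from (a minimal sketch, assuming the tutorial's 3x224x224 input images; each block loses 2 pixels to the 3x3 convolution and then halves the resolution):

blocks = nn.Sequential(conv_block(3, 16), conv_block(16, 32), conv_block(32, 64))
with torch.no_grad():
    out = blocks(torch.randn(2, 3, 224, 224))
print(out.shape)  # torch.Size([2, 64, 26, 26]) -> flattens to 64 * 26 * 26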

class SimpleCNN(nn.Module):

    # Constructor
    def __init__(self):
        # Call parent constructor
        super().__init__()
        # Image branch: 3x224x224 input -> 64x26x26 after the three conv blocks
        self.conv1 = conv_block(3, 16)
        self.conv2 = conv_block(16, 32)
        self.conv3 = conv_block(32, 64)

        self.ln1 = nn.Linear(64 * 26 * 26, 16)
        self.relu = nn.ReLU()
        self.batchnorm = nn.BatchNorm1d(16)
        self.dropout = nn.Dropout(0.5)  # plain Dropout: the activations here are 2D (N, 16)
        self.ln2 = nn.Linear(16, 5)

        # Tabular branch: 5 features in, 5 features out
        self.ln4 = nn.Linear(5, 10)
        self.ln5 = nn.Linear(10, 10)
        self.ln6 = nn.Linear(10, 5)
        # Head: 5 (image) + 5 (tabular) concatenated features -> 1 output
        self.ln7 = nn.Linear(10, 1)
    # Forward
    def forward(self, img, tab):
        # Image branch
        img = self.conv1(img)
        img = self.conv2(img)
        img = self.conv3(img)
        img = img.reshape(img.shape[0], -1)  # flatten to (N, 64 * 26 * 26)
        img = self.ln1(img)
        img = self.relu(img)
        img = self.batchnorm(img)
        img = self.dropout(img)
        img = self.ln2(img)
        img = self.relu(img)

        # Tabular branch
        tab = self.ln4(tab)
        tab = self.relu(tab)
        tab = self.ln5(tab)
        tab = self.relu(tab)
        tab = self.ln6(tab)
        tab = self.relu(tab)

        # Concatenate the two branches along the feature dimension
        x = torch.cat((img, tab), dim=1)
        x = self.relu(x)

        return self.ln7(x)
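For reference, a forward pass with a dummy batch of two samples goes through fine (a sketch, assuming 224x224 images and 5 tabular features, matching the layer sizes above):

model = SimpleCNN()
out = model(torch.randn(2, 3, 224, 224), torch.randn(2, 5))
print(out.shape)  # torch.Size([2, 1])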

Now I try to pass a single sample through the network to check that it works, before starting training:

# Create the model and run one sample through it
model = SimpleCNN()
img_x, tab_x, label_x = train_set[0]
print(img_x.shape, tab_x, label_x)
img_x = img_x.unsqueeze(dim=0)  # add a batch dimension: (3, H, W) -> (1, 3, H, W)
output = model(img_x, tab_x)
output.shape

But here I get this error, ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 16]), at this line:

----> 4 output = model(img_x, tab_x)
........
---> 37 img = self.batchnorm(img)

I suspect this is because I am passing only one element to the network (but I am not sure). However, if I remove that batchnorm line from the network, changing the forward method in this way:

def forward(self, img, tab):
    img = self.conv1(img)

    img = self.conv2(img)
    img = self.conv3(img)
    img = img.reshape(img.shape[0], -1)
    img = self.ln1(img)
    img = self.relu(img)
    img = self.dropout(img)
    img = self.ln2(img)
    img = self.relu(img)

    tab = self.ln4(tab)
    tab = self.relu(tab)
    tab = self.ln5(tab)
    tab = self.relu(tab)
    tab = self.ln6(tab)
    tab = self.relu(tab)

    x = torch.cat((img, tab), dim=1)
    x = self.relu(x)

    return self.ln7(x)

then I get this other error, RuntimeError: Tensors must have same number of dimensions: got 2 and 1, this time referring to this line: ---> 49 x = torch.cat((img, tab), dim=1)

Edit: I solved the second error (RuntimeError: Tensors must have same number of dimensions: got 2 and 1): I also needed to unsqueeze tab_x before passing it in:

model = SimpleCNN()
img_x, tab_x, label_x = train_set[0]
img_x = img_x.unsqueeze(dim=0)  # (3, H, W) -> (1, 3, H, W)
tab_x = tab_x.unsqueeze(dim=0)  # (5,) -> (1, 5)
output = model(img_x, tab_x)
output.shape

and my output shape is torch.Size([1, 1]). However, I am still not able to fix the first error (the one in the title).

Yes, that’s correct. In training mode, BatchNorm layers compute the batch statistics (mean and standard deviation) from the input activations, and they need more than a single value per channel to produce a usable (non-NaN) standard deviation; the torch.Size([1, 16]) in the error is exactly that degenerate case, a batch of 1 with 16 channels. If you want to keep the BatchNorm layer, either increase the number of samples in the batch or change the model architecture so these layers see multiple values per channel, e.g. a sequential input of shape [batch_size, channels, seq_len].
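For a quick single-sample sanity check like yours, the simplest workaround is to switch the model to eval mode, so BatchNorm normalizes with its running statistics instead of the batch statistics (a minimal sketch, reusing the img_x and tab_x from your edit):

model = SimpleCNN()
model.eval()               # BatchNorm uses running stats; Dropout is disabled
with torch.no_grad():
    output = model(img_x, tab_x)  # works with a batch of 1
print(output.shape)        # torch.Size([1, 1])
model.train()              # switch back before actual training

During actual training this error should not occur anyway: your train_loader uses batch_size=64 with drop_last=True, so the BatchNorm layers never see a lone sample.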