ValueError: Expected input batch_size (324) to match target batch_size (4)

Sir, I am new to PyTorch.
My code works fine for batch size = 1, but when I change the batch size to 32 I get this error: RuntimeError: Given groups=1, weight of size [10, 1, 5, 5], expected input[1, 32, 67, 50] to have 1 channels, but got 32 channels instead


class Net(nn.Module):

def __init__(self):
    super(Net, self).__init__()
    self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
    self.conv2 = nn.Conv2d(10, 20, kernel_size=5) = nn.MaxPool2d(2)
    self.fc = nn.Linear(2340, 2)

def forward(self, x):
    #in_size = x.size(0)
    x = F.relu(
    x = F.relu(
    x = x.view(in_size, -1)  # flatten the tensor
    x = self.fc(x)
    return F.log_softmax(x,dim=1)


torch.Size([8054, 67, 50]) torch.Size([8054])
torch.Size([3968, 67, 50]) torch.Size([3968])

Please help me resolve this. I don't know how to change the in_features and out_features values.

In the above code I changed conv1 to self.conv1 = nn.Conv2d(32, 10, kernel_size=5).
Now I am getting this error:

ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_10968/ in
21 print(b_labels.shape)
22 b_labels = b_labels.view(batch_size)
—> 23 loss = loss_fn(outputs,b_labels.long())
25 #loss =F.nll_loss(outputs,b_labels.long())

~\anaconda3\envs\for_CharBert\lib\site-packages\torch\nn\modules\ in _call_impl(self, *input, **kwargs)
1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1101 or _global_forward_hooks or _global_forward_pre_hooks):
→ 1102 return forward_call(*input, **kwargs)
1103 # Do not call functions when jit is used
1104 full_backward_hooks, non_full_backward_hooks = [], []

~\anaconda3\envs\for_CharBert\lib\site-packages\torch\nn\modules\ in forward(self, input, target)
1149 def forward(self, input: Tensor, target: Tensor) → Tensor:
→ 1150 return F.cross_entropy(input, target, weight=self.weight,
1151 ignore_index=self.ignore_index, reduction=self.reduction,
1152 label_smoothing=self.label_smoothing)

~\anaconda3\envs\for_CharBert\lib\site-packages\torch\nn\ in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
2844 if size_average is not None or reduce is not None:
2845 reduction = _Reduction.legacy_get_string(size_average, reduce)
→ 2846 return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)

ValueError: Expected input batch_size (1) to match target batch_size (32).

The reported input shape of [1, 32, 67, 50] doesn’t match your use case which is to increase the batch size to 32.
nn.Conv2d layers expect an input in the shape [batch_size, channels, height, width] so dim0 should have the size of 32 not dim1.
I’m also unsure how you are passing the data to the model. [8054, 67, 50] seems to be missing the channel dimension, so you could try to use x = x.unsqueeze(0), pass the data tensor to e.g. a TensorDataset and then to a DataLoader which would create the batches.

Sir, I unsqueezed both x and y.
The dimensions are now:
print(X_train.shape,y_train.shape)->torch.Size([1, 8054, 67, 50]) torch.Size([1, 8054])
print(X_test.shape,y_test.shape)-> torch.Size([1, 3968, 67, 50]) torch.Size([1, 3968])

I changed the model to:

class Net(nn.Module):

def __init__(self):
    super(Net, self).__init__()
    self.conv1 = nn.Conv2d(1, 32, kernel_size=5)
    self.conv2 = nn.Conv2d(32, 20, kernel_size=5) = nn.MaxPool2d(2)
    self.fc = nn.Linear(2340, 2)

def forward(self, x):
    in_size = x.size(0)
    x = F.relu(
    x = F.relu(
    x = x.view(in_size, -1)  # flatten the tensor
    x = self.fc(x)
    return F.log_softmax(x,dim=1)

Now the error is:

RuntimeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_1468/ in
17 optimizer.zero_grad()
—> 19 outputs = model(b_input_ids[None, …])
20 print(b_input_ids.shape)
21 print(b_labels.shape)

~\anaconda3\envs\for_CharBert\lib\site-packages\torch\nn\modules\ in _call_impl(self, *input, **kwargs)
1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1101 or _global_forward_hooks or _global_forward_pre_hooks):
→ 1102 return forward_call(*input, **kwargs)
1103 # Do not call functions when jit is used
1104 full_backward_hooks, non_full_backward_hooks = [], []

~\AppData\Local\Temp/ipykernel_1468/ in forward(self, x)
10 def forward(self, x):
11 in_size = x.size(0)
—> 12 x = F.relu(
13 x = F.relu(
14 x = x.view(in_size, -1) # flatten the tensor

~\anaconda3\envs\for_CharBert\lib\site-packages\torch\nn\modules\ in _call_impl(self, *input, **kwargs)
1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1101 or _global_forward_hooks or _global_forward_pre_hooks):
→ 1102 return forward_call(*input, **kwargs)
1103 # Do not call functions when jit is used
1104 full_backward_hooks, non_full_backward_hooks = [], []

~\anaconda3\envs\for_CharBert\lib\site-packages\torch\nn\modules\ in forward(self, input)
445 def forward(self, input: Tensor) → Tensor:
→ 446 return self._conv_forward(input, self.weight, self.bias)
448 class Conv3d(_ConvNd):

~\anaconda3\envs\for_CharBert\lib\site-packages\torch\nn\modules\ in _conv_forward(self, input, weight, bias)
440 weight, bias, self.stride,
441 _pair(0), self.dilation, self.groups)
→ 442 return F.conv2d(input, weight, bias, self.stride,
443 self.padding, self.dilation, self.groups)

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [32, 1, 5, 5], but got 5-dimensional input of size [1, 1, 8054, 67, 50] instead

On changing line 19 from outputs = model(b_input_ids[None, …])
to outputs = model(b_input_ids)

this error appears:

RuntimeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_1468/ in
17 optimizer.zero_grad()
—> 19 outputs = model(b_input_ids)
20 print(b_input_ids.shape)
21 print(b_labels.shape)

~\anaconda3\envs\for_CharBert\lib\site-packages\torch\nn\modules\ in _call_impl(self, *input, **kwargs)
1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1101 or _global_forward_hooks or _global_forward_pre_hooks):
→ 1102 return forward_call(*input, **kwargs)
1103 # Do not call functions when jit is used
1104 full_backward_hooks, non_full_backward_hooks = [], []

~\AppData\Local\Temp/ipykernel_1468/ in forward(self, x)
10 def forward(self, x):
11 in_size = x.size(0)
—> 12 x = F.relu(
13 x = F.relu(
14 x = x.view(in_size, -1) # flatten the tensor

~\anaconda3\envs\for_CharBert\lib\site-packages\torch\nn\modules\ in _call_impl(self, *input, **kwargs)
1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1101 or _global_forward_hooks or _global_forward_pre_hooks):
→ 1102 return forward_call(*input, **kwargs)
1103 # Do not call functions when jit is used
1104 full_backward_hooks, non_full_backward_hooks = [], []

~\anaconda3\envs\for_CharBert\lib\site-packages\torch\nn\modules\ in forward(self, input)
445 def forward(self, input: Tensor) → Tensor:
→ 446 return self._conv_forward(input, self.weight, self.bias)
448 class Conv3d(_ConvNd):

~\anaconda3\envs\for_CharBert\lib\site-packages\torch\nn\modules\ in _conv_forward(self, input, weight, bias)
440 weight, bias, self.stride,
441 _pair(0), self.dilation, self.groups)
→ 442 return F.conv2d(input, weight, bias, self.stride,
443 self.padding, self.dilation, self.groups)

RuntimeError: Given groups=1, weight of size [32, 1, 5, 5], expected input[1, 8054, 67, 50] to have 1 channels, but got 8054 channels instead

In my previous post the code contains a mistake, as you should unsqueeze dim1 (not dim0 as you already have the batch dimension) so change it to x = x.unsqueeze(1).
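
A minimal sketch of the suggestion above, assuming random stand-in data with the shapes posted earlier in the thread ([N, 67, 50] inputs and [N] labels). The sample count of 64, the label range, and the variable names are placeholders, not the poster's actual data.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Stand-in data: 64 samples of size 67x50 with binary labels (hypothetical values).
X_train = torch.randn(64, 67, 50)
y_train = torch.randint(0, 2, (64,))

# Add the channel dimension at dim1: [N, 67, 50] -> [N, 1, 67, 50].
X_train = X_train.unsqueeze(1)

# Let the DataLoader create the batches along dim0.
loader = DataLoader(TensorDataset(X_train, y_train), batch_size=32, shuffle=True)

xb, yb = next(iter(loader))
print(xb.shape, yb.shape)
```

With this layout the first layer would stay nn.Conv2d(1, ...), since each sample has one channel, and the labels keep their original [N] shape without any unsqueeze.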

PS: you can post code snippets by wrapping them into three backticks ```, which would make debugging easier.

Data Generator & augmentation

datagen = ImageDataGenerator(rescale = 1./255, validation_split=0.2)
IMAGE_SIZE = (256,256,3)
train_ds = datagen.flow_from_dataframe(dataframe=df_train,
    x_col = 'Image_name',
    y_col = 'Plane',

valid_ds = datagen.flow_from_dataframe(dataframe=df_train,
    x_col = 'Image_name',
    y_col = 'Plane',
test_ds = datagen.flow_from_dataframe(dataframe=df_test,
    x_col = 'Image_name',
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
class FeedforwardNeuralNetModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(FeedforwardNeuralNetModel, self).__init__()
        # Linear function
        self.fc1 = nn.Linear(256*256*3, 100)

        # Non-linearity
        self.sigmoid = nn.Sigmoid()

        # Linear function (readout)
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Linear function  # LINEAR
        out = self.fc1(x)

        # Non-linearity  # NON-LINEAR
        out = self.sigmoid(out)

        # Linear function (readout)  # LINEAR
        out = self.fc2(out)
        return out

input_dim = 256*256*3
hidden_dim = 100
output_dim = 10

model = FeedforwardNeuralNetModel(input_dim, hidden_dim, output_dim)
criterion = nn.CrossEntropyLoss() # create an object of crossentropy loss
learning_rate = 0.1

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate) # stochastic gradient descent, using the learning rate set above

FC 1 Parameters


FC 1 Bias Parameters


FC 2 Parameters


FC 2 Bias Parameters

iter = 0
for epoch in range(num_epochs):
for i, (images, labels) in enumerate(train_loader):
# Load images with gradient accumulation capabilities
images = images.view(-1, 256*256).requires_grad_()

    # Clear gradients w.r.t. parameters

    # Forward pass to get output/logits
    outputs = model(images)

    # Calculate Loss: softmax --> cross entropy loss
    loss = criterion(outputs, labels)

    # Getting gradients w.r.t. parameters

    # Updating parameters

    iter += 1

    if iter % 500 == 0:
        # Calculate Accuracy         
        correct = 0
        total = 0
        # Iterate through test dataset
        for images, labels in test_loader:
            # Load images with gradient accumulation capabilities
            images = images.view(-1, 256*256).requires_grad_()

            # Forward pass only to get logits/output
            outputs = model(images)

            # Get predictions from the maximum value
            _, predicted = torch.max(, 1)

            # Total number of labels
            total += labels.size(0)

            # Total correct predictions
            correct += (predicted == labels).sum()

        accuracy = 100 * correct / total

        # Print Loss
        print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.item(), accuracy))

This is my sample code, and I am getting the error: ValueError: Expected input batch_size (3200) to match target batch_size (100)

I would guess this view operation is wrong:

images = images.view(-1, 256*256)

as was already discussed in previous posts in this topic, e.g. here.
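
To illustrate why a hard-coded view can change the batch size: view(-1, n) infers the first dimension as the total number of elements divided by n. The sketch below reproduces that arithmetic in plain Python. The image shape [100, 3, 256, 256] is an assumption based on IMAGE_SIZE in the post, not a confirmed loader output, and inferred_rows is a hypothetical helper.

```python
from math import prod

def inferred_rows(shape, n_cols):
    """Rows that tensor.view(-1, n_cols) would infer for a tensor of this shape."""
    total = prod(shape)
    if total % n_cols:
        raise ValueError(f"{total} elements are not divisible by {n_cols}")
    return total // n_cols

batch = (100, 3, 256, 256)  # assumed [batch, channels, height, width]

print(inferred_rows(batch, 256 * 256))      # 300: the channels leak into dim0
print(inferred_rows(batch, 3 * 256 * 256))  # 100: one row per sample
```

Flattening with images.view(images.size(0), -1) keeps dim0 fixed and lets the feature dimension be inferred instead. The posted error (3200 vs. 100) implies each sample holds 32 times 256*256 elements, so the real shape differs from the assumption used here; printing images.shape before the view would settle it.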

PS: you can post code snippets by wrapping them into three backticks ```, which makes debugging easier.

Hi Sir, I have a similar error, but the .py files are different.
The code is taken from GitHub: Leo-Q-316/ImGAGN: Imbalanced Network Embedding via Generative Adversarial Graph Networks,
and the data has been transformed as described in the repository. The only problem I face is in the features.cora part.

I get this error in the following code: ValueError: Expected input batch_size (18) to match target batch_size (15).

from __future__ import division
from __future__ import print_function

import time
import argparse
import numpy as np
import scipy.sparse as sp
import torch
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable

from utils import load_data, accuracy, add_edges
from models import GCN
from models import Generator

# Training settings
parser = argparse.ArgumentParser()
parser.add_argument('--no-cuda', action='store_true', default=False,
                    help='Disables CUDA training.')
parser.add_argument('--fastmode', action='store_true', default=False,
                    help='Validate during training pass.')
parser.add_argument('--seed', type=int, default=42, help='Random seed.')
parser.add_argument('--epochs', type=int, default=100,
                    help='Number of epochs to train.')
parser.add_argument('--hidden', type=int, default=128,
                    help='Number of hidden units.')
parser.add_argument('--dropout', type=float, default=0.5,
                    help='Dropout rate (1 - keep probability).')
parser.add_argument('--epochs_gen', type=int, default=10,
                    help='Number of epochs to train for gen.')
parser.add_argument('--ratio_generated', type=float, default=1,
                    help='ratio of generated nodes.')
parser.add_argument('--dataset', choices=['cora', 'citeseer','pubmed', 'dblp', 'wiki'], default='cora')

args = parser.parse_args()
args.cuda = not args.no_cuda and torch.cuda.is_available()

if args.cuda:

dataset = args.dataset
path = "../Dataset/" + dataset+"/"

if dataset=='wiki':
    num = 3
    num = 10

# Specfic Parameters to get the best result
if dataset=='wiki':
elif dataset=='dblp':

if dataset == 'cora':
    weight_decay = 0.0008
elif dataset == 'citeseer':
    weight_decay = 0.0005
elif dataset == 'pubmed':
    weight_decay = 0.00008
elif dataset == 'dblp':
    weight_decay = 0.003
elif dataset == 'wiki':
    weight_decay = 0.0005

def train(features, adj):
    global max_recall, test_recall, test_f1, test_AUC, test_acc, test_pre
    output, output_gen, output_AUC = model(features, adj)
    labels_true =, torch.LongTensor(num_false).fill_(1)))

    if args.cuda:

    loss_dis = - euclidean_dist(features[minority], features[majority]).mean()
    loss_train = F.nll_loss(output[idx_train], labels[idx_train]) \
                 + F.nll_loss(output_gen[idx_train], labels_true) \


    if not args.fastmode:
        output, output_gen, output_AUC = model(features, adj)

    recall_val, f1_val, AUC_val, acc_val, pre_val = accuracy(output[idx_val], labels[idx_val], output_AUC[idx_val])
    recall_train, f1_train, AUC_train, acc_train, pre_train = accuracy(output[idx_val], labels[idx_val], output_AUC[idx_val])

    if max_recall < (recall_val + acc_val)/2:
        output, output_gen, output_AUC = model(features, adj)
        recall_tmp, f1_tmp, AUC_tmp, acc_tmp, pre_tmp = accuracy(output[idx_test], labels[idx_test], output_AUC[idx_test])
        test_recall = recall_tmp
        test_f1 = f1_tmp
        test_AUC = AUC_tmp
        test_acc = acc_tmp
        test_pre = pre_tmp
        max_recall = (recall_val + acc_val)/2

    return recall_val, f1_val, acc_val, recall_train, f1_train, acc_train

def euclidean_dist(x, y):
    m, n = x.size(0), y.size(0)
    xx = torch.pow(x, 2).sum(1, keepdim=True).expand(m, n)
    yy = torch.pow(y, 2).sum(1, keepdim=True).expand(n, m).t()
    dist = xx + yy
    dist.addmm_(1, -2, x, y.t())
    dist = dist.clamp(min=1e-12).sqrt()  # for numerical stability
    return dist

# ratio_arr = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
# for ratio in ratio_arr:
adj, adj_real, features, labels, idx_temp, idx_test, generate_node, minority, majority, minority_all = load_data(args.ratio_generated, path=path, dataset=dataset)
# Model and optimizer
model = GCN(nfeat=features.shape[1],
    nclass=labels.max().item() + 1,
    generate_node= generate_node,
    min_node = minority)
optimizer = optim.Adam(model.parameters(),lr=lr, weight_decay=weight_decay)

# num_real = features.shape[0]
num_false = labels.shape[0]- features.shape[0] #diff bw lengths of first row

model_generator = Generator(minority_all.shape[0])
optimizer_G = torch.optim.Adam(model_generator.parameters(),
                       lr=lr, weight_decay=weight_decay)

max_recall = 0
test_recall = 0
test_f1 = 0
test_AUC = 0
test_pre =0

if args.cuda:
    features = features.cuda()
    adj = adj.cuda()
    labels = labels.cuda()
    idx_temp = idx_temp.cuda()
    idx_test = idx_test.cuda()

for epoch_gen in range(args.epochs_gen):
    part = epoch_gen % num
    range_val_maj = range(int(part*len(majority)/num), int((part+1)*len(majority)/num))
    range_val_min = range(int(part * len(minority) / num), int((part + 1) * len(minority) / num))

    range_train_maj = list(range(0,int(part*len(majority)/num)))+ list(range(int((part+1)*len(majority)/num),len(majority)))
    range_train_min = list(range(0,int(part*len(minority)/num)))+ list(range(int((part+1)*len(minority)/num),len(minority)))

    idx_val =[range_val_maj], minority[range_val_min]))
    idx_train =[range_train_maj], minority[range_train_min]))
    idx_train =, generate_node))
    num_real = features.shape[0] - len(idx_test) -len(idx_val)

    # Train model
    z = Variable(torch.FloatTensor(np.random.normal(0, 1, (generate_node.shape[0], 100))))
    if args.cuda:

    adj_min = model_generator(z)
    gen_imgs1 =[:,0:minority.shape[0]], dim=1), features[minority])
    gen_imgs1_all =, dim=1), features[minority_all])

    matr = F.softmax(adj_min[:,0:minority.shape[0]], dim =1).data.cpu().numpy()
    adj_temp = sp.coo_matrix((np.ones(pos[0].shape[0]),(generate_node[pos[0]].numpy(), minority_all[pos[1]].numpy())),
                             shape=(labels.shape[0], labels.shape[0]),

    adj_new = add_edges(adj_real, adj_temp)
    if args.cuda:

    t_total = time.time()
    # model.eval()
    output, output_gen, output_AUC = model(,,0), adj)

    labels_true = torch.LongTensor(num_false).fill_(0)
    labels_min = torch.LongTensor(num_false).fill_(1)
    if args.cuda:
        labels_true = labels_true.cuda()
        labels_min = labels_min.cuda()

    g_loss = F.nll_loss(output_gen[generate_node], labels_true) \
             + F.nll_loss(output[generate_node], labels_min) \
             + euclidean_dist(features[minority], gen_imgs1).mean()


    for epoch in range(args.epochs):
        recall_val, f1_val, acc_val, recall_train, f1_train, acc_train = train(,,0), adj_new)
    print("Epoch:", '%04d' % (epoch_gen + 1),
          "train_recall=", "{:.5f}".format(recall_train), "train_f1=", "{:.5f}".format(f1_train),"train_acc=", "{:.5f}".format(acc_train),
          "val_recall=", "{:.5f}".format(recall_val), "val_f1=", "{:.5f}".format(f1_val),"val_acc=", "{:.5f}".format(acc_val))

print("Test Recall: ", test_recall)
print("Test Accuracy: ", test_acc)
print("Test F1: ", test_f1)
print("Test precision: ", test_pre)
print("Test AUC: ", test_AUC)

I can see that the target batch size = num_real + num_false, where:
features = sp.csr_matrix(idx_features_labels[:, 0:-1], dtype=np.float32) #reverse order of features cora ndarray
labels = idx_features_labels[:, -1]

PS: this is required for a group project that is due on the 24th, so I would be grateful if you could help out.

Check which line of code is raising the shape mismatch as your current script calculates the loss in a few places. Once isolated, check if the input tensor matches the output tensor in its batch size. If not, then check the model implementation and in particular its forward method to narrow down where the batch size changes. On the other hand, if the input matches the output in the batch size, make sure the target also has the same batch size. If not, check how the target is created and why the batch size is different.
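
The checks described above can be sketched as a small helper. This is only an illustration: check_batch_sizes is a hypothetical name, and the toy model and random data stand in for the real script.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def check_batch_sizes(model, inputs, target):
    """Return (input, output, target) batch sizes so a mismatch is easy to spot."""
    output = model(inputs)
    return inputs.size(0), output.size(0), target.size(0)

# Hypothetical toy model and data, only to exercise the check.
model = nn.Sequential(nn.Flatten(), nn.Linear(12, 3), nn.LogSoftmax(dim=1))
inputs = torch.randn(8, 3, 4)
target = torch.randint(0, 3, (8,))

n_in, n_out, n_tgt = check_batch_sizes(model, inputs, target)
print(n_in, n_out, n_tgt)

# If input and output differ, look inside forward; if only the target differs,
# look at how the target is built.
assert n_in == n_out == n_tgt, "batch size changed somewhere"
loss = F.nll_loss(model(inputs), target)
```

Running the same three-way comparison right before each loss call in the script should show which of the loss terms is the one that fails.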

Hi, @ptrblck! I am kinda in the same situation. I am trying a very basic model:

class Conv2DModel(nn.Module):
    def __init__(self, n_input=1, n_output=35, stride=16, n_channel=32):
        super(Conv2DModel, self).__init__()
        self.conv1 = nn.Conv2d(n_input, n_channel, kernel_size=(1,80), stride=stride)
        self.conv2 = nn.Conv2d(n_channel, n_channel, kernel_size=(1,3))
        self.fc1 = nn.Linear(1976, n_channel)
        self.fc2 = nn.Linear(n_channel, n_output)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output

The error:

ValueError                                Traceback (most recent call last)
Input In [174], in <cell line: 9>()
      9 with tqdm(total=n_epoch) as pbar:
     10     for epoch in range(1, n_epoch + 1):
---> 11         train(model, epoch, log_interval)
     12         test(model, epoch)
     13         scheduler.step()

Input In [173], in train(model, epoch, log_interval)
     15 output = model(data)
     17 # negative log-likelihood for a tensor of size (batch x 1 x n_output)
---> 18 loss = F.nll_loss(output.squeeze(), target)
     20 optimizer.zero_grad()
     21 loss.backward()

File ~/anaconda3/envs/user/lib/python3.8/site-packages/torch/nn/, in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
   2669 if size_average is not None or reduce is not None:
   2670     reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2671 return torch._C._nn.nll_loss_nd(input, target, weight, _Reduction.get_enum(reduction), ignore_index)

ValueError: Expected input batch_size (32) to match target batch_size (256).

I don’t know what your input shapes are so could you post a minimal, executable code snippet please?

input is torch.Size([1, 256, 8000]).

optimizer = optim.Adam(model.parameters(), lr=0.01, weight_decay=0.0001)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1) 

def train(model, epoch, log_interval):
    for batch_idx, (data, target) in enumerate(train_loader):

        data =
        target =
        # apply transform and model on whole batch directly on device
        data = transform(data)
        data = torch.transpose(data, 1, 0)

        output = model(data)

        # negative log-likelihood for a tensor of size (batch x 1 x n_output)
        loss = F.nll_loss(output.squeeze(), target)


        # print training stats
        if batch_idx % log_interval == 0:
            print(f"Train Epoch: {epoch} [{batch_idx * len(data)}/{len(train_loader.dataset)} ({100. * batch_idx / len(train_loader):.0f}%)]\tLoss: {loss.item():.6f}")

        # update progress bar
        # record loss
log_interval = 20
n_epoch = 2

pbar_update = 1 / (len(train_loader) + len(test_loader))
losses = []

# The transform needs to live on the same device as the model and the data.
transform =
with tqdm(total=n_epoch) as pbar:
    for epoch in range(1, n_epoch + 1):
        train(model, epoch, log_interval)
        test(model, epoch)

This doesn’t seem to be the case, as this input shape fails with:

model = Conv2DModel()
x = torch.randn(1, 256, 8000)
out = model(x)
> RuntimeError: Expected 4-dimensional input for 4-dimensional weight [32, 1, 1, 80], but got 3-dimensional input of size [1, 256, 8000] instead

which is expected, since you are using nn.Conv2d as the first layer while the input is 3-dimensional.
Adding the channel dimension as 1 fails with:

x = torch.randn(1, 1, 256, 8000)
out = model(x)
> RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x63232 and 1976x32)

I am working in a notebook, and it seems I was loading another model with a similar name. My bad. With the right model I now get the “mat1 and mat2” error.
It seems I have to set the in_features of fc1 to 63232, like this: self.fc1 = nn.Linear(63232, n_channel).
How do I compute this 63232 without having to set it manually?
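
One way to get this number is to apply the output-size formula from the nn.Conv2d and nn.MaxPool2d documentation to each layer in turn. The sketch below does that in plain Python for the model and the [1, 1, 256, 8000] input used above; out_size is a hypothetical helper name, and padding and dilation are assumed to be at their defaults.

```python
def out_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length of one spatial dimension for Conv2d / MaxPool2d (floor mode)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

h, w = 256, 8000                                 # input [1, 1, 256, 8000]
h, w = out_size(h, 1, 16), out_size(w, 80, 16)   # conv1: kernel (1, 80), stride 16
h, w = out_size(h, 1), out_size(w, 3)            # conv2: kernel (1, 3), stride 1
h, w = out_size(h, 2, 2), out_size(w, 2, 2)      # F.max_pool2d(x, 2)

n_channel = 32
print(n_channel * h * w)  # 63232
```

Two alternatives avoid the manual arithmetic: pass a dummy tensor of the expected input shape through the convolutional part once and read the flattened size from its shape, or use nn.LazyLinear, which infers in_features on the first forward pass.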

Greetings!! Sorry, I feel like I’m asking something largely discussed, but can’t fix it on my own.
I get the error mat1 and mat2 shapes cannot be multiplied (16x441 and 7056x64)
This is my implementation:

class CNN(nn.Module):
    def __init__(self, history_length=0, n_classes=3):
        super(CNN, self).__init__()
        self.convo1 = nn.Conv2d(1, 6, kernel_size=5)
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.convo2 = nn.Conv2d(6, 16, kernel_size=5)
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.linear1 = nn.Linear(21 * 21 * 16, 64)
        self.linear2 = nn.Linear(64, n_classes)

    def forward(self, x):
        x = self.convo1(x)
        x = F.relu(x)
        x = self.pool1(x)
        x = self.convo2(x)
        x = F.relu(x)
        x = self.pool2(x)

        print(x.shape) # Returns torch.Size([16, 21, 21])
        x = x.view(x.size(0), -1)
        print(x.shape) # Returns torch.Size([16, 441])
        x = self.linear1(x)
        x = self.linear2(x)
        x = x.softmax(dim=1)

        return x

How can I fix it? Any help would be appreciated :)

Set the in_features of self.linear1 to 441 and it should work:

self.linear1 = nn.Linear(441, 64)

since the incoming activation has a shape of [batch_size=16, features=441] as given in your code.
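
The numbers in this exchange can be checked by hand. The sketch below uses only the shape printed in the post and shows where both 441 and 7056 come from.

```python
from math import prod

shape = (16, 21, 21)          # printed after pool2 in the post

# x.view(x.size(0), -1) keeps dim0 and flattens the rest.
rows, cols = shape[0], prod(shape[1:])
print(rows, cols)             # 16 441

# The original nn.Linear(21 * 21 * 16, 64) expected all elements in one row.
print(prod(shape))            # 7056
```

Since 16 is also the out_channels of convo2, dim0 here may be the channel dimension of a single unbatched sample rather than a batch of 16. That is only a guess from the printed shape; printing the input's shape before convo1 would confirm it or rule it out.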

PS: I’m not familiar with your use case, but be careful about the usage of .softmax.
If you are working on a multi-class classification and are using nn.CrossEntropyLoss, remove the .softmax call as raw logits are expected.
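
A small plain-Python sketch of why the extra softmax matters: cross entropy already applies log_softmax to its input, so feeding it probabilities effectively applies softmax twice. The logits and target below are made-up values for illustration.

```python
import math

def log_softmax(xs):
    """Numerically stable log-softmax over a list of floats."""
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]

def cross_entropy(logits, target):
    """Cross entropy for one sample: log_softmax followed by negative log-likelihood."""
    return -log_softmax(logits)[target]

logits = [2.0, 0.5, -1.0]   # hypothetical raw model outputs for 3 classes
target = 0

probs = [math.exp(v) for v in log_softmax(logits)]  # what .softmax(dim=1) would return

loss_logits = cross_entropy(logits, target)  # intended usage: raw logits
loss_probs = cross_entropy(probs, target)    # softmax applied twice in effect

print(round(loss_logits, 4), round(loss_probs, 4))
```

Because probabilities are confined to [0, 1], the second softmax squashes them toward a uniform distribution, so the loss computed from probabilities stays higher and cannot approach zero even for a confident, correct prediction.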

Just after posting, I noticed the softmax issue. I changed the features and now it works! Thanks!

Hi @ptrblck, I have a batch_size problem with a transformer model:

from datasets import load_dataset, load_metric
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from import DataLoader
from transformers import get_scheduler, AdamW
import torch
from import tqdm
from collections import defaultdict
import torch.nn as nn

n_classes = 39

dataset = load_dataset('csv', data_files={'train': 'train.csv', 'val': 'val.csv', 'test': 'test.csv'})

tokenizer = AutoTokenizer.from_pretrained("sismetanin/rubert-ru-sentiment-rusentiment")

def tokenize_function(examples):
  max_length = len(max(data.text, key=len))
  return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=512) # truncate all messages to 512 characters

tokenized_datasets =, batched=True)
tokenized_datasets = tokenized_datasets.remove_columns(["text"])
tokenized_datasets = tokenized_datasets.rename_column("label", "labels")

batch_size = 1

def my_collate(batch):
  data = defaultdict(list)
  {data[key].append(sub[key]) for sub in batch for key in sub}
  data = {key: torch.vstack(value) for key, value in data.items()}
  data['labels'] = torch.nn.functional.one_hot(data['labels'].to(torch.int64)-1, n_classes).view(-1)
  return data

train_dataloader = DataLoader(tokenized_datasets["train"], shuffle=True, batch_size=batch_size, collate_fn=my_collate)
eval_dataloader = DataLoader(tokenized_datasets["test"], batch_size=batch_size, collate_fn=my_collate)
model = AutoModelForSequenceClassification.from_pretrained("sismetanin/rubert-ru-sentiment-rusentiment", num_labels=n_classes, ignore_mismatched_sizes=True)

model.classifier = nn.Sequential(nn.Linear(in_features=768, out_features=78),
                                 nn.Linear(in_features=78, out_features=n_classes))

for param in list(model.bert.embeddings.parameters())[:-1]:
  param.requires_grad = False

optimizer = AdamW(model.parameters(), lr=5e-5)

num_epochs = 3
num_training_steps = num_epochs * len(train_dataloader)
lr_scheduler = get_scheduler(

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

for epoch in tqdm(range(num_epochs)):
  for batch in tqdm(train_dataloader):
    outputs = model(**batch)
    loss = outputs.loss


the full error message:

ValueError                                Traceback (most recent call last)
<ipython-input-52-4f0f72f0c5ef> in <module>
     76 for epoch in tqdm(range(num_epochs)):
     77   for batch in tqdm(train_dataloader):
---> 78     outputs = model(**batch)
     79     loss = outputs.loss
     80     loss.backward()

4 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/ in _call_impl(self, *input, **kwargs)
   1128         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130             return forward_call(*input, **kwargs)
   1131         # Do not call functions when jit is used
   1132         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.7/dist-packages/transformers/models/bert/ in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)
   1589             elif self.config.problem_type == "single_label_classification":
   1590                 loss_fct = CrossEntropyLoss()
-> 1591                 loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
   1592             elif self.config.problem_type == "multi_label_classification":
   1593                 loss_fct = BCEWithLogitsLoss()

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/ in _call_impl(self, *input, **kwargs)
   1128         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130             return forward_call(*input, **kwargs)
   1131         # Do not call functions when jit is used
   1132         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/ in forward(self, input, target)
   1164         return F.cross_entropy(input, target, weight=self.weight,
   1165                                ignore_index=self.ignore_index, reduction=self.reduction,
-> 1166                                label_smoothing=self.label_smoothing)

/usr/local/lib/python3.7/dist-packages/torch/nn/ in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
   3012     if size_average is not None or reduce is not None:
   3013         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 3014     return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)

ValueError: Expected input batch_size (1) to match target batch_size (39).

Could you explain what batch contains and check the shapes?
The error is raised due to a mismatch in the shape of the model output and the target, but I can’t see how these shapes are defined and what might be causing the issue.
Often users flatten an activation tensor in a wrong way and change the batch size by accident, but I also don’t see the model’s forward method.
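
One place worth checking, given the posted my_collate: the labels are one-hot encoded and then flattened with view(-1). For batch_size = 1 and n_classes = 39 that yields 39 elements, which matches the target batch size in the error. A small sketch with a hypothetical label value and an assumed [1, 1] label shape:

```python
import torch
import torch.nn.functional as F

n_classes = 39
labels = torch.tensor([[5]])  # hypothetical: one sample with a 1-based class id

# What my_collate does: one-hot encode, then flatten everything.
one_hot_flat = F.one_hot(labels.to(torch.int64) - 1, n_classes).view(-1)
print(one_hot_flat.shape)     # torch.Size([39])

# The single-label branch in the traceback calls labels.view(-1) and expects
# one class index per sample.
class_idx = (labels.to(torch.int64) - 1).view(-1)
print(class_idx.shape)        # torch.Size([1])
```

If that is the cause, returning class indices instead of the flattened one-hot tensor would make the target batch size match the logits. Whether the -1 offset fits the dataset's label encoding is something only the poster can confirm.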