RuntimeError: size mismatch, m1: [16384 x 1], m2: [128 x 2]

I'm trying to make a model that reads in a vectorized product review and outputs a classification (favorable or unfavorable). The code for my classifier and training loop is below:

Model class definition:

import torch
import torch.nn as nn

class Classifier(nn.Module):
    def __init__(self, initial_n_channels, n_classes, network_n_channels):
        super(Classifier, self).__init__()
        self.network = nn.Sequential(
            nn.Conv1d(in_channels=initial_n_channels,
                      out_channels=network_n_channels,
                      kernel_size=args["kernel_size"]),
            nn.ReLU(),
            nn.Conv1d(in_channels=network_n_channels,
                      out_channels=network_n_channels,
                      kernel_size=args["kernel_size"],
                      stride=args["stride"]),
            nn.ReLU(),
            nn.Conv1d(in_channels=network_n_channels,
                      out_channels=network_n_channels,
                      kernel_size=args["kernel_size"],
                      stride=args["stride"]),
            nn.ReLU(),
            nn.Conv1d(in_channels=network_n_channels,
                      out_channels=network_n_channels,
                      kernel_size=args["kernel_size"],
                      stride=args["stride"]),
            nn.ReLU()
        )
        self.fc = nn.Linear(network_n_channels, n_classes)
        
    def forward(self, x_in, apply_sigmoid=False):
        # diagnostics
        print("classifier diagnostics", "\n",
              "---------------------------------", "\n")
        print("classifier x_in size: ", x_in.size())
        print("classifier weight size: ", self.fc.weight.size())
        
        features = self.network(x_in)
        prediction_vector = self.fc(features)
        if apply_sigmoid:
            # sigmoid is elementwise, so it takes no dim argument
            prediction_vector = torch.sigmoid(prediction_vector)
        return prediction_vector.double()

Instantiation:

import torch.optim as optim
from torch.utils.data import DataLoader

# dataset and vectorizer
dataset = ReviewDataset.load_and_vectorize(args["review_csv"])
vectorizer = dataset.get_vectorizer()

# model
classifier = Classifier(initial_n_channels=len(vectorizer.review_vocab),
                        n_classes=len(vectorizer.rating_vocab),
                        network_n_channels=args["num_channels"]).double()

# loss and optimizer
loss_func = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(classifier.parameters(), lr=args["learning_rate"])

Training loop:

for epoch_index in range(args["num_epochs"]):
    train_state["epoch_index"] = epoch_index
    
    # set up batch generator, initialize loss and 
    # accuracy each outer loop, set train mode on
    dataset.set_split("train")
    dataloader = DataLoader(dataset=dataset,
                            batch_size=args["batch_size"],
                            drop_last=args["drop_last"])
    running_loss = 0.0
    running_acc = 0.0
    classifier.train()
    
    for batch_index, batch_dict in enumerate(dataloader):
        # five-step training routine
        
        # diagnostic stats
        print("\n", "training loop diagnostics", "\n",
              "---------------------------------", "\n")
        print("batch tensor dimensions: ", batch_dict["x_data"].shape)
        print("labels: ", batch_dict["y_target"])
        
        # i. zero the gradients
        optimizer.zero_grad()
        
        # ii. compute the output
        y_pred = classifier.forward(x_in=batch_dict["x_data"].unsqueeze(dim=2))
        
        # iii. compute the loss
        loss = loss_func(y_pred, batch_dict["y_target"].float())
        loss_batch = loss.item()
        running_loss += (loss_batch - running_loss) / (batch_index + 1)
        
        # iv. use loss to produce gradients
        loss.backward()
        
        # v. use optimizer to take gradient step
        optimizer.step()
        
        # -----------------------------------
        # compute accuracy score
        acc_batch = compute_accuracy(y_pred, batch_dict["y_target"])
        running_acc += (acc_batch - running_acc) / (batch_index + 1)
        
    train_state["train_loss"].append(running_loss)
    train_state["train_acc"].append(running_acc)
    
    # iterate over validation dataset
    
    # set up batch generator, set loss and acc to
    # zero, and set eval mode on
    dataset.set_split("val")
    dataloader = DataLoader(dataset=dataset, batch_size=args["batch_size"])
    running_loss = 0.0
    running_acc = 0.0
    classifier.eval()
    
    with torch.no_grad():  # gradients aren't needed for validation
        for batch_index, batch_dict in enumerate(dataloader):
            # i. compute output
            y_pred = classifier.forward(x_in=batch_dict["x_data"].unsqueeze(dim=2))
            
            # ii. compute loss
            loss = loss_func(y_pred, batch_dict["y_target"].float())
            loss_batch = loss.item()
            running_loss += (loss_batch - running_loss) / (batch_index + 1)
            
            # iii. compute accuracy
            acc_batch = compute_accuracy(y_pred, batch_dict["y_target"])
            running_acc += (acc_batch - running_acc) / (batch_index + 1)
        
    train_state["val_loss"].append(running_loss)
    train_state["val_acc"].append(running_acc)

Output:


training loop diagnostics 
 --------------------------------- 

batch tensor dimensions:  torch.Size([128, 7882])
labels:  tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0])
classifier diagnostics 
 --------------------------------- 

classifier x_in size:  torch.Size([128, 7882, 1])
classifier weight size:  torch.Size([2, 128])

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-213-2327bb74133d> in <module>
     26 
     27         # ii. compute the output
---> 28         y_pred = classifier.forward(x_in=batch_dict["x_data"].unsqueeze(dim=2))
     29 
     30         # iii. compute the loss

<ipython-input-209-c8b508905fa0> in forward(self, x_in, apply_sigmoid)
     45         print("classifier weight size: ", self.fc.weight.size())
     46         features = self.network(x_in)
---> 47         prediction_vector = self.fc(features)
     48         if apply_sigmoid:
     49             prediction_vector = F.sigmoid(prediction_vector, dim=1)

... blah blah ...

RuntimeError: size mismatch, m1: [16384 x 1], m2: [128 x 2] at ../aten/src/TH/generic/THTensorMath.cpp:752 

Selected parameters/hyperparameters:

args = {
    ... blah blah ...
    # Model Hyperparameters
    "num_channels": 128,
    "kernel_size": 1,
    "stride": 1,
    # Training Hyperparameters
    "batch_size": 128,
    "early_stopping_criteria": 5,
    "learning_rate": 0.001,
    "num_epochs": 100,
    "drop_last": True
}

How can I fix the dimensions of my batch tensors so that they’re of the proper size?

Could you print the shape of features before passing it to self.fc?
Usually you would flatten the output of a conv layer; as it stands, features should have the shape [batch_size, channels, seq_length].
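
For example, with a temporary print in your forward:

features = self.network(x_in)
print("features size: ", features.size())  # expect [batch_size, channels, seq_length]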

The output I got for features.size() is:

torch.Size([128, 128, 1])

What do you mean by seq_length? Should I take the length of the longest review and use that as seq_length for all vector representations of my data points?

The tensor given to the network is of size [128, 7882, 1], which means the first Conv1d sees it as a batch of 7882-channel sequences with seq_length=1. It then passes through all the Conv1d modules keeping seq_length=1 (because of kernel_size=1 and stride=1), so it is never really treated as a sequence.
I find that odd, as you could achieve the same result using only nn.Linear() modules instead of convolutions if you removed the unsqueezed dimension (see the sketch below).
Please tell me if there is something I am missing.
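
A rough sketch of what I mean, reusing your hyperparameter names; this computes the same kind of function directly on the [128, 7882] batches, with no unsqueeze needed:

network = nn.Sequential(
    nn.Linear(initial_n_channels, network_n_channels),
    nn.ReLU(),
    nn.Linear(network_n_channels, network_n_channels),
    nn.ReLU(),
    nn.Linear(network_n_channels, network_n_channels),
    nn.ReLU(),
    nn.Linear(network_n_channels, network_n_channels),
    nn.ReLU()
)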

Anyway, after the convolutions you end up with a batch of 128 sequences with num_channels=128 and seq_length=1. So if you squeeze the features tensor before passing it to self.fc, that removes the unused seq_length dimension and you should be fine.
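
In code, that would be something like this inside your forward (a minimal sketch; shapes taken from your diagnostics):

features = self.network(x_in)            # [128, 128, 1]
features = features.squeeze(dim=2)       # [128, 128], drop the seq_length dim
prediction_vector = self.fc(features)    # [128, 2]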

I’m trying to use convolutions to pick up n-grams in the data and then use them as features for classification. I’ve since changed kernel_size to 2, now that I’ve implemented a 2D one-hot matrix for each observation rather than a single collapsed one-hot vector (which is why a size-2 kernel couldn’t work before, with seq_length=1). Noob here, so please bear with me.
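
To illustrate the change in representation, here is a toy sketch (the sizes match my setup, but the token ids are made up and my actual vectorizer is more involved):

import torch

vocab_size, seq_length = 3995, 275   # sizes from my setup
tokens = torch.tensor([5, 0, 17])    # hypothetical token ids for a 3-word review

# before: one collapsed one-hot vector per review -> [vocab_size]
collapsed = torch.zeros(vocab_size)
collapsed[tokens] = 1.0

# after: a one-hot matrix with one column per token position -> [vocab_size, seq_length]
matrix = torch.zeros(vocab_size, seq_length)
matrix[tokens, torch.arange(len(tokens))] = 1.0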

Refactored the vectorizer so that my classifier inputs are now of shape [batch_size, channels, seq_length], as ptrblck suggested. I’m currently using the following helper function to flatten the conv features before passing them to my fc layer:

def flatten(x):
    # collapse all non-batch dimensions into one: [N, C, L] -> [N, C * L]
    shape = torch.prod(torch.tensor(x.shape[1:])).item()
    return x.view(-1, shape)
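
For example, on a hypothetical conv output:

x = torch.randn(512, 128, 271)
print(flatten(x).shape)   # torch.Size([512, 34688])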

Still, I’m getting the following error when I run the forward pass on my model:

RuntimeError: size mismatch, m1: [512 x 34688], m2: [128 x 1] at ../aten/src/TH/generic/THTensorMath.cpp:752

512 is my batch_size, 3995 is my initial_n_features (the words appearing often enough in the text to be used as features), and 275 is my seq_length (the maximum length of any review passed into my model).

I’m also using 128 as the number of features used by my intermediate layers (so out_channels in each of my Conv1d’s), and 1 as the number of classes in my final prediction vector.

What am I doing wrong? I’m using the same Classifier as above, except that I flatten the output of the conv layers with the function above before passing it to the fc layer.

I’m still not sure I understand the use case completely.
Would you like to get a single output for each sequence you pass to the model?
If so, you would need to adapt the number of input features in your linear layer to match the flattened output of your last conv layer (out_channels * seq_length); see the sketch below.
Or would you like to get an output of shape [batch_size, seq_length]?
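
A rough sketch of the first option, assuming kernel_size=2, stride=1, seq_length=275, and the shapes implied by your error message (34688 = 128 * 271):

# each kernel_size=2, stride=1 conv shortens the sequence by 1:
# L_out = L_in - kernel_size + 1, so 275 -> 274 -> 273 -> 272 -> 271
final_seq_length = 275 - 4 * (2 - 1)   # = 271

# the linear layer must accept the flattened conv output
self.fc = nn.Linear(network_n_channels * final_seq_length,   # 128 * 271 = 34688
                    n_classes)

# then, in forward():
features = flatten(self.network(x_in))   # [batch_size, 34688]
prediction_vector = self.fc(features)    # [batch_size, n_classes]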