ValueError: Target size (torch.Size([60])) must be the same as input size (torch.Size([66]))

Hey guys, I’m a newbie here, and I’m trying to use some Frankensteined PyTorch code to build a CNN sentiment classifier for a final project. I’ve been hitting errors and fixing them one by one, but I really can’t figure this one out, and all the explanations I’ve found involve really complicated math. Any help is greatly appreciated.

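import torch
from torchtext import data  # legacy API; lives in torchtext.legacy.data from torchtext 0.9 to 0.11

BATCH_SIZE = 64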

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits(
    (train_data, valid_data, test_data), 
    batch_size = BATCH_SIZE,
    device = device)
    
import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self, vocab_size, embedding_dim, n_filters, filter_sizes, output_dim, 
                 dropout, pad_idx):
        super().__init__()
        
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx = pad_idx)
        
        self.convs = nn.ModuleList([
                                    nn.Conv2d(in_channels = 1, 
                                              out_channels = n_filters, 
                                              kernel_size = (fs, embedding_dim)) 
                                    for fs in filter_sizes
                                    ])
        
        self.fc = nn.Linear(len(filter_sizes) * n_filters, output_dim)
        
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, text):
        
        #text = [sent len, batch size]
        
        text = text.permute(1, 0)
                
        #text = [batch size, sent len]
        
        embedded = self.embedding(text)
                
        #embedded = [batch size, sent len, emb dim]
        
        embedded = embedded.unsqueeze(1)
        
        #embedded = [batch size, 1, sent len, emb dim]
        
        conved = [F.relu(conv(embedded)).squeeze(3) for conv in self.convs]
            
        #conv_n = [batch size, n_filters, sent len - filter_sizes[n] + 1]
        
        pooled = [F.max_pool1d(conv, conv.shape[2]).squeeze(2) for conv in conved]
        
        #pooled_n = [batch size, n_filters]
        
        cat = self.dropout(torch.cat(pooled, dim = 1))

        #cat = [batch size, n_filters * len(filter_sizes)]
            
        return self.fc(cat)
        
INPUT_DIM = len(TWEET.vocab)
EMBEDDING_DIM = 20
N_FILTERS = 100
FILTER_SIZES = [3,4,5]
OUTPUT_DIM = 1
DROPOUT = 0.5
PAD_IDX = TWEET.vocab.stoi[TWEET.pad_token]

model = CNN(INPUT_DIM, EMBEDDING_DIM, N_FILTERS, FILTER_SIZES, OUTPUT_DIM, DROPOUT, PAD_IDX)

def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f'The model has {count_parameters(model):,} trainable parameters')

embeddings = TWEET.vocab.vectors

model.embedding.weight.data.copy_(embeddings)

UNK_IDX = TWEET.vocab.stoi[TWEET.unk_token]

model.embedding.weight.data[UNK_IDX] = torch.zeros(EMBEDDING_DIM)
model.embedding.weight.data[PAD_IDX] = torch.zeros(EMBEDDING_DIM)

import torch.optim as optim

optimizer = optim.Adam(model.parameters())

criterion = nn.BCEWithLogitsLoss()

model = model.to(device)
criterion = criterion.to(device)

def binary_accuracy(preds, y):
    rounded_preds = torch.round(torch.sigmoid(preds))
    correct = (rounded_preds == y).float()
    acc = correct.sum() / len(correct)
    return acc
    
def train(model, iterator, optimizer, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.train()
    
    for batch in iterator:
        
        optimizer.zero_grad()
                
        predictions = model(batch.tweet).squeeze(1)
        
        loss = criterion(predictions, batch.label)
        
        acc = binary_accuracy(predictions, batch.label)
        
        loss.backward()
            
        optimizer.step()
        
        epoch_loss += loss.item()
        epoch_acc += acc.item()
                
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

def evaluate(model, iterator, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.eval()
    
    with torch.no_grad():
    
        for batch in iterator:
            
            predictions = model(batch.tweet).squeeze(1)
            
            loss = criterion(predictions, batch.label)
            
            acc = binary_accuracy(predictions, batch.label)

            epoch_loss += loss.item()
            epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)
    
import time

def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs
    
N_EPOCHS = 10
FREEZE_FOR = 5

best_valid_loss = float('inf')

#freeze embeddings
model.embedding.weight.requires_grad = unfrozen = False

for epoch in range(N_EPOCHS):

    start_time = time.time()
    
    train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
    valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)
    
    end_time = time.time()

    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    
    print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s | Frozen? {not unfrozen}')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
    print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')
    
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'tutC-model.pt')
    
    if (epoch + 1) >= FREEZE_FOR:
        #unfreeze embeddings
        model.embedding.weight.requires_grad = unfrozen = True

ValueError                                Traceback (most recent call last)
<ipython-input-30-10509ec63b58> in <module>()
     11     start_time = time.time()
     12 
---> 13     train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
     14     valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)
     15 

3 frames
<ipython-input-27-80e3304debd7> in train(model, iterator, optimizer, criterion)
     12         predictions = model(batch.tweet).squeeze(1)
     13 
---> 14         loss = criterion(predictions, batch.label)
     15 
     16         acc = binary_accuracy(predictions, batch.label)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py in forward(self, input, target)
    615                                                   self.weight,
    616                                                   pos_weight=self.pos_weight,
--> 617                                                   reduction=self.reduction)
    618 
    619 

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in binary_cross_entropy_with_logits(input, target, weight, size_average, reduce, reduction, pos_weight)
   2431 
   2432     if not (target.size() == input.size()):
-> 2433         raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
   2434 
   2435     return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)

ValueError: Target size (torch.Size([60])) must be the same as input size (torch.Size([66]))

I would recommend checking the shapes of predictions (and of all activations used to compute predictions) as well as batch.label.
Since both should have the batch size in dim0, I assume you might be seeing this error in the last batch, which might be smaller.
This could explain the target batch size of 60, but wouldn’t explain the 66 for the predictions.
You could nevertheless use drop_last=True to remove the last, smaller batch.
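
To illustrate the "smaller last batch" idea above, here is a minimal pure-Python sketch, independent of any library. The function name batch_sizes and the dataset size of 700 are made up for illustration; 700 was chosen only because it leaves a remainder of 60 with a batch size of 64.

```python
def batch_sizes(n_examples, batch_size, drop_last=False):
    """Sizes of consecutive batches when a dataset is split in order."""
    full, remainder = divmod(n_examples, batch_size)
    sizes = [batch_size] * full
    # The leftover examples form one final, smaller batch unless it is dropped.
    if remainder and not drop_last:
        sizes.append(remainder)
    return sizes


# 700 is a hypothetical dataset size: 700 = 10 * 64 + 60.
print(batch_sizes(700, 64)[-1])                  # 60
print(batch_sizes(700, 64, drop_last=True)[-1])  # 64
```

This only accounts for a target of size 60. It says nothing about where the 66 on the prediction side comes from.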

I added drop_last = True as you recommended, but I don’t get why the size of 66 once again finds its way into the error.

Do you see this issue in the first iteration or later during the training?
Could you add these print statements and check the shapes?

print(batch.tweet.shape)
print(predictions.shape)
print(batch.label.shape)
acc = binary_accuracy(predictions, batch.label)

In the first place, you shouldn’t have blindly copied the code from a well-known repo. Secondly, have you tried building your own mini model first rather than starting with CNNs on text? The error is very clear, and what ptrblck suggested should help with it.

Sorry, sir. I am indeed working from someone else’s repo, and I asked permission beforehand to try it.

Try removing that permute and re-run?
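
One possible reason the permute matters, which nothing in this thread confirms: if the batches already arrive as [batch size, sent len] (for example because the tweet field was built with batch_first=True), the permute swaps the two dimensions and the model treats the sentence length as the batch size. The sketch below only does shape bookkeeping for the forward method in the question, with no tensors involved. The helper forward_shapes and the example shapes are hypothetical.

```python
def forward_shapes(text_shape, filter_sizes=(3, 4, 5), n_filters=100, permute=True):
    """Trace the shapes produced by CNN.forward from the question (bookkeeping only)."""
    shapes = {"text": tuple(text_shape)}
    if permute:
        text_shape = (text_shape[1], text_shape[0])  # text.permute(1, 0)
    # From here on, forward() treats dim 0 as the batch size.
    batch, sent_len = text_shape
    shapes["after_permute"] = (batch, sent_len)
    # Conv2d with kernel (fs, emb_dim) and no padding shortens the sequence to sent_len - fs + 1.
    shapes["conved"] = [(batch, n_filters, sent_len - fs + 1) for fs in filter_sizes]
    shapes["cat"] = (batch, n_filters * len(filter_sizes))
    shapes["predictions"] = (batch,)  # fc gives [batch, 1], then .squeeze(1)
    return shapes


# Hypothetical batch: 60 tweets, the longest one 66 tokens.
seq_first = forward_shapes((66, 60))    # [sent len, batch size], what forward() expects
batch_first = forward_shapes((60, 66))  # [batch size, sent len], an assumed layout
print(seq_first["predictions"])    # (60,)
print(batch_first["predictions"])  # (66,)
```

Under that assumption the labels would have 60 entries and the predictions 66, which matches the sizes in the error message. Printing batch.tweet.shape, as suggested earlier, would show which layout the iterator actually produces.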

Add print statements in forward to see for yourself where it breaks.

If I delete this line of code:

text = text.permute(1, 0)

I get a different error:

TypeError: '<' not supported between instances of 'Example' and 'Example'

Is the reason that, if sort_key is not specified, the iterator defers to the underlying dataset? The tutorial uses the IMDB dataset, which defines sort_key as x.text. If so, how do I specify it manually, sir?
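
The TypeError above can be reproduced in plain Python, which may help show what a sort key does. In the sketch below, the Example class is a hypothetical stand-in for a torchtext example with a tweet attribute (the name is taken from batch.tweet in the posted code). It is not the real torchtext class.

```python
class Example:
    """Hypothetical stand-in for a dataset example holding a tokenized tweet."""

    def __init__(self, tweet):
        self.tweet = tweet


examples = [Example(["good", "movie"]), Example(["bad"]), Example(["so", "so", "film"])]

message = None
try:
    sorted(examples)  # no key given: Python has to compare Example objects with '<'
except TypeError as err:
    message = str(err)
print(message)  # '<' not supported between instances of 'Example' and 'Example'

# With an explicit key, only the integer lengths are compared.
by_length = sorted(examples, key=lambda ex: len(ex.tweet))
print([len(ex.tweet) for ex in by_length])  # [1, 2, 3]
```

In the legacy torchtext API, BucketIterator.splits accepts a sort_key keyword argument, so adding sort_key=lambda ex: len(ex.tweet) to the call in the question is probably the equivalent fix. This is untested against the dataset used here.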