import os
import json
import numpy as np
import torch
import matplotlib.pyplot as plt
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from collections import Counter
import pickle
# Constants
OUTPUT_DIM = 126  # NUM_LANDMARKS * 3: each landmark point has 3 coordinates (x, y, z)
HIDDEN_DIM = 126
NUM_LAYERS = 4
BATCH_SIZE = 32
EPOCHS = 2
LEARNING_RATE = 0.001
MAX_LENGTH = 100  # Maximum sentence length
NUM_LANDMARKS = 42  # Number of landmark points
EOS_TOKEN = "<EOS>"
DEBUG = False
MODEL_NAME = "sign_language_model_CSLT_MMDA_edu_5082024"
ROOT_FOLDER = r"D:\Downloads\Constient\sign-motion-regeneration\data"
VOCAB_NAME = "vocab_5082024.pkl"
class LoadData:
    def __init__(self, NUM_LANDMARKS=42, DEBUG=False, NUM_SENTENCES=2):
        self.NUM_LANDMARKS = NUM_LANDMARKS
        self.DEBUG = DEBUG
        self.DEBUG_LIMIT = NUM_SENTENCES

    def load_landmarks_from_files(self, root_folder):
        landmarks = []
        labels = []
        for limit, class_name in enumerate(os.listdir(root_folder)):
            annotation_text = os.path.join(root_folder, class_name)
            print("=" * 42)
            print(annotation_text, limit)
            if self.DEBUG and limit >= self.DEBUG_LIMIT:
                break
            if os.path.isdir(annotation_text):
                for video_dir in os.listdir(annotation_text):
                    video_landmarks_dir = os.path.join(annotation_text, video_dir)
                    for file_name in os.listdir(video_landmarks_dir):
                        if file_name.endswith('.json'):
                            file_path = os.path.join(video_landmarks_dir, file_name)
                            with open(file_path, 'r') as f:
                                data_list = json.load(f)
                            landmarks_array = np.array([list(data.values()) for data in data_list])
                            if landmarks_array.shape[0] == self.NUM_LANDMARKS:
                                landmarks.append(landmarks_array)
                                labels.append(class_name)
        return np.array(landmarks), np.array(labels)

    def load_landmarks_from_files_as_tensor(self, root_folder):
        landmarks, labels = self.load_landmarks_from_files(root_folder)
        return torch.from_numpy(landmarks), labels
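For context, each per-frame JSON file holds one dict per landmark, with the dict values being the x/y/z coordinates. That is what the list-of-values parsing above assumes; a hypothetical example of one file:

# [
#   {"x": 0.51, "y": 0.32, "z": -0.04},
#   {"x": 0.49, "y": 0.35, "z": -0.02},
#   ... 42 entries in total, one per landmark ...
# ]
# np.array([list(data.values()) for data in data_list]) then has shape (42, 3).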
class Vocabulary:
    def __init__(self, freq_threshold):
        self.itos = {0: "<PAD>", 1: "<SOS>", 2: "<EOS>", 3: "<UNK>"}
        self.stoi = {"<PAD>": 0, "<SOS>": 1, "<EOS>": 2, "<UNK>": 3}
        self.freq_threshold = freq_threshold

    def __len__(self):
        return len(self.itos)

    @staticmethod
    def tokenize(text):
        return text.lower().split()

    def build_vocabulary(self, sentence_list):
        frequencies = Counter()
        idx = 4
        for sentence in sentence_list:
            for word in self.tokenize(sentence):
                frequencies[word] += 1
                if frequencies[word] == self.freq_threshold:
                    self.stoi[word] = idx
                    self.itos[idx] = word
                    idx += 1

    def numericalize(self, text):
        tokenized_text = self.tokenize(text)
        return [
            self.stoi[token] if token in self.stoi else self.stoi["<UNK>"]
            for token in tokenized_text
        ]
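To illustrate how the vocabulary behaves, a quick toy example (made-up sentences, not my real data). Only words that reach freq_threshold get an index; everything else maps to <UNK>:

vocab = Vocabulary(freq_threshold=2)
vocab.build_vocabulary(["ride the bicycle", "park the bicycle"])
print(vocab.stoi)
# {'<PAD>': 0, '<SOS>': 1, '<EOS>': 2, '<UNK>': 3, 'the': 4, 'bicycle': 5}
print(vocab.numericalize("ride the bicycle"))
# [3, 4, 5]  ("ride" only occurs once, so it falls back to <UNK>)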
# Custom dataset
class SignLanguageDataset(Dataset):
    def __init__(self, sentences, landmarks, vocab):
        self.landmarks = landmarks
        self.sentences = sentences
        self.vocab = vocab

    def __len__(self):
        return len(self.landmarks)

    def __getitem__(self, idx):
        landmark = torch.tensor(self.landmarks[idx], dtype=torch.float32)
        raw_sentence = self.sentences[idx]
        numericalized_sentence = [self.vocab.stoi["<SOS>"]]
        numericalized_sentence += self.vocab.numericalize(raw_sentence)
        numericalized_sentence.append(self.vocab.stoi["<EOS>"])
        return torch.tensor(numericalized_sentence, dtype=torch.long), landmark
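Note: the default DataLoader collate only works here because all my sentences numericalize to the same length (the log below shows src as [32, 3], i.e. <SOS> token <EOS>). For variable-length sentences, a padding collate would be needed; a minimal sketch, assuming <PAD> stays at index 0:

from torch.nn.utils.rnn import pad_sequence

def collate_fn(batch):
    sentences, landmarks = zip(*batch)
    # Pad each sentence to the longest one in the batch with <PAD> (index 0)
    sentences = pad_sequence(list(sentences), batch_first=True, padding_value=0)
    return sentences, torch.stack(landmarks)

# train_iterator = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, collate_fn=collate_fn)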
# Encoder class
class Encoder(nn.Module):
    def __init__(self, vocab_size, hidden_dim, num_layers):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, num_layers, batch_first=True)

    def forward(self, x):
        x = self.embedding(x)
        print(f"Encoder: {x.shape}")
        _, (hidden, cell) = self.lstm(x)
        return hidden, cell
# Decoder class
class Decoder(nn.Module):
    def __init__(self, hidden_dim, output_dim, num_layers):
        super().__init__()
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)  # output_dim is the number of landmarks * 3 (for x, y, z)

    def forward(self, x, hidden, cell):
        print(f"Decoder: {x.shape}")
        output, (hidden, cell) = self.lstm(x, (hidden, cell))
        prediction = self.fc(output)
        return prediction, hidden, cell
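For reference, PyTorch's nn.LSTM returns hidden/cell states of shape (num_layers, batch, hidden_dim), which is what lets the encoder state be handed straight to the decoder. A standalone shape check (vocab size 5, matching the Embedding(5, 126) in the printed model):

encoder = Encoder(vocab_size=5, hidden_dim=126, num_layers=4)
decoder = Decoder(hidden_dim=126, output_dim=126, num_layers=4)
src = torch.randint(0, 5, (32, 3))                # (batch, src_len)
hidden, cell = encoder(src)                       # each: (4, 32, 126) = (num_layers, batch, hidden_dim)
step = torch.zeros(32, 1, 126)                    # one decoder time step
pred, hidden, cell = decoder(step, hidden, cell)  # pred: (32, 1, 126)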
# Seq2Seq model
class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, src, trg):
        print(f"trg:{trg.shape}")
        batch_size = trg.shape[0]
        trg_len = trg.shape[1]
        trg_dim = self.decoder.fc.out_features
        # print(f"trg_dim:{trg_dim}")
        # print(f"trg_len:{trg_len}")
        outputs = torch.zeros(batch_size, trg_len, trg_dim).to(trg.device)
        hidden, cell = self.encoder(src)
        input = torch.zeros((batch_size, 1, trg_dim), device=trg.device)  # Initial input, usually the embedding of <SOS>
        print(input.size())
        for t in range(1, trg_len):
            output, hidden, cell = self.decoder(input, hidden, cell)
            outputs[:, t] = output.squeeze(1)
            # teacher_force = torch.rand(1).item() < 0.5
            # input = trg[:, t].unsqueeze(1) if teacher_force else output
        print(f"output shape:{outputs.shape}")
        return outputs
# Function to train the model
def train(model, iterator, optimizer, criterion, device):
    model.train()
    epoch_loss = 0
    for batch in iterator:
        src, trg = batch
        src, trg = src.to(device), trg.to(device)
        print(src.size())
        print(trg.size())
        optimizer.zero_grad()
        output = model(src, trg)  # trg[::-1] changed
        output = output.contiguous().view(-1, output.shape[-1])
        trg = trg[:, 1:].contiguous().view(-1)
        loss = criterion(output, trg)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    return epoch_loss / len(iterator)
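For reference, tracing the shapes through train() with the batch from the log (src [32, 3], trg [32, 42, 3]) shows exactly where the two numbers in the error come from:

# output = model(src, trg)            -> (32, 42, 126)
# output.view(-1, output.shape[-1])   -> (32 * 42, 126) = (1344, 126)
# trg[:, 1:]                          -> (32, 41, 3)
# trg[:, 1:].contiguous().view(-1)    -> (32 * 41 * 3,)  = (3936,)
# criterion then sees input batch 1344 vs. target batch 3936.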
# Function to save the model
def save_model(model, path):
    torch.save(model.state_dict(), path)

# Function to load the model
def load_model(model, path):
    model.load_state_dict(torch.load(path))
    return model
# Infer function
def infer(model, text, vocab, device):
    model.eval()
    with torch.no_grad():
        # Tokenize and numericalize the input text
        tokens = [vocab.stoi["<SOS>"]] + vocab.numericalize(text) + [vocab.stoi["<EOS>"]]
        src = torch.tensor(tokens, dtype=torch.long).unsqueeze(0).to(device)
        # Pass the tokenized text through the encoder
        hidden, cell = model.encoder(src)
        # Initialize the input for the decoder with a zero tensor
        # (must match the decoder LSTM input size: NUM_LANDMARKS * 3 = HIDDEN_DIM)
        input = torch.zeros((1, 1, NUM_LANDMARKS * 3), device=device)
        outputs = []
        # Generate landmarks
        for _ in range(MAX_LENGTH):
            output, hidden, cell = model.decoder(input, hidden, cell)
            output_np = output.squeeze(0).cpu().numpy()
            outputs.append(output_np)
            # Check if all landmark points are zeros
            if np.all(output_np == 0):
                break
            input = output
        # Convert the list of outputs to a numpy array
        outputs = np.array(outputs)
    return outputs
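For completeness, this is roughly how I'd call infer once training works (a sketch; "bicycle" is just one of the class folders visible in the log, and the file names are the ones the training script below saves):

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
with open(os.path.join('output', VOCAB_NAME), 'rb') as f:
    vocab = pickle.load(f)
encoder = Encoder(len(vocab), HIDDEN_DIM, NUM_LAYERS)
decoder = Decoder(HIDDEN_DIM, OUTPUT_DIM, NUM_LAYERS)
model = load_model(Seq2Seq(encoder, decoder), os.path.join('output', f'{MODEL_NAME}.pt')).to(device)
frames = infer(model, "bicycle", vocab, device)  # array of predicted landmark frames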
# Main execution
if __name__ == "__main__":
    load_data = LoadData(NUM_LANDMARKS=42, DEBUG=DEBUG, NUM_SENTENCES=2)
    train_landmarks, train_sentences = load_data.load_landmarks_from_files(ROOT_FOLDER)

    # Create vocabulary
    vocab = Vocabulary(freq_threshold=2)
    vocab.build_vocabulary(train_sentences)
    os.makedirs('output', exist_ok=True)  # make sure the output directory exists
    with open(os.path.join('output', VOCAB_NAME), 'wb') as f:
        pickle.dump(vocab, f)
    INPUT_DIM = len(vocab)

    # Create dataset and dataloader
    train_dataset = SignLanguageDataset(train_sentences, train_landmarks, vocab)
    train_iterator = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)

    # Initialize model
    encoder = Encoder(INPUT_DIM, HIDDEN_DIM, NUM_LAYERS)
    decoder = Decoder(HIDDEN_DIM, OUTPUT_DIM, NUM_LAYERS)
    model = Seq2Seq(encoder, decoder)
    print(model)

    # Define optimizer and loss
    optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)
    criterion = nn.CrossEntropyLoss(ignore_index=vocab.stoi["<PAD>"])

    # Set device
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print("[INFO] Available device: ", device)
    model = model.to(device)
    loss_history = []

    # Training loop
    for epoch in range(EPOCHS):
        train_loss = train(model, train_iterator, optimizer, criterion, device)
        loss_history.append(train_loss)
        print(f"Epoch: {epoch:02}, Train Loss: {train_loss:.3f}")
        if epoch % 10 == 0:
            print("[INFO] Saving model for epoch ", epoch)
            save_model(model, os.path.join('output', f'{MODEL_NAME}_{epoch}.pt'))
    print("Loss History: ", loss_history)

    # Save the model
    save_model(model, os.path.join('output', f'{MODEL_NAME}.pt'))
==========================================
D:\Downloads\Constient\sign-motion-regeneration\data\Bicycle 0
Seq2Seq(
(encoder): Encoder(
(embedding): Embedding(5, 126)
(lstm): LSTM(126, 126, num_layers=4, batch_first=True)
)
(decoder): Decoder(
(lstm): LSTM(126, 126, num_layers=4, batch_first=True)
(fc): Linear(in_features=126, out_features=126, bias=True)
)
)
[INFO] Available device: cpu
torch.Size([32, 3])
torch.Size([32, 42, 3])
trg:torch.Size([32, 42, 3])
Encoder: torch.Size([32, 3, 126])
torch.Size([32, 1, 126])
Decoder: torch.Size([32, 1, 126])
... (the same "Decoder: torch.Size([32, 1, 126])" line repeats 41 times, once per decoder step) ...
output shape:torch.Size([32, 42, 126])
Traceback (most recent call last):
  File "D:\Downloads\Constient\sign-motion-regeneration\test.py", line 280, in <module>
    train_loss = train(model, train_iterator, optimizer, criterion, device)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Downloads\Constient\sign-motion-regeneration\test.py", line 190, in train
    loss = criterion(output, trg)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Downloads\Constient\sign-motion-regeneration\venv\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Downloads\Constient\sign-motion-regeneration\venv\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Downloads\Constient\sign-motion-regeneration\venv\Lib\site-packages\torch\nn\modules\loss.py", line 1188, in forward
    return F.cross_entropy(input, target, weight=self.weight,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Downloads\Constient\sign-motion-regeneration\venv\Lib\site-packages\torch\nn\functional.py", line 3104, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: Expected input batch_size (1344) to match target batch_size (3936).
Can anyone please help me figure out what is going wrong here?