What is resolving 'RuntimeError: Size Mismatch in Tensor Dimension' in Signal Generation using Pre-trained RoBERTa?

I want to learn the obtained signal with pre-trained RoBERTa and generate the signal, but I get a RuntimeError: The size of tensor a (768) must match the size of tensor b (374125) at non-singleton dimension 1 error. What can I do to improve it?

import torch
from transformers import RobertaTokenizer, RobertaModel
from torch.nn import Linear
from torch.optim import Adam

# Load RoBERTa tokenizer and model
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = RobertaModel.from_pretrained('roberta-base')

# Add a linear layer on top of RoBERTa to generate signal
signal_size = 2993 * 125
model.classifier = Linear(model.config.hidden_size, 374125)
# Prepare your signal data as input sequences and target signals
input_sequences = [tokenizer.encode("signal {}".format(i), return_tensors='pt') for i in range(16)]
target_signals = [train_all_data.transpose(1,3,0,2).reshape(16, signal_size)[i] for i in range(16)]
target_signals = [torch.tensor(signal) for signal in target_signals]

# Define loss function and optimizer
criterion = torch.nn.MSELoss()
optimizer = Adam(model.parameters(), lr=0.001)

# Train the model on your signal data
for epoch in range(1000):
    for input_ids, target in zip(input_sequences, target_signals):
        outputs = model(input_ids)[0].mean(dim=1)
        loss = criterion(outputs, target)
# Use the fine-tuned model to generate a new signal
input_ids = torch.tensor(tokenizer.encode("Generate a new signal", return_tensors='pt')).unsqueeze(0)
outputs = model(input_ids)[0].mean(dim=1)
new_signal = outputs.detach().numpy()

It’s not clear what the issue is without more context on where the error is coming from. It is a shape mismatch and it looks like it is originating from some step past the classifier layer (perhaps that output is used somehow with the original hidden dim size before being returned), but it is difficult to say without more detailed information about what part of the model is producing the error.

This is done with loss = criterion(outputs, target).
What we want to do is first run the RoBERTa model with 1,000 training steps on 20 channels of EEG data for each class. After that, there is a paper that uses n RoBERTa to generate n classes of synthetic data.

Right, I mean that it’s difficult to debug this without a runnable example script (e.g., the definition of train_all_datais missing).