How can I change the number of self-attention layers and the number of multi-head attention heads in my model with PyTorch?

I am working on a sarcasm dataset and my model looks like below:

I first tokenize my input text:


PRETRAINED_MODEL_NAME = "roberta-base"
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)
import torch
from torch.utils.data import Dataset, DataLoader

# tensors in the dataset class below are moved to this device
device = "cuda" if torch.cuda.is_available() else "cpu"

MAX_LEN = 100
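
For reference, this is roughly what the tokenizer returns for a single string (the sample sentence is just an illustration, not from the dataset):

# every example is padded/truncated to MAX_LEN tokens
sample = tokenizer("This is just a test sentence.", max_length=MAX_LEN,
                   return_tensors="pt", padding="max_length", truncation=True)
print(sample["input_ids"].shape)       # torch.Size([1, 100])
print(sample["attention_mask"].shape)  # torch.Size([1, 100])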

Then I defined a class for my dataset:

class SentimentDataset(Dataset):
    def __init__(self,dataframe):
        self.dataframe = dataframe

    def __len__(self):
        return len(self.dataframe)
    
    def __getitem__(self, idx):
        df = self.dataframe.iloc[idx]

        text = [df["comment"]]
        label = [df["label"]]

        data_t = tokenizer(text,max_length = MAX_LEN, return_tensors="pt", padding="max_length", truncation=True)
        label_t = torch.LongTensor(label)

        return {
             "input_ids":data_t["input_ids"].squeeze().to(device),
             "label": label_t.squeeze().to(device),
        }
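
A quick sanity check on a toy DataFrame (the column names "comment" and "label" match the class above; the two example rows are made up):

import pandas as pd

toy_df = pd.DataFrame({
    "comment": ["oh great, another monday", "I genuinely enjoyed the film"],
    "label": [1, 0],
})
toy_dataset = SentimentDataset(toy_df)
item = toy_dataset[0]
print(item["input_ids"].shape)  # torch.Size([100])
print(item["label"])            # tensor(1) (on `device`)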

Then I create an object from my class for the training set and set the other parameters:

train_dataset = SentimentDataset(train_df)
BATCH_SIZE = 32
train_dataloader = DataLoader(train_dataset, batch_size = BATCH_SIZE)
from transformers import AutoModelForSequenceClassification, AutoConfig

# For loading the model structure and pretrained weights:
model = AutoModelForSequenceClassification.from_pretrained(PRETRAINED_MODEL_NAME).to(device)
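
# For reference: the number of encoder layers and attention heads is fixed
# by the checkpoint's config; roberta-base ships with 12 layers, 12 heads
# and a hidden size of 768.
print(model.config.num_hidden_layers)    # 12
print(model.config.num_attention_heads)  # 12
print(model.config.hidden_size)          # 768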

import transformers


optimizer = torch.optim.Adam(model.parameters(), lr=2e-5, weight_decay=1e-5)

Then I use the dataloader to train my model:

import numpy as np
from sklearn.metrics import classification_report

train_dataloader = DataLoader(train_dataset, batch_size = BATCH_SIZE)
EPOCHS = 5
for epoch in range(EPOCHS):
    print("\n******************\n epoch=",epoch)
    i = 0
    logits_list = []
    labels_list = []
    for batch in train_dataloader:
        i += 1
        optimizer.zero_grad()
        output_model = model(input_ids = batch["input_ids"], labels = batch["label"])
        loss = output_model.loss
        logits = output_model.logits
        logits_list.append(logits.cpu().detach().numpy())
        labels_list.append(batch["label"].cpu().detach().numpy())
        loss.backward()
        optimizer.step()
    #scheduler.step()
        if i % 50 == 0:
            print("training loss:",loss.item())
            #print("validation loss:",loss.item())
    logits_list = np.concatenate(logits_list, axis=0)
    labels_list = np.concatenate(labels_list, axis=0)
    logits_list = np.argmax(logits_list, axis =1)
    print(classification_report(labels_list, logits_list))

My question is: how can I change the number of self-attention layers and the number of heads in multi-head attention in my model?

I do not think you can change the number of self-attention layers in the network from the transformers library and keep the pre-trained model. Think about it: even if you manage to change the number of self-attention layers, the data that goes into the subsequent layers will no longer match what those layers were trained on, which will cause problems with the dimensions.
Moreover, the size of the pre-trained weights of "roberta-base" is fixed, so changing any of these parameters (the number of heads or the number of self-attention layers) will make it impossible to load the pre-trained weights onto the modified "roberta-base" architecture.
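
To make that trade-off concrete, here is a minimal sketch (the values 6 and 8 are arbitrary; the hidden size of 768 just has to stay divisible by the head count). AutoConfig lets you describe a RoBERTa-style architecture with a different depth and head count, but building the model with from_config initializes it randomly, so none of the roberta-base pre-training is kept and you would be training from scratch:

from transformers import AutoConfig, AutoModelForSequenceClassification

# a smaller RoBERTa-style config: 6 encoder layers, 8 attention heads
custom_config = AutoConfig.from_pretrained(
    "roberta-base",
    num_hidden_layers=6,
    num_attention_heads=8,
    num_labels=2,
)

# from_config builds the architecture only; the weights are random,
# so the roberta-base pre-trained weights are not loaded
custom_model = AutoModelForSequenceClassification.from_config(custom_config)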

You can also take a look at Transformer Multihead Attention.