Need help with implementing my own model based on a pretrained model

Hello,

I am trying to implement my own model based on RoBERTa. My Python code looks like this:

import torch.nn as nn
from transformers import RobertaModel

class My_RoBERTa(nn.Module):
    def __init__(self, some_config, some_version):
        super().__init__()
        self.roberta = RobertaModel.from_pretrained(some_version)

    def forward(self, input_ids, attn_masks, labels=None):
        print(1)
        print(input_ids.size())
        print('* *')
        print(attn_masks.size())
        print('* *')
        outputs = self.roberta(input_ids, attention_mask=attn_masks)
        print(3)
        print(4)
        print(5)
        return outputs

If I instantiate this model and pass in a batch of 3 examples, each of length 25, I expect the output below:

1
torch.Size([3,25])
* *
torch.Size([3,25])
* *
3
4
5

However, when I run it, I actually get something like the following:

1
torch.Size([3,25])
* *
torch.Size([3,25])
* *
1
torch.Size([3,25])
* *
torch.Size([3,25])
* *
3
3
4
4
5
5

I think this is some kind of duplication issue caused by the forward method — every print statement fires twice — but I do not know how to resolve this problem. Could anyone please help me out?

Thank you so much in advance!

Are you using a data parallel approach (e.g. via nn.DataParallel or DistributedDataParallel)?
If so, then this output is expected: the batch is scattered across the devices and the model's forward is executed once on each replica, so every print statement fires once per device (and the output from the replicas can interleave).
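To illustrate the mechanism without needing GPUs or torch at all, here is a plain-Python sketch (names like data_parallel and num_devices are made up for illustration, not the real torch API): the batch is split into chunks, a stand-in forward runs once per chunk, and the results are gathered — so any print inside forward appears once per simulated device.

```python
def forward(batch):
    """Stand-in for My_RoBERTa.forward: prints a marker, then returns."""
    print(1)
    print(f"chunk size: {len(batch)}")
    return batch

def data_parallel(batch, num_devices=2):
    """Toy scatter/replicate/gather, mimicking what nn.DataParallel does.

    The real implementation copies the module to each device; here we just
    call forward() once per chunk to show why its prints are duplicated.
    """
    chunk = len(batch) // num_devices  # even split; remainder ignored here
    chunks = [batch[i * chunk:(i + 1) * chunk] for i in range(num_devices)]
    outputs = [forward(c) for c in chunks]  # forward runs once per "device"
    return [x for out in outputs for x in out]  # gather the results

result = data_parallel(list(range(4)), num_devices=2)
# forward's print statements appear twice -- once per simulated device
```

With 2 simulated devices, `1` is printed twice, matching the doubled output you observed. With the real nn.DataParallel, restricting the model to a single device (or printing only from one rank under DistributedDataParallel) makes each print appear once.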