The problem is that I want to initialize the label embedding with a pretrained embedding.
My original network looks like this:

```python
class Network(RobertaPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.roberta = RobertaModel(config, add_pooling_layer=False)
        self.label_emb = nn.Embedding(config.num_labels, config.hidden_size)
```
Now I want the label embedding to come from a label description instead:

```python
self.label_ids = ...             # some ID tensor
self.label_attention_mask = ...  # some mask tensor
self.label_emb = self.roberta(self.label_ids, attention_mask=self.label_attention_mask)
```
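To make the setup concrete, here is a runnable toy version of what I'm trying (`ToyEncoder` and all sizes are made up; it stands in for `RobertaModel` so the snippet runs on its own). I also register the ID/mask tensors as buffers, which I understand should let them move together with the model:

```python
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    """Stand-in for RobertaModel: just maps token ids to hidden states."""
    def __init__(self, vocab_size=100, hidden_size=8):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, hidden_size)

    def forward(self, input_ids, attention_mask=None):
        return self.emb(input_ids)  # (batch, seq_len, hidden_size)

class Network(nn.Module):
    def __init__(self, num_labels=3, seq_len=4):
        super().__init__()
        self.encoder = ToyEncoder()
        # register_buffer (instead of a plain attribute) makes these tensors
        # move with the model under model.to(...) / accelerator.prepare
        self.register_buffer("label_ids",
                             torch.randint(0, 100, (num_labels, seq_len)))
        self.register_buffer("label_attention_mask",
                             torch.ones(num_labels, seq_len, dtype=torch.long))

    def forward(self):
        # recompute label embeddings inside forward instead of caching them
        hidden = self.encoder(self.label_ids,
                              attention_mask=self.label_attention_mask)
        return hidden.mean(dim=1)  # (num_labels, hidden_size)

model = Network()
print(model().shape)  # torch.Size([3, 8])
```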
I tried the above, but when moving the model to the GPU with

```python
model, optimizer, eval_dataloader = accelerator.prepare(model, optimizer, eval_dataloader)
```

I get this error:

```
Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
```
I then found that `self.label_emb` was still on the CPU.
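As far as I can tell, this is because `nn.Module.to()` only converts registered parameters and buffers, not plain tensor attributes. A CPU-only sketch of the same mechanism, using a dtype conversion in place of a device move:

```python
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(2, 2)   # registered parameter
        self.plain = torch.zeros(2)     # plain tensor attribute

m = M().double()                        # .double() stands in for .to("cuda")
print(m.linear.weight.dtype)  # torch.float64 -- parameters are converted
print(m.plain.dtype)          # torch.float32 -- plain attributes are skipped
```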
After adding code to move it to the GPU, the forward pass works (`loss = model.forward()`), but the gradient update (`optimizer.step()`) fails with:

```
One of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor is at version 2; expected version 1 instead
```
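For context, the same error can be reproduced with a tiny standalone snippet (unrelated to my model; it just shows the mechanism of autograd's version counter):

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x.exp()   # exp() saves its output tensor for the backward pass
y.add_(1.0)   # in-place op bumps y's version counter

try:
    y.sum().backward()
    failed = False
except RuntimeError as err:
    failed = True
    print(err)  # "...has been modified by an inplace operation..."
```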
Can anyone help me solve this problem, or point me to the right way to initialize a label embedding with a pretrained model?