How to initialize a label embedding with a pretrained model

I want to initialize the label embedding with a pretrained embedding.
My original network looks like this:

class Network(RobertaPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.roberta = RobertaModel(config, add_pooling_layer=False)
        self.label_emb = nn.Embedding(config.num_labels, config.hidden_size)

Now I want the label embedding to come from the label descriptions, like this:

self.label_ids = ...  # some ID tensor
self.label_attention_mask = ...  # some mask tensor
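
To make the goal concrete, here is a minimal sketch of the initialization I have in mind. It is not my actual code: `TinyEncoder` is a hypothetical stand-in for `RobertaModel`, `init_label_embedding` is a made-up helper name, and the sizes and the mean pooling over the attention mask are assumptions chosen only for illustration.

```python
import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for RobertaModel: token IDs -> hidden states."""

    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, hidden_size)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.emb(input_ids)  # (num_labels, seq_len, hidden_size)


def init_label_embedding(label_emb, encoder, label_ids, label_attention_mask):
    """Copy mean-pooled description encodings into label_emb.weight."""
    with torch.no_grad():  # one-off initialization, nothing recorded for autograd
        hidden = encoder(label_ids)
        mask = label_attention_mask.unsqueeze(-1).to(hidden.dtype)
        # Average only over real tokens; padded positions have mask 0.
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
        label_emb.weight.copy_(pooled)


torch.manual_seed(0)
num_labels, seq_len, hidden_size = 3, 4, 8  # illustrative sizes only
encoder = TinyEncoder(vocab_size=50, hidden_size=hidden_size)
label_emb = nn.Embedding(num_labels, hidden_size)
label_ids = torch.randint(0, 50, (num_labels, seq_len))
label_attention_mask = torch.tensor([[1, 1, 1, 0], [1, 1, 0, 0], [1, 1, 1, 1]])
init_label_embedding(label_emb, encoder, label_ids, label_attention_mask)
```

The copy happens once under `torch.no_grad()`, so `label_emb.weight` stays an ordinary trainable parameter afterwards.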

When I set self.label_ids and self.label_attention_mask as above and move the model to the GPU with

model, optimizer, eval_dataloader = accelerator.prepare(model, optimizer, eval_dataloader)

I get this error:

Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

I then found that self.label_emb was still on the CPU.
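
One thing that may be relevant here: as far as I understand, `nn.Module.to()` only moves parameters and registered buffers, so a tensor stored as a plain attribute stays where it was created. The sketch below shows that difference with `register_buffer`. `LabelHolder` and `plain_ids` are hypothetical names and the tensor values are dummies; I am assuming this is the cause in my case.

```python
import torch
import torch.nn as nn


class LabelHolder(nn.Module):
    """Hypothetical module comparing a plain attribute with registered buffers."""

    def __init__(self, label_ids: torch.Tensor, label_attention_mask: torch.Tensor):
        super().__init__()
        # Plain tensor attribute: not tracked by the module, ignored by .to().
        self.plain_ids = label_ids.clone()
        # Buffers: moved by .to() and saved in state_dict, but not trained.
        self.register_buffer("label_ids", label_ids)
        self.register_buffer("label_attention_mask", label_attention_mask)


holder = LabelHolder(torch.tensor([[5, 6, 0]]), torch.tensor([[1, 1, 0]]))
buffer_names = [name for name, _ in holder.named_buffers()]

# Falls back to the CPU when no GPU is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
holder.to(device)
```

After `holder.to(device)`, the two buffers live on `device`, while `plain_ids` is still on the CPU.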

After adding code to move it to the GPU, the forward pass (loss = model.forward()) works, but the gradient update (optimizer.step()) fails with this error:

One of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [768]] is at version 2; expected version 1 instead

Can anyone help me solve this problem, or show me how to initialize a label embedding with a pretrained model?