Input, output and indices must be on the current device

Hello, I have the code below, which uses a fine-tuned BERT model to predict whether a sentence is positive or negative. I have two problems:

  1. The BertClassifier class is defined in another .py file, where I trained and fine-tuned the model. If I try to import just the class with from bert import BertClassifier and run the code, it starts training the model again, which is why I defined the class again here. Is there any way I can use the import without starting the training?
  2. If I use the code below, I receive the error Input, output and indices must be on the current device. It works fine on CPU, but it takes some time to return a result, which is why I would like to run it on the GPU. What can I modify to resolve this?
import torch
from transformers import BertTokenizer, BertModel
import torch.nn as nn

class BertClassifier(nn.Module):
    """BERT model for classification tasks."""
    def __init__(self, freeze_bert=False):
        super(BertClassifier, self).__init__()
        # Instantiate BERT model
        self.bert = BertModel.from_pretrained('bert-base-multilingual-uncased')
        self.lstm = nn.LSTM(768, 50, batch_first=True, bidirectional=True)
        self.linear = nn.Linear(50 * 2, 2)
        # Freeze the BERT model
        if freeze_bert:
            for param in self.bert.parameters():
                param.requires_grad = False

    def forward(self, input_ids, attention_mask):
        # Feed input to BERT
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        sequence_output = outputs[0]
        sequence_output, _ = self.lstm(sequence_output)
        linear_output = self.linear(sequence_output[:, -1])
        return linear_output

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = BertClassifier(freeze_bert=False)
tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-uncased', do_lower_case=True)
model.load_state_dict(torch.load('finetuned_model.pt'))
max_input_len = tokenizer.max_model_input_sizes['bert-base-multilingual-uncased']
init_token_id = tokenizer.cls_token_id  # [CLS] (start of sequence) token
eos_token_id = tokenizer.sep_token_id   # [SEP] (end of sentence) token
# function to make sentiment prediction during inference
def predict_sentiment(model, tokenizer, sentence):
    model.eval()
    tokens = tokenizer.tokenize(sentence)
    tokens = tokens[:max_input_len - 2]
    indexed = [init_token_id] + tokenizer.convert_tokens_to_ids(tokens) + [eos_token_id]
    tensor = torch.LongTensor(indexed).to(device)
    tensor = tensor.unsqueeze(0)
    padded_sequences = tokenizer([sentence], padding=True)
    attention_mask = padded_sequences["attention_mask"]
    attention_mask = torch.LongTensor(attention_mask)
    prediction = torch.sigmoid(model(tensor, attention_mask))
    if prediction[0][0] > prediction[0][1]:
        return "Negative"
    else:
        return "Positive"


sentiment = predict_sentiment(model, tokenizer, "It is such a wonderful weather outside.")
print(sentiment)
  1. I guess the other .py file doesn’t use a guard such as if __name__ == "__main__", but executes the training code directly at module level. If that’s the case, move the executable code (which starts the training) under that guard, as in the first sketch below this list.

  2. Make sure the model parameters as well as all input tensors (including the attention mask) are transferred to the GPU via .cuda() or .to(device). Based on the error message, at least some of them are still on the CPU. See the second sketch below this list.
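
For point 1, here is a minimal sketch of how the training file could be structured. The file name bert.py matches your import, but the train() helper and the overall layout are assumptions, since that file isn’t shown:

# bert.py (hypothetical layout of the training file)
import torch.nn as nn

class BertClassifier(nn.Module):
    # ... same class definition as in your question ...
    pass

def train():
    # ... training loop that fine-tunes the model and saves finetuned_model.pt ...
    pass

if __name__ == "__main__":
    # This block only runs when the file is executed directly (python bert.py),
    # not when another script does `from bert import BertClassifier`.
    train()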
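
For point 2, here is a minimal sketch of the device fix, assuming the rest of your inference code stays as posted: move the model to the device once after loading the weights, and move every input tensor onto the same device before the forward pass. The map_location argument and the torch.no_grad() context are extra additions on my part, not required for the fix itself:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = BertClassifier(freeze_bert=False)
model.load_state_dict(torch.load('finetuned_model.pt', map_location=device))
model = model.to(device)  # move all model parameters to the GPU (or CPU fallback)

def predict_sentiment(model, tokenizer, sentence):
    model.eval()
    tokens = tokenizer.tokenize(sentence)
    tokens = tokens[:max_input_len - 2]
    indexed = [init_token_id] + tokenizer.convert_tokens_to_ids(tokens) + [eos_token_id]
    tensor = torch.LongTensor(indexed).unsqueeze(0).to(device)  # input ids on the same device as the model
    padded_sequences = tokenizer([sentence], padding=True)
    attention_mask = torch.LongTensor(padded_sequences["attention_mask"]).to(device)  # mask on the same device
    with torch.no_grad():  # no gradients needed at inference time
        prediction = torch.sigmoid(model(tensor, attention_mask))
    return "Negative" if prediction[0][0] > prediction[0][1] else "Positive"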

Thank you so much!!!