I am training a simple classifier on top of a pretrained BERT model. The example code is very simple:
```python
# ------ Define Model ------
class MyModel(nn.Module):
    def __init__(self, output_size):
        super().__init__()
        self.bert = BertModel.from_pretrained(
            "...",
            output_attentions=False,
            output_hidden_states=False,
        )
        self.linear = nn.Linear(self.bert.config.hidden_size, output_size)

    def forward(self, tokenized_input):
        return self.linear(self.bert(tokenized_input).last_hidden_state)

model = MyModel(output_size).to(device)

# ------ Training loop ------
for tokenized_inputs, tokenized_labels in data_loader:
    tokenized_inputs = tokenized_inputs.to(device)
    tokenized_labels = tokenized_labels.to(device)
    pred = model(tokenized_inputs)
    loss = loss_function(pred, tokenized_labels)
```
When I run the trainer, I specify the device with a script parameter. For example, if I pass gpu_id=1, then inside the script device = get_device("cuda:1") is used.
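For reference, here is a minimal sketch of the device-selection logic I described. `get_device` is my own helper (not a torch API), and the argument name `--gpu_id` is just how my script spells it:

```python
import argparse

import torch

def get_device(name: str) -> torch.device:
    # Fall back to CPU when CUDA is unavailable so the script still runs.
    return torch.device(name if torch.cuda.is_available() else "cpu")

parser = argparse.ArgumentParser()
parser.add_argument("--gpu_id", type=int, default=0)
args, _ = parser.parse_known_args()

# e.g. --gpu_id 1 selects "cuda:1"
device = get_device(f"cuda:{args.gpu_id}")
```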
But I found that not only is my cuda:1 card used: the cuda:0 card is also taken by another process. I iterated over all parameters of the MyModel instance, and all the weights are on cuda:1; all my data batches are on cuda:1 as well. So why is the other card being used?
And I found that no matter which card I specify with the script argument, there is always a second process taking cuda:0.
Can anybody explain why, and suggest how to make my training run as a single process that uses only the card I specified (other than setting CUDA_VISIBLE_DEVICES)?