I am training a simple classifier on top of pretrained BERT; the example code is very simple:
# ------ Define Model
class MyModel(nn.Module):
    def __init__(self, output_size):
        super().__init__()
        self.bert = BertModel.from_pretrained("...", output_attentions=False, output_hidden_states=False)
        self.linear = nn.Linear(self.bert.config.hidden_size, output_size)

    def forward(self, tokenized_input):
        return self.linear(self.bert(tokenized_input).last_hidden_state)

model = MyModel(output_size).to(device)

# ------ Training loop
for tokenized_inputs, tokenized_labels in data_loader:
    tokenized_inputs = tokenized_inputs.to(device)
    tokenized_labels = tokenized_labels.to(device)
    pred = model(tokenized_inputs)
    loss = loss_function(pred, tokenized_labels)
When I run the trainer, I specify the device via a script parameter: for example, if I pass gpu_id=1, then inside the script device = get_device("cuda:1") is used.
But I found that not only is my cuda:1 card used, cuda:0 is also occupied by another process. I iterated over all parameters of the MyModel instance: every weight is on cuda:1, and all my data batches are on cuda:1 as well. So why is the other card being used?
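For context, here is a minimal sketch of how I assume the script argument maps to a device (get_device in my script is just a thin wrapper; the argument name --gpu_id here is illustrative):

```python
import argparse
import torch

parser = argparse.ArgumentParser()
parser.add_argument("--gpu_id", type=int, default=0)

# Parse an example argv so the sketch is self-contained;
# in the real script this would be parser.parse_args().
args = parser.parse_args(["--gpu_id", "1"])

# Build the device the rest of the script uses.
device = torch.device(f"cuda:{args.gpu_id}")
```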
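This is roughly how I audited the devices (shown here on a toy CPU module so it runs anywhere; with the real model, device would be cuda:1 and the check covers buffers as well as parameters):

```python
import torch
import torch.nn as nn

# Stand-in for MyModel; in my run this is the BERT classifier on cuda:1.
model = nn.Linear(4, 2)
device = torch.device("cpu")

# Collect the device of every parameter and registered buffer.
param_devices = {p.device for p in model.parameters()}
buffer_devices = {b.device for b in model.buffers()}

# Everything should live on the single device I specified.
assert param_devices <= {device}
assert buffer_devices <= {device}
```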
And no matter which card I specify with the script argument, there is always a second process occupying cuda:0.
Can anybody explain why, and suggest how to make my training use only one process, on only the card I specified (other than setting CUDA_VISIBLE_DEVICES)?
...