Why is cuda:0 always used in my case?

I am training a simple classifier on top of pretrained BERT. The example code is very simple:

# ------ Define Model
class MyModel(nn.Module):
    def __init__(self, output_size):
        super().__init__()
        self.bert = BertModel.from_pretrained("...", output_attentions=False, output_hidden_states=False)
        self.linear = nn.Linear(self.bert.config.hidden_size, output_size)

    def forward(self, tokenized_input):
        return self.linear(self.bert(tokenized_input).last_hidden_state)

model = MyModel(output_size).to(device)
#------
for tokenized_inputs, tokenized_labels in data_loader:
    tokenized_inputs = tokenized_inputs.to(device)
    tokenized_labels = tokenized_labels.to(device)
    pred = model(tokenized_inputs)
    loss = loss_function(pred, tokenized_labels)

When I run the trainer, I specify the device via a script parameter. For example, if I pass gpu_id=1, then inside the script device = get_device("cuda:1") is used.
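For context, this is roughly the device-selection setup I have in mind (a minimal sketch; `get_device` is my own helper, approximated here with `torch.device`, and the argument names are illustrative):

```python
import argparse

import torch

# Hypothetical argument parsing; the real script passes gpu_id the same way.
parser = argparse.ArgumentParser()
parser.add_argument("--gpu_id", type=int, default=1)
args = parser.parse_args([])  # on the command line: --gpu_id 1

# Stand-in for get_device("cuda:1") from the post.
device = torch.device(f"cuda:{args.gpu_id}")
print(device)  # cuda:1
```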

But I found that not only is my cuda:1 card used: the cuda:0 card is also taken by another process. I have iterated over all the parameters of my MyModel instance, and all the weights are on cuda:1; all my data batches are on cuda:1 as well. So why is another card being used?
No matter which card I specify with the script argument, there is always a second process occupying cuda:0.
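The placement check described above can be sketched with a small helper (hypothetical `devices_of`, not part of the original script) that collects every device the module's parameters and buffers live on:

```python
import torch
from torch import nn


def devices_of(module: nn.Module) -> set:
    """Return the distinct devices of all parameters and buffers in a module."""
    devs = {p.device for p in module.parameters()}
    devs |= {b.device for b in module.buffers()}
    return devs


# Demonstrated on CPU here; the same check applies to a model moved to cuda:1,
# where a consistent placement yields the single-element set {device('cuda:1')}.
model = nn.Linear(4, 2)
print(devices_of(model))
```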

Can anybody explain why, and suggest how to make my training run as a single process on only the card I specified (other than setting CUDA_VISIBLE_DEVICES)?
