CUDA parallel for loop with BERT tokenizer

I need to detokenize a batch of 8 input_ids tensors and apply a function to each decoded sentence. I have this function():

def function(sentence):
    for source in sentence:
        for target in sentence:
            ...  # DO STUFF WITH source AND target
    return ...  # a tensor built from the pairwise results above
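
To make the question concrete, here is a toy stand-in for what function() does (the split() tokenization and the pairwise score are placeholders of mine, not the real computation):

import torch

# Hypothetical stand-in for function(), only to make the question concrete;
# the real "# DO STUFF" is omitted. It visits every (source, target) token
# pair and returns one fixed-size tensor per sentence so torch.stack() works.
def function(sentence):
    tokens = sentence.split()
    total = 0.0
    for source in tokens:
        for target in tokens:
            total += float(source == target)  # placeholder pairwise computation
    return torch.tensor([total])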

And a model with a forward() method:

def forward(self, input_ids, tokenizer):
    sentences_batch = tokenizer.batch_decode(input_ids, skip_special_tokens=False)
    batch = []
    for sentence in sentences_batch:
        tensor = function(sentence)
        batch.append(tensor)
    result = torch.stack(batch)
    # DO STUFF WITH result

Is there a way to leverage CUDA to run the for loop in the forward() method in parallel? Will .to(device) solve my problem, and if so, where should I put that call?
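
For context, this is the only placement I could think of (the device choice and the final return are my guesses, not code I actually run):

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def forward(self, input_ids, tokenizer):
    sentences_batch = tokenizer.batch_decode(input_ids, skip_special_tokens=False)
    batch = [function(sentence) for sentence in sentences_batch]
    # Move the stacked result to the GPU afterwards, but does this
    # parallelize the Python loop above, or only the later tensor ops?
    result = torch.stack(batch).to(device)
    # DO STUFF WITH result
    return result
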
I launch the training script that contains forward() with:

python3 -m torch.distributed.launch --nproc_per_node 1 training.py

Thanks in advance.