I need to detokenize a batch of 8 input_ids tensors and apply a function to each single sentence tensor. I have a function():
def function(sentence):
    for source in sentence:
        for target in sentence:
            # DO STUFF WITH source AND target
            pass
And a model with a forward() method:
def forward(input_ids, tokenizer):
    sentences_batch = tokenizer.batch_decode(input_ids, skip_special_tokens=False)
    batch = []
    for sentence in sentences_batch:
        tensor = function(sentence)
        batch.append(tensor)
    result = torch.stack(batch)
    # DO STUFF WITH result
Is there a way to leverage CUDA to run the for loop in the forward() method in parallel? Will .to(device) solve my problem? If so, where should I put this statement?
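For context, this is a minimal sketch of where I imagined calling .to(device) — the sentence tensor here is a placeholder, not my real data:

```python
import torch

# Pick the GPU if one is visible, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for one detokenized sentence's ids (hypothetical values).
sentence = torch.arange(6)

# My guess: move each per-sentence tensor to the device before stacking,
# so the stacked batch already lives on the GPU.
tensor = sentence.float().to(device)
result = torch.stack([tensor, tensor])

print(result.device.type)  # "cuda" if a GPU is available, else "cpu"
```

I am not sure whether this actually parallelizes the Python-level loop, or only the tensor ops inside it.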
I run the training script where forward() appears with:
python3 -m torch.distributed.launch --nproc_per_node 1 training.py
Thanks in advance.