Parallel computations on a whole batch


I have a problem where I need to calculate a similarity matrix before I can calculate my loss function. This similarity matrix takes a lot of time and depends only on the batch data itself. That is why I would like to calculate it in parallel before the batch starts, but I don't know how I can do that. Is it possible with the torch.utils.data.DataLoader? I know you can preprocess the data there and transform each sample on its own, but can you also return a computation on the whole batch itself? Or is there another way to do this?

Depending on the speed of this operation, you might calculate the similarity matrix inside the DataLoader's workers while the GPU is busy with the model training.
However, I’m not sure what method you are using and how slow/fast it is on the CPU and GPU.
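One way to run a batch-level computation in the DataLoader is a custom `collate_fn`: it receives the full list of samples for a batch, so you can build the batch tensor and the similarity matrix there, and with `num_workers > 0` the workers prepare the next batch while the GPU trains on the current one. A minimal sketch, assuming random data and a pairwise cosine similarity (swap in your own metric):

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dataset of 100 feature vectors with 16 dims each.
dataset = TensorDataset(torch.randn(100, 16))

def collate_with_similarity(samples):
    # Stack the individual samples into one batch tensor.
    batch = torch.stack([s[0] for s in samples])
    # Batch-level computation: pairwise cosine similarity matrix.
    normed = F.normalize(batch, dim=1)
    sim = normed @ normed.T
    return batch, sim

loader = DataLoader(
    dataset,
    batch_size=8,
    num_workers=2,  # workers compute sim in the background, overlapping with training
    collate_fn=collate_with_similarity,
)

batch, sim = next(iter(loader))
print(batch.shape, sim.shape)  # torch.Size([8, 16]) torch.Size([8, 8])
```

In the training loop you would then receive `(batch, sim)` directly and feed `sim` into your loss. Note the similarity matrix is computed on the CPU in the workers, so this only pays off if your metric is reasonably fast on the CPU.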

Could you post the method you are using and the relative time it takes per training iteration?