Element-wise tensor multiplication and sums

Can DataParallel / DistributedDataParallel be used for basic tensor operations and sums without a model being involved?

For example, I have a custom loss function whose computation blows up the size of the output tensors from the model, and I want to split these operations across GPUs. It is mostly basic operations such as element-wise multiplication, sums, and cumsums. I tried using DDP, but it requires a module with parameters, which is not the case for the loss function.

No, DistributedDataParallel wasn’t designed to parallelize standalone tensor operations; it wraps a module with parameters. However, you could manually move parts of your tensors to other devices via the `.to()` operation and combine the partial results afterwards.
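A minimal sketch of that manual approach, assuming the loss reduces to a scalar and the work can be split along the batch dimension (`distributed_loss` is a hypothetical helper name; it falls back to CPU when no GPU is available):

```python
import torch

def distributed_loss(pred, target):
    # Hypothetical helper: split element-wise work across available
    # GPUs; falls back to a single CPU pass when none are present.
    n_dev = max(torch.cuda.device_count(), 1)
    devices = ([torch.device(f"cuda:{i}") for i in range(n_dev)]
               if torch.cuda.is_available()
               else [torch.device("cpu")])

    partials = []
    for p_chunk, t_chunk, dev in zip(pred.chunk(len(devices)),
                                     target.chunk(len(devices)),
                                     devices):
        # move each chunk to its device via .to()
        p = p_chunk.to(dev)
        t = t_chunk.to(dev)
        # element-wise multiply, cumsum, then reduce to a per-chunk scalar
        partials.append((p * t).cumsum(dim=0).sum())

    # gather the partial sums back to one device and combine
    return torch.stack([s.to(pred.device) for s in partials]).sum()

pred = torch.arange(4.0)    # [0., 1., 2., 3.]
target = torch.ones(4)
loss = distributed_loss(pred, target)  # cumsum [0,1,3,6] -> sum 10.0
```

Note that the final reduction here is order-compatible with splitting because sums of disjoint chunks add up; a cumsum that must run across chunk boundaries would need an extra carry-over term per chunk.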