Element-wise multiplication between constant tensor and variable tensor

Is this approach suitable for use on multiple GPUs?

I use a weighted L1 loss on multiple GPUs, but it fails. The details are in the thread "Weighted L1 loss in parallel training".
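
For concreteness, a weighted L1 loss of this kind could be sketched as follows. This is only an illustration assuming PyTorch, not the code from the linked thread: the class name `WeightedL1Loss`, the weight values, and the tensor shapes are all hypothetical. The constant weight is registered as a buffer here, on the assumption that it should move with the module when the module is placed on a device or replicated.

```python
import torch
import torch.nn as nn


class WeightedL1Loss(nn.Module):
    """Hypothetical weighted L1 loss: mean(weight * |pred - target|)."""

    def __init__(self, weight: torch.Tensor):
        super().__init__()
        # Assumption: storing the constant as a buffer (not a plain attribute)
        # so that .to(device) and module replication carry it along.
        self.register_buffer("weight", weight.clone().float())

    def forward(self, pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # Element-wise product of the constant weights and the variable error.
        return (self.weight * (pred - target).abs()).mean()


# Illustrative values only; falls back to CPU when no GPU is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
criterion = WeightedL1Loss(torch.tensor([1.0, 2.0, 3.0])).to(device)

pred = torch.tensor([[1.0, 2.0, 3.0]], device=device, requires_grad=True)
target = torch.zeros(1, 3, device=device)

loss = criterion(pred, target)
loss.backward()  # gradients flow to pred only; the buffer is a constant
```

In this sketch the weight has no gradient and lives on the same device as the inputs after `.to(device)`, which is the property a multi-GPU setup would need to preserve on every replica.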

Any suggestions?